AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

🔥🔥 SAM v2 is out! 🔥🔥

👉#Meta announced SAM 2, the novel unified model for real-time promptable segmentation in images and videos. 6x faster, it's the new SOTA by a large margin. Source Code, Dataset, Models & Demo released under permissive licenses💙

👉Review https://t.ly/oovJZ
👉Paper https://t.ly/sCxMY
👉Demo https://sam2.metademolab.com
👉Project ai.meta.com/blog/segment-anything-2/
👉Models github.com/facebookresearch/segment-anything-2

👋 Real-time Expressive Hands 👋

👉Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real-time. Source Code released (Apache 2.0) on Jul. 31st, 2024💙

👉Review https://t.ly/8obbB
👉Project https://lnkd.in/dRtVGe6i
👉Paper https://lnkd.in/daCx2iB7
👉Code https://lnkd.in/dZ9pgzug

🧪 Click-Attention Segmentation 🧪

👉An interesting image patch-based click attention algorithm and an affinity loss inspired by SASFormer. This novel approach aims to decouple positive and negative clicks, guiding positive ones to focus on the target object and negative ones on the background. Code released under Apache💙

👉Review https://t.ly/tG05L
👉Paper https://arxiv.org/pdf/2408.06021
👉Code https://github.com/hahamyt/ClickAttention
❀12πŸ”₯3πŸ‘2πŸ‘1🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
πŸ—οΈ #Adobe Instant TurboEdit πŸ—οΈ

πŸ‘‰Adobe unveils a novel real-time text-based disentangled real image editing method built upon 4-step SDXL Turbo. SOTA HQ image editing using ultra fast few-step diffusion. No code announced but easy to guess it will be released in commercial tools.

πŸ‘‰Review https://t.ly/Na7-y
πŸ‘‰Paper https://lnkd.in/dVs9RcCK
πŸ‘‰Project https://lnkd.in/dGCqwh9Z
πŸ‘‰Code 😒

🦓 Zebra Detection & Pose 🦓

👉The first synthetic dataset that can be used for both detection and 2D pose estimation of zebras without applying any bridging strategies. Code, results, models, and the synthetic training/validation data, including 104K manually labeled images, open-sourced💙

👉Review https://t.ly/HTEZZ
👉Paper https://lnkd.in/dQYT-fyq
👉Project https://lnkd.in/dAnNXgG3
👉Code https://lnkd.in/dhvU97xD

🦧Sapiens: SOTA ViTs for human🦧

👉META unveils Sapiens, a family of models for human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Source Code announced, coming💙

👉Review https://t.ly/GKQI0
👉Paper arxiv.org/pdf/2408.12569
👉Project rawalkhirodkar.github.io/sapiens
👉Code github.com/facebookresearch/sapiens
🐺 Diffusion Game Engine 🐺

πŸ‘‰#Google unveils GameNGen: the first game engine powered entirely by a neural #AI that enables real-time interaction with a complex environment over long trajectories at HQ. No code announced but I love it πŸ’™

πŸ‘‰Review https://t.ly/_WR5z
πŸ‘‰Paper https://lnkd.in/dZqgiqb9
πŸ‘‰Project https://lnkd.in/dJUd2Fr6

🫒 Omni Urban Scene Reconstruction 🫒

👉OmniRe is a novel holistic approach for efficiently reconstructing HD dynamic urban scenes from on-device logs. It can simulate the reconstructed scenarios, actors included, in real-time (~60 Hz). Code released💙

👉Review https://t.ly/SXVPa
👉Paper arxiv.org/pdf/2408.16760
👉Project ziyc.github.io/omnire/
👉Code github.com/ziyc/drivestudio

💄Interactive Drag-based Editing💄

👉CSE unveils InstantDrag: a novel pipeline designed to enhance editing interactivity and speed, taking only an image and a drag instruction as input. Source Code announced, coming💙

👉Review https://t.ly/hy6SL
👉Paper arxiv.org/pdf/2409.08857
👉Project joonghyuk.com/instantdrag-web/
👉Code github.com/alex4727/InstantDrag
🌭Hand-Object interaction Pretraining🌭

πŸ‘‰Berkeley unveils HOP, a novel approach to learn general robot manipulation priors from 3D hand-object interaction trajectories.

πŸ‘‰Review https://t.ly/FLqvJ
πŸ‘‰Paper https://arxiv.org/pdf/2409.08273
πŸ‘‰Project https://hgaurav2k.github.io/hop/

🧸Motion Instruction Fine-Tuning🧸

👉MotIF is a novel method that fine-tunes pre-trained VLMs so they can distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source Code announced, coming💙

👉Review https://t.ly/iJ2UY
👉Paper https://arxiv.org/pdf/2409.10683
👉Project https://motif-1k.github.io/
👉Code coming

⚽ SoccerNet 2024 Results ⚽

👉SoccerNet is the annual video understanding challenge for football. These challenges aim to advance research across multiple themes in football. The 2024 results are out!

👉Review https://t.ly/DUPgx
👉Paper arxiv.org/pdf/2409.10587
👉Repo github.com/SoccerNet
👉Project www.soccer-net.org/
🌏 JoyHallo: Mandarin Digital Human 🌏

πŸ‘‰JD Health faced the challenges of audio-driven video generation in Mandarin, a task complicated by the language’s intricate lip movements and the scarcity of HQ datasets. Impressive results (-> audio ON). Code Models availableπŸ’™

πŸ‘‰Review https://t.ly/5NGDh
πŸ‘‰Paper arxiv.org/pdf/2409.13268
πŸ‘‰Project jdh-algo.github.io/JoyHallo/
πŸ‘‰Code github.com/jdh-algo/JoyHallo

🎢 Robo-quadruped Parkour🎢

👉LAAS-CNRS unveils a novel RL approach to perform agile skills that are reminiscent of parkour, such as walking, climbing high steps, leaping over gaps, and crawling under obstacles. Data and Code available💙

👉Review https://t.ly/-6VRm
👉Paper arxiv.org/pdf/2409.13678
👉Project gepetto.github.io/SoloParkour/
👉Code github.com/Gepetto/SoloParkour

🩰 Dressed Humans in the wild 🩰

👉ETH (+ #Microsoft ) ReLoo: novel 3D-HQ reconstruction of humans dressed in loose garments from mono in-the-wild clips. No prior assumptions about the garments. Source Code announced, coming 💙

👉Review https://t.ly/evgmN
👉Paper arxiv.org/pdf/2409.15269
👉Project moygcc.github.io/ReLoo/
👉Code github.com/eth-ait/ReLoo

🌾 New SOTA Edge Detection 🌾

👉CUP (+ ESPOCH) unveils the new SOTA for Edge Detection (NBED): consistently superior performance across multiple benchmarks, even compared with models that have a huge computational cost and complex training. Source Code released💙

👉Review https://t.ly/zUMcS
👉Paper arxiv.org/pdf/2409.14976
👉Code github.com/Li-yachuan/NBED

👩‍🦰 SOTA Gaussian Haircut 👩‍🦰

👉ETH et al. unveil Gaussian Haircut, the new SOTA in hair reconstruction via dual representation (classic + 3D Gaussian). Code and Model announced💙

👉Review https://t.ly/aiOjq
👉Paper arxiv.org/pdf/2409.14778
👉Project https://lnkd.in/dFRm2ycb
👉Repo https://lnkd.in/d5NWNkb5

🍇SPARK: Real-time Face Capture🍇

👉Technicolor Group unveils SPARK, a novel high-precision 3D face capture method that uses a collection of unconstrained videos of a subject as prior information. New SOTA able to handle unseen pose, expression and lighting. Impressive results. Code & Model announced💙

👉Review https://t.ly/rZOgp
👉Paper arxiv.org/pdf/2409.07984
👉Project kelianb.github.io/SPARK/
👉Repo github.com/KelianB/SPARK/

🦴 One-Image Object Detection 🦴

👉Delft University (+Hensoldt Optronics) introduces OSSA, a novel unsupervised domain adaptation method for object detection that utilizes a single, unlabeled target image to approximate the target domain style. Code released💙

👉Review https://t.ly/-li2G
👉Paper arxiv.org/pdf/2410.00900
👉Code github.com/RobinGerster7/OSSA