Sapiens: SOTA ViTs for Humans
META unveils Sapiens, a family of models for human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface-normal prediction. Source code announced, coming soon.
Review: https://t.ly/GKQI0
Paper: arxiv.org/pdf/2408.12569
Project: rawalkhirodkar.github.io/sapiens
Code: github.com/facebookresearch/sapiens
UPDATE: SOURCE CODE IS OUT!!!
Thanks Danny for the info.
Diffusion Game Engine
#Google unveils GameNGen: the first game engine powered entirely by a neural #AI model, enabling real-time interaction with a complex environment over long trajectories at high quality. No code announced, but I love it.
Review: https://t.ly/_WR5z
Paper: https://lnkd.in/dZqgiqb9
Project: https://lnkd.in/dJUd2Fr6
Omni Urban Scene Reconstruction
OmniRe is a novel holistic approach for efficiently reconstructing HD dynamic urban scenes from on-device logs. It can simulate the reconstructed scenarios with actors in real time (~60 Hz). Code released.
Review: https://t.ly/SXVPa
Paper: arxiv.org/pdf/2408.16760
Project: ziyc.github.io/omnire/
Code: github.com/ziyc/drivestudio
Interactive Drag-based Editing
CSE unveils InstantDrag: a novel pipeline designed to enhance editing interactivity and speed, taking only an image and a drag instruction as input. Source code announced, coming soon.
Review: https://t.ly/hy6SL
Paper: arxiv.org/pdf/2409.08857
Project: joonghyuk.com/instantdrag-web/
Code: github.com/alex4727/InstantDrag
Hand-Object Interaction Pretraining
Berkeley unveils HOP, a novel approach for learning general robot-manipulation priors from 3D hand-object interaction trajectories.
Review: https://t.ly/FLqvJ
Paper: https://arxiv.org/pdf/2409.08273
Project: https://hgaurav2k.github.io/hop/
Motion Instruction Fine-Tuning
MotIF is a novel method that fine-tunes pre-trained VLMs to distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source code announced, coming soon; a toy sketch of the general recipe is below.
Review: https://t.ly/iJ2UY
Paper: https://arxiv.org/pdf/2409.10683
Project: https://motif-1k.github.io/
Code: coming
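The official code isn't out yet, so here is a toy, hypothetical sketch of the general recipe the post describes: freeze a pre-trained VLM backbone and train a small head to score whether a motion matches an instruction. The encoder and its call signature are placeholders of my own, not MotIF's API.

```python
import torch
import torch.nn as nn

class MotionInstructionScorer(nn.Module):
    """Frozen VLM backbone + small trainable head that scores whether a
    robot motion (rendered into the image) matches a language instruction."""
    def __init__(self, encoder: nn.Module, embed_dim: int = 768):
        super().__init__()
        self.encoder = encoder               # hypothetical pre-trained VLM
        for p in self.encoder.parameters():
            p.requires_grad = False          # fine-tune only the head
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, image, text_tokens):
        # assumed signature: returns a joint (B, embed_dim) embedding
        feat = self.encoder(image, text_tokens)
        return self.head(feat).squeeze(-1)   # success logit per example

def train_step(model, optimizer, image, text_tokens, labels):
    """One step of binary cross-entropy against human success labels."""
    logits = model(image, text_tokens)
    loss = nn.functional.binary_cross_entropy_with_logits(
        logits, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```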
SoccerNet 2024 Results
SoccerNet is the annual video-understanding challenge for football, aiming to advance research across multiple themes in the sport. The 2024 results are out!
Review: https://t.ly/DUPgx
Paper: arxiv.org/pdf/2409.10587
Repo: github.com/SoccerNet
Project: www.soccer-net.org/
JoyHallo: Mandarin Digital Human
JD Health tackles audio-driven video generation in Mandarin, a task complicated by the language's intricate lip movements and the scarcity of high-quality datasets. Impressive results (audio on!). Code and models available.
Review: https://t.ly/5NGDh
Paper: arxiv.org/pdf/2409.13268
Project: jdh-algo.github.io/JoyHallo/
Code: github.com/jdh-algo/JoyHallo
Robo-quadruped Parkour
LAAS-CNRS unveils a novel RL approach for agile, parkour-like skills: walking, climbing high steps, leaping over gaps, and crawling under obstacles. Data and code available.
Review: https://t.ly/-6VRm
Paper: arxiv.org/pdf/2409.13678
Project: gepetto.github.io/SoloParkour/
Code: github.com/Gepetto/SoloParkour
Dressed Humans in the Wild
ETH (+ #Microsoft) unveils ReLoo: novel high-quality 3D reconstruction of humans dressed in loose garments from monocular in-the-wild clips, with no prior assumptions about the garments. Source code announced, coming soon.
Review: https://t.ly/evgmN
Paper: arxiv.org/pdf/2409.15269
Project: moygcc.github.io/ReLoo/
Code: github.com/eth-ait/ReLoo
New SOTA Edge Detection
CUP (+ ESPOCH) unveils NBED, the new SOTA for edge detection, with consistently superior performance across multiple benchmarks, even against models with huge computational cost and complex training. Source code released.
Review: https://t.ly/zUMcS
Paper: arxiv.org/pdf/2409.14976
Code: github.com/Li-yachuan/NBED
SOTA Gaussian Haircut
ETH et al. unveil Gaussian Haircut, the new SOTA in hair reconstruction via a dual representation (classic + 3D Gaussian). Code and model announced.
Review: https://t.ly/aiOjq
Paper: arxiv.org/pdf/2409.14778
Project: https://lnkd.in/dFRm2ycb
Repo: https://lnkd.in/d5NWNkb5
SPARK: Real-time Face Capture
Technicolor Group unveils SPARK, novel high-precision 3D face capture that uses a collection of unconstrained videos of a subject as prior information. New SOTA, able to handle unseen pose, expression, and lighting. Impressive results. Code and model announced.
Review: https://t.ly/rZOgp
Paper: arxiv.org/pdf/2409.07984
Project: kelianb.github.io/SPARK/
Repo: github.com/KelianB/SPARK/
One-Image Object Detection
Delft University (+ Hensoldt Optronics) introduces OSSA, a novel unsupervised domain-adaptation method for object detection that uses a single unlabeled target image to approximate the target-domain style (a sketch of the underlying idea is below). Code released.
Review: https://t.ly/-li2G
Paper: arxiv.org/pdf/2410.00900
Code: github.com/RobinGerster7/OSSA
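For intuition, a hedged sketch of single-image style approximation via channel-wise feature statistics (an AdaIN-style operator). Whether OSSA uses exactly this operator is an assumption on my part; see the paper and repo for the actual method.

```python
import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor, eps: float = 1e-5):
    """Re-normalize source-domain features (B, C, H, W) to the channel-wise
    mean/std of features extracted from a single target-domain image."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return (content_feat - c_mean) / c_std * s_std + s_mean

src = torch.randn(2, 64, 32, 32)   # source-domain backbone features
tgt = torch.randn(1, 64, 32, 32)   # features from the single target image
stylized = adain(src, tgt)         # source content, target-domain statistics
```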
EVER: Ellipsoid Rendering
UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts or view-dependent density, achieving ~30 FPS at 720p on an #NVIDIA RTX 4090. A toy quadrature version of the rendering integral is sketched below.
Review: https://t.ly/zAfGU
Paper: arxiv.org/pdf/2410.01804
Project: half-potato.gitlab.io/posts/ever/
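Not the paper's code, but a minimal NumPy sketch of the standard emission-only volume-rendering integral such methods compute per ray, C = Σᵢ Tᵢ αᵢ cᵢ with Tᵢ the accumulated transmittance. EVER's actual contribution (exact integration over ellipsoid primitives instead of this quadrature) is not reproduced here.

```python
import numpy as np

def render_ray(ts, sigma, rgb):
    """ts: (N+1,) segment boundaries along the ray; sigma: (N,) densities;
    rgb: (N, 3) emitted colors. Returns the composited RGB color."""
    dt = np.diff(ts)                                            # segment lengths
    alpha = 1.0 - np.exp(-sigma * dt)                           # per-segment opacity
    T = np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))   # transmittance
    weights = T * alpha                                         # contribution weights
    return (weights[:, None] * rgb).sum(axis=0)

ts = np.linspace(0.0, 1.0, 65)             # 64 segments on a unit-length ray
sigma = np.full(64, 3.0)                   # constant density ("fog")
rgb = np.tile([1.0, 0.5, 0.2], (64, 1))    # constant emitted color
print(render_ray(ts, sigma, rgb))          # ≈ (1 - e^{-3}) * color
```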
ðĨ "Deep Gen-AI" Full Course ðĨ
ðA fresh course from Stanford about the probabilistic foundations and algorithms for deep generative models. A novel overview about the evolution of the genAI in #computervision, language and more...
ðReview https://t.ly/ylBxq
ðCourse https://lnkd.in/dMKH9gNe
ðLectures https://lnkd.in/d_uwDvT6
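For a taste of the "probabilistic foundations" on the syllabus, here is the variational autoencoder's evidence lower bound (ELBO), a staple of such courses; this example is mine, not lifted from the course materials.

```latex
\log p_\theta(x) \;\ge\;
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction}}
\;-\;
\underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\big\|\,p(z)\big)}_{\text{regularization}}
```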
EFM3D: 3D Ego-Foundation
#META presents EFM3D, the first benchmark for 3D object detection and surface regression on high-quality annotated egocentric data from Project Aria. Datasets and code released.
Review: https://t.ly/cDJv6
Paper: arxiv.org/pdf/2406.10224
Project: www.projectaria.com/datasets/aeo/
Repo: github.com/facebookresearch/efm3d
Gaussian Splatting VTON
GS-VTON is a novel image-prompted 3D virtual try-on (VTON) method which, by leveraging 3DGS as the 3D representation, transfers pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. Code announced.
Review: https://t.ly/sTPbW
Paper: arxiv.org/pdf/2410.05259
Project: yukangcao.github.io/GS-VTON/
Repo: github.com/yukangcao/GS-VTON
Diffusion Models Relighting
#Netflix unveils DifFRelight, novel free-viewpoint facial relighting via a diffusion model: precise lighting control and high-fidelity relit facial images from flat-lit inputs.
Review: https://t.ly/fliXU
Paper: arxiv.org/pdf/2410.08188
Project: www.eyelinestudios.com/research/diffrelight.html