Motion Instruction Fine-Tuning
MotIF is a novel method that fine-tunes pre-trained VLMs to distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source code announced, coming soon. An illustrative sketch follows the links below.
Review: https://t.ly/iJ2UY
Paper: https://arxiv.org/pdf/2409.10683
Project: https://motif-1k.github.io/
Code: coming soon
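For intuition only, here is a minimal sketch of the general recipe described above: fine-tuning a pre-trained VLM with a small head that scores whether a robot-motion image matches a language instruction. CLIP is used as a stand-in VLM, and the head design, hyperparameters, and data interface are assumptions, not MotIF's actual architecture (see the paper for that).

```python
# Minimal sketch (assumptions: CLIP as the pretrained VLM, a binary
# "motion matches instruction" head, user-provided data). Not the official MotIF code.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
vlm = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Small trainable head on top of frozen image/text embeddings (512-d each).
head = nn.Sequential(nn.Linear(512 * 2, 256), nn.ReLU(), nn.Linear(256, 1)).to(device)
for p in vlm.parameters():
    p.requires_grad = False  # in this sketch, only the head is trained

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()

def train_step(images, instructions, labels):
    """images: list of PIL images, instructions: list of str, labels: (B,) float tensor."""
    inputs = processor(text=instructions, images=images,
                       return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        img_emb = vlm.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = vlm.get_text_features(input_ids=inputs["input_ids"],
                                        attention_mask=inputs["attention_mask"])
    logits = head(torch.cat([img_emb, txt_emb], dim=-1)).squeeze(-1)
    loss = criterion(logits, labels.to(device))
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```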
SoccerNet 2024 Results
SoccerNet is the annual suite of video understanding challenges for football, aimed at advancing research across multiple themes in the sport. The 2024 results are out!
Review: https://t.ly/DUPgx
Paper: arxiv.org/pdf/2409.10587
Repo: github.com/SoccerNet
Project: www.soccer-net.org/
JoyHallo: Mandarin Digital Human
JD Health tackles audio-driven video generation in Mandarin, a task complicated by the language's intricate lip movements and the scarcity of high-quality datasets. Impressive results (audio ON). Code & models available.
Review: https://t.ly/5NGDh
Paper: arxiv.org/pdf/2409.13268
Project: jdh-algo.github.io/JoyHallo/
Code: github.com/jdh-algo/JoyHallo
Robo-Quadruped Parkour
LAAS-CNRS unveils a novel RL approach for agile, parkour-like skills such as walking, climbing high steps, leaping over gaps, and crawling under obstacles. Data and code available.
Review: https://t.ly/-6VRm
Paper: arxiv.org/pdf/2409.13678
Project: gepetto.github.io/SoloParkour/
Code: github.com/Gepetto/SoloParkour
Dressed Humans in the Wild
ReLoo, from ETH (+ #Microsoft): novel high-quality 3D reconstruction of humans dressed in loose garments from monocular in-the-wild clips, with no prior assumptions about the garments. Source code announced, coming soon.
Review: https://t.ly/evgmN
Paper: arxiv.org/pdf/2409.15269
Project: moygcc.github.io/ReLoo/
Code: github.com/eth-ait/ReLoo
New SOTA Edge Detection
CUP (+ ESPOCH) unveils NBED, the new SOTA for edge detection, with consistently superior performance across multiple benchmarks, even against models with huge computational cost and complex training. Source code released.
Review: https://t.ly/zUMcS
Paper: arxiv.org/pdf/2409.14976
Code: github.com/Li-yachuan/NBED
SOTA Gaussian Haircut
ETH et al. unveil Gaussian Haircut, the new SOTA in hair reconstruction via a dual representation (classic + 3D Gaussians). Code and model announced.
Review: https://t.ly/aiOjq
Paper: arxiv.org/pdf/2409.14778
Project: https://lnkd.in/dFRm2ycb
Repo: https://lnkd.in/d5NWNkb5
SPARK: Real-Time Face Capture
Technicolor Group unveils SPARK, a high-precision 3D face capture method that uses a collection of unconstrained videos of a subject as prior information. New SOTA, able to handle unseen poses, expressions, and lighting. Impressive results. Code & model announced.
Review: https://t.ly/rZOgp
Paper: arxiv.org/pdf/2409.07984
Project: kelianb.github.io/SPARK/
Repo: github.com/KelianB/SPARK/
One-Image Object Detection
Delft University (+ Hensoldt Optronics) introduces OSSA, a novel unsupervised domain adaptation method for object detection that uses a single, unlabeled target image to approximate the target-domain style. Code released. An illustrative sketch follows the links below.
Review: https://t.ly/-li2G
Paper: arxiv.org/pdf/2410.00900
Code: github.com/RobinGerster7/OSSA
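As a toy illustration of the one-shot idea (approximating the target-domain style from a single unlabeled image), the sketch below aligns channel-wise feature statistics, AdaIN-style, on a torchvision ResNet backbone. It is a generic stand-in under those assumptions, not the exact OSSA formulation.

```python
# Toy sketch of one-shot style approximation via channel-wise feature-statistics
# alignment. Generic illustration, not the OSSA method; backbone choice is an assumption.
import torch
import torchvision

backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2").eval()
stem = torch.nn.Sequential(*list(backbone.children())[:5])  # conv1 .. layer1 features

def channel_stats(feat, eps=1e-5):
    # Per-channel mean/std over spatial dims; feat is (B, C, H, W).
    mean = feat.mean(dim=(2, 3), keepdim=True)
    std = feat.std(dim=(2, 3), keepdim=True) + eps
    return mean, std

@torch.no_grad()
def stylize_source_features(source_imgs, target_img):
    """Re-normalize source feature maps so their channel statistics match
    those of the single unlabeled target image."""
    f_src = stem(source_imgs)              # (B, C, H, W)
    f_tgt = stem(target_img.unsqueeze(0))  # (1, C, H', W')
    mu_s, sigma_s = channel_stats(f_src)
    mu_t, sigma_t = channel_stats(f_tgt)
    return (f_src - mu_s) / sigma_s * sigma_t + mu_t

# Usage idea: feed the stylized features to the detector during adaptation,
# so the source data "looks like" the target domain at the feature level.
```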
EVER: Ellipsoid Rendering
UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts or view-dependent density, and it achieves ~30 FPS at 720p on an #NVIDIA RTX 4090. A small intuition sketch follows the links below.
Review: https://t.ly/zAfGU
Paper: arxiv.org/pdf/2410.01804
Project: half-potato.gitlab.io/posts/ever/
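For intuition, here is a generic, discretized emission-absorption rendering of a single ray. EVER instead evaluates the underlying integral exactly over its primitives rather than by ray marching, so treat this purely as an illustration of what "emission-only volume rendering" computes.

```python
# Generic emission-absorption volume rendering along one ray, discretized with
# ray marching. Intuition only; not EVER's exact, primitive-based renderer.
import numpy as np

def render_ray(sigma_fn, color_fn, t_near=0.0, t_far=4.0, n_samples=128):
    """sigma_fn(t) -> density at depth t, color_fn(t) -> (3,) emitted radiance."""
    ts = np.linspace(t_near, t_far, n_samples)
    dt = ts[1] - ts[0]
    color = np.zeros(3)
    transmittance = 1.0
    for t in ts:
        alpha = 1.0 - np.exp(-sigma_fn(t) * dt)       # absorption over this step
        color += transmittance * alpha * color_fn(t)  # accumulate emitted radiance
        transmittance *= (1.0 - alpha)                # attenuate what lies behind
    return color

# Toy medium: a fuzzy blob centered at t=2 that emits orange light.
print(render_ray(lambda t: 3.0 * np.exp(-(t - 2.0) ** 2 / 0.1),
                 lambda t: np.array([1.0, 0.5, 0.1])))
```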
"Deep Gen-AI" Full Course
A fresh Stanford course on the probabilistic foundations and algorithms of deep generative models, with an overview of the evolution of generative AI in #computervision, language, and more.
Review: https://t.ly/ylBxq
Course: https://lnkd.in/dMKH9gNe
Lectures: https://lnkd.in/d_uwDvT6
EFM3D: 3D Ego-Foundation
#META presents EFM3D, the first benchmark for 3D object detection and surface regression on high-quality annotated egocentric data from Project Aria. Datasets & code released.
Review: https://t.ly/cDJv6
Paper: arxiv.org/pdf/2406.10224
Project: www.projectaria.com/datasets/aeo/
Repo: github.com/facebookresearch/efm3d
Gaussian Splatting VTON
GS-VTON is a novel image-prompted 3D virtual try-on (VTON) method that leverages 3DGS as the 3D representation to transfer pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. Code announced.
Review: https://t.ly/sTPbW
Paper: arxiv.org/pdf/2410.05259
Project: yukangcao.github.io/GS-VTON/
Repo: github.com/yukangcao/GS-VTON
Diffusion Models Relighting
#Netflix unveils DifFRelight, a novel diffusion-based method for free-viewpoint facial relighting: precise lighting control and high-fidelity relit facial images from flat-lit inputs.
Review: https://t.ly/fliXU
Paper: arxiv.org/pdf/2410.08188
Project: www.eyelinestudios.com/research/diffrelight.html
POKEFLEX: Soft Object Dataset
PokeFlex from ETH is a dataset of deformable objects that includes 3D textured meshes, point clouds, RGB images, and depth maps. Pretrained models & dataset announced.
Review: https://t.ly/GXggP
Paper: arxiv.org/pdf/2410.07688
Project: https://lnkd.in/duv-jS7a
Repo:
DEPTH ANY VIDEO is out!
DAV is a novel foundation model for image/video depth estimation: the new SOTA for accuracy & consistency, running at up to 150 FPS!
Review: https://t.ly/CjSz2
Paper: arxiv.org/pdf/2410.10815
Project: depthanyvideo.github.io/
Code: github.com/Nightmare-n/DepthAnyVideo
Robo-Emulation via Video Imitation
OKAMI (UT & #Nvidia) is a novel method that generates a manipulation plan from a single RGB-D video and derives a policy for execution.
Review: https://t.ly/_N29-
Paper: arxiv.org/pdf/2410.11792
Project: https://lnkd.in/d6bHF_-s
CoTracker3 by #META is out!
#Meta (+ VGG, Oxford) unveils CoTracker3, a new point tracker that outperforms the previous SOTA by a large margin while using only 0.1% of the training data. A usage sketch follows the links below.
Review: https://t.ly/TcRIv
Paper: arxiv.org/pdf/2410.11831
Project: cotracker3.github.io/
Code: github.com/facebookresearch/co-tracker
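A minimal usage sketch via torch.hub: the entry-point name, expected input range, and output shapes below are recalled from the repo README rather than confirmed here, so verify them against github.com/facebookresearch/co-tracker before relying on this.

```python
# Minimal CoTracker3 usage sketch via torch.hub. Entry-point name, input range,
# and output shapes are assumptions based on the repo README -- verify before use.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker3_offline").to(device)

# Dummy clip: (B, T, 3, H, W) float tensor, values assumed to be in [0, 255].
video = torch.rand(1, 24, 3, 384, 512, device=device) * 255

# Track a 10x10 grid of points across the whole clip.
pred_tracks, pred_visibility = cotracker(video, grid_size=10)
print(pred_tracks.shape, pred_visibility.shape)  # expected: (1, T, N, 2), (1, T, N)
```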
Neural Metamorphosis
NU Singapore unveils NeuMeta, which transforms neural nets by allowing a single model to adapt on the fly to different sizes, generating the right weights when needed. A concept sketch follows the links below.
Review: https://t.ly/DJab3
Paper: arxiv.org/pdf/2410.11878
Project: adamdad.github.io/neumeta
Code: github.com/Adamdad/neumeta
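The underlying idea, treating a network's weights as a continuous function that can be sampled at different sizes, can be illustrated with a tiny implicit weight generator. The sketch below is a generic illustration of that concept under these assumptions, not NeuMeta's actual training recipe.

```python
# Generic illustration of "weights as a continuous function": a small hypernetwork
# maps normalized (row, col) coordinates to a weight value, so a linear layer of any
# width can be sampled from the same generator. Not NeuMeta's actual recipe.
import torch
import torch.nn as nn

class ImplicitWeightGenerator(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def sample_weight(self, out_features, in_features):
        # Normalized coordinates in [0, 1]^2 for every weight entry.
        rows = torch.linspace(0, 1, out_features)
        cols = torch.linspace(0, 1, in_features)
        grid = torch.stack(torch.meshgrid(rows, cols, indexing="ij"), dim=-1)
        return self.net(grid.reshape(-1, 2)).reshape(out_features, in_features)

gen = ImplicitWeightGenerator()
x = torch.randn(4, 32)
for width in (16, 64, 128):              # same generator, different layer sizes
    W = gen.sample_weight(width, 32)
    print(width, (x @ W.t()).shape)      # (4, width)
```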
GS + Depth = SOTA
DepthSplat is the new SOTA in depth estimation & novel view synthesis; its key feature is the cross-task interaction between Gaussian splatting and depth estimation. Source code to be released soon. A small sketch follows the links below.
Review: https://t.ly/87HuH
Paper: arxiv.org/abs/2410.13862
Project: haofeixu.github.io/depthsplat/
Code: github.com/cvg/depthsplat
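As a small illustration of one direction of that cross-task interaction (using predicted depth to seed a point-based 3D representation), the sketch below unprojects a depth map into candidate 3D points, e.g. for initializing Gaussian centers. Intrinsics and depth values are placeholders; this is not DepthSplat's actual pipeline.

```python
# Generic sketch: unproject a depth map into a camera-space point cloud that could
# seed Gaussian centers. Placeholder intrinsics/depth; not DepthSplat's pipeline.
import torch

def unproject_depth(depth, fx, fy, cx, cy):
    """depth: (H, W) metric depth. Returns (H*W, 3) camera-space points."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) / fx * z
    y = (v.reshape(-1) - cy) / fy * z
    return torch.stack([x, y, z], dim=-1)

depth = torch.rand(240, 320) * 5.0   # placeholder depth map, in meters
points = unproject_depth(depth, fx=300.0, fy=300.0, cx=160.0, cy=120.0)
print(points.shape)                  # (76800, 3) candidate Gaussian centers
```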