UniAnimate-DiT: Human Animation
UniAnimate-DiT is a novel and effective framework based on Wan2.1 for consistent human image animation. It fine-tunes the model with LoRAs, reducing memory use while preserving the original model's generative skills. Training and inference code released.
Review https://t.ly/1I50N
Paper https://arxiv.org/pdf/2504.11289
Repo https://github.com/ali-vilab/UniAnimate-DiT
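For readers unfamiliar with the LoRA trick the post mentions, here is a minimal PyTorch sketch of the general idea (a frozen linear layer plus a trainable low-rank adapter); the class name, rank, and alpha are illustrative, not UniAnimate-DiT's actual code:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update:
    y = W x + (alpha / r) * B(A x). Only A and B are trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze the original weights
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.normal_(self.A.weight, std=0.01)
        nn.init.zeros_(self.B.weight)        # adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.B(self.A(x))

layer = LoRALinear(nn.Linear(64, 64))
x = torch.randn(2, 64)
out = layer(x)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)  # torch.Size([2, 64]) 1024
```

Only the 1,024 adapter parameters receive gradients, which is where the memory saving comes from.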
GATE3D: general attention-based object detection
GATE3D is a novel framework designed for generalized monocular 3D object detection via weak supervision. It effectively bridges domain gaps by employing consistency losses between 2D and 3D predictions.
Review https://t.ly/O7wqH
Paper https://lnkd.in/dc5VTUj9
Project https://lnkd.in/dzrt-qQV
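As a rough illustration of a 2D/3D consistency loss of the kind the post describes (not GATE3D's exact formulation), one can project a predicted 3D box center through the camera intrinsics and penalize its distance to the predicted 2D center, so 2D supervision shapes the 3D branch:

```python
import torch

def project(points3d, K):
    """Pinhole projection of 3D camera-frame points (N,3) with intrinsics K (3,3)."""
    uvw = points3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def center_consistency_loss(center3d, center2d, K):
    """L1 penalty between the projected 3D box center and the 2D box center.
    A generic weak-supervision consistency term, not GATE3D's exact loss."""
    return torch.nn.functional.l1_loss(project(center3d, K), center2d)

K = torch.tensor([[700.0, 0.0, 320.0],
                  [0.0, 700.0, 240.0],
                  [0.0, 0.0, 1.0]])
center3d = torch.tensor([[1.0, 0.5, 10.0]], requires_grad=True)  # hypothetical 3D prediction
center2d = torch.tensor([[392.0, 270.0]])                        # hypothetical 2D prediction
loss = center_consistency_loss(center3d, center2d, K)
loss.backward()   # gradients flow to the 3D branch from the 2D target
```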
Event Blurry Super-Resolution
USTC unveils Ev-DeblurVSR, a novel event-enhanced network that incorporates event signals into Blurry Video Super-Resolution (BVSR), the task of generating HR videos from low-resolution, blurry inputs. Pretrained models and test code released under Apache.
Review https://t.ly/x6hRs
Paper https://lnkd.in/dzbkCJMh
Repo https://lnkd.in/dmvsc-yS
#Apple Co-Motion is out!
Apple unveils a novel approach for detecting and tracking detailed 3D poses of multiple people from a single monocular stream. Temporally coherent predictions in crowded scenes with hard poses and occlusions. New SOTA, 10x faster! Code and models released for research only.
Review https://t.ly/-86CO
Paper https://lnkd.in/dQsVGY7q
Repo https://lnkd.in/dh7j7N89
TAP in Persistent 3D Geometry
TAPIP3D is the new SOTA for long-term 3D point tracking in monocular RGB/RGB-D. It represents videos as camera-stabilized spatio-temporal feature clouds, leveraging depth and motion to lift 2D video features into a 3D world space where camera motion is effectively canceled. Code under Apache.
Review https://t.ly/oooMy
Paper https://lnkd.in/d8uqjdE4
Project https://tapip3d.github.io/
Repo https://lnkd.in/dsvHP_8u
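The "lift 2D features into a 3D world space" step rests on standard pinhole unprojection with depth and camera pose. A generic NumPy sketch of that lifting (illustrative values, not TAPIP3D's implementation):

```python
import numpy as np

def lift_to_world(uv, depth, K, T_wc):
    """Unproject pixel coords (N,2) with per-point depth (N,) into world space.
    K: (3,3) intrinsics; T_wc: (4,4) camera-to-world pose. Generic pinhole
    lifting that cancels camera motion, not TAPIP3D's pipeline."""
    ones = np.ones((uv.shape[0], 1))
    rays = np.linalg.inv(K) @ np.hstack([uv, ones]).T       # (3,N) rays, camera frame
    pts_cam = rays * depth                                  # scale each ray by its depth
    pts_hom = np.vstack([pts_cam, np.ones((1, uv.shape[0]))])
    return (T_wc @ pts_hom)[:3].T                           # (N,3) world-frame points

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
T_wc = np.eye(4); T_wc[0, 3] = 2.0                          # camera shifted 2 m along x
uv = np.array([[320.0, 240.0]])                             # pixel at the principal point
pts = lift_to_world(uv, np.array([5.0]), K, T_wc)
print(pts)  # [[2. 0. 5.]]
```

Points lifted this way from different frames land in one shared world frame, so tracking can reason about scene motion alone.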
#Nvidia Describe Anything
Nvidia unveils the Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache, dataset available, and live demo on Hugging Face.
Review https://t.ly/la4JD
Paper https://lnkd.in/dZh82xtV
Project https://lnkd.in/dcv9V2ZF
Repo https://lnkd.in/dJB9Ehtb
Demo https://lnkd.in/dXDb2MWU
Moving Points -> Depth
KAIST & Adobe propose Seurat, a novel method that infers relative depth by examining the spatial relationships and temporal evolution of a set of tracked 2D trajectories (via off-the-shelf point tracking models). Repo & demo to be released.
Review https://t.ly/qA2P5
Paper https://lnkd.in/dpXDaQtM
Project https://lnkd.in/d9qWYsjP
Repo https://lnkd.in/dZEMDiJh
SOTA Textured 3D-Guided VTON
#Alibaba unveils 3DV-TON, a novel diffusion model for high-quality, temporally consistent video try-on. It generates animatable textured 3D meshes as explicit frame-level guidance, alleviating the tendency of models to over-focus on appearance fidelity at the expense of motion coherence. Code & benchmark to be released.
Review https://t.ly/0tjdC
Paper https://lnkd.in/dFseYSXz
Project https://lnkd.in/djtqzrzs
Repo TBA
#Nvidia Dynamic Pose
Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under the Nvidia license.
Review https://t.ly/wrcb0
Paper https://lnkd.in/dycGjAyy
Project https://lnkd.in/dDZ2Ej_Q
Data https://lnkd.in/d8yUSB7m
S3MOT: SOTA 3D MOT
S3MOT is a Selective-State-Space-model-based MOT framework that efficiently infers 3D motion and object associations from 2D images through three core components. New SOTA on KITTI with 76.86 HOTA at 31 FPS! Code & weights to be released under the MIT license.
Review https://t.ly/H_JPv
Paper https://arxiv.org/pdf/2504.18068
Repo https://github.com/bytepioneerX/s3mot
Diffusion Model <-> Depth
ETH & CMU show how to turn a single-image latent diffusion model (LDM) into the SOTA video depth estimator: video depth without video models. Repo released under Apache 2.0, HF demo available.
Review https://t.ly/sP9ma
Paper arxiv.org/pdf/2411.19189
Project rollingdepth.github.io/
Repo github.com/prs-eth/rollingdepth
Demo huggingface.co/spaces/prs-eth/rollingdepth
Dance vs. #ComputerVision
The University of Saint-Etienne proposes a new 3D human body pose estimation pipeline for dance analysis. Project page with results and interactive demo released.
Review https://t.ly/JEdM3
Paper arxiv.org/pdf/2505.07249
Project https://lnkd.in/dD5dsMv5
GENMO: Generalist Human Motion
#Nvidia presents GENMO, a unified generalist model for human motion that bridges motion estimation and generation in a single framework, conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment.
Review https://t.ly/Q5T_Y
Paper https://lnkd.in/ds36BY49
Project https://lnkd.in/dAYHhuFU
Dear friends,
I'm truly sorry for being away from the group for so long. I know: no updates, while AI is running faster than the speed of light.
I'm going through a very difficult time in my life and I need some space to heal. This spare-time project (important to a lot of people here) needs energy and commitment I don't have right now. I'm sorry, please be patient. I'll be back.
Love u all,
Alessandro.