Robo-Emulation via Video Imitation
OKAMI (UT Austin & #Nvidia) is a novel foundation method that generates a manipulation plan from a single RGB-D video and derives a policy for execution (toy sketch below).
Review: https://t.ly/_N29-
Paper: arxiv.org/pdf/2410.11792
Project: https://lnkd.in/d6bHF_-s
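A minimal sketch of the two-stage recipe described above: build an object-centric plan from one demo video, then re-anchor each step on the objects found in the live scene. All names and structures here are hypothetical illustrations, not OKAMI's actual API.

```python
# Hypothetical two-stage imitation sketch: plan from one demo, retarget at runtime.
from dataclasses import dataclass
import numpy as np

@dataclass
class PlanStep:
    target_object: str          # object the hand interacts with in this segment
    rel_hand_traj: np.ndarray   # (T, 3) hand waypoints expressed in the object frame

def plan_from_demo(segments):
    """One RGB-D demo -> object-centric plan (object id + relative trajectory)."""
    plan = []
    for obj_name, hand_traj, obj_pos in segments:
        plan.append(PlanStep(obj_name, hand_traj - obj_pos))  # motion in object frame
    return plan

def execute(plan, locate):
    """Retarget each step to the current scene by re-anchoring on the object."""
    for step in plan:
        anchor = locate(step.target_object)          # object position in the live scene
        for waypoint in step.rel_hand_traj + anchor:
            pass  # here: send the waypoint to the robot's whole-body controller

# Toy usage: one demo segment reaching toward a "mug".
demo = [("mug",
         np.array([[0.2, 0.0, 0.3], [0.1, 0.0, 0.2]]),   # hand trajectory in the demo
         np.array([0.1, 0.0, 0.2]))]                      # mug position in the demo
execute(plan_from_demo(demo), locate=lambda name: np.array([0.4, 0.1, 0.2]))
```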
"Nuclear" AI vs. Hyper-Cheap Inference
What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
23%: Portable Training Workstation
35%: Nuclear energy for AI training
33%: Cheaper inference-only devices
9%: Cloud-intensive inference-only
Omni-RGPT: SOTA MLLM Understanding
#NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image- and video-based commonsense reasoning (toy sketch below).
Review: https://t.ly/KHnQ7
Paper: arxiv.org/pdf/2501.08326
Project: miranheo.github.io/omni-rgpt/
Repo: TBA soon
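One way to picture the region-level interface: mark the visual features covered by the user's region with a learnable region token, roughly the spirit of the paper's token-mark idea. Shapes and names below are assumptions, not Omni-RGPT's code.

```python
# Toy region marking: inject a learnable token into masked patch features.
import torch
import torch.nn as nn

dim, n_patches = 256, 196                      # e.g., a 14x14 patch grid
visual = torch.randn(1, n_patches, dim)        # patch features from the vision encoder
region_token = nn.Parameter(torch.randn(dim))  # one learnable mark per region

mask = torch.zeros(1, n_patches, dtype=torch.bool)
mask[0, 30:40] = True                          # patches covered by the user's box

marked = visual + mask.unsqueeze(-1).float() * region_token  # inject the mark
print(marked.shape)  # (1, 196, 256); the LLM's text prompt can then reference
                     # the same region token to ground its answer
```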
#Nvidia Foundation ZS-Stereo
Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization. It also introduces a large-scale synthetic training dataset (1M stereo pairs) with high diversity and photorealism. Code, model & dataset to be released (disparity-to-depth refresher below).
Review: https://t.ly/rfBr5
Paper: arxiv.org/pdf/2501.09898
Project: nvlabs.github.io/FoundationStereo/
Repo: github.com/NVlabs/FoundationStereo/tree/master
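Once a stereo model predicts a disparity map, metric depth follows from standard rectified-stereo geometry: depth = focal_length * baseline / disparity. This refresher is generic stereo math, not FoundationStereo-specific code.

```python
# Rectified-stereo conversion: disparity (pixels) -> depth (meters).
import numpy as np

def disparity_to_depth(disp_px, focal_px, baseline_m, eps=1e-6):
    """disp_px: (H, W) disparity in pixels; returns per-pixel depth in meters."""
    return focal_px * baseline_m / np.maximum(disp_px, eps)  # avoid div-by-zero

disp = np.array([[32.0, 16.0], [8.0, 4.0]])                # toy disparity map
depth = disparity_to_depth(disp, focal_px=720.0, baseline_m=0.12)
print(depth)  # larger disparity -> closer surface
```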
HAMSTER: Hierarchical VLA Manipulation
#Nvidia unveils HAMSTER, a novel hierarchical VLA architecture that enables robotic manipulation with semantic, visual & geometric generalization, trained on easy-to-collect, off-domain data (generic sketch below). Source code announced.
Review: https://t.ly/2yXaY
Paper: https://arxiv.org/pdf/2502.05485
Project: https://hamster-robot.github.io/
Repo: TBA
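To make the "hierarchical" part concrete, here is a generic hierarchical VLA split: a large VLM plans coarsely in image space, and a small low-level policy maps the plan to motor actions. Both components below are toy stand-ins, not HAMSTER's models.

```python
# Generic two-level VLA sketch: VLM plans a 2D path; a low-level policy executes.
import numpy as np

def high_level_vlm(image, instruction):
    """Stand-in for the VLM planner: returns a coarse 2D path in pixel coords."""
    h, w = image.shape[:2]
    return np.linspace([w * 0.2, h * 0.8], [w * 0.7, h * 0.3], num=5)

def low_level_policy(image, path_2d):
    """Stand-in for the low-level controller: waypoints -> toy action deltas."""
    return np.diff(path_2d, axis=0) * 0.01    # e.g., end-effector displacement

img = np.zeros((224, 224, 3))
path = high_level_vlm(img, "put the cube in the bowl")
actions = low_level_policy(img, path)
print(actions.shape)  # (4, 2): one action per path segment
```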
Unified Low-Level 4D Vision
#Nvidia's L4P is a novel feedforward, general-purpose architecture that solves low-level 4D perception tasks in a unified framework. L4P combines a ViT-based backbone with lightweight per-task heads that do not require extensive training (sketch below). One backbone, many SOTAs. Code announced.
Review: https://t.ly/04DGj
Paper: arxiv.org/pdf/2502.13078
Project: research.nvidia.com/labs/lpr/l4p/
Repo: TBA
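The "one backbone, many heads" pattern can be sketched as below; the tiny video encoder, layer sizes, and task heads are made-up stand-ins, not L4P's actual architecture.

```python
# Shared video backbone + lightweight per-task heads (illustrative only).
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # Spatio-temporal patchify: (B, 3, T, H, W) -> tokens of width `dim`.
        self.patch = nn.Conv3d(3, dim, kernel_size=(2, 16, 16), stride=(2, 16, 16))
        self.blocks = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=2,
        )

    def forward(self, video):
        tok = self.patch(video).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.blocks(tok)

backbone = SharedBackbone()
heads = nn.ModuleDict({                       # cheap per-task heads
    "depth": nn.Linear(256, 1),
    "flow": nn.Linear(256, 2),
    "track": nn.Linear(256, 2),
})
feats = backbone(torch.randn(1, 3, 4, 64, 64))             # one shared forward pass
outs = {task: head(feats) for task, head in heads.items()}  # many task outputs
print({k: v.shape for k, v in outs.items()})
```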
Neural-Free Sparse Voxels Rasterization
#Nvidia unveils a novel, efficient radiance-field rendering algorithm that rasterizes adaptive sparse voxels without neural networks or 3D Gaussians (compositing refresher below). Code released (custom license).
Review: https://t.ly/Nh_ic
Paper: https://lnkd.in/g8k8Zs6R
Project: https://lnkd.in/gR-bD4Wx
Repo: https://lnkd.in/gNHX-w4t
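Whatever the primitive (sparse voxels here, Gaussians elsewhere), the rendering core is front-to-back alpha compositing along each ray. A toy single-ray version of that standard formula, not SVRaster's implementation:

```python
# Front-to-back alpha compositing: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j).
import numpy as np

def composite(colors, alphas):
    """colors: (N, 3), alphas: (N,), sorted near-to-far along one ray."""
    out, transmittance = np.zeros(3), 1.0
    for c, a in zip(colors, alphas):
        out += transmittance * a * c          # this sample's contribution
        transmittance *= 1.0 - a              # light left for samples behind it
    return out

# Three voxels hit by one ray, already sorted near-to-far.
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
alphas = np.array([0.3, 0.5, 0.9])
print(composite(colors, alphas))  # nearer, more opaque voxels dominate
```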
3D MultiModal Memory
M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes with RGB & foundation-model embeddings: rich spatial & semantic understanding via a novel memory system designed to retain multimodal information across videos.
Review: https://t.ly/OrXZO
Paper: arxiv.org/pdf/2503.16413
Project: https://lnkd.in/dXAZ97KH
Repo: https://lnkd.in/dWvunCET
Scaling Vision to 4K
PS3, by #Nvidia (+ UC Berkeley), scales up CLIP-style vision pre-training to 4K resolution at *near-constant* cost: it encodes a low-resolution global image and selectively processes only informative high-resolution regions (toy selector below). Impressive work. Code, weights & a Hugging Face release announced.
Review: https://t.ly/WN479
Paper: https://lnkd.in/ddWq8UpX
Project: https://lnkd.in/dMkTY8-k
Repo: https://lnkd.in/d9YSB6yv
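The near-constant-cost trick can be illustrated with a toy selector: score regions cheaply, then send only the top-k crops through the expensive high-res encoder. The contrast-based score below is a stand-in for PS3's learned selection.

```python
# Toy selective high-res processing: pick k informative crops from a 4K-ish image.
import numpy as np

def select_hires_patches(image, k=4, patch=512):
    """Returns the (y, x) corners of the k most 'informative' patches."""
    h, w = image.shape[:2]
    scores, boxes = [], []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            crop = image[y:y + patch, x:x + patch]
            scores.append(crop.std())         # toy informativeness: local contrast
            boxes.append((y, x))
    top = np.argsort(scores)[-k:]
    return [boxes[i] for i in top]            # only these go to the HR encoder

img = np.random.rand(2048, 2048, 3).astype(np.float32)
print(select_hires_patches(img))              # cost scales with k, not image area
```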
PartField: #3D Part Segmentation
#Nvidia unveils PartField, a feedforward approach for learning part-based 3D features that captures the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more (clustering sketch below). Code & models released under the Nvidia license.
Review: https://t.ly/fGb2O
Paper: https://lnkd.in/dGeyKSzG
Code: https://lnkd.in/dbe57XGH
Project: https://lnkd.in/dhEgf7X2
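Given per-point part features from a feedforward network, part decomposition can fall out of simple clustering; the random features below stand in for PartField's learned ones.

```python
# Part decomposition by clustering per-point features (toy stand-in features).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = rng.uniform(-1, 1, size=(500, 3))    # a toy point cloud
feats = points @ rng.normal(size=(3, 16))     # stand-in for learned part features

labels = KMeans(n_clusters=4, n_init=10).fit_predict(feats)
print(np.bincount(labels))                    # one part id per point
```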
#Nvidia Describe Anything
Nvidia unveils the Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks (prompt-normalization sketch below). Repo under Apache, dataset available, and a live demo on Hugging Face.
Review: https://t.ly/la4JD
Paper: https://lnkd.in/dZh82xtV
Project: https://lnkd.in/dcv9V2ZF
Repo: https://lnkd.in/dJB9Ehtb
Demo: https://lnkd.in/dXDb2MWU
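One plausible way to support points, boxes, scribbles, and masks behind a single interface is to normalize every prompt type to a binary mask first. This normalizer is an illustrative guess, not DAM's actual preprocessing.

```python
# Normalize heterogeneous region prompts to one binary mask.
import numpy as np

def prompt_to_mask(shape, points=None, box=None, mask=None, radius=3):
    h, w = shape
    out = np.zeros((h, w), dtype=bool)
    if mask is not None:
        out |= mask.astype(bool)              # masks pass through directly
    if box is not None:
        y0, x0, y1, x1 = box                  # boxes fill a rectangle
        out[y0:y1, x0:x1] = True
    if points is not None:                    # points/scribbles: dilate each pixel
        yy, xx = np.mgrid[0:h, 0:w]
        for py, px in points:
            out |= (yy - py) ** 2 + (xx - px) ** 2 <= radius ** 2
    return out

m = prompt_to_mask((64, 64), points=[(10, 12)], box=(30, 30, 40, 48))
print(m.sum())  # pixels covered by the combined prompt
```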
#Nvidia Dynamic Pose
Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under the Nvidia license.
Review: https://t.ly/wrcb0
Paper: https://lnkd.in/dycGjAyy
Project: https://lnkd.in/dDZ2Ej_Q
Data: https://lnkd.in/d8yUSB7m
GENMO: Generalist Human Motion
#Nvidia presents GENMO, a unified generalist model for human motion that bridges motion estimation and generation in a single framework, conditioning on videos, 2D keypoints, text, music, and 3D keyframes (interface sketch below). No code at the moment.
Review: https://t.ly/Q5T_Y
Paper: https://lnkd.in/ds36BY49
Project: https://lnkd.in/dAYHhuFU
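A rough sketch of a "one model, many conditions" interface: embed whichever modalities are present and concatenate them as conditioning tokens, so estimation (video given) and generation (text only) share one path. Dimensions and names are assumptions, not GENMO's design.

```python
# Unified conditioning: embed available modalities into a shared token stream.
import torch
import torch.nn as nn

embed = nn.ModuleDict({
    "video": nn.Linear(512, 256),
    "keypoints2d": nn.Linear(34, 256),
    "text": nn.Linear(768, 256),
    "music": nn.Linear(128, 256),
})

def condition_tokens(available: dict):
    """Missing modalities are simply omitted from the token stream."""
    toks = [embed[name](feat) for name, feat in available.items()]
    return torch.cat(toks, dim=1)             # (B, total_tokens, 256)

# Estimation-style call (video given) vs. generation-style call (text only):
est = condition_tokens({"video": torch.randn(1, 16, 512)})
gen = condition_tokens({"text": torch.randn(1, 1, 768)})
print(est.shape, gen.shape)
```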
Diffusive Hand from Signs
LIGM + #NVIDIA unveil a novel generative model of 3D hand motions from sign-language data, capturing motion characteristics such as handshapes, locations, and finger, hand & arm movements (sampling-loop sketch below). Code, models & data to be released.
Review: https://t.ly/HonX_
Paper: https://arxiv.org/pdf/2508.15902
Project: https://imagine.enpc.fr/~leore.bensabath/HandMDM/
Data: drive.google.com/drive/u/1/folders/1BLsu2hAqhAJ_gnGb9TNXW7MLiSuSEzEj
Repo: TBA
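To make the "diffusive" part concrete, here is a minimal DDPM-style sampling loop over a hand-pose sequence; the untrained denoiser, pose dimension, and noise schedule are placeholders, not the paper's model.

```python
# Minimal DDPM-style sampler for a motion sequence (illustrative only).
import torch
import torch.nn as nn

T, steps = 60, 50                             # frames, diffusion steps
betas = torch.linspace(1e-4, 0.02, steps)
alphas = 1.0 - betas
abar = torch.cumprod(alphas, dim=0)

denoiser = nn.Sequential(nn.Linear(48, 128), nn.ReLU(), nn.Linear(128, 48))

x = torch.randn(1, T, 48)                     # 48-dim hand pose per frame, pure noise
for t in reversed(range(steps)):
    eps = denoiser(x)                         # predict the noise at step t
    x0 = (x - torch.sqrt(1 - abar[t]) * eps) / torch.sqrt(abar[t])
    # DDPM posterior mean of x_{t-1} given (x_t, x0); at t=0 just return x0.
    mean = (torch.sqrt(alphas[t]) * (1 - abar[t - 1]) * x
            + torch.sqrt(abar[t - 1]) * betas[t] * x0) / (1 - abar[t]) if t > 0 else x0
    x = mean + (betas[t].sqrt() * torch.randn_like(x) if t > 0 else 0)
print(x.shape)                                # a sampled hand-motion sequence
```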