AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
237 videos
11 files
1.27K links
All the AI, with papers. Fresh daily updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🌦️ 100+ GPU weather training 🌦️

👉#NVIDIA just released Makani: massively parallel training of weather and climate prediction models on 100+ GPUs, enabling the development of the next generation of weather and climate models.

👉 Review https://t.ly/jageY
👉 Code https://lnkd.in/d4NFZ5xi
🔥 #6D Foundation Pose 🔥

👉#Nvidia unveils FoundationPose, a novel (and unified) foundation model for 6D object pose estimation and tracking.

👉Review https://t.ly/HGd4h
👉Project https://lnkd.in/dPcnBKWm
👉Paper https://lnkd.in/dixn_iHZ
👉Code coming 🩷
🦩 WildRGB-D: Objects in the Wild 🦩

👉#NVIDIA unveils a novel RGB-D object dataset captured in the wild: ~8,500 recorded objects and ~20,000 RGB-D videos across 46 categories, with corresponding masks and 3D point clouds.

👉Review https://t.ly/WCqVz
👉Data github.com/wildrgbd/wildrgbd
👉Paper arxiv.org/pdf/2401.12592.pdf
👉Project wildrgbd.github.io/
🌆 Up to 69x Faster SAM 🌆

👉EfficientViT-SAM is a new family of accelerated Segment Anything Models: it keeps SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. Up to 69x faster; source code released. Authors: Tsinghua, MIT & #Nvidia. A minimal architectural sketch follows the links below.

👉Review https://t.ly/zGiE9
👉Paper arxiv.org/pdf/2402.05008.pdf
👉Code github.com/mit-han-lab/efficientvit
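A minimal conceptual sketch of the recipe above, assuming hypothetical module names (this is not the mit-han-lab/efficientvit API): the heavy SAM image encoder is swapped for a lighter backbone while the prompt encoder and mask decoder are reused unchanged.

```python
# Conceptual sketch only: all modules below are toy stand-ins.
import torch
import torch.nn as nn

class SAMStyleSegmenter(nn.Module):
    def __init__(self, image_encoder, prompt_encoder, mask_decoder):
        super().__init__()
        self.image_encoder = image_encoder    # the only part that gets replaced
        self.prompt_encoder = prompt_encoder  # reused from SAM (lightweight)
        self.mask_decoder = mask_decoder      # reused from SAM (lightweight)

    def forward(self, image, prompts):
        img_emb = self.image_encoder(image)   # bulk of the FLOPs lives here
        prm_emb = self.prompt_encoder(prompts)
        return self.mask_decoder(img_emb, prm_emb)

# Dummy stand-ins just to show the wiring.
fast_backbone = nn.Sequential(nn.Conv2d(3, 8, 3, stride=4, padding=1), nn.ReLU())
prompt_enc = nn.Linear(2, 8)                  # encodes (x, y) point prompts
mask_dec = lambda img_emb, prm_emb: img_emb.mean(1, keepdim=True)  # toy decoder

model = SAMStyleSegmenter(fast_backbone, prompt_enc, mask_dec)
mask = model(torch.randn(1, 3, 64, 64), torch.randn(1, 2))
print(mask.shape)  # torch.Size([1, 1, 16, 16])
```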
🔌 BodyMAP: human body & pressure 🔌

👉#Nvidia (+CMU) unveils BodyMAP, the new SOTA in jointly predicting the body mesh (3D pose & shape) and the 3D pressure applied on the human body. Source code released, dataset coming 💙

👉Review https://t.ly/8926S
👉Project bodymap3d.github.io/
👉Paper https://lnkd.in/gCxH4ev3
👉Code https://lnkd.in/gaifdy3q
📈Gradient Boosting Reinforcement Learning📈

👉#Nvidia unveils GBRL, a framework that extends the advantages of Gradient Boosting Trees to the RL domain, adapting them to the unique challenges of RL environments, including non-stationarity and the absence of predefined targets. Code released💙 A minimal conceptual sketch follows the links below.

👉Review https://t.ly/zv9pl
👉Paper https://arxiv.org/pdf/2407.08250
👉Code https://github.com/NVlabs/gbrl
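A minimal conceptual sketch of the idea, using a scikit-learn regressor as a stand-in (this is not the NVlabs GBRL API): gradient-boosted trees are refit against bootstrapped, moving value targets, which is exactly the non-stationary regression setting the post mentions.

```python
# Fitted value iteration with boosted trees on a toy 1-D state space.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(512, 1))
rewards = -np.abs(states[:, 0])          # toy reward: stay near the origin
next_states = np.clip(states + rng.normal(0, 0.1, states.shape), -1, 1)
gamma = 0.9

values = np.zeros(len(states))           # current value estimates
for it in range(5):
    targets = rewards + gamma * values   # bootstrapped, moving TD-style targets
    model = GradientBoostingRegressor(n_estimators=50, max_depth=3)
    model.fit(states, targets)           # trees regress the non-stationary targets
    values = model.predict(next_states)  # re-evaluate for the next round
    print(f"iter {it}: mean value = {values.mean():.3f}")
```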
🛳️ EVER Ellipsoid Rendering 🛳️

👉UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts or view-dependent density, achieving ~30 FPS at 720p on an #NVIDIA RTX 4090. The underlying rendering integral is recalled below the links.

👉Review https://t.ly/zAfGU
👉Paper arxiv.org/pdf/2410.01804
👉Project half-potato.gitlab.io/posts/ever/
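For background only: this is the standard emission-only (emission-absorption) volume rendering integral that such renderers evaluate along each ray; the EVER-specific ellipsoid primitives are not reproduced here.

```latex
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(t)\,\mathbf{c}(t)\,dt,
\qquad
T(t) = \exp\!\left(-\int_{t_n}^{t}\sigma(s)\,ds\right)
```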
🪞Robo-Emulation via Video Imitation🪞

👉OKAMI (UT Austin & #Nvidia) is a novel foundation method that generates a manipulation plan from a single RGB-D video and derives a policy for execution.

👉Review https://t.ly/_N29-
👉Paper arxiv.org/pdf/2410.11792
👉Project https://lnkd.in/d6bHF_-s
🔥 "Nuclear" AI vs. Hyper-Cheap Inference 🔥

What do you expect in 2025 after the #Nvidia announcements at CES 2025? Feel free to comment :)
Anonymous Poll
🤲 Portable Training Workstation: 23%
⚛️ Nuclear energy for AI training: 35%
🖲️ Cheaper inference-only devices: 33%
💰 Cloud-intensive, inference-only: 9%
🧞‍♂️Omni-RGPT: SOTA MLLM Understanding🧞‍♂️

👉 #NVIDIA presents Omni-RGPT, an MLLM for region-level comprehension of both images & videos. New SOTA on image- and video-based commonsense reasoning.

👉Review https://t.ly/KHnQ7
👉Paper arxiv.org/pdf/2501.08326
👉Project miranheo.github.io/omni-rgpt/
👉Repo TBA soon
🌈 #Nvidia Foundation ZS-Stereo 🌈

👉Nvidia unveils FoundationStereo, a foundation model for stereo depth estimation with strong zero-shot generalization, together with a large-scale synthetic training dataset (1M stereo pairs) featuring large diversity and high photorealism. Code, model & dataset to be released💙 The standard disparity-to-depth relation is recalled below the links.

👉Review https://t.ly/rfBr5
👉Paper arxiv.org/pdf/2501.09898
👉Project nvlabs.github.io/FoundationStereo/
👉Repo github.com/NVlabs/FoundationStereo/tree/master
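For background only (not specific to FoundationStereo): once a disparity d (in pixels) is predicted for a rectified stereo pair with focal length f (pixels) and baseline B (meters), metric depth follows directly.

```latex
Z = \frac{f \cdot B}{d}
```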
🥛HAMSTER: Hierarchical VLA Manipulation🥛

👉#Nvidia unveils HAMSTER: a novel hierarchical VLA architecture that enables robotic manipulation with semantic, visual & geometric generalization, trained on easy-to-collect, off-domain data. Source code announced💙

👉Review https://t.ly/2yXaY
👉Paper https://arxiv.org/pdf/2502.05485
👉Project https://hamster-robot.github.io/
👉Repo TBA
🌈Unified Low-Level 4D Vision🌈

👉#Nvidia L4P is a novel feedforward, general-purpose architecture that solves low-level 4D perception tasks in a unified framework. L4P combines a ViT-based backbone with lightweight per-task heads that do not require extensive training. One backbone, many SOTAs. Code announced 💙 A minimal backbone-plus-heads sketch follows the links below.

👉Review https://t.ly/04DGj
👉Paper arxiv.org/pdf/2502.13078
👉Project research.nvidia.com/labs/lpr/l4p/
👉Repo TBA
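A minimal sketch of the one-backbone, many-lightweight-heads pattern described above, with hypothetical task names and shapes (this is not the L4P code):

```python
import torch
import torch.nn as nn

class SharedBackboneMultiTask(nn.Module):
    def __init__(self, dim=64, tasks=("depth", "flow", "tracking")):
        super().__init__()
        # Stand-in for a pretrained ViT-style video backbone (shared by all tasks).
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )
        # One small head per low-level 4D task; only these need task-specific training.
        self.heads = nn.ModuleDict({t: nn.Linear(dim, 1) for t in tasks})

    def forward(self, tokens):
        feats = self.backbone(tokens)            # shared spatio-temporal features
        return {t: head(feats) for t, head in self.heads.items()}

model = SharedBackboneMultiTask()
tokens = torch.randn(2, 196, 64)                 # (batch, patch tokens, dim)
outputs = model(tokens)
print({t: o.shape for t, o in outputs.items()})  # each head: [2, 196, 1]
```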
👽Neural-Free Sparse Voxels Rasterization👽

👉#Nvidia unveils a novel efficient radiance field rendering algorithm that incorporates a rasterization process on adaptive sparse voxels without neural networks or 3D Gaussians. Code released (custom license)💙

👉Review https://t.ly/Nh_ic
👉Paper https://lnkd.in/g8k8Zs6R
👉Project https://lnkd.in/gR-bD4Wx
👉Repo https://lnkd.in/gNHX-w4t
🙀3D MultiModal Memory🙀

👉M3 is a novel framework by UCSD & #NVIDIA for rendering 3D scenes with RGB & foundation-model embeddings, providing rich spatial & semantic understanding via a novel memory system designed to retain multimodal information across videos.

👉Review https://t.ly/OrXZO
👉Paper arxiv.org/pdf/2503.16413
👉Project https://lnkd.in/dXAZ97KH
👉Repo https://lnkd.in/dWvunCET
🦎 Scaling Vision to 4K🦎

👉PS3, by #Nvidia (+UC Berkeley), scales up CLIP-style vision pre-training to 4K resolution with *near-constant* cost: it encodes a low-resolution global image and selectively processes only the informative high-resolution regions. Impressive work. Code/weights & 🤗 announced💙 A minimal region-selection sketch follows the links below.

👉Review https://t.ly/WN479
👉Paper https://lnkd.in/ddWq8UpX
👉Project https://lnkd.in/dMkTY8-k
👉Repo https://lnkd.in/d9YSB6yv
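A minimal sketch of the selective high-resolution processing idea described above, with a hypothetical contrast-based informativeness score (this is not the PS3 code):

```python
import torch
import torch.nn.functional as F

def select_hires_patches(image_4k, k=4, grid=8, patch=512):
    # Cheap global pass: downsample and use local contrast as a stand-in
    # "informativeness" score per coarse grid cell.
    lowres = F.interpolate(image_4k, size=(grid * 8, grid * 8), mode="bilinear",
                           align_corners=False)
    energy = lowres.std(dim=1, keepdim=True)                         # (B, 1, h, w)
    scores = F.adaptive_avg_pool2d(energy, (grid, grid)).flatten(1)  # (B, grid*grid)
    topk = scores.topk(k, dim=1).indices                             # coarse cell ids

    # Crop the corresponding high-res regions; only these go to the heavy encoder.
    B, _, H, W = image_4k.shape
    cell_h, cell_w = H // grid, W // grid
    crops = []
    for b in range(B):
        for idx in topk[b].tolist():
            r, c = divmod(idx, grid)
            crop = image_4k[b:b+1, :, r*cell_h:(r+1)*cell_h, c*cell_w:(c+1)*cell_w]
            crops.append(F.interpolate(crop, size=(patch, patch), mode="bilinear",
                                       align_corners=False))
    return torch.cat(crops, dim=0)                                   # (B*k, C, patch, patch)

patches = select_hires_patches(torch.randn(1, 3, 4096, 4096))
print(patches.shape)  # torch.Size([4, 3, 512, 512])
```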
🍏PartField #3D Part Segmentation🍏

👉#Nvidia unveils PartField, a feedforward approach for learning part-based 3D features that captures the general concept of parts and their hierarchy. Suitable for single-shape decomposition, co-segmentation, correspondence & more. Code & models released under the Nvidia license💙

👉Review https://t.ly/fGb2O
👉Paper https://lnkd.in/dGeyKSzG
👉Code https://lnkd.in/dbe57XGH
👉Project https://lnkd.in/dhEgf7X2
🦧 #Nvidia Describe Anything 🦧

👉Nvidia unveils the Describe Anything Model (DAM), the new SOTA in generating detailed descriptions for user-specified regions in images/videos, marked by points, boxes, scribbles, or masks. Repo under Apache, dataset available, and live demo on 🤗

👉Review https://t.ly/la4JD
👉Paper https://lnkd.in/dZh82xtV
👉Project https://lnkd.in/dcv9V2ZF
👉Repo https://lnkd.in/dJB9Ehtb
🤗Demo https://lnkd.in/dXDb2MWU
🍏#Nvidia Dynamic Pose 🍏

👉Nvidia unveils DynPose-100K, the largest dataset of dynamic Internet videos annotated with camera poses. Dataset released under Nvidia license💙

👉Review https://t.ly/wrcb0
👉Paper https://lnkd.in/dycGjAyy
👉Project https://lnkd.in/dDZ2Ej_Q
🤗Data https://lnkd.in/d8yUSB7m
🧞‍♀️GENMO: Generalist Human Motion 🧞‍♀️

👉#Nvidia presents GENMO, a unified generalist model for human motion that bridges motion estimation and generation in a single framework, conditioning on videos, 2D keypoints, text, music, and 3D keyframes. No code at the moment🥲

👉Review https://t.ly/Q5T_Y
👉Paper https://lnkd.in/ds36BY49
👉Project https://lnkd.in/dAYHhuFU