AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
237 videos
11 files
1.27K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
☀️ Relightable Full-Body Avatars ☀️

👉#Meta unveils the first approach ever to jointly model the relightable appearance of the body, face, and hands of drivable avatars.

👉Review https://t.ly/kx9gf
👉Paper arxiv.org/pdf/2501.14726
👉Project neuralbodies.github.io/RFGCA
🌅 Generative Human Mesh Recovery 🌅

👉GenHMR is a novel generative framework that reformulates monocular HMR as an image-conditioned generative task, explicitly modeling and mitigating uncertainties in the 2D-to-3D mapping process. Impressive results, but no code announced 🥺

👉Review https://t.ly/Rrzpj
👉Paper https://arxiv.org/pdf/2412.14444
👉Project m-usamasaleem.github.io/publication/GenHMR/GenHMR.html
Everyone's social feed is flooded with unnecessary opinions about DeepSeek. Your wish:
Anonymous Poll
37%
🛑 STOP posting about!
63%
🟩 Keep posting, we want more!
💎AI-driven Docs Conversion💎

👉Docling, by IBM, is the ALL-in-ONE, open-source solution for documents, parsing several popular formats into a unified, richly structured representation. Powered by SOTA models for layout (DocLayNet) and table structure (TableFormer), it runs efficiently on low-cost hardware. Code under MIT💙

👉Review https://t.ly/nSCfT
👉Paper https://lnkd.in/dc5Kpc2F
👉Repo https://lnkd.in/d9gvw9bt
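To make the "unified, richly structured representation" idea concrete, here is a minimal, hypothetical sketch (not Docling's actual API — `DocItem` and `export_to_markdown` are illustrative names): every parsed element becomes a typed item, and any target format is just a renderer over those items.

```python
from dataclasses import dataclass, field

# Hypothetical mini-model of a unified document representation:
# each parsed element (heading, paragraph, table) is one typed item.
@dataclass
class DocItem:
    kind: str                  # "heading" | "paragraph" | "table"
    text: str = ""
    rows: list = field(default_factory=list)

def export_to_markdown(items):
    """Render the unified representation to Markdown (one format among many)."""
    out = []
    for it in items:
        if it.kind == "heading":
            out.append(f"# {it.text}")
        elif it.kind == "paragraph":
            out.append(it.text)
        elif it.kind == "table":
            header, *body = it.rows
            lines = ["| " + " | ".join(header) + " |",
                     "|" + "---|" * len(header)]
            lines += ["| " + " | ".join(r) + " |" for r in body]
            out.append("\n".join(lines))
    return "\n\n".join(out)

doc = [
    DocItem("heading", "Results"),
    DocItem("paragraph", "Accuracy improves with layout-aware parsing."),
    DocItem("table", rows=[["model", "score"], ["baseline", "0.71"]]),
]
md = export_to_markdown(doc)
```

The point of the design: parse once into structure, export anywhere (Markdown, JSON, HTML) without re-running the models.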
🈯 SOTA 0-Shot Multi-View 🈯

👉MVGD by #TOYOTA is the SOTA method that generates images and scale-consistent depth maps from novel viewpoints given an arbitrary number of posed input views. A novel diffusion-based architecture capable of direct pixel-level generation. Code announced 💙

👉Review https://t.ly/_ecKl
👉Paper arxiv.org/pdf/2501.18804
👉Project mvgd.github.io/
👉Repo TBA
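Why "scale-consistent" depth matters for novel views, in a toy pinhole-camera sketch (illustrative numbers, nothing from MVGD itself): a pixel unprojected at its predicted depth should reproject to the same pixel; if two views disagree on depth scale, the round trip lands on the wrong pixel.

```python
# Pinhole round trip: pixel + depth -> 3D camera-space point -> pixel.
def unproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) at `depth` into camera coordinates."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)

def project(x, y, z, fx, fy, cx, cy):
    """Project a camera-space point back to pixel coordinates."""
    return (fx * x / z + cx, fy * y / z + cy)

fx = fy = 500.0
cx = cy = 320.0
p = unproject(400, 240, 2.0, fx, fy, cx, cy)
u, v = project(*p, fx, fy, cx, cy)   # round-trips to the input pixel
```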
🐙MambaGlue: SOTA feats. matching🐙

👉MambaGlue is a hybrid neural network combining the Mamba and the Transformer architectures to match local features. Source Code announced, to be released💙

👉Review https://shorturl.at/LxDG1
👉Paper arxiv.org/pdf/2502.00462
👉Repo https://lnkd.in/dAujfGZQ
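For context, here is the classic baseline that learned matchers like MambaGlue improve on: mutual-nearest-neighbour matching of local descriptors (a standard technique, not MambaGlue's method — learned matchers replace this hard rule with attention over both images' keypoints).

```python
# Mutual-NN matching: keep (i, j) only if a[i] and b[j] pick each other.
def mutual_nn_matches(desc_a, desc_b):
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    nn_ab = [min(range(len(desc_b)), key=lambda j: dist(a, desc_b[j]))
             for a in desc_a]
    nn_ba = [min(range(len(desc_a)), key=lambda i: dist(b, desc_a[i]))
             for b in desc_b]
    return [(i, j) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# Toy 2D descriptors: a[2] has no mutual partner, so it is dropped.
a = [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5)]
b = [(1.0, 0.1), (0.1, 1.0)]
matches = mutual_nn_matches(a, b)   # -> [(0, 1), (1, 0)]
```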
🛸Real-Time Differentiable Tracing🛸

👉 Radiant Foam is a novel scene representation that leverages the decades-old, efficient volumetric mesh ray tracing algorithm (largely overlooked in recent research). It performs like Gaussian Splatting without the constraints of rasterization. Code announced💙

👉Review https://shorturl.at/26U06
👉Paper https://arxiv.org/pdf/2502.01157
👉Project https://radfoam.github.io/
👉Repo https://github.com/theialab/radfoam
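Both rasterized splatting and ray tracing ultimately evaluate the same volume-rendering integral. A minimal sketch of its discrete form, front-to-back alpha compositing along one ray (toy scalar colors, not Radiant Foam's actual primitives):

```python
# Front-to-back compositing: each sample contributes its color weighted by
# its alpha and by the transmittance (light not yet absorbed) in front of it.
def composite(samples):
    """samples: list of (color, alpha), ordered front to back."""
    color, transmittance = 0.0, 1.0
    for c, a in samples:
        color += transmittance * a * c
        transmittance *= (1.0 - a)
    return color

# A mostly opaque near sample dominates; the far sample barely shows through.
pixel = composite([(0.9, 0.8), (0.2, 1.0)])   # -> 0.76
```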
🔥 VideoJAM: #META's Video-Model (SOTA) 🔥

👉#META's VideoJAM: the new SOTA (by a large margin) in motion coherence for video generation, much better than SORA! It instills a strong motion prior into any video-gen model. Impressive results, no code announced🥲

👉Review https://shorturl.at/id7Bt
👉Paper https://arxiv.org/pdf/2502.02492
👉Project https://hila-chefer.github.io/videojam-paper.github.io/
👗3D Dynamic Garments👗

👉UCLA introduces Dress-1-to-3, a novel pipeline that reconstructs physics-plausible, simulation-ready separated garments with sewing patterns and humans from an in-the-wild image.

👉Review https://t.ly/qciHV
👉Paper arxiv.org/pdf/2502.03449
👉Project dress-1-to-3.github.io
🤖 META Human-Robot 🤖

👉#META PARTNR: novel benchmark for Planning And Reasoning Tasks in humaN-Robot collaboration. The largest benchmark of its kind: 100,000+ natural language tasks, spanning 60 houses and 5,819 unique objects. Code & Data (🤗) under MIT💙

👉Review https://t.ly/zcN0K
👉Paper arxiv.org/pdf/2411.00081
👉Repo github.com/facebookresearch/partnr-planner
🤗Data huggingface.co/datasets/ai-habitat/partnr_episodes
💃HumanDiT Long-form Human💃

👉HumanDiT is a novel pose-guided Diffusion model trained on a large, in-the-wild dataset with 14,000 hours of HQ video to produce HD videos with fine-grained bodies. Stunning results but no code announced🥲

👉Review https://t.ly/7rTRr
👉Paper https://arxiv.org/pdf/2502.04847
👉Project https://agnjason.github.io/HumanDiT-page/
🔮Flow-Based Foundation GenAI🔮

👉Goku is the novel SOTA family of joint image-and-video generation models leveraging rectified flow Transformers to achieve industry-leading performance. Amazing results! Repo released (currently empty)💙

👉Review https://t.ly/dzi0O
👉Paper http://arxiv.org/pdf/2502.04896
👉Project saiyan-world.github.io/goku/
👉Repo github.com/Saiyan-World/goku
🥛HAMSTER: Hierarchical VLA Manipulation🥛

👉#Nvidia unveils HAMSTER: a novel Hierarchical VLA architecture enabling robotic manipulation with semantic, visual & geometric generalization, trained on easy-to-collect, off-domain data. Source Code announced💙

👉Review https://t.ly/2yXaY
👉Paper https://arxiv.org/pdf/2502.05485
👉Project https://hamster-robot.github.io/
👉Repo TBA
🦶 It's all About Foot 🦶

👉 A collection of three works all about the human foot: synthetic foot renders, reconstruction, and surface normals. Repos & Datasets available💙

👉Review https://t.ly/GY8mL
👉Paper (last) arxiv.org/pdf/2502.06367
👉Projects www.ollieboyne.com/
👉Repo github.com/OllieBoyne/FOUND
👉Repo github.com/OllieBoyne/SynFoot
👉Repo github.com/OllieBoyne/FOCUS (coming)
🪛 Make anything "Rig-Ready" 🪛

👉RigAnything is a novel autoregressive transformer-based model, which makes 3D assets rig-ready by probabilistically generating joints, skeleton topologies, and assigning skinning weights in a template-free manner. Online demo announced💙

👉Review https://t.ly/bNwxq
👉Paper arxiv.org/pdf/2502.09615
👉Project www.liuisabella.com/RigAnything
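What "assigning skinning weights" buys you once a rig exists can be shown with the standard linear blend skinning formula (a generic technique, not RigAnything's model — 2D translations stand in for full joint transforms):

```python
# Linear blend skinning: a vertex moves by the weighted mix of the
# transforms of the joints it is bound to (weights sum to 1).
def lbs(vertex, joint_offsets, weights):
    x, y = vertex
    dx = sum(w * ox for w, (ox, oy) in zip(weights, joint_offsets))
    dy = sum(w * oy for w, (ox, oy) in zip(weights, joint_offsets))
    return (x + dx, y + dy)

# A vertex bound 70/30 to two joints; only joint 0 moves (by +2 in x),
# so the vertex follows it partially: 1.0 + 0.7 * 2.0 = 2.4.
moved = lbs((1.0, 1.0), [(2.0, 0.0), (0.0, 0.0)], [0.7, 0.3])
```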
Hi friends, what other kind of content would you like to *OCCASIONALLY* see in this group?
Anonymous Poll
44%
🔔 Job/Research offers
65%
📦 AI tools/news (with NO papers)
32%
🔥 Events & Hackathon
3%
📝 Other (comment please)
🔥 Animate Anyone 2 🔥

👉 The evolution of the first version, now enabling character animation with environment affordance. Amazing results but no code announced 🥲

👉Review https://t.ly/iNNLB
👉Paper https://arxiv.org/pdf/2502.06145
👉Project https://humanaigc.github.io/animate-anyone-2
🔥Large Language DIFFUSION Model🔥

👉Renmin University introduces LLaDA, a DIFFUSION language model trained entirely from scratch, rivaling LLaMA3 8B in performance. Pre-trained from scratch on 2.3T tokens using 0.13M H800 GPU hours, followed by SFT on 4.5M pairs. A new paradigm is born? Repo by the end of Feb '25 💙

👉Review https://t.ly/7Cnrh
👉Paper https://lnkd.in/dCWi3byk
👉Project https://lnkd.in/dB7JRYeA
👉Repo https://lnkd.in/dAqzeCHJ
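The decoding idea behind masked-diffusion language models, in a toy sketch: start from an all-masked sequence and iteratively fill in tokens, committing the most confident prediction at each step. Everything here is a stub (the "model" just returns a fixed target with random confidences), not LLaDA's actual procedure.

```python
import random

MASK = "<m>"

def toy_model(seq):
    """Stub predictor: (token, confidence) for each masked slot. A real
    diffusion LM predicts all masked positions in parallel per step."""
    target = ["the", "cat", "sat", "down"]
    return {i: (target[i], random.random())
            for i, t in enumerate(seq) if t == MASK}

def diffusion_decode(length, steps=4):
    seq = [MASK] * length
    for _ in range(steps):
        preds = toy_model(seq)
        if not preds:
            break
        # Commit only the single most confident position this step;
        # the rest stay masked and are re-predicted next iteration.
        i, (tok, _) = max(preds.items(), key=lambda kv: kv[1][1])
        seq[i] = tok
    return seq

random.seed(0)
out = diffusion_decode(4)   # -> ["the", "cat", "sat", "down"]
```

Contrast with autoregressive decoding: positions are filled in confidence order, not strictly left to right.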
🌈Unified Low-Level 4D Vision🌈

👉#Nvidia L4P is a novel feedforward, general-purpose architecture to solve low-level 4D perception tasks in a unified framework. L4P combines a ViT-based backbone with per-task heads that are lightweight and therefore do not require extensive training. One backbone, many SOTAs. Code announced 💙

👉Review https://t.ly/04DGj
👉Paper arxiv.org/pdf/2502.13078
👉Project research.nvidia.com/labs/lpr/l4p/
👉Repo TBA
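The "one backbone, many heads" pattern can be sketched in a few lines (all names and the toy "features" are illustrative, not L4P's architecture): the expensive encoder runs once, and each task reads the shared features through a cheap head.

```python
# Shared-backbone, per-task-heads pattern: heavy features computed once,
# each lightweight head maps them to a task-specific output.
def backbone(frame):
    """Stand-in for a heavy ViT encoder returning shared features."""
    return [sum(frame) / len(frame), max(frame) - min(frame)]

HEADS = {
    "depth": lambda feats: feats[0] * 2.0,   # toy depth head
    "motion": lambda feats: feats[1],        # toy motion-magnitude head
}

def predict(frame, tasks):
    feats = backbone(frame)                  # computed once, shared
    return {t: HEADS[t](feats) for t in tasks}

out = predict([0.1, 0.5, 0.3], ["depth", "motion"])
```

Adding a new task means training one small head, not a new backbone.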
🔥 YOLOv12 is out (new SOTA) 🔥

👉YOLOv12 is a novel attention-centric YOLO framework that matches the speed of previous CNN-based ones while harnessing the performance benefits of attention mechanisms. Source Code & Demo released💙

👉Review https://t.ly/jj1oR
👉Paper arxiv.org/pdf/2502.12524
👉Repo github.com/sunsmarterjie/yolov12
🤗Demo https://t.ly/w5rno
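One post-processing step every YOLO-family detector shares, regardless of CNN or attention backbone, is non-maximum suppression over scored boxes. A minimal sketch (boxes as `(x1, y1, x2, y2)`, toy values, not YOLOv12's implementation):

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep boxes in score order, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms(boxes, [0.9, 0.8, 0.7])   # duplicate of box 0 is suppressed
```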