AI with Papers - Artificial Intelligence & Deep Learning
All the AI, with papers: fresh updates every day on Deep Learning, Machine Learning, and Computer Vision.

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🤖 META Human-Robot 🤖

👉#META PARTNR: a novel benchmark for Planning And Reasoning Tasks in humaN-Robot collaboration, the largest of its kind: 100,000+ natural-language tasks spanning 60 houses and 5,819 unique objects. Code & Data (🤗) under MIT💙 (download sketch below)

👉Review https://t.ly/zcN0K
👉Paper arxiv.org/pdf/2411.00081
👉Repo github.com/facebookresearch/partnr-planner
🤗Data huggingface.co/datasets/ai-habitat/partnr_episodes
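
A minimal download sketch, assuming only that the 🤗 repo above can be snapshotted (the internal file layout is not confirmed here):

```python
# Hedged sketch: pull the PARTNR episodes locally from the Hugging Face Hub.
# Only the repo id comes from the post; inspect the folder before wiring it
# into a habitat pipeline.
from huggingface_hub import snapshot_download

path = snapshot_download(
    repo_id="ai-habitat/partnr_episodes",
    repo_type="dataset",  # dataset repo, not a model
)
print(path)  # local folder containing the episode files
```
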
💃HumanDiT Long-form Human💃

👉HumanDiT is a novel pose-guided Diffusion Transformer trained on a large, in-the-wild dataset of 14,000 hours of HQ video to produce HD videos with fine-grained body detail. Stunning results, but no code announced🥲

👉Review https://t.ly/7rTRr
👉Paper https://arxiv.org/pdf/2502.04847
👉Project https://agnjason.github.io/HumanDiT-page/
🔮Flow-Based Foundation GenAI🔮

👉Goku is a novel SOTA family of joint image-and-video generation models leveraging rectified-flow Transformers to achieve industry-leading performance. Amazing results! Repo released (currently empty)💙 (general sampling sketch below)

👉Review https://t.ly/dzi0O
👉Paper http://arxiv.org/pdf/2502.04896
👉Project saiyan-world.github.io/goku/
👉Repo github.com/Saiyan-World/goku
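
For context, a bare-bones sketch of how rectified-flow sampling works in general (a straight-path ODE integrated with Euler steps); `velocity_net` is a stand-in module, not Goku's actual interface:

```python
# Conceptual sketch of rectified-flow sampling, the family Goku builds on.
import torch

@torch.no_grad()
def sample_rectified_flow(velocity_net, shape, steps=50, device="cpu"):
    """Integrate dx/dt = v(x, t) from noise (t=0) to data (t=1) with Euler."""
    x = torch.randn(shape, device=device)  # x_0 ~ N(0, I)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt, device=device)
        x = x + velocity_net(x, t) * dt    # follow the learned velocity field
    return x
```
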
🥛HAMSTER: Hierarchical VLA Manipulation🥛

👉#Nvidia unveils HAMSTER: a novel hierarchical VLA architecture enabling robotic manipulation with semantic, visual & geometric generalization, trained on easy-to-collect, off-domain data. Source code announced💙 (control-loop sketch below)

👉Review https://t.ly/2yXaY
👉Paper https://arxiv.org/pdf/2502.05485
👉Project https://hamster-robot.github.io/
👉Repo TBA
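
A hedged sketch of the hierarchical split described above: a high-level VLM turns instruction + image into an intermediate plan, and a separate low-level policy turns that plan into motor actions (both callables are hypothetical stand-ins, not the paper's modules):

```python
# Illustrative only: the two-level control loop of a hierarchical VLA.
def hamster_step(image, instruction, vlm, low_level_policy, robot_state):
    plan = vlm(image, instruction)                       # e.g. a coarse 2D path
    action = low_level_policy(image, plan, robot_state)  # fine-grained control
    return action
```
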
🦶 It's all About Foot 🦶

👉 A collection of three works all about the human foot: synthetic foot renders, reconstruction, and surface normals. Repos & datasets available💙

👉Review https://t.ly/GY8mL
👉Paper (latest) arxiv.org/pdf/2502.06367
👉Projects www.ollieboyne.com/
👉Repo github.com/OllieBoyne/FOUND
👉Repo github.com/OllieBoyne/SynFoot
👉Repo github.com/OllieBoyne/FOCUS (coming)
🪛 Make anything "Rig-Ready" 🪛

👉RigAnything is a novel autoregressive, transformer-based model that makes 3D assets rig-ready by probabilistically generating joints and skeleton topologies and assigning skinning weights, all in a template-free manner. Online demo announced💙 (conceptual sketch below)

👉Review https://t.ly/bNwxq
👉Paper arxiv.org/pdf/2502.09615
👉Project www.liuisabella.com/RigAnything
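
A conceptual sketch of the template-free autoregressive recipe (names are hypothetical, not the paper's API): joints are sampled one at a time conditioned on the shape and the partial skeleton, then skinning weights are predicted.

```python
# Illustrative only: autoregressive rig generation.
def generate_rig(shape_feats, joint_model, skin_model, max_joints=64):
    joints = []
    for _ in range(max_joints):
        joint, stop = joint_model(shape_feats, joints)  # next joint + stop flag
        if stop:
            break
        joints.append(joint)
    weights = skin_model(shape_feats, joints)  # per-vertex skinning weights
    return joints, weights
```
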
Hi friends, what other kind of content would you like to *OCCASIONALLY* see in this group?
Anonymous Poll
🔔 Job/Research offers: 44%
📦 AI tools/news (with NO papers): 65%
🔥 Events & Hackathon: 32%
📝 Other (comment please): 3%
🔥 Animate Anyone 2 🔥

👉 The evolution of the original Animate Anyone, now enabling character animation with environment affordance. Amazing results, but no code announced 🥲

👉Review https://t.ly/iNNLB
👉Paper https://arxiv.org/pdf/2502.06145
👉Project https://humanaigc.github.io/animate-anyone-2
🔥Large Language DIFFUSION Model🔥

👉Renmin University introduces LLaDA, a DIFFUSION language model trained entirely from scratch that rivals LLaMA 3 8B in performance: pre-trained on 2.3T tokens using 0.13M H800 GPU hours, then SFT on 4.5M pairs. A new paradigm is born? Repo expected by the end of Feb '25 💙 (sampling sketch below)

👉Review https://t.ly/7Cnrh
👉Paper https://lnkd.in/dCWi3byk
👉Project https://lnkd.in/dB7JRYeA
👉Repo https://lnkd.in/dAqzeCHJ
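
A conceptual sketch of how masked-diffusion text generation works (the paradigm LLaDA follows): start fully masked, predict every token, keep the most confident, re-mask the rest, repeat. `model` and `mask_id` are stand-ins, not LLaDA's API:

```python
# Illustrative only: iterative unmasking in a diffusion language model.
import torch

@torch.no_grad()
def diffusion_generate(model, length, steps=8, mask_id=0):
    x = torch.full((1, length), mask_id)        # start fully masked
    for s in range(steps):
        logits = model(x)                       # assumed shape (1, L, vocab)
        conf, pred = logits.softmax(-1).max(-1)
        x = torch.where(x == mask_id, pred, x)  # fill in all masked slots
        k = int(length * (1 - (s + 1) / steps)) # shrinking re-mask budget
        if k > 0:
            x[0, conf[0].argsort()[:k]] = mask_id  # re-mask least confident
    return x
```
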
🌈Unified Low-Level 4D Vision🌈

👉#Nvidia L4P is a novel feedforward, general-purpose architecture that solves low-level 4D perception tasks in a unified framework. L4P combines a ViT-based backbone with lightweight per-task heads that do not require extensive training. One backbone, many SOTAs. Code announced 💙 (design sketch below)

👉Review https://t.ly/04DGj
👉Paper arxiv.org/pdf/2502.13078
👉Project research.nvidia.com/labs/lpr/l4p/
👉Repo TBA
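
A minimal sketch of the described design pattern, one shared backbone feeding cheap per-task heads (modules are stand-ins, not Nvidia's code):

```python
# Illustrative only: shared ViT-style backbone + lightweight per-task heads.
import torch.nn as nn

class MultiTaskPerception(nn.Module):
    def __init__(self, backbone, feat_dim, tasks):
        super().__init__()
        self.backbone = backbone  # shared encoder producing (B, C, H, W) maps
        self.heads = nn.ModuleDict({
            name: nn.Conv2d(feat_dim, out_ch, kernel_size=1)
            for name, out_ch in tasks.items()  # e.g. {"depth": 1, "flow": 2}
        })

    def forward(self, frames):
        feats = self.backbone(frames)
        return {name: head(feats) for name, head in self.heads.items()}
```

Only the tiny heads need task-specific training, which is what keeps adding a new 4D task cheap.
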
🔥 YOLOv12 is out (new SOTA) 🔥

👉YOLOv12 is a novel attention-centric YOLO framework that matches the speed of previous CNN-based frameworks while harnessing the performance benefits of attention mechanisms. Source code & demo released💙 (quick-start sketch below)

👉Review https://t.ly/jj1oR
👉Paper arxiv.org/pdf/2502.12524
👉Repo github.com/sunsmarterjie/yolov12
🤗Demo https://t.ly/w5rno
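
A quick-start sketch, assuming the release keeps the Ultralytics-style API it forks (the checkpoint name is a guess; check the repo README):

```python
# Hedged sketch: run YOLOv12 detection on a single image.
from ultralytics import YOLO

model = YOLO("yolov12n.pt")   # hypothetical checkpoint name
results = model("image.jpg")  # standard Ultralytics inference call
results[0].show()             # visualize predicted boxes
```
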
👽Neural-Free Sparse Voxels Rasterization👽

👉#Nvidia unveils a novel, efficient radiance-field rendering algorithm that rasterizes adaptive sparse voxels, with no neural networks or 3D Gaussians involved. Code released (custom license)💙

👉Review https://t.ly/Nh_ic
👉Paper https://lnkd.in/g8k8Zs6R
👉Project https://lnkd.in/gR-bD4Wx
👉Repo https://lnkd.in/gNHX-w4t
🏉MITracker: Multi-View Track🏉

👉MITracker is a novel Multi-View Integration Tracker that efficiently integrates multi-view object features to provide stable tracking. Code & dataset announced💙

👉Review https://t.ly/RTNUo
👉Paper arxiv.org/pdf/2502.20111
👉Repo github.com/XuM007/MITracker
👉Project xum007.github.io/MITracker.github.io
🧠 Distractor-Aware SAM2 🧠

👉A novel distractor-aware memory for SAM2, plus an introspection-based update strategy for visual object tracking (VOT). Code & dataset released💙 (update sketch below)

👉Review https://t.ly/RBRpQ
👉Paper arxiv.org/pdf/2411.17576
👉Project jovanavidenovic.github.io/dam-4-sam
👉Repo github.com/jovanavidenovic/DAM4SAM/
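
An illustrative-only sketch of the distractor-aware idea (not the authors' code): commit a frame to target memory only when introspection trusts the mask, and remember distractors separately so they can be suppressed at matching time.

```python
def update_memory(memory, frame_feat, target_mask, distractor_masks,
                  quality, thresh=0.8):
    """quality: introspection score for the current segmentation."""
    if quality >= thresh:       # trust gate for committing to target memory
        memory["target"].append((frame_feat, target_mask))
    for m in distractor_masks:  # stored so they can be suppressed later
        memory["distractors"].append((frame_feat, m))
    return memory
```
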
🔥Distill-Any-Depth: SOTA MDE🔥

👉Distill-Any-Depth is the new SOTA monocular depth estimation model, trained with a novel knowledge-distillation strategy. Authors from ZJUT, Westlake University, LZU & NTU. Source code, pre-trained models & HF demo released💙 (generic distillation sketch below)

👉Review https://t.ly/GBJgi
👉Paper arxiv.org/pdf/2502.19204
👉Repo https://lnkd.in/dPtxNrQh
🤗Demo https://lnkd.in/d2TMPf4b
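
A generic teacher-to-student depth-distillation step for flavor (the paper's exact losses and normalization differ); the median/MAD alignment below is the common MiDaS-style scale-and-shift-invariant trick:

```python
# Illustrative only: one distillation step for monocular depth.
import torch

def distill_step(student, teacher, images, opt):
    with torch.no_grad():
        t = teacher(images)  # pseudo-label depth maps from the teacher
    s = student(images)

    def norm(d):             # per-image scale/shift alignment
        d = d.flatten(1)
        med = d.median(dim=1, keepdim=True).values
        mad = (d - med).abs().mean(dim=1, keepdim=True) + 1e-6
        return (d - med) / mad

    loss = (norm(s) - norm(t)).abs().mean()  # L1 in normalized space
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```
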
🍎FindTrack: text-driven VOS 🍎

👉Yonsei University introduces FindTrack, a novel decoupled framework that separates text-driven target identification from mask propagation. Impressive results (even under severe occlusion); new SOTA. Source code & models to be released💙 (pipeline sketch below)

👉Review https://t.ly/2smaF
👉Paper arxiv.org/pdf/2503.03492
👉Repo github.com/suhwan-cho/FindTrack
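
A pipeline sketch of the decoupled idea (stand-in functions, not the repo's API): first find the frame where the text grounds most confidently, then hand that single mask to a mask-propagation tracker.

```python
# Illustrative only: decouple text-driven identification from propagation.
def find_then_track(frames, text, grounder, propagator):
    scored = [grounder(f, text) for f in frames]   # (confidence, mask) pairs
    k = max(range(len(frames)), key=lambda i: scored[i][0])
    seed_mask = scored[k][1]                       # best-grounded keyframe
    return propagator(frames, seed_mask, start=k)  # VOS-style propagation
```
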
📒 Moving-Camera Diffusion 📒

👉Tencent unveils TrajectoryCrafter, a novel approach for redirecting camera trajectories in monocular videos. Impressive results; potentially the future of commercial #adv. Code & demo released💙

👉Review https://t.ly/L-IoR
👉Paper https://arxiv.org/pdf/2503.05638
👉Project https://trajectorycrafter.github.io/
👉Repo github.com/TrajectoryCrafter/TrajectoryCrafter
🤗Demo https://huggingface.co/spaces/Doubiiu/TrajectoryCrafter
💙 Announcing #Py4AI 2025 💙

👉 The second edition of the Py4AI conference is official! An all-day, fully free event for #AI & #Python lovers.

𝐓𝐡𝐞 𝐟𝐢𝐫𝐬𝐭 𝐛𝐚𝐭𝐜𝐡 𝐨𝐟 𝐬𝐩𝐞𝐚𝐤𝐞𝐫𝐬:
🚀Dana Aubakirova | Hugging Face🤗
🚀Yunhao Liu & Ruoya Sheng | ByteDance🔥
🚀Alice Casiraghi | 🌏🌎🌍
🚀Luca Arrotta, PhD | Datapizza🍕
🚀Valeria Zuccoli | Bettini Srl
🚀Mirco Planamente | ARGO Vision
🚀Daniele Zonca | Red Hat

👉 Info & registration: https://t.ly/37wWj
🎯RexSeek: Referring Any Object🎯

👉A novel referring-detection model based on a multimodal LLM that precisely locates objects from user-input natural language, with a dedicated specialization on humans. Code released 💙

👉Review https://shorturl.at/CGsT2
👉Paper arxiv.org/pdf/2503.08507
👉Code github.com/IDEA-Research/RexSeek
🐶OVTR: E2E Transformer MOT🐶

👉HUST proposes OVTR (End-to-End Open-Vocabulary Multiple Object Tracking with TRansformer), the first end-to-end open-vocabulary tracker that models motion, appearance, and category simultaneously. Source code released under MIT💙

👉Review https://t.ly/K3ASX
👉Paper arxiv.org/pdf/2503.10616
👉Code https://github.com/jinyanglii/OVTR