AI with Papers - Artificial Intelligence & Deep Learning
17.1K subscribers
159 photos
277 videos
14 files
1.45K links
All the AI with papers. Every day fresh updates about #DeepLearning #MachineLearning #LLM & #ComputerVision

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/

#AI #chatGPT
🦄Unified Correspondence Transformer🦄

👉UniCorrn is the first correspondence model with shared weights that unifies 2D-2D, 2D-3D, and 3D-3D geometric matching with a transformer. CC BY-NC-SA 4.0💙

👉Review https://t.ly/2OBdq
👉Paper https://arxiv.org/pdf/2605.04044
👉Project https://neu-vi.github.io/UniCorrn/
👉Repo https://github.com/neu-vi/UniCorrn
👍5🔥54🤯4👏2
🍒Count Anything, Any Granularity🍒

👉Reframes open-world counting as multi-grained counting: visual exemplars specify the target appearance, and fine-grained text specifies the intended semantic granularity across five explicit levels. Repo/Data under Apache💙

👉Review https://t.ly/nqz80
👉Paper https://lnkd.in/dp7khTRU
👉Project https://lnkd.in/d_jfX_Yn
👉Repo https://lnkd.in/dkTRGZkG
👉Data https://lnkd.in/dB83jRyT
115👍6👏2🔥1
🪔Latent Decoding Pixel Diffusion🪔

👉PiD by Nvidia is a plug-and-play diffusion decoder that replaces VAE/RAE decoders, turning latent representations directly into super-resolved pixels in a single pass. Repo under Apache 2.0💙

👉Review https://t.ly/y19mA
👉Paper https://lnkd.in/duVC25C2
👉Project https://lnkd.in/dW6TkzCB
👉Repo https://lnkd.in/dnGdgKRr
8🔥6👍1👏1
🔍 Nvidia Locate Anything 🔍

👉Unifies diverse localization tasks under a single vision-language model, including document understanding, GUI grounding, dense detection, and OCR. Repo released💙

👉Review https://t.ly/PvwFo
👉Paper https://lnkd.in/dWfNpzPZ
👉Project https://lnkd.in/dM89BX-8
👉Repo https://lnkd.in/dC4KCQSM
13🔥13👏1
🕷️Human Universal Grasping🕷️

👉HUG is a flow-matching model that generates diverse human grasps for any user-specified object in a single RGB-D image captured from a stereo camera.

👉Review https://t.ly/VG1Eu
👉Paper https://arxiv.org/pdf/2606.17054
👉Repo https://github.com/KevinyWu/hug
👉Project https://grasping.io/
10🔥4👍1👏1
🔊VolHuMe - Volumetric Human Meshes🔊

👉VolHuMe (H/T @Martinella_94) is a novel, high-resolution large-scale dataset of volumetric human meshes with complete 4D GT: multi-view RGB-D, textured meshes, dense point clouds, normal maps, rigged assets, garment segmentation, and SMPL-X fittings in one dataset. Insane💙

👉Review https://t.ly/b5vxy
👉Paper https://arxiv.org/pdf/2606.23062
👉Project https://giuli13.github.io/volhume-website/#
👉Repo TBA soon
4🔥21👏1
👋 Hi everyone!

Over the past few weeks, the number of join requests has increased dramatically, which unfortunately also means far more spam and bots (around five hundred accounts have been removed in the last few days).

To help me distinguish real people from fake profiles - and avoid rejecting genuine requests by mistake - I'd really appreciate it if your profile included:
📷 A real profile photo
👤 Your full name (or something reasonably identifiable)
💬 If you contact me, please use English if possible.

I don't speak Russian, Arabic, or Chinese, so if your profile and messages are only in those languages, it's very difficult for me to tell whether you're a real person or an automated account. Thank you for your understanding and for helping keep this damn community welcoming and spam-free!

With love,
Alessandro 😈
18👍142🔥1
🍀OctoSense: Open Sensing🍀

👉OctoSense is an open-source sensor platform with stereo RGB and event cameras, LiDAR, a thermal camera, an inertial measurement unit, RTK-corrected global positioning system, and proprioception.

👉Review https://t.ly/oFN8L
👉Paper https://lnkd.in/dM3zpyju
👉Project https://lnkd.in/ddrQ3uJ6
👉Repo https://lnkd.in/dhSDjSfG
11🔥5💩3
🛸PriorEye: Geospatial Self-Driving🛸

👉MRG (Oxford) introduces geospatial visual priors that leverage street-level images for autonomous driving, with consistent performance improvements. Repo under Apache💙

👉Review https://t.ly/7Jgav
👉Paper https://lnkd.in/dYeD2m7n
👉Project https://lnkd.in/dWJvNemr
👉Repo https://lnkd.in/dNExGGtx
🔥54👍2👏1
🌒LUNA: Universal 3D Animation🌔

👉LUNA by HKUST + META is a novel LBS-free universal neural animation model that directly maps multiple 2D controls like images, keypoints, sketches and unseen characters into 3D-G deformations, bypassing explicit body fitting.

👉Review https://t.ly/ZX9Ex
👉Paper https://arxiv.org/pdf/2606.31981
👉Project https://penghtyx.github.io/LUNA/
👉Repo N/A 🥲
4🔥2
🔥Nvidia SpatialClaw is out🔥

👉From Nvidia, a novel training-free framework for spatial reasoning that adopts code as the action interface. SpatialClaw lets a VLM-backed agent write Python in a persistent kernel, composing perception modules, inspecting intermediate results, and revising its strategy across steps. Impressive: +11.2 points on 20 benchmarks💙

👉Review https://t.ly/7JB0x
👉Paper https://arxiv.org/pdf/2606.13673
👉Project https://spatialclaw.github.io/
👉Repo https://github.com/NVlabs/SpatialClaw
🤯51🔥1
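
👉To make the "code as the action interface" idea concrete, here is a minimal toy sketch of a persistent kernel. It is not SpatialClaw's code or API. `PersistentKernel`, `detect_objects`, and the scripted `steps` are hypothetical stand-ins, and a fixed list of snippets plays the role of the VLM. It only shows how variables created in one step stay available in the next, and how printed output or errors can be handed back to the agent as observations.

```python
import contextlib
import io


class PersistentKernel:
    """Runs code snippets in one shared namespace so state survives across steps."""

    def __init__(self, tools):
        # Tools (here: a stub perception function) are pre-loaded into the namespace.
        self.namespace = dict(tools)

    def run(self, code):
        """Execute a snippet and return what it printed, or the error text."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
        except Exception as exc:
            # Errors are returned as text so the agent could revise its next step.
            return f"{type(exc).__name__}: {exc}"
        return buffer.getvalue().strip()


def detect_objects(image):
    """Hypothetical perception stub: fixed detections as (label, x, y, depth)."""
    return [("chair", 40, 120, 2.5), ("table", 200, 130, 3.0)]


# Scripted stand-in for the VLM: each string is one "action" the agent writes.
steps = [
    "boxes = detect_objects(image)\nprint(len(boxes))",
    "nearest = min(boxes, key=lambda b: b[3])\nprint(nearest[0])",
]

kernel = PersistentKernel({"detect_objects": detect_objects, "image": None})
observations = [kernel.run(step) for step in steps]
print(observations)  # ['2', 'chair']
```

The second step reuses `boxes` from the first without recomputing it, which is the point of keeping one kernel alive across steps. A real agent would send each observation back to the model before it writes the next snippet, and would sandbox the execution.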