AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
96 photos
238 videos
11 files
1.27K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🧸 Motion Instruction Fine-Tuning 🧸

👉MotIF is a novel method that fine-tunes pre-trained VLMs so they can distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source code announced, coming💙

👉Review https://t.ly/iJ2UY
👉Paper https://arxiv.org/pdf/2409.10683
👉Project https://motif-1k.github.io/
👉Code coming
⚽ SoccerNet 2024 Results ⚽

👉SoccerNet is the annual video understanding challenge for football. These challenges aim to advance research across multiple themes in football. The 2024 results are out!

👉Review https://t.ly/DUPgx
👉Paper arxiv.org/pdf/2409.10587
👉Repo github.com/SoccerNet
👉Project www.soccer-net.org/
🌏 JoyHallo: Mandarin Digital Human 🌏

👉JD Health tackles audio-driven video generation in Mandarin, a task complicated by the language's intricate lip movements and the scarcity of HQ datasets. Impressive results (-> audio ON). Code & models available💙

👉Review https://t.ly/5NGDh
👉Paper arxiv.org/pdf/2409.13268
👉Project jdh-algo.github.io/JoyHallo/
👉Code github.com/jdh-algo/JoyHallo
🎢 Robo-quadruped Parkour 🎢

👉LAAS-CNRS unveils a novel RL approach to perform agile skills that are reminiscent of parkour, such as walking, climbing high steps, leaping over gaps, and crawling under obstacles. Data and code available💙

👉Review https://t.ly/-6VRm
👉Paper arxiv.org/pdf/2409.13678
👉Project gepetto.github.io/SoloParkour/
👉Code github.com/Gepetto/SoloParkour
🩰 Dressed Humans in the wild 🩰

👉ETH (+ #Microsoft) unveils ReLoo, a novel high-quality 3D reconstruction of humans dressed in loose garments from monocular in-the-wild clips. No prior assumptions about the garments. Source code announced, coming💙

👉Review https://t.ly/evgmN
👉Paper arxiv.org/pdf/2409.15269
👉Project moygcc.github.io/ReLoo/
👉Code github.com/eth-ait/ReLoo
🌾 New SOTA Edge Detection 🌾

👉CUP (+ ESPOCH) unveils NBED, the new SOTA for edge detection: consistently superior performance across multiple benchmarks, even against models with huge computational cost and complex training. Source code released💙

👉Review https://t.ly/zUMcS
👉Paper arxiv.org/pdf/2409.14976
👉Code github.com/Li-yachuan/NBED
👩‍🦰 SOTA Gaussian Haircut 👩‍🦰

👉ETH et al. unveil Gaussian Haircut, the new SOTA in hair reconstruction via dual representation (classic + 3D Gaussian). Code and model announced💙

👉Review https://t.ly/aiOjq
👉Paper arxiv.org/pdf/2409.14778
👉Project https://lnkd.in/dFRm2ycb
👉Repo https://lnkd.in/d5NWNkb5
SPARK: Real-time Face Capture

👉Technicolor Group unveils SPARK, a novel high-precision 3D face capture method that uses a collection of unconstrained videos of a subject as prior information. New SOTA able to handle unseen pose, expression and lighting. Impressive results. Code & model announced💙

👉Review https://t.ly/rZOgp
👉Paper arxiv.org/pdf/2409.07984
👉Project kelianb.github.io/SPARK/
👉Repo github.com/KelianB/SPARK/
🦴 One-Image Object Detection 🦴

👉Delft University (+Hensoldt Optronics) introduces OSSA, a novel unsupervised domain adaptation method for object detection that utilizes a single, unlabeled target image to approximate the target domain style. Code released💙

👉Review https://t.ly/-li2G
👉Paper arxiv.org/pdf/2410.00900
👉Code github.com/RobinGerster7/OSSA
🛳️ EVER Ellipsoid Rendering 🛳️

👉UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts or view-dependent density, achieving ∼30 FPS at 720p on an #NVIDIA RTX4090.

👉Review https://t.ly/zAfGU
👉Paper arxiv.org/pdf/2410.01804
👉Project half-potato.gitlab.io/posts/ever/
🔥 "Deep Gen-AI" Full Course 🔥

👉A fresh course from Stanford about the probabilistic foundations and algorithms for deep generative models. A new overview of the evolution of genAI in #computervision, language and more...

👉Review https://t.ly/ylBxq
👉Course https://lnkd.in/dMKH9gNe
👉Lectures https://lnkd.in/d_uwDvT6
❀21πŸ”₯7πŸ‘2πŸ‘1πŸ₯°1🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
🐏 EFM3D: 3D Ego-Foundation 🐏

👉#META presents EFM3D, the first benchmark for 3D object detection and surface regression on HQ annotated egocentric data of Project Aria. Datasets & code released💙

👉Review https://t.ly/cDJv6
👉Paper arxiv.org/pdf/2406.10224
👉Project www.projectaria.com/datasets/aeo/
👉Repo github.com/facebookresearch/efm3d
🥦 Gaussian Splatting VTON 🥦

👉GS-VTON is a novel image-prompted 3D-VTON which, by leveraging 3DGS as the 3D representation, enables the transfer of pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. Code announced💙

👉Review https://t.ly/sTPbW
👉Paper arxiv.org/pdf/2410.05259
👉Project yukangcao.github.io/GS-VTON/
👉Repo github.com/yukangcao/GS-VTON
💡 Diffusion Models Relighting 💡

👉#Netflix unveils DifFRelight, a novel method for free-viewpoint facial relighting via a diffusion model. Precise lighting control and high-fidelity relit facial images from flat-lit inputs.

👉Review https://t.ly/fliXU
👉Paper arxiv.org/pdf/2410.08188
👉Project www.eyelinestudios.com/research/diffrelight.html
🥎 POKEFLEX: Soft Object Dataset 🥎

👉PokeFlex from ETH is a dataset that includes 3D textured meshes, point clouds, RGB & depth maps of deformable objects. Pretrained models & dataset announced💙

👉Review https://t.ly/GXggP
👉Paper arxiv.org/pdf/2410.07688
👉Project https://lnkd.in/duv-jS7a
👉Repo
🔥 DEPTH ANY VIDEO is out! 🔥

👉DAV is a novel foundation model for image/video depth estimation. The new SOTA for accuracy & consistency, up to 150 FPS!

👉Review https://t.ly/CjSz2
👉Paper arxiv.org/pdf/2410.10815
👉Project depthanyvideo.github.io/
👉Code github.com/Nightmare-n/DepthAnyVideo
🪞 Robo-Emulation via Video Imitation 🪞

👉OKAMI (UT & #Nvidia) is a novel foundation method that generates a manipulation plan from a single RGB-D video and derives a policy for execution.

👉Review https://t.ly/_N29-
👉Paper arxiv.org/pdf/2410.11792
👉Project https://lnkd.in/d6bHF_-s
🔥 CoTracker3 by #META is out! 🔥

👉#Meta (+VGG Oxford) unveils CoTracker3, a new tracker that outperforms the previous SoTA by a large margin using only 0.1% of the training data 🤯🤯🤯

👉Review https://t.ly/TcRIv
👉Paper arxiv.org/pdf/2410.11831
👉Project cotracker3.github.io/
👉Code github.com/facebookresearch/co-tracker
❀14πŸ”₯3🀯3🍾2πŸ‘1😱1😍1
This media is not supported in your browser
VIEW IN TELEGRAM
🦠 Neural Metamorphosis 🦠

👉NU Singapore unveils NeuMeta to transform neural nets by allowing a single model to adapt on the fly to different sizes, generating the right weights when needed.

👉Review https://t.ly/DJab3
👉Paper arxiv.org/pdf/2410.11878
👉Project adamdad.github.io/neumeta
👉Code github.com/Adamdad/neumeta
❀7πŸ”₯3🀯3😱2⚑1πŸ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
☀️ GS + Depth = SOTA ☀️

👉DepthSplat is the new SOTA in depth estimation & novel view synthesis. The key feature is the cross-task interaction between Gaussian Splatting & depth estimation. Source code to be released soon💙

👉Review https://t.ly/87HuH
👉Paper arxiv.org/abs/2410.13862
👉Project haofeixu.github.io/depthsplat/
👉Code github.com/cvg/depthsplat