AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
237 videos
11 files
1.27K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🦧Sapiens: SOTA ViTs for human🦧

👉META unveils Sapiens, a family of models for human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Source Code announced, coming💙

👉Review https://t.ly/GKQI0
👉Paper arxiv.org/pdf/2408.12569
👉Project rawalkhirodkar.github.io/sapiens
👉Code github.com/facebookresearch/sapiens
🐚 Diffusion Game Engine 🐚

👉#Google unveils GameNGen: the first game engine powered entirely by a neural #AI model, enabling real-time interaction with a complex environment over long trajectories at high quality. No code announced but I love it 💙

👉Review https://t.ly/_WR5z
👉Paper https://lnkd.in/dZqgiqb9
👉Project https://lnkd.in/dJUd2Fr6
🫒 Omni Urban Scene Reconstruction 🫒

👉OmniRe is a novel holistic approach for efficiently reconstructing HD dynamic urban scenes from on-device logs. It can simulate the reconstructed scenarios, actors included, in real time (~60 Hz). Code released💙

👉Review https://t.ly/SXVPa
👉Paper arxiv.org/pdf/2408.16760
👉Project ziyc.github.io/omnire/
👉Code github.com/ziyc/drivestudio
💄Interactive Drag-based Editing💄

👉CSE unveils InstantDrag: a novel pipeline designed to enhance editing interactivity and speed, taking only an image and a drag instruction as input. Source Code announced, coming💙

👉Review https://t.ly/hy6SL
👉Paper arxiv.org/pdf/2409.08857
👉Project joonghyuk.com/instantdrag-web/
👉Code github.com/alex4727/InstantDrag
🌭Hand-Object interaction Pretraining🌭

👉Berkeley unveils HOP, a novel approach to learn general robot manipulation priors from 3D hand-object interaction trajectories.

👉Review https://t.ly/FLqvJ
👉Paper https://arxiv.org/pdf/2409.08273
👉Project https://hgaurav2k.github.io/hop/
🧸Motion Instruction Fine-Tuning🧸

👉MotIF is a novel method that fine-tunes pre-trained VLMs so they can distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source Code announced, coming💙

👉Review https://t.ly/iJ2UY
👉Paper https://arxiv.org/pdf/2409.10683
👉Project https://motif-1k.github.io/
👉Code coming
⚽ SoccerNet 2024 Results ⚽

👉SoccerNet is the annual video understanding challenge for football. These challenges aim to advance research across multiple themes in football. The 2024 results are out!

👉Review https://t.ly/DUPgx
👉Paper arxiv.org/pdf/2409.10587
👉Repo github.com/SoccerNet
👉Project www.soccer-net.org/
🌏 JoyHallo: Mandarin Digital Human 🌏

👉JD Health tackles audio-driven video generation in Mandarin, a task complicated by the language's intricate lip movements and the scarcity of HQ datasets. Impressive results (-> audio ON). Code & Models available💙

👉Review https://t.ly/5NGDh
👉Paper arxiv.org/pdf/2409.13268
👉Project jdh-algo.github.io/JoyHallo/
👉Code github.com/jdh-algo/JoyHallo
🎢 Robo-quadruped Parkour🎢

👉LAAS-CNRS unveils a novel RL approach to perform agile skills that are reminiscent of parkour, such as walking, climbing high steps, leaping over gaps, and crawling under obstacles. Data and Code available💙

👉Review https://t.ly/-6VRm
👉Paper arxiv.org/pdf/2409.13678
👉Project gepetto.github.io/SoloParkour/
👉Code github.com/Gepetto/SoloParkour
🩰 Dressed Humans in the wild 🩰

👉ETH (+ #Microsoft) unveils ReLoo: a novel high-quality 3D reconstruction of humans dressed in loose garments from monocular in-the-wild clips. No prior assumptions about the garments. Source Code announced, coming 💙

👉Review https://t.ly/evgmN
👉Paper arxiv.org/pdf/2409.15269
👉Project moygcc.github.io/ReLoo/
👉Code github.com/eth-ait/ReLoo
🌾 New SOTA Edge Detection 🌾

👉CUP (+ ESPOCH) unveils the new SOTA for Edge Detection (NBED): consistently superior performance across multiple benchmarks, even compared with models that have a huge computational cost and complex training. Source Code released💙

👉Review https://t.ly/zUMcS
👉Paper arxiv.org/pdf/2409.14976
👉Code github.com/Li-yachuan/NBED
👩‍🦰 SOTA Gaussian Haircut 👩‍🦰

👉ETH et al. unveil Gaussian Haircut, the new SOTA in hair reconstruction via dual representation (classic + 3D Gaussian). Code and Model announced💙

👉Review https://t.ly/aiOjq
👉Paper arxiv.org/pdf/2409.14778
👉Project https://lnkd.in/dFRm2ycb
👉Repo https://lnkd.in/d5NWNkb5
🍇SPARK: Real-time Face Capture🍇

👉Technicolor Group unveils SPARK, a novel high-precision 3D face capture method that uses a collection of unconstrained videos of a subject as prior information. New SOTA able to handle unseen pose, expression and lighting. Impressive results. Code & Model announced💙

👉Review https://t.ly/rZOgp
👉Paper arxiv.org/pdf/2409.07984
👉Project kelianb.github.io/SPARK/
👉Repo github.com/KelianB/SPARK/
🦴 One-Image Object Detection 🦴

👉Delft University (+Hensoldt Optronics) introduces OSSA, a novel unsupervised domain adaptation method for object detection that utilizes a single, unlabeled target image to approximate the target domain style. Code released💙

👉Review https://t.ly/-li2G
👉Paper arxiv.org/pdf/2410.00900
👉Code github.com/RobinGerster7/OSSA
🛳️ EVER Ellipsoid Rendering 🛳️

👉UCSD & Google present EVER, a novel method for real-time differentiable emission-only volume rendering. Unlike 3DGS, it does not suffer from popping artifacts or view-dependent density, achieving ~30 FPS at 720p on #NVIDIA RTX4090.

👉Review https://t.ly/zAfGU
👉Paper arxiv.org/pdf/2410.01804
👉Project half-potato.gitlab.io/posts/ever/
🔥 "Deep Gen-AI" Full Course 🔥

👉A fresh course from Stanford about the probabilistic foundations and algorithms for deep generative models. A novel overview of the evolution of genAI in #computervision, language and more...

👉Review https://t.ly/ylBxq
👉Course https://lnkd.in/dMKH9gNe
👉Lectures https://lnkd.in/d_uwDvT6
🐏 EFM3D: 3D Ego-Foundation 🐏

👉#META presents EFM3D, the first benchmark for 3D object detection and surface regression on HQ annotated egocentric data of Project Aria. Datasets & Code released💙

👉Review https://t.ly/cDJv6
👉Paper arxiv.org/pdf/2406.10224
👉Project www.projectaria.com/datasets/aeo/
👉Repo github.com/facebookresearch/efm3d
🥦Gaussian Splatting VTON🥦

👉GS-VTON is a novel image-prompted 3D-VTON which, by leveraging 3DGS as the 3D representation, enables the transfer of pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. Code announced💙

👉Review https://t.ly/sTPbW
👉Paper arxiv.org/pdf/2410.05259
👉Project yukangcao.github.io/GS-VTON/
👉Repo github.com/yukangcao/GS-VTON
💡Diffusion Models Relighting💡

👉#Netflix unveils DifFRelight, a novel free-viewpoint facial relighting method based on a diffusion model. Precise lighting control, high-fidelity relit facial images from flat-lit inputs.

👉Review https://t.ly/fliXU
👉Paper arxiv.org/pdf/2410.08188
👉Project www.eyelinestudios.com/research/diffrelight.html