AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
237 videos
11 files
1.27K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🫠 X-Portrait 2: SOTA(?) Portrait Animation 🫠

πŸ‘‰ByteDance unveils a preview of X-Portrait 2, the new SOTA expression encoder that implicitly captures every minuscule expression from the input, thanks to training on large-scale datasets. Impressive results, but no paper or code announced.

πŸ‘‰Review https://t.ly/8Owh9 [UPDATE]
πŸ‘‰Paper ?
πŸ‘‰Project byteaigc.github.io/X-Portrait2/
πŸ‘‰Repo ?
❄️Don’t Look Twice: ViT by RLT❄️

πŸ‘‰CMU unveils RLT: speeding up video transformers with an approach inspired by run-length encoding for data compression. It accelerates training and reduces the token count by up to 80%! Source code announced πŸ’™

πŸ‘‰Review https://t.ly/ccSwN
πŸ‘‰Paper https://lnkd.in/d6VXur_q
πŸ‘‰Project https://lnkd.in/d4tXwM5T
πŸ‘‰Repo TBA
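The run-length idea is easy to picture in code. A toy sketch (function names and the change-threshold heuristic are mine, not from the paper): keep a token for a patch only when its content changes between frames, and record how many frames each kept token stands in for, just as run-length encoding stores a symbol plus a run length.

```python
# Hedged sketch of run-length tokenization for video: static patches collapse
# into a single token with a run length; only changing patches emit new tokens.

def run_length_tokenize(frames, threshold=0.1):
    """frames: list of per-frame patch lists (floats as toy 'patch content').
    Returns (tokens, runs): one token per retained patch, plus the number of
    frames each token covers."""
    tokens, runs = [], []
    num_patches = len(frames[0])
    for p in range(num_patches):
        prev = None
        for t, frame in enumerate(frames):
            val = frame[p]
            if prev is None or abs(val - prev) > threshold:
                tokens.append((t, p, val))   # new token: (frame, patch, content)
                runs.append(1)
            else:
                runs[-1] += 1                # static patch: extend the run
            prev = val
    return tokens, runs

# A static background patch (index 0) collapses to one token across 4 frames,
# while a moving patch (index 1) keeps one token per frame: 5 tokens, not 8.
frames = [[0.0, 0.0], [0.0, 0.5], [0.0, 1.0], [0.0, 1.5]]
tokens, runs = run_length_tokenize(frames)
```

On this toy input the token count drops from 8 (4 frames × 2 patches) to 5, which is the same effect the paper reports on real videos with largely static backgrounds.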
πŸ”SeedEdit: foundational T2IπŸ”

πŸ‘‰ByteDance unveils a novel foundational T2I model capable of delivering stable, high-aesthetic image edits that maintain image quality through unlimited rounds of editing instructions. No code announced, but a demo is online πŸ’™

πŸ‘‰Review https://t.ly/hPlnN
πŸ‘‰Paper https://arxiv.org/pdf/2411.06686
πŸ‘‰Project team.doubao.com/en/special/seededit
πŸ€—Demo https://huggingface.co/spaces/ByteDance/SeedEdit-APP
πŸ”₯ 4 NanoSeconds inference πŸ”₯

πŸ‘‰LogicTreeNet: convolutional differentiable logic gate networks with logic-gate tree kernels, bringing Computer Vision into differentiable LGNs. Up to 61× smaller than SOTA, with inference in 4 NANOseconds!

πŸ‘‰Review https://t.ly/GflOW
πŸ‘‰Paper https://lnkd.in/dAZQr3dW
πŸ‘‰Full clip https://lnkd.in/dvDJ3j-u
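The core trick behind differentiable logic gate networks can be sketched in a few lines (a minimal illustration of the general idea, not the paper's implementation; the gate list and names are mine): during training each "neuron" is a softmax mixture over a small set of binary gates, and at inference only the argmax gate survives, leaving pure boolean logic that maps directly to hardware.

```python
# Toy differentiable logic gate: soft relaxation for training,
# hard argmax gate for (nanosecond-scale) inference.
import math

GATES = [
    lambda a, b: a * b,              # AND (soft form: product of probabilities)
    lambda a, b: a + b - a * b,      # OR
    lambda a, b: a + b - 2 * a * b,  # XOR
    lambda a, b: 1 - a * b,          # NAND
]

def soft_gate(a, b, logits):
    """Differentiable relaxation: softmax-weighted mix of all gate outputs."""
    m = max(logits)
    w = [math.exp(l - m) for l in logits]
    s = sum(w)
    return sum((wi / s) * g(a, b) for wi, g in zip(w, GATES))

def hard_gate(a, b, logits):
    """Inference: keep only the argmax gate -> pure boolean logic."""
    return GATES[max(range(len(logits)), key=lambda i: logits[i])](a, b)

# Pretend training has pushed nearly all the weight onto AND:
logits = [5.0, 0.0, 0.0, 0.0]
soft_out = soft_gate(1.0, 1.0, logits)   # close to AND(1, 1) = 1
hard_out = hard_gate(1, 1, logits)       # exactly AND(1, 1) = 1
```

The soft output stays differentiable for gradient descent, while the discretized gate costs a single logic operation, which is where the extreme inference speed comes from.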
πŸ›₯️ Global Tracklet Association MOT πŸ›₯️

πŸ‘‰A novel universal, model-agnostic method designed to refine and enhance tracklet association for single-camera MOT. Suitable for datasets such as SportsMOT, SoccerNet & similar. Source code releasedπŸ’™

πŸ‘‰Review https://t.ly/gk-yh
πŸ‘‰Paper https://lnkd.in/dvXQVKFw
πŸ‘‰Repo https://lnkd.in/dEJqiyWs
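The model-agnostic refinement step can be sketched as follows (an illustrative simplification by me; the actual pipeline also splits contaminated tracklets and uses learned appearance embeddings): two tracklet fragments are assigned the same global identity when they do not overlap in time and their appearance features are similar enough.

```python
# Hedged sketch of greedy tracklet association: link temporally disjoint,
# visually similar fragments under one global id.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def merge_tracklets(tracklets, sim_thresh=0.9):
    """tracklets: list of dicts {'id', 'start', 'end', 'feat'} sorted by start.
    Greedily assigns a shared global id to compatible fragments."""
    global_ids = {}
    next_gid = 0
    for i, trk in enumerate(tracklets):
        if trk['id'] in global_ids:
            continue
        global_ids[trk['id']] = next_gid
        last_end = trk['end']
        for other in tracklets[i + 1:]:
            if other['id'] in global_ids:
                continue
            if other['start'] > last_end and \
               cosine(trk['feat'], other['feat']) > sim_thresh:
                global_ids[other['id']] = next_gid   # same player, broken track
                last_end = other['end']
        next_gid += 1
    return global_ids

tracklets = [
    {'id': 'a', 'start': 0,  'end': 50,  'feat': [1.0, 0.1]},
    {'id': 'b', 'start': 60, 'end': 120, 'feat': [0.98, 0.12]},  # 'a' re-detected
    {'id': 'c', 'start': 10, 'end': 90,  'feat': [0.0, 1.0]},    # different player
]
gids = merge_tracklets(tracklets)
```

Fragments 'a' and 'b' get the same global id while the temporally overlapping 'c' stays separate, which is exactly the ID-switch repair that boosts HOTA on SportsMOT-style data.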
πŸ‘10πŸ”₯4❀2
This media is not supported in your browser
VIEW IN TELEGRAM
🧢 MagicQuill: super-easy Diffusion Editing 🧢

πŸ‘‰MagicQuill is a novel system for smart image editing: a robust UI (e.g., inserting/erasing objects, editing colors) backed by a multimodal LLM that anticipates user intentions in real time. Code & demos released πŸ’™

πŸ‘‰Review https://t.ly/hJyLa
πŸ‘‰Paper https://arxiv.org/pdf/2411.09703
πŸ‘‰Project https://magicquill.art/demo/
πŸ‘‰Repo https://github.com/magic-quill/magicquill
πŸ‘‰Demo https://huggingface.co/spaces/AI4Editing/MagicQuill
🧰 EchoMimicV2: Semi-body Human 🧰

πŸ‘‰Alipay (ANT Group) unveils EchoMimicV2, the novel SOTA in half-body human animation via APD-Harmonization. See the clip with audio (ZH/ENG). Code & demo announced πŸ’™

πŸ‘‰Review https://t.ly/enLxJ
πŸ‘‰Paper arxiv.org/pdf/2411.10061
πŸ‘‰Project antgroup.github.io/ai/echomimic_v2/
πŸ‘‰Repo-v2 github.com/antgroup/echomimic_v2
πŸ‘‰Repo-v1 https://github.com/antgroup/echomimic
βš”οΈSAMurai: SAM for Trackingβš”οΈ

πŸ‘‰UW unveils SAMURAI, an enhanced adaptation of SAM 2 specifically designed for visual object tracking. New SOTA! Code under Apache 2.0 πŸ’™

πŸ‘‰Review https://t.ly/yGU0P
πŸ‘‰Paper https://arxiv.org/pdf/2411.11922
πŸ‘‰Repo https://github.com/yangchris11/samurai
πŸ‘‰Project https://yangchris11.github.io/samurai/
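The motion-aware selection idea can be sketched in a few lines (heavily simplified by me: a constant-velocity prediction stands in for SAMURAI's Kalman filter, and the weighting scheme is illustrative): among SAM 2's candidate masks, prefer the one that agrees with both the affinity score and the motion-predicted box, which suppresses look-alike distractors.

```python
# Hedged sketch of motion-aware mask selection for tracking.

def iou(a, b):
    """Boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def predict_box(prev, velocity):
    """Constant-velocity motion model; velocity shifts both box corners."""
    return tuple(p + v for p, v in zip(prev, velocity * 2))

def select_mask(candidates, prev_box, velocity, alpha=0.5):
    """candidates: list of (box, affinity_score).
    Blend the tracker's affinity with agreement to the motion prediction."""
    pred = predict_box(prev_box, velocity)
    return max(candidates,
               key=lambda c: alpha * c[1] + (1 - alpha) * iou(c[0], pred))

prev_box = (10, 10, 30, 30)
velocity = (5, 0)                      # object drifting right
candidates = [
    ((15, 10, 35, 30), 0.80),          # consistent with the motion
    ((80, 80, 100, 100), 0.85),        # distractor: higher affinity, wrong place
]
best = select_mask(candidates, prev_box, velocity)
```

Pure affinity would pick the distractor; blending in motion agreement picks the right mask, which is the zero-retraining refinement the post describes.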
πŸ¦–Dino-X: Unified Obj-Centric LVMπŸ¦–

πŸ‘‰Unified vision model for Open-World Detection, Segmentation, Phrase Grounding, Visual Counting, Pose, Prompt-Free Detection/Recognition, Dense Caption, & more. Demo & API announced πŸ’™

πŸ‘‰Review https://t.ly/CSQon
πŸ‘‰Paper https://lnkd.in/dc44ZM8v
πŸ‘‰Project https://lnkd.in/dehKJVvC
πŸ‘‰Repo https://lnkd.in/df8Kb6iz
🌎All Languages Matter: LMMs vs. 100 Lang.🌎

πŸ‘‰ALM-Bench aims to assess the next generation of massively multilingual multimodal models in a standardized way, pushing the boundaries of LMMs towards better cultural understanding and inclusivity. Code & Dataset πŸ’™

πŸ‘‰Review https://t.ly/VsoJB
πŸ‘‰Paper https://lnkd.in/ddVVZfi2
πŸ‘‰Project https://lnkd.in/dpssaeRq
πŸ‘‰Code https://lnkd.in/dnbaJJE4
πŸ‘‰Dataset https://lnkd.in/drw-_95v
πŸ¦™ EdgeCape: SOTA Agnostic Pose πŸ¦™

πŸ‘‰EdgeCape: the new SOTA in Category-Agnostic Pose Estimation (CAPE): finding keypoints across diverse object categories using only one or a few annotated support images. Source code released πŸ’™

πŸ‘‰Review https://t.ly/4TpAs
πŸ‘‰Paper https://arxiv.org/pdf/2411.16665
πŸ‘‰Project https://orhir.github.io/edge_cape/
πŸ‘‰Code https://github.com/orhir/EdgeCape
πŸ›Ÿ StableAnimator: ID-aware Humans πŸ›Ÿ

πŸ‘‰StableAnimator: the first e2e ID-preserving diffusion model for HQ human animation videos without any post-processing. Input: a single image + a sequence of poses. Insane results!

πŸ‘‰Review https://t.ly/JDtL3
πŸ‘‰Paper https://arxiv.org/pdf/2411.17697
πŸ‘‰Project francis-rings.github.io/StableAnimator/
πŸ‘‰Code github.com/Francis-Rings/StableAnimator
πŸ‘12❀3🀯2πŸ”₯1
This media is not supported in your browser
VIEW IN TELEGRAM
🧢SOTA track-by-propagation🧢

πŸ‘‰SambaMOTR is a novel e2e model (based on Samba) that leverages long-range dependencies and interactions between tracklets to handle complex motion patterns and occlusions. Code in Jan. '25 πŸ’™

πŸ‘‰Review https://t.ly/QSQ8L
πŸ‘‰Paper arxiv.org/pdf/2410.01806
πŸ‘‰Project sambamotr.github.io/
πŸ‘‰Repo https://lnkd.in/dRDX6nk2
πŸ‘ΊHiFiVFS: Extreme Face SwappingπŸ‘Ί

πŸ‘‰HiFiVFS: HQ face swapping in videos, even in extremely challenging scenarios (occlusion, makeup, lighting, extreme poses, etc.). Impressive results, but no code announced 😒

πŸ‘‰Review https://t.ly/ea8dU
πŸ‘‰Paper https://arxiv.org/pdf/2411.18293
πŸ‘‰Project https://cxcx1996.github.io/HiFiVFS
πŸ”₯Video Depth without Video ModelsπŸ”₯

πŸ‘‰RollingDepth: turning a single-image latent diffusion model (LDM) into the novel SOTA video depth estimator. It works better than dedicated video depth models 🀯 Code under Apache πŸ’™

πŸ‘‰Review https://t.ly/R4LqS
πŸ‘‰Paper https://arxiv.org/pdf/2411.19189
πŸ‘‰Project https://rollingdepth.github.io/
πŸ‘‰Repo https://github.com/prs-eth/rollingdepth
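The stitching step behind the rolling-window idea can be illustrated with scalars (my simplification: the paper aligns many overlapping snippets with a global optimization; here just two snippets and a closed-form fit): each snippet gets relative depth from the single-image model, then a scale/shift fit on the overlapping frames makes the snippets consistent.

```python
# Hedged sketch of affine depth alignment between overlapping snippets.

def fit_scale_shift(src, dst):
    """Least-squares s, t minimizing ||s*src + t - dst||^2 (closed form)."""
    n = len(src)
    mx = sum(src) / n
    my = sum(dst) / n
    var = sum((x - mx) ** 2 for x in src)
    cov = sum((x - mx) * (y - my) for x, y in zip(src, dst))
    s = cov / var if var else 1.0
    return s, my - s * mx

# Two snippets as {frame: toy per-frame mean depth}; relative depth from an
# LDM is only defined up to scale and shift, so snippet_b disagrees with
# snippet_a by a factor of 2 on their shared frames.
snippet_a = {0: 1.0, 1: 1.2, 2: 1.4}
snippet_b = {1: 0.6, 2: 0.7, 3: 0.8, 4: 0.9}

overlap = sorted(set(snippet_a) & set(snippet_b))
s, t = fit_scale_shift([snippet_b[f] for f in overlap],
                       [snippet_a[f] for f in overlap])
aligned_b = {f: s * d + t for f, d in snippet_b.items()}   # now consistent
```

After the fit the two snippets agree on the overlap and snippet_b's extra frames extend the depth track, which is how a per-image model yields temporally consistent video depth.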
⚽Universal Soccer Foundation Model⚽

πŸ‘‰Universal Soccer Video Understanding: SoccerReplay-1988 - the largest multi-modal soccer dataset - and MatchVision - the first vision-language foundation model for soccer. Code, dataset & checkpoints to be released πŸ’™

πŸ‘‰Review https://t.ly/-X90B
πŸ‘‰Paper https://arxiv.org/pdf/2412.01820
πŸ‘‰Project https://jyrao.github.io/UniSoccer/
πŸ‘‰Repo https://github.com/jyrao/UniSoccer
🌈Motion Prompting Video Generation🌈

πŸ‘‰DeepMind unveils Motion Prompting: a novel ControlNet-based video generation model conditioned on spatio-temporally sparse or dense motion trajectories. Amazing results, but no code announced 😒

πŸ‘‰Review https://t.ly/VyKbv
πŸ‘‰Paper arxiv.org/pdf/2412.02700
πŸ‘‰Project motion-prompting.github.io
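One plausible way to feed sparse trajectories to a conditional video model (a guess on my part; the paper's exact encoding may differ) is to rasterize each tracked point into per-frame spatial maps, so the generator sees "where things should move" frame by frame:

```python
# Hedged sketch: rasterize sparse point tracks into a conditioning volume.

def rasterize_tracks(tracks, num_frames, h, w):
    """tracks: list of per-frame (x, y) positions (None when the point is
    absent). Returns cond[frame][y][x] = 1.0 at every tracked point."""
    cond = [[[0.0] * w for _ in range(h)] for _ in range(num_frames)]
    for track in tracks:
        for t, pos in enumerate(track):
            if pos is None:
                continue                   # sparse: point unspecified this frame
            x, y = pos
            if 0 <= x < w and 0 <= y < h:
                cond[t][y][x] = 1.0
    return cond

# One point moving right across a 4x4 grid over 3 frames.
tracks = [[(0, 1), (1, 1), (2, 1)]]
cond = rasterize_tracks(tracks, num_frames=3, h=4, w=4)
```

The same encoding covers both extremes the post mentions: one or two user-drawn tracks (sparse) or a dense grid of tracks covering every pixel.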
🦘AniGS: Single Pic Animatable Avatar🦘

πŸ‘‰#Alibaba unveils AniGS: given a single human image as input, it reconstructs a Hi-Fi 3D avatar in a canonical pose, usable for both photorealistic rendering & real-time animation. Source code announced, to be released πŸ’™

πŸ‘‰Review https://t.ly/4yfzn
πŸ‘‰Paper arxiv.org/pdf/2412.02684
πŸ‘‰Project lingtengqiu.github.io/2024/AniGS/
πŸ‘‰Repo github.com/aigc3d/AniGS
🧀GigaHands: Massive #3D Hands🧀

πŸ‘‰Novel massive #3D bimanual activities dataset: 34 hours of activities, 14k hand motion clips paired with 84k text annotations, and 183M+ unique hand images

πŸ‘‰Review https://t.ly/SA0HG
πŸ‘‰Paper www.arxiv.org/pdf/2412.04244
πŸ‘‰Repo github.com/brown-ivl/gigahands
πŸ‘‰Project ivl.cs.brown.edu/research/gigahands.html
🦒 Track4Gen: Diffusion + Tracking 🦒

πŸ‘‰Track4Gen: a spatially aware video generator that combines the video diffusion loss with point tracking across frames, providing enhanced spatial supervision on the diffusion features. GenAI with point-based motion control. Stunning results, but no code announced 😒

πŸ‘‰Review https://t.ly/9ujhc
πŸ‘‰Paper arxiv.org/pdf/2412.06016
πŸ‘‰Project hyeonho99.github.io/track4gen/
πŸ‘‰Gallery hyeonho99.github.io/track4gen/full.html
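The training signal described above can be sketched with toy scalars (function names, the feature layout, and the weighting `lam` are mine, not from the paper): the usual denoising loss is summed with a tracking loss that asks diffusion features of corresponding points in neighboring frames to agree.

```python
# Hedged sketch of a combined diffusion + point-tracking objective.

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def track4gen_loss(noise_pred, noise_gt, feats_t, feats_t1, matches, lam=0.5):
    """matches: list of (i, j) pairs meaning point i in frame t corresponds to
    point j in frame t+1; their diffusion features should match."""
    diffusion = mse(noise_pred, noise_gt)                 # standard denoising loss
    tracking = sum(mse(feats_t[i], feats_t1[j])
                   for i, j in matches) / len(matches)    # correspondence loss
    return diffusion + lam * tracking

# Toy example: small denoising error, perfectly matched features.
loss = track4gen_loss(
    noise_pred=[0.1, 0.2], noise_gt=[0.0, 0.0],
    feats_t=[[1.0, 0.0]], feats_t1=[[1.0, 0.0]],
    matches=[(0, 0)],
)
```

When the matched features drift apart the tracking term grows, penalizing exactly the appearance drift across frames that plain video diffusion loss does not see.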