AI with Papers - Artificial Intelligence & Deep Learning
All the AI, with papers. Fresh updates every day on Deep Learning, Machine Learning, and Computer Vision (with papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
๐Ÿœ REM: Segment What You Describe ๐Ÿœ

๐Ÿ‘‰REM is a framework for segmenting concepts in video that can be described via LLM. Suitable for rare & non-object dynamic concepts, such as waves, smoke, etc. Code & Data announced ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/OyVtV
๐Ÿ‘‰Paper arxiv.org/pdf/2410.23287
๐Ÿ‘‰Project https://miccooper9.github.io/projects/ReferEverything/
โ˜€๏ธ Universal Relightable Avatars โ˜€๏ธ

๐Ÿ‘‰#Meta unveils URAvatar, photorealistic & relightable avatars from phone scan with unknown illumination. Stunning results!

๐Ÿ‘‰Review https://t.ly/U-ESX
๐Ÿ‘‰Paper arxiv.org/pdf/2410.24223
๐Ÿ‘‰Project junxuan-li.github.io/urgca-website
โค11๐Ÿ”ฅ5โšก1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
๐Ÿฃ CityGaussianV2: Large-Scale City ๐Ÿฃ

๐Ÿ‘‰A novel approach for large-scale scene reconstruction that addresses critical challenges related to geometric accuracy and efficiency: 10x compression, 25% faster & -50% memory! Source code released๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/Xgn59
๐Ÿ‘‰Paper arxiv.org/pdf/2411.00771
๐Ÿ‘‰Project dekuliutesla.github.io/CityGaussianV2/
๐Ÿ‘‰Code github.com/DekuLiuTesla/CityGaussian
๐Ÿ‘15๐Ÿ”ฅ9โค2๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
💪 Muscles in Time Dataset 💪

👉Muscles in Time (MinT) is a large-scale synthetic muscle-activation dataset: 9+ hours of simulation data covering 227 subjects and 402 simulated muscle strands. Code & dataset available soon 💙

👉Review https://t.ly/108g6
👉Paper arxiv.org/pdf/2411.00128
👉Project davidschneider.ai/mint
👉Code github.com/simplexsigil/MusclesInTime
🧠 Single Neuron Reconstruction 🧠

👉SIAT unveils NeuroFly, a framework for large-scale single-neuron reconstruction. It formulates the task as a streamlined three-stage workflow: automatic segmentation, connection, and manual proofreading. Bridging computer vision and neuroscience 💙

👉Review https://t.ly/Y5Xu0
👉Paper https://arxiv.org/pdf/2411.04715
👉Repo github.com/beanli161514/neurofly
โค4๐Ÿ”ฅ1๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🫠 X-Portrait 2: SOTA(?) Portrait Animation 🫠

👉ByteDance unveils a preview of X-Portrait 2, a new SOTA expression encoder that implicitly captures every minute expression from the input, trained on large-scale datasets. Impressive results, but no paper or code announced.

👉Review https://t.ly/8Owh9 [UPDATE]
👉Paper ?
👉Project byteaigc.github.io/X-Portrait2/
👉Repo ?
โ„๏ธDonโ€™t Look Twice: ViT by RLTโ„๏ธ

๐Ÿ‘‰CMU unveils RLT: speeding up the video transformers inspired by run-length encoding for data compression. Speed the training up and reducing the token count by up to 80%! Source Code announced ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/ccSwN
๐Ÿ‘‰Paper https://lnkd.in/d6VXur_q
๐Ÿ‘‰Project https://lnkd.in/d4tXwM5T
๐Ÿ‘‰Repo TBA
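For intuition, the run-length idea can be sketched as: drop any patch token that is (nearly) identical to the token at the same spatial position in the previous frame, keeping positions so the transformer can still embed them. A minimal numpy sketch, assuming pre-extracted patch tokens; the function name, threshold, and token layout are illustrative, not the paper's implementation:

```python
import numpy as np

def run_length_tokenize(video, threshold=0.1):
    """Drop temporally static patch tokens, RLE-style.

    video: (T, N, D) array of T frames x N patch tokens x D dims.
    A token at time t is kept only if it differs from the token at
    the same spatial position in the previous frame by more than
    `threshold` (mean absolute difference). Returns the kept tokens
    and their (t, n) positions; the real method also records run
    lengths so kept tokens know how long they persist.
    """
    T, N, D = video.shape
    kept_tokens, kept_pos = [], []
    for n in range(N):
        for t in range(T):
            if t == 0:
                keep = True  # always keep the first frame's tokens
            else:
                diff = np.abs(video[t, n] - video[t - 1, n]).mean()
                keep = diff > threshold
            if keep:
                kept_tokens.append(video[t, n])
                kept_pos.append((t, n))
    return np.stack(kept_tokens), kept_pos

# A toy clip whose second half is frozen: static tokens get pruned.
rng = np.random.default_rng(0)
frame = rng.normal(size=(4, 8))            # N=4 patches, D=8 dims
moving = rng.normal(size=(4, 4, 8))        # 4 frames of motion
frozen = np.repeat(frame[None], 4, 0)      # 4 identical frames
video = np.concatenate([moving, frozen])   # T=8

tokens, pos = run_length_tokenize(video, threshold=0.1)
```

On this toy clip, 12 of the 32 tokens (the repeated frozen frames) are pruned before ever reaching attention, which is where the training and inference savings come from.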
๐Ÿ”SeedEdit: foundational T2I๐Ÿ”

๐Ÿ‘‰ByteDance unveils a novel T2I foundational model capable of delivering stable, high-aesthetic image edits which maintain image quality through unlimited rounds of editing instructions. No code announced but a Demo is online๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/hPlnN
๐Ÿ‘‰Paper https://arxiv.org/pdf/2411.06686
๐Ÿ‘‰Project team.doubao.com/en/special/seededit
๐Ÿค—Demo https://huggingface.co/spaces/ByteDance/SeedEdit-APP
🔥 4-Nanosecond Inference 🔥

👉LogicTreeNet: convolutional differentiable logic gate networks with logic-gate tree kernels, bringing computer vision into differentiable LGNs. Up to 61× smaller than SOTA, with inference in 4 nanoseconds!

👉Review https://t.ly/GflOW
👉Paper https://lnkd.in/dAZQr3dW
👉Full clip https://lnkd.in/dvDJ3j-u
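For intuition: a differentiable logic gate relaxes boolean ops to real values in [0, 1] and learns a softmax mixture over candidate ops; at inference the gate hardens to its argmax op and runs as a pure boolean circuit, which is what makes nanosecond-scale inference possible. A toy numpy sketch, assuming a four-op gate (the actual networks learn over all 16 two-input boolean ops inside tree-shaped convolutional kernels):

```python
import numpy as np

def soft_gates(a, b):
    """Real-valued relaxations of binary logic ops (inputs in [0,1])."""
    return np.stack([
        a * b,              # AND
        a + b - a * b,      # OR
        a + b - 2 * a * b,  # XOR
        1.0 - a * b,        # NAND
    ])

def diff_logic_gate(a, b, logits):
    """A differentiable gate: softmax mixture over candidate ops.

    During training the mixture weights (logits) are learned by
    gradient descent; at inference the gate is hardened to the
    single argmax op, so it runs as a plain boolean gate.
    """
    w = np.exp(logits - logits.max())
    w = w / w.sum()
    return (w[:, None] * soft_gates(a, b)).sum(0)

a = np.array([0.0, 0.0, 1.0, 1.0])  # all four input combinations
b = np.array([0.0, 1.0, 0.0, 1.0])

# A gate whose logits strongly prefer XOR behaves like hard XOR.
logits = np.array([-10.0, -10.0, 10.0, -10.0])
out = diff_logic_gate(a, b, logits)
```

Because every op in `soft_gates` is differentiable in `a` and `b`, gradients flow through whole trees of such gates during training, yet the deployed network is just wired logic.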
๐Ÿ›ฅ๏ธ Global Tracklet Association MOT ๐Ÿ›ฅ๏ธ

๐Ÿ‘‰A novel universal, model-agnostic method designed to refine and enhance tracklet association for single-camera MOT. Suitable for datasets such as SportsMOT, SoccerNet & similar. Source code released๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/gk-yh
๐Ÿ‘‰Paper https://lnkd.in/dvXQVKFw
๐Ÿ‘‰Repo https://lnkd.in/dEJqiyWs
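The model-agnostic flavor of tracklet association can be illustrated with a greedy merge: fuse tracklets whose time spans are disjoint (one object cannot appear twice at once) and whose appearance embeddings are similar. A hypothetical sketch of that post-processing idea, not the paper's algorithm:

```python
import numpy as np

def associate_tracklets(tracklets, sim_thresh=0.8):
    """Greedy, detector-agnostic tracklet merging via union-find.

    tracklets: list of dicts with 'emb' (unit-norm mean appearance
    embedding) and 'span' = (start_frame, end_frame). Candidate
    pairs must have disjoint spans and cosine similarity above
    sim_thresh; best matches are merged first. Returns ID groups.
    """
    n = len(tracklets)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            si, sj = tracklets[i]['span'], tracklets[j]['span']
            disjoint = si[1] < sj[0] or sj[1] < si[0]
            sim = float(tracklets[i]['emb'] @ tracklets[j]['emb'])
            if disjoint and sim > sim_thresh:
                pairs.append((sim, i, j))
    for _, i, j in sorted(pairs, reverse=True):  # best matches first
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

unit = lambda v: np.array(v, float) / np.linalg.norm(v)
tracks = [
    {'emb': unit([1, 0.1]), 'span': (0, 50)},    # player exits view...
    {'emb': unit([1, 0.0]), 'span': (60, 120)},  # ...and re-appears
    {'emb': unit([0, 1.0]), 'span': (0, 120)},   # a different player
]
groups = associate_tracklets(tracks)  # [[0, 1], [2]]
```

Because the merge only consumes tracklet outputs (boxes, spans, embeddings), it can sit on top of any single-camera tracker, which is the sense in which such refinement is "model-agnostic".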
๐Ÿ‘10๐Ÿ”ฅ4โค2
This media is not supported in your browser
VIEW IN TELEGRAM
🧶 MagicQuill: super-easy Diffusion Editing 🧶

👉MagicQuill is a novel system for smart image editing: a robust UI/UX (e.g., inserting/erasing objects, editing colors, etc.) backed by a multimodal LLM that anticipates user intentions in real time. Code & demos released 💙

👉Review https://t.ly/hJyLa
👉Paper https://arxiv.org/pdf/2411.09703
👉Project https://magicquill.art/demo/
👉Repo https://github.com/magic-quill/magicquill
👉Demo https://huggingface.co/spaces/AI4Editing/MagicQuill
🧰 EchoMimicV2: Semi-Body Human Animation 🧰

👉Alipay (Ant Group) unveils EchoMimicV2, novel SOTA half-body human animation via APD-Harmonization. See the clip with audio (ZH/ENG). Code & demo announced 💙

👉Review https://t.ly/enLxJ
👉Paper arxiv.org/pdf/2411.10061
👉Project antgroup.github.io/ai/echomimic_v2/
👉Repo-v2 github.com/antgroup/echomimic_v2
👉Repo-v1 https://github.com/antgroup/echomimic
โค5๐Ÿ”ฅ5๐Ÿ‘2
This media is not supported in your browser
VIEW IN TELEGRAM
โš”๏ธSAMurai: SAM for Trackingโš”๏ธ

๐Ÿ‘‰UWA unveils SAMURAI, an enhanced adaptation of SAM 2 specifically designed for visual object tracking. New SOTA! Code under Apache 2.0๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/yGU0P
๐Ÿ‘‰Paper https://arxiv.org/pdf/2411.11922
๐Ÿ‘‰Repo https://github.com/yangchris11/samurai
๐Ÿ‘‰Project https://yangchris11.github.io/samurai/
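SAMURAI's key addition over vanilla SAM 2 is motion-aware tracking. The motion side can be illustrated with a SORT-style constant-velocity Kalman filter that scores candidate boxes by closeness to the predicted motion instead of blindly trusting the highest-score mask. A minimal sketch; the state layout, noise parameters, and candidate scoring are illustrative, not the paper's exact formulation:

```python
import numpy as np

class BoxKalman:
    """Constant-velocity Kalman filter over a box (cx, cy, w, h).

    State: [cx, cy, w, h, vx, vy]. Predict each frame, update with
    the matched observation; the prediction gives a motion prior
    for choosing among competing candidates (e.g. mask proposals).
    """
    def __init__(self, box, q=1e-2, r=1e-1):
        self.x = np.array([*box, 0.0, 0.0])
        self.P = np.eye(6)
        self.F = np.eye(6)
        self.F[0, 4] = 1.0   # cx += vx
        self.F[1, 5] = 1.0   # cy += vy
        self.H = np.eye(4, 6)
        self.Q = q * np.eye(6)
        self.R = r * np.eye(4)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:4]

    def update(self, box):
        y = np.asarray(box, float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P

def pick_candidate(pred_box, candidates):
    """Choose the candidate box nearest the motion prediction."""
    d = [np.linalg.norm(np.asarray(c, float) - pred_box) for c in candidates]
    return int(np.argmin(d))

# Track an object drifting right at 2 px/frame, then disambiguate
# between a distractor at cx=5 and the true object near cx=22.
kf = BoxKalman([10, 20, 8, 8])
for t in range(1, 6):
    kf.predict()
    kf.update([10 + 2 * t, 20, 8, 8])
pred = kf.predict()
idx = pick_candidate(pred, [[5, 20, 8, 8], [22, 20, 8, 8]])
```

The motion prior is what keeps a tracker from latching onto a similar-looking distractor during occlusion or crowding, which is exactly the failure mode tracking-oriented SAM adaptations target.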
🦖 DINO-X: Unified Object-Centric LVM 🦖

👉A unified vision model for open-world detection, segmentation, phrase grounding, visual counting, pose estimation, prompt-free detection/recognition, dense captioning, & more. Demo & API announced 💙

👉Review https://t.ly/CSQon
👉Paper https://lnkd.in/dc44ZM8v
👉Project https://lnkd.in/dehKJVvC
👉Repo https://lnkd.in/df8Kb6iz
🌎 All Languages Matter: LMMs vs. 100 Lang. 🌎

👉ALM-Bench aims to assess the next generation of massively multilingual multimodal models in a standardized way, pushing the boundaries of LMMs towards better cultural understanding and inclusivity. Code & dataset 💙

👉Review https://t.ly/VsoJB
👉Paper https://lnkd.in/ddVVZfi2
👉Project https://lnkd.in/dpssaeRq
👉Code https://lnkd.in/dnbaJJE4
👉Dataset https://lnkd.in/drw-_95v
โค3๐Ÿ‘1๐Ÿ‘1๐Ÿคฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
🦙 EdgeCape: SOTA Agnostic Pose 🦙

👉EdgeCape sets a new SOTA in Category-Agnostic Pose Estimation (CAPE): locating keypoints across diverse object categories using only one or a few annotated support images. Source code released 💙

👉Review https://t.ly/4TpAs
👉Paper https://arxiv.org/pdf/2411.16665
👉Project https://orhir.github.io/edge_cape/
👉Code https://github.com/orhir/EdgeCape
🛟 StableAnimator: ID-Aware Humans 🛟

👉StableAnimator: the first end-to-end ID-preserving diffusion framework for HQ human animation videos without any post-processing. Input: a single image + a sequence of poses. Insane results!

👉Review https://t.ly/JDtL3
👉Paper https://arxiv.org/pdf/2411.17697
👉Project francis-rings.github.io/StableAnimator/
👉Code github.com/Francis-Rings/StableAnimator
๐Ÿ‘12โค3๐Ÿคฏ2๐Ÿ”ฅ1
This media is not supported in your browser
VIEW IN TELEGRAM
🧶 SOTA Track-by-Propagation 🧶

👉SambaMOTR is a novel end-to-end model (based on Samba) that exploits long-range dependencies and interactions between tracklets to handle complex motion patterns and occlusions. Code in Jan. '25 💙

👉Review https://t.ly/QSQ8L
👉Paper arxiv.org/pdf/2410.01806
👉Project sambamotr.github.io/
👉Repo https://lnkd.in/dRDX6nk2
โค5๐Ÿ”ฅ2๐Ÿคฏ1
This media is not supported in your browser
VIEW IN TELEGRAM
👺 HiFiVFS: Extreme Face Swapping 👺

👉HiFiVFS: HQ face-swapping videos even in extremely challenging scenarios (occlusion, makeup, lighting, extreme poses, etc.). Impressive results, but no code announced 😢

👉Review https://t.ly/ea8dU
👉Paper https://arxiv.org/pdf/2411.18293
👉Project https://cxcx1996.github.io/HiFiVFS
🔥 Video Depth without Video Models 🔥

👉RollingDepth: turning a single-image latent diffusion model (LDM) into the new SOTA video depth estimator. It works better than dedicated video-depth models 🤯 Code under Apache 💙

👉Review https://t.ly/R4LqS
👉Paper https://arxiv.org/pdf/2411.19189
👉Project https://rollingdepth.github.io/
👉Repo https://github.com/prs-eth/rollingdepth
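The stitching step behind this kind of snippet-based video depth can be illustrated as follows: a single-image depth model produces depth that is only defined up to an affine (scale + shift) transform, so per-snippet predictions are co-aligned by least squares on their overlapping frames. A toy pairwise sketch under that assumption (RollingDepth itself uses dilated snippets and a global robust alignment, not this simple chain):

```python
import numpy as np

def align_snippets(snippets, overlap):
    """Stitch affine-invariant per-snippet depth into one track.

    snippets: list of snippets, each a list of 2-D depth frames.
    The first snippet fixes the global frame; for each following
    snippet we solve least-squares scale s and shift t so that its
    first `overlap` frames agree with the frames already stitched,
    then append the remainder transformed by (s, t).
    """
    out = [f.copy() for f in snippets[0]]
    for snip in snippets[1:]:
        a = np.concatenate([f.ravel() for f in out[-overlap:]])
        b = np.concatenate([f.ravel() for f in snip[:overlap]])
        A = np.stack([b, np.ones_like(b)], 1)   # solve s*b + t ≈ a
        (s, t), *_ = np.linalg.lstsq(A, a, rcond=None)
        out.extend(s * f + t for f in snip[overlap:])
    return out

# Toy scene: 5 ground-truth depth frames; the second snippet comes
# back in a different affine frame (scale 2, shift 3) and must be
# re-aligned using the single shared frame.
gt = [np.arange(4.0).reshape(2, 2) + i for i in range(5)]
snip1 = [f.copy() for f in gt[:3]]
snip2 = [2.0 * f + 3.0 for f in gt[2:]]
aligned = align_snippets([snip1, snip2], overlap=1)
```

Here the one overlapping frame is enough to recover the (0.5, -1.5) correction exactly; with noisy real predictions, longer overlaps and robust fitting take the place of this exact solve.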