AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
235 videos
11 files
1.26K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
👹 AI and the Everything in the Whole Wide World Benchmark 👹

👉Last week Yann LeCun said something along the lines of "LLMs will not reach human intelligence". It's clear that current #deeplearning is not ready for "general AI"; a "radical alternative" is necessary to create a "superintelligence".

👉Review https://t.ly/isdxM
👉News https://lnkd.in/dFraieZS
👉Paper https://lnkd.in/da-7PnVT
โค5๐Ÿ‘2๐Ÿ‘1๐Ÿ’ฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
📞 FacET: Video Calls Change Your Expression 📞

👉Columbia University unveils FacET: discovering behavioral differences between conversing face-to-face (F2F) and on video calls (VCs).

👉Review https://t.ly/qsQmt
👉Paper arxiv.org/pdf/2406.00955
👉Project facet.cs.columbia.edu/
👉Repo (empty) github.com/stellargo/facet
🚙 UA-Track: Uncertainty-Aware MOT 🚙

👉UA-Track: a novel Uncertainty-Aware 3D MOT framework that tackles the uncertainty problem from multiple aspects. Code announced but not yet released.

👉Review https://t.ly/RmVSV
👉Paper https://arxiv.org/pdf/2406.02147
👉Project https://liautoad.github.io/ua-track-website
๐Ÿ‘8โค1๐Ÿ”ฅ1๐Ÿฅฐ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🧊 Universal 6D Pose/Tracking 🧊

👉Omni6DPose is a novel dataset for 6D object pose estimation with 1.5M+ annotations. Extra: GenPose++, the new SOTA in category-level 6D estimation/tracking, thanks to two pivotal improvements.

👉Review https://t.ly/Ywgl1
👉Paper arxiv.org/pdf/2406.04316
👉Project https://lnkd.in/dHBvenhX
👉Lib https://lnkd.in/d8Yc-KFh
โค12๐Ÿ‘4๐Ÿคฉ2๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
👗 SOTA Multi-Garment VTON Editing 👗

👉#Google (+UWA) unveils M&M VTO, a novel mix-and-match virtual try-on method that takes as input multiple garment images, a text description of the garment layout, and an image of a person. It's the new SOTA both qualitatively and quantitatively. Impressive results!

👉Review https://t.ly/66mLN
👉Paper arxiv.org/pdf/2406.04542
👉Project https://mmvto.github.io
๐Ÿ‘4โค3๐Ÿฅฐ3๐Ÿ”ฅ1๐Ÿคฏ1๐Ÿ˜ฑ1
This media is not supported in your browser
VIEW IN TELEGRAM
👑 Kling AI vs. OpenAI Sora 👑

👉Kling: the ultimate Chinese text-to-video model and a rival to #OpenAI's Sora. No paper or technical details to check, but stunning results on the official site.

👉Review https://t.ly/870DQ
👉Paper ???
👉Project https://kling.kuaishou.com/
๐Ÿ‰ MASA: MOT Anything By SAM ๐Ÿ‰

๐Ÿ‘‰MASA: Matching Anything by Segmenting Anything pipeline to learn object-level associations from unlabeled images of any domain. An universal instance appearance model for matching any objects in any domain. Source code in June ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/pKdEV
๐Ÿ‘‰Paper https://lnkd.in/dnjuT7xm
๐Ÿ‘‰Project https://lnkd.in/dYbWzG4E
๐Ÿ‘‰Code https://lnkd.in/dr5BJCXm
🎹 PianoMotion10M for gen-hands 🎹

👉PianoMotion10M: 116 hours of piano-playing videos from a bird's-eye view with 10M+ annotated hand poses. A big contribution to hand motion generation. Code & dataset released 💙

👉Review https://t.ly/_pKKz
👉Paper arxiv.org/pdf/2406.09326
👉Code https://lnkd.in/dcBP6nvm
👉Project https://lnkd.in/d_YqZk8x
👉Dataset https://lnkd.in/dUPyfNDA
โค8๐Ÿ”ฅ4โšก1๐Ÿฅฐ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
📫 MeshPose: DensePose + HMR 📫

👉MeshPose: a novel approach that jointly tackles DensePose and Human Mesh Reconstruction. A natural fit for #AR applications requiring real-time mobile inference.

👉Review https://t.ly/a-5uN
👉Paper arxiv.org/pdf/2406.10180
👉Project https://meshpose.github.io/
🌵 RobustSAM for Degraded Images 🌵

👉RobustSAM, an evolution of SAM for degraded images: it enhances SAM's performance on low-quality pictures while preserving promptability & zero-shot generalization. Dataset & code released 💙 (prompt-usage sketch below)

👉Review https://t.ly/mnyyG
👉Paper arxiv.org/pdf/2406.09627
👉Project robustsam.github.io
👉Code github.com/robustsam/RobustSAM
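👉For context, a minimal point-prompt sketch using the original segment-anything package; RobustSAM is described as preserving SAM's promptability, so its released checkpoints are assumed to plug into a similar predictor-style interface (the checkpoint path, image path, and click coordinates below are placeholders).

```python
# SAM-style point-prompt segmentation (original segment-anything API).
# RobustSAM usage is assumed to be analogous -- check its repo for the exact entry points.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Placeholder degraded/low-light image, loaded as RGB.
image = cv2.cvtColor(cv2.imread("degraded_example.jpg"), cv2.COLOR_BGR2RGB)

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # swap in a RobustSAM checkpoint if compatible
predictor = SamPredictor(sam)
predictor.set_image(image)

# One positive click on the object of interest (placeholder coordinates).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # H x W boolean mask
```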
โค5๐Ÿ‘1๐Ÿ”ฅ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🧤 HOT3D Hand/Object Tracking 🧤

👉#Meta releases a novel egocentric dataset for 3D hand & object tracking: a new benchmark for vision-based understanding of 3D hand-object interactions. Dataset available 💙

👉Review https://t.ly/cD76F
👉Paper https://lnkd.in/e6_7UNny
👉Data https://lnkd.in/e6P-sQFK
💦 Self-driving in wet conditions 💦

👉BMW SemanticSpray: a novel dataset containing scenes in wet surface conditions captured by camera, LiDAR, and radar. Camera: 2D boxes | LiDAR: 3D boxes, semantic labels | Radar: semantic labels. (Loading sketch below.)

👉Review https://t.ly/8S93j
👉Paper https://lnkd.in/dnN5MCZC
👉Project https://lnkd.in/dkUaxyEF
👉Data https://lnkd.in/ddhkyXv8
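👉As a rough idea of how per-point semantic labels are typically consumed, a minimal numpy loading sketch follows; it assumes a KITTI/SemanticKITTI-style file layout (raw float32 .bin scans, uint32 .label files), not the documented SemanticSpray format, and the paths are placeholders.

```python
# Hypothetical loader sketch: a KITTI/SemanticKITTI-style layout is assumed,
# not the documented SemanticSpray format -- check the dataset docs before use.
import numpy as np

def load_lidar_scan(bin_path: str) -> np.ndarray:
    """Load an N x 4 point cloud (x, y, z, intensity) from a raw float32 .bin file."""
    return np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)

def load_point_labels(label_path: str) -> np.ndarray:
    """Load per-point labels stored as uint32; the lower 16 bits hold the class id."""
    labels = np.fromfile(label_path, dtype=np.uint32)
    return labels & 0xFFFF

points = load_lidar_scan("scene_0001/lidar/000000.bin")        # placeholder path
labels = load_point_labels("scene_0001/labels/000000.label")   # placeholder path
assert points.shape[0] == labels.shape[0]
```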
🌱 TokenHMR: new 3D human pose SOTA 🌱

👉TokenHMR is a new human pose and shape (HPS) method that mixes 2D keypoints and 3D pose accuracy, thus leveraging Internet data without known camera parameters. It's the new SOTA by a large margin.

👉Review https://t.ly/K9_8n
👉Paper arxiv.org/pdf/2404.16752
👉Project tokenhmr.is.tue.mpg.de/
👉Code github.com/saidwivedi/TokenHMR
🤓 Glasses Removal in Videos 🤓

👉Lightricks unveils a novel method that takes an input video of a person wearing glasses and removes the glasses while preserving identity. It works even with reflections, heavy makeup, and blinks. Code announced but not yet released.

👉Review https://t.ly/Hgs2d
👉Paper arxiv.org/pdf/2406.14510
👉Project https://v-lasik.github.io/
👉Code github.com/v-lasik/v-lasik-code
🧬 Event-driven Super-Resolution 🧬

👉USTC unveils EvTexture, the first video super-resolution (VSR) method that utilizes event signals for texture enhancement, leveraging the high-frequency details of events to better recover textures. Code available 💙

👉Review https://t.ly/zlb4c
👉Paper arxiv.org/pdf/2406.13457
👉Code github.com/DachunKai/EvTexture
๐Ÿ‘11โค6๐Ÿคฏ4๐Ÿ”ฅ2
This media is not supported in your browser
VIEW IN TELEGRAM
๐ŸปStableNormal: Stable/Sharp Normal๐Ÿป

๐Ÿ‘‰Alibaba unveils StableNormal, a novel method which tailors the diffusion priors for monocular normal estimation. Hugging Face demo is available๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/FPJlG
๐Ÿ‘‰Paper https://arxiv.org/pdf/2406.16864
๐Ÿ‘‰Demo https://huggingface.co/Stable-X
๐ŸฆGeometry Guided Depth๐Ÿฆ

๐Ÿ‘‰Depth and #3D reconstruction which can take as input, where available, previously-made estimates of the sceneโ€™s geometry

๐Ÿ‘‰Review https://lnkd.in/dMgakzWm
๐Ÿ‘‰Paper https://arxiv.org/pdf/2406.18387
๐Ÿ‘‰Repo (empty) https://github.com/nianticlabs/DoubleTake
๐Ÿ‘7๐Ÿ”ฅ7โค1๐Ÿฅฐ1
This media is not supported in your browser
VIEW IN TELEGRAM
🌮 MeshAnything with Transformers 🌮

👉MeshAnything converts any 3D representation into Artist-Created Meshes (AMs), i.e., meshes created by human artists. It can be combined with various 3D asset production pipelines, such as 3D reconstruction and generation, to transform their results into AMs that can be seamlessly applied in the 3D industry. Source code available 💙

👉Review https://t.ly/HvkD4
👉Paper arxiv.org/pdf/2406.10163
👉Code github.com/buaacyw/MeshAnything
🌾 LLaNA: NeRF-LLM assistant 🌾

👉UniBO unveils LLaNA, a novel multimodal LLM that understands and reasons about an input NeRF. It directly processes the NeRF weights and performs tasks such as captioning, Q&A, and zero-shot classification of NeRFs.

👉Review https://t.ly/JAfhV
👉Paper arxiv.org/pdf/2406.11840
👉Project andreamaduzzi.github.io/llana/
👉Code & Data coming
โค16๐Ÿ”ฅ2๐Ÿ‘2๐Ÿคฏ1
This media is not supported in your browser
VIEW IN TELEGRAM
🔥 Depth Anything V2 is out! 🔥

👉Depth Anything V2 outperforms V1 in robustness and fine-grained detail. Trained with 595K synthetically labeled and 62M+ real unlabeled images, it's the new SOTA in monocular depth estimation (MDE). Code & models available 💙 (usage sketch below)

👉Review https://t.ly/QX9Nu
👉Paper arxiv.org/pdf/2406.09414
👉Project depth-anything-v2.github.io/
👉Repo github.com/DepthAnything/Depth-Anything-V2
👉Data huggingface.co/datasets/depth-anything/DA-2K
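👉A minimal way to try it via the Hugging Face transformers depth-estimation pipeline; the model id below is an assumption, so check the repo / HF hub for the actually released checkpoints.

```python
# Monocular depth sketch via the transformers "depth-estimation" pipeline.
# The model id is assumed -- see the Depth Anything V2 repo / HF hub for released checkpoints.
from transformers import pipeline
from PIL import Image

pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

image = Image.open("example.jpg")   # any RGB image (placeholder path)
result = pipe(image)

depth_map = result["depth"]         # PIL image with relative depth
depth_map.save("example_depth.png")
```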