AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🔥 YOLOv10 is out 🔥

👉YOLOv10: novel real-time end-to-end object detection. Code released under GNU AGPL v3.0💙

👉Review https://shorturl.at/ZIHBh
👉Paper arxiv.org/pdf/2405.14458
👉Code https://github.com/THU-MIG/yolov10/
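
A minimal inference sketch, assuming the released repo keeps the ultralytics-style Python API it is forked from (the `yolov10n.pt` checkpoint name is an assumption; check the repo's model zoo):

```python
# YOLOv10 inference sketch; assumes the THU-MIG repo's ultralytics-style API.
# Install: pip install git+https://github.com/THU-MIG/yolov10.git
from ultralytics import YOLOv10

model = YOLOv10("yolov10n.pt")            # assumed checkpoint name (nano variant)
results = model("street.jpg", conf=0.25)  # end-to-end: no NMS post-processing step
for r in results:
    print(r.boxes.xyxy, r.boxes.cls)      # box coordinates and class ids
```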
โ›ˆ๏ธUnsupervised Neuromorphic Motionโ›ˆ๏ธ

๐Ÿ‘‰The Western Sydney University unveils a novel unsupervised event-based motion segmentation algorithm, employing the #Prophesee Gen4 HD event camera.

๐Ÿ‘‰Review https://t.ly/UZzIZ
๐Ÿ‘‰Paper arxiv.org/pdf/2405.15209
๐Ÿ‘‰Project samiarja.github.io/evairborne
๐Ÿ‘‰Repo (empty) github.com/samiarja/ev/_deep/_motion_segmentation
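
Not the paper's algorithm, just the general flavor of unsupervised event clustering: events generated by the same moving object are close in space-time, so a density-based clustering pass can separate them. A minimal sketch (the eps/min_samples values are arbitrary):

```python
# Crude unsupervised stand-in: cluster event-camera events (x, y, t) with
# DBSCAN so each moving object forms its own space-time cluster.
import numpy as np
from sklearn.cluster import DBSCAN

def segment_events(events: np.ndarray, time_scale: float = 1e-3) -> np.ndarray:
    """events: (N, 4) array of [x, y, t_us, polarity] -> one cluster id per event."""
    pts = np.stack([events[:, 0], events[:, 1], events[:, 2] * time_scale], axis=1)
    return DBSCAN(eps=5.0, min_samples=20).fit_predict(pts)  # -1 marks noise events
```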
๐Ÿ‘5๐Ÿ”ฅ1๐Ÿฅฐ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🦓 Zero-Shot Diffusive Segmentation 🦓

👉KAUST (+MPI) announces the first zero-shot approach for Video Semantic Segmentation (VSS) based on pre-trained diffusion models. Source Code released under MIT💙

👉Review https://t.ly/v_64K
👉Paper arxiv.org/pdf/2405.16947
👉Project https://lnkd.in/dcSt4dQx
👉Code https://lnkd.in/dcZfM8F3
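
The zero-shot recipe in miniature: treat activations of a pre-trained diffusion model as dense pixel descriptors and cluster them jointly across frames, so shared cluster ids act as temporally consistent pseudo-semantic labels. A hedged sketch that assumes the feature tensor is already extracted; the paper's actual correspondence machinery is more elaborate:

```python
# Cluster per-pixel diffusion features over a whole clip; fitting one KMeans
# codebook across all frames is what keeps labels consistent over time.
import numpy as np
from sklearn.cluster import KMeans

def cluster_video_features(feats: np.ndarray, k: int = 6) -> np.ndarray:
    """feats: (T, H, W, C) precomputed diffusion features -> (T, H, W) labels."""
    t, h, w, c = feats.shape
    labels = KMeans(n_clusters=k, n_init=4).fit_predict(feats.reshape(-1, c))
    return labels.reshape(t, h, w)
```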
🪰 Dynamic Gaussian Fusion via 4D Motion Scaffolds 🪰

👉MoSca: a novel 4D Motion Scaffold representation to reconstruct and synthesize novel views of dynamic scenes from monocular videos in the wild!

👉Review https://t.ly/nSdEL
👉Paper arxiv.org/pdf/2405.17421
👉Code github.com/JiahuiLei/MoSca
👉Project https://lnkd.in/dkjMVcqZ
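
Background geometry for pipelines like this: before any scaffold optimization, tracked 2D points are lifted to 3D using monocular depth and camera intrinsics. A sketch of that lifting step only, not the paper's method:

```python
# Lift tracked 2D pixels to 3D camera-space points: X = d * K^-1 [u, v, 1]^T.
import numpy as np

def lift_tracks(uv: np.ndarray, depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """uv: (N, 2) pixel coords; depth: (N,) metric depths; K: (3, 3) intrinsics."""
    ones = np.ones((uv.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.concatenate([uv, ones], axis=1).T).T  # z=1 rays
    return rays * depth[:, None]  # (N, 3) points in camera coordinates
```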
🧤Transformer-based 4D Hands🧤

👉4DHands is a novel and robust approach to recovering interactive hand meshes and their relative movement from monocular inputs. Authors: Beijing NU, Tsinghua & Lenovo. No code announced 😢

👉Review https://t.ly/wvG-l
👉Paper arxiv.org/pdf/2405.20330
👉Project 4dhands.github.io/
🎭New 2D Landmarks SOTA🎭

👉Flawless AI unveils FaceLift, a novel semi-supervised approach that learns 3D landmarks by directly lifting (visible) hand-labeled 2D landmarks and ensures better definition alignment, with no need for 3D landmark datasets. No code announced🥹

👉Review https://t.ly/lew9a
👉Paper arxiv.org/pdf/2405.19646
👉Project davidcferman.github.io/FaceLift
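
The semi-supervised lifting idea boils down to one loss: predict 3D landmarks, project them back to 2D, and supervise only where hand-labeled 2D points are visible. A minimal PyTorch sketch under a weak-perspective camera assumption, not the paper's exact formulation:

```python
# Reprojection loss for 2D->3D landmark lifting with a visibility mask.
import torch

def lift_loss(pred_xyz, cam_scale, cam_trans, gt_uv, visible):
    """pred_xyz: (B, L, 3) predicted 3D landmarks; cam_scale: (B, 1);
    cam_trans: (B, 2); gt_uv: (B, L, 2) labeled 2D points; visible: (B, L) float 0/1."""
    proj_uv = cam_scale[..., None] * pred_xyz[..., :2] + cam_trans[:, None, :]
    err = ((proj_uv - gt_uv) ** 2).sum(-1)            # per-landmark reprojection error
    return (err * visible).sum() / visible.sum().clamp(min=1.0)
```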
๐Ÿณ MultiPly: in-the-wild Multi-People ๐Ÿณ

๐Ÿ‘‰MultiPly: novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. It's the new SOTA over the publicly available datasets and in-the-wild videos. Source Code announced, coming๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/_xjk_
๐Ÿ‘‰Paper arxiv.org/pdf/2406.01595
๐Ÿ‘‰Project eth-ait.github.io/MultiPly
๐Ÿ‘‰Repo github.com/eth-ait/MultiPly
👹AI and the Everything in the Whole Wide World Benchmark👹

👉Last week Yann LeCun said something like "LLMs will not reach human intelligence". It's clear that the ongoing #deeplearning paradigm is not ready for "general AI"; a "radical alternative" is necessary to create a “superintelligence”.

👉Review https://t.ly/isdxM
👉News https://lnkd.in/dFraieZS
👉Paper https://lnkd.in/da-7PnVT
โค5๐Ÿ‘2๐Ÿ‘1๐Ÿ’ฉ1
This media is not supported in your browser
VIEW IN TELEGRAM
📞FacET: Video Calls Change Your Expression📞

👉Columbia University unveils FacET: discovering behavioral differences between conversing face-to-face (F2F) and on video calls (VCs).

👉Review https://t.ly/qsQmt
👉Paper arxiv.org/pdf/2406.00955
👉Project facet.cs.columbia.edu/
👉Repo (empty) github.com/stellargo/facet
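
The study's style of analysis, sketched with synthetic numbers: compare the distribution of a facial action-unit (AU) intensity between the two conversation conditions. The real pipeline measures AUs from video; random arrays stand in here:

```python
# Compare an AU intensity between face-to-face and video-call recordings with
# Welch's t-test; the data below is synthetic placeholder input.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
f2f_au12 = rng.normal(0.50, 0.1, 500)  # placeholder per-frame AU12 (smile), F2F
vc_au12 = rng.normal(0.55, 0.1, 500)   # placeholder per-frame AU12, video call
t, p = stats.ttest_ind(f2f_au12, vc_au12, equal_var=False)
print(f"AU12 shift: t={t:.2f}, p={p:.3g}")
```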
🚙 UA-Track: Uncertainty-Aware MOT🚙

👉UA-Track: a novel Uncertainty-Aware 3D MOT framework that tackles the uncertainty problem from multiple aspects. Code announced, not released yet.

👉Review https://t.ly/RmVSV
👉Paper https://arxiv.org/pdf/2406.02147
👉Project https://liautoad.github.io/ua-track-website
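
A generic flavor of uncertainty-aware tracking, not the paper's design: gate detection-to-track matches by Mahalanobis distance, so a track with high state covariance tolerates larger position error before a match is rejected:

```python
# Chi-square gating on the Mahalanobis distance between a track's predicted
# 3D position (with covariance) and a new detection.
import numpy as np

def mahalanobis_gate(track_mean, track_cov, det_pos, thresh=7.815):
    """track_mean/det_pos: (3,) positions; track_cov: (3, 3) covariance.
    thresh is the 95% chi-square quantile for 3 degrees of freedom."""
    d = det_pos - track_mean
    m2 = d @ np.linalg.inv(track_cov) @ d  # squared Mahalanobis distance
    return m2 < thresh
```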
๐Ÿ‘8โค1๐Ÿ”ฅ1๐Ÿฅฐ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🧊 Universal 6D Pose/Tracking 🧊

👉Omni6DPose: a novel dataset for 6D Object Pose with 1.5M+ annotations. Extra: GenPose++, the new SOTA in category-level 6D estimation/tracking thanks to two pivotal improvements.

👉Review https://t.ly/Ywgl1
👉Paper arxiv.org/pdf/2406.04316
👉Project https://lnkd.in/dHBvenhX
👉Lib https://lnkd.in/d8Yc-KFh
โค12๐Ÿ‘4๐Ÿคฉ2๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
👗 SOTA Multi-Garment VTOn Editing 👗

👉#Google (+UWA) unveils M&M VTO, a novel mix 'n' match virtual try-on model that takes as input multiple garment images, a text description of the garment layout, and an image of a person. It's the new SOTA both qualitatively and quantitatively. Impressive results!

👉Review https://t.ly/66mLN
👉Paper arxiv.org/pdf/2406.04542
👉Project https://mmvto.github.io
๐Ÿ‘4โค3๐Ÿฅฐ3๐Ÿ”ฅ1๐Ÿคฏ1๐Ÿ˜ฑ1
This media is not supported in your browser
VIEW IN TELEGRAM
👑 Kling AI vs. OpenAI Sora 👑

👉Kling: the ultimate Chinese text-to-video model and a rival to #OpenAI’s Sora. No paper or tech info to check, but stunning results on the official site.

👉Review https://t.ly/870DQ
👉Paper ???
👉Project https://kling.kuaishou.com/
๐Ÿ‰ MASA: MOT Anything By SAM ๐Ÿ‰

๐Ÿ‘‰MASA: Matching Anything by Segmenting Anything pipeline to learn object-level associations from unlabeled images of any domain. An universal instance appearance model for matching any objects in any domain. Source code in June ๐Ÿ’™

๐Ÿ‘‰Review https://t.ly/pKdEV
๐Ÿ‘‰Paper https://lnkd.in/dnjuT7xm
๐Ÿ‘‰Project https://lnkd.in/dYbWzG4E
๐Ÿ‘‰Code https://lnkd.in/dr5BJCXm
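
The self-supervised trick in miniature: two augmentations of the same image contain the same instances, so segment both with SAM, pair up the masks, and pull matched embeddings together with InfoNCE. A sketch of that contrastive loss only, not the released pipeline:

```python
# InfoNCE over instance embeddings from two views; row i of each tensor is the
# same physical instance, so the diagonal of the similarity matrix is positive.
import torch
import torch.nn.functional as F

def association_loss(emb_a: torch.Tensor, emb_b: torch.Tensor, tau: float = 0.07):
    """emb_a, emb_b: (N, D) embeddings of the same N instances in two views."""
    emb_a, emb_b = F.normalize(emb_a, dim=-1), F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.T / tau                              # (N, N) similarities
    targets = torch.arange(emb_a.size(0), device=emb_a.device)  # i matches i
    return F.cross_entropy(logits, targets)
```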
🎹 PianoMotion10M for gen-hands 🎹

👉PianoMotion10M: 116 hours of piano-playing videos from a bird’s-eye view with 10M+ annotated hand poses. A big contribution to hand motion generation. Code & Dataset released💙

👉Review https://t.ly/_pKKz
👉Paper arxiv.org/pdf/2406.09326
👉Code https://lnkd.in/dcBP6nvm
👉Project https://lnkd.in/d_YqZk8x
👉Dataset https://lnkd.in/dUPyfNDA
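
A toy baseline for the benchmark task (audio features in, hand pose sequence out); the 128-d audio frames and 48-d pose output below are assumptions, check the dataset's actual parameterization:

```python
# Sequence-to-sequence baseline: GRU encoder over audio frames, linear head
# regressing a hand pose vector per timestep.
import torch.nn as nn

class AudioToHands(nn.Module):
    def __init__(self, audio_dim=128, pose_dim=48, width=256):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, width, num_layers=2, batch_first=True)
        self.head = nn.Linear(width, pose_dim)

    def forward(self, audio):          # audio: (B, T, audio_dim)
        h, _ = self.encoder(audio)     # (B, T, width)
        return self.head(h)            # (B, T, pose_dim) hand pose sequence
```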
โค8๐Ÿ”ฅ4โšก1๐Ÿฅฐ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
📫MeshPose: DensePose+HMR📫

👉MeshPose: a novel approach to jointly tackle DensePose and Human Mesh Reconstruction in a single framework. A natural fit for #AR applications requiring real-time mobile inference.

👉Review https://t.ly/a-5uN
👉Paper arxiv.org/pdf/2406.10180
👉Project https://meshpose.github.io/
🌵 RobustSAM for Degraded Images 🌵

👉RobustSAM: the evolution of SAM for degraded images, enhancing SAM’s performance on low-quality images while preserving promptability & zero-shot generalization. Dataset & Code released💙

👉Review https://t.ly/mnyyG
👉Paper arxiv.org/pdf/2406.09627
👉Project robustsam.github.io
👉Code github.com/robustsam/RobustSAM
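
RobustSAM is SAM-based, so a SAM-style predictor interface is sketched below using the original segment-anything API; the RobustSAM model builder and checkpoint name are assumptions, check the released README for the real entry points:

```python
# Point-prompted mask prediction on a degraded image, assuming RobustSAM loads
# as a drop-in SAM variant (builder key and checkpoint name are assumptions).
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="robustsam_vit_h.pth")  # assumed ckpt
predictor = SamPredictor(sam)
image = cv2.cvtColor(cv2.imread("lowlight_scene.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # degraded (low-light / foggy / blurred) input
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),  # one foreground click
    point_labels=np.array([1]),
)
```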
โค5๐Ÿ‘1๐Ÿ”ฅ1๐Ÿ‘1
This media is not supported in your browser
VIEW IN TELEGRAM
🧤HOT3D Hand/Object Tracking🧤

👉#Meta releases a novel egocentric dataset for 3D hand & object tracking. A new benchmark for vision-based understanding of 3D hand-object interactions. Dataset available 💙

👉Review https://t.ly/cD76F
👉Paper https://lnkd.in/e6_7UNny
👉Data https://lnkd.in/e6P-sQFK
💦 Self-driving in wet conditions 💦

👉BMW SemanticSpray: a novel dataset containing scenes in wet surface conditions captured by camera, LiDAR and radar. Camera: 2D Boxes | LiDAR: 3D Boxes, Semantic Labels | Radar: Semantic Labels.

👉Review https://t.ly/8S93j
👉Paper https://lnkd.in/dnN5MCZC
👉Project https://lnkd.in/dkUaxyEF
👉Data https://lnkd.in/ddhkyXv8
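
A minimal sketch of consuming per-point LiDAR semantic labels in a KITTI-style layout; the file names, dtypes, and spray label id below are placeholders, not the dataset's documented format:

```python
# Load a LiDAR scan plus per-point semantic labels and count spray points.
import numpy as np

points = np.fromfile("000000.bin", dtype=np.float32).reshape(-1, 4)  # x, y, z, intensity
labels = np.fromfile("000000.label", dtype=np.uint32)                # one id per point
spray_mask = labels == 2  # placeholder id for the spray/noise class
print(f"{spray_mask.sum()} of {len(points)} points labeled as spray")
```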
🌱 TokenHMR: new 3D human pose SOTA 🌱

👉TokenHMR is a novel HPS (human pose and shape) method that balances 2D keypoint evidence against 3D pose accuracy, leveraging Internet data without known camera parameters. It's the new SOTA by a large margin.

👉Review https://t.ly/K9_8n
👉Paper arxiv.org/pdf/2404.16752
👉Project tokenhmr.is.tue.mpg.de/
👉Code github.com/saidwivedi/TokenHMR
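
The tokenization idea in miniature: represent pose as a short sequence of discrete tokens and decode them back to continuous joint rotations, which biases regression toward valid poses. A toy sketch, not the released model (vocabulary size, token count, and the 6D-rotation output are assumptions):

```python
# Decode discrete pose tokens into continuous SMPL-style joint rotations.
import torch
import torch.nn as nn

class TokenPoseDecoder(nn.Module):
    def __init__(self, vocab=2048, n_tokens=16, joint_dim=24 * 6):  # 6D rotations
        super().__init__()
        self.codebook = nn.Embedding(vocab, 128)
        self.decode = nn.Linear(n_tokens * 128, joint_dim)

    def forward(self, token_ids):                # (B, n_tokens) discrete tokens
        z = self.codebook(token_ids).flatten(1)  # look up codes and concatenate
        return self.decode(z)                    # (B, joint_dim) pose parameters
```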