AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🐦 3D Pigeons Pose & Tracking 🐦

👉 3D-MuPPET: estimate and track 3D poses of pigeons from multiple views

😎Review https://t.ly/jfAJJ
😎Paper arxiv.org/pdf/2308.15316.pdf
😎Code github.com/alexhang212/3D-MuPPET/
RoboTAP: Dense Tracking for Few-Shot Imitation

👉RoboTAP: novel dense tracking representation for robotic arms

😎Review https://t.ly/MCO_V
😎Paper arxiv.org/pdf/2308.15975.pdf
😎Project https://robotap.github.io/
😎Code github.com/deepmind/tapnet
⛺FACET: Fairness in Computer Vision⛺

👉#META AI opens a large, publicly available dataset for classification, detection & segmentation, built to expose potential performance disparities & challenges across sensitive demographic attributes

😎Review https://t.ly/mKn-t
😎Paper arxiv.org/pdf/2309.00035.pdf
😎Dataset https://facet.metademolab.com/
♊️ Doppelgangers in Structures ♊️

👉A novel learning-based approach for visual disambiguation: distinguishing illusory matches to produce correct, disambiguated #3D reconstructions

😎Review https://t.ly/9yLot
😎Paper arxiv.org/pdf/2309.02420.pdf
😎Code github.com/RuojinCai/Doppelgangers
😎Project doppelgangers-3d.github.io/
Tracking Anything with Decoupled VOS

👉A novel VOS approach that extends SAM for open-world video segmentation with no user input required

😎Review https://t.ly/xeobR
😎Paper arxiv.org/pdf/2309.03903.pdf
😎Project hkchengrex.com/Tracking-Anything-with-DEVA
😎Code github.com/hkchengrex/Tracking-Anything-with-DEVA
😎Colab https://colab.research.google.com/drive/1OsyNVoV_7ETD1zIE8UWxL3NXxu12m_YZ
🪷 Diffusive Consistent Video Editing 🪷

👉 Weizmann Institute of Science unveils TokenFlow, a novel framework that leverages a text-to-image diffusion model for text-driven video editing

😎Review https://t.ly/ru8km
😎Paper arxiv.org/pdf/2307.10373.pdf
😎Project diffusion-tokenflow.github.io
😎Code github.com/omerbt/TokenFlow
🔥🔥 #META's DINOv2 is now commercial! 🔥🔥

👉Universal features for image classification, instance retrieval, video understanding, depth & semantic segmentation. Now suitable for commercial use.

😎Review https://t.ly/LNrGy
😎Paper arxiv.org/pdf/2304.07193.pdf
😎Code github.com/facebookresearch/dinov2
😎Demo dinov2.metademolab.com/
🧄FreeMan: towards #3D Humans 🧄

👉FreeMan: the first large-scale, real-world, multi-view dataset for #3D human pose estimation. 11M frames!

😎Review https://t.ly/ICxpA
😎Paper arxiv.org/pdf/2309.05073.pdf
😎Project wangjiongw.github.io/freeman
🦊 MagiCapture: HD Multi-Concept Portrait 🦊

👉KAIST unveils MagiCapture: integrating subject and style concepts to generate high-resolution portrait images using just a few subject and style references

😎Review https://t.ly/c9rOo
😎Paper https://arxiv.org/pdf/2309.06895.pdf
⚽ Dynamic NeRFs for Soccer ⚽

👉SoccerNeRF: first attempt at applying "cheap" NeRF to football, reconstructing soccer replays in space and time.

😎Review https://t.ly/Ywcvk
😎Paper arxiv.org/pdf/2309.06802.pdf
😎Project https://soccernerfs.isach.be/
😎Code github.com/iSach/SoccerNeRFs
☢️ GlueStick: Graph Neural Matching ☢️

👉GlueStick is a joint deep matcher for points and lines that leverages the connectivity information between nodes to better glue them together

😎Review https://t.ly/Atxqo
😎Paper arxiv.org/pdf/2304.02008.pdf
😎Code https://github.com/cvg/GlueStick
🫀CPR-Coach: Neural Cardiopulmonary Resuscitation🫀

👉CPR-Coach: fine-grained action recognition in cardiopulmonary resuscitation

😎Review https://t.ly/Qbg4K
😎Paper arxiv.org/pdf/2309.11718.pdf
😎Code github.com/Shunli-Wang/CPR-Coach
😎Project shunli-wang.github.io/CPR-Coach
🧪 NeuralLabeling with NeRF 🧪

👉Annotating a scene by generating segmentation masks, affordance maps, 2D bounding boxes, 3D BB, 6DOF poses, depth & meshes.

😎Review https://t.ly/1GPsj
😎Paper arxiv.org/pdf/2309.11966.pdf
😎Code github.com/FlorisE/neural-labeling
😎Project florise.github.io/neural_labeling_web
DE-ViT: detecting everything via DINOv2

👉DE-ViT: open-set object detector based on a DINOv2 backbone. It's the new SOTA on the COCO & LVIS datasets

😎Review https://t.ly/_DAmt
😎Paper arxiv.org/pdf/2309.12969.pdf
😎Code https://github.com/mlzxy/devit
🛵CoTracker: fast transformer-tracker🛵

👉META's CoTracker is a fast transformer-based model that can track any point in a video

😎Review https://t.ly/M36A_
😎Paper arxiv.org/pdf/2307.07635.pdf
😎Project https://co-tracker.github.io/
😎Code github.com/facebookresearch/co-tracker
🌬️ Neural Blowing in Still Photos 🌬️

👉 A novel approach to animate human hair (and clothes) in still portraits

😎Review https://t.ly/HKG0t
😎Paper arxiv.org/pdf/2309.14207.pdf
😎Project nevergiveu.github.io/AutomaticHairBlowing
🌮 OW Indoor Segmentation 🌮

👉3D-OWIS is a novel open-world 3D indoor instance segmentation method (with an auto-labeling scheme) to separate known/unknown category labels

😎Review https://t.ly/-7ALf
😎Paper arxiv.org/pdf/2309.14338.pdf
😎Code github.com/aminebdj/3D-OWIS
🧱 Generating Scenes from Touch 🧱

👉#AI for synthesizing images from tactile signals (and vice versa), applied to a number of visuo-tactile synthesis tasks

😎Review https://t.ly/Gxr0L
😎Paper https://arxiv.org/pdf/2309.15117.pdf
😎Project https://fredfyyang.github.io/vision-from-touch
😎Code https://github.com/fredfyyang/vision-from-touch
☕Decaf: 3D Face-Hand Interactions☕

👉The first learning-based MoCap to track human hands interacting with human faces in #3D from single monocular RGB videos

😎Review https://t.ly/070Tj
😎Paper arxiv.org/pdf/2309.16670.pdf
😎Project vcai.mpi-inf.mpg.de/projects/Decaf
🌱 Making LLaMA See and Draw 🌱

👉Tencent #AI planted a SEED of Vision in a Large Language Model. Making LLaMA see 'n' draw stuff.

😎Review https://t.ly/QiCAv
😎Paper arxiv.org/pdf/2310.01218.pdf
😎Code github.com/AILab-CVC/SEED
🔥Visual-Math Q&A: MathVista is out! 🔥

👉 MathVista is the ultimate benchmark designed to amalgamate challenges from diverse mathematical and visual tasks

😎Review https://t.ly/yfqHZ
😎Paper https://arxiv.org/pdf/2310.02255.pdf
😎Project https://mathvista.github.io/
😎Code github.com/lupantech/MathVista