3D Pigeons Pose & Tracking
3D-MuPPET estimates and tracks the 3D poses of pigeons from multiple views
Review https://t.ly/jfAJJ
Paper arxiv.org/pdf/2308.15316.pdf
Code github.com/alexhang212/3D-MuPPET/

RoboTAP: Dense Tracking for Few-Shot Imitation
RoboTAP: a novel dense-tracking representation for robotic arms
Review https://t.ly/MCO_V
Paper arxiv.org/pdf/2308.15975.pdf
Project https://robotap.github.io/
Code github.com/deepmind/tapnet

FACET: Fairness in Computer Vision
#META AI opens a large, publicly available dataset for classification, detection & segmentation, built to expose potential performance disparities & challenges across sensitive demographic attributes
Review https://t.ly/mKn-t
Paper arxiv.org/pdf/2309.00035.pdf
Dataset https://facet.metademolab.com/

Doppelgangers in Structures
A novel learning-based approach for visual disambiguation: distinguishing illusory matches to produce correct, disambiguated #3D reconstructions
Review https://t.ly/9yLot
Paper arxiv.org/pdf/2309.02420.pdf
Code github.com/RuojinCai/Doppelgangers
Project doppelgangers-3d.github.io/

Tracking Anything with Decoupled VOS
A novel VOS approach that extends SAM for open-world video segmentation with no user input required
Review https://t.ly/xeobR
Paper arxiv.org/pdf/2309.03903.pdf
Project hkchengrex.com/Tracking-Anything-with-DEVA
Code github.com/hkchengrex/Tracking-Anything-with-DEVA
Colab https://colab.research.google.com/drive/1OsyNVoV_7ETD1zIE8UWxL3NXxu12m_YZ

Diffusive Consistent Video Editing
Weizmann Institute of Science unveils TokenFlow, a novel framework that uses a pre-trained text-to-image diffusion model for text-driven video editing
Review https://t.ly/ru8km
Paper arxiv.org/pdf/2307.10373.pdf
Project diffusion-tokenflow.github.io
Code github.com/omerbt/TokenFlow

#META's DINOv2 is now commercial!
Universal features for image classification, instance retrieval, video understanding, depth & semantic segmentation. Now suitable for commercial use.
Review https://t.ly/LNrGy
Paper arxiv.org/pdf/2304.07193.pdf
Code github.com/facebookresearch/dinov2
Demo dinov2.metademolab.com/

FreeMan: towards #3D Humans
FreeMan: the first large-scale, real-world, multi-view dataset for #3D human pose estimation. 11M frames!
Review https://t.ly/ICxpA
Paper arxiv.org/pdf/2309.05073.pdf
Project wangjiongw.github.io/freeman

MagiCapture: HD Multi-Concept Portrait
KAIST unveils MagiCapture: integrating subject and style concepts to generate high-resolution portrait images using just a few subject and style references
Review https://t.ly/c9rOo
Paper https://arxiv.org/pdf/2309.06895.pdf

Dynamic NeRFs for Soccer
SoccerNeRF: a first attempt at applying "cheap" NeRFs to football, reconstructing soccer replays in space and time.
Review https://t.ly/Ywcvk
Paper arxiv.org/pdf/2309.06802.pdf
Project https://soccernerfs.isach.be/
Code github.com/iSach/SoccerNeRFs

GlueStick: Graph Neural Matching
GlueStick is a joint deep matcher for points and lines that leverages the connectivity information between nodes to better glue them together
Review https://t.ly/Atxqo
Paper arxiv.org/pdf/2304.02008.pdf
Code https://github.com/cvg/GlueStick

CPR-Coach: Neural Cardiopulmonary Resuscitation
CPR-Coach: fine-grained action recognition in cardiopulmonary resuscitation
Review https://t.ly/Qbg4K
Paper arxiv.org/pdf/2309.11718.pdf
Code github.com/Shunli-Wang/CPR-Coach
Project shunli-wang.github.io/CPR-Coach

NeuralLabeling with NeRF
Annotating a scene by generating segmentation masks, affordance maps, 2D & 3D bounding boxes, 6DOF poses, depth & meshes.
Review https://t.ly/1GPsj
Paper arxiv.org/pdf/2309.11966.pdf
Code github.com/FlorisE/neural-labeling
Project florise.github.io/neural_labeling_web

DE-ViT: detecting everything via DINOv2
DE-ViT: an open-set object detector based on a DINOv2 backbone. It's the new SOTA on the COCO & LVIS datasets
Review https://t.ly/_DAmt
Paper arxiv.org/pdf/2309.12969.pdf
Code https://github.com/mlzxy/devit

CoTracker: fast transformer-tracker
META's CoTracker is a fast transformer-based model that can track any point in a video
Review https://t.ly/M36A_
Paper arxiv.org/pdf/2307.07635.pdf
Project https://co-tracker.github.io/
Code github.com/facebookresearch/co-tracker

Neural Blowing in Still Photos
A novel approach to animate human hair (and clothes) in still portraits
Review https://t.ly/HKG0t
Paper arxiv.org/pdf/2309.14207.pdf
Project nevergiveu.github.io/AutomaticHairBlowing

OW Indoor Segmentation
3D-OWIS is a novel open-world 3D indoor instance segmentation method (with an auto-labeling scheme) to separate known/unknown category labels
Review https://t.ly/-7ALf
Paper arxiv.org/pdf/2309.14338.pdf
Code github.com/aminebdj/3D-OWIS

Generating Scenes from Touch
#AI for synthesizing images from tactile signals (and vice versa), applied to a number of visuo-tactile synthesis tasks
Review https://t.ly/Gxr0L
Paper https://arxiv.org/pdf/2309.15117.pdf
Project https://fredfyyang.github.io/vision-from-touch
Code https://github.com/fredfyyang/vision-from-touch

Decaf: 3D Face-Hand Interactions
The first learning-based MoCap to track human hands interacting with human faces in #3D from single monocular RGB videos
Review https://t.ly/070Tj
Paper arxiv.org/pdf/2309.16670.pdf
Project vcai.mpi-inf.mpg.de/projects/Decaf

Making LLaMA See and Draw
Tencent #AI planted a SEED of Vision in Large Language Model. Making LLaMA see 'n' draw stuff.
Review https://t.ly/QiCAv
Paper arxiv.org/pdf/2310.01218.pdf
Code github.com/AILab-CVC/SEED

Visual-Math Q&A: MathVista is out!
MathVista is a benchmark designed to amalgamate challenges from diverse mathematical and visual tasks
Review https://t.ly/yfqHZ
Paper https://arxiv.org/pdf/2310.02255.pdf
Project https://mathvista.github.io/
Code github.com/lupantech/MathVista