Plant Camouflage Detection
The PlantCamo dataset is the first dataset for plant camouflage detection: 1,250 images with camouflage characteristics. Source code released.
Review https://t.ly/pYFX4
Paper arxiv.org/pdf/2410.17598
Code github.com/yjybuaa/PlantCamo
SMITE: SEGMENT IN TIME
SFU unveils SMITE: a novel AI that, given only one or a few fine-grained segmentation references, can segment unseen videos consistently with those references. Dataset & code (under Apache 2.0) announced.
Review https://t.ly/w6aWJ
Paper arxiv.org/pdf/2410.18538
Project segment-me-in-time.github.io/
Repo github.com/alimohammadiamirhossein/smite
Blendify: #Python + Blender
A lightweight Python framework that provides a high-level API for creating & rendering scenes with #Blender. It simplifies data augmentation & synthesis. Source code released; a usage sketch follows below.
Review https://t.ly/l0crA
Paper https://arxiv.org/pdf/2410.17858
Code https://virtualhumans.mpi-inf.mpg.de/blendify/
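To make "high-level scene API" concrete, here is a minimal sketch of how such a framework is typically driven from plain Python. The entry point, method names, and signatures below are assumptions for illustration, not the verified Blendify interface; check the project docs before use.

```python
# Hypothetical usage sketch of a high-level Blender scene API in the
# spirit of Blendify; every name/signature here is an assumption.
import numpy as np
from blendify import scene  # assumed module-level scene entry point

# Camera: output resolution plus a simple perspective model (assumed).
scene.set_perspective_camera(resolution=(1280, 720), fov_x=np.deg2rad(60))

# A mesh renderable built from raw numpy buffers (assumed signature).
vertices = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=np.float32)
faces = np.array([[0, 1, 2]], dtype=np.int64)
scene.renderables.add_mesh(vertices=vertices, faces=faces)

# A point light and an off-screen render to disk (assumed signatures).
scene.lights.add_point(strength=1000.0, translation=(0.0, -2.0, 2.0))
scene.render(filepath="render.png")
```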
D-FINE: new SOTA Detector
D-FINE is a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding-box regression task in DETR models. New SOTA on MS COCO with additional data. Code & models available; a sketch of the decoding idea follows below.
Review https://t.ly/aw9fN
Paper https://arxiv.org/pdf/2410.13842
Code https://github.com/Peterande/D-FINE
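The "redefined regression" amounts to predicting each box edge as a distribution over discrete offsets rather than a single scalar, then decoding the expectation. A minimal sketch of that decoding step; the bin count and offset range are illustrative choices, not the authors' exact formulation:

```python
# Distribution-based box regression: each of the 4 box edges gets a
# softmax over discrete offset bins; the decoded offset is the
# expectation of that distribution (soft, sub-bin precise).
import torch
import torch.nn.functional as F

def decode_edges(logits: torch.Tensor, max_offset: float = 16.0) -> torch.Tensor:
    """logits: (N, 4, n_bins) per-edge bin scores -> (N, 4) edge offsets."""
    n_bins = logits.shape[-1]
    # Discrete candidate offsets shared by all four edges of every box.
    bins = torch.linspace(0.0, max_offset, n_bins, device=logits.device)
    probs = F.softmax(logits, dim=-1)   # distribution over offset bins
    return (probs * bins).sum(dim=-1)   # expectation = decoded offset

logits = torch.randn(2, 4, 17)          # 2 boxes, 4 edges, 17 offset bins
print(decode_edges(logits).shape)       # torch.Size([2, 4])
```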
Free-Moving Reconstruction
EPFL (+#MagicLeap) unveils a novel approach for reconstructing a free-moving object from a monocular RGB clip: free interaction with objects in front of a moving camera without relying on any prior, optimizing the sequence globally…
Repo github.com/HaixinShi/fmov_pose (official implementation of "Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera", AAAI 2025)
REM: Segment What You Describe
REM is a framework for segmenting concepts in video that can be described via natural language. Suitable for rare & non-object dynamic concepts such as waves, smoke, etc. Code & data announced.
Review https://t.ly/OyVtV
Paper arxiv.org/pdf/2410.23287
Project https://miccooper9.github.io/projects/ReferEverything/
Universal Relightable Avatars
#Meta unveils URAvatar, photorealistic & relightable avatars built from a phone scan with unknown illumination. Stunning results!
Review https://t.ly/U-ESX
Paper arxiv.org/pdf/2410.24223
Project junxuan-li.github.io/urgca-website
CityGaussianV2: Large-Scale City
A novel approach for large-scale scene reconstruction that addresses critical challenges in geometric accuracy and efficiency: 10x compression, 25% faster & 50% less memory! Source code released.
Review https://t.ly/Xgn59
Paper arxiv.org/pdf/2411.00771
Project dekuliutesla.github.io/CityGaussianV2/
Code github.com/DekuLiuTesla/CityGaussian
Muscles in Time Dataset
Muscles in Time (MinT) is a large-scale synthetic muscle-activation dataset: 9+ hours of simulation data covering 227 subjects and 402 simulated muscle strands. Code & dataset available soon.
Review https://t.ly/108g6
Paper arxiv.org/pdf/2411.00128
Project davidschneider.ai/mint
Code github.com/simplexsigil/MusclesInTime
Single Neuron Reconstruction
SIAT unveils NeuroFly, a framework for large-scale single-neuron reconstruction. It formulates the task as a streamlined three-stage workflow: automatic segmentation, connection, and manual proofreading (a skeleton of the pipeline follows below). Bridging computer vision and neuroscience.
Review https://t.ly/Y5Xu0
Paper https://arxiv.org/pdf/2411.04715
Repo github.com/beanli161514/neurofly
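A minimal skeleton of that three-stage decomposition; every name here is a hypothetical placeholder to show the structure, not the NeuroFly API:

```python
# Illustrative skeleton of the three-stage workflow described above
# (automatic segmentation -> connection -> manual proofreading).
from typing import List, Tuple

Fragment = List[Tuple[int, int, int]]  # a neurite fragment as voxel coords

def segment_volume(volume) -> List[Fragment]:
    """Stage 1: automatically detect neurite fragments in the volume."""
    raise NotImplementedError  # e.g. a 3D segmentation network over blocks

def connect_fragments(fragments: List[Fragment]) -> List[List[int]]:
    """Stage 2: link fragments into candidate single-neuron trees."""
    raise NotImplementedError  # e.g. graph construction + path search

def proofread(trees: List[List[int]]) -> List[List[int]]:
    """Stage 3: manual proofreading in an interactive viewer."""
    return trees
```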
X-Portrait 2: SOTA(?) Portrait Animation
ByteDance unveils a preview of X-Portrait 2, the new SOTA expression encoder that implicitly encodes every minuscule expression from the input, trained on large-scale datasets. Impressive results, but no paper or code announced.
Review https://t.ly/8Owh9 [UPDATE]
Paper ?
Project byteaigc.github.io/X-Portrait2/
Repo ?
Don't Look Twice: ViT by RLT
CMU unveils RLT: speeding up video transformers with an approach inspired by run-length encoding for data compression. It speeds up training and reduces the token count by up to 80%! Source code announced; a sketch of the idea follows below.
Review https://t.ly/ccSwN
Paper https://lnkd.in/d6VXur_q
Project https://lnkd.in/d4tXwM5T
Repo TBA
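A minimal sketch of the run-length idea as the post describes it: patch tokens that barely change from frame to frame are dropped and represented once, together with a run length. The threshold, comparison rule, and shapes are illustrative assumptions, not the authors' implementation.

```python
# Run-length tokenization sketch: keep a patch token only when it has
# changed enough since it was last kept; otherwise extend its run.
import torch

def run_length_tokenize(patches: torch.Tensor, tau: float = 0.1):
    """patches: (T, N, D) per-frame patch embeddings.
    Returns kept token embeddings (M, D) and per-token run lengths (M,)."""
    T, N, _ = patches.shape
    tokens, runs = [], []
    owner = [None] * N                     # kept-token index per patch slot
    for t in range(T):
        for n in range(N):
            changed = (
                owner[n] is None
                or (patches[t, n] - tokens[owner[n]]).norm() >= tau
            )
            if changed:                    # open a new token for this slot
                owner[n] = len(tokens)
                tokens.append(patches[t, n])
                runs.append(1)
            else:                          # static patch: extend current run
                runs[owner[n]] += 1
    return torch.stack(tokens), torch.tensor(runs)

# Toy check: a fully static clip collapses from T*N to N tokens.
x = torch.zeros(8, 4, 16)                  # 8 frames, 4 patches, dim 16
tok, rl = run_length_tokenize(x)
print(tok.shape, rl)                       # torch.Size([4, 16]), tensor([8, 8, 8, 8])
```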
SeedEdit: foundational T2I
ByteDance unveils a novel T2I foundation model capable of delivering stable, high-aesthetic image edits that maintain image quality through unlimited rounds of editing instructions. No code announced, but a demo is online.
Review https://t.ly/hPlnN
Paper https://arxiv.org/pdf/2411.06686
Project team.doubao.com/en/special/seededit
Demo https://huggingface.co/spaces/ByteDance/SeedEdit-APP
4 Nanoseconds Inference
LogicTreeNet: a convolutional differentiable logic-gate network with logic-gate tree kernels, bringing computer vision into differentiable LGNs. Up to 61x smaller than SOTA, with inference in 4 nanoseconds! A sketch of the gate primitive follows below.
Review https://t.ly/GflOW
Paper https://lnkd.in/dAZQr3dW
Full clip https://lnkd.in/dvDJ3j-u
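A minimal sketch of the building block such networks train: a "soft" logic gate holding a softmax over relaxed Boolean ops, so gradient descent can choose which gate each unit becomes. The reduced op set (real LGNs use all 16 two-input gates) and all details here are illustrative, not the paper's implementation.

```python
# One differentiable logic gate: a learned mixture over Boolean ops
# relaxed to real-valued inputs in [0, 1].
import torch
import torch.nn as nn

SOFT_OPS = [
    lambda a, b: a * b,              # AND
    lambda a, b: a + b - a * b,      # OR
    lambda a, b: a + b - 2 * a * b,  # XOR
    lambda a, b: 1 - a * b,          # NAND
]

class DiffLogicGate(nn.Module):
    def __init__(self):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(len(SOFT_OPS)))

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.logits, dim=0)           # soft op choice
        outs = torch.stack([op(a, b) for op in SOFT_OPS])
        return (w[:, None] * outs).sum(dim=0)           # mixture of ops

gate = DiffLogicGate()
a, b = torch.rand(5), torch.rand(5)
print(gate(a, b))  # differentiable; hardens to a single op after training
```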
Global Tracklet Association for MOT
A novel universal, model-agnostic method designed to refine and enhance tracklet association for single-camera MOT. Suitable for datasets such as SportsMOT, SoccerNet & similar. Source code released; a sketch of the mechanism follows below.
Review https://t.ly/gk-yh
Paper https://lnkd.in/dvXQVKFw
Repo https://lnkd.in/dEJqiyWs
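A minimal sketch of the general mechanism behind post-hoc tracklet association: temporally disjoint tracklets whose appearance embeddings are similar get merged into one identity. The cosine threshold and greedy merge are illustrative assumptions, not the paper's exact method.

```python
# Appearance-based tracklet merging: same identity if the tracklets do
# not overlap in time and their mean embeddings are similar enough.
import numpy as np

def merge_tracklets(feats, spans, thr=0.8):
    """feats: list of (D,) mean appearance embeddings per tracklet.
    spans: list of (start_frame, end_frame). Returns identity labels."""
    n = len(feats)
    labels = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            disjoint = spans[i][1] < spans[j][0] or spans[j][1] < spans[i][0]
            cos = float(np.dot(feats[i], feats[j]) /
                        (np.linalg.norm(feats[i]) * np.linalg.norm(feats[j]) + 1e-9))
            if disjoint and cos > thr:
                old, new = labels[j], labels[i]   # merge j into i's identity
                labels = [new if l == old else l for l in labels]
    return labels

feats = [np.array([1.0, 0.0]), np.array([0.99, 0.05]), np.array([0.0, 1.0])]
spans = [(0, 10), (20, 30), (0, 30)]
print(merge_tracklets(feats, spans))  # [0, 0, 2]: first two tracklets merge
```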
MagicQuill: super-easy Diffusion Editing
MagicQuill is a novel system designed to support users in smart image editing: a robust UI/UX (e.g., inserting/erasing objects, colors, etc.) backed by a multimodal LLM that anticipates user intentions in real time. Code & demos released.
Review https://t.ly/hJyLa
Paper https://arxiv.org/pdf/2411.09703
Project https://magicquill.art/demo/
Repo https://github.com/magic-quill/magicquill
Demo https://huggingface.co/spaces/AI4Editing/MagicQuill
EchoMimicV2: Semi-body Human
Alipay (Ant Group) unveils EchoMimicV2, the new SOTA in half-body human animation via APD-Harmonization. See the clip with audio (ZH/ENG). Code & demo announced.
Review https://t.ly/enLxJ
Paper arxiv.org/pdf/2411.10061
Project antgroup.github.io/ai/echomimic_v2/
Repo-v2 github.com/antgroup/echomimic_v2
Repo-v1 https://github.com/antgroup/echomimic
SAMURAI: SAM for Tracking
UWA unveils SAMURAI, an enhanced adaptation of SAM 2 specifically designed for visual object tracking. New SOTA! Code under Apache 2.0; a sketch of motion-aware selection follows below.
Review https://t.ly/yGU0P
Paper https://arxiv.org/pdf/2411.11922
Repo https://github.com/yangchris11/samurai
Project https://yangchris11.github.io/samurai/
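One way a tracking adaptation can bias a segmenter toward its target is a motion prior that re-scores candidate masks. The constant-velocity prediction, blending weight, and box-IoU scoring below are illustrative assumptions, not SAMURAI's verified method.

```python
# Motion-aware candidate selection: blend the segmenter's own mask
# confidence with agreement against a constant-velocity box prediction.
import numpy as np

def iou(a, b):
    """Boxes as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def select_mask(prev_box, velocity, candidates, mask_scores, w=0.5):
    """Pick the candidate box that best balances confidence and motion."""
    pred = prev_box + velocity              # constant-velocity prediction
    scores = [(1 - w) * s + w * iou(pred, c)
              for c, s in zip(candidates, mask_scores)]
    return int(np.argmax(scores))

prev = np.array([10, 10, 50, 50], float)
vel = np.array([5, 0, 5, 0], float)         # target moving right
cands = [np.array([15, 10, 55, 50], float), np.array([80, 80, 120, 120], float)]
print(select_mask(prev, vel, cands, [0.6, 0.9]))  # 0: the motion prior wins
```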
DINO-X: Unified Obj-Centric LVM
A unified vision model for open-world detection, segmentation, phrase grounding, visual counting, pose estimation, prompt-free detection/recognition, dense captioning & more. Demo & API announced.
Review https://t.ly/CSQon
Paper https://lnkd.in/dc44ZM8v
Project https://lnkd.in/dehKJVvC
Repo https://lnkd.in/df8Kb6iz