SAM v2 is out!
#Meta announced SAM 2, a novel unified model for real-time promptable segmentation in images and videos. It is 6x faster and the new SOTA by a large margin. Source code, dataset, models & demo released under permissive licenses (minimal usage sketch below).
Review: https://t.ly/oovJZ
Paper: https://t.ly/sCxMY
Demo: https://sam2.metademolab.com
Project: ai.meta.com/blog/segment-anything-2/
Models: github.com/facebookresearch/segment-anything-2

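For anyone who wants to try promptable segmentation right away, here is a minimal single-image sketch based on the predictor API documented in the repository. The checkpoint path, config name, example image, and the point prompt are assumptions for illustration, not part of the announcement.

```python
# Hedged sketch: prompt SAM 2 on one image with a single positive click.
# Checkpoint/config names below are assumptions; use the files shipped
# with the release (github.com/facebookresearch/segment-anything-2).
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "checkpoints/sam2_hiera_large.pt"  # assumed checkpoint path
model_cfg = "sam2_hiera_l.yaml"                 # assumed config name
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.array(Image.open("example.jpg").convert("RGB"))  # any RGB image

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    # One foreground click (label 1 = positive, 0 = negative).
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[512, 384]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )

print(masks.shape, scores)  # candidate masks (3, H, W) with quality scores
```

For video, the repo exposes an analogous video predictor that propagates prompts across frames.
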
Real-time Expressive Hands
Zhejiang unveils XHand, a novel expressive hand avatar designed to comprehensively generate hand shape, appearance, and deformations in real time. Source code released (Apache 2.0) on Jul. 31st, 2024.
Review: https://t.ly/8obbB
Project: https://lnkd.in/dRtVGe6i
Paper: https://lnkd.in/daCx2iB7
Code: https://lnkd.in/dZ9pgzug

Click-Attention Segmentation
An interesting image patch-based click-attention algorithm with an affinity loss inspired by SASFormer. The approach decouples positive and negative clicks, guiding positive ones to focus on the target object and negative ones on the background (generic click-encoding sketch below). Code released under the Apache license.
Review: https://t.ly/tG05L
Paper: https://arxiv.org/pdf/2408.06021
Code: https://github.com/hahamyt/ClickAttention

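As context for the click-decoupling idea, here is a generic sketch of the standard input encoding used in interactive segmentation: positive and negative clicks rasterized as separate disk maps. This is not the paper's code; ClickAttention's patch-based attention and affinity loss operate on top of representations like this.

```python
# Generic interactive-segmentation input encoding (not the paper's code):
# positive and negative clicks become two separate disk maps, which a model
# can then attend over differently for object vs. background regions.
import numpy as np

def click_maps(h, w, pos_clicks, neg_clicks, radius=5):
    """Return a (2, h, w) float array: channel 0 = positive, 1 = negative."""
    yy, xx = np.mgrid[0:h, 0:w]
    maps = np.zeros((2, h, w), dtype=np.float32)
    for ch, clicks in enumerate((pos_clicks, neg_clicks)):
        for cy, cx in clicks:
            maps[ch][(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] = 1.0
    return maps

maps = click_maps(256, 256, pos_clicks=[(120, 100)], neg_clicks=[(30, 200)])
print(maps.shape, maps.sum(axis=(1, 2)))  # (2, 256, 256), per-channel click area
```
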
#Adobe Instant TurboEdit
Adobe unveils a novel real-time, text-based, disentangled real-image editing method built on 4-step SDXL Turbo: SOTA HQ image editing with ultra-fast few-step diffusion. No code announced, but it's easy to guess it will end up in commercial tools.
Review: https://t.ly/Na7-y
Paper: https://lnkd.in/dVs9RcCK
Project: https://lnkd.in/dGCqwh9Z
Code: not announced

Zebra Detection & Pose
The first synthetic dataset usable for both detection and 2D pose estimation of zebras without any bridging strategies. Code, results, models, and the synthetic training/validation data, including 104K manually labeled images, are open-sourced.
Review: https://t.ly/HTEZZ
Paper: https://lnkd.in/dQYT-fyq
Project: https://lnkd.in/dAnNXgG3
Code: https://lnkd.in/dhvU97xD

Sapiens: SOTA ViTs for humans
Meta unveils Sapiens, a family of models for human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Source code announced, coming.
Review: https://t.ly/GKQI0
Paper: arxiv.org/pdf/2408.12569
Project: rawalkhirodkar.github.io/sapiens
Code: github.com/facebookresearch/sapiens

UPDATE: the Sapiens source code is out!
Thanks Danny for the info.

Diffusion Game Engine
#Google unveils GameNGen: the first game engine powered entirely by a neural model (#AI), enabling real-time interaction with a complex environment over long trajectories at high quality. No code announced, but I love it.
Review: https://t.ly/_WR5z
Paper: https://lnkd.in/dZqgiqb9
Project: https://lnkd.in/dJUd2Fr6

Omni Urban Scene Reconstruction
OmniRe is a novel holistic approach for efficiently reconstructing HD dynamic urban scenes from on-device logs. It can simulate the reconstructed scenarios with actors in real time (~60 Hz). Code released.
Review: https://t.ly/SXVPa
Paper: arxiv.org/pdf/2408.16760
Project: ziyc.github.io/omnire/
Code: github.com/ziyc/drivestudio

Interactive Drag-based Editing
CSE unveils InstantDrag: a novel pipeline designed to enhance editing interactivity and speed, taking only an image and a drag instruction as input. Source code announced, coming.
Review: https://t.ly/hy6SL
Paper: arxiv.org/pdf/2409.08857
Project: joonghyuk.com/instantdrag-web/
Code: github.com/alex4727/InstantDrag

Hand-Object Interaction Pretraining
Berkeley unveils HOP, a novel approach to learning general robot manipulation priors from 3D hand-object interaction trajectories.
Review: https://t.ly/FLqvJ
Paper: https://arxiv.org/pdf/2409.08273
Project: https://hgaurav2k.github.io/hop/

Motion Instruction Fine-Tuning
MotIF is a novel method that fine-tunes pre-trained VLMs to give them the capability to distinguish nuanced robotic motions with different shapes and semantic groundings. A work by MIT, Stanford, and CMU. Source code announced, coming.
Review: https://t.ly/iJ2UY
Paper: https://arxiv.org/pdf/2409.10683
Project: https://motif-1k.github.io/
Code: coming

SoccerNet 2024 Results
SoccerNet is the annual video understanding challenge for football. These challenges aim to advance research across multiple themes in football. The 2024 results are out!
Review: https://t.ly/DUPgx
Paper: arxiv.org/pdf/2409.10587
Repo: github.com/SoccerNet
Project: www.soccer-net.org/

JoyHallo: Mandarin Digital Human
JD Health tackles the challenges of audio-driven video generation in Mandarin, a task complicated by the language's intricate lip movements and the scarcity of HQ datasets. Impressive results (audio on). Code & models available.
Review: https://t.ly/5NGDh
Paper: arxiv.org/pdf/2409.13268
Project: jdh-algo.github.io/JoyHallo/
Code: github.com/jdh-algo/JoyHallo

Robo-quadruped Parkour
LAAS-CNRS unveils a novel RL approach for agile, parkour-like skills such as walking, climbing high steps, leaping over gaps, and crawling under obstacles. Data and code available.
Review: https://t.ly/-6VRm
Paper: arxiv.org/pdf/2409.13678
Project: gepetto.github.io/SoloParkour/
Code: github.com/Gepetto/SoloParkour

Dressed Humans in the Wild
ETH (+ #Microsoft) unveils ReLoo: novel HQ 3D reconstruction of humans dressed in loose garments from monocular in-the-wild clips, with no prior assumptions about the garments. Source code announced, coming.
Review: https://t.ly/evgmN
Paper: arxiv.org/pdf/2409.15269
Project: moygcc.github.io/ReLoo/
Code: github.com/eth-ait/ReLoo

New SOTA Edge Detection
CUP (+ ESPOCH) unveils NBED, the new SOTA for edge detection, with consistently superior performance across multiple benchmarks, even against models with huge computational cost and complex training. Source code released.
Review: https://t.ly/zUMcS
Paper: arxiv.org/pdf/2409.14976
Code: github.com/Li-yachuan/NBED

SOTA Gaussian Haircut
ETH et al. unveil Gaussian Haircut, the new SOTA in hair reconstruction via a dual representation (classic + 3D Gaussian). Code and model announced.
Review: https://t.ly/aiOjq
Paper: arxiv.org/pdf/2409.14778
Project: https://lnkd.in/dFRm2ycb
Repo: https://lnkd.in/d5NWNkb5

SPARK: Real-time Face Capture
Technicolor Group unveils SPARK, a novel high-precision 3D face capture method that uses a collection of unconstrained videos of a subject as prior information. New SOTA, able to handle unseen pose, expression, and lighting. Impressive results. Code & model announced.
Review: https://t.ly/rZOgp
Paper: arxiv.org/pdf/2409.07984
Project: kelianb.github.io/SPARK/
Repo: github.com/KelianB/SPARK/

One-Image Object Detection
Delft University (+ Hensoldt Optronics) introduces OSSA, a novel unsupervised domain adaptation method for object detection that uses a single, unlabeled target image to approximate the target-domain style (generic illustration below). Code released.
Review: https://t.ly/-li2G
Paper: arxiv.org/pdf/2410.00900
Code: github.com/RobinGerster7/OSSA

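To ground the "approximate the target style from one image" idea, here is a generic, hedged sketch of the classic feature-statistics route (AdaIN-style per-channel mean/std matching). It only illustrates the general concept; whether OSSA uses this exact mechanism is not stated in the post, so check the paper.

```python
# Illustrative only: approximate a "style" from a single unlabeled target
# image by matching per-channel feature statistics (AdaIN-style). This is
# NOT OSSA's code, just the generic single-image feature-level style idea.
import torch

def adain(source_feat: torch.Tensor, target_feat: torch.Tensor, eps: float = 1e-5):
    """Re-normalize source features (N, C, H, W) to the per-channel
    mean/std computed from a single target feature map (1, C, H, W)."""
    t_mean = target_feat.mean(dim=(2, 3), keepdim=True)
    t_std = target_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = source_feat.mean(dim=(2, 3), keepdim=True)
    s_std = source_feat.std(dim=(2, 3), keepdim=True) + eps
    return (source_feat - s_mean) / s_std * t_std + t_mean

src = torch.randn(4, 64, 32, 32)  # features from labeled source images
tgt = torch.randn(1, 64, 32, 32)  # features from the single target image
styled = adain(src, tgt)          # source content, target-image statistics
print(styled.shape)
```
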