Unified Scene Text/Layout Detection
👉 World's first hierarchical scene text dataset + a novel detection method (toy annotation schema below)
Highlights:
✅ Unified detection & geometric layout
✅ Hierarchical annotations in natural scenes
✅ Word-, line-, & paragraph-level annotations
✅ Source under CC Attribution-ShareAlike 4.0
More: https://bit.ly/3jRpezV
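To make "hierarchical annotations" concrete, here is a toy Python sketch of a word → line → paragraph structure. Field names and polygons are made up for illustration; this is not the dataset's actual schema.
```python
from dataclasses import dataclass, field
from typing import List, Tuple

Polygon = List[Tuple[float, float]]  # vertices of a text region (illustrative)

@dataclass
class Word:
    text: str
    polygon: Polygon

@dataclass
class Line:
    polygon: Polygon
    words: List[Word] = field(default_factory=list)

@dataclass
class Paragraph:
    polygon: Polygon
    lines: List[Line] = field(default_factory=list)

# One paragraph containing one line with two words, i.e. the three
# annotation levels (word / line / paragraph) the dataset provides.
para = Paragraph(
    polygon=[(0, 0), (200, 0), (200, 40), (0, 40)],
    lines=[Line(
        polygon=[(0, 0), (200, 0), (200, 20), (0, 20)],
        words=[Word("SCENE", [(0, 0), (90, 0), (90, 20), (0, 20)]),
               Word("TEXT", [(100, 0), (200, 0), (200, 20), (100, 20)])],
    )],
)
print(len(para.lines), len(para.lines[0].words))  # -> 1 2
```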
#Oculus' new Hand Tracking
👉 Hands move as naturally and intuitively in the #metaverse as they do in real life
Highlights:
✅ Hands 2.0 powered by CV & ML
✅ Tracking hand-over-hand interactions
✅ Crossing hands, clapping, high-fives
✅ Accurate thumbs-up gesture
More: https://bit.ly/3JXPvY2
New SOTA in #3D human avatars
👉 PHORHUM: photorealistic 3D humans from monocular RGB
Highlights:
✅ Pixel-aligned method for 3D geometry
✅ Unshaded surface color + illumination
✅ Patch-based rendering losses for visible parts
✅ Plausible color estimation for non-visible parts
More: https://bit.ly/3MkvBrA
What's in your hands (#3D)?
👉 Reconstructing hand-held objects from a single RGB image, without knowing their 3D templates
Highlights:
✅ The hand is highly predictive of object shape
✅ Reconstruction conditioned on hand articulation
✅ Visual feats. / articulation-aware coords.
✅ Code and models available!
More: https://bit.ly/3vuYn2a
YODO: You Only Demonstrate Once
👉 Novel category-level manipulation learned in sim from a single demonstration video
Highlights:
✅ One-shot IL, model-free 6D pose tracking
✅ Demonstration via a single 3rd-person view
✅ Manipulation including high-precision tasks
✅ Category-level Behavior Cloning
✅ Attention for dynamic coordinate selection
✅ Generalizes to novel, unseen objects/environments
More: https://bit.ly/3v0V4R4
Dress Code for Virtual Try-On
👉 UniMORE (+ YOOX) unveils a novel dataset/approach for virtual try-on.
Highlights:
✅ Hi-res paired front-view / full-body images
✅ Pixel-level Semantic-Aware Discriminator
✅ 9 SOTA VTON approaches / 3 baselines
✅ New SOTA considering res. & garments
More: https://bit.ly/3xKXSUw
Deep Equilibrium for Optical Flow
👉 DEQ flow: converges faster, uses less memory, and is often more accurate (toy fixed-point sketch below)
Highlights:
✅ Novel formulation of the optical flow problem
✅ Compatible with prior modeling- and data-related improvements
✅ Sparse fixed-point correction for stability
✅ Code/models under GNU Affero GPL v3.0
More: https://bit.ly/3v4fZmi
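A minimal toy sketch of the deep-equilibrium idea behind DEQ flow (not the authors' code): instead of unrolling many refinement iterations, the flow state is defined as the fixed point z* = f(z*, x) of a single update cell and solved for directly. The update cell and solver below are made-up stand-ins; the paper uses a learned recurrent update and a more sophisticated solver.
```python
import numpy as np

def fixed_point_solve(f, x, z0, tol=1e-6, max_iter=100):
    """Naive fixed-point iteration: repeat z <- f(z, x) until convergence."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Toy "update cell": a contraction standing in for the recurrent flow update.
W = 0.3 * np.eye(2)           # spectral norm < 1 -> a unique fixed point exists
def update_cell(z, x):
    return np.tanh(W @ z + x)

x = np.array([0.5, -1.0])     # stand-in for image features / correlation volume
z_star = fixed_point_solve(update_cell, x, z0=np.zeros(2))
print("equilibrium flow state:", z_star)
```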
Ultra High-Resolution Neural Saliency
👉 A novel ultra high-resolution saliency detector, with a dataset!
Highlights:
✅ Ultra hi-res saliency detection
✅ 5,920 pics at 4K-8K resolution
✅ Pyramid Grafting Network
✅ Cross-Model Grafting Module
✅ AGL: Attention Guided Loss
✅ Code/models under MIT
More: https://bit.ly/3MnU1Rf
StyleGAN-Human for fashion
👉 A novel unconditional human generation method based on StyleGAN is out!
Highlights:
✅ 200,000+ labeled samples (pose/texture)
✅ 1024x512 StyleGAN-Human model (StyleGAN3)
✅ 512x256 StyleGAN-Human model (StyleGAN1)
✅ Face model for downstream tasks: InsetGAN
✅ Source code and models available!
More: https://bit.ly/3xMg5B2
OSSO: Skeletal Shape from Outside
👉 Anatomic skeleton of a person from the 3D surface of the body
Highlights:
✅ Max Planck + IMATI-CNR + INRIA
✅ DXA images to obtain #3D shape
✅ From external body to internal skeleton
More: https://bit.ly/3v7Z5TQ
Pix2Seq: object detection by #Google
👉 A novel framework casting object detection as a language modeling task (toy tokenization sketch below)
Highlights:
✅ Object detection as a language-modeling task
✅ BBs/labels -> sequence of discrete tokens
✅ Encoder-decoder (one token at a time)
✅ Code under Apache License 2.0
More: https://bit.ly/3F49PX3
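A toy sketch of the tokenization idea (not Google's implementation): box coordinates are quantized into discrete bins and concatenated with a class token, so each detection becomes a short token sequence that an encoder-decoder can emit one token at a time. The bin count, coordinate order, and vocabulary layout below are illustrative assumptions.
```python
# Toy Pix2Seq-style tokenization (assumed 1000 coordinate bins; not the official vocab).
NUM_BINS = 1000          # coordinate quantization bins
CLASS_OFFSET = NUM_BINS  # class tokens live after the coordinate tokens

def box_to_tokens(box, class_id, img_w, img_h):
    """Quantize [xmin, ymin, xmax, ymax] into discrete tokens + a class token."""
    xmin, ymin, xmax, ymax = box
    norm = [ymin / img_h, xmin / img_w, ymax / img_h, xmax / img_w]
    coord_tokens = [min(int(v * NUM_BINS), NUM_BINS - 1) for v in norm]
    return coord_tokens + [CLASS_OFFSET + class_id]

# One box of class 16 in a 640x480 image -> 5 tokens the decoder would emit one by one.
print(box_to_tokens([120, 60, 400, 410], class_id=16, img_w=640, img_h=480))
```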
Generalizable Neural Performer
👉 A general neural framework to synthesize free-viewpoint images of arbitrary human performers
Highlights:
✅ Free-viewpoint synthesis of humans
✅ Implicit Geometric Body Embedding
✅ Screen-Space Occlusion-Aware Blending
✅ GeneBody: 4M frames, multi-view cams
More: https://cutt.ly/SGcnQzn
Tire-defect inspection
👉 Unsupervised detection of tire defects using neural networks
Highlights:
✅ Impurity, same material as the tire
✅ Impurity, with a different material
✅ Damage by temperature/pressure
✅ Crack or etched material
More: https://bit.ly/37GX1JT
#4D Neural Fields
👉 4D neural-field visual representations from monocular RGB-D
Highlights:
✅ 4D scene completion (occlusions)
✅ Scene completion in cluttered scenes
✅ Novel #AI for contextual point clouds
✅ Data, code, models under MIT license
More: https://cutt.ly/6GveKiJ
Largest dataset of human-object interactions
👉 BEHAVE by Google: the largest dataset of human-object interactions
Highlights:
✅ 8 subjects, 20 objects, 5 environments
✅ 321 clips with 4 Kinect RGB-D cameras
✅ Masks and segmented point clouds
✅ 3D SMPL & mesh registration
✅ Textured scan reconstructions
More: https://bit.ly/3Lx6NNo
ENARF-GAN: Neural Articulations
👉 Unsupervised method for 3D geometry-aware representation of articulated objects
Highlights:
✅ Novel, efficient neural representation
✅ Tri-plane deformation fields for training
✅ Novel GAN for articulated representations
✅ Controllable 3D from real, unlabeled pics
More: https://bit.ly/3xYqedN
HuMMan: 4D human dataset
👉 HuMMan: 4D dataset with 1,000 humans, 400k sequences & 60M frames
Highlights:
✅ RGB, point clouds, keypoints, SMPL, texture
✅ Mobile device in the sensor suite
✅ 500+ actions to cover movements
More: https://bit.ly/3vTRW8Z
Neighborhood Attention Transformer
👉 A novel transformer for both image classification and downstream vision tasks (naive attention sketch below)
Highlights:
✅ Neighborhood Attention (NA)
✅ Neighborhood Attention Transformer (NAT)
✅ Faster training/inference, good throughput
✅ Checkpoints, training code, #CUDA kernel available
More: https://bit.ly/3F5aVSo
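A naive NumPy sketch of the neighborhood-attention idea (not the paper's CUDA kernel): each position attends only to a small k x k window around it instead of the whole feature map. This sketch simply clips the window at image borders (a simplification); dimensions and window size are arbitrary.
```python
import numpy as np

def neighborhood_attention(q, k, v, window=3):
    """Naive single-head neighborhood attention on an HxWxC feature map.
    Each position attends only to a `window` x `window` patch centered on it."""
    H, W, C = q.shape
    r = window // 2
    out = np.zeros_like(v)
    for i in range(H):
        for j in range(W):
            i0, i1 = max(0, i - r), min(H, i + r + 1)
            j0, j1 = max(0, j - r), min(W, j + r + 1)
            keys = k[i0:i1, j0:j1].reshape(-1, C)      # neighbors only
            vals = v[i0:i1, j0:j1].reshape(-1, C)
            scores = keys @ q[i, j] / np.sqrt(C)       # scaled dot-product
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            out[i, j] = weights @ vals
    return out

x = np.random.randn(8, 8, 16).astype(np.float32)       # toy feature map
y = neighborhood_attention(x, x, x, window=3)
print(y.shape)  # (8, 8, 16)
```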
FANs: Fully Attentional Networks
👉 #Nvidia unveils Fully Attentional Networks (FANs)
Highlights:
✅ Efficient, fully attentional design
✅ Semantic segmentation & object detection
✅ Models/source code available soon!
More: https://bit.ly/3vtpITs
Open-Source DALL·E 2 is out
👉 #Pytorch implementation of DALL·E 2, #OpenAI's latest text-to-image neural net (conceptual pipeline sketch below)
Highlights:
✅ SOTA for text-to-image generation
✅ Source code/model under MIT License
✅ "Medieval painting of wifi not working"
More: https://bit.ly/3vzsff6
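A conceptual sketch of the DALL·E 2 (unCLIP) two-stage pipeline, NOT the dalle2-pytorch API: a prior maps the CLIP text embedding to a CLIP image embedding, then a diffusion decoder renders pixels conditioned on that embedding. Every function below is a hypothetical placeholder, and the embedding size is an assumption.
```python
import numpy as np

EMB_DIM = 512  # CLIP embedding size (assumption for illustration)

def clip_text_encoder(prompt: str) -> np.ndarray:
    """Placeholder: would return the CLIP text embedding of the prompt."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal(EMB_DIM)

def diffusion_prior(text_emb: np.ndarray) -> np.ndarray:
    """Placeholder: would denoise toward a matching CLIP *image* embedding."""
    return text_emb  # identity stand-in

def diffusion_decoder(image_emb: np.ndarray, size=(64, 64, 3)) -> np.ndarray:
    """Placeholder: would generate pixels conditioned on the image embedding."""
    rng = np.random.default_rng(int(abs(image_emb).sum() * 1e3) % (2**32))
    return rng.random(size)

img = diffusion_decoder(diffusion_prior(clip_text_encoder(
    "Medieval painting of wifi not working")))
print(img.shape)  # (64, 64, 3)
```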