Hyper-Fast Instance Segmentation
Novel Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS)
Highlights:
✅ Video instance segmentation transformer
✅ Contextual info at frame & instance level
✅ Nearly convolution-free framework
✅ New SOTA for VIS at ~70 FPS!
✅ Code & models under MIT license
More: https://bit.ly/3rCMXIn
Unified Scene Text/Layout Detection
World's first hierarchical scene text dataset + a novel detection method
Highlights:
✅ Unified detection & geometric layout analysis
✅ Hierarchical annotations in natural scenes
✅ Word-, line-, & paragraph-level annotations (sketched below)
✅ Source under CC Attribution-ShareAlike 4.0
More: https://bit.ly/3jRpezV
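To make the word/line/paragraph hierarchy concrete, here is a hypothetical Python sketch of such nested annotations; the field names and layout are illustrative assumptions, not the dataset's actual schema.
```python
# Hypothetical nesting for word/line/paragraph-level annotations.
# Field names ("paragraphs", "lines", "words", "bbox", "text") are
# illustrative assumptions, not the dataset's actual schema.
annotation = {
    "paragraphs": [
        {
            "bbox": [10, 10, 300, 120],          # paragraph-level box
            "lines": [
                {
                    "bbox": [10, 10, 300, 40],   # line-level box
                    "words": [
                        {"bbox": [10, 10, 80, 40], "text": "OPEN"},
                        {"bbox": [90, 10, 200, 40], "text": "DAILY"},
                    ],
                },
            ],
        },
    ],
}

# Walking the hierarchy: every word belongs to exactly one line and paragraph.
for paragraph in annotation["paragraphs"]:
    for line in paragraph["lines"]:
        print(" ".join(word["text"] for word in line["words"]))
```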
#Oculus' new Hand Tracking
Hands are able to move as naturally and intuitively in the #metaverse as they do in real life
Highlights:
✅ Hands 2.0 powered by CV & ML
✅ Tracking hand-over-hand interactions
✅ Crossing hands, clapping, high-fives
✅ Accurate thumbs-up gesture
More: https://bit.ly/3JXPvY2
New SOTA in #3D human avatars
PHORHUM: photorealistic 3D humans from monocular RGB
Highlights:
✅ Pixel-aligned method for 3D geometry (general recipe sketched below)
✅ Unshaded surface color + illumination
✅ Patch-based rendering losses for visible parts
✅ Plausible color estimation for non-visible parts
More: https://bit.ly/3MkvBrA
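A minimal sketch of the general pixel-aligned recipe such methods build on (PIFu-style projection and feature sampling under an assumed pinhole camera; PHORHUM's actual architecture differs): project a 3D query point into the image, bilinearly sample the CNN feature map there, and feed the feature plus depth to an MLP that predicts geometry and color.
```python
# PIFu-style pixel-aligned feature sampling: a sketch, not PHORHUM's exact
# network. Assumes a pinhole camera and an in-bounds projection (clamping
# and validity checks omitted for brevity).
import numpy as np

def bilinear_sample(feat, u, v):
    """feat: (H, W, C) feature map; (u, v) in pixel coordinates."""
    H, W, _ = feat.shape
    x0, y0 = int(np.floor(u)), int(np.floor(v))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    du, dv = u - x0, v - y0
    return ((1 - du) * (1 - dv) * feat[y0, x0] + du * (1 - dv) * feat[y0, x1]
            + (1 - du) * dv * feat[y1, x0] + du * dv * feat[y1, x1])

def pixel_aligned_feature(point_3d, K, feat):
    """Project a 3D point with intrinsics K, sample its image feature,
    and append depth -- the input an MLP head would consume."""
    x, y, z = point_3d
    u = K[0, 0] * x / z + K[0, 2]
    v = K[1, 1] * y / z + K[1, 2]
    return np.concatenate([bilinear_sample(feat, u, v), [z]])

K = np.array([[500.0, 0.0, 64.0], [0.0, 500.0, 64.0], [0.0, 0.0, 1.0]])
feat = np.random.randn(128, 128, 16)                     # stand-in CNN feature map
print(pixel_aligned_feature(np.array([0.05, -0.02, 2.0]), K, feat).shape)  # (17,)
```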
What's in your hands (#3D)?
Reconstructing hand-held objects from a single RGB image without knowing their 3D templates
Highlights:
✅ Hand is highly predictive of object shape
✅ Predictions conditioned on hand articulation
✅ Visual features / articulation-aware coordinates
✅ Code and models available!
More: https://bit.ly/3vuYn2a
YODO: You Only Demonstrate Once
Novel category-level manipulation learned in simulation from a single demonstration video
Highlights:
✅ One-shot IL, model-free 6D pose tracking
✅ Demonstration via a single third-person view
✅ Manipulation including high-precision tasks
✅ Category-level Behavior Cloning
✅ Attention for dynamic coordinate selection
✅ Generalizes to novel unseen objects/environments
More: https://bit.ly/3v0V4R4
Dress Code for Virtual Try-On
UniMORE (+ YOOX) unveils a novel dataset/approach for virtual try-on.
Highlights:
✅ Hi-res paired front-view / full-body images
✅ Pixel-level Semantic-Aware Discriminator
✅ 9 SOTA VTON approaches / 3 baselines
✅ New SOTA considering resolution & garments
More: https://bit.ly/3xKXSUw
Deep Equilibrium for Optical Flow
DEQ: converges faster, uses less memory, and is often more accurate
Highlights:
✅ Novel formulation of optical flow estimation (fixed-point idea sketched below)
✅ Compatible with prior modeling and data-related improvements
✅ Sparse fixed-point correction for stability
✅ Code/models under GNU Affero GPL v3.0
More: https://bit.ly/3v4fZmi
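A minimal sketch of the deep-equilibrium idea, under toy assumptions: instead of unrolling a recurrent update for a fixed number of steps, solve directly for the fixed point z* = f(z*, x). The contraction `f` below is a stand-in, not the paper's flow-update operator, and real DEQ models pair an accelerated solver with implicit differentiation for training.
```python
# Deep-equilibrium sketch: find z* with z* = f(z*, x) by naive iteration.
# `f` is a toy contraction standing in for a learned update network;
# DEQ models typically use faster solvers (e.g. Anderson acceleration).
import numpy as np

def f(z, x, W=0.5):
    return np.tanh(W * z + x)   # contractive toy update

def fixed_point(x, tol=1e-6, max_iter=100):
    z = np.zeros_like(x)
    for i in range(max_iter):
        z_next = f(z, x)
        if np.max(np.abs(z_next - z)) < tol:   # converged to the equilibrium
            return z_next, i + 1
        z = z_next
    return z, max_iter

z_star, iters = fixed_point(np.array([0.3, -0.7]))
print(z_star, "reached in", iters, "iterations")
```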
Ultra High-Resolution Neural Saliency
A novel ultra high-resolution saliency detector with dataset!
Highlights:
✅ Ultra Hi-Res Saliency Detection
✅ 5,920 pics at 4K-8K resolution
✅ Pyramid Grafting Network
✅ Cross-Model Grafting Module
✅ AGL: Attention Guided Loss
✅ Code/models under MIT
More: https://bit.ly/3MnU1Rf
StyleGAN-Human for fashion
A novel unconditional human generation method based on StyleGAN is out!
Highlights:
✅ 200,000+ labeled samples (pose/texture)
✅ 1024x512 StyleGAN-Human on StyleGAN3
✅ 512x256 StyleGAN-Human on StyleGAN1
✅ Face model for downstream tasks: InsetGAN
✅ Source code and models available!
More: https://bit.ly/3xMg5B2
OSSO: Skeletal Shape from Outside
Anatomic skeleton of a person from the 3D surface of the body
Highlights:
✅ Max Planck + IMATI-CNR + INRIA
✅ DXA images to obtain #3D shape
✅ From external body to internal skeleton
More: https://bit.ly/3v7Z5TQ
Pix2Seq: object detection by #Google
A novel framework that casts object detection as a language modeling task
Highlights:
✅ Object detection as a language-modeling task
✅ BBs/labels -> sequences of discrete tokens (sketched below)
✅ Encoder-decoder (one token at a time)
✅ Code under Apache License 2.0
More: https://bit.ly/3F49PX3
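A minimal sketch of the box-to-token step, assuming normalized [ymin, xmin, ymax, xmax] coordinates and a vocabulary whose first n_bins ids are coordinate bins; the bin count and vocabulary layout are illustrative, not the paper's exact configuration.
```python
# Quantize a bounding box + class label into discrete tokens, Pix2Seq-style.
# N_BINS and the vocabulary layout (coordinate bins first, classes after)
# are illustrative assumptions.
N_BINS = 1000

def box_to_tokens(box, class_id, n_bins=N_BINS):
    """box: normalized [ymin, xmin, ymax, xmax] -> 5 discrete tokens."""
    coord_tokens = [min(int(c * n_bins), n_bins - 1) for c in box]
    return coord_tokens + [n_bins + class_id]   # class ids follow coord bins

def tokens_to_box(tokens, n_bins=N_BINS):
    """Invert the quantization (up to bin resolution)."""
    *coords, cls = tokens
    return [(t + 0.5) / n_bins for t in coords], cls - n_bins

seq = box_to_tokens([0.1, 0.2, 0.5, 0.8], class_id=3)
print(seq)                 # [100, 200, 500, 800, 1003]
print(tokens_to_box(seq))  # ~original box, class 3
```
A decoder trained on such sequences can then emit boxes and labels one token at a time, exactly like text generation.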
Generalizable Neural Performer
General neural framework to synthesize free-viewpoint images of arbitrary human performers
Highlights:
✅ Free-viewpoint synthesis of humans
✅ Implicit Geometric Body Embedding
✅ Screen-Space Occlusion-Aware Blending
✅ GeneBody: 4M frames, multi-view cams
More: https://cutt.ly/SGcnQzn
Tire-defect inspection
Unsupervised detection of defects in tires using neural networks
Highlights:
✅ Impurity, same material as the tire
✅ Impurity of a different material
✅ Damage from temperature/pressure
✅ Crack or etched material
More: https://bit.ly/37GX1JT
#4D Neural Fields
4D neural-field visual representations from monocular RGB-D
Highlights:
✅ 4D scene completion (occlusions)
✅ Scene completion in cluttered scenes
✅ Novel #AI for contextual point clouds
✅ Data, code, models under MIT license
More: https://cutt.ly/6GveKiJ
Largest dataset of human-object interactions
BEHAVE by Google: largest dataset of human-object interactions
Highlights:
✅ 8 subjects, 20 objects, 5 environments
✅ 321 clips with 4 Kinect RGB-D cameras
✅ Masks and segmented point clouds
✅ 3D SMPL & mesh registration
✅ Textured scan reconstructions
More: https://bit.ly/3Lx6NNo
ENARF-GAN Neural Articulations
Unsupervised method for 3D geometry-aware representation of articulated objects
Highlights:
✅ Novel efficient neural representation
✅ Tri-plane deformation fields for training
✅ Novel GAN for articulated representations
✅ Controllable 3D from real unlabeled images
More: https://bit.ly/3xYqedN
HuMMan: 4D human dataset
HuMMan: 4D dataset with 1000 humans, 400k sequences & 60M frames
Highlights:
✅ RGB, point clouds, keypoints, SMPL, texture
✅ Mobile device in the sensor suite
✅ 500+ actions to cover movements
More: https://bit.ly/3vTRW8Z
Neighborhood Attention Transformer
A novel transformer for both image classification and downstream vision tasks
Highlights:
✅ Neighborhood Attention (NA), sketched below
✅ Neighborhood Attention Transformer (NAT)
✅ Faster training/inference, good throughput
✅ Checkpoints, training code, #CUDA kernel available
More: https://bit.ly/3F5aVSo
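A minimal 1-D NumPy sketch of the neighborhood-attention idea, under simplifying assumptions (the paper's NA operates on 2-D feature maps with an efficient CUDA kernel): each query attends only to keys inside a small window around its own position, rather than to all keys as in global self-attention.
```python
# 1-D toy version of neighborhood attention: softmax attention restricted
# to a +/- radius window per query. Illustrative only; the real NA module
# is 2-D and multi-headed.
import numpy as np

def neighborhood_attention_1d(q, k, v, radius=1):
    """q, k, v: (seq_len, dim). Each position attends to its neighbors."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        scores = q[i] @ k[lo:hi].T / np.sqrt(d)   # local attention logits
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                  # softmax over the window
        out[i] = weights @ v[lo:hi]
    return out

x = np.random.randn(8, 4)
print(neighborhood_attention_1d(x, x, x, radius=2).shape)  # (8, 4)
```
Restricting attention to a local neighborhood drops the cost from quadratic to linear in the number of positions, which is where the throughput gains come from.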
FANs: Fully Attentional Networks
#Nvidia unveils fully attentional networks (FANs)
Highlights:
✅ Efficient fully attentional design
✅ Semantic segmentation & object detection
✅ Models/source code available soon!
More: https://bit.ly/3vtpITs