Dress Code for Virtual Try-On
UniMORE (+ YOOX) unveils a novel dataset and approach for virtual try-on.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Hi-res paired front-view / full-body images
✅ Pixel-level Semantic-Aware Discriminator
✅ 9 SOTA VTON approaches / 3 baselines
✅ New SOTA considering resolution & garments
More: https://bit.ly/3xKXSUw
Deep Equilibrium for Optical Flow
DEQ optical flow: converges faster, uses less memory, and is often more accurate.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Novel deep-equilibrium formulation of optical flow
✅ Compatible with prior model- and data-related improvements
✅ Sparse fixed-point correction for stability
✅ Code/models under GNU Affero GPL v3.0
More: https://bit.ly/3v4fZmi
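The DEQ idea behind the speed/memory gains can be sketched as a fixed-point solve: instead of unrolling many refinement layers, one flow-update layer is iterated to equilibrium. A minimal toy sketch in NumPy, with a contractive linear map standing in for the real flow-update cell (all names here are illustrative, not the paper's code):

```python
import numpy as np

def fixed_point_solve(f, x, z0, max_iter=100, tol=1e-6):
    """Naive fixed-point solver: iterate z <- f(z, x) until convergence.

    DEQ models define their output implicitly as the fixed point of a
    layer f, rather than by stacking many explicit layers.
    """
    z = z0
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Toy contractive "layer": a damped linear update (a stand-in for the
# recurrent flow-update cell in a DEQ flow network).
W = np.array([[0.5, 0.1], [0.0, 0.4]])

def layer(z, x):
    return W @ z + x

x = np.array([1.0, 2.0])
z_star = fixed_point_solve(layer, x, np.zeros(2))
# At the fixed point, z* = W z* + x, i.e. z* = (I - W)^(-1) x
```

Real DEQ implementations use faster solvers (e.g. Anderson acceleration) and differentiate through the fixed point implicitly, which is where the memory savings come from.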
Ultra High-Resolution Neural Saliency
A novel ultra high-resolution saliency detector, with a dataset!
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Ultra hi-res saliency detection
✅ 5,920 images at 4K-8K resolution
✅ Pyramid Grafting Network
✅ Cross-Model Grafting Module
✅ AGL: Attention-Guided Loss
✅ Code/models under MIT license
More: https://bit.ly/3MnU1Rf
StyleGAN-Human for fashion
A novel unconditional human-generation model based on StyleGAN is out!
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ 200,000+ labeled samples (pose/texture)
✅ 1024x512 StyleGAN-Human based on StyleGAN3
✅ 512x256 StyleGAN-Human based on StyleGAN1
✅ Face model for downstream tasks: InsetGAN
✅ Source code and models available!
More: https://bit.ly/3xMg5B2
OSSO: Skeletal Shape from Outside
Anatomical skeleton of a person inferred from the 3D surface of the body.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Max Planck + IMATI-CNR + INRIA
✅ DXA images to obtain #3D shape
✅ From external body surface to internal skeleton
More: https://bit.ly/3v7Z5TQ
Pix2Seq: object detection by #Google
A novel framework that casts object detection as a language-modeling task.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Object detection as a language-modeling task
✅ BBoxes/labels -> sequence of discrete tokens
✅ Encoder-decoder emits one token at a time
✅ Code under Apache License 2.0
More: https://bit.ly/3F49PX3
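The "boxes/labels -> tokens" step can be illustrated by quantizing box coordinates into a discrete vocabulary that a decoder can emit one token at a time. A minimal sketch, assuming a coordinate vocabulary of `num_bins` bins followed by class tokens (bin count and vocabulary layout are illustrative, not the paper's exact choices):

```python
def bbox_to_tokens(bbox, label_id, num_bins=1000, image_size=640):
    """Quantize a box (xmin, ymin, xmax, ymax in pixels) into discrete tokens.

    Each coordinate maps to one of num_bins bins; the class label gets a
    token id placed after the coordinate vocabulary.
    """
    coord_tokens = [
        min(int(c / image_size * num_bins), num_bins - 1) for c in bbox
    ]
    return coord_tokens + [num_bins + label_id]

tokens = bbox_to_tokens((32.0, 64.0, 320.0, 480.0), label_id=3)
# tokens -> [50, 100, 500, 750, 1003]
```

Detection then reduces to autoregressive sequence generation over these tokens, so the same encoder-decoder machinery used for text applies unchanged.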
Generalizable Neural Performer
A general neural framework to synthesize free-viewpoint images of arbitrary human performers.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Free-viewpoint synthesis of humans
✅ Implicit Geometric Body Embedding
✅ Screen-Space Occlusion-Aware Blending
✅ GeneBody: 4M frames, multi-view cameras
More: https://cutt.ly/SGcnQzn
Tire-defect inspection
Unsupervised detection of tire defects using neural networks.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Impurity of the same material as the tire
✅ Impurity of a different material
✅ Damage from temperature/pressure
✅ Cracked or etched material
More: https://bit.ly/37GX1JT
#4D Neural Fields
4D neural-field visual representations from monocular RGB-D.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ 4D scene completion (occlusions)
✅ Scene completion in cluttered scenes
✅ Novel #AI for contextual point clouds
✅ Data, code, models under MIT license
More: https://cutt.ly/6GveKiJ
Largest dataset of human-object interactions
BEHAVE: the largest dataset of human-object interactions.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ 8 subjects, 20 objects, 5 environments
✅ 321 clips with 4 Kinect RGB-D cameras
✅ Masks and segmented point clouds
✅ 3D SMPL & mesh registration
✅ Textured scan reconstructions
More: https://bit.ly/3Lx6NNo
ENARF-GAN: Neural Articulations
Unsupervised method for 3D geometry-aware representation of articulated objects.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Novel efficient neural representation
✅ Tri-plane deformation fields for training
✅ Novel GAN for articulated representations
✅ Controllable 3D from real, unlabeled pictures
More: https://bit.ly/3xYqedN
HuMMan: 4D human dataset
HuMMan: a 4D dataset with 1,000 humans, 400k sequences & 60M frames.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ RGB, point clouds, keypoints, SMPL, texture
✅ Mobile device in the sensor suite
✅ 500+ actions to cover typical movements
More: https://bit.ly/3vTRW8Z
Neighborhood Attention Transformer
A novel transformer for both image classification and downstream vision tasks.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Neighborhood Attention (NA)
✅ Neighborhood Attention Transformer (NAT)
✅ Faster training/inference, good throughput
✅ Checkpoints, training code, #CUDA kernel available
More: https://bit.ly/3F5aVSo
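The core Neighborhood Attention idea: each query attends only to its nearest neighbors instead of the whole sequence, which caps the cost per token. A simplified 1D sketch in NumPy, without the learned QKV projections, multiple heads, and 2D windows the actual NAT uses:

```python
import numpy as np

def neighborhood_attention_1d(x, k=3):
    """Each token attends only to its k nearest neighbors (a sliding
    window), instead of all tokens as in full self-attention."""
    n, d = x.shape
    out = np.zeros_like(x)
    half = k // 2
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        window = x[lo:hi]                    # neighborhood keys/values
        scores = window @ x[i] / np.sqrt(d)  # dot-product attention scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()             # softmax over the window
        out[i] = weights @ window
    return out

x = np.random.default_rng(0).normal(size=(8, 4))
y = neighborhood_attention_1d(x, k=3)
```

Because the window size is fixed, cost grows linearly with sequence length rather than quadratically, which is where the throughput claims come from.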
FANs: Fully Attentional Networks
#Nvidia unveils fully attentional networks (FANs).
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Efficient fully attentional design
✅ Semantic segmentation & object detection
✅ Models/source code available soon!
More: https://bit.ly/3vtpITs
Open-Source DALL·E 2 is out
#PyTorch implementation of DALL-E 2, #OpenAI's latest text-to-image neural network.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ SOTA for text-to-image generation
✅ Source code/model under MIT License
✅ "Medieval painting of wifi not working"
More: https://bit.ly/3vzsff6
ViTPose: Transformer for Pose
ViTPose from the ViTAE team: a plain ViT for human pose estimation.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Plain, non-hierarchical ViT for pose
✅ Deconvolution layers after the ViT for keypoints
✅ Even the baseline sets a new SOTA
✅ Source code & models available soon!
More: https://bit.ly/3MJ0kz1
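Heads of this kind typically emit one heatmap per keypoint (the deconv layers upsample the ViT features), and decoding is an argmax plus a rescale by the feature stride. A minimal NumPy sketch of that decoding step (stride and shapes are illustrative, not ViTPose's exact configuration):

```python
import numpy as np

def heatmaps_to_keypoints(heatmaps, stride=4):
    """Decode keypoint heatmaps of shape (K, H, W) to (x, y) image
    coordinates: take each heatmap's peak, then scale by the stride
    between heatmap resolution and input resolution."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1)
    idx = flat.argmax(axis=1)
    ys, xs = np.divmod(idx, W)
    return np.stack([xs, ys], axis=1) * stride

hm = np.zeros((2, 64, 48))
hm[0, 10, 20] = 1.0   # keypoint 0 peaks at (x=20, y=10) on the heatmap
hm[1, 30, 5] = 1.0    # keypoint 1 peaks at (x=5,  y=30)
kpts = heatmaps_to_keypoints(hm)
# kpts -> [[80, 40], [20, 120]] in input-image pixels
```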
Unsupervised HD Motion Transfer
Novel end-to-end unsupervised motion transfer for image animation.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ TPS motion estimation + dropout
✅ Novel end-to-end unsupervised motion transfer
✅ Optical flow + multi-resolution occlusion masks
✅ Code and models under MIT license
More: https://bit.ly/3MGNPns
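Thin-plate-spline (TPS) motion estimation warps points with a smooth spline fitted to pairs of control points. A self-contained NumPy sketch of the classic TPS solve (this is the textbook formulation, not the paper's exact parameterization):

```python
import numpy as np

def tps_warp(points, src_ctrl, dst_ctrl, reg=1e-6):
    """Thin-plate spline warp of 2D points, driven by control-point pairs.
    Fits the spline mapping src_ctrl -> dst_ctrl, then applies it."""
    def rbf(d2):
        # TPS radial basis U(r) = r^2 log(r^2), with U(0) = 0
        return np.where(d2 == 0, 0.0, d2 * np.log(np.maximum(d2, 1e-12)))

    n = len(src_ctrl)
    d2 = ((src_ctrl[:, None] - src_ctrl[None]) ** 2).sum(-1)
    K = rbf(d2) + reg * np.eye(n)            # regularized kernel matrix
    P = np.hstack([np.ones((n, 1)), src_ctrl])
    # Standard TPS linear system: [K P; P^T 0] [w; a] = [dst; 0]
    L = np.zeros((n + 3, n + 3))
    L[:n, :n], L[:n, n:], L[n:, :n] = K, P, P.T
    params = np.linalg.solve(L, np.vstack([dst_ctrl, np.zeros((3, 2))]))
    w, a = params[:n], params[n:]            # non-affine + affine parts

    q2 = ((points[:, None] - src_ctrl[None]) ** 2).sum(-1)
    return rbf(q2) @ w + np.hstack([np.ones((len(points), 1)), points]) @ a

src = np.array([[0.0, 0], [1, 0], [0, 1], [1, 1]])
dst = src + 0.1  # pure translation as a sanity check
warped = tps_warp(np.array([[0.5, 0.5]]), src, dst)
# warped -> [[0.6, 0.6]]: a rigid control-point motion is reproduced exactly
```

In the motion-transfer setting, the control points come from learned keypoints on the driving and source images, and the resulting dense TPS flow warps source features.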
Neural Self-Calibration in the wild
A learning algorithm to regress camera-calibration parameters from in-the-wild clips.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Parameters purely from self-supervision
✅ Self-supervised depth/pose learning as the objective
✅ Perspective, fisheye, catadioptric: no changes needed
✅ SOTA results on the EuRoC MAV dataset
More: https://bit.ly/3w1n6LB
ConDor: Self-Supervised Canonicalization
Self-supervised canonicalization for full/partial 3D point clouds.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ RRC + Stanford + KAIST + Brown
✅ Built on top of Tensor Field Networks (TFNs)
✅ Unseen 3D shapes -> equivariant canonical frame
✅ Co-segmentation with NO supervision
✅ Code and models under MIT license
More: https://bit.ly/3MNDyGa
Event-aided Direct Sparse Odometry
EDS: direct monocular visual odometry using events and frames.
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Monocular 6-DOF visual odometry + events
✅ Direct photometric bundle adjustment
✅ Camera motion tracking from sparse pixels
✅ A new dataset with HQ events and frames
More: https://bit.ly/3s9FiBN