HuMMan: 4D human dataset
HuMMan: a 4D dataset with 1,000 humans, 400k sequences & 60M frames
Highlights:
✅ RGB, point clouds, keypoints, SMPL, texture
✅ Mobile device in the sensor suite
✅ 500+ actions covering human movements
More: https://bit.ly/3vTRW8Z
Neighborhood Attention Transformer
A novel transformer for both image classification and downstream vision tasks
Highlights:
✅ Neighborhood Attention (NA)
✅ Neighborhood Attention Transformer (NAT)
✅ Faster training/inference with good throughput
✅ Checkpoints, training code & #CUDA kernel available
More: https://bit.ly/3F5aVSo
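Neighborhood Attention (NA), the first highlight above, restricts each query to a fixed-size window of its nearest keys, clamping the window at the borders so every token attends to the same number of neighbors. A minimal 1D NumPy sketch of that idea (the real NAT operates on 2D feature maps with a tiled CUDA kernel; the function names here are mine):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def neighborhood_attention_1d(q, k, v, window=3):
    """Each query attends only to its `window` nearest keys,
    with the window clamped at sequence borders (as in NA)."""
    n, d = q.shape
    half = window // 2
    out = np.zeros_like(v)
    for i in range(n):
        # clamp so every query sees exactly `window` keys
        start = min(max(i - half, 0), n - window)
        idx = slice(start, start + window)
        scores = q[i] @ k[idx].T / np.sqrt(d)   # (window,)
        out[i] = softmax(scores) @ v[idx]
    return out
```

Setting `window` to the full sequence length recovers ordinary self-attention, which is a convenient sanity check.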
FANs: Fully Attentional Networks
#Nvidia unveils Fully Attentional Networks (FANs)
Highlights:
✅ Efficient fully attentional design
✅ Semantic segmentation & object detection
✅ Model/source code available soon!
More: https://bit.ly/3vtpITs
Open-Source DALL·E 2 is out
#PyTorch implementation of DALL·E 2, #OpenAI's latest text-to-image neural net
Highlights:
✅ SOTA for text-to-image generation
✅ Source code/model under MIT License
✅ "Medieval painting of wifi not working"
More: https://bit.ly/3vzsff6
ViTPose: Transformer for Pose
ViTPose from the ViTAE team: a plain ViT for human pose estimation
Highlights:
✅ Plain, non-hierarchical ViT backbone for pose
✅ Deconvolution layers after the ViT for keypoints
✅ Even the baseline is the new SOTA
✅ Source code & models available soon!
More: https://bit.ly/3MJ0kz1
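The deconvolution head mentioned above produces one heatmap per keypoint; a generic way to read coordinates out of such heatmaps (a common baseline readout, not necessarily the paper's exact decoding) is a per-channel argmax:

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Turn (K, H, W) per-keypoint heatmaps into (K, 2) (x, y)
    coordinates plus confidences via a simple argmax readout."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1)
    idx = flat.argmax(axis=1)
    ys, xs = np.unravel_index(idx, (H, W))
    coords = np.stack([xs, ys], axis=1).astype(float)
    conf = flat.max(axis=1)          # peak value as confidence
    return coords, conf
```

Production decoders typically refine the argmax with sub-pixel interpolation around the peak, but the readout above captures the core step.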
Unsupervised HD Motion Transfer
Novel end-to-end unsupervised motion transfer for image animation
Highlights:
✅ Thin-plate spline (TPS) motion estimation + dropout
✅ Novel end-to-end unsupervised motion transfer
✅ Optical flow + multi-resolution occlusion masks
✅ Code and models under MIT license
More: https://bit.ly/3MGNPns
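A thin-plate spline (TPS), named in the first highlight, is the classic smooth warp that interpolates a set of control-point correspondences exactly while minimizing bending energy. A NumPy sketch of the standard fit/apply step (an illustration of TPS itself, not the paper's learned motion estimator):

```python
import numpy as np

def tps_fit(src, dst):
    """Fit 2D thin-plate-spline warp parameters mapping control
    points `src` (N, 2) onto `dst` (N, 2)."""
    n = src.shape[0]
    d = np.linalg.norm(src[:, None] - src[None, :], axis=-1)
    K = np.where(d > 0, d**2 * np.log(d + 1e-12), 0.0)  # U(r) = r^2 log r
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K
    A[:n, n:] = P
    A[n:, :n] = P.T
    b = np.vstack([dst, np.zeros((3, 2))])
    return np.linalg.solve(A, b)      # (n+3, 2): kernel weights + affine

def tps_apply(params, src, pts):
    """Warp query points `pts` (M, 2) with fitted TPS params."""
    d = np.linalg.norm(pts[:, None] - src[None, :], axis=-1)
    U = np.where(d > 0, d**2 * np.log(d + 1e-12), 0.0)
    P = np.hstack([np.ones((pts.shape[0], 1)), pts])
    return U @ params[:src.shape[0]] + P @ params[src.shape[0]:]
```

Fitting to control points displaced by a pure translation recovers that translation everywhere, since affine maps have zero bending energy.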
Neural Self-Calibration in the wild
A learning algorithm to regress camera calibration parameters from in-the-wild clips
Highlights:
✅ Parameters learned purely from self-supervision
✅ Self-supervised depth/pose learning as the objective
✅ Perspective, fisheye, catadioptric: no changes needed
✅ SOTA results on the EuRoC MAV dataset
More: https://bit.ly/3w1n6LB
ConDor: Self-Supervised Canonicalization
Self-supervised canonicalization for full/partial 3D point clouds
Highlights:
✅ RRC + Stanford + KAIST + Brown
✅ Built on top of Tensor Field Networks (TFNs)
✅ Unseen 3D shapes -> equivariant canonical frame
✅ Co-segmentation with NO supervision
✅ Code and model under MIT license
More: https://bit.ly/3MNDyGa
Event-aided Direct Sparse Odometry
EDS: direct monocular visual odometry using events and frames
Highlights:
✅ Monocular 6-DOF visual odometry + events
✅ Direct photometric bundle adjustment
✅ Camera motion tracking from sparse pixels
✅ A new dataset with HQ events and frames
More: https://bit.ly/3s9FiBN
BlobGAN: Blob-Disentangled Scenes
Unsupervised, mid-level (blob) representation for scene generation
Highlights:
✅ Spatial, depth-ordered Gaussian blobs
✅ Reaching supervised-level performance, and more
✅ Source under the BSD-2 "Simplified" License
More: https://bit.ly/3kRyGnj
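The blobs above are depth-ordered 2D Gaussians splatted into the scene and composited so nearer blobs occlude farther ones. A toy sketch with isotropic blobs (the actual model uses learned per-blob features and anisotropic, oriented blobs; these helpers are illustrative only):

```python
import numpy as np

def gaussian_blob(h, w, cx, cy, sigma, scale=1.0):
    """Opacity map of one isotropic 2D Gaussian blob on an h x w grid."""
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return scale * np.exp(-d2 / (2 * sigma ** 2))

def composite_blobs(h, w, blobs):
    """Alpha-composite depth-ordered blobs (front first) into one
    opacity map: nearer blobs occlude farther ones."""
    out = np.zeros((h, w))
    remaining = np.ones((h, w))          # transmittance so far
    for cx, cy, sigma, scale in blobs:   # front-to-back order
        a = np.clip(gaussian_blob(h, w, cx, cy, sigma, scale), 0, 1)
        out += remaining * a
        remaining *= 1 - a
    return out
```

Because each blob only absorbs the remaining transmittance, the composited opacity never exceeds one, which is what lets the blobs be reordered or moved independently to edit the scene.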
E2EVE: editing via a pre-trained artist
E2EVE generates a new version of the source image that resembles the "driver" one
Highlights:
✅ Blending regions guided by the driver image
✅ End-to-end conditional probability of the edits
✅ Self-supervised augmentation in the target domain
✅ Implemented as a SOTA transformer
✅ Code/models available (soon)
More: https://bit.ly/3P9TDYW
Bringing pets into the #metaverse
ARTEMIS: a pipeline for generating articulated neural pets for virtual worlds
Highlights:
✅ ARTiculated, appEarance, Mo-synthesIS
✅ Motion control, animation & rendering
✅ Neural-generated (NGI) animal engine
✅ SOTA animal mocap + neural control
More: https://bit.ly/3LZSLDU
An animated hand in 1972, damn romantic
Q: is #VR the technology that has developed the least in the last 30 years?
More: https://bit.ly/3snxNaq
Ensembling models for GAN training
Pretrained vision models to improve GAN training: FID improved by 1.5 to 2×!
Highlights:
✅ CV models as an ensemble of discriminators
✅ Improves GANs in limited-data and large-scale settings
✅ 10k samples match StyleGAN2 trained with 1.6M
✅ Source code/models under MIT license
More: https://bit.ly/3wgUVsr
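The idea above is to attach small discriminator heads to several frozen pretrained backbones and average their GAN losses. A toy sketch of that averaging with the non-saturating loss, treating each discriminator as a plain callable from a batch to logits (a simplification of the setup, not the authors' training code):

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0)

def ensemble_gan_loss(discriminators, real, fake):
    """Non-saturating GAN losses averaged over an ensemble of
    discriminators (each a callable: batch -> logits)."""
    d_loss = g_loss = 0.0
    for d in discriminators:
        # discriminator: push real logits up, fake logits down
        d_loss += np.mean(softplus(-d(real))) + np.mean(softplus(d(fake)))
        # generator: push fake logits up
        g_loss += np.mean(softplus(-d(fake)))
    n = len(discriminators)
    return d_loss / n, g_loss / n
```

In the actual method the backbones stay frozen and only the small heads are trained, so the ensemble adds little cost while giving the generator several complementary feature spaces to fool.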
Cooperative Driving + AUTOCASTSIM
COOPERNAUT: cross-vehicle perception for vision-based cooperative driving
Highlights:
✅ UTexas + #Stanford + #Sony #AI
✅ LiDAR encoded into compact point-based representations
✅ Network-augmented simulator
✅ Source code and models available
More: https://bit.ly/3sr5HLk
NeuralHDHair: 3D Neural Hair
NeuralHDHair: a fully automatic system for modeling HD hair from a single image
Highlights:
✅ IRHairNet for hair geometric features
✅ GrowingNet: growing 3D hair strands in parallel
✅ VIFu: a novel voxel-aligned implicit function
✅ SOTA in 3D hair modeling from a single picture
More: https://bit.ly/38iR0mQ
DyNeRF: Neural 3D Video Synthesis
#Meta unveils DyNeRF, a novel method for rendering HQ 3D video
Highlights:
✅ Novel NeRF based on temporal latent codes
✅ Novel hierarchical training strategy
✅ Dataset of time-synchronized, calibrated clips
✅ Released under Attribution-NonCommercial 4.0 International
More: https://bit.ly/3MlBRA9
GATO: one agent for multiple tasks
The same network with the same weights can play Atari, caption pictures, chat, and more
Highlights:
✅ General-purpose agent for multiple tasks
✅ Multi-modal, multi-task, multi-embodiment
✅ Inspired by large-scale language models
More: https://bit.ly/3LbBOWb
NeRF powered by keypoints
ETHZ + META unveil how to encode relative spatial #3D info via sparse 3D keypoints
Highlights:
✅ Sparse 3D keypoints for SOTA avatars
✅ Unseen subjects from just 2-3 views
✅ Never-before-seen iPhone captures
More: https://bit.ly/39NQqhe
Self-Supervised Human Co-Evolution
Self-supervised 3D pose via co-evolution of a pose estimator, imitator, and hallucinator
Highlights:
✅ Novel self-supervised 3D pose estimation
✅ Co-evolution of pose estimator, imitator, hallucinator
✅ Realistic 3D poses and 2D-3D supervision
✅ Source code/model under MIT license
More: https://bit.ly/37J5ImL