Long Video Diffusion Models
👉 #Google unveils a novel diffusion model for video generation
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Straightforward extension of the 2D UNet
✅ Longer videos via a new conditional generation scheme
✅ SOTA in unconditional generation
More: https://bit.ly/35Y2rzg
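
For intuition only: in diffusion training, a clip is just a 5-D tensor, so a 2D UNet inflated over the time axis drops straight into the standard noise-prediction objective. A minimal PyTorch sketch, assuming a hypothetical video UNet `model(noisy, t)` and a linear schedule (not the paper's code):

```python
import torch
import torch.nn.functional as F

T_STEPS = 1000
betas = torch.linspace(1e-4, 0.02, T_STEPS)      # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # cumulative product of (1 - beta)

def diffusion_loss(model, video):
    """One DDPM-style training step; video: (batch, channels, frames, H, W)."""
    b = video.shape[0]
    t = torch.randint(0, T_STEPS, (b,), device=video.device)
    a = alphas_bar.to(video.device)[t].view(b, 1, 1, 1, 1)
    noise = torch.randn_like(video)
    noisy = a.sqrt() * video + (1 - a).sqrt() * noise  # forward process q(x_t | x_0)
    return F.mse_loss(model(noisy, t), noise)          # the UNet predicts the noise
```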

AutoRF: #3D objects in-the-wild
👉 From #Meta: a #3D object from just a single in-the-wild image
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Novel view synthesis from in-the-wild images
✅ Normalized, object-centric representation
✅ Disentangling shape, appearance & pose
✅ Exploiting bounding boxes & panoptic segmentation
✅ Shape/appearance properties for objects
More: https://bit.ly/3O4ONeQ

GAN-based Darkest Dataset
👉 Berkeley + #Intel announce the first photorealistic dataset captured under starlight (no moon, <0.001 lx)
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ "Darkest" dataset ever seen
✅ Moonless, no external illumination
✅ GAN-tuned physics-based model
✅ Clips with dancing, volleyball, flags...
More: https://bit.ly/3LXxMkN
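
Loosely, "GAN-tuned physics-based model" suggests a parametric sensor-noise model whose parameters are fit adversarially against real low-light footage. A toy sketch with invented components (the paper's actual noise model is richer):

```python
import torch

class PhysicsNoise(torch.nn.Module):
    """Toy sensor model: signal-dependent shot noise + constant read noise."""
    def __init__(self):
        super().__init__()
        self.log_gain = torch.nn.Parameter(torch.tensor(0.0))   # shot-noise gain
        self.log_read = torch.nn.Parameter(torch.tensor(-3.0))  # read-noise std

    def forward(self, clean):
        shot = torch.randn_like(clean) * (clean.clamp(min=0) * self.log_gain.exp()).sqrt()
        read = torch.randn_like(clean) * self.log_read.exp()
        return clean + shot + read   # synthetic low-light observation
```

A discriminator comparing such synthetic frames against real starlight captures would then push the parameters toward realistic noise.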

Populating scenes with digital humans
👉 ETHZ unveils GAMMA to populate #3D scenes with digital humans
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ GenerAtive Motion primitive MArkers
✅ Realistic, controllable, infinite motions
✅ Tree-based search to preserve quality
✅ SOTA in realistic/controllable motion
More: https://bit.ly/3OgY4AG

🔥 #AIwithPapers: we are ~2,000! 🔥
Simply amazing. Thank you all!
👉 Invite your friends -> https://t.me/AI_DeepLearning

GARF: Gaussian Activated NeRF
👉 GARF: Gaussian Activated Radiance Fields for high-fidelity reconstruction and pose estimation
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ NeRF from imperfect camera poses
✅ NO hyper-parameter tuning/initialization
✅ Theoretical insight on Gaussian activation
✅ Unlocking NeRF for real-world application?
More: https://bit.ly/36bvdfU
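
The core idea is easy to picture: replace the usual positional encoding + ReLU in the NeRF MLP with a Gaussian activation. A minimal sketch; width, depth, and sigma here are placeholders, not the paper's settings:

```python
import torch

class GaussianActivation(torch.nn.Module):
    """Gaussian bump instead of ReLU: smooth, with well-behaved gradients."""
    def __init__(self, sigma: float = 0.1):
        super().__init__()
        self.sigma = sigma

    def forward(self, x):
        return torch.exp(-x.pow(2) / (2 * self.sigma ** 2))

garf_mlp = torch.nn.Sequential(
    torch.nn.Linear(3, 256), GaussianActivation(),   # raw xyz in, no positional encoding
    torch.nn.Linear(256, 256), GaussianActivation(),
    torch.nn.Linear(256, 4),                         # RGB + density out
)
```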

Novel pre-training strategy for #AI
👉 EPFL unveils the Multi-modal Multi-task Masked Autoencoders (MultiMAE)
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Multi-modal: additional modalities beyond RGB
✅ Multi-task: multiple output tasks beyond RGB
✅ General: MultiMAE by pseudo-labeling
✅ Classification, segmentation, depth
✅ Code under the CC NonCommercial 4.0 International license
More: https://bit.ly/3jRhNsN
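
In spirit the pre-training is plain masked autoencoding, with the token pool shared across modalities. A minimal sketch of the masking step; the shapes, modality names, and the missing encoder/decoders are assumptions:

```python
import torch

def visible_indices(num_tokens: int, keep_ratio: float = 0.25) -> torch.Tensor:
    """Randomly choose which tokens stay visible to the encoder."""
    return torch.randperm(num_tokens)[: int(num_tokens * keep_ratio)]

# one 14x14 patch grid per modality, each patch embedded to 768-D
tokens = {"rgb": torch.randn(196, 768),
          "depth": torch.randn(196, 768),
          "semseg": torch.randn(196, 768)}

pool = torch.cat(list(tokens.values()))          # (588, 768) cross-modal token pool
visible = pool[visible_indices(pool.shape[0])]   # only these enter the shared encoder
# one light decoder per task then reconstructs the masked tokens of its modality
```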

A new SOTA in Dataset Distillation
👉 A new approach via Matching Training Trajectories is out!
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Distilling a small dataset "to match" a bigger one
✅ Distilled data used to guide a network
✅ Matching trajectories of experts trained on real data
✅ SOTA + distilling higher-res visual data
More: https://bit.ly/3JwYOxW
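
The objective fits in a few lines: train a student for N steps on the distilled data starting from an expert checkpoint, then penalize its distance to where the expert actually landed M real-data steps later. A sketch in flattened-parameter form (variable names are mine):

```python
import torch

def trajectory_matching_loss(student: torch.Tensor,
                             expert_start: torch.Tensor,
                             expert_target: torch.Tensor) -> torch.Tensor:
    """All inputs are flat parameter vectors.

    student:       weights after N steps on the distilled data (from expert_start)
    expert_start:  expert checkpoint the student was initialized from
    expert_target: the same expert after M further steps on real data
    """
    num = (student - expert_target).pow(2).sum()
    den = (expert_start - expert_target).pow(2).sum()
    return num / den   # normalized; gradients flow back into the distilled images
```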

Two-Hand tracking via GCN
👉 The first-ever GCN for two interacting hands in a single RGB image
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Reconstruction by GCN mesh regression
✅ PIFA: pyramid image-feature attention for local occlusion
✅ CHA: cross-hand attention for interaction
✅ SOTA + generalization to in-the-wild scenarios
✅ Source code available under GNU 🤯
More: https://bit.ly/3KH5FWO
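
For readers new to mesh regression with GCNs: each hand vertex carries a feature vector, and a graph convolution mixes it with its mesh neighbors before regressing 3D coordinates. A bare-bones layer (not the paper's PIFA/CHA blocks):

```python
import torch

class GraphConv(torch.nn.Module):
    def __init__(self, in_dim: int, out_dim: int, adj: torch.Tensor):
        super().__init__()
        self.adj = adj                  # (V, V) normalized mesh adjacency matrix
        self.lin = torch.nn.Linear(in_dim, out_dim)

    def forward(self, x):               # x: (V, in_dim) per-vertex features
        return torch.relu(self.adj @ self.lin(x))   # aggregate over neighbors
```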

Video K-Net: SOTA in Segmentation
👉 Simple, strong, and unified framework for fully end-to-end video panoptic segmentation
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Learnable kernels from K-Net
✅ K-Net learns to segment & track
✅ Appearance & cross-temporal kernel interaction
✅ New SOTA without bells and whistles 🤷‍♂️
More: https://bit.ly/3uEEZQR

DeepLabCut: tracking animals in the wild
👉 A toolbox for markerless pose estimation of animals performing various tasks
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Multi-animal pose estimation
✅ Datasets for multi-animal pose
✅ Key-points, limbs, animal identity
✅ Optimal key-points without user input
More: https://bit.ly/37L1mLE
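
DeepLabCut is a pip-installable toolbox, and a typical multi-animal workflow looks roughly like the following (paths and names are placeholders; exact signatures can vary across versions):

```python
import deeplabcut

# create a multi-animal project and label a few sampled frames
config = deeplabcut.create_new_project(
    "wild-herd", "me", ["videos/herd.mp4"], multianimal=True)
deeplabcut.extract_frames(config)
deeplabcut.label_frames(config)      # GUI for marking key-points per animal

# train, then run inference: key-points, limbs, and identities per animal
deeplabcut.create_training_dataset(config)
deeplabcut.train_network(config)
deeplabcut.analyze_videos(config, ["videos/herd.mp4"])
```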

Neural Articulated Human Body
👉 Novel neural implicit representation for articulated bodies
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ COmpositional Articulated People
✅ Large variety of shapes & poses
✅ Novel encoder-decoder architecture
More: https://bit.ly/3xvn7dl

2K Resolution Generative #AI
👉 Novel continuous-scale training with variable output resolutions
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Mixed-resolution data
✅ Arbitrary scales during training
✅ Generations beyond 1024×1024
✅ Variant of the FID metric for scales
✅ Source code under the MIT license
More: https://bit.ly/3uNfVY6

DS: Unsupervised Video Decomposition
👉 Novel method to extract persistent elements of a scene
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Scene element as a Deformable Sprite (DS)
✅ Deformable Sprites via a video auto-encoder
✅ Canonical texture image for appearance
✅ Non-rigid geometric transformations
More: https://bit.ly/37WV9w1

L-SVPE for Deep Deblurring
👉 L-SVPE deblurs scenes while recovering high-frequency details
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Learned Spatially Varying Pixel Exposures
✅ Next-gen focal-plane sensor + DL
✅ Deep conv decoder for motion deblurring
✅ Superior results over non-optimized exposures
More: https://bit.ly/3uRYQMT

Hyper-Fast Instance Segmentation
👉 Novel Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS)
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Video instance segmentation transformer
✅ Contextual info at frame & instance level
✅ Nearly convolution-free framework 🤷‍♂️
✅ The new SOTA for VIS at ~70 FPS!
✅ Code & models under the MIT license
More: https://bit.ly/3rCMXIn

Unified Scene Text/Layout Detection
👉 World's first hierarchical scene text dataset + a novel detection method
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Unified detection & geometric layout analysis
✅ Hierarchical annotations in natural scenes
✅ Word, line & paragraph level annotations
✅ Source under CC Attribution-ShareAlike 4.0
More: https://bit.ly/3jRpezV
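
The word → line → paragraph hierarchy is easy to picture as nested records; a hypothetical sketch of the structure (field names invented, not the dataset's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Word:
    text: str
    box: tuple[float, float, float, float]   # (x0, y0, x1, y1) in image coords

@dataclass
class Line:
    words: list[Word] = field(default_factory=list)

@dataclass
class Paragraph:                              # the geometric layout unit
    lines: list[Line] = field(default_factory=list)
```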

#Oculus' new Hand Tracking
👉 Hands move as naturally and intuitively in the #metaverse as they do in real life
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Hands 2.0 powered by CV & ML
✅ Tracking hand-over-hand interactions
✅ Crossing hands, clapping, high-fives
✅ Accurate thumbs-up gesture
More: https://bit.ly/3JXPvY2

New SOTA in #3D human avatars
👉 PHORHUM: photorealistic 3D humans from monocular RGB
𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅ Pixel-aligned method for 3D geometry
✅ Unshaded surface color + illumination
✅ Patch-based rendering losses for visible regions
✅ Plausible color estimation for non-visible regions
More: https://bit.ly/3MkvBrA