🔥 MinVIS, a new SOTA, is out 🔥
👉 #Nvidia MinVIS: video instance segmentation with no video-based architectures or training procedures 🤯 (tracking sketch below)
Highlights:
✅ Video architectures/training not required
✅ MinVIS outperforms the previous SOTA
✅ Occluded VIS (OVIS): >10% improvement
✅ 1% of labeled frames: comparable to or better than fully supervised training
More: https://bit.ly/3pcYzk1
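MinVIS is usually summarized as running a query-based image instance segmentation model per frame and then tracking instances by bipartite matching of the frame-level query embeddings. Below is a minimal sketch of that matching step, assuming cosine-distance costs and SciPy's Hungarian solver (shapes and the cost choice are illustrative, not the paper's exact code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_queries(prev_emb, curr_emb):
    """Bipartite matching of per-frame instance query embeddings.

    prev_emb, curr_emb: (N, D) query embeddings from two consecutive frames
    of an image-level instance segmentation model. Returns index pairs
    (prev_idx, curr_idx) linking instances across frames.
    """
    prev = prev_emb / np.linalg.norm(prev_emb, axis=1, keepdims=True)
    curr = curr_emb / np.linalg.norm(curr_emb, axis=1, keepdims=True)
    cost = 1.0 - prev @ curr.T          # cosine distance as matching cost
    return linear_sum_assignment(cost)  # Hungarian algorithm

# toy usage: 5 instance queries with 256-dim embeddings per frame
rng = np.random.default_rng(0)
prev_idx, curr_idx = match_queries(rng.normal(size=(5, 256)),
                                   rng.normal(size=(5, 256)))
print(list(zip(prev_idx, curr_idx)))
```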
🔥🔥 MultiNeRF: three NeRFs are out! 🔥🔥
👉 Google open-sources the code of three #CVPR2022 papers: Mip-NeRF 360, Ref-NeRF, and RawNeRF (volume-rendering refresher below)
Highlights:
✅ Paper 1: Mip-NeRF 360
✅ Paper 2: Ref-NeRF
✅ Paper 3: NeRF in the Dark (RawNeRF)
More: https://bit.ly/3QjpRRc
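All three papers build on the same NeRF volume-rendering backbone, so as a refresher here is a minimal NumPy sketch of how per-sample densities and colors along a ray are composited into a pixel (sample count and spacing are arbitrary toy values, not taken from the codebase):

```python
import numpy as np

def composite_ray(sigmas, rgbs, deltas):
    """Standard NeRF volume rendering along one ray.

    sigmas: (S,) densities, rgbs: (S, 3) colors, deltas: (S,) sample spacings.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]   # transmittance
    weights = trans * alphas
    return (weights[:, None] * rgbs).sum(axis=0)                     # composited RGB

# toy ray with 64 samples
S = 64
print(composite_ray(np.random.rand(S), np.random.rand(S, 3), np.full(S, 0.02)))
```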
⚙️ LocoProp: Neural Layers Composition ⚙️
👉 Google AI unveils LocoProp: a novel paradigm that treats a neural network as a modular composition of layers, each trained with its own local loss.
Highlights:
✅ Enhanced backprop via local loss optimization
✅ Layer-wise weight regularizer, target output, and loss
✅ Multiple local updates via first-order optimizers (sketch below)
✅ Superior performance and efficiency
More: https://bit.ly/3Q40YJn
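A rough PyTorch sketch of the layer-local update idea: each layer gets a target output and a local loss with a regularizer that keeps its weights near their previous values, minimized with several first-order steps. This is a simplified illustration under those assumptions, not Google's implementation (in particular, the target construction here is a placeholder):

```python
import torch

def local_layer_updates(layer, x, target_out, prev_weight, lam=1e-2,
                        steps=5, lr=1e-2):
    """Update one layer with its own local loss.

    layer: an nn.Linear; x: detached layer input; target_out: desired layer
    output for this step; prev_weight: snapshot of the layer's weights.
    """
    opt = torch.optim.SGD(layer.parameters(), lr=lr)   # first-order optimizer
    for _ in range(steps):                             # multiple local updates
        opt.zero_grad()
        out = layer(x)
        loss = ((out - target_out) ** 2).mean()        # match the local target
        loss = loss + lam * ((layer.weight - prev_weight) ** 2).sum()  # stay close
        loss.backward()
        opt.step()

layer = torch.nn.Linear(16, 8)
x = torch.randn(32, 16)
target = torch.randn(32, 8)                            # placeholder target output
local_layer_updates(layer, x, target, layer.weight.detach().clone())
```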
🔥 PCVOS: clip-wise mask VOS 🔥
👉 PCVOS: a new semi-supervised video object segmentation method
Highlights:
✅ Reformulates semi-supervised VOS
✅ Novel per-clip inference perspective (see the sketch below)
✅ Clip-wise operations over intra-clip frames
✅ PCVOS: a model tailored for per-clip inference
✅ New SOTA on multiple benchmarks
More: https://bit.ly/3vJtmbz
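To make "per-clip inference" concrete, here is a tiny Python skeleton: frames are grouped into fixed-length clips, each clip is segmented jointly against a reference mask, and the last predicted mask becomes the reference for the next clip. The helper names and clip length are placeholders, not PCVOS's API:

```python
def per_clip_vos(frames, first_mask, segment_clip, clip_len=5):
    """Propagate an object mask clip-by-clip instead of frame-by-frame.

    segment_clip(clip, ref_mask) -> list of masks is a placeholder for a
    model that attends jointly over all frames of a clip.
    """
    masks, ref = [first_mask], first_mask
    for start in range(1, len(frames), clip_len):
        clip = frames[start:start + clip_len]       # intra-clip frames
        clip_masks = segment_clip(clip, ref)        # clip-wise (joint) operation
        masks.extend(clip_masks)
        ref = clip_masks[-1]                        # reference for the next clip
    return masks

# dummy usage with an identity "segmenter": 11 frames -> 11 masks
dummy = per_clip_vos(list(range(11)), "m0", lambda clip, ref: [ref] * len(clip))
print(len(dummy))
```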
Open-World Object Detection via ViT
👉 Google unveils OWL-ViT: an open-vocabulary detector based on ViTs 🤯 (usage sketch below)
Highlights:
✅ ViTs for Open-World Localization
✅ From image-level pre-training to open-vocabulary detection
✅ SOTA one-shot (image-conditioned) detection
More: https://bit.ly/3Sy3jOj
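A minimal usage sketch, assuming the Hugging Face transformers port of OWL-ViT (checkpoint id, image URL, and prompts are illustrative; this is not Google's original codebase):

```python
import requests
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

ckpt = "google/owlvit-base-patch32"                      # assumed checkpoint id
processor = OwlViTProcessor.from_pretrained(ckpt)
model = OwlViTForObjectDetection.from_pretrained(ckpt)

image = Image.open(requests.get(
    "http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
texts = [["a photo of a cat", "a photo of a remote control"]]  # open vocabulary

inputs = processor(text=texts, images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# per-query class logits and predicted boxes (normalized coordinates)
print(outputs.logits.shape, outputs.pred_boxes.shape)
```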
🎹🎹 Learning Piano in #AR 🎹🎹
👉 PianoVision (on #META #Quest2) accelerates piano learning via Passthrough #AR & hand tracking
Highlights:
✅ Sheet Insight to learn sight-reading
✅ MIDI keyboard connectivity
✅ Air piano when no physical piano is available
✅ Multiplayer music instruction
✅ PianoVision Music Hall in #VR
More: https://bit.ly/3zYvwGX
EPro-PnP: Perspective-n-Point Detection
👉 EPro-PnP: a probabilistic PnP layer for general end-to-end pose estimation
Highlights:
✅ Probabilistic PnP for general end-to-end pose (toy sketch below)
✅ Top-tier 6DoF results by inserting it into CDPN
✅ Accurate detection via deformable correspondences
✅ 2D-3D correspondences learned from scratch
More: https://bit.ly/3BNPXYr
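Very loosely, "probabilistic PnP" can be pictured as scoring candidate poses by their weighted reprojection errors and softmaxing those scores into a distribution over poses, which makes the pose differentiable with respect to the 2D-3D correspondences and their weights. The toy sketch below illustrates only that intuition and is not the paper's actual Monte Carlo formulation:

```python
import numpy as np

def pose_distribution(candidate_poses, pts3d, pts2d, weights, project, temp=1.0):
    """Score candidate poses by weighted reprojection error -> soft distribution.

    project(pose, pts3d) -> (N, 2) is a placeholder camera projection;
    weights: (N,) correspondence weights (learned in the real method).
    """
    errors = []
    for pose in candidate_poses:
        resid = project(pose, pts3d) - pts2d            # (N, 2) reprojection residuals
        errors.append((weights * (resid ** 2).sum(axis=1)).sum())
    logits = -np.array(errors) / temp
    logits -= logits.max()                              # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()                          # probability per candidate pose
```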
🔥 #NVIDIA wins SIGGRAPH's Best Paper 🔥
👉 Instant #NeRF awarded a best paper at SIGGRAPH 2022!
Highlights:
✅ Speed-up of several orders of magnitude (hash-encoding sketch below)
✅ HQ neural graphics primitives in a matter of seconds
✅ Renders in tens of milliseconds at 1080p
✅ Source code and resources available!
More: https://bit.ly/3Qt8c9D
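The speed-up comes largely from the multiresolution hash encoding of 3D positions; below is a toy NumPy sketch of the spatial-hash lookup at a single level, using the XOR-of-primes hash described in the Instant-NGP paper (table size, feature width, and the missing trilinear interpolation are simplifications):

```python
import numpy as np

PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_encode(x, table, resolution):
    """Look up a feature for a 3D point at one hash-grid level (no interpolation).

    x: (3,) point in [0, 1); table: (T, F) learned feature table.
    """
    voxel = np.floor(x * resolution).astype(np.uint64)           # integer grid coords
    h = np.bitwise_xor.reduce(voxel * PRIMES) % np.uint64(len(table))
    return table[int(h)]

table = np.random.randn(2 ** 14, 2).astype(np.float32)           # T=16384 entries, F=2
print(hash_encode(np.array([0.3, 0.7, 0.1]), table, resolution=64))
```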
EasyMocap: Open Neural Mocap
👉 EasyMocap: open-source marker-less mocap with novel view synthesis from RGB
Highlights (of the most recently added paper):
✅ Editable free-viewpoint video
✅ Layered neural representation of humans
✅ Multi-person footage decomposed into instances, weakly supervised
✅ HQ neural representation of the humans
✅ Camera errors addressed via human poses
More: https://bit.ly/3p6lUDO
Texturify: Neural Textures Generator
👉 A step towards automated content creation: HQ textures generated directly on the surface of a 3D object
Highlights:
✅ TUM + Max Planck + Apple
✅ Realistic, HQ textures from 2D pics
✅ Uses 3D shape geometry, no 3D supervision
✅ 3D-aware, surface-based generation network
More: https://bit.ly/3BW7UUU
Scaling Neural Indoor Scene Rendering
👉 Neural scene rendering for indoor scenes, scalable in both training and rendering
Highlights:
✅ Neural scene rendering for indoor environments
✅ #3D space split into tiles with per-tile MLPs to scale up (sketch below)
✅ Parallel training of the tile-based MLPs
✅ View-independent components handled via a surface MLP
More: https://bit.ly/3bH94IX
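A minimal sketch of the tiling idea: partition the scene volume into a grid of tiles, give each tile its own small MLP, and route query points to the MLP of the tile they fall in. This is a conceptual illustration (class name, tile count, and MLP sizes are assumptions), not the paper's code:

```python
import torch
import torch.nn as nn

class TiledField(nn.Module):
    """Scene split into a grid of tiles, one small MLP per tile."""

    def __init__(self, tiles_per_axis=4, hidden=64, out_dim=4):
        super().__init__()
        self.t = tiles_per_axis
        self.out_dim = out_dim
        self.mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))
            for _ in range(tiles_per_axis ** 3))

    def forward(self, x):                       # x: (N, 3) points in [0, 1)^3
        idx = (x * self.t).long().clamp(0, self.t - 1)
        flat = idx[:, 0] * self.t * self.t + idx[:, 1] * self.t + idx[:, 2]
        out = torch.zeros(x.shape[0], self.out_dim, device=x.device)
        for tile in flat.unique():              # each tile's points use its own MLP
            mask = flat == tile
            out[mask] = self.mlps[int(tile)](x[mask])
        return out

print(TiledField()(torch.rand(8, 3)).shape)     # torch.Size([8, 4])
```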
AI with Papers - Artificial Intelligence & Deep Learning
🔥 MobileNeRF is out -> Pure Fire! 🔥
👉 MobileNeRF is out: the mobile evolution of NeRF via textured polygons.
Highlights:
✅ Same quality, 10x faster than SNeRG
✅ Lower memory by storing surface textures
✅ Integrated GPUs: less memory/power
✅ Suitable for browser &…
🔥🔥 UPDATE 🔥🔥
Code Released: https://github.com/google-research/jax3d/tree/main/jax3d/projects/mobilenerf
🔥 Stable Diffusion on clips. INSANE 🔥
👉 The most advanced latent text-to-image diffusion model. #RunwayML just announced it is going to apply it to clips
Highlights:
✅ Latent diffusion model at 512p, trained on LAION-5B
✅ Frozen CLIP ViT-L/14 text encoder
✅ Lightweight: runs on a 10GB GPU (inference sketch below)
✅ Checkpoints released for research only
More: https://bit.ly/3QfkRx3
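A minimal inference sketch, assuming the Hugging Face diffusers integration and the public v1.4 checkpoint (model id, dtype, and prompt are illustrative; weights access and license terms apply):

```python
import torch
from diffusers import StableDiffusionPipeline

# fp16 keeps the latent diffusion model within roughly 10GB of GPU memory
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```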
Implicitron: "democratizing" NeRF
👉 #META open-sources a novel framework for the NeRF world in #PyTorch3D #pytorch
Highlights:
✅ Implicit representations (NeRF) & rendering
✅ RaySampler/PointSampler & more
✅ NeRF's MLP, IDR's feature field, SRN, etc.
✅ Renderers: MEAR, LSTMRenderer, etc.
More: https://bit.ly/3bPyJPJ
🧰 FGT: flow-guided inpainting 🧰
👉 #Microsoft (+USTC) unveils FGT: a flow-guided ViT for video inpainting 🤯
Highlights:
✅ Optical flow fed into the transformer for better attention (warping sketch below)
✅ Flow completion network with local features
✅ Dual-perspective spatial MHSA
✅ Local attention enriched with global content
More: https://bit.ly/3pk5J5S
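The "optical flow into the transformer" ingredient boils down to warping features from a neighboring frame along the flow before attending to them. Below is a minimal PyTorch flow-warping sketch via grid_sample; shapes and the flow convention are assumptions, not FGT's exact module:

```python
import torch
import torch.nn.functional as F

def flow_warp(feat, flow):
    """Warp features from a neighboring frame along optical flow.

    feat: (B, C, H, W) features of the neighboring frame.
    flow: (B, 2, H, W) flow in pixels, mapping current-frame coords to the neighbor.
    """
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat)     # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                        # sampling locations
    # normalize to [-1, 1] for grid_sample, (B, H, W, 2) with (x, y) order
    grid = torch.stack((2 * coords[:, 0] / (W - 1) - 1,
                        2 * coords[:, 1] / (H - 1) - 1), dim=-1)
    return F.grid_sample(feat, grid, align_corners=True)

warped = flow_warp(torch.randn(1, 8, 32, 32), torch.zeros(1, 2, 32, 32))
print(warped.shape)  # torch.Size([1, 8, 32, 32])
```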
NeuMan: Human NeRF in the wild
👉 #Apple open-sources a novel method for human pose/view synthesis from just a single in-the-wild video
Highlights:
✅ No extra devices/annotations
✅ Both human (novel poses) + scene
✅ End-to-end SMPL optimization + error correction
✅ Applications such as "telegathering"
More: https://bit.ly/3K4iTO6
🔥 CLIP-based Neural Style Transfer 🔥
👉 From #Nvidia, a novel method for transferring a style to a #3D object
Highlights:
✅ Texture style for 3D via CLIP-ResNet50
✅ Nearest-neighbor feature matching (NNFM) loss (sketch below)
✅ CLIP-based loss for texture extraction
✅ NNFM extended to multiple style pics / control
✅ No source code or models available
More: https://bit.ly/3c32dK5
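The nearest-neighbor feature matching (NNFM) loss can be sketched in a few lines: for every feature vector of the rendered view, find its nearest neighbor among the style image's features under cosine distance and minimize the average distance. A generic NNFM sketch, not Nvidia's exact loss:

```python
import torch

def nnfm_loss(render_feats, style_feats):
    """Nearest-neighbor feature matching loss between two feature sets.

    render_feats: (N, D) features of the rendered view (e.g. from CLIP-ResNet50).
    style_feats:  (M, D) features of the style image.
    """
    r = torch.nn.functional.normalize(render_feats, dim=1)
    s = torch.nn.functional.normalize(style_feats, dim=1)
    dist = 1.0 - r @ s.t()                 # (N, M) cosine distances
    return dist.min(dim=1).values.mean()   # nearest style feature per rendered feature

print(nnfm_loss(torch.randn(100, 64), torch.randn(200, 64)))
```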
🔥 KeypointNeRF: code is out! 🔥
👉 KeypointNeRF by #Meta: "NeRF" avatars
Highlights:
✅ Generalizable NeRF for virtual avatars
✅ Sparse 3D keypoints for SOTA avatars
✅ Novel unseen subjects from 2-3 views
✅ "iPhone" captures for the #metaverse
More: https://bit.ly/3pyl17e
🔥 Massive GTA-V human dataset 🔥
👉 GTA-Human: outperforming SOTA with purely synthetic training.
Highlights:
✅ 600+ subjects varying in gender, age, ethnicity & clothing
✅ 20,000+ clips covering a variety of human activities
✅ 6 categories of locations, different backgrounds
✅ Occlusions, lighting, and a weather system
More: https://bit.ly/3wpZyRD