OmniBenchmark: CV beyond ImageNet
👉 21 realms, 7,000+ concepts and 1M+ images. Far beyond ImageNet!
Highlights:
✅ vs. ImageNet: 2.5x realms, 9x concepts
✅ Conciseness: no overlapping concepts
✅ ReCo: Relational Contrastive Learning
✅ New supervised contrastive learning SOTA
More: https://bit.ly/3RJRKU0
HD Neural Avatar @130FPS
👉 Samsung unveils MegaPortraits: novel one-shot creation of HD neural human avatars
Highlights:
✅ One-shot neural avatars, SOTA up to 512p
✅ "Upgrading" to megapixel via more pics
✅ First Neural Head Avatars in HD
✅ Up to 130 FPS via #GPU
More: https://bit.ly/3oboWWT
AI with Papers - Artificial Intelligence & Deep Learning
Bias in #AI, explained simply
👉 Asking DallE-Mini to help me show what bias in #AI is
Generated Samples:
✅ Best eng.->men/Caucasians
✅ Best doctors->men/Caucasians
✅ Top CEOs->men/Caucasians
✅ Chef, kitchen->men/Caucasians
✅ Rich People->only Caucasians…
Important update from #OpenAI
👉 https://openai.com/blog/reducing-bias-and-improving-safety-in-dall-e-2/
OpenAI
Reducing bias and improving safety in DALL·E 2
Today, we are implementing a new technique so that DALL·E generates images of people that more accurately reflect the diversity of the world's population.
TimeLens++: Event-based Interpolation
👉 Novel event-based interpolation with non-linear flow & multi-scale fusion
Highlights:
✅ Novel motion spline estimator
✅ Non-linear continuous event/frames flow
✅ Multi-feature fusion, gated compression
✅ Novel hybrid dataset with 100+ videos
More: https://bit.ly/3yJyY6g
NUWA-Infinity is out!
👉 ∞ generation by #Microsoft: arbitrarily-sized HD images and long videos 🤯
Highlights:
✅ Unconditional Image Gen.
✅ Text-to-Image/Text-to-Clip
✅ Animation / Out-painting
✅ Hi-res, arbitrarily long clips
✅ NCP for patch caching
More: https://bit.ly/3zmBf9f
#AIwithPapers: we are 3,500+!
Ready for YOLO 10, 11, π, ∞, Ψ, and more? The more we are, the faster we catch 'em all
👉 Invite your friends -> https://t.me/AI_DeepLearning
OMNI3D: #3D Objects in the Wild
👉 #3D detection: 234k images, 3M+ instances & 97 categories
Highlights:
✅ OMNI3D built from publicly released datasets
✅ 234k pics, 3M+ annotations with 3D boxes
✅ 97 categories such as sofa, table, car
✅ Fast (450x) and exact algorithm for IoU
✅ Cube R-CNN: novel 3D object detector
More: https://bit.ly/3cznjzG
Multiface Neural Rendering
👉 A new multi-view, hi-res dataset collected at #META Reality Labs for neural face rendering
Highlights:
✅ Mugsy, large-scale multi-cam apparatus
✅ High-res sync facial performance
✅ Closing the gap in accessing HQ data
✅ Suitable for #VR & #mixedreality
More: https://bit.ly/3b6XfeL
DEVIANT: SOTA in mono-3D detection
👉 A novel Depth EquiVarIAnt NeTwork for 3D monocular detection in the wild
Highlights:
✅ Michigan + #Meta + Ford 🤯
✅ Depth-equiv. + scale-equiv. steerable
✅ New SOTA on KITTI & Waymo
✅ OK cross-dataset generalization
More: https://bit.ly/3OEFtgK
Assembling #LEGO with #AI
👉 Translating step-by-step, human-made assembly manuals into machine-interpretable instructions
Highlights:
✅ Stanford + MIT + #Google 🤯
✅ MEPNet: Manual-to-Executable-Plan Net
✅ Manual to machine-executable plan
✅ 2D manual -> 3D geometric shape
✅ Reasoning on 3D alignments of LEGO bricks
More: https://bit.ly/3PCwn5C
New SOTA in UDA Semantic Seg.
👉 HRDA: multi-res Unsupervised Domain Adaptive Semantic Seg. -> SOTA
Highlights:
✅ ETH + MPG + KU Leuven 🤯
✅ HRDA: multi-res approach for UDA
✅ Manageable GPU memory footprint
✅ Small objects & fine segmentation detail
✅ New SOTA on GTA and Synthia datasets
More: https://bit.ly/3cKtDEp
SemAbs: 3D Scene Understanding
👉 Framework that equips 2D Vision-Language Models (VLMs) with new 3D spatial capabilities
Highlights:
✅ 2D VLMs with 3D reasoning skills
✅ ViTs Efficient MS Relevancy Extraction
✅ Novel Open-World understanding tasks
✅ Completing partially observed objects
✅ Finding hidden objects from language
More: https://bit.ly/3PYYk7d
TinyCD: Neural Change Detection
👉 TinyCD: new SOTA in change detection with up to 150x fewer parameters.
Highlights:
✅ SOTA with up to 150x fewer params
✅ Mixing blocks for s.t. cross-correlation
✅ PW-MLP for pixel-wise classification
✅ MAMB: novel block for skip connections
More: https://bit.ly/3zFEngk
3D-Aware "StyleGANv2" version
👉 Upgrading StyleGANv2 into a novel 3D-aware GAN with just a minimal set of changes 🤯
Highlights:
✅ MPI-like 3D-aware GAN w/ single-view
✅ GMPI: generative multiplane image
✅ 2D GAN made 3D-aware with minimal changes
✅ Encoding 3D-aware inductive biases
More: https://bit.ly/3OJ5gnS
NeRF-ing "The Big Bang Theory"
👉 Berkeley unveils an approach for accurate estimation of actors' 3D pose & location
Highlights:
✅ Input: images across the whole season
✅ 3D context (i.e. cams, structure, body)
✅ Integrating context in 3D estimation
✅ Re-ID, gaze, cinematography, pic editing
✅ Knock, Knock, Penny!
More: https://bit.ly/3OLuaUb
ShAPO: SOTA in object understanding
👉 Joint multi-object detection, #3D texture, 6D object pose & size estimation.
Highlights:
✅ Disentangled shape & appearance
✅ Efficient octree-based differentiable
✅ Object-centric understanding pipeline
✅ Detection, reconstruction, 6D pose & size
✅ SOTA in reconstruction & pose est.
More: https://bit.ly/3oHN5EQ
CityNeRF: Neural Rendering of City Scenes
👉 Progressive NeRF model and training set on city-scenes
Highlights:
✅ BungeeNeRF: novel progressive NeRF
✅ Details on drastically varied scales
✅ Growing with residual block structure
✅ Inclusive multi-level data supervision
More: https://bit.ly/3cS9vk7
Rewriting Geometry of GAN
👉 Driving a GAN to synthesize many unseen objects with the desired shape
Highlights:
✅ User-friendly "warping" with geometry
✅ Low-rank update to layer for editing
✅ Latent augmentation based on style-mix
✅ Endless objects with defined changes
✅ Latent space interpolation, image editing
More: https://bit.ly/3zIfOj8