🔥YOLOv7: YOLO for segmentation🔥
YOLOv7: adding many newer capabilities to the YOLO architecture family.
Highlights:
✅ YOLOv7, not a successor of the YOLO family!
✅ Framework for detection & segmentation
✅ Applications based on #META detectron2
✅ DETR & ViT detection out of the box
✅ Easy pipeline support through #ONNX
✅ YOLOv4 + instance segmentation in a single stage
✅ Training of the latest YOLOv6 is supported!
✅ Source code under GPL license.
More: https://bit.ly/3ysSJAp
🔥🔥 HD Dichotomous Segmentation 🔥🔥
A new task: highly accurate segmentation of objects from natural images.
Highlights:
✅ 5,000+ HD images + accurate binary masks
✅ IS-Net baseline in high-dim feature spaces
✅ HCE: model vs. human interventions
✅ Source code (should be) available soon
More: https://bit.ly/3ah2BDO
🔥🔥 Neural Segmentation on fire 🔥🔥
Novel methods for segmentation with mask calibration. Robustness++ in VOS.
Highlights:
✅ Study: VOS robustness vs. perturbations
✅ Adaptive object proxy (AOP) aggregation
✅ Fewer errors due to unstable pixel-level matching
✅ Code/models (should be) available soon
More: https://bit.ly/3yhIY6Q
Seq-DeepFake via Transformers
S-Lab releases Seq-DeepFake: Detecting Sequential DeepFake Manipulation
Highlights:
✅ Seq-DeepFake: sequences of facial edits
✅ Dataset: 85k #deepfake manipulations
✅ Powerful Seq-DeepFake Transformer
✅ Code, dataset and models available!
More: https://bit.ly/3ACQXhi
Text2LIVE: Text-Driven Neural Editing
#Amazon unveils a novel #AI for text-driven editing of videos. Insane! 🤯
Highlights:
✅ Semantic edits of real-world videos
✅ Edit layer: RGBA representing the target
✅ Edit layers synthesized from a single input
✅ No masks or pre-trained generator needed
More: https://bit.ly/3NVP6aE
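To make the "edit layer" idea concrete, here is a minimal sketch of how an RGBA layer can be blended over an input frame with standard alpha compositing. This is a generic illustration, not the authors' code: the function names `composite_pixel` and `composite_image` are hypothetical, and images are assumed to be nested lists of float values in [0, 1].

```python
def composite_pixel(edit_rgba, base_rgb):
    """Blend one RGBA edit pixel over one RGB source pixel (values in [0, 1])."""
    r, g, b, alpha = edit_rgba
    # Standard "over" compositing: alpha * edit + (1 - alpha) * source.
    return tuple(alpha * e + (1.0 - alpha) * s
                 for e, s in zip((r, g, b), base_rgb))


def composite_image(edit_layer, base_image):
    """Apply composite_pixel to every pixel of two same-sized nested-list images."""
    return [
        [composite_pixel(e, s) for e, s in zip(edit_row, base_row)]
        for edit_row, base_row in zip(edit_layer, base_image)
    ]


# A half-transparent red edit over a blue pixel.
blended = composite_pixel((1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0))
```

Where alpha is 0 the original pixel is untouched, which is why a layered representation can keep edits localized without an explicit mask.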
AI-Designed Circuits with Deep RL
#Nvidia unveils an #AI to design circuits from scratch, smaller and faster than SOTA ones
Highlights:
✅ Parallel prefix circuits for high performance
✅ RL framework to explore the circuit space
✅ Smaller, faster, lower power, from scratch
More: https://bit.ly/3yY9dk7
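For readers new to parallel prefix circuits, the sketch below simulates a classic hand-designed one, a Kogge-Stone adder, in plain Python. It only illustrates the kind of structure being optimized; it is not the RL method from the post, and the function name `kogge_stone_add` and the default 8-bit width are assumptions for illustration.

```python
def kogge_stone_add(a, b, width=8):
    """Add two unsigned integers modulo 2**width using a Kogge-Stone prefix network."""
    # Per-bit propagate (p) and generate (g) signals.
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]

    # Prefix levels: at distance d, each bit combines with the bit d positions below.
    G, P = g[:], p[:]
    d = 1
    while d < width:
        new_G, new_P = G[:], P[:]
        for i in range(d, width):
            new_G[i] = G[i] | (P[i] & G[i - d])
            new_P[i] = P[i] & P[i - d]
        G, P = new_G, new_P
        d *= 2

    # The carry into bit i is the group generate of bits 0..i-1 (carry-in is 0).
    carries = [0] + G[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total
```

The number of levels grows with log2 of the width, while the number of combine nodes grows faster. That area/delay trade-off is the design space a search method can explore.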
Neural I2I with a few shots
#Alibaba unveils a novel portrait stylization. Limited samples (∼100) -> HD outputs
Highlights:
✅ Calibration first, translation later
✅ Balanced distribution to calibrate bias
✅ Spatially semantic constraints via geometry
✅ Source code and models available soon!
More: https://bit.ly/3IwOmHO
🤹 K-Means Mask Transformer 🤹
#Google AI unveils kMaX-DeepLab, a novel E2E method for segmentation
Highlights:
✅ kMaX-DeepLab: k-means Mask Xformer
✅ Rethinking the relationship between pixels and objects
✅ Cross-attention -> k-means clustering
✅ New SOTA on several datasets
More: https://bit.ly/3O2QV5I
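The "cross-attention -> k-means clustering" highlight can be read as follows: instead of softly attending over all pixels, each pixel is hard-assigned to its best-matching cluster center, and the centers are then refreshed from their assigned pixels. Below is a simplified sketch of one such step in plain Python. It assumes dot-product affinity and a plain mean update, which may differ from the exact update used in kMaX-DeepLab; the function name `kmeans_attention_step` is hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def kmeans_attention_step(centers, pixels):
    """One k-means-style step: hard-assign pixels to centers, then update centers."""
    # Assignment: argmax over cluster centers for every pixel feature.
    assignments = []
    for feat in pixels:
        scores = [dot(center, feat) for center in centers]
        assignments.append(max(range(len(centers)), key=scores.__getitem__))

    # Update: each center becomes the mean of its assigned pixel features.
    new_centers = []
    for k, center in enumerate(centers):
        members = [f for f, a in zip(pixels, assignments) if a == k]
        if not members:
            new_centers.append(list(center))  # keep empty clusters unchanged
            continue
        new_centers.append(
            [sum(m[j] for m in members) / len(members) for j in range(len(center))]
        )
    return assignments, new_centers


assignments, centers = kmeans_attention_step(
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [4.0, 0.0], [0.0, 3.0]],
)
```

In a segmentation setting the assignment map is already a mask prediction: all pixels sharing a center form one segment.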
4D Neural Relightable Humans
Relighting4D: free-viewpoint relighting of humans under unknown illumination
Highlights:
✅ Dynamic relighting from free viewpoints
✅ Disentangled reflectance/geometry
✅ SOTA on synthetic/real datasets
✅ Code/models under MIT License
More: https://bit.ly/3RF3yH9
Long-Term Object Segmentation
XMem: object segmentation for long clips with unified feature memory stores
Highlights:
✅ Inspired by the Atkinson-Shiffrin model
✅ Stores with different temporal scales
✅ Memory consolidation algorithm
✅ Compact/powerful long-term memory
✅ Source code and models available
More: https://bit.ly/3PP0EOn
AI with Papers - Artificial Intelligence & Deep Learning
CogVideo: insane text-to-clip
CogVideo: 9B-parameter, world's first large-scale open-source text-to-video
Highlights:
✅ Largest open-source T2C transformer
✅ Finetuning of text-to-image model
✅ Multi-frame-rate hierarchical training
✅ From pretrained…
🔥🔥 Update 🔥🔥
Code: https://github.com/THUDM/CogVideo
Demo: https://wudao.aminer.cn/cogvideo/
More: https://bit.ly/3yP86BQ
🔥Grand Unification of Object Tracking🔥
UNICORN: unified method for SOT, MOT, VOS, & MOTS with a single neural net. 🤯
Highlights:
✅ Great unification of 4 tracking tasks
✅ Bridging methods via pixel-wise correspondence
✅ SOTA on 8 challenging benchmarks
✅ Source code under MIT License
More: https://bit.ly/3o74h6g
🔥OmniBenchmark: CV beyond ImageNet🔥
21 realms, 7,000+ concepts and 1M+ images. Far beyond ImageNet!
Highlights:
✅ vs. ImageNet: 2.5x realms, 9x concepts
✅ Conciseness: no overlapping concepts
✅ ReCo: Relational Contrastive Learning
✅ New supervised contrastive learning SOTA
More: https://bit.ly/3RJRKU0
HD Neural Avatar @130FPS
Samsung unveils MegaPortraits: novel one-shot creation of HD neural human avatars
Highlights:
✅ One-shot neural avatars, SOTA up to 512p
✅ "Upgrading" to megapixel via more pics
✅ First Neural Head Avatars in HD
✅ Up to 130 FPS via #GPU
More: https://bit.ly/3oboWWT
AI with Papers - Artificial Intelligence & Deep Learning
Bias in #AI, explained simply
Asking DallE-Mini to help me show what BIAS in #AI is
Generated Samples:
✅ Best eng.->men/Caucasians
✅ Best doctors->men/Caucasians
✅ Top CEOs->men/Caucasians
✅ Chef, kitchen->men/Caucasians
✅ Rich People->only Caucasians…
🔥Important update from #OpenAI🔥
https://openai.com/blog/reducing-bias-and-improving-safety-in-dall-e-2/
OpenAI
Reducing bias and improving safety in DALL·E 2
Today, we are implementing a new technique so that DALL·E generates images of people that more accurately reflect the diversity of the world's population.
TimeLens++: Event-based Interpolation
Novel event-based interpolation with non-linear flow & multi-scale fusion
Highlights:
✅ Novel motion spline estimator
✅ Non-linear continuous event/frames flow
✅ Multi-feature fusion, gated compression
✅ Novel hybrid dataset with 100+ videos
More: https://bit.ly/3yJyY6g
🪰NUWA-Infinity is out!🪰
Infinite generation by #Microsoft: arbitrarily-sized HD images and long videos 🤯
Highlights:
✅ Unconditional Image Gen.
✅ Text-to-Image/Text-to-Clip
✅ Animation / Out-painting
✅ Hi-res, arbitrarily long clips
✅ NCP for patch caching
More: https://bit.ly/3zmBf9f
🔥 #AIwithPapers: we are 3,500+! 🔥
Ready for YOLO 10, 11, π, ∞, Ψ, and more? The more we are, the faster we catch 'em all
Invite your friends -> https://t.me/AI_DeepLearning
OMNI3D: #3D Objects in the Wild
#3D detection: 234k images, 3M+ instances & 97 categories
Highlights:
✅ OMNI3D built from publicly released datasets
✅ 234k pics, 3M+ annotations with 3D boxes
✅ 97 categories such as sofa, table, car
✅ Fast (450x) and exact algorithm for IoU
✅ Cube R-CNN: novel 3D object detector
More: https://bit.ly/3cznjzG
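As a reminder of what IoU measures for 3D boxes, here is a minimal sketch for the much simpler axis-aligned case. It is not the fast exact algorithm mentioned in the post, which has to handle arbitrarily oriented boxes. The function name `iou_3d_axis_aligned` and the `(xmin, ymin, zmin, xmax, ymax, zmax)` box layout are assumptions for illustration.

```python
def box_volume(box):
    """Volume of an axis-aligned box given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d_axis_aligned(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    intersection = 1.0
    for axis in range(3):
        low = max(a[axis], b[axis])
        high = min(a[axis + 3], b[axis + 3])
        if high <= low:
            return 0.0  # no overlap along this axis means no overlap at all
        intersection *= high - low
    union = box_volume(a) + box_volume(b) - intersection
    return intersection / union


# Two 2x2x2 cubes offset by 1 along every axis share a 1x1x1 corner.
overlap = iou_3d_axis_aligned((0, 0, 0, 2, 2, 2), (1, 1, 1, 3, 3, 3))
```

With rotated boxes the intersection is a general convex polyhedron rather than a box, which is what makes an exact and fast computation non-trivial.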
Multiface Neural Rendering
A new multi-view, hi-res dataset collected at #META Reality Labs for neural face rendering
Highlights:
✅ Mugsy, large-scale multi-cam apparatus
✅ High-res synchronized facial performance
✅ Closing the gap in accessing HQ data
✅ Suitable for #VR & #mixedreality
More: https://bit.ly/3b6XfeL