AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
95 photos
235 videos
11 files
1.26K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
🦚 2K Resolution Generative #AI 🦚

👉Novel continuous-scale training with variable output resolutions

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Mixed-resolution data
✅Arbitrary scales during training
✅Generations beyond 1024×1024
✅Variant of FID metric for scales
✅Source code under MIT license

More: https://bit.ly/3uNfVY6
🐍DS Unsupervised Video Decomposition🐍

👉Novel method to extract persistent elements of a scene

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Scene element as Deformable Sprite (DS)
✅Deformable Sprites by video auto-encoder
✅Canonical texture image for appearance
✅Non-rigid geom. transformation

More: https://bit.ly/37WV9w1
🥓 L-SVPE for Deep Deblurring 🥓

👉L-SVPE to deblur scenes while recovering high-freq details

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Learned Spatially Varying Pixel Exposures
✅Next-gen focal-plane sensor + DL
✅Deep conv decoder for motion deblurring
✅Superior results over non-optimized exp.

More: https://bit.ly/3uRYQMT
🧧Hyper-Fast Instance Segmentation🧧

👉Novel Temporally Efficient Vision Transformer (TeViT) for VIS

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Video instance segmentation transformer
✅Contextual-info at frame/instance level
✅Nearly convolution-free framework 🤷‍♂️
✅The new SOTA for VIS, ~70 FPS!
✅Code & models under MIT license

More: https://bit.ly/3rCMXIn
📗Unified Scene Text/Layout Detection📗

👉World's first hierarchical scene text dataset + novel detection method

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Unified detection & geometric layout
✅Hierarchical annotations in natural scenes
✅Word, line, & paragraph level annotations
✅Source under CC Attribution Share Alike 4.0

More: https://bit.ly/3jRpezV
🙌 #Oculus' new Hand Tracking 🙌

👉Hands can move as naturally and intuitively in the #metaverse as they do in real life

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Hands2.0 powered by CV & ML
✅Tracking hand-over-hand interactions
✅Crossing hands, clapping, high-fives
✅Accurate thumbs-up gesture

More: https://bit.ly/3JXPvY2
🎗️New SOTA in #3D human avatar🎗️

👉PHORHUM: photorealistic 3D human from mono-RGB

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Pixel-aligned method for 3D geometry
✅Unshaded surface color + illumination
✅Patch-based rendering losses for visible parts
✅Plausible color estimation for non-visible parts

More: https://bit.ly/3MkvBrA
📟 What's in your hands (#3D) ? 📟

👉Reconstructing hand-held objects (from a single RGB image) without knowing their 3D templates🤷‍♂️

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Hand is highly predictive of object shape
✅Reconstruction conditioned on hand articulation
✅Visual feats. / articulation-aware coords.
✅Code and models available!

More: https://bit.ly/3vuYn2a
🔋YODO: You Only Demonstrate Once🔋

👉Novel category-level manipulation learned in sim from a single demonstration video🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅One-shot IL, model-free 6D pose tracking
✅Demonstration by a single 3rd-person-view video
✅Manipulation including hi-precision tasks
✅Category-level Behavior Cloning
✅Attention for dynamic coords selection
✅Generalizability to novel unseen obj/env

More: https://bit.ly/3v0V4R4
👗 Dress Code for Virtual Try-On 👗

👉UniMORE (+ YOOX) unveils a novel dataset/approach for virtual try-on.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Hi-Res paired front-view / full-body
✅Pixel-level Semantic-Aware Discriminator
✅9 SOTA VTON approaches / 3 baselines
✅New SOTA considering res. & garments

More: https://bit.ly/3xKXSUw
🍃Deep Equilibrium for Optical Flow🍃

👉DEQ: converge faster, less memory, often more accurate

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Novel formulation of optical flow method
✅Compatible with prior modeling/data-related improvements
✅Sparse fixed-point correction for stability
✅Code/models under GNU Affero GPL v3.0

More: https://bit.ly/3v4fZmi
🌳Ultra High-Resolution Neural Saliency🌳

👉A novel ultra high-resolution saliency detector with dataset!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Ultra Hi-Res Saliency Detection
✅5,920 pics at 4K-8K resolution
✅Pyramid Grafting Network
✅Cross-Model Grafting Module
✅AGL: Attention Guided Loss
✅Code/models under MIT

More: https://bit.ly/3MnU1Rf
🪆StyleGAN-Human for fashion 🪆

👉A novel unconditional human generation based on StyleGAN is out!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅200,000+ labeled samples (pose/texture)
✅1024x512 StyleGAN-Human StyleGAN3
✅512x256 StyleGAN-Human StyleGAN1
✅Face model for downstream: InsetGAN
✅Source code and model available!

More: https://bit.ly/3xMg5B2
💀 OSSO: Skeletal Shape from Outside 💀

👉Anatomic skeleton of a person from 3D surface of body 🦴

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Max Planck + IMATI-CNR + INRIA
✅DXA images to obtain #3D shape
✅External body to internal skeleton

More: https://bit.ly/3v7Z5TQ
🎷 Pix2Seq: object detection by #Google 🎷

👉A novel framework to perform object detection as a language modeling task

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Obj. detection as a lang-modeling task
✅BBs/labels -> seq. of discrete tokens
✅Encoder-decoder (one token at a time)
✅Code under Apache License 2.0

More: https://bit.ly/3F49PX3
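
A minimal sketch of the "BBs/labels -> seq. of discrete tokens" idea from the highlights above, not the released Pix2Seq code. The bin count (1000), the [ymin, xmin, ymax, xmax, class] ordering, and the class-token offset are assumptions made for illustration, and the helper names `quantize` and `boxes_to_tokens` are hypothetical.

```python
from typing import List, Tuple

NUM_BINS = 1000  # assumed number of coordinate bins (illustrative value)

# One box: (ymin, xmin, ymax, xmax, class_id), coordinates in pixels.
Box = Tuple[float, float, float, float, int]


def quantize(coord: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    frac = min(max(coord / size, 0.0), 1.0)  # normalize and clamp to [0, 1]
    return min(int(frac * num_bins), num_bins - 1)


def boxes_to_tokens(boxes: List[Box], width: float, height: float,
                    num_bins: int = NUM_BINS) -> List[int]:
    """Flatten boxes into one token sequence: 4 coordinate tokens + 1 class token each.

    Class tokens are shifted by num_bins so they do not collide with
    coordinate tokens in a shared vocabulary (assumed layout).
    """
    tokens: List[int] = []
    for ymin, xmin, ymax, xmax, class_id in boxes:
        tokens += [
            quantize(ymin, height, num_bins),
            quantize(xmin, width, num_bins),
            quantize(ymax, height, num_bins),
            quantize(xmax, width, num_bins),
            num_bins + class_id,
        ]
    return tokens


if __name__ == "__main__":
    # One box covering the top-left quadrant of a 640x480 image, class 3.
    print(boxes_to_tokens([(0, 0, 240, 320, 3)], width=640, height=480))
```

An encoder-decoder would then be trained to emit such a sequence one token at a time, conditioned on the image.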
🌹 Generalizable Neural Performer 🌹

👉General neural framework to synthesize free-viewpoint images of arbitrary human performers

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Free-viewpoint synthesis of humans
✅Implicit Geometric Body Embedding
✅Screen-Space Occlusion-Aware Blending
✅GeneBody: 4M frames, multi-view cams

More: https://cutt.ly/SGcnQzn
🚌 Tire-defect inspection 🚌

👉Unsupervised detection of defects in tires using neural networks

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Impurity, same material as tire
✅Impurity, with different material
✅Damage by temp/pressure
✅Crack or etched material

More: https://bit.ly/37GX1JT
🧋#4D Neural Fields🧋

👉4D N.F. visual representations from monocular RGB-D 🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅4D scene completion (occlusions)
✅Scene completion in cluttered scenes
✅Novel #AI for contextual point clouds
✅Data, code, models under MIT license

More: https://cutt.ly/6GveKiJ
👔Largest dataset of human-object interactions 👔

👉BEHAVE by Google: largest dataset of human-object interactions

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅8 subjects, 20 objects, 5 envs.
✅321 clips with 4 Kinect RGB-D
✅Masks and segmented point clouds
✅3D SMPL & mesh registration
✅Textured scan reconstructions

More: https://bit.ly/3Lx6NNo
🦴ENARF-GAN Neural Articulations🦴

👉Unsupervised method for 3D geometry-aware representation of articulated objects

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Novel efficient neural representation
✅Tri-planes deformation fields for training
✅Novel GAN for articulated representations
✅Controllable 3D from real unlabeled pics

More: https://bit.ly/3xYqedN