AI with Papers - Artificial Intelligence & Deep Learning
All the AI with papers: fresh daily updates on Deep Learning, Machine Learning, and Computer Vision (with papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
📗Unified Scene Text/Layout Detection📗

👉The world's first hierarchical scene-text dataset + a novel detection method

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Unified text detection & geometric layout analysis
✅Hierarchical annotations in natural scenes
✅Word-, line-, and paragraph-level annotations
✅Released under CC Attribution-ShareAlike 4.0

More: https://bit.ly/3jRpezV
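For readers new to hierarchical text labels, here is a minimal Python sketch of how word → line → paragraph annotations nest. The class names and fields are illustrative assumptions, not the dataset's actual file format, so check the release for its real schema. The same applies to deriving a line box as the union of its word boxes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Word:
    text: str
    vertices: List[Point]  # polygon outlining the word


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)

    @property
    def text(self) -> str:
        # Assumes words are stored in reading order.
        return " ".join(w.text for w in self.words)


@dataclass
class Paragraph:
    lines: List[Line] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(line.text for line in self.lines)


def polygon_bbox(vertices: List[Point]) -> Box:
    """Axis-aligned bounding box of a polygon."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


def line_bbox(line: Line) -> Box:
    """Illustrative only: a line box taken as the union of its word boxes."""
    boxes = [polygon_bbox(w.vertices) for w in line.words]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


if __name__ == "__main__":
    hello = Word("Hello", [(0, 0), (40, 0), (40, 10), (0, 10)])
    world = Word("world", [(45, 0), (90, 0), (90, 12), (45, 12)])
    para = Paragraph([Line([hello, world])])
    print(para.text)                 # Hello world
    print(line_bbox(para.lines[0]))  # (0, 0, 90, 12)
```

Keeping text only at the word level and deriving line and paragraph text from it avoids storing the same transcription three times.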
🙌 #Oculus' new Hand Tracking 🙌

👉Hands can move as naturally and intuitively in the #metaverse as they do in real life

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Hands 2.0, powered by CV & ML
✅Tracking of hand-over-hand interactions
✅Crossing hands, clapping, high-fives
✅Accurate thumbs-up gesture

More: https://bit.ly/3JXPvY2

πŸŽ—οΈNew SOTA in #3D human avatarπŸŽ—οΈ

πŸ‘‰PHORHUM: photorealistic 3D human from mono-RGB

𝐇𝐒𝐠𝐑π₯𝐒𝐠𝐑𝐭𝐬:
βœ…Pixel-aligned method for 3D geometry
βœ…Unshaded surface color + illumination
βœ…Patch-based rendering losses for visible
βœ…Plausible color estimation for non-visible

More: https://bit.ly/3MkvBrA

📟 What's in your hands? (#3D) 📟

👉Reconstructing hand-held objects from a single RGB image without knowing their 3D templates🤷‍♂️

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅The hand is highly predictive of object shape
✅Conditioned on the hand articulation
✅Visual features + articulation-aware coordinates
✅Code and models available!

More: https://bit.ly/3vuYn2a

🔋YODO: You Only Demonstrate Once🔋

👉Novel category-level manipulation, learned in simulation from a single demonstration video🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅One-shot imitation learning (IL), model-free 6D pose tracking
✅Demonstration from a single third-person view
✅Manipulation including high-precision tasks
✅Category-level behavior cloning
✅Attention for dynamic coordinate selection
✅Generalizes to novel, unseen objects/environments

More: https://bit.ly/3v0V4R4

👗 Dress Code for Virtual Try-On 👗

👉UniMORE (+ YOOX) unveils a novel dataset/approach for virtual try-on

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅High-res, paired front-view / full-body images
✅Pixel-level Semantic-Aware Discriminator
✅9 SOTA VTON approaches / 3 baselines
✅New SOTA considering resolution & garments

More: https://bit.ly/3xKXSUw

πŸƒDeep Equilibrium for Optical FlowπŸƒ

πŸ‘‰DEQ: converge faster, less memory, often more accurate

𝐇𝐒𝐠𝐑π₯𝐒𝐠𝐑𝐭𝐬:
βœ…Novel formulation of optical flow method
βœ…Compatible with prior modeling/data-related
βœ…Sparse fixed-point correction for stability
βœ…Code/models under GNU Affero GPL v3.0

More: https://bit.ly/3v4fZmi
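The core deep-equilibrium idea is that a layer's output is a fixed point z* = f(z*, x), not the result of a fixed number of unrolled steps. The sketch below finds that fixed point with plain damped iteration on a toy contractive layer. The layer, damping value and tolerance are assumptions for illustration. Real DEQ models typically use faster root solvers and implicit differentiation for gradients, and this sketch covers neither. It also does not model the paper's sparse fixed-point correction.

```python
import math
from typing import Callable, List

Vec = List[float]


def solve_fixed_point(f: Callable[[Vec, Vec], Vec], x: Vec, z0: Vec,
                      damping: float = 0.5, tol: float = 1e-8,
                      max_iter: int = 1000) -> Vec:
    """Return z* with z* = f(z*, x), found by damped fixed-point iteration."""
    z = list(z0)
    for _ in range(max_iter):
        fz = f(z, x)
        residual = math.sqrt(sum((a - b) ** 2 for a, b in zip(fz, z)))
        if residual < tol:
            return fz
        # Damped update: move part of the way from z towards f(z, x).
        z = [(1 - damping) * zi + damping * fzi for zi, fzi in zip(z, fz)]
    raise RuntimeError("fixed-point iteration did not converge")


def toy_layer(z: Vec, x: Vec) -> Vec:
    """Hypothetical contractive layer: z_i <- tanh(0.5 * z_i + x_i)."""
    return [math.tanh(0.5 * zi + xi) for zi, xi in zip(z, x)]


if __name__ == "__main__":
    x = [0.3, -1.0, 2.0]
    z_star = solve_fixed_point(toy_layer, x, z0=[0.0] * len(x))
    # At equilibrium, applying the layer once more leaves z (almost) unchanged.
    print(z_star, toy_layer(z_star, x))
```

Because the toy layer's slope is at most 0.5, each damped step shrinks the error, so the loop converges. With a faster solver you reach the same fixed point in fewer steps.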
πŸ‘3πŸ₯°2🀯1
This media is not supported in your browser
VIEW IN TELEGRAM
🌳Ultra High-Resolution Neural Saliency🌳

👉A novel ultra-high-resolution saliency detector, with a new dataset!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Ultra-high-res saliency detection
✅5,920 images at 4K-8K resolution
✅Pyramid Grafting Network
✅Cross-Model Grafting Module
✅AGL: Attention Guided Loss
✅Code/models under MIT

More: https://bit.ly/3MnU1Rf

🪆StyleGAN-Human for fashion 🪆

👉A novel unconditional human-generation model based on StyleGAN is out!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅200,000+ labeled samples (pose/texture)
✅StyleGAN-Human (StyleGAN3) at 1024x512
✅StyleGAN-Human (StyleGAN1) at 512x256
✅Face model for downstream use: InsetGAN
✅Source code and models available!

More: https://bit.ly/3xMg5B2

💀 OSSO: Skeletal Shape from Outside 💀

👉Infers the anatomical skeleton of a person from the 3D surface of their body 🦴

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Max Planck + IMATI-CNR + INRIA
✅DXA images used to obtain #3D shape
✅External body surface to internal skeleton

More: https://bit.ly/3v7Z5TQ

🎷 Pix2Seq: object detection by #Google 🎷

👉A novel framework that casts object detection as a language-modeling task

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Object detection as a language-modeling task
✅Bounding boxes/labels -> sequence of discrete tokens
✅Encoder-decoder (one token at a time)
✅Code under Apache License 2.0

More: https://bit.ly/3F49PX3
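Pix2Seq's key move is turning boxes and labels into discrete tokens. Here is a hypothetical sketch of that serialization. It assumes boxes normalized to [0, 1] in (y_min, x_min, y_max, x_max) order, 1000 coordinate bins, and class tokens placed after the coordinate bins in one shared vocabulary. The paper's exact vocabulary size, ordering and sequence augmentation may differ. The encoder-decoder that predicts these tokens one at a time is not shown.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (y_min, x_min, y_max, x_max) in [0, 1]
N_BINS = 1000  # assumed number of coordinate bins


def quantize(v: float, n_bins: int = N_BINS) -> int:
    """Map a normalized coordinate to an integer bin id in [0, n_bins - 1]."""
    return max(0, min(n_bins - 1, round(v * (n_bins - 1))))


def dequantize(token: int, n_bins: int = N_BINS) -> float:
    return token / (n_bins - 1)


def encode(objects: List[Tuple[Box, int]], n_bins: int = N_BINS) -> List[int]:
    """Flatten (box, class_id) pairs into one token sequence, 5 tokens per object.

    Coordinate tokens use ids [0, n_bins); class tokens are offset by n_bins
    so the two vocabularies never collide.
    """
    tokens: List[int] = []
    for box, class_id in objects:
        tokens.extend(quantize(v, n_bins) for v in box)
        tokens.append(n_bins + class_id)
    return tokens


def decode(tokens: List[int], n_bins: int = N_BINS) -> List[Tuple[Box, int]]:
    """Invert encode(); a trailing incomplete group of tokens is ignored."""
    objects: List[Tuple[Box, int]] = []
    for i in range(0, len(tokens) - len(tokens) % 5, 5):
        y0, x0, y1, x1, class_token = tokens[i:i + 5]
        box = (dequantize(y0, n_bins), dequantize(x0, n_bins),
               dequantize(y1, n_bins), dequantize(x1, n_bins))
        objects.append((box, class_token - n_bins))
    return objects


if __name__ == "__main__":
    seq = encode([((0.1, 0.2, 0.4, 0.6), 3)])
    print(seq)  # [100, 200, 400, 599, 1003]
    print(decode(seq))
```

Quantization limits each coordinate error to half a bin. That cost is why finer bins help small objects at the price of a larger vocabulary.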
πŸ‘8🀯3πŸ”₯1😱1πŸŽ‰1🀩1
This media is not supported in your browser
VIEW IN TELEGRAM
🌹 Generalizable Neural Performer 🌹

👉A general neural framework to synthesize free-viewpoint images of arbitrary human performers

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Free-viewpoint synthesis of humans
✅Implicit Geometric Body Embedding
✅Screen-Space Occlusion-Aware Blending
✅GeneBody: 4M frames, multi-view cameras

More: https://cutt.ly/SGcnQzn

🚌 Tire-defect inspection 🚌

👉Unsupervised detection of tire defects using neural networks

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Impurity of the same material as the tire
✅Impurity of a different material
✅Damage from temperature/pressure
✅Cracked or etched material

More: https://bit.ly/37GX1JT

🧋#4D Neural Fields🧋

👉4D neural-field visual representations from monocular RGB-D 🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅4D scene completion (occlusions)
✅Scene completion in cluttered scenes
✅Novel #AI for contextual point clouds
✅Data, code, models under MIT license

More: https://cutt.ly/6GveKiJ

👔Largest dataset of human-object interactions 👔

👉BEHAVE by Google: the largest dataset of human-object interactions

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅8 subjects, 20 objects, 5 environments
✅321 clips captured with 4 Kinect RGB-D cameras
✅Masks and segmented point clouds
✅3D SMPL & mesh registration
✅Textured scan reconstructions

More: https://bit.ly/3Lx6NNo

🦴ENARF-GAN Neural Articulations🦴

👉Unsupervised method for 3D geometry-aware representation of articulated objects

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Novel efficient neural representation
✅Tri-plane deformation fields for training
✅Novel GAN for articulated representations
✅Controllable 3D from real, unlabeled images

More: https://bit.ly/3xYqedN
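Tri-plane representations store features on three axis-aligned 2D planes. A 3D point is projected onto each plane, features are bilinearly interpolated, and the three results are aggregated (summed here). This is a generic, hypothetical sketch of that lookup, not ENARF-GAN's implementation. The plane layout, the [-1, 1] coordinate range, the corner-aligned sampling and the sum aggregation are all assumptions.

```python
from typing import List, Sequence

Plane = List[List[List[float]]]  # [H][W][C] feature grid


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Sample a plane at continuous coords u (width axis), v (height axis) in [-1, 1]."""
    h, w = len(plane), len(plane[0])
    # Map [-1, 1] to pixel coordinates [0, size - 1] (corner-aligned convention).
    x = (u + 1) / 2 * (w - 1)
    y = (v + 1) / 2 * (h - 1)
    x0, y0 = int(x), int(y)  # floor, since x and y are non-negative here
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    return [
        (1 - dx) * (1 - dy) * plane[y0][x0][c]
        + dx * (1 - dy) * plane[y0][x1][c]
        + (1 - dx) * dy * plane[y1][x0][c]
        + dx * dy * plane[y1][x1][c]
        for c in range(len(plane[0][0]))
    ]


def triplane_features(planes: Sequence[Plane], point: Sequence[float]) -> List[float]:
    """Project (x, y, z) onto the XY, XZ and YZ planes and sum the sampled features."""
    x, y, z = (max(-1.0, min(1.0, t)) for t in point)  # clamp into the grid
    xy, xz, yz = planes
    samples = [bilinear(xy, x, y), bilinear(xz, x, z), bilinear(yz, y, z)]
    return [sum(vals) for vals in zip(*samples)]


if __name__ == "__main__":
    ramp = [[[0.0], [1.0]], [[2.0], [3.0]]]  # 2x2 plane, 1 channel
    print(triplane_features([ramp, ramp, ramp], (0.0, 0.0, 0.0)))  # [4.5]
```

Three R×R planes cost O(R²) memory instead of the O(R³) of a dense voxel grid. This saving is the usual motivation for the tri-plane design.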
πŸ–²οΈ HuMMan: 4D human dataset πŸ–²οΈ

πŸ‘‰HuMMan: 4D dataset with 1000 humans, 400k sequences & 60M frames 🀯

𝐇𝐒𝐠𝐑π₯𝐒𝐠𝐑𝐭𝐬:
βœ…RGB, pt-clouds, keypts, SMPL, texture
βœ…Mobile device in the sensor suite
βœ…500+ actions to cover movements

More: https://bit.ly/3vTRW8Z

🔥Neighborhood Attention Transformer 🔥

👉A novel transformer for both image classification and downstream vision tasks

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Neighborhood Attention (NA)
✅Neighborhood Attention Transformer (NAT)
✅Faster training/inference, good throughput
✅Checkpoints, training code, #CUDA kernel available

More: https://bit.ly/3F5aVSo
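Neighborhood Attention restricts each query to its nearest neighbors instead of attending over the whole image. Here is a hypothetical 1D, pure-Python sketch of that idea. NAT itself works on 2D k×k neighborhoods with multiple heads and relative positional biases, and ships optimized #CUDA kernels, none of which is reproduced here. The inward-shifted window at the borders is assumed to match NA's nearest-neighbor definition.

```python
import math
from typing import List

Vec = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighborhood(i: int, n: int, k: int) -> range:
    """Indices of the k keys that query i attends to in a length-n sequence.

    Near the borders the window is shifted inward rather than zero-padded, so
    every query sees exactly k neighbors (assumes 1 <= k <= n).
    """
    start = min(max(i - k // 2, 0), n - k)
    return range(start, start + k)


def neighborhood_attention_1d(queries: List[Vec], keys: List[Vec],
                              values: List[Vec], kernel_size: int) -> List[Vec]:
    """Scaled dot-product attention restricted to each query's local window."""
    n, d = len(queries), len(queries[0])
    out: List[Vec] = []
    for i in range(n):
        idx = neighborhood(i, n, kernel_size)
        scores = [sum(a * b for a, b in zip(queries[i], keys[j])) / math.sqrt(d)
                  for j in idx]
        weights = softmax(scores)
        out.append([sum(w * values[j][c] for w, j in zip(weights, idx))
                    for c in range(len(values[0]))])
    return out
```

Cost per query scales with kernel_size rather than with n, which is where the throughput gain over global self-attention comes from.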
🔥🔥FANs: Fully Attentional Networks🔥🔥

👉#Nvidia unveils fully attentional networks (FANs)

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅Efficient fully attentional design
✅Semantic segmentation & object detection
✅Model/source code coming soon!

More: https://bit.ly/3vtpITs

👨🏼‍🎨 Open-Source DALL·E 2 is out 👨🏼‍🎨

👉#Pytorch implementation of DALL·E 2, #OpenAI's latest text-to-image neural net

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
✅SOTA for text-to-image generation
✅Source code/model under MIT License
✅Example prompt: "Medieval painting of wifi not working"

More: https://bit.ly/3vzsff6