AI with Papers - Artificial Intelligence & Deep Learning
15K subscribers
96 photos
238 videos
11 files
1.27K links
All the AI with papers. Every day fresh updates on Deep Learning, Machine Learning, and Computer Vision (with Papers).

Curated by Alessandro Ferrari | https://www.linkedin.com/in/visionarynet/
☄️STEVE: Slot-TransformEr for VidEos☄️

👉STEVE: unsupervised model for object-centric learning in videos

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Adoption of a slot decoder (SLATE)
SLATE with slot-level recurrence model
Complex and naturalistic videos
Significantly outperforms previous SOTA

More: https://bit.ly/3PNxxM3
🔥7👍1🤯1
🦔 CogVideo: insane text-to-clip 🦔

👉CogVideo: a 9B-parameter model, billed as the world's first large-scale open-source text-to-video model 😵

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Largest open-source T2C transformer
Finetuning of text-to-image model
Multi-frame-rate hierarchical training
From pretrained model CogView2

More: https://bit.ly/3Gzfl4n
🔥9👍6
🦄Time-Aware Neural Voxels🦄

👉TiNeuVox: "NeRF" with time-aware voxel features 😵

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Dynamic scene w/ optimizable structure
Temporal information in radiance net
Small/large motion w/ single-res of feats
192× faster than previous Hyper-NeRF

More: https://bit.ly/3wR4O08
👍11🔥2🤯1
🫐Neural Anomaly Detection by AWS🫐

👉Ultra-competitive inference and SOTA for both detection and localization

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Locally aggregated, mid-level patch features
Maximizing nominal information at test time
Reducing biases towards ImageNet classes
Image-level anomaly AUROC of up to 99.6%
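
As a rough illustration of the scoring idea behind the highlights above, the sketch below keeps a memory bank of patch features from nominal (defect-free) images, scores a test image by the largest nearest-neighbor distance among its patches, and then measures image-level AUROC. It is a simplified sketch, not the AWS implementation: the 2-D toy features and the names `patch_score`, `image_score`, and `auroc` are assumptions made for this example.

```python
import math

def patch_score(patch, memory_bank):
    """Distance from one patch feature to its nearest nominal feature."""
    return min(math.dist(patch, nominal) for nominal in memory_bank)

def image_score(patches, memory_bank):
    """Image-level anomaly score: the most anomalous patch decides."""
    return max(patch_score(p, memory_bank) for p in patches)

def auroc(labels, scores):
    """Chance that a random anomalous image outscores a random nominal one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy 2-D "patch features" (hypothetical values; real features would come
# from the mid-level layers of a pretrained network).
memory_bank = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
test_images = [
    [(0.1, 0.0), (0.9, 1.0)],  # nominal
    [(0.0, 0.9), (1.0, 0.2)],  # nominal
    [(0.1, 0.1), (3.0, 3.0)],  # anomalous: one patch far from the bank
    [(2.0, 0.0), (1.0, 1.0)],  # anomalous
]
labels = [0, 0, 1, 1]
scores = [image_score(img, memory_bank) for img in test_images]
print(auroc(labels, scores))
```

Taking the maximum over patches means a single defective region is enough to flag the whole image, and the per-patch distances can double as a coarse localization map.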

More: https://bit.ly/3t7Ndjg
🔥7🤯3👍2
🛹 Project Skate from Google #AI 🛹

👉#AI tool to analyze a skateboarder's tricks in real time

More: https://bit.ly/3zbQS3M
🔥15🤩3👏1
🧬Neural Text2Human Generation🧬

👉Text-driven neural human generation

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Full-body from a given human pose
Hierarchical texture-aware codebook
DeepFashion -> 44k Hi-Res images
Code and models available!

More: https://bit.ly/3Mdnpt0
🔥15👍1
🧨EfficientFormers: 1.6ms inference 🧨

👉Transformers as fast as MobileNet? Snap shows it on #iPhone!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Low latency on mobile, high performance!
Revisiting the design of ViT through latency
New dimension-consistent design paradigm
EfficientFormers: a new ViT for mobile!

More: https://bit.ly/3MdgW15
🔥16👍1🤯1
🐢 Transformer-Based Sensor Fusion 🐢

👉Updating TransFuser (CVPR21): image + LiDAR representations with self-attention

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Existing approach can't handle traffic 😢
Novel multi-modal fusion transformer
The new SOTA in driving performance
Reducing avg collisions per km by 48%
Insights on current limitations of E2E

More: https://bit.ly/391dmd6
👍11🔥2
🧘🏻‍♂️YogNet: neural yoga assistant🧘🏻‍♂️

👉Multi-person yoga neural expert for 20 asanas

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
CNNs & reg.LSTMs + 3D-CNNs
Multi-person asanas in real-time
YAR: dataset for yoga & posture
1206 videos, 2D RGB camera

More: https://bit.ly/3NncVbE
13👍1
🔴 Geogram: geometric algos in C++ 🔴

👉Novel open-source programming library with (research) geometric algorithms in C++

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Geometry Processing from #INRIA
30+ papers from SIGGRAPH, etc.
Grants: GOODSHAPE & VORPALINE
Code (mostly C++) under the BSD 3-Clause license

More: https://bit.ly/3mhS4L7
🔥6👍31
🍏 Open Source Vision from #Apple 🍏

👉CVNets: open-source (not a joke) lib for neural vision.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
PyTorch-based neural lib. for vision
Train 2−4× longer w/ augmentations
Plug-and-play components for CV
Source code under a custom license

More: https://bit.ly/39d1dSj
👍9
🏇🏻Neural Clips by #Nvidia: INSANE 🏇🏻

👉Neural generation with changes in camera viewpoint & content that arises over time 🤯

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Novel hierarchical generator architecture
Temp. receptive field + temporal embed.
Multi-res. with super-resolution network
SOTA in long clip with motion & changes
Code, data & models in August 2022 🏖️

More: https://bit.ly/3zroWsC
🤯9👎21
Zero to #Messi with #deeplearning

👉EA unveils a neural system to learn multiple soccer juggling skills 😍

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Learning difficult soccer juggling skills
Layer-wise mixture-of-experts architecture
Specialization arises naturally
Adaptive random walk training strategy
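
To make the "layer-wise mixture-of-experts" highlight concrete, here is a generic sketch in which a layer's weight matrix is a gate-weighted blend of several expert matrices. This is one common reading of the idea and not EA's implementation: the toy experts, the gate logits, and the names `softmax`, `blend_experts`, and `moe_layer` are all assumptions made for this example.

```python
import math

def softmax(logits):
    """Turn gate logits into blending coefficients that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def blend_experts(expert_weights, gate):
    """Gate-weighted sum of the experts' weight matrices for one layer."""
    rows, cols = len(expert_weights[0]), len(expert_weights[0][0])
    return [
        [sum(g * w[r][c] for g, w in zip(gate, expert_weights)) for c in range(cols)]
        for r in range(rows)
    ]

def moe_layer(x, expert_weights, gate_logits):
    """Apply one linear layer whose weights are blended from several experts."""
    gate = softmax(gate_logits)
    weights = blend_experts(expert_weights, gate)
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# Two hypothetical experts for a 2-in, 2-out layer.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[0.0, 1.0], [1.0, 0.0]],  # expert 1: swaps the two inputs
]
x = [2.0, 4.0]
print(moe_layer(x, experts, [0.0, 0.0]))   # equal gates: average of both experts
print(moe_layer(x, experts, [50.0, 0.0]))  # gate strongly prefers expert 0
```

In a full network each layer would get its own blend, and a gating network would produce the logits from the current state, which is where expert specialization could emerge.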

More: https://bit.ly/3mwRaL2
🔥7👍3
🏖️ HumanNeRF: source code is out! 🏖️

👉Pausing the video at any frame and rendering the subject from arbitrary views!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Synthesizing photorealistic humans
Synthesizing details, e.g. cloth & face
Volumetric canonical T-pose
Skeletal rigid/non-rigid decomposition

More: https://bit.ly/3NEkTNY
🤯17🔥5👍2
🎒 EG3D: source code is out! 🎒

👉#Nvidia just opened EG3D: real-time multi-view faces w/ HQ #3D geometry!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Tri-plane-based 3D GAN framework
Pose-correlated attribute (expression)
SOTA in uncond. 3D-aware synthesis
Source code & models NOW available!
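
The tri-plane idea can be sketched in a few lines: a 3D point is projected onto three axis-aligned feature planes, and the three sampled features are summed. This is a minimal sketch and not Nvidia's code: it uses nearest-cell lookup where a real implementation would interpolate, and the plane contents plus the names `to_index`, `sample_plane`, and `triplane_feature` are assumptions made for this example.

```python
def to_index(coord, resolution):
    """Map a coordinate in [-1, 1] to the nearest cell index on a plane."""
    t = (coord + 1.0) / 2.0
    return min(resolution - 1, max(0, round(t * (resolution - 1))))

def sample_plane(plane, u, v):
    """Nearest-cell feature lookup on one square plane (a grid of feature vectors)."""
    res = len(plane)
    return plane[to_index(u, res)][to_index(v, res)]

def triplane_feature(planes, point):
    """Project a 3D point onto the xy, xz, and yz planes and sum the features."""
    x, y, z = point
    f_xy = sample_plane(planes["xy"], x, y)
    f_xz = sample_plane(planes["xz"], x, z)
    f_yz = sample_plane(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]

# Hypothetical 2x2 planes with 1-D features, chosen so each term is easy to read.
planes = {
    "xy": [[[1.0], [2.0]], [[3.0], [4.0]]],
    "xz": [[[10.0], [20.0]], [[30.0], [40.0]]],
    "yz": [[[100.0], [200.0]], [[300.0], [400.0]]],
}
print(triplane_feature(planes, (-1.0, -1.0, -1.0)))
print(triplane_feature(planes, (1.0, -1.0, 1.0)))
```

Storing three 2D planes instead of a dense 3D grid keeps memory growing with the square of the resolution instead of the cube; a small decoder network would then turn the summed feature into color and density.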

More: https://bit.ly/3aOfHs0
🔥7🤯6👍42
🔥One Millisecond Backbone. Fire!🔥

👉MobileOne by #Apple: efficient mobile backbone with inference <1 ms on #iPhone12!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
75.9% top-1 accuracy on ImageNet
38× faster than MobileFormer net
Classification, detection & segmentation
Source code & model soon available!

More: https://bit.ly/3tsT7f2
24👍2
🧨 Scaling Transformers to GigaPixels!🧨

👉Novel ViT called Hierarchical Image Pyramid Transformer (HIPT) -> Scaling to GigaPixels!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Gigapixel whole-slide imaging (WSI)
Leveraging natural hier. structure of WSI
Self-supervised Hi-Res representations
Source code and models available!

More: https://bit.ly/3xLuzkg
🤯16👍1
👗BodyMap: Hyper-Detailed Humans👗

👉#META unveils 1st-ever dense continuous correspondence for clothed humans

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
1st-ever dense continuous corresp.
HQ fingers, hair, and clothes
Novel ViT-based architecture
SOTA on DensePose COCO

More: https://bit.ly/39nEPps
👍132
🐹 NOAH just open-sourced! 🐹

👉A novel approach to find the optimal design of prompt modules through NAS algos.

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
NOAH from Neural prOmpt seArcH
Parameter-efficient “prompt modules”
Efficient NAS-based implementation
Beats individual prompt modules in transfer, few-shot & domain gen.

More: https://bit.ly/3MKfVhi
👍5👏2🥰1
🏄🏻‍♀️Neural Super-Resolution in Movies🏄🏻‍♀️

👉Implicit neural representation to get arbitrary spatial resolution & FPS -> Super Resolution!

𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
Video as a continuous implicit representation
Clips in arbitrary space/time resolution
OOD generalization in space-time
Source code and models available

More: https://bit.ly/3xsqccf
🔥6👍2