Computer Science and Programming

Accurate and Efficient Stereo Matching via Attention Concatenation Volume

Stereo Depth Estimation

Paper:
https://arxiv.org/pdf/2209.12699.pdf

Github:
https://github.com/gangweiX/Fast-ACVNet

Demo:
https://www.youtube.com/watch?v=az4Z3dp72Zw

ONNX:
ONNX-FastACVNet-Stereo-Depth-Estimation
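
For context: a stereo matching network like this one predicts a disparity for each pixel, and depth is then usually recovered with the standard rectified-stereo relation depth = focal length × baseline / disparity. Below is a minimal sketch of that conversion. The function name and the camera parameters are illustrative assumptions, not values from the paper or the repository.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity in pixels to depth in metres for a rectified stereo pair."""
    if disparity_px <= 0:
        # Zero or negative disparity means no valid match (or a point at infinity).
        return float("inf")
    return focal_length_px * baseline_m / disparity_px


# Illustrative (assumed) camera parameters, not taken from the paper.
FOCAL_LENGTH_PX = 720.0
BASELINE_M = 0.5

# One row of a hypothetical disparity map, in pixels.
row = [72.0, 36.0, 0.0]
depths = [disparity_to_depth(d, FOCAL_LENGTH_PX, BASELINE_M) for d in row]
print(depths)
```

Halving the disparity doubles the estimated depth, which is why small disparity errors matter most for distant objects.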

@computer_science_and_programming

👍101👎4❤1

107K views · Abdulaziz Gaibullayev, 06:52

Computer Science and Programming

Happy New Year!

Summary of our channel for 2022.
(Thanks to the TGSTAT team for the curated summary.)

TGSTAT team: In the new year 2023, we wish you rapid subscriber growth, high post reach, a high-quality active audience and, of course, happiness and health.

Our traditional present is a New Year card with your channel's results for this year.

See you in 2023,

@computer_science_and_programming

👍174

108K views · Abdulaziz Gaibullayev, edited 07:11

Computer Science and Programming

PACO: Parts and Attributes of Common Objects

Sometimes object detection is not enough and you need more detail about an object, especially when object parts matter for your task. This dataset from the Facebook research team is for you.

PACO is a detection dataset that goes beyond traditional object boxes and masks and provides richer annotations such as part masks and attributes. It spans 75 object categories, 456 object-part categories and 55 attributes across image (LVIS) and video (Ego4D) datasets.

Paper:
https://arxiv.org/pdf/2301.01795.pdf

Github:
https://github.com/facebookresearch/paco

Visualization:
https://github.com/facebookresearch/paco/tree/main/notebooks

@computer_science_and_programming

👍97👎5❤1

106K views · Abdulaziz Gaibullayev, edited 12:01

Computer Science and Programming

MIT Introduction to Deep Learning 2023 is starting soon! MIT Intro to DL is one of the most concise AI courses on the web, covering basic deep learning techniques, architectures, and applications.

2023 lectures are starting in just one day, Jan 9th!

Link to register:
http://introtodeeplearning.com

The 2022 MIT Introduction to Deep Learning lectures can be found here:

https://m.youtube.com/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI

👉 @computer_science_and_programming

👍157👎12❤1

118K views · Abdulaziz Gaibullayev, edited 15:43

Computer Science and Programming

YOLOv8 is the newest state-of-the-art YOLO model that can be used for object detection, image classification, and instance segmentation tasks. YOLOv8 includes numerous architectural and developer experience changes and improvements over YOLOv5.

Code:
https://github.com/ultralytics/ultralytics

What's New in YOLOv8?
https://blog.roboflow.com/whats-new-in-yolov8/

YOLOv8 Instance Segmentation (ONNX):
https://github.com/ibaiGorordo/ONNX-YOLOv8-Instance-Segmentation
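
When a detector is run outside its original framework, for example from an ONNX export, the raw output typically contains many overlapping candidate boxes that are filtered with non-maximum suppression (NMS). Here is a minimal, dependency-free sketch of greedy NMS. It illustrates the general technique only and is not the Ultralytics or linked-repository implementation; the boxes, scores and threshold are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up candidates: the first two overlap heavily, the third is separate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

The second box is dropped because its overlap with the higher-scoring first box exceeds the threshold.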

👉 @computer_science_and_programming

👍165👎5

121K views · Abdulaziz Gaibullayev, edited 07:36

Computer Science and Programming

Box2Mask: Box-supervised Instance Segmentation via Level-set Evolution

BoxInstSeg is a toolbox that aims to provide state-of-the-art box-supervised instance segmentation algorithms. It supports instance segmentation with only box annotations.

Github:
https://github.com/LiWentomng/BoxInstSeg

Paper:
https://arxiv.org/pdf/2212.01579.pdf

👉@computer_science_and_programming

👍118👎6

113K views · Abdulaziz Gaibullayev, edited 13:22

Computer Science and Programming

GLIGEN: Open-Set Grounded Text-to-Image Generation.

GLIGEN (Grounded-Language-to-Image Generation) is a novel approach that builds upon and extends the functionality of existing pre-trained text-to-image diffusion models by enabling them to also be conditioned on grounding inputs.

Project page:
https://gligen.github.io/

Paper:
https://arxiv.org/abs/2301.07093

Github (coming soon):
https://github.com/gligen/GLIGEN

Demo:
https://huggingface.co/spaces/gligen/demo

👉@computer_science_and_programming

👍110👎5

116K views · Abdulaziz Gaibullayev, edited 12:22

Computer Science and Programming

Cut and Learn for Unsupervised Object Detection and Instance Segmentation

Cut-and-LEaRn (CutLER) is a simple approach for training object detection and instance segmentation models without human annotations. It outperforms previous SOTA by 2.7 times for AP50 and 2.6 times for AR on 11 benchmarks.

Paper:
https://arxiv.org/pdf/2301.11320.pdf

Github:
https://github.com/facebookresearch/CutLER

Demo:
https://colab.research.google.com/drive/1NgEyFHvOfuA2MZZnfNPWg1w5gSr3HOBb?usp=sharing

👉@computer_science_and_programming

👍99👎1

122K views · Abdulaziz Gaibullayev, edited 07:19

Computer Science and Programming

Audio AI Timeline

Here we will keep track of the latest AI models for audio generation, starting in 2023!

▪️SingSong: Generating musical accompaniments from singing
- Paper

▪️AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
- Paper
- Code

▪️Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
- Paper
- Code

▪️Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
- Paper

▪️Noise2Music

▪️RAVE2
- Paper
- Code

▪️MusicLM: Generating Music From Text
- Paper

▪️Msanii: High Fidelity Music Synthesis on a Shoestring Budget
- Paper
- Code
- HuggingFace

▪️ArchiSound: Audio Generation with Diffusion
- Paper
- Code

▪️VALL-E: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
- Paper

👉@computer_science_and_programming

👍174👎4❤2

122K views · Abdulaziz Gaibullayev, edited 14:49

Computer Science and Programming

Gen-1: The Next Step Forward for Generative AI

Introducing Gen-1: a new AI model that uses language and images to generate new videos out of existing ones.

⭐️ Project:
https://research.runwayml.com/gen1

✅ Paper:
https://arxiv.org/abs/2302.03011

📌Request form:
https://docs.google.com/forms/d/e/1FAIpQLSfU0O_i1dym30hEI33teAvCRQ1i8UrGgXd4BPrvBWaOnDgs9g/viewform

👉@computer_science_and_programming

👍154👎7❤1

124K views · Abdulaziz Gaibullayev, 10:17

Computer Science and Programming

YOWOv2: A Stronger yet Efficient Multi-level Detection Framework for Real-time Spatio-temporal Action Detection

Spatio-temporal action detection (STAD) aims to detect action instances in the current frame and has been widely applied in areas such as video surveillance and somatosensory games.

Paper:
https://arxiv.org/pdf/2302.06848.pdf

Github:
https://github.com/yjh0410/YOWOv2

Dataset:
https://drive.google.com/file/d/1Dwh90pRi7uGkH5qLRjQIFiEmMJrAog5J/view?usp=sharing

👉@computer_science_and_programming

👍131👎4

142K views · Abdulaziz Gaibullayev, edited 09:04

Computer Science and Programming

3D-aware Conditional Image Synthesis (pix2pix3D)

Pix2pix3D synthesizes 3D objects (neural fields) given a 2D label map, such as a segmentation or edge map.

Github:
https://github.com/dunbar12138/pix2pix3D

Paper:
https://arxiv.org/abs/2302.08509

Project:
https://www.cs.cmu.edu/~pix2pix3D/

Datasets:
CelebAMask, AFHQ-Cat-Seg, Shapenet-Car-Edge

👉@computer_science_and_programming

👍192👎6

158K views · Abdulaziz Gaibullayev, 11:44

Computer Science and Programming

Efficient Teacher: Semi-Supervised Object Detection for YOLOv5

✅ Efficient Teacher introduces semi-supervised object detection into practical applications, enabling users to obtain strong generalization capability with only a small amount of labeled data and a large amount of unlabeled data.

✅ Efficient Teacher provides category and custom uniform sampling, which can quickly improve the network performance in actual business scenarios.

Paper:
https://arxiv.org/abs/2302.07577

Github:
https://github.com/AlibabaResearch/efficientteacher

👉@computer_science_and_programming

👍174👎2

157K views · Abdulaziz Gaibullayev, 10:22

Computer Science and Programming

Multivariate Probabilistic Time Series Forecasting with Informer

An efficient transformer-based model for long sequence time-series forecasting (LSTF).

The method introduces a Probabilistic Attention mechanism that selects the “active” queries rather than the “lazy” queries and provides a sparse Transformer, thus mitigating the quadratic compute and memory requirements of vanilla attention.
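
To make the “active” versus “lazy” distinction concrete, here is a minimal pure-Python sketch. Each query is scored by the maximum minus the mean of its scaled dot products with the keys; the top-u queries get ordinary softmax attention and the rest simply receive the mean of the values. This is a simplification for illustration: it scores every query against every key, so it shows the selection rule but not the compute savings, which in the actual model come from scoring against a sampled subset of keys. All function names and the toy inputs are assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sparsity_score(q, keys):
    """Max minus mean of the scaled dot products between q and every key."""
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in keys]
    return max(scores) - sum(scores) / len(scores)


def prob_sparse_attention(queries, keys, values, u):
    """Softmax attention for the top-u 'active' queries, mean(V) for the 'lazy' rest."""
    dim = len(values[0])
    mean_v = [sum(v[d] for v in values) / len(values) for d in range(dim)]
    ranked = sorted(range(len(queries)),
                    key=lambda i: sparsity_score(queries[i], keys),
                    reverse=True)
    active = set(ranked[:u])
    scale = math.sqrt(len(queries[0]))
    out = []
    for i, q in enumerate(queries):
        if i in active:
            w = softmax([dot(q, k) / scale for k in keys])
            out.append([sum(wj * v[d] for wj, v in zip(w, values)) for d in range(dim)])
        else:
            out.append(list(mean_v))
    return out, sorted(active)


# Toy inputs: the all-zero query attends uniformly, so it is the "lazy" one.
queries = [[4.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
out, active = prob_sparse_attention(queries, keys, values, u=2)
print(active)
print(out[1])
```

A query whose attention distribution is close to uniform scores near zero, which is why replacing its output with the mean of the values loses little.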

🤗 Hugging Face:
https://huggingface.co/blog/informer

⏩ Model docs:
https://huggingface.co/docs/transformers/main/en/model_doc/informer

⭐️ Colab:
https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multivariate_informer.ipynb

💨 Dataset:
https://huggingface.co/docs/datasets/v2.7.0/en/package_reference/main_classes#datasets.Dataset.set_transform

👉@computer_science_and_programming

👍180👎8❤2

187K views · Abdulaziz Gaibullayev, 12:14

Computer Science and Programming

ViperGPT: Visual Inference via Python Execution for Reasoning

ViperGPT is a framework that leverages code-generation models to compose vision-and-language models into subroutines to produce a result for any query.
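
The idea of composing modules through generated code can be sketched in a few lines. In the toy below, the "vision modules" are hypothetical stand-ins that work on a list of labels instead of pixels, and the "generated" program is a hard-coded string where a real system would call a code-generation model. None of the names (`find`, `count`, `run_query`) are taken from the ViperGPT codebase.

```python
# Hypothetical stand-ins for vision modules; a real system would call pretrained models.
def find(image, name):
    """Return the objects in the toy image whose label matches name."""
    return [obj for obj in image if obj["label"] == name]


def count(objects):
    return len(objects)


# The API that the generated program is allowed to use.
API = {"find": find, "count": count}

# In a real system this source would come from a code-generation model prompted
# with the API description and the query; here it is hard-coded for illustration.
GENERATED_PROGRAM = '''
def execute_command(image):
    muffins = find(image, "muffin")
    kids = find(image, "kid")
    return count(muffins) // max(count(kids), 1)
'''


def run_query(image, program_source):
    """Execute the generated program against the module API and return its result."""
    namespace = dict(API)
    exec(program_source, namespace)  # only safe here because the source is hard-coded
    return namespace["execute_command"](image)


# Toy "image": four muffins and two kids. Query: how many muffins can each kid have?
image = [{"label": "muffin"}] * 4 + [{"label": "kid"}] * 2
answer = run_query(image, GENERATED_PROGRAM)
print(answer)
```

Executing model-generated code is a security risk, so a real deployment would need sandboxing that this sketch leaves out.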

Github:
https://github.com/cvlab-columbia/viper

Paper:
https://arxiv.org/pdf/2303.08128.pdf

Project:
https://paperswithcode.com/dataset/beat

👉@computer_science_and_programming

👍225👎7❤1

189K views · Abdulaziz Gaibullayev, 08:15

Computer Science and Programming

Test of Time: Instilling Video-Language Models with a Sense of Time

GPT-5 will likely have video abilities, but will it have a sense of time? This #CVPR2023 paper by a student at the University of Amsterdam addresses that question and shows how to instil a sense of time into video-language foundation models.

Paper:
https://arxiv.org/abs/2301.02074

Code:
https://github.com/bpiyush/TestOfTime

Project Page:
https://bpiyush.github.io/testoftime-website/

👉 @computer_science_and_programming

👍180👎7

201K views · 14:00

Computer Science and Programming

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Paper:
https://arxiv.org/abs/2305.10973

Github:
https://github.com/XingangPan/DragGAN

Project page:
https://vcai.mpi-inf.mpg.de/projects/DragGAN/

👉 @computer_science_and_programming

👍182👎10

188K views · 09:20

Computer Science and Programming

🔭 GRES: Generalized Referring Expression Segmentation

A new benchmark, GRES, extends classic Referring Expression Segmentation (RES) to allow expressions to refer to an arbitrary number of target objects.

🖥 Github: https://github.com/henghuiding/ReLA

⏩ Paper: https://arxiv.org/abs/2306.00968

🔎 Project: https://henghuiding.github.io/GRES/

📌 New dataset: https://github.com/henghuiding/gRefCOCO

👉 @computer_science_and_programming

👍131❤1👎1

188K views · 05:49
