Machine learning books and papers
Admin: @Raminmousa
WhatsApp: +989333900804
ID: @Machine_learn
link: https://t.me/Machine_learn
🏥 MedMNIST-C: benchmark dataset based on the MedMNIST+ collection covering 12 2D datasets and 9 imaging modalities.

pip install medmnistc

🖥 Github: https://github.com/francescodisalvo05/medmnistc-api

📕 Paper: https://arxiv.org/abs/2406.17536v2

🔥Dataset: https://paperswithcode.com/dataset/imagenet-c

@Machine_learn
🌟 SEE-2-SOUND - a method for generating complex spatial sound based on images and videos

pip install see2sound

🖥 GitHub
🟡 Hugging Face
🟡 Arxiv

@Machine_learn
Hello. Friends who have a paper can submit it to this journal and suggest me as a reviewer.
@Machine_learn
Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling

🖥 Github: https://github.com/linghuyuhangyuan/m2s

📕 Paper: https://arxiv.org/abs/2407.05875v1

🔥Dataset: https://paperswithcode.com/task/denoising

@Machine_learn
Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation (ECCV 2024)

🖥 Github: https://github.com/fanghaook/ovformer

📕 Paper: https://arxiv.org/abs/2407.07427v1

@Machine_learn
Multimodal contrastive learning for spatial gene expression prediction using histology images

🖥 Github: https://github.com/modelscope/data-juicer

📕 Paper: https://arxiv.org/abs/2407.08583v1

🚀 Dataset: https://paperswithcode.com/dataset/coco

@Machine_learn
🌟 An Empirical Study of Mamba-based Pedestrian Attribute Recognition

🖥 Github: https://github.com/event-ahu/openpar

📕 Paper: https://arxiv.org/pdf/2407.10374v1.pdf

🚀 Dataset: https://paperswithcode.com/dataset/peta

@Machine_learn
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

🖥 Github: https://github.com/kaistmm/SSLalignment

📕 Paper: https://arxiv.org/abs/2407.13676v1

🚀 Dataset: https://paperswithcode.com/dataset/is3-interactive-synthetic-sound-source

@Machine_learn
🌟 MG-LLaVA - multimodal LLM with advanced capabilities for working with visual information

Just recently, a team from Shanghai University released MG-LLaVA, an MLLM that extends visual processing with additional components dedicated to low- and high-resolution inputs.

MG-LLaVA integrates an additional high-resolution visual encoder to capture fine details, which are then fused with the base visual features through a Conv-Gate network.
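
The gated fusion idea can be illustrated with a small sketch. This is not the MG-LLaVA code: the function name `gated_fusion`, the residual form `low + gate * high`, and the use of a 1x1 convolution (written here as a per-position linear map) are all assumptions made for illustration, in plain Python so it runs without any dependencies.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(low, high, weights, bias):
    """Fuse low- and high-resolution features position by position.

    Hypothetical helper, not taken from the MG-LLaVA repository.
    low, high: lists of feature vectors (one per spatial position), each of length C.
    weights:   C x 2C matrix; a 1x1 convolution is a per-position linear map.
    bias:      list of length C.
    """
    fused = []
    for low_vec, high_vec in zip(low, high):
        # Concatenate both feature vectors along the channel axis.
        concat = low_vec + high_vec
        # One gate value in (0, 1) per output channel.
        gate = [
            sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
            for row, b in zip(weights, bias)
        ]
        # Assumed residual form: keep the low-res features and add gated high-res detail.
        fused.append([l + g * h for l, g, h in zip(low_vec, gate, high_vec)])
    return fused


# One spatial position, two channels; zero weights give a gate of exactly 0.5.
low = [[1.0, 2.0]]
high = [[2.0, 4.0]]
zero_weights = [[0.0] * 4 for _ in range(2)]
result = gated_fusion(low, high, zero_weights, [0.0, 0.0])
print(result)
```

With a strongly negative bias the gate approaches 0 and the output falls back to the low-resolution features, which is the point of gating: the network learns per channel how much high-resolution detail to let through.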

Trained exclusively on publicly available multimodal data, MG-LLaVA achieves excellent results.

🟡 MG-LLaVA page
🖥 GitHub

@Machine_learn
🖥 StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset.

🖥 Github: https://github.com/huochf/StackFLOW

📕 Paper: https://arxiv.org/abs/2407.20545v1

🚀 Dataset: https://paperswithcode.com/dataset/behave

@Machine_learn
How to Think Like a Computer Scientist: Interactive Edition

https://runestone.academy/ns/books/published/thinkcspy/index.html

@Machine_learn