https://paperswithcode.com/methods/category/convolutions
A catalog of known convolution types, including deformable convolutions and the recent PushPull-Conv
Paperswithcode
Papers with Code - An Overview of Convolutions
Convolutions are a type of operation that can be used to learn representations from images. They involve a learnable kernel sliding over the image and performing element-wise multiplication with the input. The specification allows for parameter sharing and…
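The sliding-kernel description above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code from the linked page: the function name `conv2d`, the toy `image`, and the fixed `kernel` are assumptions made for the example, and it uses no padding and a stride of 1. As in most deep learning libraries, the kernel is not flipped, so this is strictly a cross-correlation.

```python
def conv2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the
    element-wise products at each position. Hypothetical helper."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Element-wise multiply the kernel with the patch under it.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# Toy 3x4 "image" and a 2x2 kernel (in a real network the kernel
# values would be learned rather than fixed).
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]

print(conv2d(image, kernel))  # [[-4, -4, 2], [-4, -4, 4]]
```

The same four kernel weights are reused at every position, which is the parameter sharing the overview mentions.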
https://huggingface.co/XiaomiMiMo/MiMo-VL-7B-RL
https://github.com/XiaomiMiMo/MiMo-VL
A new vision language model (VLM) reported to outperform Qwen2.5-VL #models #vlm
huggingface.co
XiaomiMiMo/MiMo-VL-7B-RL · Hugging Face
Relatively new Python tools:
https://docs.astral.sh/uv/ An extremely fast Python package and project manager, written in Rust
https://docs.marimo.io/ marimo is a reactive Python notebook
#Tools
https://github.com/parthsarthi03/raptor #Frameworks
https://github.com/illuin-tech/colpali
Efficient Document Retrieval with Vision Language Models #Frameworks #Models
GitHub
GitHub - illuin-tech/colpali: The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
https://arxiv.org/abs/2506.01928 #Paper
Esoteric Language Models ;)
A new family of models that fuses the autoregressive and masked diffusion model paradigms
arXiv.org
Esoteric Language Models
Diffusion-based language models offer a compelling alternative to autoregressive (AR) models by enabling parallel and controllable generation. Among this family of models, Masked Diffusion Models...
https://arxiv.org/abs/2503.19108
A plain ViT architecture, without a decoder, that performs fast image segmentation #Paper #Frameworks
arXiv.org
Your ViT is Secretly an Image Segmentation Model
Vision Transformers (ViTs) have shown remarkable performance and scalability across various computer vision tasks. To apply single-scale ViTs to image segmentation, existing methods adopt a...
https://arxiv.org/abs/2411.04983
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning #Frameworks #Paper
arXiv.org
DINO-WM: World Models on Pre-trained Visual Features enable...
The ability to predict future outcomes given control actions is fundamental for physical reasoning. However, such predictive models, often called world models, remains challenging to learn and are...
Achieving 10,000x training data reduction with high-fidelity labels https://share.google/PXeW6ut6dkPw4M0zw
#paper
research.google
https://xl0.github.io/lovely-tensors/ Lovely Tensors makes tensors easier to inspect by printing readable summaries.
https://github.com/google/mediapy This library makes it easy to display images and videos in Jupyter notebooks.
#library
lovely-tensors
❤️ Lovely Tensors – lovely-tensors
After all, you are only human.
[2509.13351] Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning https://share.google/Po2YXN8rOVrhNXMvz
arXiv.org
Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning...
Large language models (LLMs) have demonstrated impressive capabilities across diverse tasks, yet their ability to perform structured symbolic planning remains limited, particularly in domains...
https://moondream.ai/blog/moondream-3-preview A small vision language model (VLM) designed for edge and on-device use. #Models
Moondream
A fast & powerful vision model that rocks.
https://www.perceptron.inc/blog/introducing-isaac-0-1 Another vision language model (VLM) with similar properties #Models
marketing.perceptron.inc
A layer of intelligence for the physical world.
We are a research company building the future of Physical AGI.
https://arxiv.org/abs/2510.05949v1 JEPA architectures such as DINOv3 can be effectively used for data curation, outlier detection and similar tasks. #Paper
arXiv.org
Gaussian Embeddings: How JEPAs Secretly Learn Your Data Density
Joint Embedding Predictive Architectures (JEPAs) learn representations able to solve numerous downstream tasks out-of-the-box. JEPAs combine two objectives: (i) a latent-space prediction term,...
https://github.com/microsoft/markitdown
Converts all major document formats to Markdown and can work as an MCP server
GitHub
GitHub - microsoft/markitdown: Python tool for converting files and office documents to Markdown.