https://huggingface.co/collections/osanseviero/model-merging-65097893623330a3a51ead66
Model Merging: papers
#Paper
Model merging is a very popular technique in the LLM space these days. Here is a chronological list of papers on the topic that will help you get started with it!
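To make the idea concrete, here is a minimal sketch of the simplest merge covered by that literature: plain weight interpolation between two fine-tunes of the same base model. The checkpoint names are made up, and the papers in the collection cover far more refined schemes than this.

```python
# Minimal sketch: naive weight interpolation ("model soup" style merge) of two
# fine-tuned checkpoints that share the same architecture. Illustration only.
import torch
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint names, for illustration.
model_a = AutoModelForCausalLM.from_pretrained("org/finetune-a")
model_b = AutoModelForCausalLM.from_pretrained("org/finetune-b")

alpha = 0.5  # interpolation weight
sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
merged = {}
for name, pa in sd_a.items():
    pb = sd_b[name]
    if pa.is_floating_point():
        merged[name] = alpha * pa + (1.0 - alpha) * pb
    else:
        merged[name] = pa  # keep integer buffers as-is

model_a.load_state_dict(merged)   # reuse model_a as the merged model
model_a.save_pretrained("merged-model")
```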
SOTA in unsupervised semantic segmentation:
1. STEGO: Unsupervised Semantic Segmentation by Distilling Feature Correspondences - 2022 https://arxiv.org/abs/2203.08414
2. HP: Leveraging Hidden Positives for Unsupervised Semantic Segmentation - 2023 https://arxiv.org/abs/2303.15014
3. CAUSE: Causal Unsupervised Semantic Segmentation - 2023 https://arxiv.org/abs/2310.07379
#Paper
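For the first entry (STEGO), here is a hedged sketch of its core trick: distill the cosine-similarity structure of frozen backbone features (e.g. DINO) into a lightweight segmentation head. The shift and clamping follow my reading of the paper; the official repo has the exact formulation.

```python
import torch
import torch.nn.functional as F

def corr(a, b):
    # a, b: (B, C, H, W) -> pairwise cosine similarities (B, H, W, H, W)
    a = F.normalize(a, dim=1)
    b = F.normalize(b, dim=1)
    return torch.einsum("bchw,bcij->bhwij", a, b)

def stego_corr_loss(feat, feat_pos, code, code_pos, shift=0.3):
    # feat*: frozen backbone features; code*: segmentation-head outputs
    f_corr = corr(feat, feat_pos)   # "teacher" correspondence structure
    s_corr = corr(code, code_pos)   # "student" correspondence structure
    # push segmentation-feature similarity up where backbone similarity is
    # above the shift, and down where it is below
    return -((f_corr - shift) * s_corr.clamp(min=0)).mean()
```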
https://arxiv.org/pdf/2408.04840v1
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
#Paper
https://arxiv.org/html/2405.18886v1 Compressing Large Language Models using Low Rank and Low Precision Decomposition #Paper
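A generic illustration of the idea (not the paper's exact algorithm): approximate a weight matrix as a coarsely quantized backbone plus a low-rank correction, W ≈ Q + LR.

```python
# Illustration only: combine a low-precision "backbone" with a low-rank
# correction, W ≈ Q + L @ R, where Q is a crudely quantized copy of W and
# L, R absorb the residual.
import torch

def quantize_sym(w, bits=4):
    # crude symmetric round-to-nearest quantization, for illustration only
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale

def low_rank_low_precision(w, bits=4, rank=32):
    q = quantize_sym(w, bits)                       # low-precision backbone
    u, s, vh = torch.linalg.svd(w - q, full_matrices=False)
    l = u[:, :rank] * s[:rank]                      # low-rank correction
    r = vh[:rank]
    return q, l, r                                  # reconstruct as q + l @ r

w = torch.randn(1024, 1024)
q, l, r = low_rank_low_precision(w)
rel_err = (w - (q + l @ r)).norm() / w.norm()
```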
https://arxiv.org/abs/2412.11768
https://github.com/AnonymousAlethiometer/SGD_SaI/
#Paper #Frameworks
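A hedged sketch of the idea as described in the abstract: compute a gradient signal-to-noise statistic per parameter block once at initialization, freeze it into a per-block learning-rate scale, and then train with plain SGD + momentum (no Adam second-moment state). The exact g-SNR definition is in the paper/repo; the ratio below is a placeholder.

```python
# Sketch only: per-parameter learning-rate scaling at initialization, then
# plain SGD with momentum. The g-SNR statistic here is a placeholder, not
# the paper's exact definition.
import torch

def gsnr(grad, eps=1e-8):
    # placeholder signal-to-noise ratio over the parameter block
    return grad.abs().mean() / (grad.std() + eps)

def build_sgd_sai_like(model, loss_fn, first_batch, base_lr=1e-2, momentum=0.9):
    x, y = first_batch
    loss_fn(model(x), y).backward()           # one backward pass at init
    groups = []
    for p in model.parameters():
        if p.grad is None:
            continue
        scale = gsnr(p.grad).item()
        groups.append({"params": [p], "lr": base_lr * scale})
        p.grad = None
    return torch.optim.SGD(groups, lr=base_lr, momentum=momentum)
```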
DINO / DINOv2 explained: self-distillation with no labels, and more. #FYI #Tips #Explained #Tutorial
1. Original DINO: https://medium.com/@anuj.dutt9/emerging-properties-in-self-supervised-vision-transformers-dino-paper-summary-4c7a6ed68161
2. https://encord.com/blog/dinov2…
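For reference, a minimal sketch of the self-distillation objective those posts explain: a student network matches the output distribution of an EMA "teacher" on different augmented views of the same image, with centering and sharpening on the teacher side and no labels anywhere. Multi-crop handling is omitted.

```python
import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center, t_student=0.1, t_teacher=0.04):
    # teacher: centered + sharpened, no gradient; student: softer temperature
    p_t = F.softmax((teacher_out - center) / t_teacher, dim=-1).detach()
    log_p_s = F.log_softmax(student_out / t_student, dim=-1)
    return -(p_t * log_p_s).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(teacher, student, m=0.996):
    # teacher weights are an exponential moving average of the student's
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

@torch.no_grad()
def update_center(center, teacher_out, m=0.9):
    # running center of teacher outputs, used to avoid collapse
    return m * center + (1 - m) * teacher_out.mean(dim=0)
```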
https://www.samarkhanna.com/ExPLoRA/ Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts
#Paper #Framework
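A hedged sketch in the spirit of ExPLoRA: continue unsupervised pre-training on the shifted domain while updating only LoRA adapters (the paper also unfreezes a few ViT blocks). The peft wiring below is mine; the self-supervised objective itself is omitted.

```python
# Sketch only: wrap a pre-trained ViT with LoRA adapters and continue
# pre-training on the new domain with most backbone weights frozen.
import torch
from transformers import ViTModel
from peft import LoraConfig, get_peft_model

backbone = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query", "value"],  # attention projections in the ViT
    lora_dropout=0.0,
)
model = get_peft_model(backbone, lora_cfg)
model.print_trainable_parameters()      # only the LoRA adapters are trainable

# ...run a domain-specific self-supervised objective here (MAE/DINOv2-style),
# then fine-tune on the downstream task afterwards.
```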
https://arxiv.org/pdf/2411.07975
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
#Paper
Finally, multimodality on both input and output!
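For context, the generation side rests on the standard rectified-flow objective, sketched below with a stand-in velocity network; the autoregressive understanding side and the unified architecture are the paper's contribution and are not shown.

```python
# Generic rectified-flow training objective: interpolate on a straight line
# between noise and data, and regress the constant velocity (data - noise).
import torch
import torch.nn.functional as F

def rectified_flow_loss(velocity_net, x_data, cond=None):
    noise = torch.randn_like(x_data)                          # x_0
    t = torch.rand(x_data.size(0), *[1] * (x_data.dim() - 1),
                   device=x_data.device)                      # uniform time
    x_t = (1 - t) * noise + t * x_data                        # straight path
    target_v = x_data - noise                                 # constant velocity
    pred_v = velocity_net(x_t, t, cond)                       # stand-in network
    return F.mse_loss(pred_v, target_v)
```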
https://arxiv.org/abs/2506.01928 #Paper
Esoteric Language Models ;)
A new family of models that fuses the autoregressive and masked diffusion model paradigms.
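A sketch of one masked-diffusion LM training step, i.e. the MDM half that gets fused with autoregression: sample a masking level t, mask tokens i.i.d. with probability t, and train the denoiser to recover them. The 1/t weighting follows common MDM formulations; the fusion with AR decoding is the paper's contribution and is not shown.

```python
import torch
import torch.nn.functional as F

def mdm_step(denoiser, tokens, mask_id):
    # tokens: (B, N) integer ids; denoiser returns (B, N, vocab) logits
    b, n = tokens.shape
    t = torch.rand(b, 1, device=tokens.device).clamp(min=1e-3)   # mask level
    is_masked = torch.rand(b, n, device=tokens.device) < t
    noisy = torch.where(is_masked, torch.full_like(tokens, mask_id), tokens)
    logits = denoiser(noisy)
    ce = F.cross_entropy(logits.transpose(1, 2), tokens, reduction="none")
    # only masked positions contribute, reweighted by the masking level
    return ((ce * is_masked) / t).sum() / is_masked.sum().clamp(min=1)
```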
https://arxiv.org/abs/2503.19108
A plain ViT architecture without a decoder for fast image segmentation #Paper #Frameworks
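A hedged sketch of the decoder-free recipe as I understand it: append learnable query tokens to the patch tokens, run them jointly through the last encoder blocks, then read out a class per query and a mask from query-patch dot products. Block choice and mask upsampling are simplified here.

```python
# Sketch only: encoder-only segmentation head on top of plain ViT blocks
# (assumes blocks map (B, T, C) -> (B, T, C)).
import torch
import torch.nn as nn

class EncoderOnlySegHead(nn.Module):
    def __init__(self, dim=768, num_queries=100, num_classes=150):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cls_head = nn.Linear(dim, num_classes + 1)  # +1 for "no object"

    def forward(self, vit_blocks, patch_tokens):
        # patch_tokens: (B, N, C) features from the earlier ViT blocks
        b = patch_tokens.size(0)
        q = self.queries.expand(b, -1, -1)
        x = torch.cat([q, patch_tokens], dim=1)
        for blk in vit_blocks:                 # the last few encoder blocks
            x = blk(x)
        num_q = self.queries.size(0)
        q, patches = x[:, :num_q], x[:, num_q:]
        class_logits = self.cls_head(q)                          # (B, Q, K+1)
        mask_logits = torch.einsum("bqc,bnc->bqn", q, patches)   # (B, Q, N)
        return class_logits, mask_logits
```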
https://arxiv.org/abs/2411.04983
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning #Frameworks #Paper
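A hedged sketch of the recipe from the abstract: keep the pre-trained visual encoder (DINOv2) frozen, train a dynamics model to predict the next latent from the current latent and action, and plan by optimizing actions so predicted latents reach a goal image's latent. The dynamics network below is a stand-in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentDynamics(nn.Module):
    # stand-in dynamics model over frozen visual latents
    def __init__(self, latent_dim=768, action_dim=4, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden), nn.GELU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z_t, a_t):
        return self.net(torch.cat([z_t, a_t], dim=-1))   # predict z_{t+1}

def world_model_loss(encoder, dynamics, obs_t, act_t, obs_next):
    with torch.no_grad():                                # encoder stays frozen
        z_t = encoder(obs_t)
        z_next = encoder(obs_next)
    return F.mse_loss(dynamics(z_t, act_t), z_next)
```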
Achieving 10,000x training data reduction with high-fidelity labels https://share.google/PXeW6ut6dkPw4M0zw
#Paper