https://huggingface.co/collections/osanseviero/model-merging-65097893623330a3a51ead66
Model Merging: papers
#Paper
Model merging is a very popular technique in the LLM space these days. Here is a chronological list of papers on the topic that will help you get started with it!
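To make the idea concrete, here is a minimal sketch of the simplest merge covered by that literature: plain weight interpolation between two fine-tunes of the same base model. The checkpoint names are made up, and the papers in the collection cover far more refined schemes than this.

```python
# Minimal sketch: naive weight interpolation ("model soup" style merge) of two
# fine-tuned checkpoints that share the same architecture. Illustration only.
import torch
from transformers import AutoModelForCausalLM

# Hypothetical checkpoint names, for illustration.
model_a = AutoModelForCausalLM.from_pretrained("org/finetune-a")
model_b = AutoModelForCausalLM.from_pretrained("org/finetune-b")

alpha = 0.5  # interpolation weight
sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
merged = {}
for name, pa in sd_a.items():
    pb = sd_b[name]
    if pa.is_floating_point():
        merged[name] = alpha * pa + (1.0 - alpha) * pb
    else:
        merged[name] = pa  # keep integer buffers as-is

model_a.load_state_dict(merged)   # reuse model_a as the merged model
model_a.save_pretrained("merged-model")
```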
SOTA in unsupervised semantic segmentation:
1. STEGO: Unsupervised Semantic Segmentation by Distilling Feature Correspondences - 2022 https://arxiv.org/abs/2203.08414
2. HP: Leveraging Hidden Positives for Unsupervised Semantic Segmentation - 2023 https://arxiv.org/abs/2303.15014
3. CAUSE: Causal Unsupervised Semantic Segmentation - 2023 https://arxiv.org/abs/2310.07379
#Paper
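For the first entry (STEGO), here is a hedged sketch of its core trick: distill the cosine-similarity structure of frozen backbone features (e.g. DINO) into a lightweight segmentation head. The shift and clamping follow my reading of the paper; the official repo has the exact formulation.

```python
import torch
import torch.nn.functional as F

def corr(a, b):
    # a, b: (B, C, H, W) -> pairwise cosine similarities (B, H, W, H, W)
    a = F.normalize(a, dim=1)
    b = F.normalize(b, dim=1)
    return torch.einsum("bchw,bcij->bhwij", a, b)

def stego_corr_loss(feat, feat_pos, code, code_pos, shift=0.3):
    # feat*: frozen backbone features; code*: segmentation-head outputs
    f_corr = corr(feat, feat_pos)   # "teacher" correspondence structure
    s_corr = corr(code, code_pos)   # "student" correspondence structure
    # push segmentation-feature similarity up where backbone similarity is
    # above the shift, and down where it is below
    return -((f_corr - shift) * s_corr.clamp(min=0)).mean()
```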
https://arxiv.org/pdf/2408.04840v1
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
#Paper
https://arxiv.org/html/2405.18886v1 Compressing Large Language Models using Low Rank and Low Precision Decomposition #Paper
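A generic illustration of the idea (not the paper's exact algorithm): approximate a weight matrix as a coarsely quantized backbone plus a low-rank correction, W ≈ Q + LR.

```python
# Illustration only: combine a low-precision "backbone" with a low-rank
# correction, W ≈ Q + L @ R, where Q is a crudely quantized copy of W and
# L, R absorb the residual.
import torch

def quantize_sym(w, bits=4):
    # crude symmetric round-to-nearest quantization, for illustration only
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale

def low_rank_low_precision(w, bits=4, rank=32):
    q = quantize_sym(w, bits)                       # low-precision backbone
    u, s, vh = torch.linalg.svd(w - q, full_matrices=False)
    l = u[:, :rank] * s[:rank]                      # low-rank correction
    r = vh[:rank]
    return q, l, r                                  # reconstruct as q + l @ r

w = torch.randn(1024, 1024)
q, l, r = low_rank_low_precision(w)
rel_err = (w - (q + l @ r)).norm() / w.norm()
```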
https://arxiv.org/abs/2412.11768
https://github.com/AnonymousAlethiometer/SGD_SaI/
#Paper #Frameworks
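A hedged sketch of the idea as described in the abstract: compute a gradient signal-to-noise statistic per parameter block once at initialization, freeze it into a per-block learning-rate scale, and then train with plain SGD + momentum (no Adam second-moment state). The exact g-SNR definition is in the paper/repo; the ratio below is a placeholder.

```python
# Sketch only: per-parameter learning-rate scaling at initialization, then
# plain SGD with momentum. The g-SNR statistic here is a placeholder, not
# the paper's exact definition.
import torch

def gsnr(grad, eps=1e-8):
    # placeholder signal-to-noise ratio over the parameter block
    return grad.abs().mean() / (grad.std() + eps)

def build_sgd_sai_like(model, loss_fn, first_batch, base_lr=1e-2, momentum=0.9):
    x, y = first_batch
    loss_fn(model(x), y).backward()           # one backward pass at init
    groups = []
    for p in model.parameters():
        if p.grad is None:
            continue
        scale = gsnr(p.grad).item()
        groups.append({"params": [p], "lr": base_lr * scale})
        p.grad = None
    return torch.optim.SGD(groups, lr=base_lr, momentum=momentum)
```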
DINO / DINOv2 explained: self-distillation with no labels, and more. #FYI #Tips #Explained #Tutorial
1. Original DINO: https://medium.com/@anuj.dutt9/emerging-properties-in-self-supervised-vision-transformers-dino-paper-summary-4c7a6ed68161
2. https://encord.com/blog/dinov2…
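For reference, a minimal sketch of the self-distillation objective those posts explain: a student network matches the output distribution of an EMA "teacher" on different augmented views of the same image, with centering and sharpening on the teacher side and no labels anywhere. Multi-crop handling is omitted.

```python
import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center, t_student=0.1, t_teacher=0.04):
    # teacher: centered + sharpened, no gradient; student: softer temperature
    p_t = F.softmax((teacher_out - center) / t_teacher, dim=-1).detach()
    log_p_s = F.log_softmax(student_out / t_student, dim=-1)
    return -(p_t * log_p_s).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(teacher, student, m=0.996):
    # teacher weights are an exponential moving average of the student's
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

@torch.no_grad()
def update_center(center, teacher_out, m=0.9):
    # running center of teacher outputs, used to avoid collapse
    return m * center + (1 - m) * teacher_out.mean(dim=0)
```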
https://www.samarkhanna.com/ExPLoRA/ Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts
#Paper #Framework
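A hedged sketch in the spirit of ExPLoRA: continue unsupervised pre-training on the shifted domain while updating only LoRA adapters (the paper also unfreezes a few ViT blocks). The peft wiring below is mine; the self-supervised objective itself is omitted.

```python
# Sketch only: wrap a pre-trained ViT with LoRA adapters and continue
# pre-training on the new domain with most backbone weights frozen.
import torch
from transformers import ViTModel
from peft import LoraConfig, get_peft_model

backbone = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query", "value"],  # attention projections in the ViT
    lora_dropout=0.0,
)
model = get_peft_model(backbone, lora_cfg)
model.print_trainable_parameters()      # only the LoRA adapters are trainable

# ...run a domain-specific self-supervised objective here (MAE/DINOv2-style),
# then fine-tune on the downstream task afterwards.
```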
https://arxiv.org/pdf/2411.07975
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
#Paper
Finally, multimodality on both input and output!
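For context, the generation side rests on the standard rectified-flow objective, sketched below with a stand-in velocity network; the autoregressive understanding side and the unified architecture are the paper's contribution and are not shown.

```python
# Generic rectified-flow training objective: interpolate on a straight line
# between noise and data, and regress the constant velocity (data - noise).
import torch
import torch.nn.functional as F

def rectified_flow_loss(velocity_net, x_data, cond=None):
    noise = torch.randn_like(x_data)                          # x_0
    t = torch.rand(x_data.size(0), *[1] * (x_data.dim() - 1),
                   device=x_data.device)                      # uniform time
    x_t = (1 - t) * noise + t * x_data                        # straight path
    target_v = x_data - noise                                 # constant velocity
    pred_v = velocity_net(x_t, t, cond)                       # stand-in network
    return F.mse_loss(pred_v, target_v)
```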
https://arxiv.org/abs/2506.01928 #Paper
Esoteric Language Models ;)
A new family of models that fuses the autoregressive and masked diffusion model paradigms.
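A sketch of one masked-diffusion LM training step, i.e. the MDM half that gets fused with autoregression: sample a masking level t, mask tokens i.i.d. with probability t, and train the denoiser to recover them. The 1/t weighting follows common MDM formulations; the fusion with AR decoding is the paper's contribution and is not shown.

```python
import torch
import torch.nn.functional as F

def mdm_step(denoiser, tokens, mask_id):
    # tokens: (B, N) integer ids; denoiser returns (B, N, vocab) logits
    b, n = tokens.shape
    t = torch.rand(b, 1, device=tokens.device).clamp(min=1e-3)   # mask level
    is_masked = torch.rand(b, n, device=tokens.device) < t
    noisy = torch.where(is_masked, torch.full_like(tokens, mask_id), tokens)
    logits = denoiser(noisy)
    ce = F.cross_entropy(logits.transpose(1, 2), tokens, reduction="none")
    # only masked positions contribute, reweighted by the masking level
    return ((ce * is_masked) / t).sum() / is_masked.sum().clamp(min=1)
```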
https://arxiv.org/abs/2503.19108
A plain ViT architecture without a decoder for fast image segmentation #Paper #Frameworks
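A hedged sketch of the decoder-free recipe as I understand it: append learnable query tokens to the patch tokens, run them jointly through the last encoder blocks, then read out a class per query and a mask from query-patch dot products. Block choice and mask upsampling are simplified here.

```python
# Sketch only: encoder-only segmentation head on top of plain ViT blocks
# (assumes blocks map (B, T, C) -> (B, T, C)).
import torch
import torch.nn as nn

class EncoderOnlySegHead(nn.Module):
    def __init__(self, dim=768, num_queries=100, num_classes=150):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cls_head = nn.Linear(dim, num_classes + 1)  # +1 for "no object"

    def forward(self, vit_blocks, patch_tokens):
        # patch_tokens: (B, N, C) features from the earlier ViT blocks
        b = patch_tokens.size(0)
        q = self.queries.expand(b, -1, -1)
        x = torch.cat([q, patch_tokens], dim=1)
        for blk in vit_blocks:                 # the last few encoder blocks
            x = blk(x)
        num_q = self.queries.size(0)
        q, patches = x[:, :num_q], x[:, num_q:]
        class_logits = self.cls_head(q)                          # (B, Q, K+1)
        mask_logits = torch.einsum("bqc,bnc->bqn", q, patches)   # (B, Q, N)
        return class_logits, mask_logits
```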
https://arxiv.org/abs/2411.04983
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning #Frameworks #Paper
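A hedged sketch of the recipe from the abstract: keep the pre-trained visual encoder (DINOv2) frozen, train a dynamics model to predict the next latent from the current latent and action, and plan by optimizing actions so predicted latents reach a goal image's latent. The dynamics network below is a stand-in.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentDynamics(nn.Module):
    # stand-in dynamics model over frozen visual latents
    def __init__(self, latent_dim=768, action_dim=4, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden), nn.GELU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z_t, a_t):
        return self.net(torch.cat([z_t, a_t], dim=-1))   # predict z_{t+1}

def world_model_loss(encoder, dynamics, obs_t, act_t, obs_next):
    with torch.no_grad():                                # encoder stays frozen
        z_t = encoder(obs_t)
        z_next = encoder(obs_next)
    return F.mse_loss(dynamics(z_t, act_t), z_next)
```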
Achieving 10,000x training data reduction with high-fidelity labels https://share.google/PXeW6ut6dkPw4M0zw
#Paper