https://arxiv.org/abs/2412.11768
https://github.com/AnonymousAlethiometer/SGD_SaI/
#Paper #Frameworks
https://github.com/AnonymousAlethiometer/SGD_SaI/
#Paper #Frameworks
arXiv.org
No More Adam: Learning Rate Scaling at Initialization is All You Need
In this work, we question the necessity of adaptive gradient methods for training deep neural networks. SGD-SaI is a simple yet effective enhancement to stochastic gradient descent with momentum...
Deep Learning
https://github.com/parthsarthi03/raptor #Frameworks
https://github.com/illuin-tech/colpali
Efficient Document Retrieval with Vision Language Models #Frameworks #Models
Efficient Document Retrieval with Vision Language Models #Frameworks #Models
GitHub
GitHub - illuin-tech/colpali: The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.
The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. - illuin-tech/colpali
https://arxiv.org/abs/2503.19108
The plane ViT architecture without a decoder to perform fast image segmentation #Paper #Frameworks
The plane ViT architecture without a decoder to perform fast image segmentation #Paper #Frameworks
arXiv.org
Your ViT is Secretly an Image Segmentation Model
Vision Transformers (ViTs) have shown remarkable performance and scalability across various computer vision tasks. To apply single-scale ViTs to image segmentation, existing methods adopt a...
https://arxiv.org/abs/2411.04983
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning #Frameworks #Paper
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning #Frameworks #Paper
arXiv.org
DINO-WM: World Models on Pre-trained Visual Features enable...
The ability to predict future outcomes given control actions is fundamental for physical reasoning. However, such predictive models, often called world models, remains challenging to learn and are...