Spark in me
Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.
Trick for image preprocessing - histogram equalization

Playing with multi-GPU small batch-sizes

If you play with SemSeg using a big model and large images (HD, FullHD), you may face a situation where only one image fits on one GPU.

This is also relevant if your train-test split is far from ideal and/or you are using pre-trained ImageNet encoders for a SemSeg task, so you cannot really update your batch-norm params.

Also, AFAIK, all the major deep-learning frameworks:
(0) do not have batch-norm freeze options on evaluation (batch-norm contains 2 sets of parameters: the learnable affine ones, and the running statistics, which are updated on forward passes in training mode and then used at inference)
(1) calculate batch-norm statistics for each GPU separately

All of this may mean that your models severely underperform at inference in these situations.
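A minimal sketch of the two sets of batch-norm state mentioned above, assuming PyTorch and its `nn.BatchNorm2d` layer. The tensor shape and the `+ 5.0` shift are arbitrary values chosen for illustration. It only shows which state exists and when the running statistics move; it is not a statement about every framework.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(num_features=3)

# Set 1: learnable affine parameters, updated by the optimizer.
learnable = [name for name, _ in bn.named_parameters()]
# Set 2: running statistics (buffers), updated by forward passes in train mode.
buffers = [name for name, _ in bn.named_buffers()]

# A "batch" of one image with a shifted mean (illustrative values only).
x = torch.randn(1, 3, 8, 8) + 5.0

# Train mode: normalizes with the statistics of this single image
# and moves the running mean towards them.
bn.train()
mean_before = bn.running_mean.clone()
with torch.no_grad():
    bn(x)
changed_in_train = not torch.equal(mean_before, bn.running_mean)

# Eval mode: normalizes with the stored running statistics and leaves them alone.
bn.eval()
mean_before = bn.running_mean.clone()
with torch.no_grad():
    bn(x)
changed_in_eval = not torch.equal(mean_before, bn.running_mean)

print(learnable, buffers, changed_in_train, changed_in_eval)
```

With one image per GPU, the statistics used in train mode come from that single image, which is why the running estimates can end up far from what the model sees at inference.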


Possible solutions:
(0) Sync batch-norm. I believe that to do it properly you will have to modify the framework you are using, but there is a PyTorch implementation done for CVPR 2018 - also an explanation here - I guess if its multi-GPU model wrappers can be used for any model, then we are in the money)
(1) Use affine=False in your batch-norm. But in this case ImageNet initialization will probably not help - you will have to train your model completely from scratch
(2) Freeze your encoder batch-norm params completely (though I am not sure - they do not seem to be freezing the running mean parameters) - this probably also needs m.trainable = False or something like this
(3) Use the recent Facebook group norm -
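Options (0), (2) and (3) above could be sketched in PyTorch roughly like this. The helper names `freeze_batchnorm`, `batchnorm_to_groupnorm` and `make_toy_encoder` are hypothetical and exist only for this illustration, as does the choice of 2 groups. In PyTorch the freeze flag is `requires_grad = False` rather than `m.trainable`. For option (0), recent PyTorch versions ship `nn.SyncBatchNorm`, which is not the CVPR 2018 implementation mentioned above and only synchronizes statistics when the model runs under distributed training.

```python
import torch
import torch.nn as nn

BN_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def freeze_batchnorm(model: nn.Module) -> None:
    """Option (2): freeze both the affine params and the running stats."""
    for m in model.modules():
        if isinstance(m, BN_TYPES):
            m.eval()  # use stored running stats and stop updating them
            for p in m.parameters():
                p.requires_grad = False  # exclude weight / bias from gradients


def batchnorm_to_groupnorm(model: nn.Module, num_groups: int = 2) -> nn.Module:
    """Option (3): swap each BatchNorm2d for a freshly initialized GroupNorm."""
    for name, child in model.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(model, name, nn.GroupNorm(num_groups, child.num_features))
        else:
            batchnorm_to_groupnorm(child, num_groups)
    return model


def make_toy_encoder() -> nn.Sequential:
    """Hypothetical stand-in for a real pre-trained encoder."""
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.BatchNorm2d(8),
        nn.ReLU(),
    )


torch.manual_seed(0)
x = torch.randn(1, 3, 16, 16)  # one image per GPU

# Option (2): calling model.train() later puts batch-norm back into train
# mode, so the freeze has to be re-applied after every such call.
frozen = make_toy_encoder()
frozen.train()
freeze_batchnorm(frozen)
stats_before = frozen[1].running_mean.clone()
frozen(x)
stats_unchanged = torch.equal(stats_before, frozen[1].running_mean)

# Option (3): group norm is computed per sample, so the output for an image
# does not depend on what else is in the batch.
gn_model = batchnorm_to_groupnorm(make_toy_encoder(), num_groups=2)
gn_model.train()
out_single = gn_model(x)
out_in_batch = gn_model(torch.cat([x, torch.randn(3, 3, 16, 16)]))[:1]
batch_independent = torch.allclose(out_single, out_in_batch, atol=1e-5)

# Option (0): only the module swap is shown here; the actual cross-GPU
# synchronization needs an initialized process group and DistributedDataParallel.
sync_model = nn.SyncBatchNorm.convert_sync_batchnorm(make_toy_encoder())

print(stats_unchanged, batch_independent, type(sync_model[1]).__name__)
```

Note that swapping in group norm discards the pre-trained batch-norm weights, so the new layers still have to be trained.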

This is a finicky topic - please share your experiences and tests in the comments


Forwarded from Варим МЛ
I haven't posted anything here for a long time because I was moving to another country (I started back in May and things have only now settled down). I thought for a long time about what to write a post about, but since at work I am currently writing a library for metric learning, and not many people know about this task, that is what the post will be about.

#Миша #обзор #CV
DINOv2: Learning Robust Visual Features without Supervision

Get ready for a game-changer in computer vision! Building on the groundbreaking achievements in natural language processing, foundation models are revolutionizing the way we use images in various systems. By generating all-purpose visual features that excel across diverse image distributions and tasks without finetuning, these models are set to redefine the field.

The researchers behind this work have combined cutting-edge techniques to scale pretraining in terms of data and model size, turbocharging the training process like never before. They've devised an ingenious automatic pipeline to create a rich, diverse, and curated image dataset, setting a new standard in the self-supervised literature. To top it off, they've trained a colossal ViT model with a staggering 1 billion parameters and distilled it into a series of smaller, ultra-efficient models. These models outshine the best available all-purpose features, OpenCLIP, on most benchmarks at both image and pixel levels.

A detailed unofficial overview of the paper:

Project link:
#deeplearning #cv #pytorch #imagesegmentation #sota #pretraining
Fast Segment Anything

The Segment Anything Model (SAM), a revolutionary tool in computer vision tasks, has significantly impacted various high-level tasks like image segmentation, image captioning, and image editing. However, its application has been restricted in industry scenarios due to its enormous computational demand, largely attributed to the Transformer architecture handling high-resolution inputs.

The authors of this paper have proposed a speedier alternative method that accomplishes this foundational task with performance on par with SAM, but at a staggering 50 times faster! By ingeniously reformulating the task as segments-generation and prompting and employing a regular CNN detector with an instance segmentation branch, they've converted this task into the well-established instance segmentation task. The magic touch? They've trained the existing instance segmentation method using just 1/50 of the SA-1B dataset, a stroke of brilliance that led to a solution marrying performance and efficiency.

Paper link:
Code link:

A detailed unofficial overview of the paper:

#deeplearning #cv #segmentanythingmodel #efficiency