A collection of articles about PyTorch
Guide number one - https://t.me/snakers4/1362
Fast.ai and PyTorch - https://t.me/snakers4/1367
A series of articles about SSD in PyTorch - https://t.me/snakers4/1435
PyTorch and Docker - https://t.me/snakers4/1438
Impressions of PyTorch - https://t.me/snakers4/1442
Extending PyTorch classes - https://t.me/snakers4/1447
Augmentations in PyTorch - https://t.me/snakers4/1449
PyTorch - stepwise lr decay - https://t.me/snakers4/1457
The internals of TF and PyTorch - https://t.me/snakers4/1467
#digest
#deep_learning
#pytorch
A friend shared a great PyTorch guide that explains what makes it special. If you do not use Keras and are looking for something to practice on, it will suit you just fine.
https://habrahabr.ru/post/334380/
#data_science
#neural_nets
If you are trying to combine code that should run on 2 GPUs with code that should run on only 1 GPU (mixing oil and water) in PyTorch, without rewriting either of them, this snippet will help you:
import os
# set these before CUDA is initialized, i.e. before the first .cuda() call
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
#pytorch
A note on training networks with different learning rates in PyTorch. The documentation says:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
The lr does not apply to any layers that are not specified in the list. Maybe this is obvious, but I fiddled for 3 days, not understanding why the model did not work, until I figured out such a simple thing - the model had to be rewritten, moving all of my functions out of forward into the __init__ of a separate class.
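A minimal illustration of that fix (the layer names here are made up): submodules have to be created in __init__ as attributes, otherwise their parameters are never registered, so the corresponding .parameters() call - and hence the param group - silently misses them.
import torch.nn as nn

class Head(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # created here -> registered, visible to model.classifier.parameters()
        self.fc = nn.Linear(512, num_classes)

    def forward(self, x):
        # creating nn.Linear(...) inside forward instead would leave its
        # weights unregistered and outside any optimizer param group
        return self.fc(x)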
Perfect is the enemy of good.
#data_science
#pytorch
Lifehack of the day. How to hook TensorBoard up to PyTorch in 10 minutes
- https://github.com/yunjey/pytorch-tutorial/tree/master/tutorials/04-utils/tensorboard
Essentially, the only thing that still infuriates me about PyTorch compared to TF is the lack of tools for debugging the computation graph (maybe because it is dynamic?)
#deep_learning
#pytorch
A library that lets you use almost all of the TensorBoard visualizations with PyTorch
- https://github.com/lanpa/tensorboard-pytorch
The example for debugging the computation graph is especially interesting
- https://goo.gl/oLWhfP
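A minimal sketch of how the logging looks with it (the tags and values below are made up):
import torch
import torch.nn as nn
from tensorboardX import SummaryWriter

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))
writer = SummaryWriter(log_dir='runs/example')

for step in range(100):
    # log any scalar you care about, e.g. a (fake) training loss
    writer.add_scalar('train/loss', 1.0 / (step + 1), step)

# the computation graph visualization mentioned above
writer.add_graph(model, torch.randn(1, 10))
writer.close()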
#data_science
#pytorch
Cyclical Learning Rates are not merged into PyTorch yet, but they are at the PR stage
- https://github.com/pytorch/pytorch/pull/2016/files
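Until it is merged, the triangular policy from the CLR paper is easy to hack in by hand - a rough sketch (not the code from the PR):
import math

def triangular_lr(it, step_size=2000, base_lr=1e-4, max_lr=1e-2):
    # one full cycle = 2 * step_size iterations; lr bounces linearly between base_lr and max_lr
    cycle = math.floor(1 + it / (2 * step_size))
    x = abs(it / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# inside the training loop, once per batch:
# for group in optimizer.param_groups:
#     group['lr'] = triangular_lr(global_step)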
#data_science
#pytorch
Even though I am preparing a large release on applying GANs to a real example, I just could not help sharing these 2 links.
They are just absolute perfection for GANs in PyTorch
- https://github.com/martinarjovsky/WassersteinGAN
- https://github.com/soumith/ganhacks
Also, this is the most idiomatic PyTorch (ImageNet finetuning) code I have ever seen
- https://gist.github.com/panovr/2977d9f26866b05583b0c40d88a315bf
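For context, the typical finetuning pattern from that gist boils down to roughly this (a sketch with torchvision's resnet18, not the gist's exact code):
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18(pretrained=True)
# freeze the pretrained backbone
for param in model.parameters():
    param.requires_grad = False
# replace the classification head for your number of classes
model.fc = nn.Linear(model.fc.in_features, 10)
# only the (trainable) head parameters go into the optimizer
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)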
So if you are new to PyTorch, then these links will be very useful)
#pytorch
#deep_learning
#gans
I have seen questions on forums - how to add a Keras-like progress bar to PyTorch for simple models?
The answer is to use tqdm and this property
- https://goo.gl/cG6Ug8
This example is also great:
from tqdm import trange
from random import random, randint
from time import sleep

t = trange(100)
for i in t:
    # Description will be displayed on the left
    t.set_description('GEN %i' % i)
    # Postfix will be displayed on the right, and will format automatically
    # based on argument's datatype
    t.set_postfix(loss=random(), gen=randint(1, 999), str='h', lst=[1, 2])
    sleep(0.1)
#deep_learning
#pytorch
Wow PyTorch is so cool that it even has a concat dataset class
http://pytorch.org/docs/master/data.html#torch.utils.data.ConcatDataset
Does not work for datasets with different resolution though
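A quick sketch of how it is used (the tensors here are just placeholders):
import torch
from torch.utils.data import ConcatDataset, TensorDataset

ds_a = TensorDataset(torch.randn(100, 3, 32, 32), torch.zeros(100))
ds_b = TensorDataset(torch.randn(50, 3, 32, 32), torch.ones(50))

# behaves like a single dataset of length 150
combined = ConcatDataset([ds_a, ds_b])
print(len(combined), combined[120])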
#pytorch
PyTorch 0.4 released
https://github.com/pytorch/pytorch/releases/tag/v0.4.0
Key changes:
(1) Tensor / Variable merged
(2) Zero-dimensional Tensors
(3) dtypes
(4) migration guide http://pytorch.org/2018/04/22/0_4_0-migration-guide.html
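A few of the 0.4 idioms from the migration guide in one place (as far as I understand them):
import torch

# (1) Tensors and Variables are merged - no more Variable wrapper
x = torch.ones(3, requires_grad=True)

# (2) zero-dimensional tensors: losses are 0-dim now, use .item() for the python number
loss = x.sum()
print(loss.item())

# (3) explicit dtypes and the device-agnostic .to() idiom
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
y = torch.zeros(3, dtype=torch.float64).to(device)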
#pytorch
Once again I stumbled upon this amazing PyTorch-related post
For those learning PyTorch
https://discuss.pytorch.org/t/feedback-on-pytorch-for-kaggle-competitions/2252/11
#deep_learning
#pytorch
Lazy failsafe in PyTorch Data Loader
Sometimes you train a model and testing all the combinations of augmentations / keys / params in your dataloader is too difficult. Or the dataset is too large, so it would take some time to check it properly.
In such cases I usually used some kind of failsafe try/catch.
But it looks like an even simpler approach works:
# e.g. at the end of Dataset.__getitem__
if img is None:
    # do not return anything
    pass
else:
    return img
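If __getitem__ returns None like this, the default collate will choke on the missing samples, so you would also want a collate_fn that just drops them - a sketch (the names are mine):
from torch.utils.data.dataloader import default_collate

def skip_none_collate(batch):
    # silently drop the samples for which __getitem__ returned None
    batch = [sample for sample in batch if sample is not None]
    return default_collate(batch)

# loader = DataLoader(dataset, batch_size=32, collate_fn=skip_none_collate)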
#deep_learning
#pytorch
Useful Python / PyTorch bits
dot.notation access to dictionary attributes
class dotdict(dict):
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__
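Usage then looks like this (a trivial example):
params = dotdict({'lr': 1e-3, 'batch_size': 32})
print(params.lr)        # 0.001, instead of params['lr']
params.epochs = 10      # attribute-style assignment also works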
PyTorch embedding layer - ignore padding
nn.Embedding has a padding_idx argument, so the embedding of the padding token is not updated during training.
#python
#pytorch
Monkey patching a PyTorch model
Well, ideally you should not do this.
But sometimes you just need to quickly test something and amend your model on the fly.
This helps:
import torch
import functools

def rsetattr(obj, attr, val):
    # setattr that understands dotted paths like 'layer1.0.conv'
    pre, _, post = attr.rpartition('.')
    return setattr(rgetattr(obj, pre) if pre else obj, post, val)

def rgetattr(obj, attr, *args):
    # getattr that understands dotted paths
    def _getattr(obj, attr):
        return getattr(obj, attr, *args)
    return functools.reduce(_getattr, [obj] + attr.split('.'))

for old_module_path, old_module_object in model.named_modules():
    # replace an old module with a new one,
    # copying some settings and its state
    if isinstance(old_module_object, torch.nn.SomeClass):
        new_module = SomeOtherClass(old_module_object.some_settings,
                                    old_module_object.some_other_settings)
        new_module.load_state_dict(old_module_object.state_dict())
        rsetattr(model, old_module_path, new_module)
The above code essentially does the same as:
model.path.to.some.block = some_other_block
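A concrete instance of the same pattern - say, swapping every nn.ReLU for nn.LeakyReLU (nothing stateful to copy in this case), using the rsetattr helper above; resnet18 here is just an example model:
import torch.nn as nn
import torchvision.models as models

model = models.resnet18()
# collect the paths first, then patch, to avoid mutating while iterating
relu_paths = [name for name, m in model.named_modules() if isinstance(m, nn.ReLU)]
for name in relu_paths:
    rsetattr(model, name, nn.LeakyReLU(0.1, inplace=True))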
#python
#pytorch
#deep_learning
#oop
PyTorch DataLoader, GIL thrashing and CNNs
Well all of this seems a bit like magic to me, but hear me out.
I abused my GPU box for weeks running CNNs on 2-4 GPUs.
Nothing broke.
And then my GPU box started shutting down for no apparent reason.
No, this was not:
- CPU overheating (I have a massive cooler, I checked - it works);
- PSU;
- Overclocking;
- It also adds to confusion that AMD has weird temperature readings;
To cut a long story short - if you have a very fast Dataset class and you use PyTorch's DataLoader with workers > 0, it can lead to system instability instead of a speedup.
It is obvious in retrospect, but it is not when you face this issue.
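In other words, if the Dataset is already fast (e.g. fully in-memory), keeping the loading in the main process is enough - just a sketch:
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10000, 3, 224, 224))  # stand-in for a fast in-memory Dataset
# num_workers=0 keeps loading in the main process - no worker processes, no thrashing
loader = DataLoader(dataset, batch_size=64, shuffle=True,
                    num_workers=0, pin_memory=True)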
#deep_learning
#pytorch
Installing apex ... in style )
Sometimes you just need to try fp16 training (GANs, large networks, rare cases).
There is no better way to do this than use Nvidia's APEX library.
Luckily - they have very nice examples:
- https://github.com/NVIDIA/apex/tree/master/examples/docker
Well ... it installs on a clean machine, but I want my environment to work with this always)
So, I ploughed through all the conda / environment setup mumbo-jumbo and created a version of our deep-learning / ds dockerfile, but now installing from the PyTorch image (PyTorch GPU / CUDA / CUDNN + APEX).
- https://github.com/snakers4/gpu-box-setup/blob/master/dockerfile/Dockerfile_apex
It was kind of painful, because PyTorch images already contain conda / pip and it was not apparent at first, causing all sorts of problems with my miniconda installation.
So use it and please report if it is still buggy.
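For reference, the basic apex amp pattern this unlocks looks roughly like this (based on apex's own examples; opt_level is the main knob, the model and data are placeholders):
import torch
import torch.nn as nn
from apex import amp

model = nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# 'O1' = mixed precision with automatic casting of whitelisted ops
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

x, y = torch.randn(32, 10).cuda(), torch.randn(32, 1).cuda()
loss = nn.functional.mse_loss(model(x), y)

# scale the loss to avoid fp16 gradient underflow
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()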
#deep_learning
#pytorch
Trying PyTorch DDP
DDP = DistributedDataParallel
DP = DataParallel
I am a bit late to the party (PyTorch now even has its own "redis" key-value DB analog, its own RPC framework and numerous bells and whistles ... likely targeted at enterprise with over 9000 GPUs) but let me write my first impression here.
Usually I was just able to optimize my models and code not to require 4+ GPUs (DDP becomes essential after 4-5 GPUs; for 2-3 it does not really matter and DP just works, for 4 it is arguable):
- Docs are detailed, simple and clean
- Examples in the docs ... are just too plain, but there are guides now, which are also a bit simplistic
- The best way to start is to find some high quality boilerplate. There is lots of shitty boilerplate written in 2018 - PyTorch has evolved and polished its interfaces, so just look out for fresh boilerplate (see last update and cross-reference API invocations)
- Looks like DDP is not the most popular feature, but I did not really face the issues everyone claimed to face (hangs and freezes, failure to kill the processes gracefully)
Turning Your DP Script into DDP
- Your code has to be properly structured and refactored - then migrating to DDP becomes a weekend project tops
- You need to understand the concepts of rank, world size, communication backend, gradient synchronization
- They finally included it in the docs - use NCCL backend for distributed GPU, Gloo backend for distributed CPU training
- You need to pass an is_leader param to your logging functions to suppress some logging and checkpoints for non-master processes (rank > 0). Each process has an almost exactly identical model copy anyway
- Do not forget to use barrier() to avoid hangs and for more transparent syncing
- You need to rewrite your main function to accept rank and args
- You need to spawn several processes using the provided utils and setup the process communication utils, i.e. something like:
import torch
import torch.distributed as dist

def setup_distributed(rank, args):
    dist.init_process_group(backend=args.ddp.dist_backend,
                            rank=rank,
                            init_method=args.ddp.dist_url,
                            world_size=args.ddp.world_size)

def spawn_main(main, args):
    if args.ddp.enabled:
        torch.multiprocessing.spawn(
            main, args=(args,), nprocs=args.ddp.world_size, join=True
        )
    else:
        main(0, args)
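And the per-process main that gets spawned then looks roughly like this - a sketch reusing the args.ddp.* fields and setup_distributed from above, with a placeholder model and dataset:
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, Subset, TensorDataset

def main(rank, args):
    setup_distributed(rank, args)
    torch.cuda.set_device(rank)

    model = nn.Linear(10, 1).to(rank)
    model = DDP(model, device_ids=[rank])

    dataset = TensorDataset(torch.randn(1000, 10), torch.randn(1000, 1))
    # the naive dataset[rank :: world_size] split mentioned below
    shard = Subset(dataset, list(range(rank, len(dataset), args.ddp.world_size)))
    loader = DataLoader(shard, batch_size=32, shuffle=True)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    for x, y in loader:
        x, y = x.to(rank, non_blocking=True), y.to(rank, non_blocking=True)
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()   # gradients are all-reduced across processes here
        optimizer.step()

    if rank == 0:  # is_leader-style check: only the leader logs / saves checkpoints
        torch.save(model.module.state_dict(), 'checkpoint.pt')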
- I am still not exactly sure why, but the best boilerplate does .to(device, non_blocking=True) instead of .to(device)
Is it faster?
In my case technically yes (but it has nothing to do with the reasons why people usually use DDP). In the general case, it just solves the bottleneck issues that arise from having 6-8+ GPUs.
So you should optimize, refactor and profile your code first, and only then, if you see some unsolvable issues or you need over 9000 GPUs, switch to DDP.
Is It Worth it?
100% for 6-8 GPUs.
It depends for 2-5 GPUs.
If your code is properly written, then there is little difference for 2-4 GPUs.
Major Design Drawbacks
DDP implies 1 GPU (at least) per process.
You can have 1+ GPUs per process.
You cannot share 1 GPU between 2 processes.
To do so, you would need an Ampere GPU with multi-instance GPU, but it is still not clear whether 3090 or Quadro GPUs will have it.
(I hope team Red will catch up here as well soon!)
Going Deeper
For now I opted for just slicing my train datasets into N parts, as easily as dataset[rank :: world_size], but you can use the provided key-value stores for some more advanced syncing; in that case you would really have to take care of the seeds for the random number generators (and also double the memory footprint). Also, trying their RPC framework would be nice, but that is too much work for me.
#deep_learning
#pytorch