Spark in me – Telegram

Spark in me

2.24K subscribers

761 photos

48 videos

114 files

2.65K links

Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.


PyTorch NLP best practices

Very simple ideas, actually.

(1) Multi GPU parallelization and FP16 training

Do not bother reinventing the wheel.
Just use nvidia's apex, DistributedDataParallel, DataParallel.
Best examples [here](https://github.com/huggingface/pytorch-pretrained-BERT).

(2) Put as much as possible INSIDE of the model

Implement as much of your logic as possible inside nn.Module.
Why?
So that you can seamlessly use all the abstractions from (1).
Also models are more abstract and reusable in general.

(3) Why have a separate train/val loop?

PyTorch 0.4 introduced context managers for gradient tracking (torch.no_grad and torch.enable_grad).

You can simplify your train / val / test loops, and merge them into one simple function.

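import torch
from tqdm import tqdm

# loop_type is 'Train', 'Val' or 'Test'; gradients are only needed for training
context = torch.enable_grad() if loop_type == 'Train' else torch.no_grad()

if loop_type == 'Train':
    model.train()
else:
    model.eval()

with context:
    # loader is the DataLoader that matches loop_type
    for i, batch in enumerate(tqdm(loader)):
        # do your stuff here
        pass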
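The snippet above can be wrapped into one function. This is only a minimal sketch: the names `run_loop` and `loss_fn` are hypothetical, and it assumes the loader yields (inputs, targets) pairs.

```python
import torch
from torch import nn


def run_loop(model, loader, loss_fn, loop_type="Train", optimizer=None):
    """One pass over `loader`; weights are updated only when loop_type == 'Train'."""
    is_train = loop_type == "Train"
    model.train(is_train)  # model.train(False) is equivalent to model.eval()
    context = torch.enable_grad() if is_train else torch.no_grad()
    total_loss, n_samples = 0.0, 0
    with context:
        for inputs, targets in loader:
            loss = loss_fn(model(inputs), targets)
            if is_train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * inputs.size(0)
            n_samples += inputs.size(0)
    # Average loss per sample; max() guards against an empty loader
    return total_loss / max(n_samples, 1)
```

Call it with loop_type='Train' and an optimizer for training, and with loop_type='Val' or 'Test' (no optimizer) for evaluation.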

(4) EmbeddingBag

Use EmbeddingBag layer for morphologically rich languages. Seriously!
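A rough sketch of what this can look like, assuming a FastText-like scheme where a word is represented by the mean of its hashed character n-gram embeddings. The class name, the bucket count and the n-gram range are illustrative assumptions, not something from this post.

```python
import zlib

import torch
from torch import nn


def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]


def ngram_ids(word, num_buckets):
    """Hash each n-gram into a fixed number of buckets (stable across runs)."""
    return [zlib.crc32(g.encode("utf-8")) % num_buckets for g in char_ngrams(word)]


class NgramEmbedding(nn.Module):
    """Hypothetical word embedding built from hashed character n-grams."""

    def __init__(self, num_buckets=100_000, dim=64):
        super().__init__()
        self.num_buckets = num_buckets
        self.bag = nn.EmbeddingBag(num_buckets, dim, mode="mean")

    def forward(self, words):
        # Flatten all n-gram ids into one tensor; offsets mark where each word starts
        ids, offsets = [], []
        for word in words:
            offsets.append(len(ids))
            ids.extend(ngram_ids(word, self.num_buckets))
        device = self.bag.weight.device
        return self.bag(torch.tensor(ids, dtype=torch.long, device=device),
                        torch.tensor(offsets, dtype=torch.long, device=device))
```

Word forms that share a stem share most of their n-grams, so they land close to each other without a separate vocabulary entry per form. In a real pipeline the n-gram ids would probably be pre-computed in the Dataset rather than in forward.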

(5) Writing trainers / training abstractions

This is a waste of time imho if you follow (1), (2) and (3).

(6) Nice bonus

If you follow most of these, you can train on as many GPUs and machines as you want, for any language)

(7) Using tensorboard for logging

This goes without saying.

#nlp
#deep_learning

GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.


1.1K views, Alexander, 09:02

PyTorch DataLoader, GIL thrashing and CNNs

Well all of this seems a bit like magic to me, but hear me out.

I abused my GPU box for weeks running CNNs on 2-4 GPUs.
Nothing broke.
And then my GPU box started shutting down for no apparent reason.

No, this was not:
- CPU overheating (I have a massive cooler, I checked - it works);
- PSU;
- Overclocking;

It also adds to the confusion that AMD has weird temperature readings.

To cut a long story short - if you have a very fast Dataset class and you use PyTorch's DataLoader with num_workers > 0, it can lead to system instability instead of a speed-up.

It is obvious in retrospect, but it is not when you face this issue.
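One way to check whether extra workers help at all is to time a pass over the data with different num_workers values. This is a minimal sketch with a hypothetical helper `time_epoch`; it says nothing about the shutdowns themselves, only about throughput.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_epoch(dataset, num_workers, batch_size=256):
    """Return the seconds needed to iterate over the dataset once."""
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:
        pass
    return time.perf_counter() - start
```

For example, build an in-memory TensorDataset and compare time_epoch(dataset, 0) with time_epoch(dataset, 4). If the Dataset is already very fast, the single-process run may well win, and then there is little reason to spawn workers.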

#deep_learning
#pytorch

1.1K views, Alexander, 09:03

* (2) is valid for models with a complex forward pass and models with large embedding layers

1.1K views, Alexander, 09:56

Which type of content do you / would you like most on the channel?

Anonymous Poll

Weekly / bi-weekly digests;

Podcasts with actual ML practitioners;

Practical bits on real applied NLP;

Pre-trained BERT with Embedding Bags for Russian;

Jokes / memes / cats;

133 voters, 1.3K views, Alexander, 06:20

Pinned post

What is this channel about?
(0)
This channel is a practitioner's channel on the following topics: Internet, Data Science, Deep Learning, Python, NLP

(1)
Don't get in a twist if your opinion differs.
You are welcome to contact me via telegram @snakers41 and email - aveysov@gmail.com

(2)
No BS and ads - I already rejected 3-4 crappy ad deals

(4)
DS ML digests - in the RSS or via URLs like this
https://spark-in.me/post/2019_ds_ml_digest_01

Donations
(0)
Buy me a coffee 🤟 https://buymeacoff.ee/8oneCIN

Give us a rating:
(0)
https://telegram.me/tchannelsbot?start=snakers4

Our chat
(0)
https://t.me/joinchat/Bv9tjkH9JHYvOr92hi5LxQ

More links
(0)
Our website http://spark-in.me

(1)
Our chat https://t.me/joinchat/Bv9tjkH9JHYvOr92hi5LxQ

(2)
DS courses review (RU) - very old
http://goo.gl/5VGU5A
https://spark-in.me/post/learn-data-science

(3)
2017 - 2018 SpaceNet Challenge
https://spark-in.me/post/spacenet-three-challenge

(4)
DS Bowl 2018
https://spark-in.me/post/playing-with-dwt-and-ds-bowl-2018

(7)
Data Science tag on the website
https://spark-in.me/tag/data-science

(7)
Profi.ru project
http://towardsdatascience.com/building-client-routing-semantic-search-in-the-wild-14db04687c7e

(8)
CFT 2018 competition
https://spark-in.me/post/cft-spelling-2018

(9)
2018 retrospective
https://spark-in.me/post/2018

More amazing NLP-related articles incoming!
Maybe finally we will make podcasts?

2019 DS/ML digest 01

Author's articles - http://spark-in.me/author/snakers41
Blog - http://spark-in.me

1.4K views, Alexander, edited 08:49


A bit of lazy Sunday admin stuff

Monitoring your CPU temperature with email notifications

- Change CPU temp to any metric you like
- Rolling log
- Sending email only one time, if the metric becomes critical (you can add an email when metric becomes non-critical again)

https://gist.github.com/snakers4/cf0ffd57c3ef7f4e2e25f6b3347dcdec
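The "send only once" logic from the list above can be sketched as follows. This is not the code from the gist: `ThresholdAlert`, `make_rolling_logger` and the `notify` callback are hypothetical names, and `notify` is assumed to wrap whatever actually sends the email (for example smtplib).

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rolling_logger(path, max_bytes=1_000_000, backups=3):
    """Logger whose file is rotated once it exceeds max_bytes (a rolling log)."""
    logger = logging.getLogger("metric_monitor")
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        logger.addHandler(RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups))
    return logger


class ThresholdAlert:
    """Call `notify` once when the metric becomes critical, not on every reading."""

    def __init__(self, threshold, notify, notify_on_recovery=False):
        self.threshold = threshold
        self.notify = notify
        self.notify_on_recovery = notify_on_recovery
        self.critical = False

    def update(self, value):
        if value >= self.threshold and not self.critical:
            # Transition normal -> critical: alert exactly once
            self.critical = True
            self.notify(f"CRITICAL: metric at {value}")
        elif value < self.threshold and self.critical:
            # Transition critical -> normal: optionally report recovery
            self.critical = False
            if self.notify_on_recovery:
                self.notify(f"RECOVERED: metric at {value}")
```

Feed `update` with a reading on every polling tick; any metric works, not just CPU temperature.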

Setting up a GPU box on Ubuntu 18.04 from scratch

https://github.com/snakers4/gpu-box-setup/

#deep_learning
#linux

Plain temperature monitoring in Ubuntu 18.04


1.3K views, Alexander, 10:22

4th 2019 DS / ML digest

Highlights of the week
- OpenAI controversy;
- BERT pre-training;
- Using transformer for conversational challenges;

https://spark-in.me/post/2019_ds_ml_digest_04

#digest
#data_science
#deep_learning

2019 DS/ML digest 04


1.3K views, Alexander, 09:24

New variation of Adam?

- [Website](https://www.luolc.com/publications/adabound/);
- [Code](https://github.com/Luolc/AdaBound);
- Eliminate the generalization gap between adaptive methods and SGD;
- TL;DR: A Faster And Better Optimizer with Highly Robust Performance;
- Dynamic bound on learning rates. Inspired by gradient clipping;
- Not very sensitive to the hyperparameters, especially compared with SGD(M);
- Tested on MNIST, CIFAR, Penn Treebank - no serious datasets;
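To make the "dynamic bound on learning rates" idea concrete, here is a small sketch of the bound schedule as I understand it from the linked code: the per-parameter Adam step size is clipped into an interval that shrinks towards a final SGD-like learning rate. The function names are hypothetical, and the default values (final_lr=0.1, gamma=1e-3) are assumptions.

```python
def adabound_bounds(step, final_lr=0.1, gamma=1e-3):
    """Lower / upper clipping bounds for the step size at a given (1-based) step."""
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return lower, upper


def clipped_step_size(adam_step_size, step, final_lr=0.1, gamma=1e-3):
    """Clip an Adam-style step size lr / (sqrt(v) + eps) into the current bounds."""
    lower, upper = adabound_bounds(step, final_lr, gamma)
    return min(max(adam_step_size, lower), upper)
```

Early in training the interval is very wide, so the optimizer behaves like Adam; as the step count grows both bounds approach final_lr, so it behaves more and more like SGD.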

#deep_learning

Adaptive Gradient Methods with Dynamic Bound of Learning Rate

Abstract Adaptive optimization methods such as AdaGrad, RMSProp and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. Though prevailing, they are observed to generalize poorly compared with Sgd…

1.0K views, Alexander, 07:50

We tried it

... yeah, we tried it on a real task
plain Adam is a bit better

946 views, Alexander, 12:39

Dependency parsing and POS tagging in Russian

A less popular set of NLP tasks.

Popular tools reviewed
https://habr.com/ru/company/sberbank/blog/418701/

Only morphology:
(0) Well known pymorphy2 package;

Only POS tags and morphology:
(0) https://github.com/IlyaGusev/rnnmorph (easy to use);
(1) https://github.com/nlpub/pymystem3 (easy to use);

Full dependency parsing
(0) Russian spacy plugin:
- https://github.com/buriy/spacy-ru - installation
- https://github.com/buriy/spacy-ru/blob/master/examples/POS_and_syntax.ipynb - usage with examples
(1) Malt parser based solution (drawback - no examples)
- https://github.com/oxaoo/mp4ru
(2) Google's syntaxnet
- https://github.com/tensorflow/models/tree/master/research/syntaxnet

#nlp

Exploring syntactic parsers for Russian

Hi! My name is Denis Kiryanov, I work at Sberbank and deal with natural language processing (NLP) problems. At one point we needed to choose a syntactic parser for working with Russian....

1.5K views, Alexander, 13:02

LSTM vs TCN vs Trellis network

- Did not try the Trellis network - decided it was too complex;
- All the TCN properties from the digest https://spark-in.me/post/2018_ds_ml_digest_31 hold - did not test for very long sequences;
- Looks like a really simple and reasonable alternative to RNNs for modeling and ensembling;
- On a sensible benchmark - performs mostly the same as LSTM from a practical standpoint;

https://github.com/locuslab/TCN/blob/master/TCN/tcn.py
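When thinking about long sequences it helps to know how far back a TCN can see. Below is a small sketch of the receptive field calculation, assuming (as I read the linked tcn.py) two causal convolutions per level and dilation 2**i at level i. The function name is hypothetical.

```python
def tcn_receptive_field(kernel_size, num_levels, convs_per_level=2):
    """Number of time steps (including the current one) visible to the last layer.

    Assumes dilation 2**level at each level and `convs_per_level` causal
    convolutions per level, each adding (kernel_size - 1) * dilation steps.
    """
    field = 1
    for level in range(num_levels):
        field += convs_per_level * (kernel_size - 1) * 2 ** level
    return field
```

With kernel_size=7 and 8 levels this gives 3061 steps, so the receptive field grows exponentially with depth while the parameter count grows only linearly.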

#deep_learning

2018 DS/ML digest 31


1.2K views, Alexander, 07:16

https://youtu.be/eUzB0L0mSCI

Can You Recover Sound From Images?

Is it possible to reconstruct sound from high-speed video images?
Part of this video was sponsored by LastPass: http://bit.ly/2SmRQkk
Special thanks to Dr. Abe Davis for revisiting his research with me: http://abedavis.com

This video was based on research…

1.1K views, Alexander, 04:49

Tracking your hardware ... for data science

For a long time I thought that if you really want to track all your servers' metrics you need Zabbix (which is very complicated).

A friend recommended an amazing tool to me
- https://prometheus.io/docs/guides/node-exporter/

It installs and runs literally in minutes.
If you want to auto-start it properly, there are even (slightly outdated) Ubuntu packages and systemd examples
- https://github.com/prometheus/node_exporter/tree/master/examples/systemd
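Once the exporter is running, its metrics are plain text over HTTP, so they are easy to read from a script as well. A minimal sketch, assuming the exporter listens on its default port 9100; the function names are hypothetical and the parser ignores optional timestamps.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text exposition format into {metric_with_labels: float}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP / TYPE comments
            continue
        # The value is the last space-separated token; labels may contain spaces
        name, _, value = line.rpartition(" ")
        try:
            metrics[name] = float(value)
        except ValueError:
            continue
    return metrics


def fetch_node_metrics(url="http://localhost:9100/metrics"):
    """Download and parse the metrics page of a locally running node exporter."""
    with urlopen(url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

For example, fetch_node_metrics().get("node_load1") should return the 1-minute load average on a box where the exporter is up.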

Dockerized metric exporters for GPUs by Nvidia
- https://github.com/NVIDIA/gpu-monitoring-tools/tree/master/exporters/prometheus-dcgm

It also has extensive alerting features, but they are hard to get started with, as there is no minimal example
- https://prometheus.io/docs/alerting/overview/
- https://github.com/prometheus/docs/issues/581

#linux

Monitoring Linux host metrics with the Node Exporter | Prometheus

An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.

1.1K views, Alexander, 08:46

Does anyone know anyone from TopCoder?
As usual with competition platforms, the organization sometimes has its issues

884 views, Alexander, 09:23

Forwarded from Анна

Hi!
In case anyone doesn't know, besides the prize money for the top places, the satellite competition had one more cool feature - a student's prize - a prize for the _student_ with the highest score. It all turned out to be rather non-obvious, there was no separate leaderboard for students. I spent a long time trying to reach the admins, wrote by email and on the forum to find out more details. A month later an admin finally replied that I was the only contender for the prize and that, apparently, there were no problems, everything would be sorted out, just send the student ID. And then disappeared again. I periodically reminded them of my existence and asked how things were going and whether there was any progress, and was ignored in response. *There is still no answer.* This is my first time taking part in a serious competition and I don't quite understand what can be done in such a situation. Wait for news? Write posts on Twitter? Is there any way to get through to the admins?

Also, I wrote a short article about my solution here. https://spark-in.me/post/spacenet4

1.2K views, Alexander, 09:23

5th 2019 DS / ML digest

Highlights of the week
- New Adam version;
- POS tagging and semantic parsing in Russian;
- ML industrialization again;

https://spark-in.me/post/2019_ds_ml_digest_05

#digest
#data_science
#deep_learning

2019 DS/ML digest 05


1.1K views, Alexander, 10:31

PyTorch internals

https://speakerdeck.com/perone/pytorch-under-the-hood

#deep_learning

PyTorch under the hood

Presentation about PyTorch internals presented at the PyData Montreal in Feb 2019.

944 views, Alexander, 06:47

Russian STT datasets

Does anyone know of more proper datasets?

I found this (60 hours), but I could not find the link to the dataset:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/274_Paper.pdf

Anyway, here is the list I found:

- 20 hours of Bible https://github.com/festvox/datasets-CMU_Wilderness;
- https://www.kaggle.com/bryanpark/russian-single-speaker-speech-dataset - does not say how many hours;
- Ofc audio book datasets - https://www.caito.de/data/Training/stt_tts/ plus some scraping scripts https://github.com/ainy/shershe/tree/master/scripts;
- And some disappointment here https://voice.mozilla.org/ru/languages;

#deep_learning

915 views, Alexander, edited 09:58

Inception v1 layers visualized on a map

A joint work by Google and OpenAI:
https://distill.pub/2019/activation-atlas/
https://distill.pub/2019/activation-atlas/app.html
https://blog.openai.com/introducing-activation-atlases/
https://ai.googleblog.com/2019/03/exploring-neural-networks.html

TLDR:
- Take 1M random images;
- Feed to a CNN, collect some spatial activation;
- Produce a corresponding idealized image that would result in such an activation;
- Plot in 2D (via UMAP), add grid, averaging, etc etc;
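The "add grid, averaging" step can be illustrated with a tiny sketch. It assumes the 2D layout has already been computed and rescaled to [0, 1]; the function name is hypothetical, and the feature visualization that turns an averaged activation into an image is not covered.

```python
from collections import defaultdict


def grid_average(points, vectors, grid_size):
    """Average the activation vectors whose 2D points fall into the same grid cell."""
    cells = defaultdict(list)
    for (x, y), vec in zip(points, vectors):
        # Clamp so that a coordinate of exactly 1.0 lands in the last cell
        col = min(int(x * grid_size), grid_size - 1)
        row = min(int(y * grid_size), grid_size - 1)
        cells[(row, col)].append(vec)
    return {cell: [sum(dim) / len(vecs) for dim in zip(*vecs)]
            for cell, vecs in cells.items()}
```

Each non-empty cell then gets a single averaged activation, which is what would be turned into one idealized image per cell.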

#deep_learning

Activation Atlas

By using feature inversion to visualize millions of activations from an image classification network, we create an explorable activation atlas of features the network has learned and what concepts it typically represents.

945 views, Alexander, edited 11:21

Our experiments with Transformers, BERT and generative language pre-training

TLDR

For morphologically rich languages, pre-trained Transformers are not a silver bullet. From a layman's perspective they are not feasible unless someone invests huge computational resources both into sub-word tokenization methods that work well and into actually training these large networks.

On the other hand we have definitively shown that:

- Starting a transformer with Embedding bag initialized via FastText works and is relatively feasible;
- On complicated tasks - such a transformer significantly outperforms training from scratch (as well as naive models) and shows decent results compared to state-of-the-art specialized models;
- Pre-training worked, but it overfitted more than FastText initialization and given the complexity required for such pre-training - it is not useful;

https://spark-in.me/post/bert-pretrain-ru

All in all, this was a relatively large gamble that did not pay off: on a more down-to-earth task, where we hoped the Transformer would excel, it did not.

#deep_learning

1.3K views, Alexander, edited 15:42