2019 DS / ML digest 14
Link
Highlights of the week(s):
- FAIR embraces embedding bags for misspellings;
- A new version of Adam - RAdam. But on the only real test the author ran (ImageNet), SGD turned out better;
- Yet another LSTM replacement - SRU. Like QRNN, it requires additional dependencies;
#digest
#deep_learning
Our STT Dark Forest post on TDS
https://towardsdatascience.com/navigating-the-speech-to-text-dark-forest-b511e6e7aa88?source=friends_link&sk=d46e5b96aaa7e94f337cc6e2e540545b
Please 👏x50 if you have an account
#deep_learning
How to solve an arbitrary CV task ...
- W/o annotation
- W/o GPUs in production
- And make your model work in real life and help people
https://spark-in.me/post/deploy_classifier
https://medium.com/@slizhikova.a.v/how-to-get-your-own-image-classifier-region-labelling-model-without-annotation-d95aabbd8599?sk=fd844bf2a6f48171f02cbbda7bc493a6
#deep_learning
#computer_vision
Poor man's ensembling techniques
So you want to improve your model's performance a bit.
Ensembling helps. But as-is it is mostly useful in Kaggle competitions, where people stack over9000 networks trained on 100MB of data.
For real-life usage / production, however, there exist ensembling techniques that do not require a significant increase in computation cost (!).
None of this is mainstream yet, but it may work on your dataset!
Especially if your task is easy and the dataset is small.
- SWA (proven to work, usually used as a last stage when training a model);
- Lookahead optimizer (kind of new, not thoroughly tested);
- Multi-Sample Dropout (seems like a cheap ensemble, should work for classification);
Applicability will vary with your task.
Plain vanilla classification can use all of these; s2s networks probably only some of them.
#data_science
#deep_learning
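SWA, for instance, boils down to averaging the weights of several checkpoints collected along the tail of training (libraries such as torchcontrib do this for real models). A minimal framework-free sketch of just the averaging step, with checkpoints modeled as plain dicts of parameter lists (the names and values below are made up):

```python
def swa_average(checkpoints):
    """Average parameters element-wise across checkpoints.

    checkpoints: list of dicts mapping parameter name -> list of floats
    (a toy stand-in for a real state_dict).
    """
    n = len(checkpoints)
    return {
        name: [sum(ckpt[name][i] for ckpt in checkpoints) / n
               for i in range(len(values))]
        for name, values in checkpoints[0].items()
    }

# Three hypothetical checkpoints saved near the end of training
ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 2.0], "b": [0.3]},
    {"w": [2.0, 2.0], "b": [0.6]},
]
swa_weights = swa_average(ckpts)  # "w" averages to [2.0, 2.0]
```

In a real network you would also recompute batch-norm statistics after averaging - a step that library implementations of SWA handle for you.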
ML without train / val split
Yeah, I am not crazy. But this probably applies only to NLP.
Sometimes you just need your pipeline to be flexible enough to work with any possible "in the wild" data.
A cool and weird trick: if you can make your dataset so large that your model just MUST generalize to work on it, then you do not need a validation set.
If you sample data randomly and your data generator is good enough, each new batch is effectively random and can serve as validation.
#deep_learning
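A sketch of the idea (all names are mine, and the "model" is a stand-in): with a random sampler over a huge dataset, you can score each incoming batch before training on it, so the running "validation" metric is always computed on effectively unseen data:

```python
import random

def random_batches(data, batch_size, seed=42):
    """Endless stream of uniformly sampled batches from a huge dataset."""
    rng = random.Random(seed)
    while True:
        yield rng.sample(data, batch_size)

def eval_then_train(batches, eval_fn, train_fn, steps):
    """Score each fresh batch BEFORE the model trains on it."""
    losses = []
    for _, batch in zip(range(steps), batches):
        losses.append(eval_fn(batch))  # acts as validation: not yet trained on
        train_fn(batch)
    return losses

# Toy usage: dataset of ints, "loss" is the batch mean, "training" is a no-op
data = list(range(10_000))
stream = random_batches(data, batch_size=32)
val_losses = eval_then_train(stream, eval_fn=lambda b: sum(b) / len(b),
                             train_fn=lambda b: None, steps=5)
```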
Now they stack ... normalization!
Tough to choose between BN / LN / IN?
Now a stacked version with attention exists!
https://github.com/switchablenorms/Switchable-Normalization
Also, their 1D implementation does not work, but you can hack their 2D (actually BxCxHxW) layer to work with 1D (actually BxCxW) data =)
#deep_learning
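The 1D hack is just shape juggling: insert a dummy spatial axis so a BxCxW tensor becomes BxCx1xW, run the 2D layer, and squeeze the axis back (in PyTorch that would be `x.unsqueeze(2)` / `y.squeeze(2)`). A framework-free sketch with nested lists standing in for tensors:

```python
def norm1d_via_2d(x_bcw, norm_2d):
    """Apply a 2D (B x C x H x W) normalization layer to B x C x W data.

    x_bcw: nested lists of shape (B, C, W); norm_2d: any callable that
    accepts and returns data of shape (B, C, 1, W).
    """
    # Insert a dummy H = 1 axis: (B, C, W) -> (B, C, 1, W)
    x_bchw = [[[chan] for chan in sample] for sample in x_bcw]
    y_bchw = norm_2d(x_bchw)
    # Drop the dummy axis back: (B, C, 1, W) -> (B, C, W)
    return [[chan[0] for chan in sample] for sample in y_bchw]

# Sanity check with an identity "norm" layer
x = [[[1.0, 2.0, 3.0]]]  # B=1, C=1, W=3
assert norm1d_via_2d(x, lambda t: t) == x
```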
Support Open STT
Now you can support Open STT on our github page via opencollective!
https://github.com/snakers4/open_stt
Open Collective seems to be the best donation platform supported by GitHub for now.
#dataset
2019 DS / ML digest 15
Link
Highlights of the week(s):
- Facebook's upcoming deepfake detection challenge;
- A Lyft competition on Kaggle;
- Waymo open-sources its data;
- Cool ways to deal with imbalanced and noisy data;
#digest
#deep_learning
Forwarded from yara
Jetsons are... not cool
The case
For some of you who might think jetsons are cool... This is a pure rant, so be prepared or skip the post altogether.
Well, maybe they seemed cool until you finally tried to use them for edge computing. I don't mean your pet project in your garage, like tracking your cat or chickens or something - I mean real applications, like dealing with no network connection whatsoever, where any possible support work results in huge expenses. It's not comparable to sending humans into space, but mistakes are more expensive than in your average case nonetheless. I would describe dealing with jetsons as pure frustration. That is definitely because expectations do not meet reality. You expect a good solid product well-suited for its purpose. What you get is 'nope, not even close'.
Installation
From the very moment of installing the OS and SDK it seems like this product is just raw. The only one that is simple enough is the Nano, but we did not try it in production, so not much info to share. The TX2 and Xavier are just a pain in the ass though. It took me 5 tries to install everything, and then halfway through it turned out we needed a sixth one. I still wonder how you deal with updates. Not to mention that you need a dev account and a host Ubuntu machine (well, this one is tolerable) to install everything. After I went through registration, I didn't get the verification email, and they blocked me, saying check your email for unblocking instructions. They blocked me because I did not verify my email and tried logging in too many times. Well, I got the verification email in a couple of hours, but never got one with unblocking instructions. So I don't have a dev account) Luckily, my colleague had one.
Production behavior
So far so good, I never went back to dealing with these machines, but our engineers did) To save your time: they still encounter surprises. The TX2 was rejected as the Xavier has fewer issues, but it is still a pain in the ass too. Even when everything works on a test stand, nothing works on site. Common troubles are related to power supply and autostart. There is also trouble finding good alternatives for common libraries and frameworks (or trying to fix and tweak existing ones) that do not have a version for jetsons, sometimes because of their architecture.
Alternatives?
Why did we try the jetson? Because an industrial PC with a 1060 is $5k-$8k and a good 10 weeks to ship to Russia. In the limited time window we had, we decided to try the Jetson dev kits - they seemed like a possible alternative. But the industrial version turned out to cost $5k and the same good 10 weeks to ship. Blegh.
Is it hopeless?
I hope they'll make jetsons a really good product eventually, it just takes time. For now, jetsons are definitely not a good product. Having qualified engineers to deal with this box will ease some of the pain, I guess (not for the engineers though), but that's just ridiculous.
A more detailed review of edge-computing devices
https://m.habr.com/ru/company/recognitor/blog/468421/
#deep_learning
Now CV is mainstream?
Now CV is covered even by SmarterEveryDay, which is good.
Their system will of course not work in real life, but proper coverage of what CV systems can really do is a good thing.
https://youtu.be/Lh0x54GC1sw
Why will the system from the video not work IRL?
(though such coverage of ML is much better than what usually happens =) )
- Angles / sizes / frame composition are not representative of real CCTV footage;
- HNM (hard negative mining) was done manually via a phone camera, not via CCTV;
- A 30k sample size seems OK, but reaching 99.5% precision with high recall will probably require some actual testing in front of CCTV cameras;
And worst of all ... should any such system (even a perfect one) be adopted, it will go off only AFTER the weapon is taken out. There is nothing wrong with this. But the public / stakeholders will inevitably blame the developers for not doing magic.
A mixed system (radio / sound / video) will of course work better. But you should fix the problems in your society, not patch the consequences.
2019 DS / ML digest 16
Link
Highlights of the week(s):
- Finally a 10x smaller Transformer - but it starts to look like an RNN-inspired model;
- A deepfake detection dataset;
- A paraphrase dataset;
- Deconstructing the convolution - in essence, you just need a shift operator + a 1x1 mixing convolution. Such things are not mainstream yet;
#digest
#deep_learning
PyTorch 1.2 update
So, I updated my DS / ML environment to use PyTorch 1.2 =)
(0) Basic DS / ML layer - FROM aveysov/ml_images:layer-0-pt12 / dockerfile;
(1) DS / ML libraries - FROM aveysov/ml_images:layer-1 / dockerfile;
Your final dockerfile may look something like this, just pulling from any of those layers.
Note that when building this, you will need to pass your UID as a variable, e.g.:
docker build --build-arg NB_UID=1000 -t av_final_layer -f Layer_final.dockerfile .
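A hypothetical final layer along these lines (the user name and the extra package are placeholders, not the actual contents of the linked repo):

```dockerfile
# Inherit the published DS / ML libraries layer
FROM aveysov/ml_images:layer-1

# Passed via --build-arg NB_UID=1000 so that files written from the
# container are owned by your host user (user name is an assumption)
ARG NB_UID
RUN usermod -u ${NB_UID} user

# Any project-specific extras go here
RUN pip install --no-cache-dir some-extra-package
```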
AllenNLP ... does not use their own library when doing such PR stunts?
Oh ... I wonder why.
But PyTorch + TPU seems to be one love, if it works =)
Forwarded from DL in NLP (nlpcontroller_bot)
PyTorch XLA is slowly coming to life. Soon it will be possible to train language models in a few hours on 🔥+TPU
At last, language model pretraining with PyTorch+TPUs https://github.com/allenai/tpu_pretrain
Our code trains PyTorch BERT/RoBERTa on TPUs, which is faster and cheaper than GPUs.
Also check the repo for a more detailed comparison between TPUs/GPUs on PyTorch/Tensorflow.
https://twitter.com/i_beltagy/status/1181320500783415296
Also this is amazing for teams of 5-10 people tops.
If you work on the same hardware ... you just inherit from the same base image ... and conserve traffic / space / build time =)
No sudo / venv required =)
SSH hopping from Windows?
Yeah, finally I found a recipe.
I always was one flag away from it.
Rsync at your pleasure and keep your key only on your laptop!
You just need to:
(0) Use PuTTY / PuTTY-gen to create your ssh key (note that the PuTTY format and the open-ssh format are different!)
(1) (or just import your open-ssh key into PuTTY-gen if you already have one)
(2) Add your private key to pageant (the PuTTY authentication agent)
(3) Do not forget to check the Allow agent forwarding flag in PuTTY under Connection => SSH => Auth
(4) SSH into your server
(5) Go to /etc/ssh/ssh_config
(6) Uncomment and change the ForwardAgent yes line
Now you can rsync as much as you want.
Also inside of tmux.
#linux
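The server-side part of steps (5)-(6) is a one-line config change; after it, ssh/rsync started from that server authenticates against the Pageant agent running on your laptop. A sketch (the host pattern and rsync paths are made up):

```
# /etc/ssh/ssh_config on the server you SSH into first
Host *
    ForwardAgent yes
```

Then, from that server (even inside tmux), something like `rsync -avz ./data/ user@second-server:/backups/` just works, with the private key never leaving your laptop.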
Assembling a NAS for less than US$50
So ... you want a NAS for emergency backups that only you know about.
You have spent money on GPUs, drives, devboxes and you would like to get your NAS for free.
Ofc, if you are a clever boi, you will have RAID arrays on your devbox, offsite backups, etc etc
If you feel particularly S&M, you might even use AWS Glacier or smth similar.
Or you may buy a NAS (decent devices start from US$500-1000 w/o drives! rip-off!)
But you see, all of the above variants cost money.
Or you cannot easily throw such a backup out of the window / encryption creates overhead.
So you can create a NAS on the cheap in style:
- Buy any raspberry pi (US$5 - US$20, you can find one used even cheaper);
- Buy a USB HDD enclosure (US$5 - US$40);
- Find some garbage drives for free;
- Copy your files, put HDD under your pillow;
- Profit;
Added bonuses:
- If you live in a police state - you can use RAID 0 (just hide the second drive) => in essence this is like having perfect one-time-pad encryption;
- Easily use RAID 1 or RAID 10 with 4 drives;
- Very high portability, if you use 2.5'' drives;
- Mdadm arrays are easily transferable;
- Cyber punk vibe;
#hardware
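The mdadm part is only a few commands. A sketch (device names /dev/sda and /dev/sdb and the mount point are assumptions - check yours with lsblk before running anything):

```shell
# Create a 2-drive RAID 1 mirror (use --level=0 for the "hide one drive" trick)
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

# Format and mount it
sudo mkfs.ext4 /dev/md0
sudo mkdir -p /mnt/nas
sudo mount /dev/md0 /mnt/nas

# To move the array to another box: plug the drives in and reassemble
sudo mdadm --assemble --scan
```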