2018 DS/ML digest 17
Highlights of the week
(0) Troubling trends in ML scholarship
http://approximatelycorrect.com/2018/07/10/troubling-trends-in-machine-learning-scholarship/
(1) NLP close to its ImageNet stage?
https://thegradient.pub/nlp-imagenet/
Papers / posts / articles
(0) Working with multi-modal data https://distill.pub/2018/feature-wise-transformations/
- concatenation-based conditioning
- conditional biasing or scaling ("residual" connections)
- sigmoidal gating
- all in all this approach seems like a mixture of attention / gating for multi-modal problems
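To make the feature-wise idea concrete, here is a minimal FiLM-style conditioning layer in PyTorch - my own illustration, not code from the article:

import torch
import torch.nn as nn

class FiLM(nn.Module):
    # One modality (cond) predicts a per-channel scale (gamma) and shift (beta)
    # that modulate the other modality's feature maps.
    def __init__(self, cond_dim, num_channels):
        super().__init__()
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, features, cond):
        # features: (B, C, H, W), cond: (B, cond_dim)
        gamma, beta = self.to_gamma_beta(cond).chunk(2, dim=1)
        return gamma[..., None, None] * features + beta[..., None, None]

# usage: FiLM(cond_dim=128, num_channels=64)(torch.randn(2, 64, 8, 8), torch.randn(2, 128))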
(1) Glow, a reversible generative model which uses invertible 1x1 convolutions
https://blog.openai.com/glow/
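A rough sketch of the invertible 1x1 convolution idea (my own minimal version, not OpenAI's code): a 1x1 convolution is just a learned channel-mixing matrix, so keeping it square and invertible gives an exact inverse and a cheap log-determinant term for the flow objective.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Invertible1x1Conv(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # start from a random orthogonal (hence invertible) matrix
        w, _ = torch.linalg.qr(torch.randn(channels, channels))
        self.weight = nn.Parameter(w)

    def forward(self, x):
        b, c, h, w = x.shape
        y = F.conv2d(x, self.weight.view(c, c, 1, 1))
        # log|det Jacobian| of the channel mixing, summed over spatial positions
        logdet = h * w * torch.slogdet(self.weight)[1]
        return y, logdet

    def inverse(self, y):
        c = y.shape[1]
        return F.conv2d(y, torch.inverse(self.weight).view(c, c, 1, 1))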
(2) Facebook's moonshots - I don't really understand much here
- https://research.fb.com/facebook-research-at-icml-2018/
(3) RL concept flaws?
- https://thegradient.pub/why-rl-is-flawed/
(4) Intriguing failures of convolutions
https://eng.uber.com/coordconv/ - this is simply amazing
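The proposed fix is tiny - roughly something like this (my own sketch, not Uber's code): append normalised coordinate channels so the convolution knows where it is looking.

import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    # Concatenate normalised x/y coordinate channels, then apply a normal conv.
    def __init__(self, in_channels, out_channels, **conv_kwargs):
        super().__init__()
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **conv_kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, ys, xs], dim=1))

# usage: CoordConv2d(3, 16, kernel_size=3, padding=1)(torch.randn(1, 3, 32, 32))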
(5) People are only STARTING to apply ML to reasoning
https://deepmind.com/blog/measuring-abstract-reasoning/
Yet another online book on Deep Learning
(1) Kind of standard https://livebook.manning.com/#!/book/grokking-deep-learning/chapter-1/v-10/1
Libraries / code
(0) Data version control continues to develop https://dvc.org/features
#deep_learning
#data_science
#digest
Like this post or have something to say => tell us more in the comments or donate!
Tensorboard + PyTorch
I looked at this 6 months ago - and it was messy.
Now it looks really polished:
https://github.com/lanpa/tensorboard-pytorch
#data_science
Forwarded from Админим с Буквой (bykva)
Git commit messages
How to commit to git properly. A good article from Habr:
https://habr.com/post/416887/
#thirdparty #read #git
Once again stumbled upon this amazing PyTorch-related post
For those learning PyTorch
https://discuss.pytorch.org/t/feedback-on-pytorch-for-kaggle-competitions/2252/11
#deep_learning
#pytorch
Feeding images / tensors of different size using PyTorch dataloader classes
Struggled to do this properly on DS Bowl (I resorted to random crops for training and one-image batches for validation).
Suppose your dataset has some internal structure in it.
For example - you may have images of vastly different aspect ratios (3:1, 1:3 and 1:1) and you would like to squeeze every bit of performance from your pipeline.
Of course, you may pad / center-crop / random-crop your images - but in that case you will lose some of the information.
I played with this on some tasks - sometimes force-resizing works better than crops, but applying the model convolutionally worked really well on SemSeg challenges.
So it may work very well on plain classification as well.
So, if you apply your model convolutionally, you will end up with differently-sized feature maps for each cluster of images.
Within the model, this can be fixed with:
(0) Adaptive average pooling layers (see the sketch after this list)
(1) Some simple logic in the model's .forward method
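A minimal sketch of the adaptive-pooling variant (the sizes here are illustrative):

import torch
import torch.nn as nn

num_classes = 100  # illustrative

# Adaptive pooling makes the head independent of the input resolution,
# so feature maps from different aspect-ratio clusters all work.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),       # (B, C, H, W) -> (B, C, 1, 1) for any H, W
    nn.Flatten(),
    nn.Linear(2048, num_classes),  # 2048 assumes a ResNet-style encoder
)

logits = head(torch.randn(4, 2048, 7, 12))  # differently shaped maps are fine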
But either way you end up with a small technical issue - PyTorch cannot concatenate tensors of different sizes using its standard collate function.
Theoretically, there are several ways to fix this:
(0) Stupid solution - create N datasets and train on them sequentially.
In practice I tried that on DS Bowl - it worked poorly: the model overfitted to each cluster and then performed poorly on the next one;
(1) Crop / pad / resize the images (suppose you deliberately want to avoid that);
(2) Insert some custom logic into the PyTorch collate function, i.e. resize there;
(3) Just sample images so that only images of one size end up within each batch;
(0) and (1) I would like to avoid intentionally.
(2) seems a bit stupid as well, because resizing should be done as a pre-processing step (the collate function deals with normalized tensors, not images), and it is better not to mix the purposes of your modules.
Of course, you could produce N tensors in (2) - i.e. one tensor per image size - but that would require an additional loop downstream.
In the end, I decided that (3) is the best approach - because it can be easily transferred to other datasets / domains / tasks.
Long story short - here is my solution - I just extended their sampling function:
https://github.com/pytorch/pytorch/issues/1512#issuecomment-405015099
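For reference, a minimal sketch of the idea behind (3) - the actual code is in the linked comment, the names below are mine: a batch sampler that never mixes indices from different size clusters, so the default collate function only ever sees matching shapes.

import random
from torch.utils.data import Sampler

class ClusterBatchSampler(Sampler):
    # clusters: dict mapping cluster id -> list of dataset indices;
    # how you assign images to clusters is up to your dataset.
    def __init__(self, clusters, batch_size, shuffle=True):
        self.clusters = clusters
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        batches = []
        for indices in self.clusters.values():
            indices = list(indices)
            if self.shuffle:
                random.shuffle(indices)
            for i in range(0, len(indices), self.batch_size):
                batches.append(indices[i:i + self.batch_size])
        if self.shuffle:
            random.shuffle(batches)  # interleave batches from different clusters
        yield from batches

    def __len__(self):
        return sum((len(v) + self.batch_size - 1) // self.batch_size
                   for v in self.clusters.values())

# usage: DataLoader(dataset, batch_sampler=ClusterBatchSampler(clusters, batch_size=32))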
Maybe it is worth a PR on GitHub?
What do you think?
#deep_learning
#data_science
Like this post or have something to say => tell us more in the comments or donate!
Sometimes in supervised ML tasks, leveraging the data structure in a self-supervised fashion really helps!
Playing with CrowdAI mapping competition
In my opinion it is a good testing ground for your SemSeg ideas, as the dataset is really clean and balanced
https://spark-in.me/post/a-small-case-for-search-of-structure-within-your-data
#deep_learning
#data_science
#satellite_imaging
New cool (but useless, of course) competitions on CrowdAI
https://mailchi.mp/crowdai/crowdai-mapping-ieee-challenge-music-challenge-2-calls-1432557?e=a3b6aa9b1a
#data_science
#deep_learning
Colab SeedBank
- TF is everywhere (naturally) - but at least they use Keras
- On the other hand, all of the files are (at least for now) downloadable as .ipynb or .py
- So it may be a good place to look for boilerplate code
Also some interesting facts that are not mentioned openly:
- Looks like they use Tesla K80s, which in practice are 2.5-3x slower than a 1080Ti
(https://medium.com/initialized-capital/benchmarking-tensorflow-performance-and-cost-across-different-gpu-options-69bd85fe5d58)
- The full-screen notebook format is clearly inspired by Jupyter plugins
- Of course, there is a time limit for GPU scripts, and GPU availability is not guaranteed (reported by people who have used it)
- Personally, it looks a bit like the slow instances from FloydHub - time limits, slow GPUs, etc.
In a nutshell - a perfect source of boilerplate code and a playground for newcomers.
#deep_learning
Lazy failsafe in PyTorch Data Loader
Sometimes you train a model, and testing all the combinations of augmentations / keys / params in your dataloader is too difficult. Or the dataset is too large, so it would take some time to check it properly.
In such cases I usually used some kind of failsafe try/except.
But it looks like an even simpler approach works:
if img is None:
    # do not return anything
    pass
else:
    return img
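One assumption on my side: for this to work end to end, the collate step has to tolerate the missing samples, e.g. with a custom collate_fn that drops the None entries - a minimal sketch, not part of the original snippet:

from torch.utils.data.dataloader import default_collate

def skip_none_collate(batch):
    # drop samples for which __getitem__ returned None, collate the rest
    batch = [sample for sample in batch if sample is not None]
    return default_collate(batch)

# usage: DataLoader(dataset, batch_size=32, collate_fn=skip_none_collate)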
#deep_learning
#pytorch
Yet another Kaggle competition with high prizes and an easy challenge
https://www.kaggle.com/c/tgs-salt-identification-challenge
#deep_learning
Playing with focal loss for multi-class classification
Playing with this loss:
https://gist.github.com/snakers4/5739ade67e54230aba9bd8a468a3b7be
If anyone has a better option - please PM me / or comment in the gist.
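For context, a minimal multi-class focal loss sketch - my own simplified version, not the gist linked above:

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # standard cross-entropy, down-weighted by (1 - p_t)^gamma so that
    # easy, confidently classified samples contribute less
    ce = F.cross_entropy(logits, targets, reduction='none')
    pt = torch.exp(-ce)  # probability assigned to the true class
    return ((1.0 - pt) ** gamma * ce).mean()

# usage: focal_loss(torch.randn(8, 5), torch.randint(0, 5, (8,)))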
#deep_learning
#data_science
Playing with open-images
Did a benchmark of multi-class classification models and approaches that are useful in general for multi-tier classifiers.
The basic idea: follow the graph structure of class dependencies - train a good multi-class classifier => train coarse SemSeg models for each big cluster.
What worked
- Using SOTA classifiers from imagenet
- Pre-training with a frozen encoder (otherwise the model performs worse)
- Best performing architecture so far - ResNet152 (a couple of others to try as well)
- Different resolutions => binarise them => divide into 3 major clusters (2:1, 1:2, 1:1) - see the sketch after this list
- Using adaptive pooling for the different aspect-ratio clusters
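A sketch of how such a clustering step could look (the thresholds are illustrative, not the exact ones used):

def aspect_cluster(width, height, threshold=1.5):
    # assign an image to one of three aspect-ratio clusters: ~2:1, ~1:2, ~1:1
    ratio = width / height
    if ratio > threshold:
        return 'wide'
    if ratio < 1.0 / threshold:
        return 'tall'
    return 'square'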
What did not work or did not significantly improve results
- Oversampling
- Using modest or minor augs (10% or 25% of images augmented)
What did not work
- Using 1xN + Nx1 convolutions instead of pooling - too heavy
- Using some minimal avg. pooling (like 16x16), then using different 1xN + Nx1 convolutions for different clusters - performed mostly worse than just adaptive pooling
Yet to try
- Focal loss
- Oversampling + augs
#deep_learning
2018 DS/ML digest 18
Highlights of the week
(0) RL flaws
https://thegradient.pub/why-rl-is-flawed/
https://thegradient.pub/how-to-fix-rl/
(1) An intro to AutoML
http://www.fast.ai/2018/07/16/auto-ml2/
(2) Overview of advances in ML over the last 12 months
https://www.stateof.ai/
Market / applied stuff / papers
(0) New Nvidia Jetson released
https://www.phoronix.com/scan.php?page=news_item&px=NVIDIA-Jetson-Xavier-Dev-Kit
(1) Medical CV project in Russia - 90% is data gathering
http://cv-blog.ru/?p=217
(2) Differentiable architecture search
https://arxiv.org/pdf/1806.09055.pdf
-- 1800 GPU days of reinforcement learning (RL) (Zoph et al., 2017)
-- 3150 GPU days of evolution (Real et al., 2018)
-- 4 GPU days to achieve SOTA on CIFAR => transferable to ImageNet with 26.9% top-1 error
(3) Some basic thoughts about hyper-param tuning
https://engineering.taboola.com/hitchhikers-guide-hyperparameter-tuning/
(4) FB extending fact checking to mark similar articles
https://www.poynter.org/news/rome-facebook-announces-new-strategies-combat-misinformation
(5) Architecture behind Alexa choosing skills https://goo.gl/dWmXZf
- Char-level RNN + Word-level RNN
- Shared encoder, but attention is personalized
(6) An overview of contemporary NLP techniques
https://medium.com/@ageitgey/natural-language-processing-is-fun-9a0bff37854e
(7) RNNs in particle physics?
https://indico.cern.ch/event/722319/contributions/3001310/attachments/1661268/2661638/IML-Sequence.pdf?utm_campaign=Revue%20newsletter&utm_medium=Newsletter&utm_source=NLP%20News
(8) Google Cloud provides PyTorch images
https://twitter.com/i/web/status/1016515749517582338
NLP
(0) Use embeddings for positions - a no-brainer
https://twitter.com/i/web/status/1018789622103633921
(1) Chatbots were a hype train - lol
https://medium.com/swlh/chatbots-were-the-next-big-thing-what-happened-5fc49dd6fa61
The vast majority of bots are built using decision-tree logic, where the bot’s canned response relies on spotting specific keywords in the user input.
Interesting links
(0) Reasons to use OpenStreetMap
https://www.openstreetmap.org/user/jbelien/diary/44356
(1) Google deploys its internet balloons
https://goo.gl/d5cv6U
(2) Amazing problem solving
https://nevalalee.wordpress.com/2015/11/27/the-hotel-bathroom-puzzle/
(3) Nice flame thread about whether CS / ML is science or just engineering, etc.
https://twitter.com/RandomlyWalking/status/1017899452378550273
#deep_learning
#data_science
#digest
My post on open images stage 1
For posterity
Please comment
https://spark-in.me/post/playing-with-google-open-images
#deep_learning
#data_science
New Keras version
https://github.com/keras-team/keras/releases/tag/2.2.1
No real major changes...
#deep_learning