Spark in me
Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.
New cool papers on CNNs

(0) Do Better ImageNet Models Transfer Better?

An implicit hypothesis in modern computer vision research is that models that perform better on ImageNet necessarily perform better on other vision tasks.
However, this hypothesis has never been systematically tested.

- Wow - an empirical study of why ResNets rule: they are simply better non-fine-tuned feature extractors, and are then probably also easier to fine-tune
- ResNets are the best fixed feature extractors
- Also, ImageNet pre-training accelerates convergence
- Also, my note: Inception-based models are more difficult to fine-tune
- Among the top-ranking models are Inception, NASNet and AmoebaNet
- Also, a personal remark - any CNN architecture can be fine-tuned to be relatively good, you just need to invent a proper training regime

The abstract says it all:
Here, we compare the performance of 13 classification models on 12 image classification tasks in three settings: as fixed feature extractors, fine-tuned, and trained from random initialization. We find that, when networks are used as fixed feature extractors, ImageNet accuracy is only weakly predictive of accuracy on other tasks (r2 = 0.24). In this setting, ResNets consistently outperform networks that achieve higher accuracy on ImageNet. When networks are fine-tuned, we observe a substantially stronger correlation (r2 = 0.86). We achieve state-of-the-art performance on eight image classification tasks simply by fine-tuning state-of-the-art ImageNet architectures, outperforming previous results based on specialized methods for transfer learning.

(1) Shampoo: Preconditioned Stochastic Tensor Optimization

Looks really cool, but their implementation requires an SVD and is slow for real tasks.
Also, they tested it only on toy tasks.

http://arxiv.org/abs/1802.09568
https://github.com/moskomule/shampoo.pytorch

In a real application, the PyTorch implementation takes 175.58s per iteration.
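
For intuition, here is a minimal sketch of the matrix case of the Shampoo update from the paper (not the repo's exact code) - the inverse fourth roots, computed via an eigendecomposition / SVD at every step, are the expensive part:

import torch

def shampoo_matrix_step(W, G, L, R, lr=1e-3):
    # accumulate one preconditioner per tensor dimension
    L += G @ G.t()
    R += G.t() @ G
    # inverse 4th roots via eigendecomposition - the slow part in practice
    def inv_fourth_root(M):
        vals, vecs = torch.linalg.eigh(M)
        return vecs @ torch.diag(vals.clamp_min(1e-6) ** -0.25) @ vecs.t()
    with torch.no_grad():
        W -= lr * inv_fourth_root(L) @ G @ inv_fourth_root(R)

# L and R start out as eps * identity matrices of the matching sizes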

#deep_learning
A very useful combination in tmux

You can resize your panes by pressing:
- first Ctrl+b
- then, holding Ctrl, press the arrow keys several times
- ...
- profit
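
If you prefer explicit bindings, something like this in ~/.tmux.conf should also work (a sketch - the -r flag makes a binding repeatable, the step size is a matter of taste):

# repeatable pane-resize bindings (prefix, then Ctrl+arrow, repeated as needed)
bind -r C-Left  resize-pane -L 5
bind -r C-Right resize-pane -R 5
bind -r C-Up    resize-pane -U 5
bind -r C-Down  resize-pane -D 5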

#linux
#deep_learning
Digest about Internet

(0) Ben Evans Internet digest - https://goo.gl/uoQCBb

(1) GitHub purchased by Microsoft - https://goo.gl/49X74r
-- If you want to migrate - there are guides already - https://about.gitlab.com/2018/06/03/movingtogitlab/

(2) And a post on how Microsoft kind of ruined Skype - https://goo.gl/Y7MJJL
-- focus on b2b
-- lack of focus, constant redesigns, faltering service

(3) No drop in FB usage after its controversies - https://goo.gl/V93j2v

(4) Facebook allegedly employs 1200 moderators for Germany - https://goo.gl/VBcYQQ

(5) Looks like many Linux networking tools have been deprecated for years
https://dougvitale.wordpress.com/2011/12/21/deprecated-linux-networking-commands-and-their-replacements/
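
For reference, a few of the well-known iproute2 / ss replacements from that list:

ip addr show     # instead of ifconfig
ip route show    # instead of route -n
ip neigh show    # instead of arp -a
ss -tulpn        # instead of netstat -tulpn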

#internet
#digest
2018 DS/ML digest 11

Datasets
(0)
New Andrew Ng paper on radiology datasets
(1)
YouTube-8M dataset post
- As mentioned before, this is more or less blatant TF marketing

New papers / models / architectures
(0) Google RL search for optimal augmentations
- Blog, paper
- Finally Google paid attention to augmentations
- 83.54% top1 accuracy on ImageNet
- Discrete search problem - each policy consists of 5 sub-policies, with each operation associated with two hyperparameters: probability and magnitude
- Training regime: cosine decay for 200 epochs
- (images in the original post: top accuracy on ImageNet, the best policy found, typical examples of augmentations)

(1)
Training CNNs with less data
Key idea - with clever selection of data you can decrease annotation costs 2-3x

(2)
Regularized Evolution for Image Classifier Architecture Search (AmoebaNet)
- The first controlled comparison of the two search algorithms (genetic and RL)
- Mobile-size ImageNet (top-1 accuracy = 75.1% with 5.1M parameters)
- ImageNet (top-1 accuracy = 83.1%)

Evolution vs. RL at large compute scale:
- Evolution and RL do equally well on accuracy
- Both are significantly better than random search
- Evolution is faster

But the proper description of the architecture is nowhere to be seen...

Libraries / code / frameworks
(0) OpenCV installation for Ubuntu 18 from source (if you need e.g. video support)

News / market
(0) Idea adversarial filters for apps - https://goo.gl/L4Vne7
(1) A list of 30 best practices for amateur ML / DL specialists - http://forums.fast.ai/t/30-best-practices/12344
- Some ideas about tackling naive NLP problems
- PyTorch allegedly supports just freezing batch-norm layers
- Also a neat idea I tried with Inception nets - assign different learning rates to parts of larger models when fine-tuning them (see the sketch after this list)
(2) Stumbled upon a reference to NAdam being a bit better than Adam as an optimizer
It is also described in this popular article
(3) Barcode reader via OpenCV
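
Differential learning rates from item (1) can be sketched in PyTorch via optimizer parameter groups (the layer names below assume an illustrative torchvision ResNet):

import torch
from torchvision import models

model = models.resnet50(pretrained=True)
optimizer = torch.optim.SGD([
    {"params": model.layer1.parameters(), "lr": 1e-4},  # early layers: tiny lr
    {"params": model.layer4.parameters(), "lr": 1e-3},  # later layers: larger lr
    {"params": model.fc.parameters(),     "lr": 1e-2},  # fresh head: largest lr
], momentum=0.9)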

#deep_learning
#digest

Like this post or have something to say => tell us more in the comments or donate!
An interesting idea from a CV conference

Imagine that you have some kind of algorithm that is not exactly differentiable, but is "backprop-able".

In this case you can have very convoluted logic in your "forward" statement (essentially something in between trees and dynamic programming) - for example, a set of clever if-statements.

In this case you get the best of both worlds - your algorithm (which you will have to re-implement in your framework) and backprop + CNNs. Nice.

Of course, this works only for dynamic deep-learning frameworks.
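
A minimal PyTorch sketch of the idea - the control flow depends on the data, yet autograd still tracks whichever branch was actually executed:

import torch

def clever_forward(x):
    # convoluted if-logic; each branch is built from differentiable ops
    if x.sum() > 0:
        return (x * 2).relu()
    else:
        return x ** 2

x = torch.randn(4, requires_grad=True)
y = clever_forward(x).sum()
y.backward()     # gradients flow through the branch that actually ran
print(x.grad)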

#deep_learning
#data_science
Interesting links about Internet

- Ben Evans' digest - https://goo.gl/7NkYn6
- Why it took so much time to create previews for Wikipedia - https://goo.gl/xg7N99
- Google postulating its AI principles? - https://blog.google/topics/ai/ai-principles/
- Google product alternatives - https://goo.gl/RmA76N - I personally started switching to more open-source stuff lately, but Docs and Android have no real alternatives
- The future of ML in embedded devices - https://goo.gl/PjWpKj (sound ideas, but the post is by an evangelist)
- Yahoo messenger shutting down (after 20 years!) - https://goo.gl/uhomds - hi, ICQ
- Microsoft buys GitHub for $7.5 billion - a16z write-up - https://goo.gl/3znstT
- NYC medallions dropped 5x in price - https://goo.gl/Vi7pG6
- JD covers villages in China with drone delivery already - https://goo.gl/bMGKSY

#digest
The age of open-source

Recently I started using more and more open-source / CLI tools for mundane everyday tasks.

Sometimes they have a higher barrier to entry (example - compare Google Slides vs markdown + LaTeX), but they are usually simpler, yet more powerful.

Recently I was just appalled by µTorrent's bugs and ads - and I found out that there is even a beta of Transmission for Windows (the alternative being just using the transmission daemon on Linux).

The question is - do you know any highly useful open-source / CLI / free tools to replace standard entrenched software, which is getting a bit annoying?

Like this post or have something to say => tell us more in the comments or donate!
Playing with renewing SSL certificates + Cloudflare

I am using certbot, which makes SSL certificate installation for any web-server literally a one-liner (a couple of guides - https://goo.gl/nP2tij / https://goo.gl/X6rVxs).
It also has an amazing command certbot renew for renewing your certificates.

Unsurprisingly, it does not work when you have Cloudflare enabled. The solution in my case was as easy as:
- falling back to the registrar's name-servers (luckily, my registrar keeps its old DNS zone settings)
- certbot renew
- reverting back to Cloudflare's DNS servers
- also, in this case, when using a VPN I did not have to wait for the DNS records to propagate - it was instant
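
For reference, the basic certbot workflow is just (the domain and the nginx plugin here are illustrative):

sudo certbot --nginx -d example.com   # obtain + install a certificate
sudo certbot renew --dry-run          # test renewal without changing anything
sudo certbot renew                    # renew all certificates close to expiry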

#linux
Playing with multi-GPU small batch-sizes

If you play with SemSeg using a big model and large images (HD, Full HD), you may face a situation where only one image fits on one GPU.

Also, this is useful if your train-test split is far from ideal and/or you are using pre-trained ImageNet encoders for a SemSeg task - so you cannot really update your batch-norm params.

Also, AFAIK, all the major deep-learning frameworks:
(0) do not have a batch-norm freeze option for evaluation (batch-norm contains 2 sets of parameters - learnable affine parameters, and running statistics that are updated during training and used at inference);
(1) calculate batch-norm statistics on each GPU separately.

All of this may mean that your models severely underperform at inference time in these situations.

Solutions?

(0) Sync batch-norm. I believe that to do it properly you would have to modify the framework you are using, but there is a PyTorch implementation done for CVPR 2018 - see the explanation here: http://hangzh.com/PyTorch-Encoding/notes/syncbn.html - if its multi-GPU model wrappers can be used with any model, then we are in the money
(1) Use affine=False in your batch-norm. But in this case the ImageNet initialization will probably not help - you will have to train your model from scratch
(2) Freeze your encoder's batch-norm params completely (see the sketch below)
https://discuss.pytorch.org/t/how-to-train-with-frozen-batchnorm/12106/10 (though I am not sure - they do not seem to be freezing the running-mean parameters) - probably this also needs m.trainable = False or something like this
(3) Use the recent Group Norm from Facebook - https://arxiv.org/pdf/1803.08494.pdf
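
A hedged sketch of option (2) in PyTorch - freeze both the affine parameters and the running statistics of every batch-norm layer:

import torch.nn as nn

def freeze_bn(module):
    for m in module.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()                         # stop updating running mean / var
            m.weight.requires_grad = False   # freeze the learnable affine params
            m.bias.requires_grad = False

# call freeze_bn(model.encoder) after every model.train() call,
# since .train() flips batch-norm layers back to training mode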

This is a finicky topic - please tell us in the comments about your experiences and tests.

#deep_learning
#cv

Like this post or have something to say => tell us more in the comments or donate!
Interesting links about Internet

- Ben Evans' digest - https://goo.gl/t9zG4y
- China plans to track cars - https://goo.gl/jeroFW
- Ben Evans - content is not king anymore, distribution / ecosystems are - https://goo.gl/ms2tQd
- Google opens AI center in Ghana - https://goo.gl/PRHBjq

- (RU) A funny case of censorship in Russia - an article deleted from Habr - https://sohabr.net/habr/post/414595/
-- It kind of clearly shows that you cannot safely post anything to Habr

- India + WhatsApp + lynch mobs - https://goo.gl/tSBUCp
- Tor foundation about web-tracking and Facebook - https://goo.gl/H9DSuL
- Docker image jacking for crypto-mining - https://goo.gl/KrLLuQ
- Ethereum - 75% of transactions are made by automated bots - https://goo.gl/Q9BSNL
- (RU) Analyzing election fraud in Russia - 3-10M votes are fake - https://habr.com/post/358790/

#internet
2018 DS/ML digest 12

As usual, this is whatever I found really interesting / worth reading.

Implementations / papers / ideas
(0)
You can count bees well with UNet - http://matpalm.com/blog/counting_bees/
(1)
A really super cool idea - use affine transformations in 3D to stack augmentations on the level of transformation matrices
(3D augs are costly)

- https://gist.github.com/ematvey/5ca7df5d37c2f6a674390d42ef9e7d59
- both for rotation and scaling
- note a couple of things for easier understanding:
-- there is an offset in the transformations, because the coordinate origin is not in the "center"
-- zoom essentially scales the unit vectors after applying the offset
- 3Blue1Brown videos about linear algebra - https://www.youtube.com/watch?v=fNk_zzaMoSs
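
A minimal numpy sketch of the idea - compose the augmentations as 4x4 affine matrices first, then resample the 3D volume only once (shapes and parameters are illustrative):

import numpy as np

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])

def scaling(factor):
    return np.diag([factor, factor, factor, 1.0])

def to_center(shape):
    # the offset: move the origin to the volume center before rotating / scaling
    T = np.eye(4)
    T[:3, 3] = -(np.asarray(shape) - 1) / 2.0
    return T

shape = (64, 64, 64)
# one combined matrix: offset -> scale -> rotate -> offset back
M = np.linalg.inv(to_center(shape)) @ rotation_z(0.3) @ scaling(1.2) @ to_center(shape)
# apply M to the volume once, e.g. with scipy.ndimage.affine_transform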
(2)
A top solution from Google's Landmark Challenge - https://goo.gl/pkZULZ
Essentially
- ensemble of features / skip connections from a CNN (ResNeXt)
- KNN
- use KNN + augment the extracted features by averaging with similar images
- query expansion (use the fact that different crops of the same landmark remain the same landmark)
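
A hedged sketch of the feature-averaging / query-expansion step (all names are illustrative; db rows are assumed to be L2-normalized descriptors):

import numpy as np

def query_expansion(query, db, k=5):
    sims = db @ query                      # cosine similarities
    topk = np.argsort(-sims)[:k]           # indices of the k nearest images
    expanded = query + db[topk].sum(axis=0)
    expanded /= np.linalg.norm(expanded)   # re-normalize the averaged descriptor
    return np.argsort(-(db @ expanded))    # re-ranked result list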
(3)
(RU) A super cool series about interesting clustering algorithms
- Affinity propagation
-- https://habr.com/post/321216/
-- http://www.icmla-conference.org/icmla07/FreyDueckScience07.pdf
- DBSCAN https://habrahabr.ru/post/322034/
- (spoiler - in practice, use the awesome HDBSCAN library)
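
Basic usage of the hdbscan library, for reference (the data here is illustrative):

import numpy as np
import hdbscan

X = np.random.rand(500, 2)                        # any (n_samples, n_features) array
clusterer = hdbscan.HDBSCAN(min_cluster_size=15)
labels = clusterer.fit_predict(X)                 # label -1 marks noise points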
(4)
Brief review of image super-resolution techniques
- https://habr.com/post/359016/
- In a nutshell: try, in this order - fully convolutional CNNs, auto-encoders with skip connections, or GANs
(5)
SOTA NLP by open-ai
https://blog.openai.com/language-unsupervised/
Key ideas
- Train a transformer language model on a large corpus in an unsupervised way
- Fine-tune on a smaller task
- Profit
Caveats
- "Our approach requires an expensive pre-training step - 1 month on 8 GPUs" (probably this should be discounted somewhat)
- TF and unreadable enterprise code
(6)
One more claimed SOTA word embedding set
https://allennlp.org/elmo
(7)
A cool github page by Sebastian Ruder to track major NLP tasks
https://github.com/sebastianruder/NLP-progress

Visualizations
(0)
Amazing visual explanations of how decision trees work
- http://www.r2d3.us/visual-intro-to-machine-learning-part-2/
- it explains visually how overfitting occurs in decision tree models
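
A tiny sklearn sketch of the effect the article visualizes - deeper trees fit the training set better but generalize worse (synthetic data, illustrative parameters):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for depth in (2, 5, None):   # None = grow until the leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))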
(1)
t-SNE on CIFAR can be done in real time on the GPU, with tensorflow.js integration
- Blog https://goo.gl/Pk5Lq3
- Website https://goo.gl/1vpeFf
- Arxiv - http://arxiv.org/abs/1802.03680
- Demo - https://nicola17.github.io/tfjs-tsne-demo/
(2) Why people fail to use d3.js - https://goo.gl/hSt5dL

Datasets
(0) Nice idea - use available tools and videos to collect datasets
- https://goo.gl/HULsyH
- https://goo.gl/7AfRZZ

#digest