Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former. To reach the editors, contact: @haarrp
Reproducing ImageNet in 18 minutes

The code to reproduce #ImageNet training in 18 minutes is posted in the GitHub repo. It actually becomes «ImageNet in 12 minutes» if you only target the 74.9% top-1 accuracy used in Chainer's "ImageNet in 15 minutes" paper; the last few bits of accuracy are the hardest.

Link: https://github.com/diux-dev/imagenet18
ImageNet/ResNet-50 training time dramatically reduced (6.6 min -> 224 sec)

ResNet-50 on ImageNet now (allegedly) down to 224 sec (3.7 min) using 2176 V100s. The recipe: an increasing batch size schedule, LARS, a 5-epoch LR warmup, synchronized BN without moving averages, mixed-precision fp16 training, and a "2D-Torus" all-reduce on NCCL2 over NVLink2 and 2x IB EDR interconnects.
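To make the LARS ingredient concrete, here is a minimal sketch of a layer-wise adaptive rate scaling step (my own simplification, not the paper's code; momentum is omitted and the trust_coef / weight_decay values are assumptions):

```python
import torch

def lars_step(params, lr, trust_coef=0.001, weight_decay=1e-4, eps=1e-9):
    # One LARS-style update: each layer's step is rescaled by ||w|| / ||g + wd*w||,
    # which keeps huge-batch training stable across layers of very different scale.
    with torch.no_grad():
        for w in params:
            if w.grad is None:
                continue
            g = w.grad + weight_decay * w
            w_norm, g_norm = w.norm(), g.norm()
            local_lr = trust_coef * w_norm / (g_norm + eps) if w_norm > 0 else 1.0
            w -= lr * local_lr * g

# usage (after loss.backward()): lars_step(model.parameters(), lr=current_lr)
```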

That is 1.28M images over 90 epochs at a batch size of 68K, so the entire optimization takes only ~1700 updates to converge.
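A quick back-of-the-envelope check of that update count (the 1.28M / 90 / 68K figures are the ones quoted above):

```python
images = 1_281_167      # ImageNet-1k training set size (~1.28M)
epochs = 90
batch_size = 68_000     # final large-batch size quoted above
updates = epochs * images / batch_size
print(round(updates))   # ~1696 optimizer steps, i.e. the "~1700 updates" above
```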

ArXiV: https://arxiv.org/abs/1811.05233

#ImageNet #ResNet
Do Better ImageNet Models Transfer Better?

Finding: better ImageNet architectures tend to work better on other datasets too. Surprise: pretraining on ImageNet sometimes doesn't help very much.

ArXiV: https://arxiv.org/abs/1805.08974

#ImageNet #finetuning #transferlearning
"Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet"

A "bag of words" of nets on tiny 17x17 patches suffice to reach AlexNet-level performance on ImageNet. A lot of the information is very local.

Paper: https://openreview.net/forum?id=SkfMWhAqYQ

#fun #CNN #CV #ImageNet
📹 What's Hidden in a Randomly Weighted Neural Network?

Amazingly, this paper finds a subnetwork with random weights inside a Wide ResNet-50 that outperforms a ResNet-34 with trained weights on ImageNet!

In the Lottery Ticket Hypothesis paper from the last ICLR, the authors showed that it is possible to take a trained big net and throw out about 95% of the weights so that the rest can be retrained to the same quality, starting from the same initialization.
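A minimal sketch of that procedure, assuming you already have dicts of the trained and the initial weight tensors (names and the 5% keep fraction are illustrative, not the paper's code):

```python
import torch

def lottery_ticket(init_state, trained_state, keep_frac=0.05):
    # Keep only the ~5% largest-magnitude weights of each trained tensor, then
    # rewind the survivors to their initial values; everything else is zeroed out.
    masks, rewound = {}, {}
    for name, w in trained_state.items():
        k = max(1, int(keep_frac * w.numel()))
        cutoff = w.abs().flatten().topk(k).values.min()
        masks[name] = (w.abs() >= cutoff).float()
        rewound[name] = init_state[name] * masks[name]
    return masks, rewound   # retrain only the unmasked weights, starting from `rewound`
```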
The follow-up, Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask, found that it is possible to leave the weights at their initialization and learn only the mask, pruning the unnecessary connections from the network; this gets around 40% accuracy on CIFAR while training not the model's weights but only its structure. Similar observations were made for simple RL tasks, see Weight Agnostic Neural Networks.
However, it was not clear how well structure-only training works on full-scale datasets and large nets, without access to properly trained weights.

In this paper the authors run structure-only training on ImageNet for the first time. To do so:
- Take a fat network (a.k.a. DenseNet); the weights are initialized from a "binarized" Kaiming normal (either +std or -std instead of a sample from the normal).
- For each weight, keep an additional scalar, the score s, which shows how important that weight is for a good prediction. At inference we take the top-k% of weights by score and zero out the rest.
- With the weights fixed, we train the scores. The main trick: although the forward pass, like inference, uses only the top-k weights, in the backward pass the gradient flows through all the scores. It is a kind of inverse of LRD (Learning Rate Dropout), where all weights are used in the forward pass and only a small subset in the backward pass. See the sketch after this list.
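As referenced above, a minimal PyTorch sketch of such a layer (my own reconstruction, not the authors' code; the class names, score init, and k=0.3 default are assumptions): weights frozen at a signed-constant init, a trainable score per weight, top-k selection in the forward pass, straight-through gradients to all scores in the backward pass.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMask(torch.autograd.Function):
    # Forward: binary mask keeping the top-k fraction of scores.
    # Backward: straight-through, the gradient reaches ALL scores.
    @staticmethod
    def forward(ctx, scores, k):
        mask = torch.zeros_like(scores)
        idx = scores.flatten().topk(int(k * scores.numel())).indices
        mask.view(-1)[idx] = 1.0
        return mask

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None          # no gradient for k

class SubnetConv(nn.Conv2d):
    def __init__(self, *args, k=0.3, **kwargs):
        super().__init__(*args, bias=False, **kwargs)
        self.k = k
        self.scores = nn.Parameter(torch.rand_like(self.weight) * 0.01)
        # "binarized" Kaiming normal: every weight is +std or -std, then frozen.
        std = (2.0 / self.weight[0].numel()) ** 0.5
        self.weight.data = std * torch.sign(torch.randn_like(self.weight))
        self.weight.requires_grad = False   # structure-only training

    def forward(self, x):
        mask = TopKMask.apply(self.scores.abs(), self.k)
        return F.conv2d(x, self.weight * mask, None,
                        self.stride, self.padding, self.dilation, self.groups)

# usage: only the scores receive gradients, the random weights never change
layer = SubnetConv(3, 64, 3, padding=1, k=0.3)
opt = torch.optim.SGD([layer.scores], lr=0.1, momentum=0.9)
out = layer(torch.randn(2, 3, 32, 32))
```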

Thus we can prune a random Wide ResNet-50 and get 73.3% accuracy on ImageNet, with fewer active weights than in a ResNet-34. Magic.

ArXiV: https://arxiv.org/pdf/1911.13299.pdf
YouTube explanation: https://www.youtube.com/watch?v=C6Tj8anJO-Q
via @JanRocketMan

#ImageNet #ResNet