Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former. To reach editors contact: @haarrp
​​Towards Lingua Franca Named Entity Recognition with BERT

The authors present a simple and effective recipe for building #multilingual #NER systems with #BERT.
Using a multilingual BERT framework, they train a single system that performs inference on English, German, Spanish, and Dutch, outperforms the same model trained on one language at a time, and is also capable of zero-shot inference.
The resulting model yields #SotA results on CoNLL Spanish and Dutch, and on OntoNotes Chinese and Arabic datasets.

Also, the English-trained model yields SotA zero-shot results for Spanish, Dutch, and German NER, improving on previous results by 2.4 to 17.8 F1 points.
Furthermore, the runtime signature (memory/CPU/GPU) of the model is the same as that of models built on single languages, which significantly simplifies its life-cycle maintenance.

paper: https://arxiv.org/abs/1912.01389
​​SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

Abstract: CNN typically encodes an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection). The encoder-decoder architectures are proposed to resolve this by applying a decoder network onto a backbone model designed for classification tasks. In this paper, we argue that encoder-decoder architecture is ineffective in generating strong multi-scale features because of the scale-decreased backbone. We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that are learned on an object detection task by Neural Architecture Search. SpineNet achieves the SOTA performance of a one-stage object detector on COCO with 60% less computation and outperforms ResNet-FPN counterparts by 6% AP. SpineNet architecture can transfer to classification tasks, achieving 6% top-1 accuracy improvement on a challenging iNaturalist fine-grained dataset.

So Google's beloved recipe for creating a new SOTA delivers again. They use NAS to permute ResNet blocks and add resampling cross-scale connections so that feature scales match between connected blocks. There seems to be no need for an FPN, because the whole backbone acts like one. They train RetinaNet from scratch with the ResNet backbone replaced by SpineNet and get SOTA. On two-stage detectors, replacing the backbone with SpineNet gives the same kind of result. If you just want to classify something with this backbone, it performs very well too. So, a new architecture for any application!
Good job.

paper: https://arxiv.org/abs/1912.05027
code: much wanted, but not released yet

#CV #ObjectDetection #GoogleResearch #NAS #SOTA
​​MaxUp: A Simple Way to Improve Generalization of Neural Network Training

A new approach to augmentation for both images and text. The idea is to generate a set of augmented data with some random perturbations or transforms and minimize the maximum, or worst-case, loss over the augmented data. By doing so, the authors implicitly introduce a smoothness or robustness regularization against the random perturbations, and hence improve generalization performance. Tested on a range of tasks, including image classification, language modeling, and adversarial certification, MaxUp consistently outperforms the best existing baseline methods without introducing substantial computational overhead.

Each sample in the batch is augmented m times, the augmented copy with the maximum loss is selected, and backprop goes only through that copy, i.e. the maximum loss is minimized.

The paper includes a theoretical result showing that MaxUp acts as gradient-norm regularization of the loss. It can also be viewed as an adversarial variant of data augmentation, in that it minimizes the worst-case loss on the perturbed data instead of the average loss used by typical data augmentation methods.

MaxUp is easy to combine with other augmentations without much overhead: each sample needs m forward passes but only one backward pass.
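
A minimal pure-Python sketch of the "minimize the max loss" idea described above. The names maxup_loss, noisy and squared_error are hypothetical helpers made up for this illustration, not the authors' code, and the toy "model" is a fixed function rather than a trained network.

```python
import random


def maxup_loss(batch, augment, loss_fn, m=4, rng=None):
    """Mean over the batch of the worst-case loss among m augmented copies."""
    rng = rng or random.Random(0)
    worst_losses = []
    for x, y in batch:
        # m forward passes: one per randomly augmented copy of the sample
        losses = [loss_fn(augment(x, rng), y) for _ in range(m)]
        # keep only the hardest copy; in a real framework this is the
        # only term that would receive gradients
        worst_losses.append(max(losses))
    return sum(worst_losses) / len(worst_losses)


# Toy setup (assumption): "model" f(x) = 2x, squared error, additive noise.
def noisy(x, rng):
    return x + rng.uniform(-0.1, 0.1)


def squared_error(x, y):
    return (2.0 * x - y) ** 2


batch = [(1.0, 2.0), (2.0, 4.0)]
worst_case = maxup_loss(batch, noisy, squared_error, m=4, rng=random.Random(0))
no_aug = maxup_loss(batch, lambda x, rng: x, squared_error, m=3)
```

With an identity augmentation the value reduces to the ordinary mean loss, which is a quick sanity check for this kind of wrapper.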


paper: https://arxiv.org/abs/2002.09024

#augmentations #SOTA #ml
​​ResNeSt: Split-Attention Networks

A novel variation of ResNet architecture that outperforms other networks with similar model complexities.
Usually, downstream applications use ResNet or one of its variants as the backbone CNN. Its simple and modular design can be easily adapted to various tasks. However, since ResNet models were originally designed for image classification, they may not be suitable for various downstream applications because of the limited receptive-field size and lack of cross-channel interaction.

Main contributions of the paper:
- Split-Attention block. Each block divides the feature-map into several groups (along the channel dimension) and finer-grained subgroups or splits, where the feature representation of each group is determined via a weighted combination of the representations of its splits. By stacking several Split-Attention blocks, they get a ResNet-like network called ResNeSt (S stands for “split”). This architecture requires no more computation than existing ResNet variants and is easy to adopt as a backbone for other vision tasks.
- Extensive large-scale benchmarks on image classification and transfer learning.
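
The "weighted combination of splits" from the first bullet can be sketched roughly like this. It is a simplification under stated assumptions: features are taken as already pooled to one value per channel, and the attention logits are passed in directly (in a real block they would be produced by small learned layers). The function names are hypothetical, not from the ResNeSt repo.

```python
import math


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def split_attention(splits, logits):
    """Combine the splits of one group, channel by channel.

    splits[i][c]: feature of split i at channel c
    logits[i][c]: unnormalized attention score of split i at channel c
    """
    num_channels = len(splits[0])
    combined = []
    for c in range(num_channels):
        # attention weights are normalized across splits for each channel
        weights = softmax([split_logits[c] for split_logits in logits])
        combined.append(sum(w * split[c] for w, split in zip(weights, splits)))
    return combined


splits = [[1.0, 4.0], [3.0, 0.0]]
equal = split_attention(splits, [[0.0, 0.0], [0.0, 0.0]])
focused = split_attention(splits, [[50.0, 0.0], [0.0, 0.0]])
```

Equal logits give a plain average of the splits, while a large logit for one split makes the output follow that split on that channel.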

Models utilizing a ResNeSt backbone are able to achieve SOTA performance on several tasks, namely: image classification, object detection, instance segmentation, and semantic segmentation.
ResNeSt-50 achieves 81.13% top-1 accuracy on ImageNet using a single crop-size of 224 × 224, outperforming the previous best ResNet variant by more than 1% accuracy.


Paper: https://arxiv.org/abs/2004.08955
Github: https://github.com/zhanghang1989/ResNeSt

#computervision #deeplearning #resnet #image #backbone #downstream #sota
​​A new SOTA on voice separation model that distinguishes multiple speakers simultaneously

The pandemic has given a significant boost to new technologies for voice communication. Noise cancelling is needed more than ever, and #Facebook has now introduced a new method for separating as many as five voices speaking simultaneously into a single microphone. It pushes the state of the art on multiple benchmarks, including ones with challenging noise and reverberations.

Blogpost: https://ai.facebook.com/blog/a-new-state-of-the-art-voice-separation-model-that-distinguishes-multiple-speakers-simultaneously
Paper: https://arxiv.org/pdf/2003.01531.pdf

#SOTA #FacebookAI #voicerecognition #soundlearning #DL
​​Do Adversarially Robust ImageNet Models Transfer Better?

TLDR - Yes.

The authors check whether adversarially trained networks perform better on transfer learning tasks despite their worse accuracy on the source dataset (ImageNet, of course). And they do.

They tested the idea with a frozen pre-trained feature extractor, training only a linear classifier on top, and it outperformed its standard counterpart. They also tested full fine-tuning of the unfrozen network, which outperformed the standard counterpart on transfer learning tasks too.

For the pre-training task they use an adversarial robustness prior, which refers to a model's invariance to small (often imperceptible) perturbations of its inputs.
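
To make "invariance to small perturbations" concrete, here is a toy sketch for a linear classifier, where the worst case can be computed in closed form. Everything here is an assumption for illustration: the paper works with deep networks and adversarial training, while this snippet only checks a fixed linear model under an L-infinity budget eps, and worst_case_margin is a hypothetical helper name.

```python
def worst_case_margin(weights, bias, x, label, eps):
    """Smallest signed margin over all perturbations with max |delta_i| <= eps.

    label is +1 or -1. For a linear model the adversary shifts every input
    coordinate by eps against the sign of its weight, which lowers the
    margin by eps * sum(|w_i|).
    """
    clean = label * (sum(w * xi for w, xi in zip(weights, x)) + bias)
    return clean - eps * sum(abs(w) for w in weights)


weights, bias = [2.0, -1.0], 0.0
x, label = [1.0, 1.0], 1

clean_margin = worst_case_margin(weights, bias, x, label, eps=0.0)
small_budget = worst_case_margin(weights, bias, x, label, eps=0.25)
large_budget = worst_case_margin(weights, bias, x, label, eps=0.5)
```

A positive worst-case margin means no perturbation within the budget can flip the prediction; once it goes negative, some perturbation can.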

They also show that such an approach gives the networks better feature representations.

They ran many experiments (14 pages of plots) and an ablation study.


paper: https://arxiv.org/abs/2007.08489
code: https://github.com/Microsoft/robust-models-transfer

#transfer_learning #SOTA #adversarial
​​QVMix and QVMix-Max: Extending the Deep Quality-Value Family of Algorithms to Cooperative Multi-Agent Reinforcement Learning

The paper extends the Deep Quality-Value (DQV) family of algorithms to multi-agent reinforcement learning and outperforms #SOTA methods.

ArXiV: https://arxiv.org/abs/2012.12062

#DQV #RL #Starcraft
​​Self-training improves pretraining for natural language understanding

The authors suggest another way to leverage unlabeled data through semi-supervised learning. They use #SOTA sentence embeddings to structure the information of a very large bank of sentences.

Code: https://github.com/facebookresearch/SentAugment
Link: https://arxiv.org/abs/2010.02194
​​Revisiting ResNets: Improved Training and Scaling Strategies

The authors of the paper (from Google Brain and UC Berkeley) analyze the effects of model architecture, training strategy, and scaling strategy separately and conclude that training and scaling strategies may have a higher impact on the score than the architecture.

They offer two new scaling strategies:
- scale model depth when overfitting is possible, scale model width otherwise
- increase image resolution more slowly than recommended in previous papers

Based on these ideas, the new architecture ResNet-RS was developed. It is 2.1x–3.3x faster than EfficientNets on GPU while reaching similar accuracy on ImageNet.

In semi-supervised learning, ResNet-RS achieves 86.2% top-1 ImageNet accuracy while being 4.7x faster than EfficientNet-NoisyStudent.

Transfer learning on downstream tasks also has improved performance.

The authors suggest using ResNet-RS as a baseline for further research.


Paper: https://arxiv.org/abs/2103.07579

Code and checkpoints are available in TensorFlow:
https://github.com/tensorflow/models/tree/master/official/vision/beta

https://github.com/tensorflow/tpu/tree/master/models/official/resnet/resnet_rs

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-resnetsr


#deeplearning #computervision #sota
​​EfficientNetV2: Smaller Models and Faster Training

A new paper from Google Brain with a new SOTA architecture called EfficientNetV2. The authors develop a new family of CNN models that are optimized both for accuracy and training speed. The main improvements are:

- an improved training-aware neural architecture search with new building blocks and ideas to jointly optimize training speed and parameter efficiency;
- a new approach to progressive learning that adjusts regularization along with the image size.

As a result, the new models reach SOTA results while training up to 11x faster and being up to 6.8x smaller.
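
The progressive learning idea (grow the image size and strengthen regularization together) can be sketched as a simple linear schedule over training stages. This is a sketch under assumptions: progressive_schedule is a hypothetical helper, dropout stands in for the full set of regularizers, and the ranges below are illustrative values rather than the official settings.

```python
def progressive_schedule(num_stages, size_range, dropout_range):
    """Linearly interpolate image size and dropout rate across training stages."""
    size_start, size_end = size_range
    drop_start, drop_end = dropout_range
    schedule = []
    for stage in range(num_stages):
        # t goes from 0.0 at the first stage to 1.0 at the last one
        t = stage / (num_stages - 1) if num_stages > 1 else 1.0
        size = int(round(size_start + t * (size_end - size_start)))
        dropout = drop_start + t * (drop_end - drop_start)
        schedule.append((size, dropout))
    return schedule


# Illustrative ranges (assumption): small images with weak regularization
# early on, large images with stronger regularization at the end.
stages = progressive_schedule(4, size_range=(128, 300), dropout_range=(0.1, 0.3))
for image_size, dropout in stages:
    print(f"image size {image_size}, dropout {dropout:.2f}")
```

Early stages are cheap because the images are small, and the regularization only becomes strong once the larger images give the network more capacity to overfit.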

Paper: https://arxiv.org/abs/2104.00298

Code will be available here:
https://github.com/google/automl/tree/master/efficientnetv2

A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-effnetv2

#cv #sota #nas #deeplearning