Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math, and applications of the former. To reach the editors, contact: @haarrp
​​Episodic Memory in Lifelong Language Learning

tl;dr – the model needs to learn from a stream of text examples without any dataset identifier.

The authors propose an episodic memory model that performs sparse experience replay and local adaptation to mitigate catastrophic forgetting in this setup. Experiments on text classification and question answering demonstrate the complementary benefits of sparse experience replay & local adaptation to allow the model to continuously learn from new datasets.

Also, they show that the space complexity of the episodic memory module can be reduced significantly (∼50-90%) by randomly choosing which examples to store in memory with a minimal decrease in performance. They consider an episodic memory component as a crucial building block of general linguistic intelligence and see the model as the first step in that direction.
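To make the setup concrete, here is a rough sketch (in PyTorch, not the authors' code) of the two ingredients: writing a random subset of stream examples into memory with occasional replay, and local adaptation on retrieved examples at prediction time. The memory capacity, replay schedule, and the use of random sampling instead of nearest-neighbour retrieval are simplifications.

```python
# Minimal sketch of sparse experience replay + local adaptation
# (illustration only; model/optimizer are placeholders, not the paper's code).
import random
import copy
import torch
import torch.nn.functional as F

class EpisodicMemory:
    def __init__(self, write_prob=0.5, capacity=10_000):
        self.write_prob = write_prob      # store only a random fraction of examples
        self.capacity = capacity
        self.buffer = []                  # list of (x, y) tensors

    def maybe_write(self, x, y):
        if random.random() < self.write_prob and len(self.buffer) < self.capacity:
            self.buffer.append((x.detach(), y.detach()))

    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))

def train_step(model, optimizer, memory, x, y, step, replay_every=100, replay_size=32):
    # Regular gradient step on the incoming stream example.
    optimizer.zero_grad()
    F.cross_entropy(model(x), y).backward()
    optimizer.step()
    memory.maybe_write(x, y)

    # Sparse experience replay: occasionally revisit stored examples.
    if step % replay_every == 0 and memory.buffer:
        xs, ys = zip(*memory.sample(replay_size))
        optimizer.zero_grad()
        F.cross_entropy(model(torch.cat(xs)), torch.cat(ys)).backward()
        optimizer.step()

def predict_with_local_adaptation(model, memory, x, k=8, steps=5, lr=1e-3):
    # Local adaptation: fine-tune a copy of the model on a few stored examples
    # before answering (the paper retrieves nearest neighbours in key space;
    # random sampling is used here to keep the sketch short).
    adapted = copy.deepcopy(model)
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        xs, ys = zip(*memory.sample(k))
        opt.zero_grad()
        F.cross_entropy(adapted(torch.cat(xs)), torch.cat(ys)).backward()
        opt.step()
    return adapted(x).argmax(dim=-1)
```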

paper: https://arxiv.org/abs/1906.01076

#nlp #bert #NeurIPSConf19
Open invitation to student Olympiad

IT students from all over the world are invited to participate in the Digital Economy International Olympiad, held in Russia from January to March 2020. The first two stages will be held online on the Stepik platform, and finalists will travel to Moscow at the organizers' expense to take part in the final stage and the award ceremony!
Winners will be preferentially enrolled in top Russian universities and will receive job offers from the best IT companies.

Nominations include:
- Neurotechnology and artificial intelligence
- Data science
- Big data
- Parts of robotics and sensors

Link: https://olymp.digitaleconomy.world/index_en.html
​​What we learned from NeurIPS 2019 data

x4 growth since 2014
21.6% acceptance rate

Takeaways:

1. No free-loader problem: Relatively few papers are submitted where none of the authors invited to participate in the review process accepted the invitation
2. Unclear how to rapidly filter papers prior to full review: Allowing for early desk rejects by ACs is unlikely to have a significant impact on reviewer load without producing inappropriate decisions. Likewise, the eagerness of reviewers to review a particular paper is not a strong signal, either.
3. No clear evidence that review quality as measured by length is lower for NeurIPS: NeurIPS is surprisingly not much different from other conferences of smaller sizes when it comes to review length.
4. Impact of engagement in rebuttal/discussion period: Overall engagement seemed to be higher than in 2018.

#Nips #NeurIPS #NIPS2019 #conference #meta
​​Low-variance Black-box Gradient Estimates for the Plackett-Luce Distribution

The authors consider models with #latent #permutations and propose control variates for the #PlackettLuce distribution. In particular, the control variates allow them to optimize #blackBox functions over permutations using stochastic gradient descent. To illustrate the approach, they consider a variety of causal structure learning tasks for continuous and discrete data.
They show that the method outperforms competitive relaxation-based optimization methods and is also applicable to non-differentiable score functions.
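For intuition, a plain REINFORCE-style baseline for this setup (sampling permutations via Gumbel-argsort and differentiating the Plackett-Luce log-probability) could look like the sketch below. The paper's contribution is the control variates layered on top of this estimator, which are omitted here, and the toy objective is made up for illustration.

```python
# Score-function (REINFORCE) gradient for a black-box objective over permutations
# under a Plackett-Luce distribution. No control variates - baseline sketch only.
import torch

def sample_pl(log_scores):
    # Gumbel-argsort: adding Gumbel noise and sorting descending samples
    # a permutation from the Plackett-Luce distribution with these scores.
    gumbel = -torch.log(-torch.log(torch.rand_like(log_scores)))
    return torch.argsort(log_scores + gumbel, descending=True)

def pl_log_prob(log_scores, perm):
    # log p(perm) = sum_i [ s_{perm_i} - logsumexp(s_{perm_i}, ..., s_{perm_n}) ]
    s = log_scores[perm]
    rev_lse = torch.logcumsumexp(s.flip(0), dim=0).flip(0)  # logsumexp over remaining items
    return (s - rev_lse).sum()

def black_box_cost(perm, target):
    # Toy non-differentiable objective: number of misplaced items.
    return (perm != target).float().sum()

n = 6
target = torch.randperm(n)
log_scores = torch.zeros(n, requires_grad=True)
opt = torch.optim.Adam([log_scores], lr=0.1)

for step in range(500):
    perm = sample_pl(log_scores.detach())
    cost = black_box_cost(perm, target)
    # REINFORCE: grad E[cost] is estimated by cost * grad log p(perm)
    loss = cost * pl_log_prob(log_scores, perm)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.argsort(log_scores.detach(), descending=True), target)
```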

paper: https://arxiv.org/abs/1911.10036
tweet: https://twitter.com/bayesgroup/status/1199023536653950976?s=20
Football performance modelling in python

An article with football game outcome analysis in #jupyter.

One of the key features for predicting a football game's outcome turned out to be the best attacking player, even though "A ferocious scream from the stands, a mistaken whistle from the ref’, or the shrimps on the lunch menu may jeopardise the whole outcome of the match."

The authors also highlighted the amplitude of persistence diagrams as a key feature.
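If you want to play with that feature yourself, a rough sketch with giotto-tda (the library behind the linked repo) is below. The actual pipeline in the repository may differ, the class names assume gtda.homology.VietorisRipsPersistence and gtda.diagrams.Amplitude, and the data is a random stand-in for player positions.

```python
# Sketch: turn player-position point clouds into persistence-diagram amplitude features.
import numpy as np
from gtda.homology import VietorisRipsPersistence
from gtda.diagrams import Amplitude

# Fake data: 10 "match snapshots", each a cloud of 11 player positions on the pitch.
snapshots = np.random.rand(10, 11, 2)

# Persistence diagrams in homology dimensions 0 and 1, then one amplitude per diagram.
diagrams = VietorisRipsPersistence(homology_dimensions=(0, 1)).fit_transform(snapshots)
features = Amplitude(metric="wasserstein").fit_transform(diagrams)

print(features.shape)  # (n_snapshots, n_homology_dimensions)
```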

This material was suggested by channel readers. Don’t forget to thank them by giving claps 👏 on Medium and starring the repository if you found the code useful.

Link: https://towardsdatascience.com/the-shape-of-football-games-1589dc4e652a
Code: https://github.com/giotto-ai/football-tda

#football #soccer #betting #readersmaterial
​​Reinforcement Learning Upside Down: Don't Predict Rewards – Just Map Them to Actions by Juergen Schmidhuber

Traditional #RL predicts rewards and uses a myriad of methods for translating those predictions into good actions. Upside-Down RL (UDRL) shortcuts this process, creating a direct mapping from rewards, time horizons, and other inputs to actions.

Without depending on reward predictions, and without explicitly maximizing expected rewards, UDRL simply learns by gradient descent to map task specifications or commands (such as: get lots of reward within little time) to action probabilities. Its success depends on the generalization abilities of deep/recurrent neural nets. Its potential drawbacks are essentially those of traditional gradient-based learning: local minima, underfitting, overfitting, etc.
Nevertheless, experiments in a separate paper show that even the initial pilot version of UDRL can outperform traditional RL methods on certain challenging problems.
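A toy sketch of the core idea (not Schmidhuber's implementation): a behaviour function is trained by plain supervised learning to map (observation, desired return, desired horizon) to the action actually taken in logged episodes. The architecture and training details below are illustrative.

```python
# Toy UDRL behaviour function: supervised mapping from (observation, command) to action.
import torch
import torch.nn as nn

class BehaviourFunction(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 2, hidden), nn.ReLU(),   # +2: desired return & horizon
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, desired_return, desired_horizon):
        cmd = torch.stack([desired_return, desired_horizon], dim=-1)
        return self.net(torch.cat([obs, cmd], dim=-1))   # action logits

def train_on_episode(model, optimizer, observations, actions, rewards):
    # For each step t the command is "achieve the return that actually followed,
    # within the time that actually remained" - no reward prediction, no
    # expected-return maximisation, just cross-entropy on the taken actions.
    T = len(rewards)
    returns_to_go = torch.flip(torch.cumsum(torch.flip(rewards, [0]), 0), [0])
    horizons = torch.arange(T, 0, -1, dtype=torch.float32)
    logits = model(observations, returns_to_go, horizons)
    loss = nn.functional.cross_entropy(logits, actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```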

A closely related Imitate-Imitator approach: imitate a robot, then let it learn to map its observations of the imitated behavior to its own behavior, then let it generalize by demonstrating something new for the robot to imitate.

more at paper: https://arxiv.org/abs/1912.02875
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

TLDR by HuggingFace

Source: tweet

#BERT #NLU #NLP
πŸ† Moscow ML Trainings meetup on the 14th of December

ML Trainings are based on Kaggle and other platform competitions and are held regularly with free attendance. Winners and top-performing participants discuss the competition tasks and share their solutions and results.

Program and the registration link - https://pao-megafon--org.timepad.ru/event/1137770/
* Note: the first talk will be in English and the rest will be in Russian
​​Not Enough Data? Deep Learning to the Rescue!

The authors use a powerful pre-trained #NLP NN model to artificially synthesize new #labeled #data for #supervised learning. They mainly focus on cases with scarce labeled data. Their method, referred to as language-model-based data augmentation (#LAMBADA), involves fine-tuning a SOTA language generator to a specific task through an initial training phase on the existing (usually small) labeled data.
Using the fine-tuned model and given a class label, new sentences for the class are generated. Then they filter these new sentences by using a classifier trained on the original data.
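Schematically, the generate-then-filter loop might look like this. `generate_texts` is a hypothetical wrapper around whatever sampling call your fine-tuned language model exposes, and the prompt format and confidence threshold are illustrative, not the paper's exact settings.

```python
# Sketch of the LAMBADA generate-then-filter loop (illustration only).
# `generator` is assumed to be an LM fine-tuned on lines like "<label> SEP <text> EOS";
# `classifier(text) -> (label, confidence)` is the baseline trained on the original data.
def lambada_augment(generator, classifier, labels, per_class=100, threshold=0.9):
    synthetic = []
    for label in labels:
        # Condition the generator on the class label to synthesise candidate sentences.
        candidates = generator.generate_texts(prompt=f"{label} SEP", n=per_class)  # hypothetical helper
        for text in candidates:
            predicted, confidence = classifier(text)
            # Keep only candidates the baseline classifier confidently assigns
            # to the intended class.
            if predicted == label and confidence >= threshold:
                synthetic.append((text, label))
    return synthetic
```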

In a series of experiments, they show that LAMBADA improves classifier performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the SOTA techniques for data augmentation, specifically those applicable to text classification tasks with little data.

paper: https://arxiv.org/abs/1911.03118
StyleGANv2 πŸŽ‰:
- significantly better samples (better FID scores & reduced artifacts)
- no more progressive growing
- improved Style-mixing
- smoother interpolations (extra regularization)
- faster training

paper: https://arxiv.org/abs/1912.04958
code: https://github.com/NVlabs/stylegan2
origin video: https://www.youtube.com/watch?v=c-NJtV9Jvp0&feature=emb_logo
AR-Net: A simple autoregressive NN for #timeSeries

AR-Net has two distinct advantages over its traditional counterpart:
* scales well to large orders, making it possible to estimate long-range dependencies (important in high-resolution monitoring applications, such as those in the data center domain);
* automatically selects and estimates the important coefficients of a sparse AR process, eliminating the need to know the true order of the AR process

To overcome the scalability challenge, they train a NN with #SGD to learn the AR (#autoregression) coefficients. AR-Net effectively learns near-identical weights as classic AR implementations & is equally good at predicting the next value of the time series.
Also, AR-Net automatically learns the relevant weights, even if the underlying data is generated by a noisy & extremely sparse AR process.
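A minimal sketch of the idea: the AR coefficients are just the weights of a single linear layer fitted by SGD, with an L1 penalty standing in for the paper's sparsity regulariser (data and hyperparameters below are illustrative).

```python
# Minimal AR-Net-style sketch: learn AR coefficients with SGD instead of least squares.
import torch
import torch.nn as nn

p = 50                                   # AR order (can be much larger than the true one)
series = torch.randn(5000)               # replace with a real time series

# Build (lagged window -> next value) training pairs.
X = torch.stack([series[i : i + p] for i in range(len(series) - p)])
y = series[p:]

model = nn.Linear(p, 1, bias=False)      # the weights are the AR coefficients
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

for epoch in range(20):
    pred = model(X).squeeze(-1)
    # MSE fit plus an L1 penalty to encourage sparse coefficients.
    loss = nn.functional.mse_loss(pred, y) + 1e-3 * model.weight.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(model.weight.detach().squeeze())   # learned (sparse) AR coefficients
```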

blog: https://ai.facebook.com/blog/ar-net-a-simple-autoregressive-neural-network-for-time-series/
paper: https://arxiv.org/abs/1911.12436
​​The 2019 AI Index report!

Stanford University's Human-Centered Artificial Intelligence institute released its annual report on the state of #AI this year. 160 pages of text with various metrics and measurements. Good to read with a cup of coffee ;)

TL;DR by The Verge is in the picture, but the full report is even more interesting!

site of the report: https://hai.stanford.edu/ai-index/2019
​​A Deep Neural Network's Loss Surface Contains Every Low-dimensional Pattern

New work from #DeepMind, built on top of Loss Landscape Sightseeing with Multi-Point Optimization.

ArXiV: https://arxiv.org/abs/1912.07559
Predecessor’s github: https://github.com/universome/loss-patterns