Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former. To reach the editors, contact: @haarrp
Lectures on computer architecture

Videos and slides about computer architecture by Professor Onur Mutlu

Channel: https://www.youtube.com/channel/UCIwQ8uOeRFgOEvBLYc3kc3g/featured
Professor: https://people.inf.ethz.ch/omutlu/

#hardware #lectures
ODS breakfast in Paris! See you this Saturday (9th of November) at 10:30 at Malongo Café, 50 Rue Saint-André des Arts.
Generalization through Memorization: Nearest Neighbor Language Models

The authors introduce kNN-LMs, which extend LMs with nearest neighbor search in embedding space, achieving a new SOTA perplexity on Wikitext-103 without additional training!
They also show that kNN-LM can efficiently scale LMs to larger training sets and enables effective domain adaptation by simply varying the nearest neighbor datastore, without further training. It seems especially helpful for predicting long-tail patterns, such as factual knowledge!
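
To make the interpolation concrete, here is a toy numpy sketch of the kNN-LM idea: the LM distribution is mixed with a distribution built from the nearest stored (context embedding, next token) pairs. The datastore, the L2 distance, k and lam below are illustrative choices, not the paper's FAISS-based setup.

```python
# Toy numpy sketch of kNN-LM interpolation; datastore and hyperparameters
# are illustrative, not the paper's FAISS-based setup.
import numpy as np

def knn_lm_probs(p_lm, query, keys, values, vocab_size, k=8, lam=0.25):
    """Interpolate the LM distribution with a kNN distribution.

    p_lm:   (vocab_size,) next-token distribution from the base LM
    query:  (dim,) embedding of the current context
    keys:   (n, dim) stored context embeddings (the datastore)
    values: (n,) int array of the next token observed after each stored context
    """
    d = np.linalg.norm(keys - query, axis=1)      # distance to every datastore entry
    nn = np.argsort(d)[:k]                        # k nearest neighbors
    w = np.exp(-d[nn]); w /= w.sum()              # softmax over negative distances
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nn], w)               # aggregate neighbor weight per token
    return lam * p_knn + (1 - lam) * p_lm         # final interpolated distribution
```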

code available soon
Paper: https://arxiv.org/abs/1911.00172

#nlp #generalization #kNN
Data Science Munich dinner on Friday, Nov 8 at 20:00. Table booked under the name Eugen for 12 people.
Wirtshaus Valley´s
Aberlestraße 52, 81371 München
089 76775151
https://maps.app.goo.gl/XyrWcx15LBmMzGZV9
Separate voice from music

Spleeter is Deezer's source separation library with pretrained models, written in Python on top of TensorFlow. It makes it easy to train a source separation model (assuming you have a dataset of isolated sources) and provides already-trained state-of-the-art models for several flavors of separation:
* vocals (singing voice) / accompaniment separation (2 stems)
* vocals / drums / bass / other separation (4 stems)
* vocals / drums / bass / piano / other separation (5 stems)

Spleeter is also very fast: it can separate audio files into 4 stems 100x faster than real time when run on a GPU.
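
A quick usage sketch based on the Spleeter README (assumes `pip install spleeter`; the input and output paths are placeholders):

```python
# Usage sketch based on the Spleeter README; paths are placeholders.
from spleeter.separator import Separator

separator = Separator('spleeter:2stems')                   # vocals / accompaniment model
separator.separate_to_file('audio_example.mp3', 'output/')
# -> output/audio_example/vocals.wav and output/audio_example/accompaniment.wav
```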

blog: https://deezer.io/releasing-spleeter-deezer-r-d-source-separation-engine-2b88985e797e
paper: http://archives.ismir.net/ismir2019/latebreaking/000036.pdf
github: https://github.com/deezer/spleeter

#voice #music #tf
Revealing the Dark Secrets of BERT

This work offers an interpretation of BERT's self-attention.
Using a subset of GLUE tasks and a set of handcrafted features of interest, the authors propose a methodology and carry out a qualitative and quantitative analysis of the information encoded by individual BERT heads.
The findings suggest that a limited set of attention patterns is repeated across different heads, indicating that the model is overparametrized.
They also show that manually disabling attention in certain heads leads to a performance improvement over the regular fine-tuned BERT models.
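
The paper ablates heads in its own setup; as a rough illustration of the idea, the huggingface transformers API lets you zero out chosen heads at inference time via the head_mask argument. The base checkpoint and which heads are disabled below are arbitrary choices for the sketch, not the paper's configuration.

```python
# Rough illustration: zero out chosen attention heads via head_mask.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
head_mask = torch.ones(model.config.num_hidden_layers,
                       model.config.num_attention_heads)
head_mask[11, :4] = 0.0        # disable the first 4 heads of the last layer

outputs = model(**inputs, head_mask=head_mask)   # representations with heads ablated
```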

paper: https://arxiv.org/abs/1908.08593

#nlp #bert
Unsupervised Cross-lingual Representation Learning at Scale

They release XLM-R, a Transformer MLM trained on 100 languages and 2.5 TB of text data, which obtains state-of-the-art performance on cross-lingual classification, sequence labeling and question answering.
They also present a comprehensive analysis of the capacity and limits of unsupervised multilingual masked language modeling at scale.
XLM-R especially outperforms mBERT and XLM-100 on low-resource languages, for which CommonCrawl data enables representation learning: +13.7% and +9.3% accuracy for Urdu, +21.6% and +13.8% accuracy for Swahili on XNLI.
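
A minimal loading sketch following the fairseq examples/xlmr README; the torch.hub entry name and methods are taken from that README and may differ in newer versions, so treat them as assumptions.

```python
# Loading sketch per the fairseq examples/xlmr README (entry name assumed).
import torch

xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
xlmr.eval()

tokens = xlmr.encode('Bonjour le monde !')   # sentencepiece-encodes any of the 100 languages
features = xlmr.extract_features(tokens)     # (1, seq_len, hidden) contextual representations
print(features.shape)
```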

Coming soon to the huggingface transformers repo & tf.hub

paper: https://arxiv.org/abs/1911.02116
code: https://github.com/pytorch/fairseq/tree/master/examples/xlmr

#nlp #bert #xlu #transformer
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

tl;dr: GPT2 + Dialogue data = DialoGPT
Trained on Reddit comments from 2005 through 2017 (not a very big dataset, about 2 GB).
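
As a quick illustration, a generation sketch with the huggingface transformers API, assuming the published microsoft/DialoGPT-medium checkpoint; dialogue turns are simply concatenated with the EOS token, GPT-2 style.

```python
# Generation sketch assuming the microsoft/DialoGPT-medium checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Dialogue turns are concatenated with the EOS token, GPT-2 style.
input_ids = tokenizer.encode("Does money buy happiness?" + tokenizer.eos_token,
                             return_tensors="pt")
reply_ids = model.generate(input_ids, max_length=100,
                           pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(reply_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```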


Paper: https://arxiv.org/abs/1911.00536
Code: https://github.com/microsoft/DialoGPT
Blog: https://www.microsoft.com/en-us/research/project/large-scale-pretraining-for-response-generation/

#nlp #gpt2 #dialog
GPU cooling tool

This script lets you set a custom GPU fan curve on a headless Linux server.

If you want to install multiple GPUs in a single machine, you have to use blower-style GPUs, or else the hot exhaust builds up in your case. Blower-style GPUs can get very loud, so to avoid annoying customers NVIDIA artificially limits their fans to ~50% duty. At 50% duty and a heavy workload, blower-style GPUs heat up to 85°C or so and throttle themselves.

On Windows, NVIDIA happily lets you override that limit by setting a custom fan curve. On Linux, though, you need to use nvidia-settings, which - as of Sept 2019 - requires a display attached to each GPU you want to set the fan for. This is a pain to set up, as is checking the GPU temp every few seconds and adjusting the fan speed.

This script does all that for you.
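
For a sense of what such a fan-curve daemon does, here is a minimal sketch of the idea (not the coolgpus code itself): poll nvidia-smi for the temperature and push a target speed through nvidia-settings. It assumes a display is reachable; the real script's extra trick is creating temporary X servers so the same thing works on a headless box.

```python
# Minimal fan-curve loop (illustrative sketch, not the coolgpus implementation).
import subprocess, time

FAN_CURVE = [(50, 30), (65, 50), (75, 80), (85, 99)]   # (temp °C, fan %) breakpoints

def fan_for(temp):
    # First breakpoint whose temperature limit covers the reading, else max speed.
    return next((speed for limit, speed in FAN_CURVE if temp <= limit), 99)

while True:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"])
    temp = int(out.decode().splitlines()[0])
    subprocess.run(["nvidia-settings",
                    "-a", "[gpu:0]/GPUFanControlState=1",
                    "-a", f"[fan:0]/GPUTargetFanSpeed={fan_for(temp)}"])
    time.sleep(5)
```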


Code: https://github.com/andyljones/coolgpus

#hardware #gpu
BPE-Dropout: Simple and Effective Subword Regularization

The dominant approach to subword segmentation is Byte Pair Encoding (BPE), which keeps the most frequent words intact while splitting the rare ones into multiple tokens.
And while multiple segmentations are possible even with the same vocabulary, BPE splits each word into a single, deterministic sequence of tokens; this may prevent a model from learning the compositionality of words and from being robust to segmentation errors.

This paper introduces BPE-dropout, a simple and effective subword regularization method based on and compatible with conventional BPE.
It stochastically corrupts the segmentation procedure of BPE, producing multiple segmentations within the same fixed BPE framework.
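
A toy sketch of the dropout idea (not the subword-nmt implementation): at every step, each applicable merge is skipped with probability p, so the same word can segment differently across training epochs; with p=0 this reduces to ordinary greedy BPE.

```python
# Toy BPE-dropout segmentation (illustrative, not the subword-nmt code).
import random

def bpe_dropout_segment(word, merge_ranks, p=0.1):
    """merge_ranks: dict mapping a symbol pair (a, b) -> merge priority (lower = earlier)."""
    symbols = list(word) + ["</w>"]
    while True:
        candidates = [
            (merge_ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in merge_ranks and random.random() >= p   # drop each merge with prob p
        ]
        if not candidates:
            break
        _, i = min(candidates)                           # apply the surviving highest-priority merge
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols
```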

Using BPE-dropout during training and the standard BPE during inference improves translation quality by up to 3 BLEU compared to BPE and by up to 0.9 BLEU compared to the previous subword regularization.

Paper: https://arxiv.org/abs/1910.13267
Code: https://github.com/rsennrich/subword-nmt

#nlp #bpe
How to remember the difference between Type 1 and Type 2 errors.
🏆 Moscow ML Trainings meetup on the 16th of November

ML Trainings are based on Kaggle and other platform competitions and are held regularly with free attendance. Winners and top-performing participants discuss competition tasks and share their solutions and results.

You may find the program and the registration link here - @mltrainings
* Note: this time the first talk will be in English and the rest will be in Russian.
ODS breakfast in Paris! See you this Saturday at 10:30 at Malongo Café, 50 Rue Saint-André des Arts. We are expecting 5 to 10 people.
Self-training with Noisy Student improves ImageNet classification

Using unlabeled data with pseudo-labeling improves accuracy on ImageNet.

The work uses self-training on unlabeled data to achieve 87.4% top-1 accuracy on ImageNet, 1% better than the previous SOTA. Huge gains are seen on harder benchmarks (ImageNet-A, C and P).

The method is super simple (a toy sketch follows the list):

1) Train a classifier on ImageNet
2) Infer labels on a much larger unlabeled dataset
3) Train a larger classifier on the combined set
4) Iterate the process, adding noise
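
The toy sketch mentioned above, using scikit-learn in place of EfficientNets and JFT; the data, classifier and 0.8 confidence threshold are placeholders, and the paper's noise (RandAugment, dropout, stochastic depth) and growing student are only indicated in comments.

```python
# Toy pseudo-labeling loop (illustrative stand-in for the paper's setup).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_lab, y_lab, X_unlab = X[:200], y[:200], X[200:]      # small labeled set, large unlabeled set

teacher = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)    # 1) train teacher on labeled data
for _ in range(3):                                               # 4) iterate
    probs = teacher.predict_proba(X_unlab)                       # 2) pseudo-label unlabeled data
    conf, pseudo = probs.max(axis=1), probs.argmax(axis=1)
    keep = conf > 0.8                                            # keep confident pseudo-labels
    X_comb = np.vstack([X_lab, X_unlab[keep]])
    y_comb = np.concatenate([y_lab, pseudo[keep]])
    # 3) train the student on labeled + pseudo-labeled data (the paper also
    # grows the student and adds noise, which a linear model cannot reproduce)
    teacher = LogisticRegression(max_iter=1000).fit(X_comb, y_comb)
```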

To build the unlabeled set, they take EfficientNet-B0 pretrained on ImageNet and predict labels on the JFT dataset. They keep predictions with confidence > 0.3 and up to 130k images per class, giving 130M images; after removing duplicates, 81M remain.

Architecture:

EfficientNet; the student model is much bigger than the teacher.

Learning process:

Batch size of 2048.
The SOTA model, EfficientNet-L2, is trained for 350 epochs.
The learning rate starts at 0.128 for a labeled batch size of 2048 and decays by 0.97 every 2.4 epochs if trained for 350 epochs, or every 4.8 epochs if trained for 700 epochs :alchemy:

Training the biggest model, L2, took 3.5 days on a Cloud TPU v3 Pod with 2048 cores.

To start, they train B7 as both student and teacher. Then, using B7 as the teacher, they train an L0 student; then L1, and so on up to L2; finally, with L2 as the teacher, they train a new L2 student.

Result:
SOTA with 2x fewer parameters than the previous SOTA (FixRes ResNeXt-101 WSL, 829M parameters)


paper: https://arxiv.org/abs/1911.04252
tweet: https://twitter.com/quocleix/status/1194334947156193280?s=20

#cv #selfTraining
Updating Pre-trained Word Vectors and Text Classifiers using Monolingual Alignment

The authors drew inspiration from the way #multilingual word vectors are learned. They treated general-purpose and domain-specific corpora as separate languages and used a word-embedding model to learn independent vectors from each. Then they aligned the vectors from one corpus with those from another.

To align the word vectors from two corpora, the words common to both are used to find a consistent way to represent all words. For example, if one corpus contains [human, cat] and the other [cat, dog], the model applies a transformation that brings the two 'cat' vectors together while retaining the relative positions of the vectors for cat, dog, and human.
A word-embedding model learns independent word vectors from both corpora.

The authors use a loss function called #RCSLS for training. RCSLS balances two objectives: general-purpose vectors that are close together remain close together, while general-purpose vectors that are far apart remain far apart. Common words in the two corpora now have duplicate vectors; averaging them produces a single vector representation.
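
The alignment step can be illustrated with a plain orthogonal Procrustes mapping over the shared vocabulary (a sketch of the alignment idea only; the paper trains with the RCSLS criterion).

```python
# Orthogonal Procrustes alignment of two embedding spaces (illustrative sketch).
import numpy as np

def orthogonal_align(X_general, X_domain):
    """Return orthogonal W minimizing ||X_general @ W - X_domain||_F.

    Both inputs are (n_common_words, dim) and row-aligned, i.e. row i is the
    same shared word in the general-purpose and domain-specific spaces.
    """
    U, _, Vt = np.linalg.svd(X_general.T @ X_domain)
    return U @ Vt

# Usage (indices are placeholders): map every general-purpose vector into the
# domain-specific space, then average the duplicate vectors of shared words.
# W = orthogonal_align(general_vecs[common_idx], domain_vecs[common_idx])
# aligned_general = general_vecs @ W
```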

They consider applications to word embedding and text classification models, and show that the proposed approach yields good performance in all setups and outperforms a baseline of fine-tuning the model on new data.

paper: https://arxiv.org/abs/1910.06241

#nlp
Emerging Cross-lingual Structure in Pretrained Language Models

tl;dr – dissect mBERT & XLM and show monolingual BERTs are similar

They offer an ablation study on bilingual #MLM considering all relevant factors. Sharing only the top 2 layers of the #transformer finally breaks cross-lingual transfer.
Factor importance: parameter sharing >> domain similarity, anchor points, language-universal softmax, joint BPE

Monolingual BERT representations can be aligned at the word and sentence level with an orthogonal mapping. CKA visualizes the similarity of monolingual & bilingual BERT.
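
For reference, a minimal numpy sketch of linear CKA, the similarity index used for such comparisons; X and Y stand in for (n_samples, dim) representation matrices extracted from two models on the same inputs.

```python
# Minimal linear CKA (centered kernel alignment) between two representations.
import numpy as np

def linear_cka(X, Y):
    X = X - X.mean(axis=0)                       # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2   # cross-covariance term
    return hsic / (np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro'))
```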

Paper: https://arxiv.org/abs/1911.01464

#nlp #multilingual