Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former. To reach the editors, contact: @haarrp
Image GPT
by openai

The authors have shown that by trading off 2-D knowledge for scale and by choosing predictive features from the middle of the network, a sequence transformer can be competitive with top convolutional nets for unsupervised image classification.
Notably, they achieved their results by directly applying the GPT-2 language model architecture to image generation. Their results suggest that, due to its simplicity and generality, a sequence transformer given sufficient compute might ultimately be an effective way to learn excellent features in many domains.
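"Trading off 2-D knowledge" means the transformer sees an image only as a flat 1-D sequence and is trained, like GPT-2 on text, to predict each next token from the ones before it. A minimal sketch of that data preparation, assuming each pixel has already been mapped to a discrete color index (the toy image below is hypothetical):

```python
from typing import List, Tuple

def to_sequence(image: List[List[int]]) -> List[int]:
    """Flatten a 2-D grid of color-index tokens in raster order (row by row)."""
    return [token for row in image for token in row]

def next_token_pairs(seq: List[int]) -> List[Tuple[List[int], int]]:
    """Build (context, target) pairs: each pixel is predicted from all pixels before it."""
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]

# A hypothetical 2x3 image whose values are palette indices.
image = [[3, 7, 7],
         [1, 0, 5]]
seq = to_sequence(image)
pairs = next_token_pairs(seq)
print(seq)       # [3, 7, 7, 1, 0, 5]
print(pairs[2])  # ([3, 7, 7], 1)
```

Note that the first pixel of the second row is predicted from the whole first row; the model has no built-in notion that pixel 1 sits directly below pixel 3 and must learn such 2-D relations from data.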

They use two methods to assess model performance:
[0] linear probe: uses the trained model to extract features from the images in the downstream dataset and then fits a logistic regression to the labels
[1] fine-tuning: fine-tunes the entire model on the downstream dataset
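The linear probe can be sketched as follows: the feature extractor is frozen and only a logistic regression on top of it is trained. This is a minimal pure-Python illustration, not OpenAI's code; `frozen_features` is a hypothetical stand-in for activations taken from a middle layer of iGPT, the data is toy data, and the probe is binary for brevity (the real evaluation is multiclass).

```python
import math
from typing import Callable, List

Vector = List[float]

def frozen_features(x: Vector) -> Vector:
    """Hypothetical stand-in for mid-network activations of a frozen model."""
    return [math.tanh(x[0]), math.tanh(x[1])]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def linear_probe(extract: Callable[[Vector], Vector], xs: List[Vector], ys: List[int],
                 lr: float = 0.5, epochs: int = 200) -> Callable[[Vector], float]:
    """Fit logistic regression on frozen features; `extract` itself is never updated."""
    feats = [extract(x) for x in xs]  # features are computed once and kept fixed
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(feats, ys):
            # derivative of binary cross-entropy w.r.t. the logit
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        # full-batch gradient descent step on the probe's weights only
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n

    def predict_proba(x: Vector) -> float:
        return sigmoid(sum(wi * fi for wi, fi in zip(w, extract(x))) + b)
    return predict_proba

# Hypothetical downstream data: label 1 iff the first coordinate is positive.
xs = [[2.0, 1.0], [1.0, -1.0], [1.5, 0.5], [-2.0, 1.0], [-1.0, -1.0], [-1.5, 0.5]]
ys = [1, 1, 1, 0, 0, 0]
probe = linear_probe(frozen_features, xs, ys)
print(round(probe([3.0, 0.0]), 3), round(probe([-3.0, 0.0]), 3))
```

The contrast with [1] is where gradients flow: here only `w` and `b` change, whereas fine-tuning would also update the network that produces the features.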


blog: https://openai.com/blog/image-gpt/
papers:
icml 2020 (v1)
(v2)
github (code is provided as-is, no updates expected): https://github.com/openai/image-gpt

#openai #gpt2 #language #image #icml2020
announcing scann: efficient vector similarity search
ruiqi guo, philip sun, erik lindgren, quan geng, david simcha, felix chern, & sanjiv kumar @ google research

scann is a method for efficient vector similarity search at scale. the implementation includes search space pruning & quantization for maximum inner product search & also supports other distance functions such as euclidean distance
the implementation is designed for x86 processors with avx2 support
scann achieves sota performance on ann-benchmarks.com, as shown for the glove-100-angular dataset in the attached image
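The general pipeline (prune the search space, score candidates cheaply with quantized vectors, then re-rank the best few exactly) can be sketched in pure Python. This is a toy under stated assumptions, not ScaNN's API or algorithm: it uses random leaf centers instead of trained partitions and plain uniform scalar quantization instead of ScaNN's anisotropic vector quantization. `ToyMipsIndex`, `leaves_to_search` and `rerank` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(v: Vector, scale: float) -> List[int]:
    """Uniform int8-style scalar quantization (NOT ScaNN's anisotropic quantization)."""
    return [max(-127, min(127, round(x / scale))) for x in v]

def brute_force_mips(data: List[Vector], q: Vector, k: int) -> List[Tuple[int, float]]:
    """Exact maximum inner product search: score every vector."""
    scored = [(i, dot(q, v)) for i, v in enumerate(data)]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

class ToyMipsIndex:
    def __init__(self, data: List[Vector], num_leaves: int, seed: int = 0):
        self.data = data
        # Partitioning: random data points serve as leaf centers (no k-means refinement).
        self.centers = random.Random(seed).sample(data, num_leaves)
        self.leaves: Dict[int, List[int]] = {c: [] for c in range(num_leaves)}
        for i, v in enumerate(data):
            nearest = min(range(num_leaves), key=lambda c: sq_dist(v, self.centers[c]))
            self.leaves[nearest].append(i)
        # One global scale maps every coordinate into the integer range [-127, 127].
        self.scale = max(abs(x) for v in data for x in v) / 127 or 1.0
        self.codes = [quantize(v, self.scale) for v in data]

    def search(self, q: Vector, k: int, leaves_to_search: int, rerank: int) -> List[Tuple[int, float]]:
        # 1) Search space pruning: visit only leaves whose centers score highest against q.
        order = sorted(range(len(self.centers)), key=lambda c: dot(q, self.centers[c]), reverse=True)
        candidates = [i for c in order[:leaves_to_search] for i in self.leaves[c]]
        # 2) Approximate scores from quantized codes (the positive scale does not change ranking).
        approx = sorted(candidates, key=lambda i: dot(q, self.codes[i]), reverse=True)
        # 3) Exact re-scoring of the best few candidates with the original vectors.
        exact = [(i, dot(q, self.data[i])) for i in approx[:rerank]]
        return sorted(exact, key=lambda t: t[1], reverse=True)[:k]

rng = random.Random(42)
data = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
q = [rng.gauss(0.0, 1.0) for _ in range(8)]
index = ToyMipsIndex(data, num_leaves=10)
print(index.search(q, k=5, leaves_to_search=3, rerank=20))
print(brute_force_mips(data, q, k=5))
```

Searching all leaves and re-ranking every candidate degenerates to exact brute force; lowering `leaves_to_search` and `rerank` trades recall for speed, which is the knob ann-benchmarks-style comparisons measure.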


blog post: https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html
paper: https://arxiv.org/abs/1908.10396
github: https://github.com/google-research/google-research/tree/master/scann

#icml2020 #similarity #scann #annoy
REALM: Integrating Retrieval into Language Representation Models
by google research

A new paper from google with a novel approach for language model pre-training, which augments a language representation model with a knowledge retriever.

The idea is the following: take a sentence or a piece of text and augment it with additional knowledge, i.e. pass both the original text and the retrieved texts to the model.
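The paper makes this retrieve-then-predict idea precise by treating the retrieved document z as a latent variable and marginalizing over it. A paraphrase of that formulation (symbol names may differ slightly from the paper's), with the relevance score being an inner product of learned query and document embeddings:

```latex
p(y \mid x) = \sum_{z \in \mathcal{Z}} p(y \mid z, x)\, p(z \mid x), \qquad
p(z \mid x) = \frac{\exp f(x, z)}{\sum_{z'} \exp f(x, z')}, \qquad
f(x, z) = \mathrm{Embed}_{\mathrm{input}}(x)^{\top}\, \mathrm{Embed}_{\mathrm{doc}}(z)
```

In practice the sum is restricted to the top-k documents, found with maximum inner product search over precomputed document embeddings.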

An example:
The masked text is:

We paid twenty __ at the Buckingham Palace gift shop.


The knowledge retriever could add the following information to it:
Buckingham Palace is the London residence of the British monarchy.
The official currency of the United Kingdom is the Pound.
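A toy version of this example, assuming a hypothetical sparse bag-of-words "embedding" in place of REALM's learned dense embeddings, shows the mechanics: score passages by inner product with the masked text, keep the top-k, and concatenate them with the input. The `[SEP]`-joined input format is also an illustrative assumption.

```python
import re
from collections import Counter
from typing import List

STOPWORDS = {"a", "an", "at", "in", "is", "of", "the", "we"}

def embed(text: str) -> Counter:
    """Hypothetical sparse bag-of-words 'embedding'; REALM learns dense embeddings instead."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def score(query: Counter, doc: Counter) -> int:
    """Inner product of two sparse count vectors (missing words count as 0)."""
    return sum(n * doc[w] for w, n in query.items())

def retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Return the k passages with the highest inner-product score."""
    q = embed(query)
    return sorted(corpus, key=lambda d: score(q, embed(d)), reverse=True)[:k]

def augment(masked_text: str, passages: List[str]) -> str:
    """Hypothetical input format: masked text followed by retrieved passages."""
    return " [SEP] ".join([masked_text] + passages)

CORPUS = [
    "The Eiffel Tower is located in Paris.",
    "Buckingham Palace is the London residence of the British monarchy.",
    "The official currency of the United Kingdom is the Pound.",
]
masked = "We paid twenty __ at the Buckingham Palace gift shop."
print(augment(masked, retrieve(masked, CORPUS, k=1)))
```

This lexical scorer finds the Buckingham Palace passage but gives the currency sentence a score of zero, since it shares no words with the query. Connecting "twenty __" to "Pound" is exactly what a learned retriever, trained jointly with the language model, is meant to do.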


blog post: https://ai.googleblog.com/2020/08/realm-integrating-retrieval-into.html
paper: https://arxiv.org/abs/2002.08909
github: https://github.com/google-research/language/tree/master/language/realm

#nlp #languagemodel #knowledgeretriever #icml2020