Generalization through Memorization: Nearest Neighbor Language Models
Introduces kNN-LMs, which extend a pretrained LM with nearest-neighbor search in embedding space, achieving a new SOTA perplexity on WikiText-103 without any additional training!
Also shows that kNN-LM can efficiently scale LMs up to larger training sets and enables effective domain adaptation, simply by swapping the nearest-neighbor datastore, without further training. It seems especially helpful for predicting long-tail patterns, such as factual knowledge!
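Until the official code drops, here is a rough PyTorch sketch of the core idea: the LM's next-token distribution is interpolated with a distribution built from the k nearest stored contexts. The datastore layout, the squared-L2 distance, and the `lam`/`temp` values are illustrative assumptions, not the authors' release.

```python
import torch
import torch.nn.functional as F

def knn_lm_probs(context_emb, lm_probs, keys, values, k=8, lam=0.25, temp=1.0):
    """Interpolate the base LM with a kNN distribution over a datastore.

    context_emb: (d,) embedding of the current context from the LM
    lm_probs:    (V,) next-token distribution from the base LM
    keys:        (N, d) stored context embeddings (illustrative datastore)
    values:      (N,) next-token id recorded after each stored context
    """
    # Squared L2 distance from the query to every datastore key
    dists = ((keys - context_emb) ** 2).sum(dim=-1)      # (N,)
    knn = dists.topk(k, largest=False)                   # k nearest contexts
    # Softmax over negative distances -> one weight per neighbor
    weights = F.softmax(-knn.values / temp, dim=-1)      # (k,)
    # Pile each neighbor's weight onto the token that followed it
    p_knn = torch.zeros_like(lm_probs)
    p_knn.index_add_(0, values[knn.indices], weights)
    # Final mixture: lam * kNN part + (1 - lam) * LM part
    return lam * p_knn + (1 - lam) * lm_probs
```

Swapping `keys`/`values` for a datastore built on a different corpus is exactly the no-training domain adaptation trick from the summary above.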
Official code available soon.
Paper: https://arxiv.org/abs/1911.00172
#nlp #generalization #kNN
Neighbourhood Components Analysis
A PyTorch implementation of Neighbourhood Components Analysis (NCA).
NCA learns a linear transformation of the dataset such that the expected leave-one-out performance of kNN in the transformed space is maximized.
The authors propose a novel method for learning a Mahalanobis distance measure to be used in kNN classification. The algorithm directly maximizes a stochastic variant of the leave-one-out kNN score on the training set.
It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, the resulting classifier is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them.
The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction.
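The stochastic leave-one-out objective is compact enough to sketch directly. This is a minimal PyTorch version of the objective described above, not code from the linked repo; `X`, `y`, and the optimizer settings are assumed for illustration.

```python
import torch

def nca_loss(A, X, y):
    """Negative NCA objective: expected leave-one-out accuracy of a
    stochastic nearest-neighbor classifier in the space z = A x.

    A: (d_out, d_in) linear transformation being learned
    X: (n, d_in) data matrix; y: (n,) integer class labels
    """
    Z = X @ A.T                                  # project into learned space
    d2 = torch.cdist(Z, Z).pow(2)                # pairwise squared distances
    d2.fill_diagonal_(float('inf'))              # a point never picks itself
    p = torch.softmax(-d2, dim=1)                # p[i, j]: prob j is i's neighbor
    same = (y[:, None] == y[None, :]).float()    # 1 where labels agree
    return -(p * same).sum()                     # maximize sum_i p(correct class)

# Usage sketch (X, y assumed given): learn a 2-D projection by gradient descent
# A = torch.randn(2, X.shape[1], requires_grad=True)
# opt = torch.optim.Adam([A], lr=1e-2)
# for _ in range(200):
#     opt.zero_grad(); nca_loss(A, X, y).backward(); opt.step()
```

Choosing d_out < d_in is what yields the low-dimensional embedding for visualization mentioned above.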
paper (PDF only): https://www.cs.toronto.edu/~hinton/absps/nca.pdf
github: https://github.com/kevinzakka/nca
#kNN #pca #nca #PyTorch