Generalization through Memorization: Nearest Neighbor Language Models
Introduces kNN-LMs, which extend a pretrained LM with nearest-neighbor search in embedding space, achieving a new SOTA perplexity on WikiText-103 without any additional training!
Also shows that kNN-LM can efficiently scale LMs up to larger training sets and enables effective domain adaptation, simply by swapping the nearest-neighbor datastore, without further training. It seems especially helpful for predicting long-tail patterns, such as factual knowledge!
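Until the official code drops, here is a rough PyTorch sketch of the core idea: the LM's next-token distribution is interpolated with a distribution built from the k nearest stored contexts. The datastore layout, the squared-L2 distance, and the `lam`/`temp` values are illustrative assumptions, not the authors' release.

```python
import torch
import torch.nn.functional as F

def knn_lm_probs(context_emb, lm_probs, keys, values, k=8, lam=0.25, temp=1.0):
    """Interpolate the base LM with a kNN distribution over a datastore.

    context_emb: (d,) embedding of the current context from the LM
    lm_probs:    (V,) next-token distribution from the base LM
    keys:        (N, d) stored context embeddings (illustrative datastore)
    values:      (N,) next-token id recorded after each stored context
    """
    # Squared L2 distance from the query to every datastore key
    dists = ((keys - context_emb) ** 2).sum(dim=-1)      # (N,)
    knn = dists.topk(k, largest=False)                   # k nearest contexts
    # Softmax over negative distances -> one weight per neighbor
    weights = F.softmax(-knn.values / temp, dim=-1)      # (k,)
    # Pile each neighbor's weight onto the token that followed it
    p_knn = torch.zeros_like(lm_probs)
    p_knn.index_add_(0, values[knn.indices], weights)
    # Final mixture: lam * kNN part + (1 - lam) * LM part
    return lam * p_knn + (1 - lam) * lm_probs
```

Swapping `keys`/`values` for a datastore built on a different corpus is exactly the no-training domain adaptation trick from the summary above.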
Official code available soon.
Paper: https://arxiv.org/abs/1911.00172
#nlp #generalization #kNN
Neighbourhood Components Analysis
A PyTorch implementation of Neighbourhood Components Analysis (NCA).
NCA learns a linear transformation of the dataset such that the expected leave-one-out performance of kNN in the transformed space is maximized.
The authors propose a novel method for learning a Mahalanobis distance measure to be used in kNN classification. The algorithm directly maximizes a stochastic variant of the leave-one-out kNN score on the training set.
It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, the resulting classifier is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them.
The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction.
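The stochastic leave-one-out objective is compact enough to sketch directly. This is a minimal PyTorch version of the objective described above, not code from the linked repo; `X`, `y`, and the optimizer settings are assumed for illustration.

```python
import torch

def nca_loss(A, X, y):
    """Negative NCA objective: expected leave-one-out accuracy of a
    stochastic nearest-neighbor classifier in the space z = A x.

    A: (d_out, d_in) linear transformation being learned
    X: (n, d_in) data matrix; y: (n,) integer class labels
    """
    Z = X @ A.T                                  # project into learned space
    d2 = torch.cdist(Z, Z).pow(2)                # pairwise squared distances
    d2.fill_diagonal_(float('inf'))              # a point never picks itself
    p = torch.softmax(-d2, dim=1)                # p[i, j]: prob j is i's neighbor
    same = (y[:, None] == y[None, :]).float()    # 1 where labels agree
    return -(p * same).sum()                     # maximize sum_i p(correct class)

# Usage sketch (X, y assumed given): learn a 2-D projection by gradient descent
# A = torch.randn(2, X.shape[1], requires_grad=True)
# opt = torch.optim.Adam([A], lr=1e-2)
# for _ in range(200):
#     opt.zero_grad(); nca_loss(A, X, y).backward(); opt.step()
```

Choosing d_out < d_in is what yields the low-dimensional embedding for visualization mentioned above.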
paper (PDF only): https://www.cs.toronto.edu/~hinton/absps/nca.pdf
github: https://github.com/kevinzakka/nca
#kNN #pca #nca #PyTorch