Simple, Scalable Adaptation for Neural Machine Translation
Fine-tuning pre-trained Neural Machine Translation (NMT) models is the dominant approach for adapting to new languages and domains. However, fine-tuning requires adapting and maintaining a separate model for each target task. Researchers from Google propose a simple yet efficient approach for adaptation in #NMT: injecting tiny task-specific adapter layers into a pre-trained model. These lightweight adapters, amounting to just a small fraction of the original model size, adapt the model to multiple individual tasks simultaneously.
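A minimal PyTorch sketch of the adapter idea; the bottleneck size, activation, and exact placement inside the Transformer layer are illustrative assumptions, not the paper's configuration. The key point is that only the adapter parameters are trained while the pre-trained weights stay frozen.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: LayerNorm -> down-project -> ReLU -> up-project,
    added residually on top of a frozen pre-trained layer's output."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x):
        # residual connection keeps the pre-trained representation intact
        return x + self.up(torch.relu(self.down(self.norm(x))))

# During adaptation, only the adapter parameters are updated, e.g.:
# optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-4)
```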
I guess this can be applied not only to #NMT but also to many other #NLP, #NLU and #NLG tasks.
Paper: https://arxiv.org/pdf/1909.08478.pdf
#BERT
❇️ @AI_Python_EN
Communication-based Evaluation for Natural Language Generation (#NLG) that dramatically outperforms standard n-gram-based methods.
Have you ever thought that n-gram overlap measures like #BLEU or #ROUGE are not good enough for #NLG evaluation, and that human evaluation is too expensive? Researchers from Stanford University think so too. The main shortcoming of #BLEU- or #ROUGE-style methods is that they fail to take into account the communicative function of language: a speaker's goal is not only to produce well-formed expressions, but also to convey relevant information to a listener.
The researchers propose an approach based on a color reference game. In this game, a speaker and a listener see a set of three colors. The speaker is told one color is the target and tries to communicate the target to the listener using a natural language utterance. A good utterance is more likely to lead the listener to select the target, while a bad utterance is less likely to do so. In turn, effective metrics should assign high scores to good utterances and low scores to bad ones.
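A hedged sketch of how such a metric could score an utterance: feed the utterance and the three candidate colors to a trained neural listener and use the probability it assigns to the true target. The `listener_model` below is hypothetical and stands in for whatever listener the authors train; it is not their exact architecture.

```python
import torch
import torch.nn.functional as F

def communicative_score(listener_model, utterance_ids, color_features, target_idx):
    """Score a generated utterance by how reliably a neural listener
    recovers the target color from it (higher = better utterance)."""
    # listener_model: hypothetical module mapping (utterance, 3 color vectors)
    # to one logit per candidate color.
    logits = listener_model(utterance_ids, color_features)   # shape: (3,)
    probs = F.softmax(logits, dim=-1)
    return probs[target_idx].item()
```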
Paper: https://arxiv.org/pdf/1909.07290.pdf
Code: https://github.com/bnewm0609/comm-eval
#NLP #NLU
❇️ @AI_Python_EN
Forwarded from DLeX: AI Python (Farzad)
Statistician: How might you analyze this data?
Data scientist: Definitely start with a neural net.
Statistician: ...
Data scientist: Alright, fine, start off with a multiple regression with every possible covariate, look at the t-tests, and remove those that aren't significant.
Statistician: Have you tried PCA?
Data scientist: No, but I'm using a linearly-activated autoencoder, and that seems to work pretty well.
Statistician: ...
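The punchline rests on a real equivalence: a linear autoencoder trained with squared loss recovers the same subspace as PCA. A small self-contained check on toy data (my own setup, not from the post):

```python
import numpy as np
import torch
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_np = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))   # correlated features
X_np -= X_np.mean(axis=0)
X = torch.tensor(X_np, dtype=torch.float32)

# Linear autoencoder: 10 -> 3 -> 10, no nonlinearity, squared loss.
enc = torch.nn.Linear(10, 3, bias=False)
dec = torch.nn.Linear(3, 10, bias=False)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-2)
for _ in range(3000):
    opt.zero_grad()
    loss = ((dec(enc(X)) - X) ** 2).mean()
    loss.backward()
    opt.step()

# PCA with the same number of components reaches essentially the same
# reconstruction error, because both pick the same optimal rank-3 subspace.
pca = PCA(n_components=3).fit(X_np)
err_pca = ((pca.inverse_transform(pca.transform(X_np)) - X_np) ** 2).mean()
err_ae = ((dec(enc(X)) - X) ** 2).mean().item()
print(f"PCA: {err_pca:.4f}  linear AE: {err_ae:.4f}")
```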
❇ @AI_Python
✴ @AI_Python_EN
Forwarded from DLeX: AI Python (Farzad)
Transformers working for RL! Two simple modifications, moving the layer norm and adding gating, create GTrXL: an incredibly stable and effective architecture for integrating experience through time in RL.
https://arxiv.org/abs/1910.06764
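A rough PyTorch sketch of the two modifications: pre-layer-norm placement and a GRU-style gate in place of the usual residual connection. The gate follows the general GRU form, but the paper's exact activations, biases, and initialization tricks are omitted, so treat this as an illustration rather than a reference implementation.

```python
import torch
import torch.nn as nn

class GRUGate(nn.Module):
    """GRU-style gate that replaces the usual residual add (x + y)."""
    def __init__(self, d):
        super().__init__()
        self.Wr = nn.Linear(d, d, bias=False); self.Ur = nn.Linear(d, d, bias=False)
        self.Wz = nn.Linear(d, d, bias=False); self.Uz = nn.Linear(d, d, bias=False)
        self.Wg = nn.Linear(d, d, bias=False); self.Ug = nn.Linear(d, d, bias=False)

    def forward(self, x, y):                      # x: block input, y: sublayer output
        r = torch.sigmoid(self.Wr(y) + self.Ur(x))
        z = torch.sigmoid(self.Wz(y) + self.Uz(x))
        h = torch.tanh(self.Wg(y) + self.Ug(r * x))
        return (1 - z) * x + z * h

class GatedSublayer(nn.Module):
    """Pre-LayerNorm wrapper: gate(x, sublayer(norm(x))) instead of x + sublayer(norm(x))."""
    def __init__(self, d, sublayer):
        super().__init__()
        self.norm, self.sublayer, self.gate = nn.LayerNorm(d), sublayer, GRUGate(d)

    def forward(self, x):
        return self.gate(x, self.sublayer(self.norm(x)))
```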
❇️ @AI_Python
✴️ @AI_Python_EN
Forwarded from DLeX: AI Python (Farzad)
Uncertainty Quantification in Deep Learning
https://www.inovex.de/blog/uncertainty-quantification-deep-learning/
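One common technique from this toolbox, shown here as a hedged sketch (the blog may cover others), is MC dropout: keep dropout active at inference and treat the spread over repeated stochastic forward passes as a rough epistemic-uncertainty estimate.

```python
import torch
import torch.nn as nn

# toy regression model with dropout
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 1))

def mc_dropout_predict(model, x, T=50):
    model.train()                         # keeps Dropout stochastic at inference
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(T)])
    return samples.mean(0), samples.std(0)   # predictive mean and spread

mean, std = mc_dropout_predict(model, torch.randn(8, 16))
```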
Most of the world’s text is not in English. We are releasing MultiFiT to train and fine-tune language models efficiently in any language.
Post:
http://nlp.fast.ai/classification/2019/09/10/multifit.html
Paper:
https://arxiv.org/abs/1909.04761
✴ @AI_Python_EN
Did you run any experiments on XNLI? Also curious how it compares to XLM. Also, a shameless plug for the cross-lingual QA dataset we just released, MLQA
https://github.com/facebookresearch/MLQA - could be a great testbed for models like this
❇ @AI_Python_EN
XNLI and MLQA need bidirectional context, and MultiFiT is unidirectional since it uses causal language modeling and RNNs. But it is on the to-do list, just below training MultiFiT in a zero-shot scenario with XLM as a teacher model.
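For intuition, the difference boils down to the visibility mask each token gets; a toy sketch (not MultiFiT or XLM code):

```python
import torch

T = 6
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))   # token t only sees tokens <= t
bidirectional = torch.ones(T, T, dtype=torch.bool)        # every token sees the full sequence

print(causal.int())
# A causal LM (an RNN or left-to-right Transformer) is limited to the lower-triangular
# pattern, whereas tasks like XNLI and MLQA benefit from the full matrix.
```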
❇ @AI_Python_EN
Does the brain do backpropagation? CAN Public Lecture - Geoffrey Hinton
https://www.youtube.com/watch?v=qIEfJ6OBGj8
❇ @AI_Python_EN
Canadian Association for Neuroscience 2019 Public lecture: Geoffrey Hinton
https://can-acn.org/2019-public-lecture-geoffrey-hinton
Netflix Open-sourcing Polynote: an IDE-inspired polyglot notebook
#DataScience #MachineLearning #ArtificialIntelligence
http://bit.ly/2N9m8qe
❇️ @AI_Python_EN
Together with Omid Sarfarzadeh and Maysam Asgari-Chenaghlu, we will have a session on #DeepNLP and its applications to #SearchEngine and #Chatbot at #Google's #DevFest Istanbul. We will be honored to represent adesso Turkey. Thanks to Tufan K. and the whole adesso Turkey family for giving us this chance. More information is provided below:
#DeepLearning #DeepNLP #NLP #chatbot #SearchEngine #adesso #adessoTurkey
https://devfest.istanbul
https://dfist19.firebaseapp.com/
❇ @AI_Python_EN
Google is now using BERT to improve its core search algorithm. It has been live for the past couple of days for search queries made in English in the US. A/B tests have shown good results so far. That's huge news, especially for people who make money on web traffic! #deeplearning #machinelearning
📝 Article:
https://lnkd.in/d_fwVeg
Google is training graph neural networks to predict smells
#DataScience #MachineLearning #ArtificialIntelligence
http://bit.ly/344EaAZ
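The article itself has no code, but the underlying idea is standard message passing over the molecular graph; a generic hand-rolled sketch (not Google's model or features):

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One round of neighborhood aggregation over a molecular graph
    (a generic sketch of the idea, not the authors' architecture)."""
    def __init__(self, d):
        super().__init__()
        self.lin = nn.Linear(2 * d, d)

    def forward(self, h, adj):
        # h: (num_atoms, d) node features, adj: (num_atoms, num_atoms) 0/1 adjacency
        neighbor_sum = adj.float() @ h                 # aggregate neighbor features
        return torch.relu(self.lin(torch.cat([h, neighbor_sum], dim=-1)))

# After several such layers, node features are pooled (e.g. summed) into a
# molecule embedding and fed to a classifier over odor descriptors.
```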
❇️ @AI_Python_EN
Public datasets: weather and climate data from Google Cloud's Public Datasets Program:
https://lnkd.in/edhe7wj
#ArtificialIntelligence #Datasets
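A hedged sketch of pulling one of these tables with the official google-cloud-bigquery client; the dataset/table name below (NOAA GSOD) is an assumption about what the program exposes, so check the catalog before running. GCP credentials and a billing project are required.

```python
from google.cloud import bigquery

client = bigquery.Client()  # needs GCP credentials and a billing project

sql = """
    SELECT *
    FROM `bigquery-public-data.noaa_gsod.gsod2019`  -- assumed public weather table
    LIMIT 10
"""
rows = client.query(sql).to_dataframe()  # requires pandas
print(rows.head())
```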
✴ @AI_Python_EN
When regression models perform poorly, there are typically several reasons. Here are some:
Important variables have been omitted from the model. These may include latent (unobservable) variables, such as socio-economic status.
Errors in the data and missing data.
Measurement error. This is not the same as data errors.
The wrong type of regression was used - there are many kinds of regression models. For example, OLS linear regression was used for count data (see the sketch after this list).
Moderated effects (interactions) were ignored or improperly modeled. A simple example would be when the relationship between age and purchase frequency depends on gender but this has not been accounted for.
Curvilinear effects have been ignored or modeled improperly. For example, the relationship between age and expenditures often does not follow a straight-line path.
Heterogeneity has been ignored or modeled incorrectly. There may be several segments of consumers with different drivers (preferences), for instance.
Our expectations were wrong - the data are low-signal and there's little value to be extracted whatever we do.
Often these errors can be avoided or fixed with little effort, but the person doing the modeling must know how. Unfortunately, some people using regression have little understanding of it.
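To make the count-data point concrete, a small sketch with statsmodels on simulated data (my own toy example, not from the post): a Poisson GLM matches the data-generating process, while OLS is mis-specified for the outcome's distribution.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.poisson(np.exp(0.5 + 0.8 * x))        # count outcome with log link

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                      # assumes Gaussian errors and a linear mean
poisson = sm.GLM(y, X, family=sm.families.Poisson()).fit()  # respects the count nature of y

print("OLS coefficients:    ", ols.params)
print("Poisson coefficients:", poisson.params)  # close to the true (0.5, 0.8)
```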
✴️ @AI_Python_EN
Regression is a very general term - notice the similarities between neural nets and regression, for instance. Another example is structural equation modeling (SEM). About 40 years ago UCLA professor Peter Bentler and his doctoral student David Weeks showed how SEM could be represented as regression, which reduced the number of required matrices from eight in the traditional LISREL notation to three. Results will be identical. Factor analysis and latent class clustering can also be conceptualized as regression.
I did not have space to mention clustered (multilevel) data. In some cases ignoring hierarchical structure in data can be very consequential and cause us to misinterpret our data. Sometimes different models are required for different levels of the data, as well.
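As a hedged illustration of the clustered-data point (my own simulated example, not from the post), statsmodels' MixedLM fits a random-intercept model that respects the group structure, whereas pooled OLS on the same data would ignore the within-group correlation and understate the standard errors.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy clustered data: 30 groups, each with its own random intercept.
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(30), 20)
group_effect = rng.normal(scale=2.0, size=30)[groups]
x = rng.normal(size=groups.size)
y = 1.0 + 0.5 * x + group_effect + rng.normal(size=groups.size)
df = pd.DataFrame({"y": y, "x": x, "group": groups})

# Random-intercept (multilevel) model.
model = smf.mixedlm("y ~ x", df, groups="group").fit()
print(model.summary())
```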
✴️ @AI_Python_EN
FUNIT: Few-Shot Unsupervised Image-to-Image Translation
Code on GitHub:
http://bit.ly/2patFxh
Link to Paper:
https://lnkd.in/ekgDy5u
Link to Blog Post:
https://bit.ly/2JIm28n
#DataScience #MachineLearning #ArtificialIntelligence
✴️ @AI_Python_EN
Pytorch-Struct
Fast, general, and tested differentiable structured prediction in PyTorch. By Harvard NLP : https://lnkd.in/e2iGiNa
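To see what "differentiable structured prediction" means in practice, here is a hand-rolled linear-chain CRF partition function in plain PyTorch. This is an illustration of the idea, not the torch-struct API; the library packages this and many other structures with batched, tested implementations.

```python
import torch

def crf_log_partition(log_potentials):
    """Forward algorithm for a linear-chain CRF.

    log_potentials: (batch, T-1, C, C) edge scores (prev tag -> next tag),
    with emissions folded in. Returns log Z per sequence; since everything
    is built from differentiable ops, gradients of log Z w.r.t. the
    potentials give the edge marginals.
    """
    batch, steps, C, _ = log_potentials.shape
    alpha = log_potentials.new_zeros(batch, C)
    for t in range(steps):
        # alpha[b, j] = logsumexp_i (alpha[b, i] + phi[b, t, i, j])
        alpha = torch.logsumexp(alpha.unsqueeze(-1) + log_potentials[:, t], dim=1)
    return torch.logsumexp(alpha, dim=1)

phi = torch.randn(4, 9, 5, 5, requires_grad=True)   # toy batch of edge scores
log_z = crf_log_partition(phi)
log_z.sum().backward()                               # phi.grad holds the edge marginals
```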
#PyTorch #DeepLearning #ArtificialIntelligence
✴️ @AI_Python_EN
Evaluating the Factual Consistency of Abstractive Text Summarization
https://lnkd.in/ewFMX8T
#ArtificialIntelligence #DeepLearning #NLP #NaturalLanguageProcessing
✴ @AI_Python_EN
PaperRobot: Incremental Draft Generation of Scientific Ideas
https://lnkd.in/exHGHjW
#ArtificialIntelligence #AI #MachineLearning #DeepLearning
❇ @AI_Python_EN