Supporting content decision makers with machine learning
#Netflix shared a post describing how they research and prepare data for new title production.
Link: https://netflixtechblog.com/supporting-content-decision-makers-with-machine-learning-995b7b76006f
#NLU #NLP #recommendation #embeddings
Team
@OpenArchiveBooks
@data_entusiasts
MIT Introduction to Deep Learning
And specifically, the lecture about RNNs and their modifications:
https://youtu.be/qjrad0V0uJE
The #course as a whole is excellent too, though it leans more towards image processing. For NLP beginners, such a clear and elegant survey of RNNs will be quite useful; after all, a lot of architectures in #NLP models came from image-processing tasks. If you want to recap some theory or get a grasp of the basics of DL, it's a strong recommendation!
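As a reminder of what the lecture builds on, here is a minimal sketch of the vanilla RNN recurrence in plain NumPy. The sizes and weights are made up for illustration; this is not code from the lecture.

```python
import numpy as np

# Vanilla RNN step: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b)
# All sizes below are illustrative placeholders.
input_size, hidden_size, seq_len = 8, 16, 5
rng = np.random.default_rng(0)

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

xs = rng.normal(size=(seq_len, input_size))  # a toy input sequence
h = np.zeros(hidden_size)                    # initial hidden state

for x_t in xs:
    # the same weights are reused at every time step
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)

print(h.shape)  # (16,): the final hidden state summarizes the sequence
```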
#DL
Team
@OpenArchiveBooks
@data_entusiasts
Deep learning to translate between programming languages
#FacebookAI released TransCoder, a fully self-supervised neural transcompiler that is claimed to make code migration easier and more efficient.
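The full model is a large Transformer trained with several self-supervised objectives, but the core idea of one of them, back-translation between languages without any parallel code, can be sketched with a hypothetical encoder-decoder interface. Nothing below is the actual TransCoder API; translate and translation_loss are placeholders.

```python
# Toy sketch of a back-translation step for unsupervised Python <-> C++
# translation. `model` is a hypothetical shared encoder-decoder; the method
# names are placeholders, not the real TransCoder code.

def back_translation_step(model, python_snippets, cpp_snippets, optimizer):
    # 1) Translate real Python into synthetic (noisy) C++ with the current model.
    synthetic_cpp = [model.translate(s, src_lang="python", tgt_lang="cpp")
                     for s in python_snippets]

    # 2) Train the model to reconstruct the original Python from the synthetic
    #    C++, so (synthetic C++ -> Python) acts as a supervised pair.
    loss = model.translation_loss(sources=synthetic_cpp,
                                  targets=python_snippets,
                                  src_lang="cpp", tgt_lang="python")

    # 3) Same in the other direction: C++ -> synthetic Python -> C++.
    synthetic_py = [model.translate(s, src_lang="cpp", tgt_lang="python")
                    for s in cpp_snippets]
    loss = loss + model.translation_loss(sources=synthetic_py,
                                         targets=cpp_snippets,
                                         src_lang="python", tgt_lang="cpp")

    optimizer.step(loss)  # placeholder update
    return loss
```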
ArXiv: https://arxiv.org/pdf/2006.03511.pdf
Github: https://github.com/facebookresearch/TransCoder/
#NLU #codegeneration #NLP
Team
@OpenArchiveBooks
@data_entusiasts
The Cost of Training NLP Models: A Concise Overview
The authors review the cost of training large-scale language models and the main drivers of these costs.
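As a back-of-envelope illustration of the kind of accounting involved (all numbers below are made-up placeholders, not figures from the paper):

```python
# Rough compute-cost estimate: GPU-hours times cloud price per GPU-hour.
# Every number here is an illustrative placeholder, not data from the paper.

num_gpus = 64             # hypothetical cluster size
training_days = 10        # hypothetical wall-clock training time
price_per_gpu_hour = 3.0  # hypothetical on-demand price in USD

gpu_hours = num_gpus * training_days * 24
compute_cost = gpu_hours * price_per_gpu_hour

print(f"{gpu_hours} GPU-hours, about ${compute_cost:,.0f} for a single run")
# Hyperparameter search and repeated runs multiply this further.
```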
More at the paper: https://arxiv.org/pdf/2004.08900
#nlp #language
Team
@OpenArchiveBooks
@data_entusiasts
Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study
The authors use the NER task to analyze the generalization behavior of existing models from different perspectives. Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models in terms of breakdown performance analysis, annotation errors, dataset bias, and category relationships, suggesting directions for improvement.
The authors also release two datasets for future research: ReCoNLL and PLONER.
The main findings of the paper:
– the performance of existing models (including the state-of-the-art model) is heavily influenced by the degree to which test entities have been seen in the training set with the same label (see the sketch after this list)
– the proposed measure makes it possible to detect human annotation errors; once these errors are fixed, previous models can achieve new state-of-the-art results
– the authors introduce two measures to characterize dataset bias, and the cross-dataset generalization experiment shows that the performance of NER systems is influenced not only by whether the test entity has been seen in the training set but also by whether the context of the test entity has been observed
– providing more training samples does not guarantee better results; a targeted increase in training samples is more effective
– the relationship between entity categories influences the difficulty of model learning, which leads to test samples that are hard to solve with common learning methods
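A minimal sketch of the "seen in training with the same label" statistic the first finding refers to; the paper's actual measures are more fine-grained, so treat this as an illustration only.

```python
# Fraction of test entity mentions that also appear in the training set
# with the same label. Entities are (surface form, label) pairs here;
# this is an illustration, not the paper's exact metric.

def seen_with_same_label_ratio(train_entities, test_entities):
    train_set = set(train_entities)
    if not test_entities:
        return 0.0
    seen = sum(1 for entity in test_entities if entity in train_set)
    return seen / len(test_entities)

train = [("New York", "LOC"), ("Obama", "PER"), ("Google", "ORG")]
test = [("New York", "LOC"),   # seen with the same label
        ("Paris", "LOC"),      # unseen
        ("Obama", "ORG")]      # seen surface form, but with a different label

print(seen_with_same_label_ratio(train, test))  # 0.333...
```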
Paper: https://arxiv.org/abs/2001.03844
Github: https://github.com/pfliu-nlp/Named-Entity-Recognition-NER-Papers
Website: http://pfliu.com/InterpretNER/
#nlp #generalization #NER #annotations #dataset
Team
@OpenArchiveBooks
@data_entusiasts
What is Trending on Wikipedia? Capturing Trends and Language Biases Across Wikipedia Editions
The authors propose an automatic evaluation and comparison of the browsing behavior of Wikipedia readers that can be applied to any language edition of Wikipedia. The study focuses on the English, French, and Russian editions during the last four months of 2018.
Their approach consists of the following steps (a toy sketch of the keyword-weighting step follows the list):
– extraction of a sub-network of trending Wikipedia articles and identification of trends
– extraction of keywords from the summaries of every Wikipedia article in the sub-network and weighting according to their importance
– labeling of the trends with high-level topics using the extracted keywords
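The paper's actual pipeline is built on Spark (the sparkwiki code linked below); purely to illustrate the second step, here is a toy keyword weighting over made-up article summaries, with TF-IDF as a stand-in weighting scheme.

```python
# Toy illustration of the keyword extraction and weighting step: weight terms
# in article summaries and keep the top-weighted ones as crude labels.
# The real pipeline uses sparkwiki on Spark; TF-IDF is only a stand-in here,
# and the summaries below are invented.
from sklearn.feature_extraction.text import TfidfVectorizer

summaries = {
    "Article A": "An international football tournament held in Russia in 2018.",
    "Article B": "A British singer, lead vocalist of a famous rock band.",
}

vectorizer = TfidfVectorizer(stop_words="english")
weights = vectorizer.fit_transform(summaries.values()).toarray()
vocab = vectorizer.get_feature_names_out()

for title, row in zip(summaries, weights):
    top = sorted(zip(vocab, row), key=lambda kv: -kv[1])[:3]
    print(title, "->", [word for word, weight in top if weight > 0])
```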
Paper: https://arxiv.org/pdf/2002.06885
Code: https://github.com/epfl-lts2/sparkwiki
#nlp #trend #wikipedia
Team
@OpenArchiveBooks
@data_enthusiasts
Summarizing Books with Human Feedback
#OpenAI fine-tuned #GPT3 to summarize entire books well enough for a human to read. The main approach: recursively split the text into parts, summarize each part, and then meta-summarize the summaries.
This is really important: once there is a great summarization #SOTA, we won't need editors to write posts for you. And researchers will ultimately have some assistance in interpreting models' results.
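A minimal sketch of the recursive scheme; summarize_chunk is a placeholder for a model call (not the actual OpenAI setup), and the chunk size is arbitrary.

```python
# Recursive book summarization: split the text, summarize each chunk, then
# summarize the concatenated summaries until the result fits in one chunk.
# `summarize_chunk` is a hypothetical model call, not OpenAI's actual API.

def summarize_chunk(text: str) -> str:
    # placeholder: in practice, a fine-tuned language model would do this
    return text[: len(text) // 4]

def recursive_summarize(text: str, chunk_size: int = 2000) -> str:
    if len(text) <= chunk_size:
        return summarize_chunk(text)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    partial_summaries = [summarize_chunk(chunk) for chunk in chunks]
    # meta-summarize the summaries, recursing until the text is short enough
    return recursive_summarize(" ".join(partial_summaries), chunk_size)
```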
BlogPost: https://openai.com/blog/summarizing-books/
Paper: https://arxiv.org/pdf/2109.10862
#summarization #NLU #NLP
Team
@OpenArchiveBooks
@data_enthusiasts
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
This paper presents a new participatory Python-based natural language augmentation framework that supports the creation of transformations (modifications to the data) and filters (data splits according to specific features).
The current version of the framework contains 117 transformations and 23 filters for a variety of natural language tasks.
The authors demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models.
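To give a flavour of the two building blocks, here is a plain-Python toy transformation and filter. These are conceptual illustrations only, not the framework's actual base classes; see the repo below for the real interfaces.

```python
import random

# Conceptual toys: a transformation perturbs the text, a filter selects a
# subset of the data. Not the actual NL-Augmenter interfaces.

def swap_adjacent_words(sentence: str, p: float = 0.2, seed: int = 0) -> str:
    """Transformation: randomly swap some adjacent word pairs."""
    rng = random.Random(seed)
    words = sentence.split()
    i = 0
    while i < len(words) - 1:
        if rng.random() < p:
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    return " ".join(words)

def is_short_sentence(sentence: str, max_words: int = 10) -> bool:
    """Filter: keep only examples with at most max_words tokens."""
    return len(sentence.split()) <= max_words

data = ["the quick brown fox jumps over the lazy dog"]
augmented = [swap_adjacent_words(s) for s in data if is_short_sentence(s)]
print(augmented)
```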
Paper: https://arxiv.org/abs/2112.02721
Code: https://github.com/GEM-benchmark/NL-Augmenter
A detailed unofficial overview of the paper: https://andlukyane.com/blog/paper-review-nlaugmenter
#deeplearning #nlp #augmentation #robustness
Team
@data_enthusiasts
@OpenArchiveBooks