What is Trending on Wikipedia? Capturing Trends and Language Biases Across Wikipedia Editions
The authors propose an automatic method for evaluating and comparing the browsing behavior of Wikipedia readers that can be applied to any language edition of Wikipedia. The study focuses on the English, French, and Russian editions during the last four months of 2018.
Their approach consists of the following steps:
- extraction of a sub-network of trending Wikipedia articles and identification of trends
- extraction of keywords from the summaries of every Wikipedia article in the sub-network and weighting according to their importance
- labeling of the trends with high-level topics using the extracted keywords
paper: https://arxiv.org/abs/2002.06885
github: https://github.com/epfl-lts2/sparkwiki
#nlp #trend #wikipedia
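The keyword extraction and weighting step listed above can be illustrated with a minimal sketch. It assumes TF-IDF as the importance weight, which is a stand-in chosen here and not necessarily the weighting used in the paper; the function names and the input format (a dict of article title to summary text) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(summaries, top_k=3):
    """Return the top_k highest-weighted keywords for each article summary.

    summaries: dict mapping article title -> summary text (hypothetical input).
    Weight (an assumption, not the paper's formula):
        term frequency * log(number of summaries / summaries containing the term)
    """
    tokens = {title: tokenize(text) for title, text in summaries.items()}
    n_docs = len(tokens)

    # Document frequency: in how many summaries each word appears.
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    result = {}
    for title, words in tokens.items():
        if not words:
            result[title] = []
            continue
        counts = Counter(words)
        weights = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
        # Sort by descending weight, breaking ties alphabetically.
        ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
        result[title] = ranked[:top_k]
    return result


if __name__ == "__main__":
    # Toy summaries, invented for illustration only.
    demo = {
        "Article A": "football world cup football final",
        "Article B": "football league season",
        "Article C": "election president vote election",
    }
    for title, keywords in tfidf_keywords(demo, top_k=2).items():
        print(title, keywords)
```

Words that occur in every summary get a weight of zero under this scheme, so the highest-ranked keywords are the ones that set an article apart from the rest of the sub-network. A trend could then be labeled by summing these weights over its articles.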
Most Scots NLP models trained on Wikipedia are flawed
One person who had made 200,000 edits and written 20,000 articles on Scots Wikipedia was not writing in Scots but faking it. Since Wikipedia text is often used as a training dataset for #NLU / #NLP / #NMT neural nets, models trained on it inherited the flaw.
Reddit thread: https://www.reddit.com/r/Scotland/comments/ig9jia/ive_discovered_that_almost_every_single_article/
#datasets #translation #scots #wikipedia