Data Science by ODS.ai 🦜
51K subscribers
363 photos
34 videos
7 files
1.52K links
First Telegram Data Science channel. Covering all technical and popular staff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of former. To reach editors contact: @haarrp
Download Telegram
Communication-based Evaluation for Natural Language Generation (#NLG) that's dramatically out-performed standard n-gram-based methods.

Have you ever think that n-gram overlap measures like #BLEU or #ROUGE is not good enough for #NLG evaluation and human based evaluation is too expensive? Researchers from Stanford University also think so. The main shortcoming of #BLEU or #ROUGE methods is that they fail to take into account the communicative function of language; a speaker's goal is not only to produce well-formed expressions, but also to convey relevant information to a listener.

Researchers propose approach based on color reference game. In this game, a speaker and a listener see a set of three colors. The speaker is told one color is the target and tries to communicate the target to the listener using a natural language utterance. A good utterance is more likely to lead the listener to select the target, while a bad utterance is less likely to do so. In turn, effective metrics should assign high scores to good utterances and low scores to bad ones.

Paper: https://arxiv.org/pdf/1909.07290.pdf
Code: https://github.com/bnewm0609/comm-eval

#NLP #NLU