Data Science by ODS.ai 🦜

Communication-based Evaluation for Natural Language Generation (#NLG) that's dramatically out-performed standard n-gram-based methods.

Have you ever think that n-gram overlap measures like #BLEU or #ROUGE is not good enough for #NLG evaluation and human based evaluation is too expensive? Researchers from Stanford University also think so. The main shortcoming of #BLEU or #ROUGE methods is that they fail to take into account the communicative function of language; a speaker's goal is not only to produce well-formed expressions, but also to convey relevant information to a listener.

Researchers propose approach based on color reference game. In this game, a speaker and a listener see a set of three colors. The speaker is told one color is the target and tries to communicate the target to the listener using a natural language utterance. A good utterance is more likely to lead the listener to select the target, while a bad utterance is less likely to do so. In turn, effective metrics should assign high scores to good utterances and low scores to bad ones.

Paper: https://arxiv.org/pdf/1909.07290.pdf
Code: https://github.com/bnewm0609/comm-eval

#NLP #NLU

GitHub

GitHub - bnewm0609/comm-eval: Communication-based Evaluation for Natural Language Generation

Communication-based Evaluation for Natural Language Generation - GitHub - bnewm0609/comm-eval: Communication-based Evaluation for Natural Language Generation

9.7K viewsedited 14:01

About

Blog

Apps

Platform