New paper & new dataset for spoken language understanding! Spoken language understanding (SLU) maps speech to meaning (or "intent"). (This is usually the actual end goal of speech recognition: you want to figure out what the speaker means/wants, not just what words they said.)
The conventional way to do SLU is to convert the #speech into text, and then convert the text into the intent. For a great example of this type of system, see this paper by Alice Coucke and others: https://arxiv.org/abs/1805.10190
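To make the two-stage pipeline concrete, here's a minimal sketch in Python. Both functions are hypothetical stand-ins (not the Snips system from the paper above); a real deployment would plug in a trained ASR model and a trained NLU model:

```python
def recognize_speech(audio) -> str:
    """Stage 1 (ASR): map audio to a text transcript."""
    # Placeholder: a real system would run an acoustic + language model.
    return "turn on the lights in the kitchen"

def parse_intent(text: str) -> dict:
    """Stage 2 (NLU): map the transcript to a structured intent."""
    # Placeholder: a real system would run a trained intent classifier
    # and slot filler over the text.
    return {"action": "activate", "object": "lights", "location": "kitchen"}

def pipeline_slu(audio) -> dict:
    # Note: errors made by the ASR stage propagate to the NLU stage,
    # which is one motivation for the end-to-end approach described next.
    return parse_intent(recognize_speech(audio))
```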
Another approach is end-to-end SLU, where the speech is mapped to the intent through a single neural model. End-to-end SLU:
- is simpler,
- maximizes the actual metric we care about (intent accuracy),
- and can harness info not present in the text, like prosody (e.g. sarcasm).
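Here's a minimal #PyTorch sketch of what "a single neural model" means here: acoustic features in, intent logits out, with no text transcript in between. The architecture (a bidirectional GRU with mean pooling) and all the sizes are illustrative assumptions, not the model from the paper:

```python
import torch
import torch.nn as nn

class EndToEndSLU(nn.Module):
    def __init__(self, n_mels=40, hidden=128, n_intents=31):
        super().__init__()
        # The encoder reads acoustic features directly; no ASR stage.
        self.encoder = nn.GRU(n_mels, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_intents)

    def forward(self, features):             # (batch, time, n_mels)
        outputs, _ = self.encoder(features)
        pooled = outputs.mean(dim=1)         # average over time
        return self.classifier(pooled)       # intent logits

model = EndToEndSLU()
logits = model(torch.randn(4, 200, 40))      # 4 dummy utterances
```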
End-to-end #SLU is theoretically nice, but learning to understand speech totally from scratch is really hard: you need a ton of data to get it to work. Our solution: transfer learning! First, teach the model to recognize words and phonemes; then, teach it SLU.
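A rough sketch of that recipe, heavily simplified (the layer sizes, the pre-training targets, and the two-phase structure below are placeholders, not the paper's exact setup):

```python
import torch
import torch.nn as nn

encoder = nn.GRU(40, 128, num_layers=2, batch_first=True)

# Phase 1: pre-train the encoder to predict phonemes/words frame-by-frame
# on a large transcribed-speech corpus (loss and training loop elided).
asr_head = nn.Linear(128, 42)   # e.g. 42 phoneme classes (placeholder)
# ... train encoder + asr_head on speech with word/phoneme targets ...
torch.save(encoder.state_dict(), "pretrained_encoder.pt")

# Phase 2: keep the pre-trained encoder, add a small intent head, and
# fine-tune on the (much smaller) SLU dataset.
encoder.load_state_dict(torch.load("pretrained_encoder.pt"))
intent_head = nn.Linear(128, 31)

def intent_logits(features):             # (batch, time, 40)
    hidden, _ = encoder(features)
    return intent_head(hidden.mean(dim=1))
```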
Researchers at Google AI and Facebook AI Research have been doing some excellent work on end-to-end SLU, but without access to their datasets, it's impossible for most people to reproduce their results or do any useful research.
So we created an SLU dataset, Fluent Speech Commands, which http://Fluent.ai is releasing for free!
It's a simple SLU task where the goal is to predict the "action", "object", and "location" for spoken commands.
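For example, each utterance carries a three-slot label like the following (the slot values shown are illustrative of the dataset's command vocabulary):

```python
example = {
    "transcription": "Turn on the lights in the kitchen",
    "action": "activate",
    "object": "lights",
    "location": "kitchen",
}

# One common setup: flatten the triple into a single intent class, so a
# model predicts one label per utterance.
intent = (example["action"], example["object"], example["location"])
```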
We hope that you find our dataset, #PyTorch code, pre-trained models, and paper useful. Even if you don't want to do SLU, the dataset can be used as a good old #classification task, adding to the list of open-source #audio datasets. Enjoy!
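If you do want to treat it as plain classification, a sketch like this works; the file path and column names below follow the released CSV layout, but check them against your copy of the data:

```python
import pandas as pd

df = pd.read_csv("fluent_speech_commands_dataset/data/train_data.csv")

# Collapse the three slots into a single class label per utterance.
df["intent"] = df["action"] + "|" + df["object"] + "|" + df["location"]
classes = sorted(df["intent"].unique())
print(f"{len(df)} utterances, {len(classes)} intent classes")
```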