Deep learning
Yann LeCun, Yoshua Bengio & Geoffrey Hinton
http://www.nature.com/nature/journal/v521/n7553/full/nature14539.html
#Deep_learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object #recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep #convolutional nets have brought about breakthroughs in processing #images, #video, #speech and #audio, whereas #recurrent nets have shone light on sequential data such as #text and speech.
Mark #Zuckerberg, the founder of Facebook, built a #smart home that uses modern AI methods such as object recognition, face recognition, speech recognition, natural language processing, and more.
Zuckerberg writes about his motivation for the project and the steps he took:
https://www.facebook.com/notes/mark-zuckerberg/building-jarvis/10154361492931634/
My personal challenge for 2016 was to build a simple AI for my home, like Jarvis in the movie Iron Man...
Building Jarvis:
- Getting Started: Connecting the Home
- #Natural_Language
- #Vision and #Face_Recognition
- Messenger Bot
- Voice and #Speech_Recognition
- Facebook Engineering Environment
-------------
Vision and Face Recognition:
About one-third of the human #brain is dedicated to vision, and there are many important #AI problems related to understanding what is happening in images and videos. These problems include #tracking (e.g., is Max awake and moving around in her crib?), #object_recognition (e.g., is that Beast or a rug in that room?), and face recognition (e.g., who is at the door?).
Face recognition is a particularly difficult version of object recognition because most people look relatively similar compared to telling apart two random objects — for example, a sandwich and a house. But Facebook has gotten very good at face recognition for identifying when your friends are in your photos. That expertise is also useful when your friends are at your door and your AI needs to determine whether to let them in.
To do this, I installed a few cameras at my door that can capture images from all angles. AI systems today cannot identify people from the back of their heads, so having a few angles ensures we see the person's face. I built a simple server that continuously watches the cameras and runs a two-step process: first, it runs face detection to see if any person has come into view, and second, if it finds a face, then it runs face recognition to identify who the person is. Once it identifies the person, it checks a list to confirm I'm expecting that person, and if I am then it will let them in and tell me they're here.
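The two-step pipeline described above can be sketched as a small control loop. The `detect_face` / `recognize_face` stubs below are hypothetical placeholders for real detection and recognition models, and the guest list is made up:

```python
# Sketch of the two-step door-camera loop: detection first, then recognition,
# then a check against the expected-guest list. The helpers are stand-ins for
# real computer-vision models (frames here are plain dicts, not pixels).

EXPECTED_GUESTS = {"priscilla", "chan"}  # hypothetical guest list

def detect_face(frame):
    """Step 1: face detection. Return a face crop, or None if no face is seen."""
    return frame.get("face")

def recognize_face(face):
    """Step 2: face recognition. Map a face crop to an identity string."""
    return face.get("identity")

def handle_frame(frame):
    """Run detection, then recognition, then consult the guest list."""
    face = detect_face(frame)
    if face is None:
        return "no_face"
    who = recognize_face(face)
    if who in EXPECTED_GUESTS:
        return f"open_door:{who}"   # expected guest: let them in
    return f"notify_owner:{who}"    # unexpected person: just notify

print(handle_frame({"face": {"identity": "priscilla"}}))  # open_door:priscilla
print(handle_frame({}))                                   # no_face
```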
This type of visual AI system is useful for a number of things, including knowing when Max is awake so it can start playing music or a Mandarin lesson, or solving the context problem of knowing which room in the house we're in so the AI can correctly respond to context-free requests like "turn the lights on" without providing a location. Like most aspects of this AI, vision is most useful when it informs a broader model of the world, connected with other abilities like knowing who your friends are and how to open the door when they're here. The more context the system has, the smarter it gets overall.
#mark_zuckerberg #smart_home
Speech audio enhancement with deep convolutional networks, using the TensorFlow framework:
WaveMedic: Convolutional Neural Networks for #Speech Audio #Enhancement:
http://cs229.stanford.edu/proj2016/report/FisherScherlis-WaveMedic-project.pdf
-------------
Related (work from DeepMind):
The WaveNet neural network architecture directly generates a raw audio waveform, showing excellent results in text-to-speech and general audio generation (see the DeepMind blog post and paper for details).
A #TensorFlow implementation of DeepMind's WaveNet paper:
https://github.com/ibab/tensorflow-wavenet
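The core building block of WaveNet is the causal dilated convolution: each output sample depends only on the current and past samples, and stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially. A minimal plain-Python sketch with a toy kernel (the real model uses many learned channels plus gated activations):

```python
# Minimal causal dilated convolution, the heart of the WaveNet architecture.
# output[t] = sum_k kernel[k] * signal[t - k*dilation]; indices before the
# start of the signal are skipped, so the output never "sees" the future.

def causal_dilated_conv(signal, kernel, dilation):
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:          # causal: only current and past samples
                acc += w * signal[idx]
        out.append(acc)
    return out

x = [1.0, 2.0, 3.0, 4.0]
print(causal_dilated_conv(x, [1.0, 1.0], dilation=2))
# each output mixes x[t] and x[t-2]: [1.0, 2.0, 4.0, 6.0]
```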
#paper
A new and interesting paper from Google Brain + #TensorFlow code
Training one neural network on several different tasks at the same time!
One Model To Learn Them All
(Submitted on 16 Jun 2017)
pic: http://deepnn.ir/tensorflow-telegram-files/tensor2tensor.PNG
🔗abstract:
https://arxiv.org/abs/1706.05137
🔗Paper:
https://arxiv.org/pdf/1706.05137.pdf
🔗Code:
https://github.com/tensorflow/tensor2tensor
Deep learning is used in many areas, such as speech recognition, image classification, translation, and more.
Until now, however, the usual approach has been to pick a deep model with a task-specific architecture for each problem; after parameter tuning and training of the network weights it worked well for that problem, but it could not be reused for other problems.
This paper uses a single model that achieves good results across several domains, trained on multiple tasks. Specifically, this one model is trained simultaneously on ImageNet, several translation tasks, image captioning, speech recognition, and English parsing.
On many of these problems the model is competitive with the state-of-the-art models trained only for that task, and in some domains it reportedly performs better than when it is trained on that domain alone.
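The multi-task idea can be sketched as one shared trunk feeding small task-specific heads, with training batches alternating between tasks. The scalar "layers" and task names below are illustrative stand-ins; the real MultiModel uses modality-specific nets feeding a shared encoder/decoder body:

```python
# Toy sketch of joint multi-task training with shared parameters:
# every task reuses the same trunk, then applies its own small head.

def linear(x, w):
    """Toy layer: scale each element by a single weight."""
    return [w * v for v in x]

shared_w = 1.0                                              # shared trunk
heads = {"translate": 0.5, "caption": 2.0, "speech": 1.5}   # per-task heads

def forward(task, x):
    """Shared trunk first, then the head for the requested task."""
    return linear(linear(x, shared_w), heads[task])

# One batch per task in turn; a real loop would also backprop each task's
# loss through both the head and the shared trunk.
for task, x in [("translate", [1.0]), ("speech", [2.0]), ("caption", [3.0])]:
    print(task, forward(task, x))
```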
#Google_Brain #tensor2tensor
#deep_learning
#speech_recognition, #image_classification, #translation
#source_code #paper
In this approach, which Facebook open-sourced a few days ago, speech recognition is trained end-to-end.
Open sourcing wav2letter++, the fastest state-of-the-art speech system, and flashlight, an ML library going native
https://code.fb.com/ai-research/wav2letter/
CNN architectures are competitive with #recurrent architectures for tasks in which modeling long-range dependencies is important, such as #language_modeling, machine translation, and #speech_synthesis. In end-to-end #speech_recognition, however, recurrent architectures are still more prevalent for both acoustic and language modeling.
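The fully convolutional idea behind wav2letter can be sketched as a stack of 1-D convolutions mapping acoustic frames to per-frame letter scores, with context growing through the stacked receptive fields. The frame values and kernel weights below are toy numbers; the real model trains many wide filter banks with a CTC-style criterion:

```python
# Two stacked 1-D convolutions over a sequence of scalar frame features:
# layer 1 sees 2 frames of context, and after layer 2 each output has seen
# 3 frames, i.e. context grows with depth instead of with recurrence.

def conv1d(frames, kernel):
    """Valid 1-D convolution over a sequence of scalar features."""
    k = len(kernel)
    return [sum(kernel[j] * frames[i + j] for j in range(k))
            for i in range(len(frames) - k + 1)]

frames = [0.1, 0.4, 0.9, 0.4, 0.1]     # pretend acoustic features
h = conv1d(frames, [0.5, 0.5])         # layer 1: local smoothing
scores = conv1d(h, [1.0, -1.0])        # layer 2: local contrast
print(scores)                          # one score per remaining position
```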
#tutorial
Recognizing Speech Commands Using Recurrent Neural Networks with Attention
https://towardsdatascience.com/recognizing-speech-commands-using-recurrent-neural-networks-with-attention-c2b2ba17c837
Source code:
A Keras implementation of neural attention model for speech command recognition
https://github.com/douglas125/SpeechCmdRecognition
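The attention mechanism in the linked model can be sketched as softmax-weighted pooling over the RNN's per-timestep hidden states: score each timestep, normalize the scores, and take the weighted sum as the utterance representation. The hidden states and scores below are hand-picked for illustration; the real Keras model learns both:

```python
import math

# Attention pooling over a sequence of hidden states: softmax the per-timestep
# scores, then return the weighted sum of the hidden vectors.

def attention_pool(hidden, scores):
    """hidden: list of per-timestep vectors; scores: one scalar per timestep."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]        # softmax over time
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]

hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention_pool(hidden, [0.0, 0.0, 0.0]))   # uniform weights: the mean
print(attention_pool(hidden, [10.0, 0.0, 0.0]))  # sharp weights: ~ hidden[0]
```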
Related:
Source code and paper for wav2letter++, an end-to-end approach
https://t.me/cvision/850
Session on attention in RNNs:
https://www.aparat.com/v/SPZzH
Session on audio processing with RNNs:
https://www.aparat.com/v/cEKal
#attention #rnn #lstm #keras #Speech
#source_code
#Mozilla has released an open-source #speech recognition model and data. The word error rate is 6.5%, which is close to human performance.
Project DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques, based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow project to make the implementation easier.
Data: https://voice.mozilla.org/data
400k recordings, 500 hours of speech.
Model: https://github.com/mozilla/DeepSpeech
TensorFlow implementation of Baidu's DeepSpeech architecture.
https://deepspeech.readthedocs.io/en/latest/
DeepSpeech’s code documentation!
Related:
https://t.me/cvision/875
https://t.me/cvision/850
#speech_recognition #Tensorflow
#dataset
A massive audio and speech dataset released by #NASA.
About 19,000 hours of recorded speech from Apollo 11!
Massive Speech Dataset !!! 19,000 hours of Apollo-11 recordings
TASK#1: Speech Activity Detection: SAD
TASK#2: Speaker Diarization: SD
TASK#3: Speaker Identification: SID
TASK#4: Automatic Speech Recognition: ASR
TASK#5: Sentiment Detection: SENTIMENT
http://fearlesssteps.exploreapollo.org/
#NASA #speech #sentiment #dataset
#source_code
Real Time Trigger Word Detection with #Keras
https://github.com/Tony607/Keras-Trigger-Word
Tutorial videos on this topic:
https://www.aparat.com/v/cEKal
https://www.aparat.com/v/1aGeQ
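In the real-time trigger-word setup, the network emits a per-timestep probability that the trigger word just ended, and the application fires when that probability crosses a threshold. The thresholding-with-refractory logic below is an illustrative sketch with made-up probabilities, not the repo's exact post-processing:

```python
# Turn a stream of per-timestep trigger probabilities into discrete trigger
# events: fire when the probability crosses the threshold, then hold off for
# a few timesteps (refractory period) to avoid double triggers.

def detect_triggers(probs, threshold=0.5, refractory=3):
    """Return the timestep indices where a trigger fires."""
    fires, cooldown = [], 0
    for t, p in enumerate(probs):
        if cooldown > 0:
            cooldown -= 1            # still in the hold-off window
        elif p > threshold:
            fires.append(t)
            cooldown = refractory    # suppress immediate re-triggers
    return fires

probs = [0.1, 0.2, 0.9, 0.8, 0.95, 0.1, 0.7]
print(detect_triggers(probs))  # [2, 6]
```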
#speech #trigger_word_detection #real_time #keras
Clone a voice in 5 seconds to generate arbitrary speech in real-time.
Give this software just 5 seconds of your voice and it can generate any text you want in your own voice! I tested it on a CPU; it will probably give much more impressive results on a GPU.
#AI #voice #realtime #application #CUDA #python #pytorch #torch #artificial #intelligence #speech #neural #network
https://github.com/CorentinJ/Real-Time-Voice-Cloning
🙏Thanks to: @pythony