#news
Multilingual BERT now supports Persian. "Persian (Farsi)" appears in the official list of supported languages:
https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages
A related Medium post on multilingual BERT:
Hallo multilingual BERT, cómo funcionas?
https://medium.com/omnius/hallo-multilingual-bert-c%C3%B3mo-funcionas-2b3406cc4dc2
A Persian post by Mr. Khoshmehr on BERT and its applications:
Introducing BERT: a transformation in NLP
http://blog.class.vision/1397/09/bert-in-nlp/
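To try the multilingual model above on Persian text, here is a minimal sketch. It assumes the Hugging Face transformers package (API details vary across versions) and the public bert-base-multilingual-cased checkpoint, which covers the languages in the list linked above.
```python
# Minimal sketch: encode a Persian sentence with multilingual BERT.
# Assumes the Hugging Face `transformers` package; `bert-base-multilingual-cased`
# is the public multilingual checkpoint that includes Persian (Farsi).
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")

sentence = "برت از فارسی پشتیبانی می‌کند."  # "BERT supports Persian."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per WordPiece token (hidden size 768 for this checkpoint).
print(outputs.last_hidden_state.shape)
```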
#bert #nlp
XLM – Enhancing BERT for Cross-lingual Language Model
https://www.lyrn.ai/2019/02/11/xlm-cross-lingual-language-model/
BERT-related posts on the channel:
https://t.me/cvision/957
https://t.me/cvision/907
https://t.me/cvision/745
#BERT
Language, trees, and geometry in neural networks
A visualization technique to understand BERT.
https://twitter.com/burkov/status/1139391818443808769
#bert #NLP
#paper #implementation
XLNet: Generalized Autoregressive Pretraining for Language Understanding
#XLNet outperforms #BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking. Code and comparisons here:
Source code (TensorFlow):
https://github.com/zihangdai/xlnet
Paper:
https://arxiv.org/abs/1906.08237v1
#NLP
New Google Brain Optimizer Reduces BERT Pre-Training Time From Days to Minutes
Pre-training time for the BERT language model drops from three days to 76 minutes thanks to a new optimizer!
Google Brain researchers have proposed LAMB (Layer-wise Adaptive Moments optimizer for Batch training), a new optimizer that reduces pre-training time for BERT (Bidirectional Encoder Representations from Transformers) from three days to just 76 minutes.
Paper: https://arxiv.org/abs/1904.00962
Blog post: https://medium.com/syncedreview/new-google-brain-optimizer-reduces-bert-pre-training-time-from-days-to-minutes-b454e54eda1d
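For intuition, below is a minimal per-layer sketch of the LAMB update as described in the paper: Adam-style moments plus a layer-wise trust ratio. It is an illustration only, not the authors' TensorFlow implementation; the paper's clipping function on the weight norm is omitted and the default hyperparameters are placeholders.
```python
import torch

def lamb_step(w, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    """One illustrative LAMB update for a single layer's weight tensor."""
    # Adam-style first and second moments with bias correction.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)

    # Adam direction plus decoupled weight decay.
    update = m_hat / (v_hat.sqrt() + eps) + weight_decay * w

    # Layer-wise trust ratio: scale the step by ||w|| / ||update||,
    # so each layer's step size is normalized by the size of its own weights.
    w_norm, u_norm = w.norm(), update.norm()
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0

    w -= lr * trust_ratio * update
    return w, m, v
```
The layer-wise trust ratio is what keeps training stable at the very large batch sizes used to shorten pre-training.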
#BERT #language_model #optimizer
#paper #code #BERT #RoBERTa
RoBERTa: A Robustly Optimized BERT Pretraining Approach
As you know, the choice of hyperparameter values has a major impact on the final result of training a network.
In a study by researchers from Facebook AI and the University of Washington, the team concluded that the BERT model as originally released was not trained well enough: with a better choice of hyperparameters it could outperform every available model, even those published after it.
"We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it."
The paper describes the approach taken to address this in more detail:
https://arxiv.org/pdf/1907.11692
They have also released all the code and models, which you can find here:
https://github.com/pytorch/fairseq/tree/master/examples/roberta
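As a quick way to try the released checkpoints, here is a minimal sketch following the fairseq RoBERTa examples page. It assumes torch.hub can fetch 'pytorch/fairseq' and the roberta.base checkpoint; the exact entry points may change between fairseq versions.
```python
import torch

# Download a pretrained RoBERTa checkpoint via torch.hub (per the fairseq examples page).
roberta = torch.hub.load("pytorch/fairseq", "roberta.base")
roberta.eval()  # disable dropout for deterministic features

# BPE-encode a sentence and extract contextual features.
tokens = roberta.encode("Hello world!")
features = roberta.extract_features(tokens)
print(features.shape)  # (1, number_of_tokens, hidden_size)
```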
Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT
Knowledge Distillation — Transferring generalization capabilities
Knowledge distillation (sometimes also referred to as teacher-student learning) is a compression technique in which a small model is trained to reproduce the behavior of a larger model (or an ensemble of models). It was introduced by Bucila et al. and generalized by Hinton et al. a few years later.
Another way to understand distillation is that it prevents the model from becoming too confident in its predictions (similar in spirit to label smoothing).
We want to compress a large language model (like BERT) using distillation. For the distillation loss we use the Kullback-Leibler divergence: optimizing it is equivalent to optimizing the cross-entropy against the teacher's soft targets, since both yield the same gradients with respect to the student distribution.
Blog post: https://medium.com/huggingface/distilbert-8cf3380435b5
Code: https://github.com/huggingface/pytorch-transformers/tree/master/examples/distillation
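For illustration, here is a minimal PyTorch sketch of that soft-target loss: temperature-softened softmax on both teacher and student logits, matched with KL divergence (the temperature value is just an example, not the one used for DistilBERT).
```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # 'batchmean' matches the mathematical definition of KL; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures (Hinton et al.).
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```
In practice this term is combined with the usual supervised loss on the student, plus the additional training objectives described in the blog post.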
#language_model #BERT
Fast-Bert
This library will help you build and deploy BERT-based models within minutes:
Fast-Bert is the deep learning library that allows developers and data scientists to train and deploy BERT and XLNet based models for natural language processing tasks beginning with Text Classification.
FastBert is built on the solid foundations of the excellent Hugging Face BERT PyTorch library, is inspired by fast.ai, and strives to make cutting-edge deep learning accessible to the vast community of machine learning practitioners.
With FastBert, you will be able to:
Train (more precisely fine-tune) BERT, RoBERTa and XLNet text classification models on your custom dataset.
Tune model hyper-parameters such as epochs, learning rate, batch size, optimiser schedule and more.
Save and deploy trained model for inference (including on AWS Sagemaker).
Fast-Bert supports both multi-class and multi-label text classification; in due course it will also support other NLU tasks such as Named Entity Recognition, Question Answering and custom-corpus fine-tuning.
Blog post: https://medium.com/huggingface/introducing-fastbert-a-simple-deep-learning-library-for-bert-models-89ff763ad384
Code: https://github.com/kaushaltrivedi/fast-bert
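A condensed usage sketch in the spirit of the project README is below. Treat every name here as an assumption: class names, parameters, file names and paths follow the README at the time of writing and may differ across fast-bert versions.
```python
import logging
import torch
# Names below follow the fast-bert README at the time of writing; check the repo
# for the current API. Data paths and CSV files are placeholders.
from fast_bert.data_cls import BertDataBunch
from fast_bert.learner_cls import BertLearner
from fast_bert.metrics import accuracy

logger = logging.getLogger()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

databunch = BertDataBunch("data/", "labels/",
                          tokenizer="bert-base-uncased",
                          train_file="train.csv", val_file="val.csv",
                          label_file="labels.csv",
                          text_col="text", label_col="label",
                          batch_size_per_gpu=16, max_seq_length=256,
                          multi_gpu=False, multi_label=False,
                          model_type="bert")

learner = BertLearner.from_pretrained_model(
    databunch, pretrained_path="bert-base-uncased",
    metrics=[{"name": "accuracy", "function": accuracy}],
    device=device, logger=logger, output_dir="output/",
    is_fp16=False, multi_gpu=False, multi_label=False)

learner.fit(epochs=4, lr=6e-5, schedule_type="warmup_cosine")
learner.save_model()
```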
#language_model #BERT
#implementation #tutorial
A new example in the official Keras documentation,
this time using #BERT and #Transformers for a question-answering task:
BERT (from HuggingFace Transformers) for Text Extraction
https://keras.io/examples/nlp/text_extraction_with_bert/
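In the same spirit as that tutorial, here is a minimal sketch of the model part only: a pretrained BERT encoder from Hugging Face Transformers with two token-level heads that predict the start and end of the answer span. The sequence length and checkpoint name are example choices; the full keras.io example also covers SQuAD preprocessing and training.
```python
import tensorflow as tf
from transformers import TFBertModel

max_len = 384  # example sequence length

# Pretrained encoder; API details may differ slightly across transformers versions.
encoder = TFBertModel.from_pretrained("bert-base-uncased")

input_ids = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(max_len,), dtype=tf.int32, name="attention_mask")

sequence_output = encoder(input_ids, attention_mask=attention_mask)[0]

# One logit per token for the answer start, and one per token for the answer end.
start_logits = tf.keras.layers.Flatten()(tf.keras.layers.Dense(1)(sequence_output))
end_logits = tf.keras.layers.Flatten()(tf.keras.layers.Dense(1)(sequence_output))
start_probs = tf.keras.layers.Softmax()(start_logits)
end_probs = tf.keras.layers.Softmax()(end_logits)

model = tf.keras.Model(inputs=[input_ids, attention_mask],
                       outputs=[start_probs, end_probs])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy())
```
The labels for the two outputs are simply the start and end token indices of the answer within the tokenized context.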