Forwarded from DLeX: AI Python (Milad Farzalizadeh)
Complete dataset of Iranian politicians' tweets on Twitter, for text-processing (NLP) work + accompanying code
#dataset
https://github.com/miladfa7/Iranian-politicians-twitter-dataset-persian
@ai_python
✳️ Persian Wikipedia dataset containing (all) Persian articles up to 12 Mordad 1399 (2 August 2020)
🔹 Suitable for NLP and data-mining work
The dataset contains:
739,870 articles
4,004,765 sentences
94,002,094 words
#dataset
🌐 https://github.com/miladfa7/Persian-Wikipedia-Dataset
❇️ @ai_python
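Corpus statistics like the article/sentence/word counts above can be reproduced from a raw text dump. A minimal sketch, assuming a hypothetical plain-text dump with blank lines between articles (the repo's actual file format may differ):

```python
import re

def corpus_stats(dump_text):
    """Count articles, sentences, and words in a plain-text dump
    where articles are separated by blank lines."""
    articles = [a for a in dump_text.split("\n\n") if a.strip()]
    # Split sentences on Latin and Arabic/Persian sentence-ending punctuation.
    sentences = [s for a in articles
                 for s in re.split(r"[.!?\u061F]+", a) if s.strip()]
    words = [w for s in sentences for w in s.split()]
    return len(articles), len(sentences), len(words)

sample = "This is one article. It has two sentences.\n\nSecond article here!"
print(corpus_stats(sample))  # (2, 3, 11)
```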
A dataset of Joe Biden's tweets from 2007 to 2020:
https://www.kaggle.com/rohanrao/joe-biden-tweets/tasks?taskId=2527&utm_medium=social&utm_source=twitter.com&utm_campaign=task+published
#dataset #data
❇️ @AI_Python
🗣 @AI_Python_arXiv
✴️ @AI_Python
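Once the Kaggle CSV is downloaded, slicing the tweets by year takes only the standard library. A sketch on inline sample rows; the column names `timestamp` and `tweet` are assumptions, so check the real file's header:

```python
import csv
import io

# Illustrative rows; the real Kaggle CSV's columns may be named differently.
raw = """timestamp,tweet
2007-10-24 21:45:00,Getting ready for the debate.
2019-04-25 10:00:00,Announcing my candidacy.
2020-11-07 16:30:00,Thank you America.
"""

def tweets_in_year(csv_text, year):
    """Return the tweets whose timestamp falls in the given year."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["tweet"] for row in reader
            if row["timestamp"].startswith(str(year))]

print(tweets_in_year(raw, 2020))  # ['Thank you America.']
```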
Release of Google's Objectron dataset
https://github.com/google-research-datasets/objectron
#dataset #data
❇️ @AI_Python
🗣 @AI_Python_arXiv
✴️ @AI_Python_EN
Hot paper of the day for anyone interested in summarization and text processing. This paper tackles summarizing the cited references of Wikipedia articles.
WikiAsp: A Dataset for Multi-domain Aspect-based Summarization
Paper:
https://arxiv.org/abs/2011.07832
Data:
https://github.com/neulab/wikiasp
#paper #summarization #NLP #data #dataset
❇️ @AI_Python
🗣 @AI_Python_arXiv
✴️ @AI_Python_EN
A dataset of 14M articles (CSV file, ~14.12 GB) for medical NLP pretraining via abbreviation disambiguation, appearing in EMNLP's Clinical NLP workshop.
Details: https://redd.it/jx63fd
https://www.aclweb.org/anthology/2020.clinicalnlp-1.15/
Code: https://github.com/BruceWen120/medal
#data #dataset #NLP
❇️ @AI_Python
🗣 @AI_Python_arXiv
✴️ @AI_Python_EN
A multilingual dataset for commonsense reasoning.
https://github.com/cambridgeltl/xcopa
#data #dataset
❇️ @AI_Python
🗣 @AI_Python_arXiv
✴️ @AI_Python_En
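XCOPA items follow the COPA setup: a premise, two alternatives, and a cause/effect question. A sketch of one such item as a Python dict; the field names are assumed from the COPA format, so check the repo's data files for the exact schema:

```python
# One XCOPA-style item (field names assumed; see the repo for the real schema).
example = {
    "premise": "The man turned on the faucet.",
    "choice1": "The toilet filled with water.",
    "choice2": "Water flowed from the spout.",
    "question": "effect",  # is the correct choice a cause or an effect?
    "label": 1,            # 0 -> choice1, 1 -> choice2
}

def gold_answer(item):
    """Map the 0/1 label back to the chosen alternative."""
    return item["choice1"] if item["label"] == 0 else item["choice2"]

print(gold_answer(example))  # Water flowed from the spout.
```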
Apple publicly releases its first large image #dataset for #AI #research: 74K high-resolution HDR computer-generated images of realistic indoor scenes, with 1.9 TB of pixel-perfect labels.
Dataset download link:
http://github.com/apple/ml-hypersim
#dataset #data #AI
❇️ @AI_Python
🗣 @AI_Python_arXiv
✴️ @AI_Python_En
MeDAL: Medical Abbreviation Disambiguation Dataset for NLU Pretraining
Github: https://github.com/BruceWen120/medal
Paper: https://arxiv.org/abs/2012.13978v1
Dataset: https://www.kaggle.com/xhlulu/medal-emnlp
Pre-trained: https://huggingface.co/xhlu/electra-medal
#AI #resources #paper #NLP #NLU #data #dataset
🗣 @AI_Python_arXiv
✴️ @AI_Python_EN
❇️ @AI_Python
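Abbreviation disambiguation means picking the right expansion of a term like "RA" from its context. MeDAL pretrains neural models for this; purely as an illustration, a toy context-overlap baseline might look like:

```python
# Toy disambiguator: pick the expansion whose cue words overlap the
# context most. This is NOT the MeDAL method, only an illustration.
EXPANSIONS = {
    "RA": {
        "rheumatoid arthritis": {"joint", "pain", "inflammation"},
        "right atrium": {"heart", "chamber", "blood"},
    }
}

def disambiguate(abbrev, context):
    """Return the expansion with the largest cue-word overlap."""
    words = set(context.lower().split())
    candidates = EXPANSIONS[abbrev]
    return max(candidates, key=lambda e: len(candidates[e] & words))

print(disambiguate("RA", "patient reports joint pain and swelling"))
# rheumatoid arthritis
```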
Hot paper of the day
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection
- A first-of-its-kind large synthetic training dataset for online hate classification, created from scratch with trained annotators over multiple rounds of dynamic data collection.
Paper:
https://arxiv.org/abs/2012.15761
Dataset:
https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset
#paper #AI #data #dataset
🗣 @AI_Python_arXiv
✴️ @AI_Python_EN
❇️ @AI_Python
Data on adverse reactions to the COVID vaccine can be found on this page of the CDC site.
https://wonder.cdc.gov/vaers.html
#data #dataset
❇️ @AI_Python
🗣 @AI_Python_arXiv
✴️ @AI_Python_EN
A very valuable dataset for research work and more:
https://github.com/Helsinki-NLP/Tatoeba-Challenge/blob/master/Backtranslations.md
#data #dataset #resources #NLP
❇️ @AI_Python
🗣 @AI_Python_arXiv
✴️ @AI_Python_EN
ConditionalQA is a question-answering dataset of complex questions with conditional answers, i.e. answers that are only true when certain conditions apply.
It can motivate research on complex question answering over long documents.
ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers
https://paperswithcode.com/dataset/conditionalqa
#paper #data #dataset
❇️ @AI_Python
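One way to picture a conditional answer is as answer text plus a list of conditions that must all hold. The schema below is hypothetical, not the dataset's actual format:

```python
# Hypothetical conditional-answer record: the answer only holds when
# every listed condition is satisfied (see the dataset page for the
# real schema).
answer = {
    "text": "You are eligible for the benefit.",
    "conditions": ["you are over 18", "you live in the UK"],
}

def answer_applies(ans, satisfied_conditions):
    """True only if every condition attached to the answer is satisfied."""
    return all(c in satisfied_conditions for c in ans["conditions"])

print(answer_applies(answer, {"you are over 18", "you live in the UK"}))  # True
print(answer_applies(answer, {"you are over 18"}))  # False
```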
"Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development":
https://morgan-klaus.com/pdfs/pubs/Scheuerman-CSCW2021-datapolitics.pdf
#dataset #computer_vision #paper
❇️ @AI_Python
A dataset of Persian abusive words
https://github.com/mohamad-dehghani/Persian-Abusive-Words
#dataset #data
❇️ @AI_Python
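A typical use of such a wordlist is masking matches in text. A sketch with placeholder Latin words standing in for the actual Persian list (real Persian text also needs normalization, and `\b` word boundaries behave differently outside ASCII):

```python
import re

# Placeholder wordlist; in practice load the Persian list from the
# repo's data files.
ABUSIVE = {"badword", "slur"}

def mask(text, wordlist=ABUSIVE):
    """Replace listed words with asterisks, matching whole words only."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, wordlist)) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "*" * len(m.group()), text)

print(mask("that badword again"))  # that ******* again
```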
Yandex open-sources YaLM 100B ("Yet another Language Model", 100 billion parameters)
YaLM 100B was trained on 2 terabytes of text: the Pile dataset plus web pages, including not only Wikipedia, news articles, and books, but also GitHub and arxiv.org. Yandex applied the YaLM generative neural networks in its recent Y1 search update, and they already help answer queries in Yandex Search and Alice.
Github: https://github.com/yandex/YaLM-100B
#dataset
❇️ @AI_Python
The dataset contains ~340 MB of text.
It is a collection of Tasnim News articles, each labeled with its news category. I have put the crawler itself on my GitHub at the address below:
https://github.com/pourmand1376/TasnimNewsCrawler
and the #dataset has been uploaded to Kaggle:
https://www.kaggle.com/datasets/amirpourmand/tasnimdataset
This one covers only Tasnim; the scraped dataset itself is included as well.
❇️ @AI_Python