The 10 Biggest datasets of 2018
0) Open Images V4 from Google AI on April 30: contains 15.4M bounding boxes for 600 categories on 1.9M images.
Paper: https://lnkd.in/fm4xiUm
1) MURA from Stanford University ML Group on May 24: radiographic image dataset.
Paper: https://lnkd.in/fBy5szB
2) BDD100K from BAIR, Georgia Tech, Peking University, and Uber AI on May 30: self-driving car dataset.
Paper: https://lnkd.in/f-sYj9k
3) SQuAD 2.0 from Stanford on June 11: QA dataset.
Paper: https://lnkd.in/fYc6c5W
4) CoQA from Stanford on August 21: QA dataset.
Paper: https://lnkd.in/fKvuTvE
5) Spider 1.0 from Yale University on September 24: cross-domain semantic parsing and text-to-SQL dataset.
Paper: https://lnkd.in/fWyR2x8
6) HotpotQA from Carnegie Mellon, Stanford, and Montreal on September 25: QA dataset built on Wikipedia.
Paper: https://lnkd.in/fTtTgZt
7) Tencent ML Images from Tencent AI Lab on October 18: largest open-source multi-label image dataset.
Paper: https://lnkd.in/ffV6VD5
8) Tencent AI Lab Embedding Corpus for Chinese Words and Phrases on October 19: embeddings dataset.
Paper: https://lnkd.in/ffV6VD5
9) fastMRI from NYU and Facebook AI on November 26: knee MRI image dataset.
Paper: https://lnkd.in/fQuUDNk
Read: https://lnkd.in/fXU9Kr6
#dataset #datasets
@AI_Python_EN
@AI_Python_Arxiv
Have you ever tried shuffling a large dataset?
Here the author presents an algorithm for shuffling datasets that are too large to fit in memory.
You will learn the following:
0. Why shuffle in the first place?
1. A 2-pass shuffle algorithm is tested
2. How to deal with oversized piles
3. Parallelization & more
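The two-pass idea in the list above can be sketched roughly as follows. This is a minimal in-memory illustration, not the article's code: the function name `two_pass_shuffle` and its parameters are made up for this example, and plain Python lists stand in for the on-disk pile files that a real large-dataset shuffle would use.

```python
import random


def two_pass_shuffle(records, num_piles, seed=None):
    """Shuffle `records` in two passes (hypothetical helper for illustration)."""
    rng = random.Random(seed)

    # Pass 1: stream over the records once and deal each one into a
    # randomly chosen pile. On disk, each pile would be a separate file.
    piles = [[] for _ in range(num_piles)]
    for record in records:
        piles[rng.randrange(num_piles)].append(record)

    # Pass 2: each pile is assumed small enough to fit in memory, so
    # shuffle it there and append it to the output.
    shuffled = []
    for pile in piles:
        rng.shuffle(pile)
        shuffled.extend(pile)
    return shuffled


if __name__ == "__main__":
    print(two_pass_shuffle(range(20), num_piles=4, seed=0))
```

The number of piles is the main knob: it should be large enough that a single pile fits in memory. A pile that still comes out too large is the "oversized piles" case from item 2.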
Link to article: https://lnkd.in/dZ8-tyJ
Gist on #Github with a cool visualization of the shuffle: https://lnkd.in/d8iK8fd
#algorithms #github #datasets #deeplearning #machinelearning
@AI_Python
@AI_Python_Arxiv
@AI_Python_EN
Deep Convolutional Sum-Product Networks for Probabilistic Image Representations
Sum-Product Networks (SPNs) are hierarchical probabilistic graphical models capable of fast and exact inference.
Applications of SPNs to real-world data such as large image datasets have been fairly limited in previous literature. This paper introduces Convolutional Sum-Product Networks (ConvSPNs), which exploit the inherent structure of images in a way similar to deep convolutional neural networks, optionally with weight sharing.
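To make "fast and exact inference" concrete, here is a toy sketch of a plain SPN over two binary variables. It is not a ConvSPN and is not taken from the paper; the structure, the weights, and the names `bernoulli_leaf` and `toy_spn` are assumptions chosen only for illustration. Marginalising a variable amounts to setting its leaf to 1, so a marginal costs one bottom-up pass, the same as a full joint query.

```python
def bernoulli_leaf(p_one, value):
    """Leaf distribution over a binary variable.

    `value` is 0, 1, or None. None means the variable is marginalised
    out, in which case the leaf evaluates to 1.
    """
    if value is None:
        return 1.0
    return p_one if value == 1 else 1.0 - p_one


def toy_spn(x1, x2):
    """Sum node over two product nodes (weights are illustrative)."""
    # Product nodes multiply leaves over disjoint variables.
    component_a = bernoulli_leaf(0.9, x1) * bernoulli_leaf(0.2, x2)
    component_b = bernoulli_leaf(0.1, x1) * bernoulli_leaf(0.7, x2)
    # The sum node mixes the components with weights that sum to 1.
    return 0.6 * component_a + 0.4 * component_b


if __name__ == "__main__":
    print("P(x1=1, x2=0) =", toy_spn(1, 0))
    print("P(x1=1)       =", toy_spn(1, None))
```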
#neuralnetworks #datasets #deeplearning
Paper: https://lnkd.in/ei4Gqjy
@AI_Python_EN
@AI_Python
@AI_Python_arXiv
It is time we shared the dataset with everyone. This is a collection of text from Tamil news articles. It has around 7 million lines of text, all cleaned up and ready to be used for language modelling tasks, in case anyone wants to try. You can use the code from the Git repo below to get started.
Dataset:
https://lnkd.in/fzg3xyM
Code:
https://lnkd.in/fezt4M8 #datasets
@AI_Python_EN
Too many spelling errors in your dataset?
Peter Norvig (Research Director at Google, previously Director of Search Quality) helped improve search quality by showing how to correct spelling errors with simple edits (splits, deletes, transposes, replaces, and inserts). You can read the comprehensive guide (with Python code) on his website: https://lnkd.in/fEb3v2a
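The edit operations named above can be sketched along the lines of Norvig's well-known approach. The `edits1` function below generates every string one edit away from a word. The tiny `WORD_COUNTS` vocabulary and the `correct` helper are hypothetical stand-ins for this example; the guide itself builds word frequencies from a large text corpus.

```python
import string
from collections import Counter

# Hypothetical toy vocabulary with made-up frequencies.
WORD_COUNTS = Counter({"the": 100, "data": 50, "dataset": 30, "spelling": 12})


def edits1(word, letters=string.ascii_lowercase):
    """Return all strings that are one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Pick the most frequent known word within one edit, if any."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word


if __name__ == "__main__":
    for typo in ["teh", "speling", "datase"]:
        print(typo, "->", correct(typo))
```

Words with no known neighbour are returned unchanged. Handling errors two edits away would mean applying `edits1` to each of its own results.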
#python #datasets #codes #statistician
@AI_Python_EN
Not everyone knows this, but my #book has its own GitHub repository, where all the #Python code used to build the illustrations is gathered.
So, while reading the book, you can actually run the described #algorithms, play with hyperparameters and #datasets, and generate your own versions of the illustrations.
https://github.com/aburkov/theMLbook
@AI_Python_EN
Transition guide from Excel analyst to Python programming for data analysis
1. From Excel to Pandas https://lnkd.in/fnU5apw
2. Communication & Data Storytelling https://lnkd.in/eqf5gUV
3. Data Manipulation with Python https://lnkd.in/g4DFNpJ
4. Data Visualization with Python (Matplotlib/Seaborn): https://lnkd.in/g_3fx_6
5. Advanced Pandas https://lnkd.in/fZWGp9B
6. Tricks on Pandas by Real Python https://lnkd.in/fXc9XSp
7. Becoming Efficient with Pandas https://lnkd.in/f64hU-Y
8. Pandas Advanced Tips https://lnkd.in/fGyBc4c
9. Jupyter Notebook (Beginner) https://lnkd.in/fTFinFi
10. Jupyter Notebook (Advanced) https://lnkd.in/fFufePv
#datavisualization #python #programming #pydata #datasets #pandas
@AI_Python_EN
#Statistics such as correlation, mean and standard deviation (variance) create strong visual images and meaning. Two different #datasets with the same correlation would sort of look the same. Right?
Not so much.
Each of these very different-looking graphs plots a dataset with the same correlation, mean and SD. This is why plotting data is so important, though oddly it is so rarely (in my experience) done.
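A quick way to check this effect yourself is with the first two datasets of Anscombe's quartet, a classic example of it. This is an assumption for illustration: the linked graphic may use different data. The sketch below computes the summary statistics with the standard library only, and `pearson` and `summary` are helper names made up for this example.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's quartet, datasets I and II (they share the same x values).
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def summary(xs, ys):
    """Return (mean of y, sample SD of y, correlation of x and y)."""
    return mean(ys), stdev(ys), pearson(xs, ys)


if __name__ == "__main__":
    for name, ys in [("I", Y1), ("II", Y2)]:
        m, sd, r = summary(X, ys)
        print(f"dataset {name}: mean={m:.2f} sd={sd:.2f} r={r:.2f}")
```

Dataset I is a noisy linear trend and dataset II is a smooth curve, yet the printed summaries agree to two decimals. Only a scatter plot makes the difference visible.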
https://bit.ly/2oZ29MP
@AI_Python_EN
Another lovely development in #Healthcare #DeepLearning
Building a Benchmark Dataset and Classifiers for Sentence-Level Findings in AP Chest X-rays.
#datasets
arXiv: https://lnkd.in/dxx5iCY
@AI_Python_EN
Google announced the updated YouTube-8M dataset
The updated set now includes a subset with verified 5-second segment-level labels, released alongside the 3rd Large-Scale Video Understanding Challenge and Workshop at #ICCV19.
Link: https://lnkd.in/f_6Jb7Y
#DL #datasets
@AI_Python_EN
Public datasets: weather and climate, from Google Cloud's Public Datasets Program:
https://lnkd.in/edhe7wj
#ArtificialIntelligence #Datasets
@AI_Python_EN
