AI, Python, Cognitive Neuroscience
Why are Scikit-learn machine learning models not as widely used in industry as TensorFlow or PyTorch?

The algorithms in scikit-learn are kind of like toy algorithms.

The neural networks are a joke. They were introduced only a couple of years ago and come in two flavors: MLPClassifier and MLPRegressor. MLP stands for Multi-Layer Perceptron. The name alone should be enough to tell you that this isn’t the greatest implementation. Scikit-learn doesn’t support GPUs and the neural networks don’t scale at all. No one in their right mind would use this in production.
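For the record, here is what those estimators look like in use - a minimal sketch on synthetic data (toy sizes chosen for illustration), which is roughly the scale they handle comfortably:

```python
# Minimal sketch of scikit-learn's MLPClassifier on synthetic data.
# CPU-only and small-scale -- exactly the regime described above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

Fine for a few hundred rows on a laptop; there is simply no path from here to GPU-scale training.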

The implementation of the popular gradient boosting algorithm is useless too. Known as GradientBoostingClassifier and GradientBoostingRegressor, it’s a painfully slow implementation that gets completely embarrassed by libraries like XGBoost, LightGBM and CatBoost. I should note that the scikit-learn team is working on a new implementation of gradient boosting that borrows heavily from LightGBM and XGBoost.

The random forest implementation is decent enough, but it generally gets outperformed by gradient boosting on almost any #machinelearning task anyway.

The #SVM implementation with #nonlinear kernels is extremely slow too, and generally useless.

The Naive Bayes implementation is okay, I guess, but it’s not a type of model that one would realistically use in production.

#Logisticregression can actually be useful. If the requirement is a simple classifier that’s fast to train and easy to interpret, this can be a good choice, even in production. I mean, it’s pretty hard to get a dead simple algorithm like that wrong.
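A sketch of that interpretability argument: after standardizing the features, the coefficients of a logistic regression become directly comparable (toy example on a dataset bundled with scikit-learn):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# On standardized features, coefficient magnitude reflects influence:
coefs = model.named_steps["logisticregression"].coef_.ravel()
top = np.argsort(np.abs(coefs))[::-1][:3]
for i in top:
    print(f"{data.feature_names[i]}: {coefs[i]:+.2f}")
```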

The linear regression algorithms are completely fine too. OLS, ridge regression, lasso, elastic nets and what have you. These can be useful for simple tasks that need interpretability.
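One concrete reason the lasso in particular helps with interpretability: its L1 penalty zeroes out irrelevant coefficients. A quick sketch on synthetic data (the sizes and alpha here are picked arbitrarily for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression

# 50 features, but only 5 of them actually informative.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print("nonzero OLS coefficients:  ", int(np.sum(np.abs(ols.coef_) > 1e-6)))
print("nonzero lasso coefficients:", int(np.sum(np.abs(lasso.coef_) > 1e-6)))
```

OLS keeps every feature; the lasso discards most of the noise features, leaving a model you can actually read.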

I love scikit-learn for its helper functions for things like preprocessing, cross-validation, hyperparameter tuning and so on, but it’s generally not a library that’s suited for any sort of heavy lifting when it comes to model training.
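Those helper functions really are the best part - a preprocessing pipeline plus cross-validated hyperparameter search takes only a few lines (toy setup on a bundled dataset):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
# Grid-search the regularization strength with 5-fold cross-validation.
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The pipeline ensures the scaler is re-fit inside every cross-validation fold, which avoids data leakage - exactly the kind of plumbing scikit-learn excels at.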


✴️ @AI_Python_EN
Machine Learning from scratch!

Implementations of some classic Machine Learning models from scratch, benchmarked against popular ML libraries, by Quan Tran: https://lnkd.in/er_ZNgY

#ArtificialIntelligence #DeepLearning #NeuralNetworks
#MachineLearning
HOW TO IMPROVE YOUR SKILLS WITH TEXT DATA?


Rubens Zimbres, PhD, has compiled amazing resources on Machine Learning, NLP, and Computer Vision. On the NLP side he covers pretty much every common topic, which is very useful because as data scientists we often deal with text data.

You can see the repository here: https://lnkd.in/fyyvZYt

#repository #machinelearning #patternrecognition #artificialintelligence

✴️ @AI_Python_EN
Here are some #statistics and research #journals I can recommend:

- Statistical Analysis and Data Mining (ASA)
- Analytics Journal (DMA)
- The American Statistician (ASA)
- Journal of the American Statistical Association (ASA)
- Statistics in Biopharmaceutical Research (ASA)
- Journal of Agricultural, Biological, and Environmental Statistics (ASA)
- Journal of Statistics Education (ASA)
- Statistics and Public Policy (ASA)
- Journal of Survey Statistics and Methodology (AAPOR and ASA)
- Journal of Educational and Behavioral Statistics (ASA)
- British Journal of Mathematical and Statistical Psychology (Wiley)
- Statistics Surveys (IMS)
- Stata Journal (StataCorp)
- The R Journal (R Project)
- Structural Equation Modeling: A Multidisciplinary Journal (Routledge)
- Journal of Business & Economic Statistics (ASA)
- Journal of Marketing Research (AMA)
- Journal of Computational and Graphical Statistics (ASA)
- Journal of Artificial General Intelligence (AGIS)

These are not purely theoretical publications and provide plenty of examples I can adapt for my own work. I try to read them as regularly as I can.

There's so much innovation happening in analytics that it's hard to keep up!

✴️ @AI_Python_EN
Not everyone knows this, but my #book has its own GitHub repository where all the #Python code used to build the illustrations is gathered.

So, while reading the book, you can actually run the described #algorithms, play with hyperparameters and #datasets, and generate your own versions of the illustrations.

https://github.com/aburkov/theMLbook

✴️ @AI_Python_EN
Welcome to Word Vector Space (visualization)
Demo: https://lnkd.in/eWTHCEd
Blog: https://lnkd.in/e4WM8qy
#machinelearning #word2vec #nlp

✴️ @AI_Python_EN
Prior-aware Neural Network for Partially-Supervised Multi-Organ Segmentation
Researchers: Yuyin Zhou, Zhe Li, Song Bai, Chong Wang, Xinlei Chen, Mei Han, Elliot Fishman, Alan Yuille
Paper: http://ow.ly/IdmR50qiURd
#technology #artificialintelligence #bigdata #machinelearning #deeplearning

✴️ @AI_Python_EN
CS294-158 Deep Unsupervised Learning

Ilya Sutskever guest lecture on GPT-2: https://lnkd.in/eNUSMTY

#DeepLearning #MachineLearning #UnsupervisedLearning

✴️ @AI_Python_EN
Big but Imperceptible Adversarial Perturbations via Semantic Manipulation
Researchers: Anand Bhattad, Min Jin Chong, Kaizhao Liang, Bo Li, David A. Forsyth
Paper: http://ow.ly/1aDn50qiU7G
#machinelearning #artificialintelligence #bigdata #deeplearning

✴️ @AI_Python_EN
My reflection for today: it is okay to dream, but it is more important to focus on the present and polish your current skill, even if you think that skill is irrelevant. For example, when I was at school, I was a statistics TA who did not like statistics, because I wanted to be an engineer. The statistics department was kind enough to give me a job because the engineering department had no open positions at the time. Later I was hired as a Data Scientist, but I liked reservoir simulation better because I dreamt of being an engineer, which led to my layoff. Then I wanted to be a Data Scientist, but I was a Spotfire Engineer, and again, that Data Science passion did not sit well with my Spotfire Engineer job. Now I think that if I had focused on my current skills at each point, I would have become a Data Scientist anyway, and with better-polished skills, since data visualization and statistics are both needed in this job.
So the moral of the story is: excel at your current job and use it as a foundation for your dream. Do your job well, and learn things even if they seem irrelevant at the time - you never know what the future holds, and it may turn out you need them. While dreaming about the future, stay grounded in the present. Every single opportunity is a gift.

✴️ @AI_Python_EN
A transition guide from Excel analyst to Python programming for data analysis

1. From Excel to Pandas https://lnkd.in/fnU5apw
2. Communication & Data Storytelling https://lnkd.in/eqf5gUV
3. Data Manipulation with Python https://lnkd.in/g4DFNpJ
4. Data Visualization with Python (Matplotlib/Seaborn): https://lnkd.in/g_3fx_6
5. Advanced Pandas https://lnkd.in/fZWGp9B
6. Tricks on Pandas by Real Python https://lnkd.in/fXc9XSp
7. Becoming Efficient with Pandas https://lnkd.in/f64hU-Y
8. Advanced Pandas Tips https://lnkd.in/fGyBc4c
9. Jupyter Notebook (Beginner) https://lnkd.in/fTFinFi
10. Jupyter Notebook (Advanced) https://lnkd.in/fFufePv
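As a taste of the transition, the pivot table - the bread and butter of Excel analysis - is a one-liner in pandas (the sales table below is invented for illustration):

```python
import pandas as pd

# A toy sales table of the kind usually handled in Excel.
df = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "sales": [100, 120, 90, 140],
})

# Equivalent of Excel's PivotTable: rows=region, columns=quarter, SUM of sales.
pivot = df.pivot_table(index="region", columns="quarter",
                       values="sales", aggfunc="sum")
print(pivot)
```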

#datavisualization #python #programming #pydata #datasets #pandas

✴️ @AI_Python_EN
Liveness Detection with OpenCV - PyImageSearch
http://bit.ly/2VI91j6 #AI #DataScience #MachineLearning

✴️ @AI_Python_EN
Full Stack Deep Learning Bootcamp

(Most of) Lectures of Day 1: https://lnkd.in/eei67vp

Happy learning!

#ArtificialIntelligence #DeepLearning #MachineLearning

✴️ @AI_Python_EN
Whether you’re a:
- data scientist
- data analyst
- data engineer
- statistician
- BI Specialist
- business analyst
- software engineer
- research scientist
- machine learning engineer

At the end of the day, you’re a problem solver.

#datascience #machinelearning #analytics

✴️ @AI_Python_EN
Automated theorem prover driven by deep reinforcement learning: DeepHOL. Comes with a benchmark suite of 29,462 theorems to be proven. It can already prove 58% of them using 41 "tactics".

PDF: https://arxiv.org/pdf/1904.03241.pdf

✴️ @AI_Python_EN
A line of research that I've been particularly excited about lately is the linearized training of neural networks and the Neural Tangent Kernel. To that end, we're releasing code - written in JAX - that we've been using for our research:
https://github.com/google/neural-tangents
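The empirical Neural Tangent Kernel behind that line of work can be illustrated without the library itself: for a network f(x; θ), the kernel entry for inputs x and x' is the inner product of parameter gradients, K(x, x') = ∇_θ f(x) · ∇_θ f(x'). A minimal numpy sketch with a tiny hand-rolled network and finite-difference gradients (not the neural-tangents API, just the idea):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny one-hidden-layer scalar network f(x; theta).
N_IN, N_HID = 2, 3
theta0 = rng.normal(size=N_HID * N_IN + N_HID)  # W1 flattened, then W2

def f(x, theta):
    W1 = theta[:N_HID * N_IN].reshape(N_HID, N_IN)
    W2 = theta[N_HID * N_IN:].reshape(1, N_HID)
    return (W2 @ np.tanh(W1 @ x)).item()

def grad_theta(x, theta, eps=1e-5):
    """Finite-difference gradient of f with respect to all parameters."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[i] += eps
        tm[i] -= eps
        g[i] = (f(x, tp) - f(x, tm)) / (2 * eps)
    return g

xs = [rng.normal(size=N_IN) for _ in range(4)]
J = np.stack([grad_theta(x, theta0) for x in xs])  # (4, n_params) Jacobian
K = J @ J.T  # empirical NTK Gram matrix: K[i, j] = grad f(x_i) . grad f(x_j)
print(np.round(K, 3))
```

In the linearized (infinite-width) regime, training dynamics are governed by this Gram matrix; the neural-tangents library computes the analytic infinite-width version of it.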

✴️ @AI_Python_EN