Andriy Burkov
I often receive questions from people in my network about what they should learn and master to become a data scientist. While I personally think that the term "data scientist" is very unfortunate and has no clear definition, this is what a good modern #dataanalyst has to master:
#DataScience
– Data structures (local and distributed)
– Data indexing
– Data privacy and anonymization
– Data lifecycle management
– Data transformation (deduplication, handling outliers and missing values, dimensionality reduction)
– Data analysis (experiment design, classification, regression, unsupervised methods)
– #Machinelearning methods (feature engineering, regularization, hyperparameter tuning, ensemble methods, and #neuralnetworks)
– Computer and database programming, numerical optimization
– Distributed data processing
– Real-time and high-frequency data processing
– Linux (my personal bias)
A modern data analyst also has to be a good popularizer of complex ideas. Having a Ph.D. is not a requirement, but it is a very big plus: it strengthens the popularizing skill and teaches a scientific approach to problem-solving.
✴️ @AI_Python_EN
San Francisco became the first major U.S. city to ban the use of facial recognition technology by police and other municipal agencies.
https://www.nytimes.com/2019/05/14/us/facial-recognition-ban-san-francisco.html?smtyp=cur&smid=tw-nytimes
✴️ @AI_Python_EN
A #Keras usage pattern that allows for maximum flexibility when defining arbitrary losses and metrics (that don't match the usual signature) is the "endpoint layer" pattern. It works like this: https://colab.research.google.com/drive/1zzLcJ2A2qofIvv94YJ3axRknlA6cBSIw
In short, you use add_loss / add_metric inside an "endpoint layer" that also has access to the model targets. The layer then returns the inference-time predictions. You compile without an external "loss" argument, and you fit with a dictionary of data that contains the targets.
Of course, logistic regression is a basic case that doesn't actually need this advanced pattern. But endpoint layers will work every time, even when you have losses & metrics that don't match the usual fn(y_true, y_pred, sample_weight) signature that is required in compile().
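A minimal sketch of the pattern (not the code from the linked notebook; it assumes TensorFlow 2.x Keras, and the toy data and layer names are invented for illustration):

import numpy as np
import tensorflow as tf
from tensorflow import keras

class LogisticEndpoint(keras.layers.Layer):
    """Endpoint layer: owns the loss/metric via add_loss/add_metric."""
    def __init__(self, name=None):
        super().__init__(name=name)
        self.loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)
        self.accuracy_fn = keras.metrics.BinaryAccuracy()

    def call(self, logits, targets=None, sample_weight=None):
        if targets is not None:
            # Attach the training loss to the model -- no compile(loss=...) needed.
            self.add_loss(self.loss_fn(targets, logits, sample_weight))
            # Track a metric that also needs access to the targets.
            self.add_metric(self.accuracy_fn(targets, tf.nn.sigmoid(logits)),
                            name="accuracy")
        # Return the inference-time predictions.
        return tf.nn.sigmoid(logits)

inputs = keras.Input(shape=(3,), name="inputs")
targets = keras.Input(shape=(1,), name="targets")   # targets enter as a model input
logits = keras.layers.Dense(1)(inputs)
predictions = LogisticEndpoint()(logits, targets)

model = keras.Model([inputs, targets], predictions)
model.compile(optimizer="adam")                      # no external "loss" argument

data = {"inputs": np.random.random((64, 3)),
        "targets": np.random.randint(0, 2, size=(64, 1)).astype("float32")}
model.fit(data, epochs=1)                            # targets arrive via the data dict

Because the loss and metric are computed inside the graph, nothing constrains them to the fn(y_true, y_pred, sample_weight) shape that compile() expects.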
✴️ @AI_Python_EN
Build a chat widget with Python and JavaScript
http://bit.ly/2JnD8d0
#python #javascript #development
http://bit.ly/2JI78jc
✴️ @AI_Python_EN
Natural Language Processing with Deep Learning in Python ☞ http://bit.ly/2HlcwXV #DeepLearning #TensorFlow
Machine vision is the newest weapon against crop loss
https://zd.net/2Vq1AvV
#ai #ArtificialIntelligence #farming
✴️ @AI_Python_EN
Accelerating quantum technologies with materials processing at the atomic scale #quantum #QuantumComputing
https://t.co/mHuuywfESG
✴️ @AI_Python_EN
A 2019 guide to 3D Human Pose Estimation
https://blog.nanonets.com/human-pose-estimation-3d-guide/
✴️ @AI_Python_EN
A Convolutional Neural Network Tutorial in Keras and Tensorflow 2
https://medium.com/@isakbosman/a-convolutional-neural-network-tutorial-in-keras-and-tensorflow-2-2bff79f477c0
#Keras #neuralnetwork #TensorFlow #ConvolutionalNeuralNetwork
✴️ @AI_Python_EN
Chris Lattner: Compilers, LLVM, Swift, TPU, and ML Accelerators | Artificial Intelligence Podcast
https://www.youtube.com/watch?v=yCd3CzGSte8
✴️ @AI_Python_EN
How to tell whether machine-learning systems are robust enough for the real world
http://news.mit.edu/2019/how-tell-whether-machine-learning-systems-are-robust-enough-real-worl-0510
#MachineLearning
✴️ @AI_Python_EN
Deep Learning Determinism
🌎 Deep Learning
🌎 This is a talk from GTC 2019 in San Jose, California. Slides: http://bit.ly/dl-determinism-slides
#DeepLearning
✴️ @AI_Python_EN
MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations has been accepted as a long paper at #ACL2019. With D. Hazarika, N. Majumder, G. Naik, E. Cambria.
Arxiv - https://arxiv.org/abs/1810.02508
Dataset - https://affective-meld.github.io
✴️ @AI_Python_EN
Discover 3D graphics capabilities for #TensorFlow >> https://github.com/tensorflow/graphics | #DeepLearning
✴️ @AI_Python_EN
Deep Learning: GANs and Variational Autoencoders ☞ http://bit.ly/2JdVTP3 #tensorflow #ai
✴️ @AI_Python_EN
Expression Conditional GAN for Facial Expression-to-Expression Translation https://arxiv.org/pdf/1905.05416.pdf
✴️ @AI_Python_EN
In #DataScience textbooks I frequently read that #logisticregression (LR) is a misnomer because it's a classifier, not regression.
Some also are disdainful of the method, claiming its predictions are generally poor compared to other classifiers.
Both comments suggest the author became aware of LR through predictive analytics and is unfamiliar with its origins and the ways it is commonly used by statisticians and researchers.
LR, like the more familiar OLS regression introduced to us in Stats 101, is a member of the Generalized Linear Model (GLM) family. These are all regression methods. Regression methods for analyzing categorical data have been widely used in many fields to help us understand phenomena.
Applied Logistic Regression (Hosmer and Lemeshow) and Logistic Regression Models (Hilbe) are two classic books on LR.
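A quick sketch of that GLM equivalence (a statsmodels illustration on simulated data; neither the library nor the numbers come from the post):

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated data, purely for illustration.
rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.normal(size=300)})
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.2 * df["x"]))))

# Logistic regression is just a GLM with a binomial family and logit link,
# so the two fits below return identical coefficients.
logit_fit = smf.logit("y ~ x", data=df).fit(disp=0)
glm_fit = smf.glm("y ~ x", data=df, family=sm.families.Binomial()).fit()
print(logit_fit.params)
print(glm_fit.params)

The same glm() call with a Gaussian family reproduces OLS, which is the sense in which these are all regression methods.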
Though not its original purpose, LR can also be used for classification. The outputs of LR are estimated probabilities of group membership. You can set the cutoff wherever you like - 0.50 is only a standard program default and is inappropriate for imbalanced data.
The right-hand side of the LR equation can also be modified to account for interactions and curvilinear relationships.
LR is not always the best choice for classification but often works very well.
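A small sketch of the points above (again statsmodels; the loan-style variables and the 0.30 cutoff are invented purely for illustration):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Made-up, imbalanced loan data: roughly 10-15% defaults.
rng = np.random.default_rng(0)
df = pd.DataFrame({"income": rng.normal(50, 15, 1000),
                   "debt_ratio": rng.uniform(0, 1, 1000)})
eta = -3 + 3 * df["debt_ratio"] - 0.01 * df["income"]
df["default"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# The right-hand side can include interactions and curvilinear terms directly.
fit = smf.logit("default ~ income * debt_ratio + I(debt_ratio ** 2)",
                data=df).fit(disp=0)

# The model returns estimated probabilities of group membership; the cutoff
# is a separate decision, and 0.50 is only a program default.
proba = fit.predict(df)
flagged = (proba >= 0.30).astype(int)   # arbitrary lower cutoff for illustration
print(flagged.mean(), df["default"].mean())

On imbalanced data like these, a 0.50 rule would flag very few cases, which is the point about program defaults made above.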
My first serious use of LR was to both explain and predict, in this case, student loan default based on loan application data. I was not aware of the term "predictive analytics" at the time (early '80s) and it probably wasn't yet in use.
Explanation and prediction are not mutually exclusive, though historically LR and stats generally have been used more for explanation. Statisticians tend to frown on equations that don't make sense even if they predict well out of sample. It can be a warning sign.
An arbitrary distinction between "regression" and "classification" has emerged in recent years, the former being used when the dependent variable (label) is continuous or interval and the latter when it is categorical (e.g., purchased/didn't purchase). A statistician will tend to see both cases, as well as when the dependent variable is ordinal, count, or multinomial, as regression problems.
Discriminant analysis, which is related to MANOVA, was designed for classification but can also be used to help us understand a phenomenon.
There are many excellent books on GLM and categorical data analysis, and here are just a few:
- Generalized Linear Models and Extensions (Hardin and Hilbe)
- Generalized Linear Models & Generalized Estimating Equations (Garson)
- Regression Modeling Strategies (Harrell)
- Categorical Data Analysis (Agresti)
- Analyzing Categorical Data (Simonoff)
- Regression Models for Categorical Dependent Variables (Long and Freese)
✴️ @AI_Python_EN
What Is Your Purpose in Visualizing Data?
Visualize data based on purpose.
Details: https://lnkd.in/fa95F8d
Alternative Reading
✅ Know Data Science
https://lnkd.in/fMHtxYP
✅ Understand How to answer Why
https://lnkd.in/f396Dqg
✅ Know Machine Learning Key Terminology
https://lnkd.in/fCihY9W
✅ Understand Machine Learning Implementation
https://lnkd.in/f5aUbBM
✅ Machine Learning on Retail
https://lnkd.in/fihPTJf
✅ Machine Learning on Marketing
https://lnkd.in/fUDGAQW
#datascience #visualization #machinelearning
✴️ @AI_Python_EN
Introducing Translatotron: An End-to-End Speech-to-Speech Translation Model
Blog by Ye Jia and Ron Weiss: https://lnkd.in/ePaGRZj
#ArtificialIntelligence #DeepLearning #MachineLearning
✴️ @AI_Python_EN