Some random thoughts on p-values...
Inferential statistics comes into play when we wish to generalize a result from a sample to the population from which the sample was drawn.
The type of sampling procedure used must be taken into account. This is important since most statistical programs assume simple random sampling.
The quality of the sample and the definition of the population must also be considered. A textbook-quality sample from the wrong population, for example, could seriously mislead us.
In the case of surveys, coverage problems and non-response can be serious.
Measurement error and missing data can wreak havoc on the fanciest of analytics.
Distributional assumptions should not be ignored.
The "So What?" test, IMO, is most important. A very large and highly statistically significant correlation may have little significance to decision-makers. Conversely, a tiny correlation might be big news.
After a hundred years, if so many scientists and researchers still can't get their heads around p-values, what are the chances that Bayesian statistics will fare any better?
For much deeper thoughts on this important topic, see "Statistical Inference in the 21st Century: A World Beyond p < 0.05", linked below.
"Total Survey Error: Past, Present, and Future" may also be of interest -
https://academic.oup.com/poq/article/74/5/849/1817502
Inferential statistics are often used inappropriately, IMO. One example would be performing a t-test to assess whether a regression coefficient is really zero in the population... when the regression was performed on the population data itself. Similarly, significance testing is frequently used in model diagnostics when it might be more sensible to investigate how a potential violation of an assumption might be affecting the model.
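As a hypothetical illustration of that first point: OLS software will happily print t-statistics and p-values regardless of how the data arose, including when the "sample" is actually the entire population, in which case those numbers answer no meaningful question. A sketch with synthetic data:

```python
import numpy as np
import statsmodels.api as sm

# Pretend these 500 rows are the ENTIRE population, not a sample from it.
rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(500, 3)))
y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=500)

fit = sm.OLS(y, X).fit()
print(fit.summary())  # t-tests and p-values are printed regardless
```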
"Statistical Inference in the 21st Century: A World Beyond p < 0.05" -
https://www.tandfonline.com/toc/utas20/current
✴️ @AI_Python_EN
I love the Machine Learning and NLP articles published by Medium's Towards Data Science (an online publication). They motivate each article really well, provide just the right amount of mathematical explanation, show really cool visualizations, and provide code snippets. Most importantly, they were written fairly recently (2018-2019), so the results and references they contain are pretty much state-of-the-art today. Here are some of my favourite articles.
RNNs and LSTMs: https://lnkd.in/eWM-ncT
Variational Autoencoders: https://lnkd.in/enp4KQs
Transformers: https://lnkd.in/e2JQbkG
CNNs: https://lnkd.in/esrqMZH
✴️ @AI_Python_EN
Introduction to Text Wrangling Techniques for Natural Language Processing
https://bit.ly/2GzNgg1
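For a flavor of what such text wrangling involves, here is a minimal, library-free sketch (my own illustration, not taken from the linked tutorial):

```python
import re

def wrangle(text):
    """Minimal text wrangling: lowercase, strip punctuation,
    collapse whitespace, and tokenize on spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop punctuation/symbols
    text = re.sub(r"\s+", " ", text).strip()   # normalize whitespace
    return text.split()

print(wrangle("Hello, World!  NLP's first step: clean your text..."))
# ['hello', 'world', 'nlp', 's', 'first', 'step', 'clean', 'your', 'text']
```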
Generalized Language Models http://bit.ly/2TRDwSE #AI #DeepLearning #MachineLearning #DataScience
✴️ @AI_Python_EN
How to start learning data science with zero programming experience
1. Start learning data science with zero programming experience https://lnkd.in/fUZKqjg
2. Selecting course on data science https://lnkd.in/fXBw833
3. From Excel to Pandas https://lnkd.in/fnU5apw (see the short Pandas sketch after this list)
4. Communication & Data Storytelling https://lnkd.in/eqf5gUV
5. Data Manipulation with Python https://lnkd.in/g4DFNpJ
6. Data Visualization with Python (Matplotlib/Seaborn): https://lnkd.in/g_3fx_6
7. Advanced Pandas https://lnkd.in/fZWGp9B
8. Tricks on Pandas by Real Python https://lnkd.in/fXc9XSp
9. Becoming Efficient with Pandas https://lnkd.in/f64hU-Y
10. Advanced Pandas Tips https://lnkd.in/fGyBc4c
11. Jupyter Notebook (Beginner) https://lnkd.in/fTFinFi
12. Jupyter Notebook (Advanced)
https://lnkd.in/fFufePv
YouTube: https://lnkd.in/ftVzrtk
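As promised in item 3, here is a tiny, hypothetical taste of the Excel-to-Pandas transition; the file name and column names below are made up for illustration.

```python
import pandas as pd

# Replaces opening the workbook by hand.
df = pd.read_excel("sales.xlsx")

# Replaces building a pivot table: revenue totals and averages by region.
summary = (df.groupby("region")["revenue"]
             .agg(["sum", "mean"])
             .sort_values("sum", ascending=False))
print(summary.head())
```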
✴️ @AI_Python_EN
Functional brain network architecture supporting the learning of social networks in humans
Tompson et al.: https://lnkd.in/e4r93sC
#brainnetworks #neuroscience #socialnetworks #neuralnetworks
✴️ @AI_Python_EN
Generating adversarial patches against YOLOv2. A very cool paper on adversarial attacks, in this case on a person detector. Understanding adversarial attacks on machine learning models is an important research field for building more robust models. Code is also provided, along with a really funny demo video. :) Check it out! #deeplearning #machinelearning
📜 Paper: https://lnkd.in/daJEPqj
🔤 Code: https://lnkd.in/dPGFhwE
✴️ @AI_Python_EN
Some random thoughts about AI BS...
Humans design, implement and use AI, thus AI cannot eliminate human error.
AI is "almost instantaneous" after it has been designed, tested, implemented and evaluated. Many human tasks and decisions are one-off and for these AI is slow and impractical.
Adaptive surveys have been used for decades in marketing research and other fields. There are things called skips... At a more sophisticated level, they are an outgrowth of adaptive testing, which psychometricians have been investigating for decades. That nut is not entirely cracked.
Text mining is a legitimate application of AI and has been since the 1950s.
I now see programmatic advertising, which has been used for quite some time now, rebranded as AI.
Chatbots are still very much a work in progress and, as such, haven't revolutionized anything. AI cannot read hearts and minds. Sorry.
Eye tracking has been used since the 1920s. Facial imaging is newer but not new. In these contexts, people selling them used to refer to neural nets as neural nets, not as AI.
Automated demand forecasting has been around at least since the mid-70s when AFS launched Autobox. Inventory management and control has been increasingly automated since the 1960s. They were never called #AI.
I'd better stop now...
✴️ @AI_Python_EN
Precise Detection in Densely Packed Scenes
Goldman et al.: https://lnkd.in/eqsrD3p
Code and dataset: https://lnkd.in/eN_qXvj
#ObjectDetection #DeepLearning #MachineLearning
✴️ @AI_Python_EN
***Code Faster in Python*** with Line-of-Code Completions.
Kite integrates with your IDE (Atom, PyCharm, Sublime, VSCode and Vim) and uses machine learning to give you useful code completions for Python. Give it a try.
Link - https://kite.com/
#python #pythonprogramming #ml
✴️ @AI_Python_EN
All the Super-Resolution algorithms in one place.
"A Deep Journey into Super-resolution: A Survey"
#pytorch #ai #algorithms
https://lnkd.in/dfnd5se
✴️ @AI_Python_EN
"A Deep Journey into Super-resolution: A Survey"
#pytorch #ai #algorithms
https://lnkd.in/dfnd5se
✴️ @AI_Python_EN
***Anomaly Detection Cheat Sheet***
✴️ @AI_Python_EN
paper "Feature Grouping as a Stochastic Regularizer for High-Dimensional Structured Data" with BertrandThirion and Gael Varoquaux got accepted to #ICML2019 ! Arxiv: https://arxiv.org/abs/1807.11718# code: https://github.com/sergulaydore/Feature-Grouping-Regularizer
✴️ @AI_Python_EN
HOW DO YOU KNOW THAT YOU HAVE ENOUGH TRAINING DATA? Check out my Medium post:
https://lnkd.in/eVadKPb
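One common heuristic (my own sketch, not necessarily the method in the linked post): plot a learning curve and check whether validation performance has plateaued; if it is still climbing, more data would likely help.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Score a classifier at increasing training-set sizes via cross-validation.
X, y = load_digits(return_X_y=True)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

print(sizes)
print(val_scores.mean(axis=1))  # still rising at the largest size? get more data
```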
#machinelearning
✴️ @AI_Python_EN
What does a machine learning engineer's day look like?
Someone asked me what skills they should learn for the rest of the year if they wanted to get into machine learning.
It's hard to narrow it down, so I shared what my days usually look like.
9 am - reading articles/papers online about machine learning.
10 am - working on the current project and (sometimes) applying what I've just been reading online.
4 pm - pushing code to GitHub and writing down experiments for the next day.
5 pm - sending a small report to the team about what I've been working on during the day.
(these are all ideal scenarios)
Now, what happens between 10 am and 4 pm?
Usually, it will all be Python code within a Jupyter Notebook, playing with different datasets.
Right now, I'm working on a text classification problem using the Flair library.
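For the curious, here is a minimal sketch of what a Flair text-classification call looks like. It uses Flair's off-the-shelf "en-sentiment" model for illustration; the actual project would fine-tune a classifier on its own labels.

```python
from flair.data import Sentence
from flair.models import TextClassifier

# Load a pre-trained sentiment classifier and tag one sentence.
classifier = TextClassifier.load("en-sentiment")
sentence = Sentence("The experiment results look promising!")
classifier.predict(sentence)
print(sentence.labels)  # e.g. a POSITIVE label with a confidence score
```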
So what should you learn?
In my case, the following have been the most valuable.
1. Exploring and analysing new datasets; this notebook by Daniel Formosso is a great example: https://lnkd.in/gbayWcQ
2. Researching and mixing together existing methods and applying them to solve problems.
So how can you practice these outside of a job?
Kaggle and your own projects (even if they don't work).
#machinelearning #datascience
✴️ @AI_Python_EN
Don't stop sharing, done is better than perfect
To those who actively blame, condemn, and complain online, especially when reacting to simplified content about statistics, programming, and machine learning: try to look for value in the imperfections of others.
We all know that machine learning models will never be perfect; as George E. P. Box said, "all models are wrong, but some are useful". In the content mentioned above, details are often reduced to facilitate understanding, actionability, and business value, and to expand the spread of knowledge.
Not all of us will encounter cases covering every topic in such content, but even partial knowledge gives us the opportunity to work on a better process, and even to help people.
Don't stop sharing, done is better than perfect
#programming #statistics #machinelearning
✴️ @AI_Python_EN
Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares
By Stephen Boyd and Lieven Vandenberghe, Cambridge University Press: https://lnkd.in/eQnqVQ9
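To give a taste of the book's central computation, here is a tiny least-squares fit in NumPy (a toy example of my own, not from the book): solve min ||Ax - b||^2 for x.

```python
import numpy as np

# Fit a line to three points: column of ones (intercept) + one feature.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)  # intercept and slope of the best-fit line
```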
#ArtificialIntelligence #LinearAlgebra #Vectors #Matrices #MachineLearning
✴️ @AI_Python_EN
All ***Cheat Sheets*** in one place.
GitHub link - https://lnkd.in/fGeGXQs
#datascience #machinelearning #excel #deeplearning #python #R #sql #matlab #datamining #datawarehousing
✴️ @AI_Python_EN
Learn how to train BERT faster with Tensor Cores for optimized #NLP in this technical blog. Code now available from GitHub.
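For context, the speedup comes from mixed-precision (float16) arithmetic, which Tensor Cores accelerate. Below is a hedged sketch of mixed-precision training using PyTorch's AMP API, with a toy model standing in for BERT; the linked blog's actual code is on GitHub.

```python
import torch
import torch.nn as nn

# Toy placeholders for illustration; not the BERT setup from the blog.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(128, 2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 128, device=device)
y = torch.randint(0, 2, (32,), device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=(device == "cuda")):  # float16 forward pass
    loss = loss_fn(model(x), y)
scaler.scale(loss).backward()  # scale the loss to avoid float16 underflow
scaler.step(optimizer)
scaler.update()
print(loss.item())
```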
✴️ @AI_Python_EN
In a simple key driver analysis, we may have a single dependent variable and a dozen or so predictors.
Even in this simple case, there are many ways to analyze the data. We might, for instance, realize that one or more of the predictors is really endogenous, i.e., itself a dependent variable, or that it does not belong in our analysis at all.
Multicollinearity is common in many kinds of data and can be a major headache. Curvilinear relationships, interaction effects, missing data and clustering are other things we need to think about.
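Variance inflation factors are one common screen for multicollinearity. A hedged sketch (the data below is synthetic, purely for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Two nearly collinear predictors (x1, x2) and one independent one (x3).
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})
X["const"] = 1.0  # VIFs should be computed with an intercept included

for i, col in enumerate(X.columns[:-1]):
    print(col, variance_inflation_factor(X.values, i))  # x1, x2 will be large
```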
Some recommend machine learning as the solution. Indeed, this may be an option, but we must remember that there are many types of #machinelearning . Each may give very different answers. Machine learners can also be hard to interpret, and explanation is the main purpose of key driver analysis.
Others may be tempted to just use cross tabs. But that too, in a sense, is a model and it may be a very inappropriate one that seriously misleads us.
There often is no simple answer to "simple" problems. Understanding decision-makers' needs and expectations is a fundamental first step.
Extensive data cleaning may also be necessary and, in the case of surveys, we may need to adjust for response styles. At the end of our exploratory data analysis, we might also conclude that the data we have aren't right for the task. It's important to bear in mind that key driver analysis is a form of causal analysis, which is usually very challenging.
✴️ @AI_Python_EN
If your data makes sense, then it is either fake or generated.
✴️ @AI_Python_EN
LBS Autoencoder: Self-supervised Fitting of Articulated Meshes to Point Clouds
Paper: http://ow.ly/ga4c50rqgsN
#artificialintelligence #machinelearning #bigdata #deeplearning #technology
✴️ @AI_Python_EN