One of my favorite tricks is adding a constant to each of the independent variables in a regression so as to shift the intercept. Of course, merely shifting the data will not change R-squared, slopes, F-statistics, p-values, and so on, so why do it?
Because just about any software package capable of doing regression, even Excel, can give you standard errors and confidence intervals for the Intercept, but it is much harder to get most packages to give you standard errors and confidence intervals around the predicted value of the dependent variable for OTHER combinations of the independent variables. Shifting the intercept is an easy way to get confidence intervals for arbitrary combinations of the independent variables.
This sort of thing becomes especially important at a time when the Statistics community is loudly calling for a move away from P-values. Instead it is recommended that researchers give confidence intervals in clinically meaningful terms.
#data #researchers #statistics #r #excel #regression
✴️ @AI_Python_EN
Important Machine Learning algorithms and their Hyperparameters
#machinelearning #datascience #statistics #algorithms
✴️ @AI_Python_EN
Why statistics should make you suspicious
Spiegelhalter on algorithms, luck, bias, probabilities, machine learning and AI.
https://lnkd.in/e-X9hXJ
#artificialintelligence #bias #ai #statistics #bigdata
✴️ @AI_Python_EN
Here are some #statistics and research #journals I can recommend:
- Statistical Analysis and Data Mining (ASA)
- Analytics Journal (DMA)
- The American Statistician (ASA)
- Journal of the American Statistical Association (ASA)
- Statistics in Biopharmaceutical Research (ASA)
- Journal of Agricultural, Biological, and Environmental Statistics (ASA)
- Journal of Statistics Education (ASA)
- Statistics and Public Policy (ASA)
- Journal of Survey Statistics and Methodology (AAPOR and ASA)
- Journal of Educational and Behavioral Statistics (ASA)
- British Journal of Mathematical and Statistical Psychology (Wiley)
- Statistics Surveys (IMS)
- Stata Journal (StataCorp)
- The R Journal (R Project)
- Structural Equation Modeling: A Multidisciplinary Journal (Routledge)
- Journal of Business & Economic Statistics (ASA)
- Journal of Marketing Research (AMA)
- Journal of Computational and Graphical Statistics (ASA)
- Journal of Artificial General Intelligence (AGIS)
These are not purely theoretical publications, and they provide plenty of examples I can adapt for my own work. I try to read them as regularly as I can.
There's so much innovation happening in analytics that it's hard to keep up!
✴️ @AI_Python_EN
Don't stop sharing; done is better than perfect.
To those who actively continue to blame, condemn and complain online, especially in reaction to simplified statistics, programming and machine learning content: look for the value in others' imperfections.
We both know that machine learning models will never be perfect. As George E. P. Box said, "All models are wrong, but some are useful." Likewise, shared content often omits detail in order to aid understanding, actionability and business value, and to spread knowledge more widely.
Not all of us will face cases covering every topic in such content, but even partial knowledge gives us the opportunity to work on a better process, and even to help people.
Don't stop sharing; done is better than perfect.
#programming #statistics #machinelearning
✴️ @AI_Python_EN
Machine Learning (ML) & Artificial Intelligence (AI): From Black Box to White Box Models in 4 Steps - Resources for Explainable AI & ML Model Interpretability.
✔️STEP 1 - ARTICLES
- (short) KDnuggets article: https://lnkd.in/eRyTXcQ
- (long) O'Reilly article: https://lnkd.in/ehMHYsr
✔️STEP 2 - BOOKS
- Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (free e-book): https://lnkd.in/eUWfa5y
- An Introduction to Machine Learning Interpretability: An Applied Perspective on Fairness, Accountability, Transparency, and Explainable AI (free e-book): https://lnkd.in/dJm595N
✔️STEP 3 - COLLABORATE
- Join Explainable AI (XAI) Group: https://lnkd.in/dQjmhZQ
✔️STEP 4 - PRACTICE
- Hands-On Practice: Open-Source Tools & Tutorials for ML Interpretability (Python/R): https://lnkd.in/d5bXgV7
- Python Jupyter Notebooks: https://lnkd.in/dETegUH
#machinelearning #datascience #analytics #bigdata #statistics #artificialintelligence #ai #datamining #deeplearning #neuralnetworks #interpretability #science #research #technology #business #healthcare
✴️ @AI_Python_EN
#Statistics has many uses but, fundamentally, it's a systematic way of dealing with uncertainty. When something is certain, there is no need to bring in a statistician or ask anyone for their counsel.
Since we're concerned with uncertainty, statisticians approach questions probabilistically. To conclude that something is likely to be true does not mean we're claiming it IS true, only that it's more likely to be true than not.
We may estimate this probability as being very high but, again, this is not saying the #probability is perfect (1.0).
Statisticians also think in terms of conditional probabilities, which means we've estimated the probability after having taken other information into account.
For instance, we might estimate the probability of a person buying a certain type of product within the next three months as 0.7 because he is a 25-year-old male. This estimate may have been made with a statistical model and data from thousands or millions of other consumers. For a 55-year-old woman our estimate might be 0.15.
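A hedged sketch of how such conditional estimates might be produced (entirely synthetic data and coefficients; scikit-learn assumed, not any actual production model):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic training data: age and gender -> bought within 3 months
rng = np.random.default_rng(1)
n = 5000
age = rng.integers(18, 70, n)
is_male = rng.integers(0, 2, n)
true_logit = 1.5 - 0.06 * (age - 25) + 0.8 * (is_male - 0.5)  # invented "truth"
bought = rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))

X = np.column_stack([age, is_male]).astype(float)
model = LogisticRegression(max_iter=1000).fit(X, bought)

# Conditional probability estimates for two hypothetical consumers
p_young_male = model.predict_proba([[25.0, 1.0]])[0, 1]
p_older_female = model.predict_proba([[55.0, 0.0]])[0, 1]
```

The estimates are conditional on age and gender; conditioning on more information (income, past purchases) would change them again.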
Part of the challenge of being a statistician is that decision-makers often come to us for definitive yes-or-no answers. They can become irritated when we ask for more information or give them very qualified recommendations.
It ain't just math and programming!
Tip: if someone says, for example, that A is not the only possible explanation for something and that B, C, or D are other possibilities, a common reaction is for the other party to conclude that the first person is saying A is NOT a possible explanation. Humans are funny people.
✴️ @AI_Python_EN
As the author states: "work in process and even in an early dirty phase"
But still very cool 🙂
Book: Predictive Models: Visual Exploration, Explanation and Debugging, with examples in R and Python, by Przemyslaw Biecek
#book #datascience #machinelearning #statistics #programming_language
🌎 Book
✴️ @AI_Python_EN
Sharing knowledge on one of the basic topics in Statistics and Machine Learning:
"Assumptions of Linear Regression"
Understanding these assumptions is very important for anybody who wants to build a robust model and improve its performance.
#machinelearning #AIML #statistics #artificialintelligence
https://lnkd.in/eJupcDZ
✴️ @AI_Python_EN
"Assumptions of Linear Regression"
Understanding the assumptions is very important for anybody to build a robust model and improve the performance.
#machinelearning #AIML #statistics #artificialintelligence
https://lnkd.in/eJupcDZ
✴️ @AI_Python_EN
There are now many methods we can use when our dependent variable is not continuous. SVM, XGBoost and Random Forests are some popular ones.
There are also "traditional" methods, such as Logistic Regression. These usually scale well and, when used properly, are competitive in terms of predictive accuracy.
They are probabilistic models, which gives them additional flexibility. They are also often easier to interpret, which is critical when the goal is explanation, not just prediction.
They can be more work, however, and are probably easier to misuse than newer methods such as Random Forests. Here are some excellent books on these methods that may be of interest:
- Categorical Data Analysis (Agresti)
- Analyzing Categorical Data (Simonoff)
- Regression Models for Categorical Dependent Variables (Long and Freese)
- Generalized Linear Models and Extensions (Hardin and Hilbe)
- Regression Modeling Strategies (Harrell)
- Applied Logistic Regression (Hosmer and Lemeshow)
- Logistic Regression Models (Hilbe)
- Analysis of Ordinal Categorical Data (Agresti)
- Applied Ordinal Logistic Regression (Liu)
- Modeling Count Data (Hilbe)
- Negative Binomial Regression (Hilbe)
- Handbook of Survival Analysis (Klein et al.)
- Survival Analysis: A Self-Learning Text (Kleinbaum and Klein)
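To illustrate the interpretability contrast sketched above (synthetic data; scikit-learn assumed): both model families emit probabilities, but the logistic coefficients have a direct odds-ratio reading that a forest does not offer.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 2))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=1000) > 0

logit = LogisticRegression().fit(X, y)
forest = RandomForestClassifier(random_state=0).fit(X, y)

# A unit increase in feature 0 multiplies the odds by exp(coefficient)
odds_ratio = np.exp(logit.coef_[0, 0])
probs = forest.predict_proba(X[:5])  # probabilities, but no direct reading
```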
#statistics #book #Machinelearning
✴️ @AI_Python
#Statistics such as correlation, mean and standard deviation (variance) create strong visual images and meaning. Two different #datasets with the same correlation would sort of look the same. Right?
Not so much.
Each of these very different-looking graphs plots a dataset with the same correlation, mean and SD. This is why plotting data is so important, though oddly it is so rarely (in my experience) done.
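This is the point famously made by Anscombe's quartet. As a quick check, here are the first two of its four datasets (values from Anscombe's 1973 paper): their summary statistics agree to two decimals, yet one is noisy-linear and the other a smooth curve when plotted.

```python
import numpy as np

# First two datasets of Anscombe's quartet (Anscombe, 1973)
x  = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

for y in (y1, y2):
    print(round(float(y.mean()), 2), round(float(y.std(ddof=1)), 2),
          round(float(np.corrcoef(x, y)[0, 1]), 2))
```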
https://bit.ly/2oZ29MP
✴️ @AI_Python_EN
The field of statistics has a very long history, dating back to ancient times.
Much of marketing data science can be traced to the origins of actuarial science, demography, sociology and psychology, with early statisticians playing major roles in all of these fields.
Big is relative, and statisticians have been working with "big data" all along. "Machine learners" such as SVM and random forests originated in statistics, and neural nets were inspired as much by regression as by theories of the human brain.
Statisticians are involved in a diverse range of fields, including marketing, psychology, pharmacology, economics, meteorology, political science and ecology, and have helped develop research methods and analytics for nearly any kind of data.
The history and richness of #statistics is not always appreciated, though. For example, this morning I was asked "How's your #machinelearning?" :-)
✴️ @AI_Python_EN
Sampling is a deceptively complex subject, and some academic statisticians have devoted the bulk of their careers to it.
It's not a subject that thrills everyone but is a very important one, and one which seems underappreciated in marketing research and #data science.
Here are some books on or related to sampling I've found helpful:
- Survey Sampling (Kish)
- Sampling Techniques (Cochran)
- Model Assisted Survey Sampling (Särndal et al.)
- Sampling: Design and Analysis (Lohr)
- Practical Tools for Designing and Weighting Survey Samples (Valliant et al.)
- Survey Weights: A Step-by-step Guide to Calculation (Valliant and Dever)
- Complex Surveys (Lumley)
- Hard-to-Survey Populations (Tourangeau et al.)
- Small Area Estimation (Rao and Molina)
The first three are regarded as classics (though still relevant). Sharon Lohr's book is the friendliest introduction I know of on this subject. Standard marketing research textbooks also give simple overviews of sampling but do not get into depth.
There are also academic journals that feature articles on sampling, such as the Public Opinion Quarterly (AAPOR) and the Journal of Survey #Statistics and Methodology (AAPOR and ASA).
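To make the weighting idea concrete -- a toy sketch with invented numbers, not drawn from any of the books above -- a stratified estimate weights each respondent by the number of population units they represent (N_h / n_h per stratum):

```python
import numpy as np

# Invented stratified design: population counts and a sample from each stratum
N = {"urban": 8000, "rural": 2000}
sample = {"urban": np.array([3.0, 4.0, 5.0]),
          "rural": np.array([8.0, 10.0])}

# Each stratum mean is weighted by its population share (N_h / N)
total = sum(N.values())
estimate = sum(N[h] * sample[h].mean() for h in N) / total
```

The unweighted sample mean here would be 6.0, not 5.0: ignoring the design overweights the small rural stratum. That gap is exactly why survey weights matter.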
✴️ @AI_Python_EN
This is Your Brain on Code 🧠💻🔢 Computer programming is often associated with math, but researchers used functional MRI scans to show the role of the brain's language-processing centers: https://lnkd.in/eN_-3RA
#datascience #machinelearning #ai #bigdata #analytics #statistics #artificialintelligence #datamining #computing #programmers #neuroscience
✴️ @AI_Python_EN
Uncertainty in big data analytics: survey, opportunities, and challenges
https://journalofbigdata.springeropen.com/articles/10.1186/s40537-019-0206-3
#BigData #statistics #NLP
✴️ @AI_Python_EN
#AI/ #DataScience/ #MachineLearning/ #ML:
7 Steps for Data Preparation Using #Python
Link => https://bit.ly/PyDataPrep
#datamining #statistics #bigdata #artificialintelligence
✴️ @AI_Python_EN
"Data in the Life: Authorship Attribution in Lennon-McCartney Songs" was just published in the first issue of the HARVARD DATA SCIENCE REVIEW, the inaugural publication of Harvard Data Science, published by the MIT Press. Combining features of a premier research journal, a leading educational publication, and a popular magazine, HDSR leverages digital technologies and data visualizations to facilitate author-reader interactions globally. Besides our article, the first issue features articles on topics ranging from machine learning models for predicting drug approvals to artificial intelligence. Read it now:
https://bit.ly/2Kuze2q.
#datascience #bigdata #machinelearning #statistics #AI
✴️ @AI_Python_EN
Great statistical software for beginners.
Here is the Gretl tutorial series by Simone Gasperin:
1) Simple Linear Regression
https://lnkd.in/ecfsV9c
2) Coding Dummy Variables
https://lnkd.in/ef7Yd7f
3) Forecasting New Observations
https://lnkd.in/eNKbxbU
4) Forecasting a Large Number of Observations
https://lnkd.in/eHmibGs
5) Logistic Regression
https://lnkd.in/eRfhQ87
6) Forecasting and Confusion Matrix
https://lnkd.in/eaqrFJr
7) Modeling and Forecasting Time Series Data
https://lnkd.in/e6fqKpF
8) Comparing Time Series Trend Models
https://lnkd.in/eKjEUAE
#datascience #machinelearning #statistics #dataanalytics #dataanalysis
✴️ @AI_Python_EN
1-point RANSAC for Circular Motion Estimation in Computed Tomography (CT)
https://deepai.org/publication/1-point-ransac-for-circular-motion-estimation-in-computed-tomography-ct
by Mikhail O. Chekanov et al.
#Statistics #Estimator
❇️ @AI_Python_EN
What's the purpose of statistics?
"Do you think the purpose of existence is to pass out of existence is the purpose of existence?" - Ray Manzarek
The former Doors organist poses some fundamental questions to which definitive answers remain elusive. Happily, the purpose of statistics is easier to fathom since humans are its creator. Put simply, it is to enhance decision making.
These decisions could be those made by scientists, businesspeople, politicians and other government officials, by medical and legal professionals, or even by religious authorities. In informal ways, ordinary folks also use statistics to help make better decisions.
How does it do this?
One way is by providing basic information, such as how many, how much and how often. The "stat" in statistics is derived from the word state, as in nation state; as statistics emerged as a formal discipline, describing nations quantitatively (e.g., population size, number of citizens working in manufacturing) became a fundamental purpose. Frequencies, means, medians and standard deviations are now familiar to nearly everyone.
Often we must rely on samples to make inferences about our population of interest. From a consumer survey, for example, we might estimate mean annual household expenditures on snack foods. This is known as inferential statistics, and confidence intervals will be familiar to anyone who has taken an introductory course in statistics. So will methods such as t-tests and chi-squared tests which can be used to make population inferences about groups (e.g., are males more likely than females to eat pretzels?).
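A sketch of the household-expenditure example (invented numbers; scipy assumed): a t-based confidence interval for the mean annual snack-food spend from a small sample.

```python
import numpy as np
from scipy import stats

# Hypothetical survey: annual snack-food spend (USD) for 40 households
rng = np.random.default_rng(3)
spend = rng.gamma(shape=4.0, scale=150.0, size=40)

mean = spend.mean()
low, high = stats.t.interval(0.95, df=len(spend) - 1,
                             loc=mean, scale=stats.sem(spend))
```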
Another way statistics helps us make decisions is by exploring relationships among variables through the use of cross tabulations, correlations and data visualizations. Exploratory data analysis (EDA) can also take on more complex forms and draw upon methods such as principal components analysis, regression and cluster analysis. EDA is often used to develop hypotheses which will be assessed more rigorously in subsequent research.
These hypotheses are often causal in nature, for example, why some people avoid snacks. Randomized experiments are generally considered the best approach in causal analysis but are not always possible or appropriate; see Why experiment? for some more thoughts on this subject. Hypotheses can be further developed and refined, not simply tested through Null Hypothesis Significance Testing, though this has been traditionally frowned upon since we are using the same data for multiple purposes.
Many statisticians are actively involved in designing research, not merely using secondary data. This is a large subject but briefly summarized in Preaching About Primary Research.
Making classifications, predictions and forecasts is another traditional role of statistics. In a data science context, the first two are often called predictive analytics and employ methods such as random forests and standard (OLS) regression. Forecasting sales for the next year is a different matter and normally requires the use of time-series analysis. There is also unsupervised learning, which aims to find previously unknown patterns in unlabeled data. Using K-means clustering to partition consumer survey respondents into segments based on their attitudes is an example of this.
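The K-means segmentation mentioned above, as a toy sketch with invented attitude ratings (scikit-learn assumed):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 1-5 attitude ratings from 300 survey respondents
rng = np.random.default_rng(4)
ratings = rng.integers(1, 6, size=(300, 4)).astype(float)

# Partition respondents into three attitudinal segments
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(ratings)
```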
Quality control, operations research, what-if simulations and risk assessment are other areas where statistics play a key role. There are many others, as this page illustrates.
The fuzzy buzzy term analytics is frequently used interchangeably with statistics, an offense to which I also plead guilty.
"The best thing about being a statistician is that you get to play in everyone's backyard." - John Tukey
#ai #artificialintelligence #ml #statistics #bigdata #machinelearning
#datascience
❇️ @AI_Python_EN
"Do you think the purpose of existence is to pass out of existence is the purpose of existence?" - Ray Manzarek
The former Doors organist poses some fundamental questions to which definitive answers remain elusive. Happily, the purpose of statistics is easier to fathom since humans are its creator. Put simply, it is to enhance decision making.
These decisions could be those made by scientists, businesspeople, politicians and other government officials, by medical and legal professionals, or even by religious authorities. In informal ways, ordinary folks also use statistics to help make better decisions.
How does it do this?
One way is by providing basic information, such as how many, how much and how often. Stat in statistics is derived from the word state, as in nation state and, as it emerged as a formal discipline, describing nations quantitatively (e.g., population size, number of citizens working in manufacturing) became a fundamental purpose. Frequencies, means, medians and standard deviations are now familiar to anyone.
Often we must rely on samples to make inferences about our population of interest. From a consumer survey, for example, we might estimate mean annual household expenditures on snack foods. This is known as inferential statistics, and confidence intervals will be familiar to anyone who has taken an introductory course in statistics. So will methods such as t-tests and chi-squared tests which can be used to make population inferences about groups (e.g., are males more likely than females to eat pretzels?).
Another way statistics helps us make decisions is by exploring relationships among variables through the use of cross tabulations, correlations and data visualizations. Exploratory data analysis (EDA) can also take on more complex forms and draw upon methods such as principal components analysis, regression and cluster analysis. EDA is often used to develop hypotheses which will be assessed more rigorously in subsequent research.
These hypotheses are often causal in nature, for example, why some people avoid snacks. Randomized experiments are generally considered the best approach in causal analysis but are not always possible or appropriate; see Why experiment? for some more thoughts on this subject. Hypotheses can be further developed and refined, not simply tested through Null Hypothesis Significance Testing, though this has been traditionally frowned upon since we are using the same data for multiple purposes.
Many statisticians are actively involved in designing research, not merely using secondary data. This is a large subject but briefly summarized in Preaching About Primary Research.
Making classifications, predictions and forecasts is another traditional role of statistics. In a data science context, the first two are often called predictive analytics and employ methods such as random forests and standard (OLS) regression. Forecasting sales for the next year is a different matter and normally requires the use of time-series analysis. There is also unsupervised learning, which aims to find previously unknown patterns in unlabeled data. Using K-means clustering to partition consumer survey respondents into segments based on their attitudes is an example of this.
Quality control, operations research, what-if simulations and risk assessment are other areas where statistics play a key role. There are many others, as this page illustrates.
The fuzzy buzzy term analytics is frequently used interchangeably with statistics, an offense to which I also plead guilty.
"The best thing about being a statistician is that you get to play in everyone's backyard." - John Tukey
#ai #artificialintelligence #ml #statistics #bigdata #machinelearning
#datascience
❇️ @AI_Python_EN