Machine Learning
Machine learning insights, practical tutorials, and clear explanations for beginners and aspiring data scientists. Follow the channel for models, algorithms, coding guides, and real-world ML applications.

Admin: @HusseinSheikho || @Hussein_Sheikho
Topic: Python SciPy – From Easy to Top: Part 5 of 6: Working with SciPy Statistics

---

1. Introduction to `scipy.stats`

• The scipy.stats module contains a large number of probability distributions and statistical functions.
• You can perform tasks like descriptive statistics, hypothesis testing, sampling, and fitting distributions.

---

2. Descriptive Statistics

Use these functions to summarize and describe data characteristics:

from scipy import stats
import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = np.mean(data)
median = np.median(data)
mode = stats.mode(data, keepdims=True)  # keepdims requires SciPy 1.9+
std_dev = np.std(data)  # population std (ddof=0); pass ddof=1 for the sample std

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode.mode[0])
print("Standard Deviation:", std_dev)
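
As a rough sketch of the same summary in one call, stats.describe returns the count, min/max, mean, variance, skewness, and kurtosis together. Note that its variance uses ddof=1 by default, while np.std uses ddof=0. The variable names below are illustrative only.

```python
import numpy as np
from scipy import stats

data = [2, 4, 4, 4, 5, 5, 7, 9]

# np.std defaults to ddof=0 (population formula); ddof=1 gives the sample estimate
pop_std = np.std(data)
sample_std = np.std(data, ddof=1)

# describe() bundles nobs, minmax, mean, variance (ddof=1), skewness, kurtosis
summary = stats.describe(data)

print("Population std:", pop_std)
print("Sample std:", sample_std)
print("Count:", summary.nobs)
print("Min/Max:", summary.minmax)
print("Variance (ddof=1):", summary.variance)
```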


---

3. Probability Distributions

SciPy has built-in continuous and discrete distributions such as normal, binomial, Poisson, etc.

Normal Distribution Example

from scipy.stats import norm

# PDF at x = 0
print("PDF at 0:", norm.pdf(0, loc=0, scale=1))

# CDF at x = 1
print("CDF at 1:", norm.cdf(1, loc=0, scale=1))

# Generate 5 random numbers
samples = norm.rvs(loc=0, scale=1, size=5)
print("Random Samples:", samples)
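
Discrete distributions follow the same pattern, except that they expose pmf instead of pdf. The sketch below uses 10 fair coin flips as an assumed example, and also shows ppf, the inverse of cdf.

```python
from scipy.stats import binom, norm

# P(X = 5) for 10 fair coin flips (probability mass, not density)
p_exactly_five = binom.pmf(5, n=10, p=0.5)

# P(X <= 5) for the same distribution
p_at_most_five = binom.cdf(5, n=10, p=0.5)

# ppf inverts the cdf: the x below which 97.5% of a standard normal lies
z_975 = norm.ppf(0.975)

print("P(X = 5):", p_exactly_five)
print("P(X <= 5):", p_at_most_five)
print("97.5th percentile of N(0, 1):", z_975)
```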


---

4. Hypothesis Testing

One-sample t-test – test whether the population mean behind a sample equals a hypothesized value (the null hypothesis):

sample = [5.1, 5.3, 5.5, 5.7, 5.9]
t_stat, p_val = stats.ttest_1samp(sample, popmean=5.0)

print("T-statistic:", t_stat)
print("P-value:", p_val)


Interpretation: If the p-value is below your chosen significance level (commonly 0.05), reject the null hypothesis that the population mean equals 5.0.
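
To see where these numbers come from, here is a minimal sketch that recomputes the statistic by hand with the standard formula t = (sample mean - popmean) / (s / sqrt(n)), then takes a two-sided p-value from the t distribution with n - 1 degrees of freedom. The names t_manual and p_manual are illustrative.

```python
import numpy as np
from scipy import stats

sample = [5.1, 5.3, 5.5, 5.7, 5.9]
popmean = 5.0

n = len(sample)
sample_mean = np.mean(sample)
sample_std = np.std(sample, ddof=1)  # sample std, as the t-test requires

# Standardized distance between the sample mean and the hypothesized mean
t_manual = (sample_mean - popmean) / (sample_std / np.sqrt(n))

# Two-sided p-value: probability of a |t| at least this large under the null
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_stat, p_val = stats.ttest_1samp(sample, popmean=popmean)

print("Manual:", t_manual, p_manual)
print("SciPy: ", t_stat, p_val)
```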

---

5. Two-sample t-test

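Test whether two independent samples come from populations with equal means. By default ttest_ind assumes equal population variances; pass equal_var=False for Welch's t-test: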

group1 = [20, 22, 19, 24, 25]
group2 = [28, 27, 26, 30, 31]

t_stat, p_val = stats.ttest_ind(group1, group2)

print("T-statistic:", t_stat)
print("P-value:", p_val)
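
If you are not sure the two groups share the same variance, Welch's t-test is a common alternative. A minimal sketch on the same data, comparing both variants:

```python
from scipy import stats

group1 = [20, 22, 19, 24, 25]
group2 = [28, 27, 26, 30, 31]

# Student's t-test: assumes equal population variances (the default)
t_student, p_student = stats.ttest_ind(group1, group2)

# Welch's t-test: drops the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(group1, group2, equal_var=False)

print("Student:", t_student, p_student)
print("Welch:  ", t_welch, p_welch)
```

Because both groups have the same size here, the two t-statistics coincide; only the degrees of freedom, and therefore the p-values, differ.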


---

6. Chi-Square Test for Independence

Use this to test whether two categorical variables are independent:

# Example contingency table (rows are exactly proportional, so expect a statistic of 0 and a p-value of 1)
data = [[10, 20], [20, 40]]
chi2, p, dof, expected = stats.chi2_contingency(data)

print("Chi-square statistic:", chi2)
print("P-value:", p)
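
The expected frequencies returned by chi2_contingency are what you would see under independence: row total times column total, divided by the grand total. A small sketch that recomputes them by hand (the name expected_manual is illustrative):

```python
import numpy as np
from scipy import stats

table = np.array([[10, 20], [20, 40]])

# Expected count for each cell under independence
row_totals = table.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = table.sum(axis=0, keepdims=True)   # shape (1, 2)
expected_manual = row_totals * col_totals / table.sum()

chi2, p, dof, expected = stats.chi2_contingency(table)

print("Expected (manual):\n", expected_manual)
print("Expected (SciPy):\n", expected)
print("Degrees of freedom:", dof)
```

For this table the observed and expected counts are identical, which is why the test finds no evidence against independence.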


---

7. Correlation and Covariance

Measure the strength of the linear relationship between two variables:

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

corr, _ = stats.pearsonr(x, y)
print("Pearson Correlation Coefficient:", corr)


Covariance:

cov_matrix = np.cov(x, y)  # 2x2 matrix: variances on the diagonal, cov(x, y) off the diagonal (ddof=1)
print("Covariance Matrix:\n", cov_matrix)
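
The two quantities are linked: the Pearson coefficient is the covariance divided by the product of the standard deviations. A brief sketch that derives it from the covariance matrix (r_manual is an illustrative name):

```python
import numpy as np
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

cov_matrix = np.cov(x, y)       # sample covariance (ddof=1)
cov_xy = cov_matrix[0, 1]
var_x = cov_matrix[0, 0]
var_y = cov_matrix[1, 1]

# Normalizing the covariance by both standard deviations gives Pearson's r
r_manual = cov_xy / np.sqrt(var_x * var_y)

r, _ = stats.pearsonr(x, y)

print("r from covariance:", r_manual)
print("r from pearsonr:  ", r)
```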


---

8. Fitting Distributions to Data

You can fit a distribution to data. Here synthetic normal samples stand in for a real-world dataset:

data = np.random.normal(loc=50, scale=10, size=1000)
params = norm.fit(data) # returns mean and std dev

print("Fitted mean:", params[0])
print("Fitted std dev:", params[1])
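
For the normal distribution, the fitted parameters are maximum likelihood estimates: the sample mean and the standard deviation computed with ddof=0. The sketch below checks this on seeded data; the seed value is an arbitrary choice for reproducibility.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # arbitrary seed so the run is repeatable
data = rng.normal(loc=50, scale=10, size=1000)

mu_hat, sigma_hat = norm.fit(data)

# Closed-form maximum likelihood estimates for a normal distribution
mu_closed = data.mean()
sigma_closed = data.std()  # ddof=0, not the ddof=1 sample std

print("fit():      ", mu_hat, sigma_hat)
print("closed form:", mu_closed, sigma_closed)
```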


---

9. Sampling from Distributions

Generate random numbers from different distributions:

# Binomial distribution
samples = stats.binom.rvs(n=10, p=0.5, size=10)
print("Binomial Samples:", samples)

# Poisson distribution
samples = stats.poisson.rvs(mu=3, size=10)
print("Poisson Samples:", samples)
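
These samples change on every run. If you need repeatable results, one option is to pass a seeded NumPy generator through the random_state argument of rvs. A minimal sketch, with 42 as an arbitrary seed:

```python
import numpy as np
from scipy import stats

# Two generators with the same seed produce the same stream of numbers
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

binom_a = stats.binom.rvs(n=10, p=0.5, size=10, random_state=rng_a)
binom_b = stats.binom.rvs(n=10, p=0.5, size=10, random_state=rng_b)

poisson_a = stats.poisson.rvs(mu=3, size=10, random_state=rng_a)

print("Binomial (run A):", binom_a)
print("Binomial (run B):", binom_b)
print("Poisson:", poisson_a)
```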


---

10. Summary

• scipy.stats is a powerful tool for statistical analysis.
• You can compute summaries, perform tests, model distributions, and generate random samples.

---

Exercise

• Generate 1000 samples from a normal distribution and compute mean, median, std, and mode.
• Test if a sample has a mean significantly different from 5.
• Fit a normal distribution to your own dataset and plot the histogram with the fitted PDF curve.

---

#Python #SciPy #Statistics #HypothesisTesting #DataAnalysis

https://t.me/DataScienceM