progit.pdf
18 MB
Pro Git (Second Edition) - Everything you need to know about Git
A complete and practical guide to Git version control, from basics to advanced workflows. Ideal for developers at any level.
Credit:
Shared for educational purposes only. All rights belong to the authors and publisher.
Happy learning & keep growing!
@TheInfinityAI
Authors: Scott Chacon & Ben Straub
Publisher: Apress
Online Git Book: Link
Article 22: Gaussian Mixture Models (GMM) and EM Algorithm
In the last article, we studied K-Means. K-Means is hard clustering: each data point belongs to exactly one group. But in the real world, data is not always clear-cut, and a data point can reasonably belong to two groups at the same time. Gaussian Mixture Models (GMM) help us solve this. GMM is soft clustering: it gives each point a probability for each group.
1. What is a Gaussian Distribution?
GMM assumes the data comes from a mixture of several Gaussian distributions (also called normal distributions or bell curves), one for each group (cluster), and each group has its own shape. A Gaussian is defined by two parameters: the mean (μ) and the covariance (Σ). The mean is the center of the curve, and the covariance sets its width and direction.
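$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu)^{\top}\Sigma^{-1}(x - \mu)\right)$$
where $D$ is the number of dimensions of $x$ and $|\Sigma|$ is the determinant of $\Sigma$.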
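To make the formula concrete, here is a minimal sketch that evaluates the density at a single point. It assumes NumPy is available; the function name `gaussian_pdf` is our own, and a real library would also guard against singular covariance matrices.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate the multivariate normal density N(x | mu, sigma) at one point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.shape[0]                      # D, the number of dimensions
    diff = x - mu
    # Normalizing constant 1 / ((2*pi)^(D/2) * |Sigma|^(1/2))
    norm_const = 1.0 / ((2.0 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), via a linear solve instead of an explicit inverse
    quad = diff @ np.linalg.solve(sigma, diff)
    return norm_const * np.exp(-0.5 * quad)

# At the mean of a 2-D standard normal the density is 1 / (2*pi), roughly 0.159
print(gaussian_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))
```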
2. How GMM Works: The EM Algorithm
We don't know the centers (μ) or the shapes (Σ) of the groups at the start, so we use the Expectation-Maximization (EM) algorithm. It runs in a loop with two main steps.
Step 1: The E-Step (Expectation)
In this step, the algorithm calculates the responsibility of each group for each data point. It asks, "What is the probability that this data point belongs to group A, group B, or group C?" The responsibility (γ) is computed with this formula:
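$$\gamma(z_{nk}) = \frac{\pi_k\,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\,\mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$$
where $\pi_k$ is the mixing weight (prior probability) of group $k$ and $K$ is the number of groups.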
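The E-step formula can be sketched in NumPy as follows. This assumes full covariance matrices and NumPy being available; `e_step` and the vectorized `gaussian_pdf` helper are our own illustrative names. Each row of the result holds one point's responsibilities and sums to 1, because every numerator is divided by the sum over all groups.

```python
import numpy as np

def gaussian_pdf(X, mu, sigma):
    """Density N(x | mu, sigma) for every row x of X (shape n x D)."""
    d = mu.shape[0]
    diff = X - mu
    # Row-wise quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    norm_const = 1.0 / ((2.0 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm_const * np.exp(-0.5 * quad)

def e_step(X, weights, means, covs):
    """Return the n x K matrix of responsibilities gamma(z_nk)."""
    # Numerator: pi_k * N(x_n | mu_k, Sigma_k) for every point n and group k
    weighted = np.column_stack([
        w * gaussian_pdf(X, m, c) for w, m, c in zip(weights, means, covs)
    ])
    # Denominator: sum over all groups j, so each row sums to 1
    return weighted / weighted.sum(axis=1, keepdims=True)

X = np.array([[0.0], [0.1], [5.0]])
gamma = e_step(X, [0.5, 0.5], [np.array([0.0]), np.array([5.0])], [np.eye(1), np.eye(1)])
print(gamma.round(3))
```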
Step 2: The M-Step (Maximization)
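Next, the algorithm uses the responsibilities from the E-step to update each group's weight, center, and covariance so that they fit the data better. Each EM iteration never decreases the log-likelihood (the total fit), and the loop stops when the log-likelihood stops improving. EM can get stuck in a local maximum, so it is common to run it from several starting points.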
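The M-step has a closed-form solution. Below is a minimal NumPy sketch, assuming full covariance matrices and a responsibility matrix `gamma` of shape (n, K) such as the E-step produces; the function name `m_step` is our own. Alternating the E-step and this M-step until the log-likelihood stops improving gives the full EM loop.

```python
import numpy as np

def m_step(X, gamma):
    """Update mixing weights, means and covariances from responsibilities gamma (n x K)."""
    n, d = X.shape
    nk = gamma.sum(axis=0)                 # effective number of points in each group
    weights = nk / n                       # pi_k = N_k / n
    means = (gamma.T @ X) / nk[:, None]    # mu_k = responsibility-weighted average of the points
    covs = []
    for k in range(gamma.shape[1]):
        diff = X - means[k]
        # Sigma_k = weighted average of outer products (x_n - mu_k)(x_n - mu_k)^T
        covs.append((gamma[:, k, None] * diff).T @ diff / nk[k])
    return weights, means, covs

X = np.array([[0.0], [2.0], [10.0], [12.0]])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
weights, means, covs = m_step(X, gamma)
print(weights, means.ravel(), [c.item() for c in covs])
```

In practice, libraries usually add a small constant to the covariance diagonal so that a group collapsing onto a single point does not produce a singular matrix.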
3. Why is GMM better than K-Means?
• Soft assignment: we get a probability for each group, which is more useful for complex, overlapping data.
• Flexible shapes: K-Means tends to find round (spherical) clusters, but GMM can find ellipses (oval shapes) that point in any direction.
• Covariance types: we can choose how flexible each shape is, for example spherical, diagonal, or full covariance.
4. How to find the number of groups? (AIC and BIC)
In K-Means, we often use the Elbow Method, but in GMM we can use AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion).
• These scores measure how well the model fits the data (its log-likelihood).
• If we add too many groups, the model becomes too complex and overfits.
• AIC and BIC add a penalty for the number of parameters, so extra groups must earn their place.
Therefore, we choose the number of groups that gives the lowest BIC (or AIC) score.
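A minimal sketch of the two scores, using the standard definitions AIC = 2p − 2 ln L̂ and BIC = p ln(n) − 2 ln L̂, where p is the number of free parameters, n the number of data points and L̂ the maximized likelihood. The parameter counts follow the usual convention for a K-group GMM in D dimensions and also show how the covariance type changes model complexity. The function names are our own, and the log-likelihood would come from a fitted model.

```python
import math

def gmm_n_params(k, d, covariance_type="full"):
    """Number of free parameters of a K-group GMM in D dimensions."""
    cov_params = {
        "full": k * d * (d + 1) // 2,   # one symmetric D x D matrix per group
        "diag": k * d,                   # one variance per dimension per group
        "spherical": k,                  # a single variance per group
    }[covariance_type]
    mean_params = k * d                  # one D-dimensional center per group
    weight_params = k - 1                # weights must sum to 1
    return cov_params + mean_params + weight_params

def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_samples):
    return n_params * math.log(n_samples) - 2 * log_likelihood

print(gmm_n_params(2, 2, "full"))   # 11
```

To pick K, fit one model per candidate K, compute `bic(...)` for each, and keep the K with the smallest value. BIC's ln(n) penalty grows with the dataset size, so it usually prefers fewer groups than AIC.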
Summary
GMM is a powerful tool for finding groups using probability. It uses the EM algorithm to find the best centers and shapes for the data. Unlike K-Means, it is flexible and works well when groups overlap or have different shapes. In Article 23, we will discuss Hierarchical Clustering (building a tree of data).
Article 23: Hierarchical Clustering - Building the Tree of Data
Hierarchical clustering finds groups by building a hierarchy of clusters. Unlike K-Means, we do not need to choose the number of groups (K) at the beginning. Instead, it builds a tree of the data called a dendrogram.
1. The Core Logic: Agglomerative Clustering
The most common approach is the agglomerative (bottom-up) method:
• Every data point starts as its own small cluster.
• The algorithm finds the two clusters that are closest together.
• It joins (merges) them into one new cluster.
• It updates the distance between the new cluster and all other clusters.
• It repeats this until all data is in one big cluster.
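The five steps above can be sketched in plain Python. This is a deliberately naive version, assuming Euclidean distance and single linkage (linkage options are covered in the next section). The function name `agglomerative` is our own, and the approach is only practical for small inputs.

```python
import math

def agglomerative(points, n_clusters=1):
    """Naive bottom-up clustering with single linkage; returns clusters and merge history."""
    clusters = [[i] for i in range(len(points))]     # every point starts as its own cluster
    merges = []                                       # (cluster_a, cluster_b, merge distance)

    def cluster_distance(a, b):
        # Single linkage: distance between the two closest members
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        # Merge the pair; distances to the new cluster are recomputed on the next pass
        merged = clusters[x] + clusters[y]
        clusters = [c for i, c in enumerate(clusters) if i not in (x, y)] + [merged]
    return clusters, merges

pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(agglomerative(pts, n_clusters=2)[0])   # [[0, 1], [2, 3]]
```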
2. Linkage Criteria: The Math of Merging
Linkage defines how hierarchical clustering computes the distance between two clusters from the distances between the data points inside them. Instead of measuring the distance between individual points, linkage tells the algorithm how to measure the distance between groups of points (clusters).
I. Single Linkage (Minimum Distance)
It measures the distance between the two closest points in the two clusters. It can create long, thin clusters, a problem called the chaining effect.
II. Complete Linkage (Maximum Distance)
It measures the distance between the two farthest points in the two clusters. It avoids chaining and tends to create compact, round clusters.
III. Average Linkage
It calculates the average distance between all pairs of points in two clusters.
IV. Ward's Method
It does not look only at distance; it also looks at variance. At each step, Ward's method merges the pair of clusters that causes the smallest increase in total within-cluster variance (the sum of squared distances to the cluster centers). It tends to create compact, similarly sized clusters, and it is often a strong default for general numeric data with Euclidean distances.
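The four linkage rules can be written as small functions over two clusters of points. This is a minimal sketch assuming Euclidean distance; the function names are our own. For Ward's method it uses the standard identity that merging clusters a and b increases the total within-cluster sum of squares by (n_a · n_b / (n_a + n_b)) · ‖c_a − c_b‖², where c_a and c_b are the cluster centroids.

```python
import math

def single_linkage(a, b):
    # Distance between the two closest points
    return min(math.dist(p, q) for p in a for q in b)

def complete_linkage(a, b):
    # Distance between the two farthest points
    return max(math.dist(p, q) for p in a for q in b)

def average_linkage(a, b):
    # Mean distance over all cross-cluster pairs
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def centroid(cluster):
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]

def ward_cost(a, b):
    """Increase in total within-cluster sum of squares if a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * math.dist(centroid(a), centroid(b)) ** 2

a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
print(single_linkage(a, b), complete_linkage(a, b), average_linkage(a, b), ward_cost(a, b))
```

For these two clusters the sums of squares are 2 and 2 before the merge and 21 after it, so the Ward cost of 17 matches a direct calculation.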
3. The Dendrogram Analysis
The dendrogram is a visual representation of the hierarchical clustering process. It shows how clusters are merged step by step.
• The vertical axis (height) represents the distance or dissimilarity at which clusters merge.
• Clusters that merge at lower heights are more similar than clusters that merge at higher heights.
To decide the number of clusters:
• Identify the largest vertical gap in the dendrogram (a region with a big jump in height where no merges occur).
• Draw a horizontal line across the dendrogram within this gap.
• Count the number of vertical branches intersected by the line.
The number of branches crossed by the horizontal cut is the value of K.
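The "largest gap" rule can be sketched directly from the list of merge heights, without drawing the tree. This is a minimal version of the heuristic, assuming the heights come from a linkage where they never decrease (true for single, complete, average and Ward linkage) and that there are at least three points. The function name is our own.

```python
def k_from_largest_gap(merge_heights):
    """Pick K by cutting the dendrogram inside the largest gap between merge heights."""
    heights = sorted(merge_heights)
    n_points = len(heights) + 1                   # n points produce n - 1 merges
    gaps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)
    # Cutting between heights[i] and heights[i + 1] keeps the first i + 1 merges,
    # and every merge reduces the number of clusters by one
    return n_points - (i + 1)

print(k_from_largest_gap([1.0, 1.2, 1.5, 5.0, 5.5]))   # 3
```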
4. Cophenetic Correlation and Performance
We use the cophenetic correlation coefficient (c) to check how well the tree represents the data. It measures the correlation between the original distances between data points and their cophenetic distances (the height at which two points are first joined in the dendrogram). As a common rule of thumb, c > 0.75 means the tree is a good representation of the data. Hierarchical clustering is also heavy for computers: time complexity is O(n^2 log n) for efficient implementations and O(n^3) for naive ones, and it needs O(n^2) memory to store the distance matrix. This makes it very slow for millions of rows.
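A minimal sketch of the cophenetic correlation in plain Python, assuming Euclidean distance and the naive single-linkage procedure from earlier. When two clusters merge at height h, every pair of points split across them gets cophenetic distance h; the coefficient is then the Pearson correlation between the original and cophenetic distances. The function name is our own, and the division fails if all merges happen at the same height.

```python
import math
from itertools import combinations

def cophenetic_correlation(points):
    """Cophenetic correlation of a naive single-linkage tree built on `points`."""
    n = len(points)
    dist = {(i, j): math.dist(points[i], points[j]) for i, j in combinations(range(n), 2)}
    coph = {}
    clusters = [[i] for i in range(n)]
    while len(clusters) > 1:
        # Closest pair of clusters under single linkage: (height, index_x, index_y)
        h, x, y = min(
            (min(dist[min(i, j), max(i, j)] for i in a for j in b), x, y)
            for x, a in enumerate(clusters) for y, b in enumerate(clusters) if x < y
        )
        # Every pair split across the two clusters is first joined at height h
        for i in clusters[x]:
            for j in clusters[y]:
                coph[min(i, j), max(i, j)] = h
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    # Pearson correlation between original and cophenetic distances
    keys = list(dist)
    d = [dist[k] for k in keys]
    c = [coph[k] for k in keys]
    md, mc = sum(d) / len(d), sum(c) / len(c)
    cov = sum((u - md) * (v - mc) for u, v in zip(d, c))
    var_d = sum((u - md) ** 2 for u in d)
    var_c = sum((v - mc) ** 2 for v in c)
    return cov / math.sqrt(var_d * var_c)

print(cophenetic_correlation([(0.0,), (1.0,), (10.0,)]))
```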
Summary
Hierarchical clustering helps us see the structure of data like a family tree. Use Ward's method as a strong default for compact groups and the dendrogram to pick the K value. Always check the cophenetic correlation to confirm that the tree represents the data well.
More: Link
Article 24: Association Rule Learning - Finding Hidden Patterns
Association Rule Learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It is best known for Market Basket Analysis. For example, if a customer buys bread and butter, they are also likely to buy milk.
1. Core Concepts
To evaluate a rule A → B, we use three main measures:
• Support: how popular an itemset is in the whole dataset.
  Support(A) = (Number of transactions containing A) / (Total number of transactions)
• Confidence: how likely item B is purchased when item A is purchased. Support(A, B) is the share of transactions that contain both A and B.
  Confidence(A → B) = Support(A, B) / Support(A)
• Lift: the strength of the rule compared with chance. If lift is greater than 1, B is more likely to be purchased when A is purchased. If lift is 1, A and B are independent (no relationship). If lift is below 1, buying A makes buying B less likely.
  Lift(A → B) = Support(A, B) / (Support(A) × Support(B))
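The three formulas translate directly into code. This sketch uses a small, made-up list of transactions (each a set of items) to check the bread-and-butter example; the function names are our own.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(transactions, a, b):
    return support(transactions, set(a) | set(b)) / support(transactions, a)

def lift(transactions, a, b):
    return support(transactions, set(a) | set(b)) / (support(transactions, a) * support(transactions, b))

# Hypothetical toy data, for illustration only
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "jam"},
    {"milk"},
    {"eggs"},
]
a, b = {"bread", "butter"}, {"milk"}
print(support(transactions, a))          # 0.5
print(confidence(transactions, a, b))    # 1.0
print(lift(transactions, a, b))          # about 1.5
```

Here every basket with bread and butter also has milk (confidence 1.0), and lift is about 1.5 because milk appears in only two thirds of all baskets.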
2. The Apriori Algorithm (Level 1)
Apriori is one of the most widely used algorithms for association rule mining. It identifies frequent itemsets in a transactional dataset using a bottom-up approach. It relies on the Apriori property: if an itemset is frequent, all of its subsets must also be frequent. Equivalently, if an itemset is infrequent, all of its supersets are also infrequent, so the algorithm can skip them without counting (we call this pruning).
The process works level by level. First, it finds all individual items whose support is at or above a minimum threshold. Then it joins these items to create candidate pairs (itemsets of size 2) and checks their support. It repeats this for triplets (size 3) and larger sets until no more frequent itemsets are found.
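The level-by-level search above can be sketched in plain Python. This is a minimal, unoptimized version (it rescans the transactions for every support count), using made-up toy transactions; the function name `apriori` is our own.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset: support} using level-wise search with pruning."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a candidate with any infrequent (k-1)-subset cannot be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the candidates that survived pruning
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Hypothetical toy data, for illustration only
transactions = [
    {"bread", "butter", "milk"}, {"bread", "butter", "milk"}, {"bread", "butter", "milk"},
    {"bread", "jam"}, {"milk"}, {"eggs"},
]
for itemset, s in sorted(apriori(transactions, 0.5).items(), key=lambda kv: len(kv[0])):
    print(sorted(itemset), round(s, 3))
```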
3. FP-Growth Algorithm (Advanced Frequent Pattern Mining)
Apriori can be slow because it scans the whole database again for every itemset size and may generate many candidate itemsets. FP-Growth (Frequent Pattern Growth) is a more advanced algorithm that discovers frequent itemsets more efficiently than Apriori, without generating candidate itemsets.
Advantages of FP-Growth:
• It scans the database only twice.
• It stores the data in a compact tree structure called an FP-Tree.
• After that, the algorithm works mostly with the tree in memory.
How FP-Growth works:
Step 1: Build the FP-Tree
• Remove infrequent items.
• Sort the remaining items in each transaction by frequency.
• Insert transactions into the tree so that shared prefixes overlap.
Step 2: Mine the Tree (Divide-and-Conquer)
• Start from the least frequent items.
• Build a conditional FP-Tree for each item.
• Recursively extract frequent patterns.
This is why it is called Frequent Pattern Growth: patterns grow from smaller conditional structures.
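Both steps can be sketched in plain Python. This is a simplified version for learning purposes: `Node`, `build_tree` and `fp_growth` are our own names, transactions are passed as (items, count) pairs so that the same function can mine the conditional pattern bases, and `min_count` is an absolute count rather than a fraction. `build_tree` makes the two database scans (count, then insert), and `fp_growth` recurses on conditional trees starting from the least frequent item.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(transactions, min_count):
    """Step 1: build an FP-tree from (items, count) pairs; returns header table and item counts."""
    counts = defaultdict(int)
    for items, c in transactions:                 # first scan: count every item
        for i in items:
            counts[i] += c
    frequent = {i: n for i, n in counts.items() if n >= min_count}
    root = Node(None, None)
    header = defaultdict(list)                    # item -> tree nodes holding that item
    for items, c in transactions:                 # second scan: insert filtered, sorted items
        path = sorted((i for i in items if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for i in path:                            # shared prefixes reuse existing nodes
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += c
    return header, frequent

def fp_growth(transactions, min_count, suffix=()):
    """Step 2: mine frequent itemsets recursively; returns {itemset: count}."""
    header, frequent = build_tree(transactions, min_count)
    result = {}
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):   # least frequent first
        itemset = frozenset(suffix + (item,))
        result[itemset] = frequent[item]
        # Conditional pattern base: the prefix path above every node of `item`
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_count, suffix + (item,)))
    return result

# Hypothetical toy data, for illustration only
transactions = [
    {"bread", "butter", "milk"}, {"bread", "butter", "milk"}, {"bread", "butter", "milk"},
    {"bread", "jam"}, {"milk"}, {"eggs"},
]
print(fp_growth([(t, 1) for t in transactions], min_count=3))
```

On this data it finds the same seven frequent itemsets as Apriori at a 50% threshold (3 of 6 transactions), but it never builds a candidate set.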
4. Advanced Rule Evaluation Metrics (Beyond Lift)
In practice, lift alone is not always enough to evaluate association rules. Practitioners also use metrics such as conviction and leverage to better understand the strength and usefulness of relationships between items, because some rules can look strong statistically but still be misleading in practice.
• Conviction: Measuring Rule Reliability
Conviction measures how much a rule depends on the relationship between A and B by comparing how often the rule would be wrong if A and B were independent with how often it is actually wrong.
• Leverage: Measuring True Co-Occurrence Gain
Leverage measures how much more often A and B occur together than we would expect if they were independent.
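Both metrics follow from the support values defined earlier, using their standard definitions: Conviction(A → B) = (1 − Support(B)) / (1 − Confidence(A → B)) and Leverage(A → B) = Support(A, B) − Support(A) × Support(B). A minimal sketch with made-up toy transactions; the function names are our own.

```python
def support(transactions, itemset):
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def leverage(transactions, a, b):
    # Observed co-occurrence minus the co-occurrence expected under independence
    s_ab = support(transactions, set(a) | set(b))
    return s_ab - support(transactions, a) * support(transactions, b)

def conviction(transactions, a, b):
    conf = support(transactions, set(a) | set(b)) / support(transactions, a)
    if conf == 1.0:
        return float("inf")          # the rule is never wrong in this data
    # 1 means independence; larger values mean B depends more on A
    return (1 - support(transactions, b)) / (1 - conf)

# Hypothetical toy data, for illustration only
transactions = [
    {"bread", "butter", "milk"}, {"bread", "butter", "milk"}, {"bread", "butter", "milk"},
    {"bread", "jam"}, {"milk"}, {"eggs"},
]
print(conviction(transactions, {"bread"}, {"milk"}))   # about 1.33
print(leverage(transactions, {"bread"}, {"milk"}))     # about 0.056
```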
Summary
Association Rule Learning helps us find connections (like "if A, then B"). Apriori is a classic method that uses pruning to save time. FP-Growth is a more advanced choice that uses an FP-Tree and is often faster. We use support, confidence, and lift to decide whether a rule is strong or just a coincidence. In the next article (Article 25), we will discuss Anomaly Detection (Isolation Forest, LOF, and One-Class SVM).
Article 25: Anomaly Detection - Finding the Hidden Outliers
Anomaly Detection is the process of identifying rare items, events or observations that significantly differ from the majority of the data. It is like a security guard for Machine Learning.
1. What is an Anomaly?
In any dataset, most data points follow a predictable pattern. We call these normal observations. But some points differ significantly from this behaviour. We call them anomalies or outliers. We can identify three types of anomalies:
• Point anomalies (global outliers): a single observation is far from the rest of the data (one data point is abnormal compared to the entire dataset).
Example: a bank account that typically shows transactions of around $10 suddenly records a $1,000,000 transaction. It immediately stands out.
• Contextual anomalies (conditional outliers): a data point is normal in general but becomes anomalous in a specific context such as time, location, or user behaviour.
Example: 30°C is normal during summer, but the same temperature is highly unusual in winter (the same value means something completely different depending on the context).
• Collective anomalies: individual data points look normal, but a sequence or group of them together forms an unusual pattern.
Example: one failed login attempt is normal, but hundreds of failed logins within seconds are highly suspicious.
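2. Isolation Forest
This is one of the most popular anomaly detection algorithms, and it works well on high-dimensional data. Most algorithms try to learn what is normal in order to find what is abnormal. Isolation Forest does the opposite: it tries to isolate every point. It builds random trees that repeatedly split the data on a random feature at a random value. Normal points sit in crowded areas and need many splits to be isolated, while anomalies sit in lonely areas and are isolated after only a few splits (a short path in the tree).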
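$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}$$
where $h(x)$ is the path length of $x$ in one tree, $E(h(x))$ is its average over all trees, and $c(n)$ is the average path length for $n$ points, used to normalize the score.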
If the score is close to 1, it is an anomaly. If the score is much less than 0.5, it is a normal point.
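A compact, simplified sketch of the whole method in plain Python. The normalizer c(n) = 2H(n − 1) − 2(n − 1)/n, with the harmonic number H approximated by ln + Euler's constant, follows the original Isolation Forest paper; the tree structure, function names and defaults (`n_trees=100`, `sample_size=256`, `seed=0`) are our own choices for illustration. It needs at least two data points.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Isolation tree: split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    f = rng.randrange(len(points[0]))
    lo = min(p[f] for p in points)
    hi = max(p[f] for p in points)
    if lo == hi:                                  # identical values cannot be split
        return {"size": len(points)}
    t = rng.uniform(lo, hi)
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([p for p in points if p[f] < t], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[f] >= t], depth + 1, max_depth, rng),
    }

def path_length(x, node, depth=0):
    """Edges from the root to x's leaf, plus c(size) for leaves that were not fully split."""
    if "size" in node:
        return depth + c(node["size"])
    child = "left" if x[node["feature"]] < node["threshold"] else "right"
    return path_length(x, node[child], depth + 1)

def isolation_forest_scores(data, n_trees=100, sample_size=256, seed=0):
    rng = random.Random(seed)
    psi = min(sample_size, len(data))             # sub-sample size per tree
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for x in data:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees    # E(h(x))
        scores.append(2.0 ** (-mean_h / c(psi)))                  # s(x, n) = 2^(-E(h(x)) / c(n))
    return scores

# A tight 7 x 7 grid of normal points plus one far-away point
data = [(x * 0.1, y * 0.1) for x in range(7) for y in range(7)] + [(10.0, 10.0)]
scores = isolation_forest_scores(data)
print(max(range(len(data)), key=scores.__getitem__))   # index of the most anomalous point
```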
3. Local Outlier Factor (LOF)
LOF is a density-based algorithm. It is based on the idea that an anomaly is usually located in a region of lower density than its neighbours. The workflow is:
• K-distance: for each point, compute the distance to its k-th nearest neighbour.
• Local Reachability Density (LRD): estimate how crowded the area around each point is, as the inverse of the average reachability distance to its k nearest neighbours.
• Compute the LOF score: compare the LRD of a point to the LRD of its neighbours (the average ratio of their LRD to its own).
If a point's density is much lower than that of its neighbours, its LOF score will be high (well above 1), marking it as an outlier. A score close to 1 means the point is about as dense as its neighbours.
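The three steps can be sketched naively in plain Python. It uses the standard reachability distance, reach(p, o) = max(k-distance(o), d(p, o)), which smooths out tiny distances inside dense regions. This is an O(n²) sketch that assumes no duplicate points (duplicates would make a density infinite) and ignores ties beyond the k-th neighbour; the function name is our own.

```python
import math

def lof_scores(points, k):
    """Local Outlier Factor for every point (naive O(n^2) version)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k nearest neighbours of each point, excluding the point itself
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # K-distance: distance to the k-th nearest neighbour
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]

    def reach_dist(p, o):
        # Reachability distance of p from o: never smaller than o's k-distance
        return max(k_distance[o], dist[p][o])

    # LRD: inverse of the mean reachability distance to the k neighbours
    lrd = [k / sum(reach_dist(i, o) for o in neighbors[i]) for i in range(n)]
    # LOF: average ratio of the neighbours' density to the point's own density
    return [sum(lrd[o] for o in neighbors[i]) / (k * lrd[i]) for i in range(n)]

pts = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (20.0,)]
print([round(s, 2) for s in lof_scores(pts, k=2)])
```

The evenly spaced points score close to 1, while the isolated point at 20 scores far above 1.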
4. One-Class SVM
This is an extension of the Support Vector Machine we discussed earlier. While a standard SVM separates two classes (A vs B), a One-Class SVM learns a boundary around only one class: the normal class. New points that fall outside this boundary are flagged as anomalies.
5. Evaluation Metrics
In anomaly detection, we cannot rely on accuracy because anomalies are very rare. If only 1% of the data is anomalous, a model that says "everything is normal" gets 99% accuracy while catching none of the anomalies.
• Precision-Recall curve: shows how many true anomalies we detect (recall) versus how many of our alarms are real (precision) across different thresholds.
• F1-score: the harmonic mean of precision and recall. It combines them into a single number that balances finding all anomalies against raising too many false alarms.
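The 99% accuracy trap is easy to reproduce. This sketch labels anomalies as 1 and normal points as 0; the function name `evaluate` is our own, and defining precision as 0 when no alarms are raised is one common convention.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1, with anomalies labelled 1 and normal points 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # anomalies we caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # anomalies we missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0       # no alarms raised: define as 0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = [1] + [0] * 99                 # one anomaly in 100 points
print(evaluate(y_true, [0] * 100))      # (0.99, 0.0, 0.0, 0.0)
```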
Summary
Anomaly detection identifies strange points that differ from the majority. Isolation Forest uses random trees to isolate outliers quickly. LOF looks for points in low-density areas. One-Class SVM builds a wall around normal data. We use precision and recall instead of accuracy to measure success. In the next article (Article 26), we begin Phase 7: Ensemble Methods, starting with Bagging and Random Forests.
Moltbook, an Agentic-AI Social Platform, Is Hitting Major Security Flaws
Moltbook, an experimental social network for AI agents (a human-restricted platform), exposed its entire production database through publicly accessible, unauthenticated APIs, revealing user secrets and personally identifiable information.
Researchers quickly discovered the vulnerability and warned that the platform's design creates broader risks, including bot hijacking and malicious activity.
The site allows anyone to launch AI agent-based bots that interact with others. Due to a lack of rate limiting, the platform currently has over 1 million AI agents.
Experts say the concept is not yet ready for production because of risky trust boundaries and threats such as large-scale prompt injection attacks that spread across agent networks.
Moltbook highlights both the fast rise of agentic AI and the immediate need for stronger security before such platforms scale further.
Article 26: Bagging and Random Forests - Strength in Numbers
The word bagging comes from Bootstrap Aggregating. It is a technique used to reduce the overfitting (variance) of a model, especially of decision trees.
1. How Does Bagging Work?
Imagine you have a complex problem. Instead of asking one expert, you ask 100 people. But, to ensure they don't all say the same thing, you give each person a slightly different set of information.
Bootstrapping: the algorithm creates multiple random samples of the original data by sampling with replacement, each usually the same size as the original. This means some data points appear multiple times in one sample while others do not appear at all.
Parallel training: we train a separate model (usually a decision tree) on each sample. The models are independent of each other, so they can be trained at the same time.
Aggregating: to make a final prediction, the algorithm combines the results of all models. For classification, it uses majority voting. For regression, it uses the average of all predictions.
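The three ideas fit in a short plain-Python sketch. To keep it self-contained, the base model is a decision stump (a depth-1 tree) rather than a full decision tree; `fit_stump`, `bagging_fit` and `bagging_predict` are our own illustrative names, and the data is a made-up toy example.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Weak learner for the demo: the best single-feature threshold split (a depth-1 tree)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [label for x, label in zip(X, y) if x[f] <= t]
            right = [label for x, label in zip(X, y) if x[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label

def bagging_fit(X, y, n_models=25, seed=0):
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):                 # independent fits, so this loop could run in parallel
        # Bootstrapping: draw n row indices with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, x):
    # Aggregating: majority vote across all models (classification)
    return Counter(m(x) for m in models).most_common(1)[0][0]

X = [[1], [2], [3], [4], [6], [7], [8], [9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(X, y)
print(bagging_predict(models, [0.5]), bagging_predict(models, [10]))   # 0 1
```

For regression, the vote in `bagging_predict` would be replaced by the mean of the models' predictions.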
2. Random Forest
A Random Forest is an ensemble of many decision trees. It usually improves on plain bagging because it adds a second layer of randomness.
The Math & Logic:
In a normal decision tree, the algorithm looks at all features to find the best split. In a random forest, at every split the algorithm looks at only a random subset of features (a common default for classification is about √p of the p features). This prevents one very strong feature from dominating every tree and forces the trees to be different from each other. Averaging many less-correlated trees makes the final forest much more stable and accurate.
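A standard variance argument shows why making the trees different matters. Assume each of the $B$ trees $T_b(x)$ has the same prediction variance $\sigma^2$ and every pair of trees has correlation $\rho$. Then the variance of their average is:

$$\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right) = \frac{1}{B^2}\left(B\sigma^2 + B(B-1)\rho\sigma^2\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2$$

Adding more trees only shrinks the second term; the first term stays at $\rho\sigma^2$. Lowering the correlation $\rho$ through random feature selection is what reduces the variance that remains.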
3. Out of Bag Error (OOB Error)
One useful property of random forests is that we don't need a separate validation set to test the model. Because of bootstrapping, each tree's training sample leaves out about 36.8% of the original data on average: the chance that a given point is never drawn in n draws is (1 − 1/n)^n ≈ 1/e ≈ 0.368. This left-out data is called out-of-bag (OOB) data. Each data point can be predicted using only the trees that did not see it during training, which gives an OOB accuracy score. The OOB error is usually a good estimate of how the model will perform on unseen real-world data.
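The 36.8% figure can be checked both from the formula and by simulation. A minimal sketch; the function name `oob_fraction` and the seed are our own choices.

```python
import math
import random

def oob_fraction(n, seed=0):
    """Fraction of n points never drawn in one bootstrap sample of size n."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}   # sampling with replacement
    return 1 - len(drawn) / n

n = 10_000
print((1 - 1 / n) ** n, math.exp(-1))   # both about 0.368
print(oob_fraction(n))                   # close to 0.368
```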
4. Feature Importance
Random forests are not complete black boxes. They can tell us which features are the most important for making a prediction. The algorithm calculates how much the Gini impurity or entropy decreases whenever a specific feature is used for a split. If a feature consistently reduces impurity across all the trees, it gets a high importance score.
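A minimal sketch of the quantity behind this importance score (often called mean decrease in impurity). It computes the weighted Gini decrease of one split; a feature's importance is then the sum of these decreases over every split that uses it, averaged over the trees, and libraries typically normalize the totals to sum to 1. The function names are our own.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, left, right, n_total):
    """Gini decrease of one split, weighted by the share of all samples reaching the node."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - children)

# A perfect split at the root of a 4-sample tree removes all impurity
print(impurity_decrease([0, 0, 1, 1], [0, 0], [1, 1], n_total=4))   # 0.5
```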
Summary
Bagging combines multiple models to reduce error. Random Forest is a collection of decision trees that uses bootstrapping and feature randomness to be accurate and stable. It is one of the most reliable go-to algorithms for tabular data projects. In the next article (Article 27), we will discuss the opposite of bagging.
Get a special 50% discount on Revo Uninstaller Pro for your Valentine's Day and completely uninstall your girlfriend.exe
Life hack: Free up 90% of your stress by removing unnecessary background processes.