Article 26: Bagging and Random Forests - Strength in Numbers
The word Bagging comes from Bootstrap Aggregating. It is a technique used to reduce the overfitting (variance) of a model, especially Decision Trees.
1. How Bagging Works
Imagine you have a complex problem. Instead of asking one expert, you ask 100 people. But to ensure they don't all say the same thing, you give each person a slightly different set of information.
Bootstrapping - The machine creates multiple subsets of the original data by sampling with replacement. This means some data points appear multiple times in one subset while others do not appear at all.
Parallel Training - We train a separate model (usually a Decision Tree) on each subset. The models are independent of each other, so they can be trained simultaneously.
Aggregating - To make a final prediction, the machine combines the results of all models. For Classification, it uses majority voting. For Regression, it uses the average of all predictions.
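The three steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data set is made up, and `train_stump` is a hypothetical stand-in for a real Decision Tree (it simply thresholds a single feature at the midpoint of the two class means).

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Bootstrapping: draw len(data) points with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Hypothetical weak model: threshold at the midpoint of the class means."""
    xs0 = [x for x, y in sample if y == 0]
    xs1 = [x for x, y in sample if y == 1]
    if not xs0 or not xs1:
        # The sample happened to contain one class only: always predict it.
        only = 1 if xs1 else 0
        return lambda x: only
    threshold = (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2
    return lambda x: 1 if x > threshold else 0

def bagging_fit(data, n_models, seed=0):
    """Train one model per bootstrap sample (each fit is independent)."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Aggregating for classification: majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Made-up 1-D data: (feature, label)
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
models = bagging_fit(data, n_models=25)
print(bagging_predict(models, 0.2), bagging_predict(models, 5.0))
```

For Regression, the last step would average the model outputs instead of counting votes.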
2. Random Forest
A Random Forest is an ensemble of many Decision Trees. It usually performs better than plain Bagging of trees because it adds a second layer of randomness.
The Math & Logic:
In a normal Decision Tree, the machine looks at all features to find the best split. In a Random Forest, for every split, the machine only looks at a random subset of features. This prevents one very strong feature from dominating every tree. It forces the trees to be different from each other (less correlated), which makes the averaged forest more stable and usually more accurate.
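As a small sketch of the per-split feature sampling described above: the helper name `candidate_features` is hypothetical, and the default subset size of about the square root of the number of features is a common heuristic for classification, not something the article specifies.

```python
import math
import random

def candidate_features(n_features, rng, k=None):
    """Return the random subset of feature indices considered at one split."""
    if k is None:
        # Assumed heuristic: about sqrt(n_features) candidates per split.
        k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))

rng = random.Random(42)
for split in range(3):
    # Each split gets its own fresh subset, so trees end up different.
    print("split", split, "may use features", candidate_features(9, rng))
```

A real tree would then search for the best threshold only among these candidate features.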
3. Out of Bag Error (OOB Error)
A useful property of Random Forests is that we can estimate performance without a separate validation set. Because of Bootstrapping, each tree's sample leaves out about 36.8% of the data points on average: the probability that a point is never drawn in N draws with replacement is (1 − 1/N)^N, which approaches 1/e ≈ 0.368. These points are called Out of Bag data. Each data point can be predicted using only the trees that did not see it during training, and the error of those predictions is the OOB error. The OOB error is usually a good estimate of how the model will perform on unseen real-world data.
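The 36.8% figure can be checked with a quick simulation. This is a minimal sketch: the sample size of 1000 and the 200 repetitions are arbitrary choices for illustration.

```python
import random

def oob_fraction(n, rng):
    """Fraction of n points that a bootstrap sample of size n never draws."""
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1 - len(drawn) / n

rng = random.Random(0)
estimate = sum(oob_fraction(1000, rng) for _ in range(200)) / 200
theory = (1 - 1 / 1000) ** 1000  # approaches 1/e as n grows

print(round(estimate, 3), round(theory, 3))
```

Both numbers should land close to 0.368.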
4. Feature Importance
Random Forests are not complete black boxes. They can tell us which features are the most important for making a prediction. The machine calculates how much the Gini Impurity or Entropy decreases each time a specific feature is used for a split. If a feature consistently reduces impurity across the trees in the forest, it gets a high Importance Score.
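To make the impurity decrease concrete, here is a minimal sketch for a single split using Gini Impurity. The label lists are made up, and a real forest would sum such decreases for each feature over every split in every tree.

```python
from collections import Counter

def gini(labels):
    """Gini Impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

parent = [0, 0, 0, 0, 1, 1, 1, 1]
good = impurity_decrease(parent, [0, 0, 0, 0], [1, 1, 1, 1])  # perfect split: 0.5
poor = impurity_decrease(parent, [0, 0, 1, 1], [0, 0, 1, 1])  # useless split: 0.0
print(good, poor)
```

A feature that keeps producing splits like the first one would collect a high Importance Score.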
Summary
Bagging combines multiple models trained on bootstrap samples to reduce error. Random Forest is a collection of Decision Trees that uses Bootstrapping and Feature Randomness to be accurate and stable. It is one of the most reliable go-to algorithms for tabular data projects.
In the next article (Article 27) we will discuss Boosting, which takes a different approach from Bagging.
@TheInfinityAI
Article 27: Boosting Fundamentals - Learning from Mistakes
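Boosting is an ensemble technique that combines several weak learners (models that are only slightly better than random guessing) to create one strong learner. Unlike Bagging, which trains its models independently, Boosting trains them one after another so that each model learns from the mistakes of the previous ones.
1. How Boosting Works (The Logic)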
Imagine you are practicing for an exam.
01. You take a practice test.
02. You look at the questions you got wrong.
03. You spend more time studying only those specific topics.
04. You take another test and repeat the process.
The Machine Learning Process
Sequential Training - The machine trains a base model (usually a very shallow decision tree called a stump).
Weighting - The machine looks at the data points that the first model predicted incorrectly. It gives higher weights to those points.
Correction - The next model is trained. Because of the weights, it focuses more on the difficult data points.
Final Prediction - The results are combined. Models that performed better get a higher say (weight) in the final vote.
2. AdaBoost (Adaptive Boosting)
AdaBoost was the first practical and widely successful boosting algorithm. It is adaptive because it changes the weights of the data points based on the error of the previous model.
The math behind it:
Equal Weights - At the start, all N data points have a weight of 1/N.
Calculate Error (ϵ) - For each tree, add up the weights of the points it misclassified.
Calculate the amount of Say (α) - α = ½ ln((1 − ϵ) / ϵ). If the error is low, the tree gets a high "Say".
Update Weights - Increase the weights of incorrectly classified points (multiply by e^α) and decrease the weights of correctly classified points (multiply by e^−α).
Normalize - Rescale all weights so that they add up to 1.
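The weight update above can be traced on a tiny example. This is a sketch of a single round only: the function name `adaboost_round` is hypothetical, the five points and their correct/incorrect flags are made up, and the code assumes the error is strictly between 0 and 1.

```python
import math

def adaboost_round(weights, correct):
    """One AdaBoost round; correct[i] is True if point i was classified correctly."""
    # Error = total weight of the misclassified points.
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    # Amount of Say for this model.
    alpha = 0.5 * math.log((1 - error) / error)
    # Shrink correct points, grow incorrect ones.
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    # Normalize so the weights add up to 1 again.
    total = sum(updated)
    return alpha, [w / total for w in updated]

n = 5
weights = [1 / n] * n                      # equal weights at the start
correct = [True, True, True, True, False]  # the model missed the last point
alpha, new_weights = adaboost_round(weights, correct)
print(round(alpha, 3), [round(w, 3) for w in new_weights])
```

With an error of 0.2, the Say is ln(2) ≈ 0.693, the four correct points drop to 0.125 each, and the missed point rises to 0.5, so the next model has to pay attention to it.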
3. Gradient Boosting
Gradient Boosting generalizes the boosting idea. Instead of changing the weights of data points, each new model tries to predict the difference between the actual value and the current prediction (the Residuals). For squared-error loss, these residuals are the negative gradient of the loss, which is where the name comes from.
The Process
Start with an Initial Guess - Usually, the average of all target values.
Calculate Residuals - Find the error for every data point (Actual - Prediction).
Build a Tree on Residuals - Train a model to predict the errors, not the original values.
Update Prediction - Add the new tree's prediction to the old prediction. We multiply the new tree's prediction by a small number (like 0.1) so we don't overfit too fast. This number is the Learning Rate (η).
Repeat - Keep building trees on the new residuals until a fixed number of trees is reached or the validation error stops improving. Driving the training error all the way to zero usually means overfitting.
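The process above can be sketched for regression with squared error. This is a minimal illustration under assumptions: the data is made up, and `fit_stump` is a hypothetical one-split tree on a single feature standing in for a real regression tree.

```python
def fit_stump(xs, residuals):
    """Find the single threshold that best fits the residuals (squared error)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest x so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    base = sum(ys) / len(ys)            # initial guess: the average target
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]   # Actual - Prediction
        tree = fit_stump(xs, residuals)                  # tree trained on the errors
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)

xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9]
model = gradient_boost(xs, ys)
print(round(model(2), 2), round(model(5), 2))
```

A smaller learning rate needs more trees to reach the same fit, which is the usual trade-off when tuning these two settings.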
Summary
Boosting builds models one by one, with each model correcting the errors of the previous one. AdaBoost uses weights to focus on hard points, while Gradient Boosting uses gradients to predict the residuals. This makes boosting models some of the most powerful tools for tabular data today.
In the next article (Article 28), we discuss the Speed Demons of ML: Advanced Boosting (XGBoost, LightGBM, and CatBoost).
Article 28: Advanced Boosting - XGBoost, LightGBM, and CatBoost
In this article, we look at three of the most popular Boosting libraries. They all use the Gradient Boosting framework but improve it with clever engineering and math.
1. XGBoost (Extreme Gradient Boosting)
XGBoost is probably the best known of the three. It is called Extreme because it is designed for speed and performance.
The Logic:
Regularization (L1 & L2) - Unlike basic GBM, XGBoost adds regularization terms to its objective to penalize complex models. This helps prevent Overfitting.
Second-Order Derivatives - It uses a second-order Taylor expansion of the loss function, so each split uses both the gradient and the Hessian (curvature). This gives a more accurate approximation of the loss and often lets the optimization converge in fewer steps.
Pruning - It grows each tree up to the maximum depth and then prunes back the branches whose gain is too small to justify the extra complexity.
Parallel Processing - It parallelizes the search for the best split across CPU cores, so each tree is built faster.
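As a rough sketch of how the regularization and second-order terms fit together, the snippet below uses the leaf-weight and split-gain formulas from the XGBoost formulation with L2 regularization only. Here G and H are the sums of gradients and Hessians in a node, `lam` is the L2 penalty, and `gamma` is the minimum gain a split must deliver. The function names and the numbers are made up for illustration.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf value: -G / (H + lambda)."""
    return -G / (H + lam)

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Loss reduction from splitting a node into left (L) and right (R) children."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma

# Made-up node: the left child has gradient sum -4, the right child +4,
# and each child has Hessian sum 2.
print(leaf_weight(-4.0, 2.0, lam=1.0))
print(split_gain(-4.0, 2.0, 4.0, 2.0, lam=1.0, gamma=0.0))
print(split_gain(-4.0, 2.0, 4.0, 2.0, lam=1.0, gamma=10.0))
```

A split whose gain is negative (as in the last call, where `gamma` is large) is the kind of branch that pruning removes.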
2. LightGBM (Light Gradient Boosting Machine)
LightGBM was created by Microsoft. It is designed to use less memory and is very fast on huge datasets.
The Backend process:
GOSS (Gradient-based One-Side Sampling) - It keeps all data points with large gradients (large errors) and randomly samples only a fraction of the points with small gradients, reweighting the sampled points so the data distribution stays roughly the same. This reduces the amount of data it needs to process.
Leaf-Wise Growth - Many tree learners grow Level-Wise (layer by layer). LightGBM grows Leaf-Wise. It picks the leaf whose split will reduce the loss the most and splits it. This often gives lower loss for the same number of leaves but can overfit if not tuned carefully (for example, by limiting the number of leaves or the depth).
EFB (Exclusive Feature Bundling) - It bundles sparse features that are rarely non-zero at the same time into a single feature. This reduces the number of features with little or no loss of information.
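The GOSS idea can be sketched as a sampling function. This is not LightGBM's actual code: the name `goss_sample` and the rates are assumptions for illustration. It keeps the top fraction of points by absolute gradient, samples a fraction of the rest, and multiplies the sampled points' weights by (1 − top_rate) / other_rate to compensate.

```python
import random

def goss_sample(gradients, top_rate, other_rate, rng):
    """Return {row index: weight} for the rows kept by one-side sampling."""
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    top = order[:n_top]                          # always keep large gradients
    rest = rng.sample(order[n_top:], n_other)    # random subset of small gradients
    amplify = (1 - top_rate) / other_rate        # compensate for the dropped rows
    weights = {i: 1.0 for i in top}
    weights.update({i: amplify for i in rest})
    return weights

gradients = [0.1 * i for i in range(10)]         # made-up gradients
kept = goss_sample(gradients, top_rate=0.2, other_rate=0.2, rng=random.Random(0))
print(kept)
```

Here only 4 of the 10 rows are used to build the next tree, but the two rows with the largest errors are guaranteed to be among them.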
3. CatBoost (Categorical Boosting)
CatBoost was created by Yandex. It is often a strong choice when your data has many Categorical features (like "Country" or "Color").
What makes it unique?
Native Categorical Support - You do not need to do One-Hot Encoding manually. CatBoost encodes categories internally using ordered target statistics, where each row's encoding is computed only from the rows that come before it in a random permutation.
Symmetric Trees - It builds balanced (oblivious) trees that use the same split condition across an entire level. This makes the model very fast when used for predictions (Inference).
Reduced Overfitting - It uses Ordered Boosting to limit target leakage, which makes it stable even with small datasets.
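The ordered encoding idea can be illustrated with a simplified sketch. It is not CatBoost's implementation: the function name, the prior value, and the prior weight are assumptions, and the given row order stands in for the random permutation. The key point is that each row is encoded using only the targets of earlier rows, so a row never sees its own label.

```python
def ordered_target_stats(categories, targets, prior, prior_weight=1.0):
    """Encode each category value from the targets of the rows seen before it."""
    sums, counts = {}, {}
    encoded = []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        # Smoothed mean of earlier targets for this category.
        encoded.append((s + prior_weight * prior) / (c + prior_weight))
        # Only now add the current row, so it cannot leak into its own encoding.
        sums[cat] = s + y
        counts[cat] = c + 1
    return encoded

cats = ["red", "blue", "red", "red", "blue"]
ys = [1, 0, 0, 1, 1]
print(ordered_target_stats(cats, ys, prior=0.5))
# [0.5, 0.5, 0.75, 0.5, 0.25]
```

The first "red" and the first "blue" row fall back to the prior because no earlier rows with that category exist yet.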
Summary
Advanced Boosting libraries take Gradient Boosting to the next level. XGBoost is a great all-rounder with strong mathematical foundations. LightGBM is typically among the fastest on massive datasets. CatBoost is especially convenient for categorical data.
In the next article (Article 29), we will enter Phase 8: Reinforcement Learning (RL), where we learn how agents learn through Rewards and Penalties!
Forwarded from Computer Science and Programming
6 Components of Context Engineering
Context engineering is the practice of optimizing how information flows to AI models, comprising six core components: prompting techniques (few-shot, chain-of-thought), query augmentation (rewriting, expansion, decomposition), long-term memory (vector/graph databases for episodic, semantic, and procedural memory), short-term memory (conversation history management), knowledge base retrieval (RAG pipelines with pre-retrieval, retrieval, and augmentation layers), and tools/agents (single and multi-agent architectures, MCPs). While model selection and prompts contribute only 25% to output quality, the remaining 75% comes from properly engineering these context components to deliver the right information at the right time in the right format.
Article 29: Reinforcement Learning Fundamentals - The Agent's Journey
Reinforcement Learning is like training a dog. If the dog does a good thing, we give it a treat (Reward). If the dog does something bad, we do not give a treat (Penalty). Over time, the dog learns to do the things that get the most treats.
1. The Key Players in Reinforcement Learning (RL)
To understand RL, we must know these five main terms:
The Agent - This is the AI or the learner that makes decisions.
The Environment - This is the world where the agent lives and acts. For example, in a video game, the "game world" is the environment.
State - This is the current situation of the agent. It is like a snapshot of the environment at a specific time.
Action - This is what the agent chooses to do (like move left, jump, stay still).
Reward - This is the feedback from the environment. Positive reward for a good action and negative reward (Penalty) for a bad action.
2. The Reinforcement Learning Interaction Loop
The agent and environment talk to each other in a continuous loop.
01. The agent observes the current State.
02. The agent takes an Action.
03. The environment changes to a New State.
04. The environment gives a Reward to the agent.
05. The agent uses the reward to learn if the action was good or bad.
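The loop above can be sketched with a toy environment. Everything here is hypothetical: `LineWorld` is a made-up world with positions 0 to 4 and a goal at position 4, and the policy is fixed (always move right), so this shows the interaction only and leaves out the learning step.

```python
class LineWorld:
    """Hypothetical environment: positions 0..4, the goal is position 4."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return new state, reward, done."""
        self.state = min(4, max(0, self.state + action))
        done = self.state == 4
        reward = 1.0 if done else -0.1   # small penalty for every extra step
        return self.state, reward, done

def run_episode(env, policy, max_steps=100):
    state, total = env.state, 0.0        # the agent observes the current State
    for _ in range(max_steps):
        action = policy(state)           # the agent takes an Action
        state, reward, done = env.step(action)  # New State and Reward come back
        total += reward                  # a learning agent would update itself here
        if done:
            break
    return total

total = run_episode(LineWorld(), policy=lambda state: +1)
print(round(total, 2))  # three steps at -0.1, then +1.0 at the goal: 0.7
```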
3. Markov Decision Process (MDP)
The mathematical foundation of Reinforcement Learning is the Markov Decision Process (MDP). An MDP assumes that the next state and reward depend only on the current state and action. It does not matter how the agent arrived at the current state. We call this the Markov Property.
The math components:
Policy (π) - This is the agent's strategy. It is a map that tells the agent which action to take in each state.
Value Function (V) - This is the total discounted reward the agent expects to get in the long term, starting from a specific state and following its policy.
Discount Factor (γ) - This is a number between 0 and 1. It tells the agent how much to care about future rewards compared to immediate rewards. A value close to 0 makes the agent short-sighted, while a value close to 1 makes it care about the long term.
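To see what the Discount Factor does, here is a small sketch that computes the discounted return r0 + γ·r1 + γ²·r2 + ... for a made-up list of rewards. The function name is an assumption for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is scaled by gamma ** t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

rewards = [1, 1, 1]
print(discounted_return(rewards, gamma=0.0))  # 1.0  (only the immediate reward counts)
print(discounted_return(rewards, gamma=0.5))  # 1.75 (1 + 0.5 + 0.25)
print(discounted_return(rewards, gamma=1.0))  # 3.0  (all rewards count equally)
```

The Value Function of a state is the expected value of this quantity when the agent starts in that state and follows its policy.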
4. Exploration vs. Exploitation
This is one of the biggest challenges in RL. The agent must balance two things:
Exploitation - The agent uses what it already knows to get a reward. (Example: Going to your favourite restaurant because you know the food is good).
Exploration - The agent tries something new to see if it gives a better reward. (Example: Trying a new restaurant to see if it is better than your favourite one).
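One common way to balance the two is the epsilon-greedy rule. The article does not name a specific strategy, so treat this as an assumed example: with probability epsilon the agent explores a random action, otherwise it exploits the action it currently believes is best.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                        # Exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])    # Exploitation

rng = random.Random(0)
estimated_rewards = [0.1, 0.9, 0.3]   # made-up value estimates for three actions
choices = [epsilon_greedy(estimated_rewards, epsilon=0.2, rng=rng) for _ in range(1000)]
print(choices.count(1) / len(choices))  # mostly the favourite, sometimes something new
```

In practice epsilon is often started high and reduced over time, so the agent explores a lot early on and exploits more once its estimates are reliable.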
5. Why is Reinforcement Learning important?
Reinforcement Learning is the technology behind:
Self-driving cars (learning how to drive safely).
Game AI (like AlphaGo, which beat a world champion at Go).
Robotics (teaching robots to walk or pick up items).
Summary
Reinforcement Learning is about learning from interaction. An Agent takes Actions in an Environment to maximize its total Reward. The MDP provides the mathematical framework for this process. The agent must always balance Exploration (trying new things) and Exploitation (using what it already knows).
In the next article (Article 30), we will discuss Q-Learning and Deep Q-Networks (DQN). Get ready to learn how agents use a Cheat Sheet to make decisions!
Forwarded from Data Science & Machine Learning
Data Science Roadmap
Start Here
- What is Data Science & Why It Matters?
- Roles (Data Analyst, Data Scientist, ML Engineer)
- Setting Up Environment (Python, Jupyter Notebook)
Python for Data Science
- Python Basics (Variables, Loops, Functions)
- NumPy for Numerical Computing
- Pandas for Data Analysis
Data Cleaning & Preparation
- Handling Missing Values
- Data Transformation
- Feature Engineering
Exploratory Data Analysis (EDA)
- Descriptive Statistics
- Data Visualization (Matplotlib, Seaborn)
- Finding Patterns & Insights
Statistics & Probability
- Mean, Median, Mode, Variance
- Probability Basics
- Hypothesis Testing
Machine Learning Basics
- Supervised Learning (Regression, Classification)
- Unsupervised Learning (Clustering)
- Model Evaluation (Accuracy, Precision, Recall)
Machine Learning Algorithms
- Linear Regression
- Decision Trees & Random Forest
- K-Means Clustering
Model Building & Deployment
- Train-Test Split
- Cross Validation
- Deploy Models (Flask / FastAPI)
Big Data & Tools
- SQL for Data Handling
- Introduction to Big Data (Hadoop, Spark)
- Version Control (Git & GitHub)
Practice Projects
- House Price Prediction
- Customer Segmentation
- Sales Forecasting Model
Move to Next Level
- Deep Learning (Neural Networks, TensorFlow, PyTorch)
- NLP (Text Analysis, Chatbots)
- MLOps & Model Optimization
Data Science Resources: https://whatsapp.com/channel/0029VaxbzNFCxoAmYgiGTL3Z
Forwarded from Computer Science and Programming
Video.js v10 Beta: Hello, World (again)
Video.js v10.0.0 beta is a ground-up rewrite merging Video.js, Plyr, Vidstack, and Media Chrome into a single modern framework. Key highlights include an 88% reduction in default bundle size (66% even without ABR), a new composable streaming engine called SPF that enables much smaller adaptive bitrate bundles, first-class React and TypeScript support, unstyled UI primitives inspired by Radix/Base UI, and a shadcn-style skin ejection system. The architecture is fully composable: unused features are tree-shaken out. Three presets ship with the beta: video, audio, and background video. New skins were designed by Plyr's creator Sam Potts. GA is targeted for mid-2026, with migration guides for Video.js v8, Plyr, Vidstack, and Media Chrome planned before then.
Article 30: Q-Learning and DQN - The Agent's Brain
In a simple environment, an agent can remember the best action for every state. But in a complex environment, the agent needs a brain or a cheat sheet to help it choose.
1. Q-Learning (The Cheat Sheet)
Q-Learning is a Value-Based algorithm. The Q stands for Quality. It measures how good an action is for a specific state.
The Q-Table:
Imagine a table that lists every possible state and every possible action.
Rows - These represent the States.
Columns - These represent the Actions.
Cells - These store the Q-Value.
When the agent is in a state, it looks at the Q-Table. It picks the action with the highest Q-Value.
2. The Maths background (The Bellman Equation)
Now let's see how the agent fills in the Q-Table. It uses an update rule based on the Bellman Equation. Every time the agent takes an action and gets a reward, it updates the table with this logic: the new Q-value is the old value plus a small correction toward the immediate reward plus the discounted best future reward.
Q(s, a) ← Q(s, a) + α [R + γ max_a' Q(s', a') − Q(s, a)]
Here s' is the next state and the max is taken over all actions a' available in s'. α (Learning Rate) tells the agent how much to trust new information and γ (Discount Factor) tells the agent how much to value future rewards.
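The update rule can be traced on a tiny Q-Table. This is a minimal sketch: the table has two states (rows) and two actions (columns), and all numbers are made up.

```python
def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-Learning update to q_table[state][action] in place."""
    best_next = max(q_table[next_state])          # max over a' of Q(s', a')
    target = reward + gamma * best_next           # R + gamma * best future value
    q_table[state][action] += alpha * (target - q_table[state][action])

# Rows are states, columns are actions.
q_table = [[0.0, 0.0],
           [0.0, 1.0]]
q_update(q_table, state=0, action=1, reward=0.5, next_state=1, alpha=0.1, gamma=0.9)
print(round(q_table[0][1], 2))  # 0.1 * (0.5 + 0.9 * 1.0 - 0.0) = 0.14
```

Repeating this update over many steps gradually moves each cell toward the long-term value of that action in that state.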
3. The Problem (The Curse of Dimensionality)
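A Q-Table works well for simple games like Tic-Tac-Toe. But what about a video game where the state is a screen with thousands or millions of pixels? Every distinct screen is a different state, so the number of possible states is astronomically large and the table can neither fit in the computer's memory nor be filled in by experience. This is why we need a smarter way to store Q-Values.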
4. Deep Q-Networks (DQN)
In DQN we throw away the big table and replace it with a Neural Network.
How it works:
The agent gives the current State (like an image) as input to the Neural Network.
The Neural Network does not give a single answer. It predicts the Q-Values for all possible actions at once.
The agent picks the action with the highest predicted Q-Value.
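The three steps above can be sketched with a very small network written in plain Python. This is only a stand-in: a real DQN would use a deep learning library and usually convolutional layers for images, and the weights here are random and untrained, so the chosen action is meaningless. All names and sizes are assumptions.

```python
import random

def make_network(n_inputs, n_hidden, n_actions, rng):
    """Random weights for one hidden layer and one output layer."""
    w1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_actions)]
    return w1, w2

def q_values(network, state):
    """Forward pass: state in, one predicted Q-Value per action out."""
    w1, w2 = network
    hidden = [max(0.0, sum(w * s for w, s in zip(row, state))) for row in w1]  # ReLU
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def greedy_action(network, state):
    """Pick the action with the highest predicted Q-Value."""
    q = q_values(network, state)
    return q.index(max(q))

network = make_network(n_inputs=4, n_hidden=8, n_actions=3, rng=random.Random(0))
state = [0.2, -0.5, 1.0, 0.0]   # made-up state features
print(q_values(network, state), greedy_action(network, state))
```

The important point is the output shape: one forward pass produces a Q-Value for every action, so the agent does not need to run the network once per action.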
5. Making DQN Stable
Learning with a Neural Network in RL is often unstable. To fix this, DQN uses two techniques.
Experience Replay - The agent stores its past experiences (State, Action, Reward, Next State) in a memory buffer. Instead of learning only from the current step, it takes a random sample from its memory to train the network. Random sampling breaks the correlation between consecutive steps and lets the agent reuse old lessons instead of forgetting them.
Target Network - DQN uses two Neural Networks with the same architecture. The online network makes the predictions, and the second (target) network calculates the target values the online network is trained toward. The Target Network's weights are copied from the online network only once in a while. This keeps the targets from shifting at every step, which makes learning steadier.
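Both techniques can be sketched without any deep learning library. The class and function names, the buffer capacity, and the sync interval below are assumptions for illustration, and plain lists of numbers stand in for network weights.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)   # the oldest experiences fall out

    def push(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        """Random mini-batch, which breaks the order of consecutive steps."""
        return rng.sample(list(self.memory), batch_size)

def maybe_sync(step, online_weights, target_weights, every):
    """Copy the online weights into the target network every `every` steps."""
    if step % every == 0:
        return list(online_weights)
    return target_weights

buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(step, 0, 1.0, step + 1)        # made-up experiences
batch = buffer.sample(2, random.Random(0))

online, target = [0.5, 0.7], [0.0, 0.0]
target = maybe_sync(step=7, online_weights=online, target_weights=target, every=10)
print(target)   # unchanged: step 7 is not a sync step
target = maybe_sync(step=10, online_weights=online, target_weights=target, every=10)
print(target)   # now a copy of the online weights
```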
Summary
Q-Learning uses a Q-Table to store the value of actions in different states. When the situation is too complex for a table, we use Deep Q-Networks (DQN). DQN uses a Neural Network to predict values and uses Experience Replay and a Target Network to keep the learning stable.
In the next article (Article 31), we discuss Policy Gradients and Actor-Critic Methods, where the agent learns a strategy directly instead of just looking at values!