#artificialintelligence #machinelearning #deeplearning #ai #ml #dl #aimemes #mlmemes
Original: Machine Learning TechHub
Calculating Mean Absolute Error and Mean Squared Error without using sklearn library
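The post gives only the title, so here is a minimal sketch of both metrics in plain Python, assuming the true and predicted values arrive as equal-length lists of numbers. MAE is (1/n) Σ |y_i − ŷ_i| and MSE is (1/n) Σ (y_i − ŷ_i)². The function names and the sample values are illustrative, not taken from the original post.

```python
def _check(y_true, y_pred):
    # Both metrics need paired, non-empty inputs
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")


def mean_absolute_error(y_true, y_pred):
    """MAE: the average absolute difference between targets and predictions."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """MSE: the average squared difference between targets and predictions."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up example values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_absolute_error(y_true, y_pred))  # 0.5
print(mean_squared_error(y_true, y_pred))   # 0.375
```

Squaring makes MSE punish large errors much more than MAE does: the single error of 1.0 contributes 1.0 of the 1.5 total squared error here.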
@epythonlab #ml #AI #codetip
Project Idea: Building a spam classifier
Introduction
Spam detection is one of the major applications of Machine Learning on the internet today. Pretty much all of the major email service providers have spam detection systems built in that automatically classify such mail as 'Junk Mail'.
In this mission we will be using the Naive Bayes algorithm to create a model that can classify SMS messages from a dataset as spam or not spam, based on the training we give to the model. It is important to have some level of intuition as to what a spammy text message might look like.
What are spammy messages?
Usually they have words like 'free', 'win', 'winner', 'cash', 'prize', or similar words in them, as these texts are designed to catch your eye and tempt you to open them. Also, spam messages tend to have words written in all capitals and also tend to use a lot of exclamation marks. To the recipient, it is usually pretty straightforward to identify a spam text and our objective here is to train a model to do that for us!
Identifying spam messages is a binary classification problem, as messages are classified as either 'Spam' or 'Not Spam' and nothing else. It is also a supervised learning problem, as we know what we are trying to predict. We will feed a labelled dataset into the model, which it can learn from to make future predictions.
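To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier written without scikit-learn. The post does not say which Naive Bayes variant the project uses; multinomial with Laplace (add-one) smoothing is an assumption, and the helper names (tokenize, train_naive_bayes, predict) and the four training messages are hypothetical, made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace
    return text.lower().split()


def train_naive_bayes(messages, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with additive (Laplace) smoothing."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    # Log priors: log P(class)
    priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    # Smoothed log likelihoods: log P(word | class)
    likelihoods = {}
    for c, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        likelihoods[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return priors, likelihoods


def predict(model, text):
    """Return the class with the highest log posterior score."""
    priors, likelihoods = model
    scores = {}
    for c in priors:
        score = priors[c]
        for word in tokenize(text):
            # Words never seen during training are ignored
            if word in likelihoods[c]:
                score += likelihoods[c][word]
        scores[c] = score
    return max(scores, key=scores.get)


# Tiny made-up labelled dataset, for illustration only
messages = ["win cash now", "free prize winner", "call me tomorrow", "see you at home"]
labels = ["spam", "spam", "not spam", "not spam"]
model = train_naive_bayes(messages, labels)
print(predict(model, "free cash"))         # spam
print(predict(model, "see you tomorrow"))  # not spam
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on long messages, and smoothing keeps a single unseen word-class pair from forcing a probability of zero.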
@epythonlab #projectIdea #ml
Implementing Bag of Words from scratch
Before we dive into scikit-learn's Bag of Words (BoW) implementation to do the dirty work for us, let's implement it ourselves first so that we can understand what's happening behind the scenes.
Step 1: Convert all strings to their lower case form.
Let's say we have a document set:
documents = ['Hello, how are you!',
'Win money, win from home.',
'Call me now.',
'Hello, Call hello you tomorrow?']
Instructions:
Convert all the strings in the document set to lower case. Save them into a list called 'lower_case_documents'. In Python you can convert a string to lower case with the lower() method.
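One possible solution, as a sketch: a list comprehension that applies str.lower() to every string in the document set from Step 1.

```python
documents = ['Hello, how are you!',
             'Win money, win from home.',
             'Call me now.',
             'Hello, Call hello you tomorrow?']

# str.lower() returns a new lower-cased copy; the original strings are unchanged
lower_case_documents = [doc.lower() for doc in documents]
print(lower_case_documents)
# ['hello, how are you!', 'win money, win from home.', 'call me now.', 'hello, call hello you tomorrow?']
```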
@epythonlab #codetip #naive_bayes #ml #AI
Step 2: Removing all punctuation
Instructions: Remove all punctuation from the strings in the document set. Save the strings into a list called 'sans_punctuation_documents'.
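A sketch of one way to do it, assuming the input is the lower-cased list from Step 1: build a translation table with str.maketrans that deletes every character in string.punctuation, then apply it with str.translate. A regular expression would work too; note that string.punctuation covers ASCII punctuation only.

```python
import string

lower_case_documents = ['hello, how are you!',
                        'win money, win from home.',
                        'call me now.',
                        'hello, call hello you tomorrow?']

# Translation table that maps every ASCII punctuation character to None (deletes it)
remove_punct = str.maketrans('', '', string.punctuation)
sans_punctuation_documents = [doc.translate(remove_punct) for doc in lower_case_documents]
print(sans_punctuation_documents)
# ['hello how are you', 'win money win from home', 'call me now', 'hello call hello you tomorrow']
```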
@epythonlab #codetip #ml #AI #naive_bayes
Step 3: Tokenization
Tokenizing a sentence in a document set means splitting up the sentence into individual words using a delimiter. The delimiter specifies which character we use to identify the beginning and end of a word. Most commonly, we use a single space as the delimiter character for identifying words, and that is also the case for our documents.
Instructions: Tokenize the strings stored in 'sans_punctuation_documents' using the split() method. Store the final document set in a list called 'preprocessed_documents'.
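A short sketch, starting from the punctuation-free strings of Step 2. Calling split() with no argument splits on any run of whitespace and never yields empty strings, whereas split(' ') would produce empty tokens if two spaces appeared in a row.

```python
sans_punctuation_documents = ['hello how are you',
                              'win money win from home',
                              'call me now',
                              'hello call hello you tomorrow']

# Each document becomes a list of word tokens
preprocessed_documents = [doc.split() for doc in sans_punctuation_documents]
print(preprocessed_documents)
# [['hello', 'how', 'are', 'you'], ['win', 'money', 'win', 'from', 'home'], ['call', 'me', 'now'], ['hello', 'call', 'hello', 'you', 'tomorrow']]
```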
@epythonlab #codetip #ml #AI #naive_bayes
Step 4 (the last step): Count frequencies
Now that we have our document set in the required format, we can count the occurrences of each word in each document of the document set. We will use the Counter class from Python's collections module for this purpose.
Counter counts the occurrences of each item in a list and returns a Counter object, a dictionary subclass whose keys are the items being counted and whose values are the number of times each item appears in the list.
Instructions: Using Counter() with preprocessed_documents as the input, create one dictionary per document whose keys are the words in that document and whose values are each word's frequency of occurrence. Save each Counter dictionary as an item in a list called 'frequency_list'.
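A sketch of the final step, using the tokenized documents from Step 3: one Counter per document, collected into frequency_list. Because Counter is a dict subclass, each entry can be indexed like a dictionary, and a word that is absent from a document returns 0 instead of raising KeyError.

```python
from collections import Counter

preprocessed_documents = [['hello', 'how', 'are', 'you'],
                          ['win', 'money', 'win', 'from', 'home'],
                          ['call', 'me', 'now'],
                          ['hello', 'call', 'hello', 'you', 'tomorrow']]

# One word-frequency dictionary per document
frequency_list = [Counter(doc) for doc in preprocessed_documents]
print(frequency_list[1])           # Counter({'win': 2, 'money': 1, 'from': 1, 'home': 1})
print(frequency_list[3]['hello'])  # 2
print(frequency_list[2]['win'])    # 0 (missing keys count as zero)
```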
@epythonlab #codetip #ml #AI #naive_bayes
Congratulations! We have implemented BoW from scratch in Python.
Here is a partial implementation of a spam classifier using the Naive Bayes algorithm:
https://t.me/epythonlab/689
In these posts we implemented Bag of Words without using the scikit-learn library, in four steps:
1. Convert all strings to lower case
https://t.me/epythonlab/690
2. Remove all punctuation
https://t.me/epythonlab/691
3. Tokenize
https://t.me/epythonlab/692
4. Count frequencies
https://t.me/epythonlab/693
N.B.: Following the steps above, try implementing BoW with scikit-learn yourself.
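As a reference point for that exercise, here is a sketch that chains the four steps and lays the counts out as a document-term matrix: one row per document, one column per vocabulary word, with the vocabulary sorted alphabetically. The function names preprocess and bag_of_words are hypothetical. scikit-learn's CountVectorizer produces a similar matrix, but its defaults differ in details (for example, its default token pattern ignores single-character tokens), so results on other text may not match exactly.

```python
import string
from collections import Counter


def preprocess(doc):
    # Steps 1-3: lowercase, strip ASCII punctuation, split on whitespace
    return doc.lower().translate(str.maketrans('', '', string.punctuation)).split()


def bag_of_words(documents):
    """Return (vocabulary, matrix) where matrix[i][j] counts vocabulary[j] in documents[i]."""
    frequency_list = [Counter(preprocess(doc)) for doc in documents]  # Step 4
    vocabulary = sorted(set().union(*frequency_list))
    # Counter returns 0 for missing words, which fills the empty cells
    matrix = [[counts[word] for word in vocabulary] for counts in frequency_list]
    return vocabulary, matrix


documents = ['Hello, how are you!',
             'Win money, win from home.',
             'Call me now.',
             'Hello, Call hello you tomorrow?']
vocabulary, matrix = bag_of_words(documents)
print(vocabulary)
# ['are', 'call', 'from', 'hello', 'home', 'how', 'me', 'money', 'now', 'tomorrow', 'win', 'you']
for row in matrix:
    print(row)
```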
@epythonlab #AI #ML
Data: without data, it is impossible to think about AI. Getting the required data in the right format is not easy. Here you can generate dummy data and learn to implement ML models on it:
https://lnkd.in/e2jV2zcy
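The linked page's contents are not reproduced here, so as an independent sketch, this is one way to generate dummy regression data with Python's standard random module. The linear form y = slope·x + intercept + Gaussian noise and the function name make_linear_data are assumptions chosen for illustration; a fixed seed makes the data reproducible.

```python
import random


def make_linear_data(n_samples, slope=2.0, intercept=1.0, noise=0.5, seed=42):
    """Hypothetical helper: (x, y) pairs with y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)  # local generator, so the same seed gives the same data
    xs = [rng.uniform(0, 10) for _ in range(n_samples)]
    ys = [slope * x + intercept + rng.gauss(0, noise) for x in xs]
    return xs, ys


xs, ys = make_linear_data(100)
print(len(xs), len(ys))  # 100 100
```

Data like this pairs naturally with the MAE and MSE metrics: fit a model to xs and ys, then score its predictions against ys.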
#ai #data #ml