Epython Lab

Calculating Mean Absolute Error and Mean Squared Error without using the sklearn library
@epythonlab #ml #AI #codetip
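
The post above gives only the title, so here is a minimal sketch of what such a calculation could look like in plain Python. The function names and the sample values are assumptions made for illustration; they do not come from the original post.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical sample data, chosen only to make the arithmetic easy to check.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # (0.5 + 0.5 + 0 + 1) / 4 = 0.5
mse = mean_squared_error(y_true, y_pred)   # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
print(mae, mse)
```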


Count the Number of Word Occurrences in a Python List

@epythonlab #codetip #list
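
The post above gives only the title, so the following is one possible sketch rather than the original tip. It uses the built-in list.count() for a single word and a plain dictionary for all words; the sample list is an assumption.

```python
# Hypothetical sample list, used only for illustration.
words = ['hello', 'call', 'hello', 'you', 'tomorrow']

# Count one specific word with the built-in list method.
hello_count = words.count('hello')

# Count every word in a single pass with a plain dictionary.
word_counts = {}
for word in words:
    word_counts[word] = word_counts.get(word, 0) + 1

print(hello_count)
print(word_counts)
```

list.count() rescans the whole list on every call, so the dictionary approach is usually preferable when counts for many different words are needed.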


Implementing Bag of Words from scratch

Before we let scikit-learn's Bag of Words (BoW) implementation do the dirty work for us, let's implement it ourselves so that we can understand what's happening behind the scenes.

Step 1: Convert all strings to their lower case form.

Let's say we have a document set:
documents = ['Hello, how are you!',
             'Win money, win from home.',
             'Call me now.',
             'Hello, Call hello you tomorrow?']

Instructions:

Convert all the strings in the document set to lower case and save them in a list called 'lower_case_documents'. In Python, you can convert a string to lower case with the lower() method.
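
One way to carry out this step is sketched below, using the document set from the post and a list comprehension (the comprehension is a stylistic choice, not something the post prescribes).

```python
documents = ['Hello, how are you!',
             'Win money, win from home.',
             'Call me now.',
             'Hello, Call hello you tomorrow?']

# str.lower() returns a new lower-cased string; the originals are untouched.
lower_case_documents = [doc.lower() for doc in documents]

print(lower_case_documents)
```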

@epythonlab #codetip #naive_bayes #ml #AI


Step 2: Removing all punctuation
Instructions: Remove all punctuation from the strings in the document set. Save the strings into a list called 'sans_punctuation_documents'.
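
A possible sketch for this step, assuming the input is the lower-cased list from Step 1 (repeated here so the snippet runs on its own). It relies on string.punctuation and str.translate from the standard library, which only cover ASCII punctuation.

```python
import string

# Output of Step 1, repeated so this snippet is self-contained.
lower_case_documents = ['hello, how are you!',
                        'win money, win from home.',
                        'call me now.',
                        'hello, call hello you tomorrow?']

# Translation table that deletes every ASCII punctuation character.
remove_punctuation = str.maketrans('', '', string.punctuation)

sans_punctuation_documents = [doc.translate(remove_punctuation)
                              for doc in lower_case_documents]

print(sans_punctuation_documents)
```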
@epythonlab #codetip #ml #AI #naive_bayes


Step 3: Tokenization

Tokenizing a sentence in a document set means splitting it into individual words using a delimiter. The delimiter is the character that marks where one word ends and the next begins. Most commonly this is a single space, and that holds for our documents as well.

Instructions: Tokenize the strings stored in 'sans_punctuation_documents' using the split() method. Store the final document set in a list called 'preprocessed_documents'.
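
A minimal sketch of the tokenization step, assuming the input is the punctuation-free list from Step 2 (repeated here so the snippet runs on its own). It passes a single space to split() to match the delimiter described above.

```python
# Output of Step 2, repeated so this snippet is self-contained.
sans_punctuation_documents = ['hello how are you',
                              'win money win from home',
                              'call me now',
                              'hello call hello you tomorrow']

# Split each document on a single space to get a list of word tokens.
preprocessed_documents = [doc.split(' ') for doc in sans_punctuation_documents]

print(preprocessed_documents)
```

Calling split() with no argument would also work here and additionally collapses runs of whitespace, which can be safer for messier text.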

@epythonlab #codetip #ml #AI #naive_bayes


Step 4 (the last step): Count frequencies

Now that our document set is in the required format, we can count the occurrences of each word in each document. We will use the Counter class from Python's collections module for this purpose.

Counter counts the occurrences of each item in a list and returns a dictionary-like object (a dict subclass) whose keys are the items being counted and whose values are the counts of those items in the list.

Instructions: Using Counter with each document in preprocessed_documents as the input, create a dictionary whose keys are the words in that document and whose values are the frequencies of those words. Save each Counter dictionary as an item in a list called 'frequency_list'.
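
The counting step could be sketched as follows, assuming the input is the tokenized list from Step 3 (repeated here so the snippet runs on its own).

```python
from collections import Counter

# Output of Step 3, repeated so this snippet is self-contained.
preprocessed_documents = [['hello', 'how', 'are', 'you'],
                          ['win', 'money', 'win', 'from', 'home'],
                          ['call', 'me', 'now'],
                          ['hello', 'call', 'hello', 'you', 'tomorrow']]

# One Counter per document: word -> number of times it appears there.
frequency_list = [Counter(doc) for doc in preprocessed_documents]

for frequencies in frequency_list:
    print(dict(frequencies))
```

Because Counter is a dict subclass, looking up a word that is absent from a document returns 0 instead of raising KeyError.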

@epythonlab #codetip #ml #AI #naive_bayes
