Epython Lab – Telegram

Epython Lab

6.45K subscribers

660 photos

31 videos

104 files

1.22K links

Welcome to Epython Lab, where you can get resources to learn, one-on-one training in machine learning, business analytics, and Python, and solutions for business problems.

Buy ads: https://telega.io/c/epythonlab


Project Idea: Building a spam classifier

Introduction

Spam detection is one of the major applications of Machine Learning in the interwebs today. Pretty much all of the major email service providers have spam detection systems built in and automatically…

Implementing Bag of Words in scikit-learn

Now that we have implemented the BoW concept from scratch, let's go ahead and use scikit-learn to do this process in a clean and succinct way. We will use the same document set as we used in the previous step.

'''
Here we will look to create a frequency matrix on a smaller document set to make sure we understand how the
document-term matrix generation happens. We have created a sample document set 'documents'.
'''
# Step 1: a small sample document set
documents = ['Hello, how are you!',
             'Win money, win from home.',
             'Call me now.',
             'Hello, Call hello you tomorrow?']

Instructions: Import the CountVectorizer class from sklearn.feature_extraction.text and create an instance of it called 'count_vector'.
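A minimal sketch of this step, assuming scikit-learn is installed. CountVectorizer is a class, so creating 'count_vector' means instantiating it; with no arguments it uses its default parameters.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Instantiate CountVectorizer with its defaults:
# lowercase=True, the default token_pattern, and no stop-word removal
count_vector = CountVectorizer()
```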


1.25K views, edited 08:17

Data preprocessing with CountVectorizer()

In Step 2.2, we implemented a from-scratch version of what CountVectorizer() does, which involved cleaning our data first. This cleaning involved converting all of our data to lower case and removing all punctuation marks. CountVectorizer() has certain parameters which take care of these steps for us. They are:

lowercase = True

The lowercase parameter has a default value of True which converts all of our text to its lower case form.

token_pattern = r'(?u)\b\w\w+\b'

The token_pattern parameter has a default regular expression value of r'(?u)\b\w\w+\b', which ignores all punctuation marks and treats them as delimiters, while accepting strings of two or more word characters (letters, digits, or underscore) as individual tokens or words.

stop_words

The stop_words parameter, if set to 'english', will remove all words from our document set that match a list of English stop words defined in scikit-learn. Its default value is None, which removes nothing.


1.17K views, 08:32

Considering the small size of our dataset and the fact that we are dealing with SMS messages and not larger text sources like e-mail, we will not use stop words, and we won't be setting this parameter value.
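To check that these defaults really replace the manual cleaning from Step 2.2, here is a small comparison sketch. The helper clean_and_split is hypothetical (it is not the Step 2.2 code): it lowercases, strips punctuation, and splits on whitespace. build_analyzer() is a real CountVectorizer method that returns the callable used to preprocess and tokenize each document.

```python
import string
from sklearn.feature_extraction.text import CountVectorizer

documents = ['Hello, how are you!',
             'Win money, win from home.',
             'Call me now.',
             'Hello, Call hello you tomorrow?']

def clean_and_split(text):
    """Hypothetical from-scratch cleaning: lowercase, drop punctuation, split."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans('', '', string.punctuation))
    return no_punct.split()

# The analyzer applies lowercase=True and the default token_pattern
analyzer = CountVectorizer().build_analyzer()

for doc in documents:
    print(clean_and_split(doc), analyzer(doc))
```

For these four documents both approaches yield the same tokens. They can differ on other text: the default token_pattern drops one-character words such as "I" and splits "don't" into "don" (the lone "t" is too short), whereas the manual version keeps "i" and produces "dont".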

You can take a look at all the parameter values of your count_vector object by simply printing out the object as follows:
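A minimal sketch, assuming a recent scikit-learn. Note that since scikit-learn 0.23 the printed representation lists only parameters changed from their defaults, so a default instance prints simply as CountVectorizer(). get_params() returns every parameter with its current value.

```python
from sklearn.feature_extraction.text import CountVectorizer

count_vector = CountVectorizer()

# Shows the estimator's repr (only non-default parameters in recent versions)
print(count_vector)

# get_params() returns all parameters as a dict
params = count_vector.get_params()
print(params['lowercase'], params['token_pattern'], params['stop_words'])
```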


1.14K views, 08:34

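Instructions: Fit your document dataset to the CountVectorizer object you have created using fit(), and get the list of words which have been categorized as features using the get_feature_names() method. In scikit-learn 1.0 and later, use get_feature_names_out() instead; get_feature_names() was removed in version 1.2.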


1.14K views, 08:36

The get_feature_names() method (get_feature_names_out() in scikit-learn 1.0 and later) returns our feature names for this dataset, which is the set of words that make up our vocabulary for 'documents'.
Instructions: Create a matrix with each row representing one of the 4 documents, and each column representing a word (feature name). Each value in the matrix will represent the frequency of the word in that column occurring in the particular document in that row. You can do this using the transform() method of CountVectorizer, passing in the document data set as the argument. The transform() method returns a SciPy sparse matrix of integer counts, which you can convert to a dense NumPy array using toarray(). Call the array 'doc_array'.
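A sketch of this step, assuming scikit-learn is installed. Rows follow the document order and columns follow the alphabetical vocabulary (are, call, from, hello, home, how, me, money, now, tomorrow, win, you).

```python
from sklearn.feature_extraction.text import CountVectorizer

documents = ['Hello, how are you!',
             'Win money, win from home.',
             'Call me now.',
             'Hello, Call hello you tomorrow?']

count_vector = CountVectorizer()
count_vector.fit(documents)

# transform() returns a SciPy sparse matrix of token counts
sparse_counts = count_vector.transform(documents)

# toarray() gives a dense NumPy array: one row per document, one column per word
doc_array = sparse_counts.toarray()
print(doc_array)
# [[1 0 0 1 0 1 0 0 0 0 0 1]
#  [0 0 1 0 1 0 0 1 0 0 2 0]
#  [0 1 0 0 0 0 1 0 1 0 0 0]
#  [0 1 0 2 0 0 0 0 0 1 0 1]]
```

fit() followed by transform() on the same data can also be written as a single fit_transform(documents) call.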


1.2K views, 08:39

Now we have a clean representation of the documents in terms of the frequency distribution of the words in them. To make it easier to understand, our next step is to convert this array into a DataFrame and name the columns appropriately.

Instructions: Convert the 'doc_array' we created into a dataframe, with the column names as the words (feature names). Call the dataframe 'frequency_matrix'.


1.2K views, 08:41


Congratulations! We have successfully implemented Bag of Words for a document dataset that we created. https://t.me/epythonlab/689

One potential issue that can arise from using this method is that if our dataset of text is extremely large (say if we have a large collection of news articles or email data), there will be certain values that are more common than others simply due to the structure of the language itself. For example, words like 'is', 'the', 'an', pronouns, grammatical constructs, etc., could skew our matrix and affect our analysis.

There are a couple of ways to mitigate this. One way is to use the stop_words parameter and set its value to 'english'. This will automatically ignore all the words in our input text that are found in a built-in list of English stop words in scikit-learn.

Another way of mitigating this is to use the tf-idf (term frequency-inverse document frequency) method, which down-weights words that occur in many documents.
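A sketch of the tf-idf option using scikit-learn's TfidfVectorizer, which is not part of the original exercise. With its defaults (smooth_idf=True, norm='l2'), each count is multiplied by idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing t, and each row is then scaled to unit length.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

documents = ['Hello, how are you!',
             'Win money, win from home.',
             'Call me now.',
             'Hello, Call hello you tomorrow?']

# Tokenizes like CountVectorizer, then applies tf-idf weighting
tfidf_vector = TfidfVectorizer()
tfidf_matrix = tfidf_vector.fit_transform(documents).toarray()
vocab = tfidf_vector.vocabulary_  # maps each word to its column index

# Each row is L2-normalised, so its Euclidean length is 1
print(np.linalg.norm(tfidf_matrix, axis=1))

# In document 0, 'hello' (in 2 documents) gets less weight than 'how' (in 1)
print(tfidf_matrix[0, vocab['how']] > tfidf_matrix[0, vocab['hello']])

# In document 1, 'win' (count 2) outweighs 'from' (count 1); both share the same idf
print(tfidf_matrix[1, vocab['win']] > tfidf_matrix[1, vocab['from']])
```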


1.42K views, 08:45

Question: We have implemented bag of words both with and without scikit-learn. Study both solutions to the problem posted at https://t.me/epythonlab/689 and write a short summary of which method (plain Python code without scikit-learn, or the scikit-learn library) you think is the best option for implementing bag of words.
Send your summary to @asibehtenager. Your summary will be posted on the channel to help others learn from you.


1.59K views, 08:50

How to get data from wikipedia

https://www.youtube.com/watch?v=xF-clSS2zM0&list=UUsFz0IGS9qFcwrh7a91juPg&index=35

Access and Parse Data from Wikipedia with Wikipedia API in Python

In this video, you will learn how to access data from Wikipedia and parse it in your own local language using the Wikipedia API in Python.


1.47K views, 07:29

Data Cleansing in Pandas
https://www.youtube.com/watch?v=veF0_Bgk5Aw&list=UUsFz0IGS9qFcwrh7a91juPg&index=14

How to Parse and Clean Data using Pandas Library

Hello everyone, and welcome. In this video, you will learn how to parse and clean data using the pandas library.
#pandas #datacleansing #python

Ask your question in the Telegram group https://t.me/epythonlab/

Thanks for watching!


1.47K views, 13:57

Best-selling Python book on Amazon, for free. Who wants this book? You can find the PDF in the next post. Like or dislike?
#pythonbooks @epythonlab


1.74K views, 14:38

Python_Crash_Course,_2nd_Edition:_A_Hands_On,_Project_Based_Introduction.pdf

Best-selling Python book on Amazon, for free. Who wants this book? You can find the PDF at https://t.me/epythonlab/709/

Python Crash Course, 2nd Edition: A Hands-On, Project-Based Introduction to Programming


2.18K views, edited 15:01

We plan to demonstrate how to build a simple blog app using Flask. We want to know how many of you are absolute beginners with the Flask framework. We will send you a Zoom link via your Telegram account on Friday at 2:00 PM (GMT+3). Reply to this poll.

Final Results

I know HTML, CSS, JS, and Python but not Flask

I don't know any of them

71 voters, 1.86K views, 18:07

How to build a blog app using Flask, Part I: https://www.youtube.com/watch?v=dIN31vX9Qvw&list=UUsFz0IGS9qFcwrh7a91juPg

How to create web app using Flask Part 1 | Introduction to Flask

#flask #blogapp #python
In this video, you will learn how to get started with Flask.
Layout Design: https://www.youtube.com/playlist?list=PL0nX4ZoMtjYGzAtRxyP0szpmv3Yaub-0o
Ask your question at https://t.me/epythonlab/
Thanks for watching!


1.9K views, edited 13:30

Auto Scraper Bot.pdf

Telegram shilling bot for crypto traders. Message @asibehtenager.


2.03K views, edited 11:54

Data Science from Scratch, with source code on GitHub: https://t.me/epythonlab/681?single

Data science from scratch

📖Link: https://github.com/joelgrus/data-science-from-scratch

@epythonlab #books


1.81K views, edited 10:34

Flask Layout Design
https://www.youtube.com/watch?v=IBPw_gw-3sU

How to create Web app using Flask Part 2 | Layout Design

#flask #python #bootstrap
Flask layout design using Bootstrap and CSS from scratch.
Watch playlist: https://www.youtube.com/playlist?list=PL0nX4ZoMtjYGzAtRxyP0szpmv3Yaub-0o
Ask your question at https://t.me/epythonlab/
Thanks for watching!


1.61K views, 10:38

What is the output?

1.62K views, 15:23

Flask Layout Design https://www.youtube.com/watch?v=IBPw_gw-3sU

It is available now

1.62K views, 17:09

Flask Layout Design https://www.youtube.com/watch?v=IBPw_gw-3sU

Source code of the video
https://github.com/epythonlab/BlogApp

GitHub - epythonlab/BlogApp: This is a simple CRUD application developed using Python, Bootstrap and Flask as a framework. The full…

This is a simple CRUD application developed using Python, Bootstrap and Flask as a framework. The full video tutorials are available on YouTube. You can find the tutorial at https://youtube.com/epython...


1.77K views, 05:11