Epython Lab
Welcome to Epython Lab, where you can get resources to learn, one-on-one training in machine learning, business analytics, and Python, and solutions to business problems.

#KeyNote #BusinessAnalytics #DataScience

Cross Industry Standard Process for Data Mining (CRISP-DM)
"A data mining process model that describes commonly used approaches that data mining experts use to tackle problems... it was the leading methodology used by industry data miners." -Wikipedia

CRISP-DM Steps

1. Business Issue Understanding
2. Data Understanding
3. Data Preparation
4. Analysis/Modeling
5. Validation
6. Presentation/Visualization
#KeyNote #DataAnalysisMethodology #BusinessAnalytics

Types of data analysis methodologies

Predictive

Predictive analytics uses existing data to predict a future outcome. For example, a company may use predictive analytics to forecast demand or whether a customer will respond to an advertising campaign.
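As a toy sketch of this idea (all numbers invented), we can fit a straight-line trend to past monthly demand with ordinary least squares and extrapolate one month ahead:

```python
# Toy predictive-analytics sketch: fit a least-squares line to past
# monthly demand (pure stdlib), then forecast the next month.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    return slope, mean_y - slope * mean_x

months = [1, 2, 3, 4, 5]            # past five months
demand = [100, 110, 120, 130, 140]  # units sold each month (invented)

slope, intercept = fit_line(months, demand)
forecast = slope * 6 + intercept    # predict month 6
print(forecast)  # 150.0
```

Real forecasting would account for seasonality and noise, but the principle is the same: learn a relationship from existing data and project it forward.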

Geospatial

This type of analysis uses location-based data to help drive your conclusions. Some examples are:

- identifying customers by a geographic dimension such as zip code, state, or county;
- calculating the distance between addresses and your stores; or
- creating a trade area based upon your customer locations for further analysis.

Some types of geospatial analysis require special software, such as software that can convert an address to latitude and longitude, or calculate the drive time between two geographic points on a map.
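Simple point-to-point ("as the crow flies") distances need no special software, though. A minimal sketch using the haversine formula; the coordinates below (roughly New York and Los Angeles) are for illustration only:

```python
# Great-circle distance between two latitude/longitude points
# via the haversine formula (stdlib only).
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Distance in kilometres along the Earth's surface."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km

# Roughly New York to Los Angeles (~3,900-3,950 km)
print(haversine_km(40.7128, -74.0060, 34.0522, -118.2437))
```

Drive times, by contrast, depend on road networks and do require routing software or a mapping API.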

Segmentation

Segmentation is the process of grouping data together. Segments can range from simple groups, such as customers who have purchased different items, to the output of more complex techniques, such as identifying stores that are similar based upon the demographics of their customers.
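A minimal sketch of the simple end of the spectrum: grouping invented customer records into spend tiers. More complex segmentation would typically use a clustering algorithm such as k-means.

```python
# Simple segmentation: bucket customers into spend tiers.
# Customer records and thresholds are invented for illustration.
customers = [
    {"name": "Ada", "spend": 1200},
    {"name": "Ben", "spend": 300},
    {"name": "Cyd", "spend": 75},
]

def tier(spend):
    if spend >= 1000:
        return "high"
    if spend >= 250:
        return "medium"
    return "low"

segments = {}
for c in customers:
    segments.setdefault(tier(c["spend"]), []).append(c["name"])

print(segments)  # {'high': ['Ada'], 'medium': ['Ben'], 'low': ['Cyd']}
```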

Aggregation

This methodology simply means calculating a value across a group or dimension and is commonly used in data analysis. For example, you may want to aggregate sales data for a salesperson by month - adding all of the sales closed for each month. Then, you may want to aggregate across dimensions, such as sales by month per sales territory. In this scenario, you could calculate the sales per month for each salesperson, and then add the sales per month for all salespeople in each region.

Aggregation is often done in reporting to "slice and dice" information, helping managers make decisions and view performance.
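The salesperson-and-region example above can be sketched with the stdlib (all records invented):

```python
# Aggregating sales by (salesperson, month) and by (region, month).
from collections import defaultdict

sales = [
    {"rep": "Ann",  "region": "East", "month": "Jan", "amount": 100},
    {"rep": "Ann",  "region": "East", "month": "Jan", "amount": 50},
    {"rep": "Bob",  "region": "East", "month": "Jan", "amount": 200},
    {"rep": "Cara", "region": "West", "month": "Jan", "amount": 300},
]

by_rep = defaultdict(int)     # sales per salesperson per month
by_region = defaultdict(int)  # roll-up across the region dimension
for s in sales:
    by_rep[(s["rep"], s["month"])] += s["amount"]
    by_region[(s["region"], s["month"])] += s["amount"]

print(by_rep[("Ann", "Jan")])      # 150
print(by_region[("East", "Jan")])  # 350
```

In practice this kind of aggregation is usually a one-liner with pandas' `groupby` or SQL's `GROUP BY`; the loop above just makes the mechanics explicit.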

Descriptive

Descriptive statistics provides simple summaries of a data sample. Examples could be calculating average GPA for applicants to a school, or calculating the batting average of a professional baseball player. In our electricity supply scenario, we could use descriptive statistics to calculate the average temperature per hour, per day, or per date.

Some of the most commonly used descriptive statistics are the mean, median, mode, standard deviation, and interquartile range.
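All of these are available in Python's stdlib `statistics` module; here they are computed on a small invented temperature sample:

```python
# Descriptive statistics with the stdlib `statistics` module.
import statistics

temps = [21, 23, 23, 25, 30]  # hourly temperatures (invented sample)

print(statistics.mean(temps))    # 24.4
print(statistics.median(temps))  # 23
print(statistics.mode(temps))    # 23
print(round(statistics.stdev(temps), 2))  # 3.44 (sample std dev)

q1, q2, q3 = statistics.quantiles(temps, n=4)  # quartiles
print(q3 - q1)  # 5.5 (interquartile range)
```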
#Keynote #DataScience #NLP #Python @epythonlab

Natural Language Processing

Natural language processing (NLP) is the field devoted to methods and algorithms for processing human (natural) language with computers. NLP is a vast, actively researched discipline. Some examples of machine learning applications using NLP include sentiment analysis, topic modeling, and language translation. In NLP, the following terms have specific meanings:

- Corpus: The body/collection of text being investigated.
- Document: The unit of analysis, what is considered a single observation.

Examples of corpora include a collection of reviews or tweets, the text of the Iliad, and Wikipedia articles. A document can be whatever you decide; it is what your model will treat as a single observation. When the corpus is a collection of reviews or tweets, it is natural to make each review or tweet a document. For the text of the Iliad, we could set the document size to a sentence or a paragraph. The choice of document size is influenced by the size of the corpus: if the corpus is large, it may make sense to call each paragraph a document. As is usually the case, there are design choices to be made.
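As a minimal sketch of the corpus/document distinction, here each tweet in an invented two-tweet corpus is one document, represented as a bag-of-words count. Real pipelines would use a proper tokenizer (e.g. from scikit-learn, spaCy, or NLTK) rather than `str.split`.

```python
# Corpus = the whole collection; document = one unit of analysis.
from collections import Counter

corpus = [                       # the corpus: a collection of tweets
    "python is great",
    "machine learning in python",
]

# One Counter (bag of words) per document
documents = [Counter(text.split()) for text in corpus]

print(documents[0]["python"])  # 1
print(documents[1]["python"])  # 1
```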
Python is an interpreted programming language.
What is the difference between an interpreter and a compiler?
Explain your reasoning.

#Keynote #python
πŸ‘1
Artificial Intelligence Vs Machine Learning

- Artificial intelligence is the making of intelligent machines by enabling them to mimic human behavior.

- Machine learning is a subset of AI that uses statistics to enable machines to improve with experience.

- Deep learning is a subset of machine learning that enables computers to solve more complex problems.

#Keynote #machinelearning #AI @epythonlab #deeplearning
πŸ‘2
Forwarded from Epython Lab (Asibeh Tenager)
#KeyNote #DataScience #datanalytics #modeltrain #futureprediction

In data analytics, we often use model development to help us predict future observations from the data we have.

A model helps us understand the relationship between different variables and how those variables can be used to predict a result.

@epythonlab
πŸ‘5
Virtual Environments (virtualenv)

virtualenv is a tool to create isolated Python environments. We use virtual environments to keep the dependencies of different Python projects separate, and to keep our global site-packages directory clean. We can go one step further and install virtualenvwrapper, a set of extensions that makes virtualenv easier to use by providing simpler commands.

Both packages can be installed with pip, Python's package installer, on any platform.
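Note that since Python 3.3 the stdlib `venv` module covers the basic use case without installing anything extra; a minimal sketch that creates an environment in a temporary directory:

```python
# Creating an isolated environment with the stdlib `venv` module.
import tempfile
import venv
from pathlib import Path

target = Path(tempfile.mkdtemp()) / "demo-env"
venv.create(target, with_pip=False)  # with_pip=True also bootstraps pip

# Every venv contains a pyvenv.cfg marker file
print((target / "pyvenv.cfg").exists())  # True
```

In day-to-day use you would instead run `python -m venv demo-env` in a shell and then activate the environment.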


@epythonlab #keynote #virtualenv
I can mention many reasons why Flask is better than Django, but in some cases Django is better than Flask. I usually use Flask for my tasks. What do you use?

#keynote #FLASK #DJANGO @epythonlab
What is WSGI?

WSGI stands for Web Server Gateway Interface, a simple calling convention that lets web servers forward requests to web applications or frameworks written in the Python programming language.
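A minimal WSGI application is just a callable taking `environ` and `start_response`; here we invoke it directly with a dummy environ instead of running a real server:

```python
# A minimal WSGI application, called directly (no server needed).
def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, WSGI!"]  # iterable of bytes

captured = {}
def start_response(status, headers):
    captured["status"] = status  # record what the app sent

body = b"".join(app({"REQUEST_METHOD": "GET"}, start_response))
print(captured["status"], body)  # 200 OK b'Hello, WSGI!'
```

To actually serve it, the stdlib `wsgiref.simple_server` can host any such callable; frameworks like Flask and Django expose exactly this interface to production servers such as Gunicorn.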

#keynote #python #webapp
Which framework is most often used to serve machine learning model predictions?
#Flask is a customizable Python framework that gives developers complete control over how users access data. It is a "micro-framework" built on Werkzeug's WSGI toolkit and the Jinja2 templating engine, and it is well suited to RESTful API development.

#keynote #Flask #framework @epythonlab

Why is Flask a microframework?

Flask is a micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or any other components where pre-existing third-party libraries provide common functions.
Barriers to Deep Learning

Unfortunately, deep learning is not the solution to every problem.

It has three major barriers:
1. It requires enough data: deep learning needs a lot of data to work well.
2. It requires a lot of computing power; for example, Google DeepMind's AlphaGo required 1,202 CPUs and 176 GPUs.
3. Given the complexity and flexibility of these algorithms, you probably won't understand why certain decisions are made.

Consider whether your machine learning application is one that truly requires deep learning.
@epythonlab #keynote #deeplearning #machinelearning
TensorFlow and Scikit-learn

If you're interested in solving machine learning problems, two of the most popular open-source libraries in the world for doing so are TensorFlow and Scikit-learn.

TensorFlow and Scikit-learn give any data scientist the ability to easily apply the most advanced techniques in supervised and unsupervised machine learning, in a wide variety of situations. You can expect both libraries to continue to be used for machine learning in industry and academia for the foreseeable future.

@epythonlab #keynote #tensorflow #scikitlearn
Check out all #keynote
Python Notes.pdf
153.2 KB
Sometimes GitHub fails to render a Jupyter Notebook.

More resources @epythonlab #keynote

#github
πŸ‘5
What is Pandas?

Pandas is an open-source library providing high-performance, easy-to-use data structures and data analysis tools for Python.

The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data in which you can label the rows and the columns. One way to build a DataFrame is from a dictionary; another is to import one from a CSV (comma-separated values) file.
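A minimal sketch of the dictionary route (assumes pandas is installed; the values are invented):

```python
# Building a DataFrame from a dictionary of column -> values.
import pandas as pd

df = pd.DataFrame({
    "country": ["Ethiopia", "Kenya", "Ghana"],
    "population_m": [120, 54, 32],  # invented figures, in millions
})

print(df.shape)                   # (3, 2): three rows, two columns
print(df["population_m"].mean())  # column-wise summary statistic
```

For the CSV route, `pd.read_csv("file.csv")` returns a DataFrame in the same way (the filename here is just a placeholder).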

Here are the most common pandas functions for data analysis https://youtu.be/8a3Y-HT09sQ
#KeyNote #Pandas #DataFrame #DataScience