#KeyNote #BusinessAnalytics #DataScience
Cross Industry Standard Process for Data Mining (CRISP-DM)
"A data mining process model that describes commonly used approaches that data mining experts use to tackle problems... it was the leading methodology used by industry data miners." -Wikipedia
CRISP-DM Steps
1. Business Issue Understanding
2. Data Understanding
3. Data Preparation
4. Analysis/Modeling
5. Validation
6. Presentation/Visualization
#KeyNote #DataAnalysisMethodology #BusinessAnalytics
Types of data analysis methodology
Predictive
Predictive analytics uses existing data to predict a future outcome. For example, a company may use predictive analytics to forecast demand or whether a customer will respond to an advertising campaign.
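As a rough sketch of the demand-forecasting example, with hypothetical monthly figures, we can fit a straight-line trend to past demand and extrapolate it one month ahead:

```python
import numpy as np

# Hypothetical monthly demand history (units sold), months 1..6
months = np.array([1, 2, 3, 4, 5, 6])
demand = np.array([100, 110, 125, 130, 145, 150])

# Fit a straight-line trend to the existing data
slope, intercept = np.polyfit(months, demand, 1)

# Predict demand for month 7
forecast = slope * 7 + intercept
```

Real predictive analytics would use richer models and validation, but the idea is the same: learn a pattern from existing data, then apply it to a future point.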
Geospatial
This type of analysis uses location-based data to help drive your conclusions. Some examples are:
Identifying customers by a geographic dimension such as zip code, state, or county, or
Calculating the distance between addresses and your stores, or
Creating a trade area based upon your customer locations for further analysis
Some types of geospatial analysis require special software, such as software that can convert an address to latitude and longitude, or calculate the drive time between two geographic points on a map.
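For the distance-between-points example, once addresses are converted to latitude and longitude, the great-circle distance can be computed with the haversine formula. A minimal sketch, using hypothetical coordinates:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km

# Distance between a customer and a store (hypothetical coordinates)
distance = haversine_km(40.7128, -74.0060, 40.7306, -73.9352)
```

This gives straight-line ("as the crow flies") distance; drive-time calculations need routing software on top of it.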
Segmentation
Segmentation is the process of grouping data together. Groups can range from simple, such as grouping customers by the items they have purchased, to more complex segmentation techniques in which you identify stores that are similar based upon the demographics of their customers.
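The simple end of the spectrum can be sketched with a rule-based segmentation over hypothetical customer data (the names, spend figures, and thresholds below are made up):

```python
# Hypothetical customers with their total spend
customers = [
    {"name": "Ann", "spend": 1200},
    {"name": "Ben", "spend": 80},
    {"name": "Cara", "spend": 450},
    {"name": "Dan", "spend": 40},
]

def segment(customer):
    # Simple rule-based segmentation by total spend
    if customer["spend"] >= 1000:
        return "high value"
    if customer["spend"] >= 100:
        return "mid value"
    return "low value"

segments = {}
for c in customers:
    segments.setdefault(segment(c), []).append(c["name"])
```

The more complex end (finding similar stores from demographics) is typically done with clustering algorithms such as k-means rather than hand-written rules.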
Aggregation
This methodology simply means calculating a value across a group or dimension and is commonly used in data analysis. For example, you may want to aggregate sales data for a salesperson by month - adding all of the sales closed for each month. Then, you may want to aggregate across dimensions, such as sales by month per sales territory. In this scenario, you could calculate the sales per month for each salesperson, and then add the sales per month for all salespeople in each region.
Aggregation is often done in reporting to be able to "slice and dice" information to help managers make decisions and view performance.
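The sales scenario above can be sketched with a pandas `groupby` over hypothetical records (the names, territories, and amounts are made up):

```python
import pandas as pd

# Hypothetical monthly sales records
sales = pd.DataFrame({
    "salesperson": ["Ann", "Ann", "Ben", "Ben"],
    "territory":   ["East", "East", "West", "West"],
    "month":       ["Jan", "Feb", "Jan", "Feb"],
    "amount":      [100, 150, 200, 250],
})

# Aggregate: sales per salesperson per month
per_person = sales.groupby(["salesperson", "month"])["amount"].sum()

# Roll up across a different dimension: sales per territory per month
per_territory = sales.groupby(["territory", "month"])["amount"].sum()
```

Changing the columns passed to `groupby` is exactly the "slice and dice" step: the same records are summarized along whichever dimension a manager needs.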
Descriptive
Descriptive statistics provides simple summaries of a data sample. Examples could be calculating average GPA for applicants to a school, or calculating the batting average of a professional baseball player. In our electricity supply scenario, we could use descriptive statistics to calculate the average temperature per hour, per day, or per date.
Some of the commonly used descriptive statistics are Mean, Median, Mode, Standard Deviation, and Interquartile range.
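All five of those statistics are available in Python's standard library. A minimal sketch with hypothetical hourly temperatures (the values below are made up):

```python
import statistics

# Hypothetical hourly temperature readings
temps = [18.5, 19.0, 21.2, 22.8, 21.2, 20.1, 19.7]

mean = statistics.mean(temps)
median = statistics.median(temps)
mode = statistics.mode(temps)     # most frequent value
stdev = statistics.stdev(temps)   # sample standard deviation

# Interquartile range: Q3 - Q1
q1, _, q3 = statistics.quantiles(temps, n=4)
iqr = q3 - q1
```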
#Keynote #DataScience #NLP #Python @epythonlab
Natural Language Processing
Natural language processing (NLP) is the field devoted to methods and algorithms for processing human (natural) languages for computers. NLP is a vast discipline that is actively being researched. Some examples of machine learning applications using NLP include sentiment analysis, topic modeling, and language translation. In NLP, the following terms have specific meanings:
- Corpus: The body/collection of text being investigated.
- Document: The unit of analysis, what is considered a single observation.
Examples of corpora include a collection of reviews or tweets, the text of the Iliad, and Wikipedia articles. A document can be whatever you decide; it is what your model will treat as a single observation. When the corpus is a collection of reviews or tweets, it is logical to make each review or tweet a document. For the text of the Iliad, we could set the document size to a sentence or a paragraph. The choice of document size is influenced by the size of the corpus: if it is large, it may make sense to call each paragraph a document. As is usually the case, there are design choices to be made.
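Both choices can be sketched in a few lines (the review texts and the Iliad excerpt below are hypothetical stand-ins):

```python
# Corpus of short texts: each review is naturally one document
corpus = [
    "Great product, fast delivery!",
    "Terrible quality. Would not buy again.",
    "Okay value for the price.",
]

# One long text: split it into sentence-sized documents instead
long_text = "Sing, O goddess, the anger of Achilles. It brought woes to the Achaeans."
documents = [s.strip() for s in long_text.split(".") if s.strip()]
```

Splitting on "." is only a crude sketch; real pipelines typically use a proper sentence tokenizer (for example from NLTK or spaCy) that handles abbreviations and punctuation correctly.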
Artificial Intelligence Vs Machine Learning
- Artificial Intelligence is the making of intelligent machines by enabling a machine to copy human behaviors.
- Machine Learning is a subset of AI that uses statistics to enable machines to improve with experience.
- Deep Learning is a subset of machine learning that enables computers to solve more complex problems.
#Keynote #machinelearning #AI @epythonlab #deeplearning
Forwarded from Epython Lab (Asibeh Tenager)
#KeyNote #DataScience #datanalytics #modeltrain #futureprediction
In Data Analytics, we often use Model Development to help us predict future observations from the data we have.
A model helps us understand the exact relationship between different variables and how those variables are used to predict a result.
@epythonlab
Virtual Environment(virtualenv)
virtualenv is a tool to create isolated Python environments. We use virtual environments to keep the dependencies of different Python projects separate, and to keep the global site-packages directory clean. We can go one step further and install virtualenvwrapper, a set of extensions that makes using virtualenv a whole lot easier by providing simpler commands.
Both can be installed with pip, Python's package installer, on any platform.
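The install-and-use flow looks roughly like this (the environment name `myproject_env` is a hypothetical example; virtualenvwrapper additionally needs its one-time shell setup from its docs before `mkvirtualenv` works):

```shell
# Install both tools with pip
pip install virtualenv virtualenvwrapper

# Create and activate an isolated environment with plain virtualenv
virtualenv myproject_env
source myproject_env/bin/activate   # on Windows: myproject_env\Scripts\activate

# With virtualenvwrapper configured, the same becomes:
# mkvirtualenv myproject_env
# workon myproject_env
```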
@epythonlab #keynote #virtualenv
I can mention many reasons why Flask is better than Django, but in some cases Django is better than Flask. I usually use Flask for my tasks. What do you use?
#keynote #FLASK #DJANGO @epythonlab
Which framework is most likely to be used to implement a machine learning model prediction task?
#Flask is a customizable Python framework that gives developers complete control over how users access data. Flask is a "micro-framework" based on Werkzeug's WSGI toolkit and the Jinja2 templating engine. It is well suited as a web framework for RESTful API development.
#keynote #Flask #framework @epythonlab
Why is Flask a Microframework?
Flask is a micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or any other components where pre-existing third-party libraries provide common functions.
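A minimal sketch of the RESTful-prediction idea: the route and the doubling "model" below are hypothetical placeholders, where a real application would call a trained model's predict method instead.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical prediction endpoint; a real model's predict() call
# would replace the doubling rule below
@app.route("/predict/<int:x>")
def predict(x):
    return jsonify({"input": x, "prediction": 2 * x})

# To serve locally during development: app.run(debug=True)
```

Note how little scaffolding this needs; that absence of required components is exactly what "microframework" means.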
#keynote #Flask #framework @epythonlab
Barriers to Deep Learning
Unfortunately, deep learning is not the solution to every problem.
It has three major barriers:
1. It requires enough data: deep learning needs a lot of training data.
2. It requires a lot of computing power: for example, Google DeepMind's AlphaGo required 1,202 CPUs and 176 GPUs.
3. You probably won't understand why certain decisions were made, given the complexity and flexibility of these algorithms.
Not every application of machine learning is one that requires deep learning.
@epythonlab #keynote #deeplearning #machinelearning
TensorFlow and Scikit-learn
If you're interested in solving machine learning problems, two of the most popular open-source libraries in the world for doing so are TensorFlow and Scikit-learn.
TensorFlow and Scikit-learn give any data scientist the ability to apply the most advanced supervised and unsupervised machine learning techniques easily, and in a variety of situations. You can expect that TensorFlow and Scikit-learn will continue to be used for machine learning in both industry and academia for the foreseeable future.
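As a small taste of the Scikit-learn side, here is a sketch of a supervised workflow on its built-in Iris dataset (the model choice and split parameters are just illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset and split it for honest evaluation
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a supervised classifier and score it on held-out data
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

The same fit/predict/score pattern applies across nearly all Scikit-learn estimators, which is a big part of the library's appeal.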
@epythonlab #keynote #tensorflow #scikitlearn
What is Pandas?
Pandas is an open-source library providing high-performance, easy-to-use data structures and data analysis tools for Python.
The DataFrame is one of Pandas' most important data structures. It is basically a way to store tabular data in which you can label the rows and the columns. One way to build a DataFrame is from a dictionary; another is to import it from a CSV (comma-separated values) file.
Here are the most common pandas functions for data analysis https://youtu.be/8a3Y-HT09sQ
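Both construction routes can be sketched side by side (the country/capital data is a hypothetical example; `io.StringIO` stands in for a real file path):

```python
import io
import pandas as pd

# Build a DataFrame from a dictionary
df = pd.DataFrame({
    "country": ["Ethiopia", "Kenya"],
    "capital": ["Addis Ababa", "Nairobi"],
})

# Build the same DataFrame from CSV text;
# pd.read_csv("file.csv") works the same way on a file path
csv_text = "country,capital\nEthiopia,Addis Ababa\nKenya,Nairobi\n"
df_csv = pd.read_csv(io.StringIO(csv_text))
```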
#KeyNote #Pandas #DataFrame #DataScience