Epython Lab
Welcome to Epython Lab, where you can get learning resources, one-on-one training on machine learning, business analytics, and Python, and solutions for business problems.

#Names #Variable #Python Q: Explain the difference between a name and a variable in Python. Send your answer to @pythonethbot
#Keynote #Variables #Names
Variables are objects stored in memory.
Names are labels that we assign to them; names are how we refer to variables in code.
Ex: a = 5
Here, a is the name that points to a variable in memory. It is possible for multiple names to point to the same variable.
Variables keep track of the information we need to successfully execute a program. Variables can be used to store a variety of types of information in computer memory.
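
A quick illustration of the point above (plain Python, nothing extra needed):

```python
# Two names bound to the same object in memory.
a = [1, 2, 3]
b = a                    # 'b' is just another name for the same list object
print(a is b)            # True  -> both names point to the same object
print(id(a) == id(b))    # True  -> same identity in memory

b.append(4)              # a change made through one name...
print(a)                 # [1, 2, 3, 4]  ...is visible through the other
```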
#KeyNote #DataScience #Methodology #DataMining

Data Science Methodologies

What is CRISP-DM (Cross-Industry Standard Process for Data Mining)?

The CRISP-DM methodology is a process aimed at increasing the use of data mining across a wide variety of business applications and industries. The intent is to take case-specific scenarios and general behaviors and make them domain neutral. CRISP-DM comprises six steps that an entity has to implement in order to have a reasonable chance of success.

The six steps are as follows:

1. Business Understanding:- This stage is the most important because this is where the intention of the project is outlined. Foundational Methodology and CRISP-DM are aligned here. It requires communication and clarity. The difficulty here is that stakeholders have different objectives, biases, and modalities of relating information. They don't all see the same things or in the same manner. Without a clear, concise, and complete perspective of what the project goals are, resources will be needlessly expended.

2. Data Understanding:- Data understanding relies on business understanding. Data is collected at this stage of the process. The understanding of what the business wants and needs will determine what data is collected, from what sources, and by what methods. CRISP-DM combines the stages of Data Requirements, Data Collection, and Data Understanding from the Foundational Methodology outline.

3. Data Preparation:- Once the data has been collected, it must be transformed into a useable subset unless it is determined that more data is needed. Once a dataset is chosen, it must then be checked for questionable, missing, or ambiguous cases. Data Preparation is common to CRISP-DM and Foundational Methodology.

4. Modeling:- Once prepared for use, the data must be expressed through appropriate models that give meaningful insights and, hopefully, new knowledge. This is the purpose of data mining: to create knowledge and information that have meaning and utility. The use of models reveals patterns and structures within the data that provide insight into the features of interest. Models are selected on a portion of the data, and adjustments are made if necessary. Model selection is both an art and a science. Both Foundational Methodology and CRISP-DM are required for the subsequent stage.

5. Evaluation:- The selected model must be tested. This is usually done by running the trained model on a pre-selected test set. This allows you to see the effectiveness of the model on data it sees as new. The results are used to determine the efficacy of the model and foreshadow its role in the next and final stage.

6. Deployment:- In the deployment step, the model is used on new data outside the scope of the dataset and by new stakeholders. The new interactions at this phase might reveal new variables and needs for the dataset and model. These new challenges could initiate revision of either business needs and actions, or the model and data, or both.

CRISP-DM is a highly flexible and cyclical model. Flexibility is required at each step, along with communication, to keep the project on track. At any of the six stages, it may be necessary to revisit an earlier stage and make changes. The key point of this process is that it is cyclical; even at the finish, you return to business understanding to discuss viability after deployment. The journey continues.

For more information on CRISP-DM, go to: IBM Knowledge Center – CRISP-DM Help Overview
πŸ‘1
#KeyNote #SQL #Database #DataAnalysis #RDBMS #Python
Benefits of Python for Database Programming
- Python is a popular scripting language for connecting to databases and analyzing data.
- Python ecosystem: NumPy, pandas, Matplotlib, SciPy
- Ease of use
- Python supports relational database systems
- Python database APIs (DB-API) to connect to databases
- Detailed documentation: Python documentation is easily available
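
A minimal sketch of these points, using the standard-library sqlite3 module together with pandas (assumed installed) to query a relational database and analyze the result:

```python
import sqlite3
import pandas as pd

# Connect to an in-memory SQLite database (a relational database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 120.0), ("West", 80.5), ("East", 42.0)])
conn.commit()

# Pull the query result straight into a pandas DataFrame for analysis.
df = pd.read_sql_query(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region", conn)
print(df)

conn.close()
```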
@Epythonlab
#KeyNote #DataScience #datanalytics #modeltrain #futureprediction

In data analytics, we often use Model Development to help us predict future observations from the data we have.

A Model will help us understand the exact relationship between different variables and how these variables are used to predict the result.
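
As a hedged illustration of the idea, here is a simple linear regression with scikit-learn (assumed installed) that learns the relationship between one variable and the value we want to predict; the numbers below are made up:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data: one predictor variable and the value we want to predict.
X = np.array([[20], [25], [30], [35], [40]])
y = np.array([30000, 26000, 22000, 18000, 14000])

# Fit a model that captures the relationship between X and y.
model = LinearRegression().fit(X, y)

print(model.coef_, model.intercept_)   # the learned relationship
print(model.predict([[28]]))           # prediction for a new observation
```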

@epythonlab
#KeyNote #UnsupervisedMachineLearning #Clustering #k-means

Clustering is an unsupervised machine learning technique, and there are many clustering models out there. Despite its simplicity, k-means is widely used for clustering in many data science applications; it is especially useful if you need to quickly discover insights from unlabeled data (a minimal code sketch follows the list of applications below).

Some real-world applications of k-means:

- Customer segmentation
- Understand what the visitors of a website are trying to accomplish
- Pattern recognition
- Machine learning
- Data compression
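
A minimal k-means sketch with scikit-learn (assumed installed), using a made-up customer-segmentation toy dataset:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled, made-up data: [annual_income, spending_score] per customer.
X = np.array([[15, 39], [16, 81], [17, 6],
              [88, 77], [90, 80], [86, 10]])

# Ask k-means to discover 3 groups in the unlabeled data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)              # cluster assigned to each customer
print(kmeans.cluster_centers_)     # center of each discovered segment
print(kmeans.predict([[60, 50]]))  # segment for a new customer
```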
#KeyNote #BusinessAnalytics #DataScience

Cross Industry Standard Process for Data Mining (CRISP-DM)
"A data mining process model that describes commonly used approaches that data mining experts use to tackle problems... it was the leading methodology used by industry data miners." -Wikipedia

CRISP-DM Steps

1. Business Issue Understanding
2. Data Understanding
3. Data Preparation
4. Analysis/Modeling
5. Validation
6. Presentation/Visualization
#KeyNote #DataAnalysisMethodology #BusinessAnalytics

Types of data analysis methodologies

Predictive

Predictive analytics uses existing data to predict a future outcome. For example, a company may use predictive analytics to forecast demand or whether a customer will respond to an advertising campaign.

Geospatial

This type of analysis uses location-based data to help drive your conclusions. Some examples are:

- Identifying customers by a geographic dimension such as zip code, state, or county
- Calculating the distance between addresses and your stores
- Creating a trade area based upon your customer locations for further analysis

Some types of Geospatial analysis require the use of special software - such as software that can convert an address to Latitude & Longitude, or can calculate the drive time between two geographic points on a map.

Segmentation

Segmentation is the process of grouping data together. Groups can range from simple ones, such as customers who have purchased different items, to more complex segmentations where you identify stores that are similar based upon the demographics of their customers.

Aggregation

This methodology simply means calculating a value across a group or dimension and is commonly used in data analysis. For example, you may want to aggregate sales data for a salesperson by month - adding all of the sales closed for each month. Then, you may want to aggregate across dimensions, such as sales by month per sales territory. In this scenario, you could calculate the sales per month for each salesperson, and then add the sales per month for all salespeople in each region.

Aggregation is often done in reporting to be able to "slice and dice" information to help managers make decisions and view performance.
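
A small sketch of this kind of aggregation with pandas (assumed installed); the column names and figures are made up for illustration:

```python
import pandas as pd

sales = pd.DataFrame({
    "salesperson": ["Ann", "Ann", "Bob", "Bob"],
    "territory":   ["North", "North", "South", "South"],
    "month":       ["Jan", "Feb", "Jan", "Feb"],
    "amount":      [100, 150, 90, 120],
})

# Aggregate sales per salesperson per month...
print(sales.groupby(["salesperson", "month"])["amount"].sum())

# ...then roll the same numbers up per territory per month.
print(sales.groupby(["territory", "month"])["amount"].sum())
```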

Descriptive

Descriptive statistics provides simple summaries of a data sample. Examples could be calculating average GPA for applicants to a school, or calculating the batting average of a professional baseball player. In our electricity supply scenario, we could use descriptive statistics to calculate the average temperature per hour, per day, or per date.

Some of the commonly used descriptive statistics are Mean, Median, Mode, Standard Deviation, and Interquartile range.
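
A quick sketch of these summaries with pandas (assumed installed), using hypothetical hourly temperature readings:

```python
import pandas as pd

# Hypothetical hourly temperature readings.
temps = pd.Series([21.5, 22.0, 23.4, 25.1, 24.8, 22.9, 21.7, 20.6])

print(temps.mean())      # Mean
print(temps.median())    # Median
print(temps.mode())      # Mode (may return more than one value)
print(temps.std())       # Standard deviation
print(temps.quantile(0.75) - temps.quantile(0.25))  # Interquartile range
print(temps.describe())  # several summaries in one call
```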
#Keynote #DataScience #NLP #Python @epythonlab

Natural Language Processing

Natural language processing (NLP) is the field devoted to methods and algorithms for processing human (natural) languages for computers. NLP is a vast discipline that is actively being researched. Some examples of machine learning applications using NLP include sentiment analysis, topic modeling, and language translation. In NLP, the following terms have specific meanings:

- Corpus: The body/collection of text being investigated.
- Document: The unit of analysis, what is considered a single observation.

Examples of corpora include a collection of reviews or tweets, the text of the Iliad, and Wikipedia articles. Documents can be whatever you decide; a document is what your model will consider a single observation. For example, when the corpus is a collection of reviews or tweets, it is logical to make each document a single review or tweet. For the text of the Iliad, we can set the document size to a sentence or a paragraph. The choice of document size will be influenced by the size of our corpus; if it is large, it may make sense to call each paragraph a document. As is usually the case, there are some design choices that need to be made.
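
As a hedged illustration of corpus vs. document, here is a tiny bag-of-words sketch using scikit-learn's CountVectorizer (assumed installed, recent version); each tweet-like string below is treated as one document:

```python
from sklearn.feature_extraction.text import CountVectorizer

# The corpus: a small, made-up collection of tweet-like texts.
# Each string is one document, i.e. one observation for the model.
corpus = [
    "I love this phone",
    "This phone is terrible",
    "Great battery, love it",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)       # documents -> word-count vectors

print(vectorizer.get_feature_names_out())  # the vocabulary of the corpus
print(X.toarray())                         # one row per document
```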
Python is an interpreted programming language.
What is the difference between an interpreter and a compiler?
Explain your reasoning.

#Keynote #python
πŸ‘1
Artificial Intelligence Vs Machine Learning

- Artificial Intelligence is the making of intelligent machines by enabling a machine to copy human behaviors.

- Machine Learning is a subset of AI that uses statistics to enable machines to improve with experience.

- Deep Learning is a subset of machine learning that enables computers to solve more complex problems.

#Keynote #machinelearning #AI @epythonlab #deeplearning
πŸ‘2
Virtual Environment(virtualenv)

virtualenv is a tool to create isolated Python environments. We need to use virtual environments to keep the dependencies used by different Python projects separate, and to keep our global site-packages directory clean. We can also go one step further and install virtualenvwrapper, a set of extensions that makes using virtualenv a whole lot easier by providing simpler commands.

We can use pip, Python's package installer, to install both packages on any platform.
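
Both tools are installed from the command line (pip install virtualenv virtualenvwrapper). As a rough, standard-library-only sketch of the same idea, Python's built-in venv module can also create an isolated environment; this is shown only for illustration and is not the virtualenv tool itself:

```python
import venv

# Create an isolated environment in ./my_env with its own pip installed.
# (Built-in venv is used here as a sketch of the idea; the post itself is
# about the third-party virtualenv/virtualenvwrapper tools.)
venv.create("my_env", with_pip=True)

# Afterwards, activate it from a shell, e.g.:
#   source my_env/bin/activate    (Linux/macOS)
#   my_env\Scripts\activate       (Windows)
```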


@epythonlab #keynote #virtualenv
I can mention many reasons why Flask is better than Django, but in some cases Django is better than Flask. I usually use Flask for my tasks. What do you use?

#keynote #FLASK #DJANGO @epythonlab
What is WSGI?

WSGI stands for Web Server Gateway Interface. It is a simple calling convention for web servers to forward requests to web applications or frameworks written in the Python programming language.
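
A minimal sketch of the convention using only the standard library; the application below is a toy example, not part of any framework:

```python
def application(environ, start_response):
    # A WSGI app is just a callable taking the request environ and a
    # start_response callback, and returning an iterable of bytes.
    body = b"Hello from a WSGI app!"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

if __name__ == "__main__":
    # Serve it with the reference server from the standard library.
    from wsgiref.simple_server import make_server
    with make_server("", 8000, application) as server:
        print("Serving on http://localhost:8000 ...")
        server.handle_request()   # handle one request, then exit
```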

#keynote #python #webapp
Which framework is most likely to be used to implement a machine learning model prediction task?
#Flask is a customizable Python framework that gives developers complete control over how users access data. Flask is a "micro-framework" based on the Werkzeug WSGI toolkit and the Jinja2 templating engine. It is designed as a web framework for RESTful API development.

#keynote #Flask #framework @epythonlab

Why is Flask a microframework?

Flask is a micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or any other components where pre-existing third-party libraries provide common functions.
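
A hedged sketch of how a model prediction task might be exposed with Flask (assumed installed); the /predict endpoint and the stand-in "model" logic below are hypothetical, for illustration only:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()              # e.g. {"values": [1.0, 2.0, 3.0]}
    # Stand-in for a real trained model: just sum the inputs.
    prediction = sum(payload.get("values", []))
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(debug=True)   # development server only
```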
Barriers to Deep Learning

Unfortunately, deep learning is not the solution to every problem.

It has three major barriers:
1. It requires a lot of data.
2. It requires a lot of computing power; for example, Google DeepMind's AlphaGo required 1,202 CPUs and 176 GPUs.
3. You probably won't understand why certain decisions were made, given the complexity and flexibility of these algorithms.

Not every application of machine learning is one that requires deep learning.
@epythonlab #keynote #deeplearning #machinelearning
TensorFlow and Scikit-learn

If you're interested in solving machine learning questions, two of the most popular open-source libraries in the world for solving these problems are TensorFlow and Scikit-learn.

TensorFlow and Scikit-learn give any data scientist the ability to use the most advanced techniques in supervised and unsupervised machine learning easily, and for a variety of situations. You can expect that TensorFlow and Scikit-learn will continue to be used for machine learning in both industry and academia for the foreseeable future.

@epythonlab #keynote #tensorflow #scikitlearn
Check out all #keynote
Python Notes.pdf
153.2 KB
Sometimes GitHub is unable to render a Jupyter Notebook.

More resources @epythonlab #keynote

#github
πŸ‘5
What is Pandas?

Pandas is an open-source library providing high-performance, easy-to-use data structures and data analysis tools for Python.

The DataFrame is one of Pandas' most important data structures. It's basically a way to store tabular data where you can label the rows and the columns. One way to build a DataFrame is from a dictionary; another is to import data from a CSV (comma-separated values) file.
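
A short sketch of both approaches (the data and the CSV file name are made up):

```python
import pandas as pd

# Build a DataFrame from a dictionary: keys become column labels.
data = {"country": ["Ethiopia", "Kenya", "Ghana"],
        "population_m": [120.3, 54.0, 32.8]}
df = pd.DataFrame(data, index=["ET", "KE", "GH"])  # label the rows too
print(df)

# Or import tabular data from a CSV file (hypothetical path):
# df = pd.read_csv("countries.csv", index_col=0)
```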

Here are the most common pandas functions for data analysis https://youtu.be/8a3Y-HT09sQ
#KeyNote #Pandas #DataFrame #DataScience