Epython Lab
Welcome to Epython Lab, where you can get learning resources, one-on-one training in machine learning, business analytics, and Python, and solutions to business problems.

We will learn how to analyze data for data science. Stay at home and learn new features of data science.

Think of this as an opportunity to become a future data scientist.

I will share my knowledge and experience with you.

#QuarantineYourself #Bioinformatics #DataAnalysis #DataScience
#COVID19 the highest death rate in the world

#DataScience #Bioinformatics #DataAnalysis #Pandas
K-Centroid Clustering

Summary: Cluster analysis identifies cohesive subgroups of observations within a dataset. It allows us to reduce a large number of observations into a smaller number of clusters.

STEP 1: SELECT APPROPRIATE VARIABLES

The first step is to understand the objectives for segmentation. Then, choose the appropriate variables that provide the information needed for clustering. A sophisticated cluster analysis cannot compensate for the poor choice of attributes.

STEP 2: DATA PREPARATION

Numeric data: Cluster analysis requires numeric data. Many non-numeric variables can be converted to numeric ones. Make sure to remove outliers, as clustering algorithms are highly sensitive to them.

Variable reduction: This step often requires variable reduction techniques to combine variables that revolve around a particular theme. A common method is Principal Component Analysis (PCA), which reduces a set of related variables into a few principal components (PCs) that explain most of the variance in the data. A rule of thumb is to use the PCs that together account for ~80% of the variance.
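
The ~80% rule of thumb can be sketched as a small helper. This is only an illustration: the function name and the variance values are hypothetical, and it assumes the per-component variances have already been computed by a PCA routine and sorted from largest to smallest.

```python
def components_for_variance(variances, threshold=0.80):
    """Return how many leading components are needed to reach `threshold`
    of the total variance. `variances` holds one variance per principal
    component, sorted from largest to smallest."""
    total = sum(variances)
    cumulative = 0.0
    for count, variance in enumerate(variances, start=1):
        cumulative += variance
        if cumulative / total >= threshold:
            return count
    return len(variances)


# Hypothetical variances of five principal components (not real data).
pc_variances = [4.0, 3.0, 1.5, 1.0, 0.5]
n_components = components_for_variance(pc_variances)
print(n_components)
```

The 0.80 threshold is a convention, not a hard limit, so it is exposed as a parameter.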

Scaling the data: Standardizing each variable using the z-score ensures that the results are not overly sensitive to variables with higher values.
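
A minimal sketch of z-score scaling using only the standard library. The `income` and `age` values are made up for illustration, and the population standard deviation is an assumption (some tools use the sample standard deviation instead).

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        # A constant variable carries no information for clustering.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Two hypothetical variables on very different scales.
income = [30_000, 45_000, 60_000, 75_000, 90_000]
age = [25, 35, 45, 55, 65]

income_z = zscore(income)
age_z = zscore(age)
print(income_z)
print(age_z)
```

After scaling, both variables have the same spread, so income no longer dominates a distance calculation just because its raw values are larger.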


STEP 3: DETERMINE THE NUMBER OF CLUSTERS

Use the Adjusted Rand (AR) and Calinski-Harabasz (CH) indices to determine the optimal method and number of clusters. Compare the index values of the candidate solutions with a box-and-whisker plot: the higher the median and the smaller the variation, the better. Remember, clustering is an iterative process and may require comparing several models to arrive at a good solution.


STEP 4: CREATE THE CLUSTERING MODEL

Select the variables, standardization process, clustering method, and number of clusters that gave the best solution. Create the cluster model and append the clusters to the dataset.
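
Steps 3 and 4 can be sketched together in plain Python. This is an illustration under stated assumptions, not the method of any particular tool: it uses k-means as the k-centroid method, a deterministic farthest-first initialization instead of random restarts, a single CH score per candidate (no AR index, no resampling), and a small made-up dataset. All function and field names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def farthest_first(points, k):
    """Pick k starting centers: the first point, then repeatedly the point
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centers)."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = centroid(members)
    return labels, centers


def calinski_harabasz(points, labels, centers):
    """Between-cluster over within-cluster dispersion; higher is better.
    Assumes the within-cluster dispersion is not zero."""
    n, k = len(points), len(centers)
    overall = centroid(points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    between = sum(
        labels.count(j) * sq_dist(centers[j], overall) for j in range(k)
    )
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical dataset with two variables and three visible groups.
coords = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 0), (10, 1), (11, 0), (11, 1),
          (5, 9), (5, 10), (6, 9), (6, 10)]
rows = [{"id": i, "x": x, "y": y} for i, (x, y) in enumerate(coords)]
points = [(r["x"], r["y"]) for r in rows]

# Step 3: score each candidate number of clusters.
scores = {}
for k in range(2, 5):
    labels, centers = kmeans(points, k)
    scores[k] = calinski_harabasz(points, labels, centers)
best_k = max(scores, key=scores.get)

# Step 4: fit the chosen model and append the cluster to each record.
labels, _ = kmeans(points, best_k)
for row, label in zip(rows, labels):
    row["cluster"] = label

print(best_k)
print(rows[0])
```

In practice the scores would be computed over several resampled runs per candidate, which is what makes a box-and-whisker comparison of medians and variation possible.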


STEP 5: VISUALIZE AND VALIDATE RESULTS

Visualization helps us determine the meaning and usefulness of the clustering solution. Use summary statistics to understand the differences among clusters.

Validate the results: You can use internal validation and/or external validation. Plot the distribution of the validation variable for each cluster using a box-and-whisker plot to visualize the differences.
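
A minimal sketch of per-cluster summary statistics, assuming the cluster labels have already been appended to each record. The records, the `spend` validation variable, and the `summarize` helper are hypothetical. The minimum, median, and maximum are among the values a box-and-whisker plot would display.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records with a cluster label and a validation variable.
rows = [
    {"cluster": 0, "age": 25, "spend": 100},
    {"cluster": 0, "age": 27, "spend": 120},
    {"cluster": 0, "age": 30, "spend": 140},
    {"cluster": 1, "age": 52, "spend": 300},
    {"cluster": 1, "age": 55, "spend": 340},
    {"cluster": 1, "age": 60, "spend": 380},
]


def summarize(rows, variable):
    """Group one variable by cluster and compute basic statistics."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["cluster"]].append(row[variable])
    return {
        cluster: {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for cluster, values in sorted(groups.items())
    }


summary = summarize(rows, "spend")
for cluster, stats in summary.items():
    print(cluster, stats)
```

If the validation variable differs clearly between clusters, as in this made-up data, that supports the usefulness of the segmentation.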

#keynotes #cluster #kcentroid #dataanalysis @epythonlab
Collecting, organizing, and processing data is the first priority in data analysis, data analytics, machine learning, and related fields. If we already have interesting data in the right format, we're lucky. If we don't, we have to search for a source that contains all the data we need. A website is one such source, but its data might not be downloadable. That is where web scraping comes in handy: it lets us extract the data directly from the web page.
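
The idea can be sketched with the standard library's `html.parser`. This is only an illustration: the HTML snippet, the `TableParser` class, and the column names are made up, and the page is an inline string rather than a live website, so no network request is made.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page content standing in for a downloaded web page.
page = """
<table>
  <tr><th>Country</th><th>Cases</th></tr>
  <tr><td>A</td><td>10</td></tr>
  <tr><td>B</td><td>25</td></tr>
</table>
"""

parser = TableParser()
parser.feed(page)

# First row is the header; the rest become one dict per record.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

For a real site, the page text would first be downloaded (for example with `urllib.request.urlopen`) and then passed to `parser.feed` in the same way.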
More... https://t.me/epythonlab/807?single
#machinelearning #dataanalysis #data #dataanalytics
Top Useful Pandas Functions for Daily Data Analysis: https://lnkd.in/eSQabBRN

#dataanalysis #pandas
❀6
Top Pandas methods to filter data from a DataFrame

Top Useful Pandas Functions for Daily Data Analysis: https://lnkd.in/eSQabBRN


#data #dataanalysis #pandas #python #machinelearning
❀6
😎 Cool Python 🐍 Tricks πŸ•Ί

βž• More Tricks https://lnkd.in/e2ZX-Net

#python #dataanalysis #datascience #machinelearning
Feature Engineering: Extracting features from messy Twitter data using Python

https://lnkd.in/eicGcGim

#python #data #engineering #epythonlab #dataanalysis #datascience

Keep sharing
Pandas is a powerful data aggregation tool that most data scientists use for daily data analysis.

It has many methods to filter data from a DataFrame.

Which one is not a data filtering method in Pandas?

Learn more https://lnkd.in/e_2pevPd

#dataanalysis #pandas #datascientists #data #epythonlab #python
❀5πŸ‘2