One of the most frequent questions I get is how to start with data science and machine learning as a complete beginner, and what skills you need to have. Do you need to know programming? Do you need to know math?
Below is an answer I wrote on my Discord server a few years ago. It's still relevant and hopefully helpful.
Here are some things you should be familiar with to start your journey as a data scientist:
Statistics
You need some statistical knowledge: probability theory, Bayes' theorem, and probability distributions (uniform, normal/Gaussian, logarithmic, exponential, chi-square, etc.). You should know basics like mean, median and mode. You should understand hypothesis testing and statistical significance as well. If these terms are not familiar to you, try researching them. I shared 4 books on statistics for data science here on Discord; they might be useful.
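To make the basics concrete, here is a minimal sketch of mean, median and mode using only Python's built-in statistics module. The list of ages is made up purely for illustration.

```python
import statistics

# Hypothetical sample: ages of eight survey respondents (made-up numbers).
ages = [23, 25, 25, 29, 31, 34, 34, 34]

mean_age = statistics.mean(ages)      # arithmetic average of all values
median_age = statistics.median(ages)  # middle of the sorted values (average of the two middle ones here)
mode_age = statistics.mode(ages)      # most frequent value

print("mean:", mean_age)
print("median:", median_age)
print("mode:", mode_age)
```

Notice that the three numbers differ: the mode sits at the high end because 34 repeats, while the mean is pulled down by the younger respondents.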
Programming
Generally you are going to need some programming background. Which languages have you used before?
Most people use Python. It's great for preparing data and for building machine learning models with ML packages. What is great about Python is that it's very beginner-friendly. The R programming language is another option for data science and machine learning. Java and Scala offer nice libraries for data science as well. I personally use Java at work.
Most important libraries
If Python is your first choice (and it probably is if you are a beginner), you should check out pandas, the biggest library for data manipulation and data analysis, and NumPy, a library for multidimensional arrays and matrices. There are many libraries for machine learning, such as Keras (deep learning), scikit-learn, PyTorch and TensorFlow. Some libraries for data visualization are also important: the biggest is Matplotlib, but there are also Seaborn, Plotly, ggplot, Bokeh...
When it comes to Java, I use Deeplearning4j, Apache Spark, Apache Hadoop, and a bunch of NLP (natural language processing) libraries, which are not so important right now if you are a total beginner. We will get you there eventually.
Where to start?
If this sounds like too much, don't worry. That is just an overview of the situation in the field. You don't have to know all those libraries; some basics of pandas, NumPy and maybe scikit-learn are enough to begin with.
The first thing I ever read about machine learning, which is very important for data science, is this Medium article:
https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471
Its subtitle is "The world's easiest introduction to Machine Learning", and that's not far from the truth. After I read it, I understood machine learning as well as data science much better.
Tip: Medium allows you to read 3 articles for free per month, but if you open them in incognito mode you have unlimited access to all articles for free :)
After finishing this, try researching other ML concepts, such as types of ML algorithms, classification and regression problems, overfitting/underfitting, model evaluation techniques and metrics, etc.
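As a taste of what "model evaluation measures" means, here is a minimal sketch that computes accuracy, precision and recall by hand for a binary classifier. The labels and predictions are invented for illustration; in practice scikit-learn's sklearn.metrics module offers ready-made functions such as accuracy_score, precision_score and recall_score.

```python
# Hypothetical results of a spam classifier: 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # what the messages really are
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]  # what the model predicted

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam correctly flagged
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-spam correctly passed
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-spam wrongly flagged
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that slipped through

accuracy = (tp + tn) / len(pairs)  # share of all predictions that were right
precision = tp / (tp + fp)         # of everything flagged as spam, how much was spam
recall = tp / (tp + fn)            # of all real spam, how much was caught

print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
```

The three measures answer different questions, which is why one model can look good on accuracy and still miss a lot of the cases you care about.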
I would definitely recommend Andrew Ng's courses on Coursera; some of them are available on YouTube as well.
Once you understand the basic concepts, you can dive deeper into data science. Learn about datasets, how to prepare data, how to handle missing values, how to perform feature engineering, etc., and try to solve some real-world data science problems. I shared 500+ interesting data science projects with source code in the post above. I also shared a live data science course by UC Berkeley, Fall 2022. Go check that out as well.
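Handling missing values is a good first exercise. Below is a minimal sketch that uses only the standard library: it fills a missing number with the mean of the known values and a missing category with a placeholder. The rows and column names are hypothetical, and this is just one simple strategy among many. With pandas you would typically reach for DataFrame.fillna instead.

```python
import statistics

# Hypothetical dataset: each dict is one row; None marks a missing value.
rows = [
    {"name": "Ana", "age": 31, "city": "Zagreb"},
    {"name": "Ben", "age": None, "city": "Split"},
    {"name": "Cy", "age": 25, "city": None},
    {"name": "Dee", "age": 40, "city": "Zagreb"},
]

# Mean of the ages we actually know, used to fill the gaps.
known_ages = [r["age"] for r in rows if r["age"] is not None]
mean_age = statistics.mean(known_ages)

cleaned = []
for r in rows:
    filled = dict(r)  # copy so the original rows stay untouched
    if filled["age"] is None:
        filled["age"] = mean_age   # numeric column: mean imputation
    if filled["city"] is None:
        filled["city"] = "unknown"  # categorical column: placeholder label
    cleaned.append(filled)

for r in cleaned:
    print(r)
```

Filling with the mean keeps every row usable, but it also hides the fact that the value was missing, so it is worth comparing against simply dropping incomplete rows.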
Phew, that was a lot of text. I got really tired writing it. But since I get 10-20 of these questions every day, mostly on Instagram and WhatsApp, it's better to have it all written in one place. I hope I helped. Good luck with your data science journey!
#data_science #datascience #Berkeley
Join @Coding_CommunityOfficial for more