Data science/ML/AI
Data science and machine learning hub

Python, SQL, stats, ML, deep learning, projects, PDFs, roadmaps and AI resources.

For beginners, data scientists and ML engineers
👉 https://rebrand.ly/bigdatachannels

DMCA: @disclosure_bds
Contact: @mldatascientist
Pandas_Cheat_Sheet.pdf
387.2 KB
THE PANDAS CHEAT SHEET
A detailed guide to data wrangling using pandas
Reasons Why Data Goes Missing
Understanding why data is missing in your dataset is important because it helps you determine the type of missing data and what you need to do about it. Let's get our brains around this concept, shall we? 😁😁
Missing Completely at Random (MCAR): The probability that a value is missing is unrelated both to the value itself and to the values of any other variables. E.g.:
You collect data on end-of-year holiday spending patterns. You survey adults on how much they spend annually on gifts for family and friends in dollar amounts.
You note that there are a few missing values in your holiday spending dataset. Some people started answering your survey but dropped out or skipped a question.
However, you note that you have data points from a wide distribution, ranging from low to high values.
Therefore, you conclude that the missing values aren't related to any specific holiday spending amount range.

Missing at Random (MAR): The propensity for a data point to be missing is unrelated to the missing value itself but is related to some of the observed data. E.g.:
You repeat your data collection with a new group. You notice that there are more missing values for adults aged 18–25 than for other age groups.
But looking at the observed data for adults aged 18–25, you notice that the values are widely spread. It's unlikely that the missing data are missing because of the specific values themselves.
Instead, some younger adults may be less inclined to reveal their holiday spending amounts for unrelated reasons (e.g., more protective of their privacy).

Missing Not at Random (MNAR): This is data that is neither MAR nor MCAR, i.e. the reason a value is missing is related to the missing value itself. E.g.:
If some participants with low incomes avoid reporting their holiday spending amounts because those amounts are low, then this is an MNAR problem.
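
To make the three mechanisms concrete, here is a minimal sketch in Python (standard library only). Everything numeric in it is a made-up assumption for illustration: the sample size, the spending range, the 30% share of 18–25 year olds and all the missing-probabilities. None of it comes from a real survey.

```python
import random


def simulate_spending(n=10_000, seed=0):
    """Return (is_young, spending) pairs for a hypothetical survey."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        young = rng.random() < 0.3        # True for the 18-25 age group (assumed share)
        spending = rng.uniform(50, 1000)  # holiday spending in dollars (assumed range)
        rows.append((young, spending))
    return rows


def apply_missingness(rows, mechanism, seed=1):
    """Replace some spending values with None according to the mechanism."""
    rng = random.Random(seed)
    out = []
    for young, spending in rows:
        if mechanism == "MCAR":
            p = 0.2                             # same chance for everyone
        elif mechanism == "MAR":
            p = 0.4 if young else 0.1           # depends only on observed age
        elif mechanism == "MNAR":
            p = 0.4 if spending < 300 else 0.1  # depends on the hidden value itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append((young, None if rng.random() < p else spending))
    return out


def observed_mean(rows):
    """Mean of the spending values that were actually reported."""
    values = [s for _, s in rows if s is not None]
    return sum(values) / len(values)


def missing_rate(rows, young):
    """Share of missing spending values within one age group."""
    group = [s for y, s in rows if y == young]
    return sum(s is None for s in group) / len(group)


if __name__ == "__main__":
    full = simulate_spending()
    print("full data mean:", round(observed_mean(full), 1))
    for mechanism in ("MCAR", "MAR", "MNAR"):
        damaged = apply_missingness(full, mechanism)
        print(mechanism, "observed mean:", round(observed_mean(damaged), 1))
```

In this simulation the MCAR and MAR observed means should stay close to the full-data mean (spending is generated independently of age here, which will not always hold in real data), while the MNAR mean drifts upward because low spenders drop out more often. Calling `missing_rate` on the MAR data for each age group shows the gap in missingness between younger and older respondents.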
Deep Learning free courses

Introduction to Deep Learning

🎬 10 video lessons
Duration ⏰: 1 week worth of material
🏃‍♂️ Self paced
📄 Notes, 👨‍🏫 Labs and many more
☒️ Projects, Competitions
Teacher: Alexander Amini, Ava Soleimany
Source: MIT
🔗 Course link

Practical Deep Learning For Coders
🎬 8 video lessons
📔 Book Read online
📄 Notes, 👨‍🏫 Labs and many more
Duration ⏰: 7 weeks long, 10 hours a week
🏃‍♂️ Self paced
Teacher: Jeremy Howard
Source: fast.ai
🔗 Course link

Deep Learning
by Kaggle, on YouTube
🎬 13 video lessons
Duration ⏰: 2 hours worth of material
🔗 Course link

Learn Deep Learning and TensorFlow, without a Ph.D.
🎬 8 video lessons
Duration ⏰: 3 hours worth of material
🏃‍♂️ Self paced
📄 Notes, slides
Teacher: Martin Görner
Source: Google Cloud
🔗 Course link

Explore Deep Learning for Natural Language Processing
🎬 9 video lessons
Duration ⏰: 7-8 hours worth of material
🏃‍♂️ Self paced
Resource: Trailhead
🔗 Course link

Deep Learning Summer School
🎬 35 video lessons
Duration ⏰: 35+ hours
🏃‍♂️ Self paced
Resource: deeplearning
🔗 Course link

Deep Learning Prerequisites: The Numpy Stack in Python V2
Rating ⭐️: 4.5 out of 5
Students 👨‍🎓: 2230
Duration ⏰: 1hr 59min
Created by Lazy Programmer Team, Lazy Programmer Inc.
🔗 Course link

AI 101 Video Presentation
Presentation given by 👨‍🏫: MIT's Brandon Leshchinskiy
🔗 Presentation link

Deep Learning in Life Sciences - Spring 2021
🎬 22 video lessons
Duration ⏰: 31 hours worth of material
🏃‍♂️ Self paced
Teacher: Manolis Kellis
Resource: Class Central
🔗 Course link

Intro to Deep Learning
by Kaggle
Use TensorFlow and Keras to build and train neural networks for structured data.
Duration ⏰: 4 hours
🔗 Course link


Deep Learning, an MIT Press book 📚
Authors: Ian Goodfellow, Yoshua Bengio and Aaron Courville
🔗 Book link

#Deep_Learning #deeplearning #dl #machinelearning
➖➖➖➖➖➖➖➖➖➖➖➖➖➖
👉Join @bigdataspecialist for more👈
COMMON HYPOTHESIS TEST.pdf
5.2 MB
A GUIDE TO UNDERSTANDING HYPOTHESIS TEST
Tutorial-Math-Deep-Learning-2018.pdf
36.9 MB
A Guide to Understanding Mathematics for Deep Learning
Amazing Free Resources on Data Science and Machine Learning for Beginners

1) Data Science for Beginners - A Curriculum
By: Azure Cloud Advocates at Microsoft
Stars ⭐️: 15K
Fork: 2.4K
Repo: https://microsoft.github.io/Data-Science-For-Beginners/#/?id=lessons

2) Machine Learning for Beginners - A Curriculum
By: Azure Cloud Advocates at Microsoft
Stars ⭐️: 38K
Fork: 7.4K
Repo: https://microsoft.github.io/ML-For-Beginners/#/
Head First SQL
Here's a brain-friendly guide to learning SQL for beginners

Author: Lynn Beighley
Pages: 586
Link: Click Me!
Statistics Guide for Data Science
Learning statistics for data science can be overwhelming for beginners without a statistics background. It is easy to get confused about which topics to learn or which books to read to build up your knowledge.

You don't have to learn it all. Here are the essential topics to learn:

1) Know what a p-value is and its limitations
2) Linear Regression and its Assumptions
3) Different Statistical Distributions and when to use them
4) Mean, Variance for Normal, Poisson, and Uniform Distribution
5) Sampling Techniques and Common Designs (e.g. A/B)
6) Bayes' Theorem and its applications
7) Measurement and Interpretation of Confidence Intervals
8) Logistic Regression and ROC curves
9) Resampling (Cross Validation and Bootstrapping)
10) Tree Based Models
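
As a small taste of the resampling and confidence interval topics in the list, here is a hedged sketch of a percentile bootstrap confidence interval for a mean, using only the Python standard library. The function name `bootstrap_ci`, the default of 2000 resamples and the simulated data are assumptions chosen for illustration. The percentile method is only one of several bootstrap interval methods.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.fmean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` of `data`."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Cut off alpha/2 of the resampled estimates in each tail.
    lower_idx = int((alpha / 2) * n_resamples)
    upper_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lower_idx], estimates[upper_idx]


if __name__ == "__main__":
    rng = random.Random(42)
    sample = [rng.gauss(100, 15) for _ in range(200)]  # simulated measurements
    low, high = bootstrap_ci(sample)
    print(f"sample mean: {statistics.fmean(sample):.2f}")
    print(f"95% bootstrap CI: ({low:.2f}, {high:.2f})")
```

Passing a different `stat`, such as `statistics.median`, gives an interval for that statistic instead. That flexibility is the main reason to reach for the bootstrap when no simple formula is available.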

➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖➖
Join @datascience_bds for more cool data science materials.
*This channel belongs to @bigdataspecialist group
Where to find Data for Machine Learning

High quality data is key for building useful machine learning models. Models learn their behaviour from data. So, finding the right data is a big part of the work to build machine learning into your products.

This article gives a concise explanation of how to find the right data for your models.

https://towardsdatascience.com/where-to-find-data-for-machine-learning-e375e2a515c8
18 Best Data Science Podcasts
SQL Free Resources
Looking to learn SQL for free? Here is a curated list of websites you can use to upgrade your SQL skills or practice writing queries. Remember, SQL is a necessary skill to have in your toolkit as a data professional.

1. W3 Schools

https://w3schools.com/sql

2. SQL Zoo

http://sqlzoo.net

3. SQLBolt

http://sqlbolt.com

4. Khan Academy

https://khanacademy.org/computing/computer-programming/sql

5. FreeCode Camp

https://youtu.be/HXV3zeQKqGY

To practice what you have learned and build your skills at the same time, you can use these:

6. Hacker Rank

https://hackerrank.com/domains/sql

7. SQL Murder Mystery Game

https://mystery.knightlab.com

#datascience #SQL

Machine Learning with Python: Zero to GBMs

This is a practical and beginner-friendly introduction to supervised machine learning, decision trees, and gradient boosting using Python. This is a self-paced course where you can:

👌Watch hands-on coding-focused video tutorials
👌Practice coding with cloud Jupyter notebooks
👌Build an end-to-end real-world course project
👌Earn a verified certificate of accomplishment
👌Interact with a global community of learners
👌Solve 2 coding assignments and build a course project where you'll train ML models using a large real-world dataset

Link: https://jovian.ai/learn/machine-learning-with-python-zero-to-gbms
Text Classification with TensorFlow

This is an intermediate-level Python course taught by MIT grad student Kylie Ying. You can code along at home in your browser.

You'll use TensorFlow to train Neural Networks, visualize a diabetes dataset, and perform Text Classification on wine reviews. (2 hour YouTube course)

Link: https://www.freecodecamp.org/news/text-classification-tensorflow/
Introduction to Machine Learning, IIT Kharagpur

🆓 Free Online Course
💻 44 Lecture Videos
🏃‍♂️ Self paced
Teacher 👨‍🏫 : Prof. S. Sarkar

🔗 https://nptel.ac.in/courses/106105152
The Scikit-Learn Guide

Looking to improve your knowledge of machine learning algorithms? There's no better place to start than the scikit-learn documentation.

There is a lot of interesting information to be gained there.

https://scikit-learn.org/stable/
Want to make sure your Spark applications reach the best performance?

We invite you to our Dynamic Talks #90 | Spark performance mastery!
⏰ Date and time: July 20, 6:30 pm (CET)

The speaker is Iñigo San Aniceto Orbegozo, Staff Big Data Engineer at Grid Dynamics.

💻 Participation is free but registration is required: https://forms.gle/UVvfWG5LeZAXTuNQ6

More about the event: https://fb.me/e/1U9Vq4epw
Just wanted to share this 👆 here as well in case somebody is interested.