Data Science and Engineering

The 5 graph algorithms that you should know

Rahul Agarwal describes some of the most important graph algorithms you should know and how to implement them using Python.
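Finding connected components is one of the classic graph algorithms in this space. The following is a minimal plain-Python sketch using breadth-first search, with no graph library involved. It is not the article's code, and the function name and sample graph are made up for illustration.

```python
from collections import deque

def connected_components(edges, nodes):
    """Return the connected components of an undirected graph as sorted lists."""
    # Build an adjacency list; every node starts with an empty neighbour set.
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects exactly one component.
        queue = deque([start])
        seen.add(start)
        comp = []
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(sorted(comp))
    return components

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(connected_components(edges, nodes))  # [['a', 'b', 'c'], ['d', 'e']]
```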

https://towardsdatascience.com/data-scientists-the-five-graph-algorithms-that-you-should-know-30f454fa5513

Data Science and Engineering

A simple neural network with Python and Keras

https://www.pyimagesearch.com/2016/09/26/a-simple-neural-network-with-python-and-keras

PyImageSearch

A simple neural network with Python and Keras - PyImageSearch

Learn how to create a simple neural network using the Keras neural network and deep learning library along with the Python programming language.
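The core computation is straightforward. A Keras `Dense` layer computes `activation(dot(input, kernel) + bias)`, and a simple network is just a stack of such layers. The following is a plain-Python forward pass (no Keras) of a tiny 2-2-1 network. Its weights are hand-picked assumptions that make it compute XOR, whereas in Keras they would be learned with `fit`.

```python
def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . x + b) for each output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def relu(z):
    return max(0.0, z)

def identity(z):
    return z

# Hand-picked (not trained) weights that make a 2-2-1 network compute XOR.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]

def forward(x):
    hidden = dense(x, HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B, identity)[0]

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, forward(pair))  # prints 0.0, 1.0, 1.0, 0.0 for the four pairs
```

The hidden layer is what makes XOR possible. No single linear layer can separate those four points.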

Data Science and Engineering

Detecting and treating outliers is a necessity for any dataset, since outliers inevitably skew model estimates. Handling them well can make the difference between winning and losing a data science competition.

https://lnkd.in/fMV6GaY

This article covers detecting outliers in time-series data through a sequence of ideas, each improving on the previous one, and finally treating the outliers in the best way possible.

Ideas covered include:
Idea #1: Winsorization
Idea #2: Standard deviation, etc.
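The first two ideas can be sketched in plain Python as follows. The code uses simple versions, not the article's exact method. The 5th/95th percentile cut-offs, the round-down percentile index, and the sample demand series are all assumptions chosen for illustration.

```python
import statistics

def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clip every value into the [lower, upper] empirical percentile range."""
    ordered = sorted(values)
    n = len(ordered)
    # Simple percentile lookup: index rounded down into the sorted data.
    lo = ordered[int(lower_pct * (n - 1))]
    hi = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]

def sigma_outliers(values, k=3.0):
    """Indices of points lying more than k standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > k * sd]

demand = [10, 12, 11, 13, 12, 95, 11, 10, 12, 13]  # hypothetical series with one spike
print(winsorize(demand))             # [10, 12, 11, 13, 12, 13, 11, 10, 12, 13]
print(sigma_outliers(demand, k=2.0))  # [5]
```

With only ten points, the spike itself inflates the standard deviation. This is why a 2-sigma threshold is used in the example, and it is one reason later ideas improve on the plain standard-deviation rule.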

Medium

Forecasting: how to detect outliers?

(the article below is an extract from the book Data Science for Supply Chain Forecast, available here)

Data Science and Engineering

A case study on "Customer Transaction Prediction using LightGBM".

https://medium.com/analytics-vidhya/https-medium-com-kushagrarajtiwari-customer-transaction-prediction-3191c6c634dc

It comprehensively covers:
1. General Business Significance of this problem
2. Exploratory Data Analysis
3. Feature Engineering
4. Why use LightGBM for this problem

A good read if you want to explore problems in the banking/financial domain.

Medium

Customer Transaction Prediction using LightGBM

Exploratory Data Analysis and modelling with imbalanced data.
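A common way to handle imbalanced binary labels with LightGBM is to up-weight the positive class via its `scale_pos_weight` parameter. A frequent rule of thumb sets this to the negative/positive ratio. The following sketch only computes that ratio and builds a params dict; it does not call LightGBM, and it is not claimed to be the article's approach. The labels are synthetic.

```python
from collections import Counter

def positive_class_weight(labels):
    """Ratio of negative (0) to positive (1) examples in a binary label list."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive examples")
    return counts[0] / counts[1]

labels = [0] * 90 + [1] * 10  # synthetic 90/10 imbalance
ratio = positive_class_weight(labels)
params = {"objective": "binary", "scale_pos_weight": ratio}
print(params)  # {'objective': 'binary', 'scale_pos_weight': 9.0}
```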

Data Science and Engineering

Automating the end-to-end lifecycle of Machine Learning applications
#CD4ML #software_engineering #ML

Discoverable and Accessible Data
Reproducible Model Training
Model Serving (Embedded model, Model as service)
Testing and Quality in Machine Learning
Experiments Tracking
Model Deployment (Multiple models, Shadow models)
Model Monitoring and Observability
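To make "shadow models" concrete, the following is a minimal sketch. A candidate model receives the same live inputs as the production model, but only the production answer is returned, and both outputs are recorded for offline comparison. The `ShadowDeployment` class and the two rule-based "models" are hypothetical and not taken from the CD4ML article.

```python
import logging

logger = logging.getLogger("shadow_deployment")

class ShadowDeployment:
    """Serve the primary model's prediction while also running a shadow model."""

    def __init__(self, primary, shadow):
        self.primary = primary
        self.shadow = shadow
        self.comparisons = []  # (features, primary_output, shadow_output) tuples

    def predict(self, features):
        result = self.primary(features)
        try:
            shadow_result = self.shadow(features)
        except Exception:  # a failing shadow model must never break serving
            logger.exception("shadow model failed")
            shadow_result = None
        self.comparisons.append((features, result, shadow_result))
        return result

def primary_model(features):
    # Hypothetical current production rule.
    return features["amount"] > 100

def candidate_model(features):
    # Hypothetical new model under evaluation in shadow mode.
    return features["amount"] > 80

service = ShadowDeployment(primary_model, candidate_model)
print(service.predict({"amount": 90}))  # False: callers only see the primary answer
print(service.comparisons)              # [({'amount': 90}, False, True)]
```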

https://martinfowler.com/articles/cd4ml.html

martinfowler.com

Continuous Delivery for Machine Learning

How to apply Continuous Delivery to build Machine Learning applications

Data Science and Engineering

https://www.surrey.ac.uk/news/new-ai-neural-network-approach-detects-heart-failure-single-heartbeat-100-accuracy

Data Science and Engineering

Microsoft has open-sourced scripts and notebooks for pre-training and fine-tuning the BERT natural language model on domain-specific text.

Github: https://github.com/microsoft/AzureML-BERT
Earlier: https://t.me/opendatascience/837

#Bert #Microsoft #NLP #dl

GitHub

GitHub - microsoft/AzureML-BERT: End-to-End recipes for pre-training and fine-tuning BERT using Azure Machine Learning Service

Data Science and Engineering

A great collection of practical rules for day-to-day data science engineering and research work.

Machine learning in a company is 10% data science and 90% other challenges; this PDF provides a wealth of principles and solutions for dealing with them.

We highly recommend saving this post to your Saved Messages by forwarding it to yourself.

Link: http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

#cheatsheet #advice #practical #common #shouldbesaved

Data Science and Engineering

CS238: Decision Making under Uncertainty (AA 228)

Textbook: Decision Making Under Uncertainty: Theory and Application by Mykel J. Kochenderfer et al. (MIT Lincoln Laboratory Series)
See course materials

http://web.stanford.edu/class/aa228/

web.stanford.edu

AA228/CS238 | Decision Making under Uncertainty

Description: This course introduces decision making under uncertainty from a computational perspective and provides an overview of the necessary tools for building autonomous and decision-support systems. Following an introduction to probabilistic models and…
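Markov decision processes (MDPs) are a standard model for sequential decision making under uncertainty. The following sketch, which is not taken from the course materials, runs value iteration (repeated Bellman backups, updated in place) on a hypothetical two-state MDP. The states, actions, transition probabilities, rewards, and discount `gamma=0.9` are all made up for illustration.

```python
def value_iteration(states, actions, transition, reward, gamma=0.9, tol=1e-8):
    """Optimal state values of a finite MDP via repeated Bellman backups.

    transition(s, a) -> list of (probability, next_state); reward(s, a) -> float.
    Values are updated in place, sweep by sweep, until the largest change < tol.
    """
    values = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                reward(s, a) + gamma * sum(p * values[s2] for p, s2 in transition(s, a))
                for a in actions
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values

def greedy_policy(states, actions, transition, reward, values, gamma=0.9):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    return {
        s: max(actions, key=lambda a: reward(s, a)
               + gamma * sum(p * values[s2] for p, s2 in transition(s, a)))
        for s in states
    }

# Hypothetical toy MDP: only "stay" in state B pays a reward.
STATES = ["A", "B"]
ACTIONS = ["stay", "go"]

def transition(s, a):
    if a == "stay":
        return [(1.0, s)]
    if s == "A":
        return [(0.8, "B"), (0.2, "A")]  # moving A -> B succeeds 80% of the time
    return [(1.0, "A")]

def reward(s, a):
    return 1.0 if (s == "B" and a == "stay") else 0.0

values = value_iteration(STATES, ACTIONS, transition, reward)
print(greedy_policy(STATES, ACTIONS, transition, reward, values))  # {'A': 'go', 'B': 'stay'}
```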

Data Science and Engineering

Estimating the success of re-identifications in incomplete datasets using generative models

99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes, suggesting that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR.

This is a serious privacy concern and a problem for data engineering, especially for anyone working with anonymized personal information. The paper provides a way to estimate how likely it is that a person can be correctly re-identified from an anonymized dataset, which can be useful for people who work for governments or security companies.
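The intuition is easy to check on your own data: measure what fraction of records are unique on a set of quasi-identifiers. The sketch below computes only this simple *sample* uniqueness. The paper goes further and estimates population-level re-identification with a generative model, which this sketch does not attempt. The records and attribute names are hypothetical.

```python
from collections import Counter

def uniqueness(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination appears exactly once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

people = [  # hypothetical "anonymized" records
    {"zip": "02139", "birth_year": 1985, "sex": "F"},
    {"zip": "02139", "birth_year": 1985, "sex": "F"},
    {"zip": "02139", "birth_year": 1990, "sex": "M"},
    {"zip": "02140", "birth_year": 1985, "sex": "F"},
]
print(uniqueness(people, ["zip"]))                       # 0.25
print(uniqueness(people, ["zip", "birth_year", "sex"]))  # 0.5
```

Each extra attribute splits groups further, so uniqueness tends to climb quickly as attributes are added.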

https://www.reddit.com/r/science/comments/chko43/9998_of_americans_would_be_correctly_reidentified/

#privacy #gdpr #federatedlearning #ml

From the science community on Reddit: 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic…

Posted by FvDijk - 348 votes and 29 comments

Data Science and Engineering

A great article on text preprocessing, covering cleaning, #tokenization, #lemmatization and other aspects.

Link: https://medium.com/@datamonsters/text-preprocessing-in-python-steps-tools-and-examples-bf025f872908
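A minimal standard-library sketch of the basic cleaning and tokenization steps is shown below: lowercasing, removing digits and punctuation, whitespace tokenization, and stopword removal. The tiny stopword list is an assumption. Lemmatization is omitted because it needs a library such as NLTK or spaCy.

```python
import re
import string

STOPWORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in"}  # tiny illustrative list

def preprocess(text):
    """Lowercase, strip digits and punctuation, split on whitespace, drop stopwords."""
    text = text.lower()
    text = re.sub(r"\d+", "", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("The 3 cats are sitting in the garden!"))  # ['cats', 'sitting', 'garden']
```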

#NLP #NLU #datacleaning

Medium

Text Preprocessing in Python: Steps, Tools, and Examples

by Olga Davydova, Data Monsters

Data Science and Engineering

New paper on training with pseudo-labels for semantic segmentation

Semi-Supervised Segmentation of Salt Bodies in Seismic Images:
SOTA (1st place) at TGS Salt Identification Challenge.
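A common pseudo-labeling step is to run a trained model on unlabeled images and keep only its confident predictions as training targets. The following sketch does this per pixel for binary segmentation. The 0.9 threshold and the use of `None` for "ignore in the loss" are assumptions, and the paper's exact procedure may differ.

```python
def select_pseudo_labels(probabilities, threshold=0.9):
    """Convert per-pixel foreground probabilities into hard pseudo-labels.

    Pixels with p >= threshold become 1, pixels with p <= 1 - threshold become 0,
    and uncertain pixels become None so a training loop can ignore them.
    """
    labels = []
    for row in probabilities:
        out_row = []
        for p in row:
            if p >= threshold:
                out_row.append(1)
            elif p <= 1 - threshold:
                out_row.append(0)
            else:
                out_row.append(None)
        labels.append(out_row)
    return labels

probs = [[0.97, 0.50], [0.02, 0.93]]  # hypothetical model output for a 2x2 image
print(select_pseudo_labels(probs))     # [[1, None], [0, 1]]
```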

Github: https://github.com/ybabakhin/kaggle_salt_bes_phalanx
arXiv: https://arxiv.org/abs/1904.04445

#GCPR2019 #Segmentation #CV

Data Science and Engineering

Unified rational protein engineering with sequence-only deep representation learning

UniRep predicts amino-acid sequences that form stable bonds. In industry, that’s vital for determining the production yields, reaction rates, and shelf life of protein-based products.

Link: https://www.biorxiv.org/content/10.1101/589333v1.full

#biolearning #rnn #Harvard #sequence #protein

Data Science and Engineering

Exploring Weight Agnostic Neural Networks

Explores agents that can perform well in their environment without needing to learn weight parameters.

Link: https://ai.googleblog.com
Code: https://github.com/google/brain-tokyo-workshop/tree/master/WANNRelease
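The key trick in weight agnostic networks is that every connection shares one weight value, and a topology is scored across several such values instead of training weights. The sketch below shows only that evaluation idea, not the architecture search. The topology, the task (predicting the sign of x1 + x2), the data, and the list of shared weights are all hypothetical.

```python
import math

SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # values the single shared weight takes

def network(x1, x2, w):
    # Fixed hypothetical topology: both inputs -> one tanh unit -> output,
    # with every connection using the same weight w.
    hidden = math.tanh(w * x1 + w * x2)
    return w * hidden

def accuracy(w, data):
    correct = sum((network(x1, x2, w) > 0) == label for (x1, x2), label in data)
    return correct / len(data)

def evaluate_topology(data):
    """Mean and worst-case accuracy of the topology over all shared weight values."""
    scores = [accuracy(w, data) for w in SHARED_WEIGHTS]
    return sum(scores) / len(scores), min(scores)

# Label is True when x1 + x2 > 0.
DATA = [((1.0, 0.5), True), ((-1.0, -0.2), False), ((0.3, -0.1), True), ((-0.7, 0.2), False)]
print(evaluate_topology(DATA))  # (1.0, 1.0)
```

This topology scores perfectly for every shared weight because w * tanh(w * s) always has the sign of s. Its structure alone solves the task, which is the property weight agnostic networks search for.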

Data Science and Engineering

Channel name was changed to «Data»

Data Science and Engineering

Channel name was changed to «Data Science and Engineering»

Data Science and Engineering

Deep learning cheatsheets covering the content of Stanford’s CS 230 class.

CNN: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks

RNN: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks

Tips and tricks: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-deep-learning-tips-and-tricks

#cheatsheet #Stanford #cnn #rnn #tipsntricks #dnn
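One formula from the CNN material that is handy to have in code is the spatial output size of a convolution: O = (I - F + 2P) / S + 1, for input size I, filter size F, padding P on each side, and stride S. The sketch below rounds the division down, matching how frameworks such as PyTorch handle sizes that do not divide evenly. The function name is hypothetical.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial output size of a convolution with equal padding on both sides."""
    span = input_size - filter_size + 2 * padding
    if span < 0:
        raise ValueError("filter is larger than the padded input")
    # Integer (floor) division: leftover input that doesn't fit a full stride is dropped.
    return span // stride + 1

print(conv_output_size(32, 5))                        # 28
print(conv_output_size(32, 3, stride=1, padding=1))   # 32 ("same" padding)
print(conv_output_size(224, 7, stride=2, padding=3))  # 112
```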

stanford.edu

CS 230 - Convolutional Neural Networks Cheatsheet

Teaching page of Shervine Amidi, Adjunct Lecturer at Stanford University.

Data Science and Engineering

Data science jobs are going to be in increasingly massive demand. Prepare your resume!

Internet of Things companies will dominate the 2020s.

https://www.androidauthority.com/internet-of-things-companies-1023404/

“Each year, Internet of Things companies tell us ‘this is the year of IoT,’ and each year, we get an expensive fridge that no one buys. But IoT is coming. In fact, it is already here! There are currently 6.7 billion ‘data collecting devices’ in use today, with 20 billion projected for 2020 according to Amazon. Gartner predicts there will be 25 billion connected devices by 2021, and Accenture suggests that the global IoT market will be worth $14.2 trillion in 2030 (as reported by CRN).”

“Data science jobs are going to be in increasingly massive demand for these companies. Collecting all that data from users is only useful if businesses know what to do with it, and how to infer actionable advice from it. How do you turn billions of purchases across millions of users in multiple different countries into a better marketing campaign? That’s where data science comes in.”

"One of the biggest areas of concern for Internet of Things companies is data security."

Android Authority

Internet of Things companies will dominate the 2020s: Prepare your resume!

Internet of things companies will dominate the 2020s. Discover the businesses making waves and how to prepare your resume to land a job with them.

Data Science and Engineering

Our conceptual understanding of how best to represent words and sentences in a way that captures their underlying meanings and relationships is rapidly evolving. Among the latest milestones are BERT and ELMo.

This article covers the concepts you need to be aware of to properly get your head around BERT.

http://jalammar.github.io/illustrated-bert/
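A central concept is BERT's masked language modeling objective. Roughly 15% of tokens are chosen as prediction targets. Of these, 80% are replaced by [MASK], 10% by a random token, and 10% are left unchanged. The sketch below is a simplification: it picks each token independently with probability `mask_rate` rather than selecting an exact count, and the vocabulary is a toy assumption.

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, seed=0):
    """BERT-style masked-LM corruption of a token list.

    Returns (corrupted, targets): targets holds the original token at each
    selected position and None elsewhere.
    """
    rng = random.Random(seed)
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            targets.append(tok)
            roll = rng.random()
            if roll < 0.8:
                corrupted.append("[MASK]")          # 80%: mask token
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)               # 10%: keep original
        else:
            targets.append(None)
            corrupted.append(tok)
    return corrupted, targets

tokens = "the cat sat on the mat".split()
vocab = ["the", "cat", "sat", "on", "mat", "dog"]  # toy vocabulary
corrupted, targets = mask_tokens(tokens, vocab, seed=1)
print(corrupted)
print(targets)
```

The model is trained to predict `targets` at the selected positions only. The random and unchanged cases stop it from relying on always seeing [MASK], a token that never appears at fine-tuning time.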

#BERT #ELMo #NLP

jalammar.github.io

The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)

Data Science and Engineering

Transformers and self-attention explained from scratch, with code in Python.

#BERT #GPT_2 #Transformer
#python

http://www.peterbloem.nl/blog/transformers
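The core operation can be sketched as single-head scaled dot-product self-attention in plain Python. Each output vector is a softmax-weighted average of value vectors, with weights taken from query-key dot products scaled by sqrt(d_k). This version uses lists instead of tensors. The identity projection matrices and inputs are assumptions for illustration, and the blog's own code may be structured differently.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply an (n x m) list-of-lists matrix by an (m x p) one."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence of vectors."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = [
        [sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        for q_row in q
    ]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, v)

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical projections (no learned weights)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(X, IDENTITY, IDENTITY, IDENTITY)
for row in out:
    print([round(v, 3) for v in row])
```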
