Data Science and Engineering
This is the first Telegram platform for data scientists, machine learning specialists, developers, software engineers and IT managers to share knowledge, connect, collaborate & learn.
"Mathematics For Machine Learning"

A book that is intended to help people understand the #mathematics behind the #MachineLearning techniques.

Its aim is to help readers understand what goes on under the hood of common ML algorithms.

The best part is that the team is also working on Jupyter notebook tutorials.

Download the PDF of the book: https://lnkd.in/e-gXPRf


For Data Science Implementations:
Know Data Science https://lnkd.in/fMHtxYP
Understand How to answer Why https://lnkd.in/f396Dqg
Machine Learning Terminology https://lnkd.in/fCihY9W
Understand Machine Learning Implementation https://lnkd.in/f5aUbBM
Machine Learning in Retail https://lnkd.in/fihPTJf
Machine Learning in Marketing https://lnkd.in/fBncKiy
Detecting new knowledge in unstructured text using ML. This is more evidence that when you put large numbers of papers and reports together and apply open-source machine learning to the text, the whole can be greater than the sum of its parts. This paper focuses on thermoelectric materials.

Vice News Article
https://lnkd.in/gkXnEXt

Nature paper (Tshitoyan et al., 2019)
https://www.nature.com/articles/s41586-019-1335-8
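The Nature paper trains Word2vec word embeddings on materials-science abstracts and then ranks candidate materials by how close their vectors are to the vector for "thermoelectric". The ranking step can be sketched with cosine similarity. The three-dimensional vectors and the names material_A/B/C below are made-up toy values for illustration only, not embeddings or results from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, embeddings):
    """Return all other words, sorted from most to least similar to `query`."""
    q = embeddings[query]
    scores = {w: cosine(q, vec) for w, vec in embeddings.items() if w != query}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy vectors; a trained model would have hundreds of dimensions.
toy_embeddings = {
    "thermoelectric": [0.9, 0.1, 0.0],
    "material_A": [0.8, 0.2, 0.1],
    "material_B": [0.1, 0.9, 0.3],
    "material_C": [0.5, 0.5, 0.0],
}

print(rank_by_similarity("thermoelectric", toy_embeddings))
# → ['material_A', 'material_C', 'material_B']
```

In the real setting the interesting candidates are materials that score high even though they never appear next to "thermoelectric" in any abstract.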
The 5 graph algorithms that you should know

Rahul Agarwal describes some of the most important graph algorithms you should know and how to implement them using Python.
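The post does not name the five algorithms. To give a flavour of this kind of algorithm, here is a minimal sketch of one classic graph algorithm, connected components found with breadth-first search, in plain Python. It is our own illustration, not code from the article.

```python
from collections import deque


def connected_components(edges):
    """Group the nodes of an undirected graph into connected components via BFS."""
    # Build an adjacency list; each undirected edge is stored in both directions.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # BFS from an unvisited node collects everything reachable from it.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


print(connected_components([("a", "b"), ("b", "c"), ("d", "e")]))
# → [['a', 'b', 'c'], ['d', 'e']]
```

Because the graph is given as an edge list, isolated nodes with no edges do not appear in the output; a node list would be needed to include them.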
Detecting and treating outliers is essential for almost any dataset, because outliers can distort model estimates. Handling them well can make the difference between winning and losing a data science competition.

https://lnkd.in/fMV6GaY

This article covers detecting outliers in time-series data through a sequence of ideas, each improving on the previous one, and finishes with how to treat the detected outliers.

Some of the ideas covered:
Idea #1: Winsorization
Idea #2: Standard deviation, and more
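The first two ideas can be sketched in plain Python. Winsorization clips values to chosen lower/upper percentiles; the standard-deviation idea flags points whose z-score exceeds a threshold. The 5%/95% cut-offs, the simple index-based percentile and the sample series are assumptions for illustration; the article may use different settings and applies its ideas to time series, where these functions could be run over rolling windows.

```python
import statistics


def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clip values to the given lower/upper percentiles (winsorization)."""
    ordered = sorted(values)
    n = len(ordered)
    # Simple index-based percentile: floor(p * (n - 1)); other definitions exist.
    lo = ordered[int(lower_pct * (n - 1))]
    hi = ordered[int(upper_pct * (n - 1))]
    return [min(max(v, lo), hi) for v in values]


def zscore_outliers(values, threshold=3.0):
    """Indices of points more than `threshold` population std devs from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / std > threshold]


series = [10, 11, 9, 10, 12, 11, 10, 95, 9, 10]
print(winsorize(series))                       # → [10, 11, 9, 10, 12, 11, 10, 12, 9, 10]
print(zscore_outliers(series, threshold=2.5))  # → [7]
print(zscore_outliers(series))                 # → []
```

The last line shows a known weakness of the standard-deviation rule: with n points the population z-score can never exceed sqrt(n - 1), so in a 10-point series even the obvious spike at 95 stays just below 3. This masking effect is one reason later ideas improve on plain standard deviations.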
An article presenting a case study on "Customer Transaction Prediction using LightGBM".

https://medium.com/analytics-vidhya/https-medium-com-kushagrarajtiwari-customer-transaction-prediction-3191c6c634dc

It comprehensively covers:
1. The general business significance of the problem
2. Exploratory Data Analysis
3. Feature Engineering
4. Why use LightGBM for this problem

A good read if you want to explore problems in the banking/financial domain.
Continuous Delivery for Machine Learning (CD4ML): automating the end-to-end lifecycle of machine learning applications
#CD4ML #software_engineering #ML

Discoverable and Accessible Data
Reproducible Model Training
Model Serving (Embedded model, Model as service)
Testing and Quality in Machine Learning
Experiment Tracking
Model Deployment (Multiple models, Shadow models)
Model Monitoring and Observability

https://martinfowler.com/articles/cd4ml.html
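One of the deployment patterns listed, shadow models, can be sketched as follows: the primary model's prediction is returned to the caller, while a candidate (shadow) model runs on the same input and its prediction is only logged for later comparison. The `ShadowDeployment` class and both model functions are hypothetical illustrations, not code from the CD4ML article.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ShadowDeployment:
    """Serve `primary` predictions; record `shadow` predictions for offline comparison."""
    primary: Callable[[float], float]
    shadow: Callable[[float], float]
    log: List[Tuple[float, float, float]] = field(default_factory=list)

    def predict(self, x: float) -> float:
        served = self.primary(x)
        try:
            # The shadow result is logged only; it never reaches the caller.
            candidate = self.shadow(x)
            self.log.append((x, served, candidate))
        except Exception:
            # A failing shadow model must not break the live request path.
            pass
        return served

    def disagreement_rate(self, tolerance: float = 1e-6) -> float:
        """Fraction of logged requests where the two models differ by more than `tolerance`."""
        if not self.log:
            return 0.0
        differing = sum(1 for _, a, b in self.log if abs(a - b) > tolerance)
        return differing / len(self.log)


def current_model(x: float) -> float:
    return 2.0 * x


def candidate_model(x: float) -> float:
    # Hypothetical new model that behaves differently for large inputs.
    return 2.0 * x if x < 10 else 2.5 * x


service = ShadowDeployment(current_model, candidate_model)
outputs = [service.predict(x) for x in [1.0, 5.0, 20.0, 30.0]]
print(outputs, service.disagreement_rate())  # → [2.0, 10.0, 40.0, 60.0] 0.5
```

The disagreement rate is the kind of metric that model monitoring would watch before promoting the shadow model to primary.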
A great collection of practical rules for day-to-day data science engineering and research work.

Machine learning in a company is 10% data science and 90% other challenges; this PDF provides a wealth of principles and solutions for dealing with them.

We highly recommend saving this post to your Saved Messages by forwarding it to yourself.

Link: http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

#cheatsheet #advice #practical #common #shouldbesaved
Estimating the success of re-identifications in incomplete datasets using generative models

99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes, suggesting that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR.

This is a serious privacy concern and a problem for data engineering, especially for anyone working with anonymized personal information. The paper provides a way to estimate how likely a person is to be correctly re-identified from an anonymized dataset, which can be useful for people who work for government or security companies.

https://www.reddit.com/r/science/comments/chko43/9998_of_americans_would_be_correctly_reidentified/

#privacy #gdpr #federatedlearning #ml
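The paper estimates population uniqueness with a generative model. A much simpler quantity that illustrates the underlying risk is sample uniqueness: the fraction of records whose combination of quasi-identifiers (such as ZIP code, birth year and gender) appears only once in the dataset. This is not the paper's method, and the records and attribute names below are invented for illustration.

```python
from collections import Counter


def sample_uniqueness(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(r[attr] for attr in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Invented example records (no real data).
people = [
    {"zip": "02139", "birth_year": 1985, "gender": "F"},
    {"zip": "02139", "birth_year": 1985, "gender": "F"},
    {"zip": "02139", "birth_year": 1990, "gender": "M"},
    {"zip": "10001", "birth_year": 1985, "gender": "F"},
]

print(sample_uniqueness(people, ["zip"]))                          # → 0.25
print(sample_uniqueness(people, ["zip", "birth_year", "gender"]))  # → 0.5
```

Each added attribute can only split groups further, so uniqueness grows as more demographic attributes are combined, which is the effect behind the 15-attribute figure above.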
New paper on training with pseudo-labels for semantic segmentation

Semi-Supervised Segmentation of Salt Bodies in Seismic Images:
State of the art (1st place) in the TGS Salt Identification Challenge.

Github: https://github.com/ybabakhin/kaggle_salt_bes_phalanx
arXiv: https://arxiv.org/abs/1904.04445

#GCPR2019 #Segmentation #CV
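The general pseudo-labeling idea: a model trained on labeled images predicts on unlabeled ones, and its confident predictions are reused as training targets. Below is a minimal sketch of the confidence-filtering step for binary (salt / no salt) masks. The 0.9 threshold, the `IGNORE` value of -1 and the function name are assumptions, and the paper's exact procedure may differ.

```python
IGNORE = -1  # hypothetical marker for pixels excluded from the loss


def pseudo_label_mask(prob_mask, threshold=0.9):
    """Turn per-pixel salt probabilities into hard pseudo-labels.

    Pixels with probability >= threshold become 1 (salt), pixels with
    probability <= 1 - threshold become 0 (background), and uncertain pixels
    get IGNORE so they do not contribute to the next round of training.
    """
    labels = []
    for row in prob_mask:
        out_row = []
        for p in row:
            if p >= threshold:
                out_row.append(1)
            elif p <= 1 - threshold:
                out_row.append(0)
            else:
                out_row.append(IGNORE)
        labels.append(out_row)
    return labels


probs = [[0.97, 0.55], [0.05, 0.92]]
print(pseudo_label_mask(probs))  # → [[1, -1], [0, 1]]
```

Raising the threshold keeps fewer but cleaner pseudo-labels; lowering it adds more training signal at the cost of more label noise.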
Unified rational protein engineering with sequence-only deep representation learning

UniRep learns a representation of proteins from amino-acid sequences alone, which can then be used to predict properties such as protein stability. In industry, that’s vital for determining the production yields, reaction rates, and shelf life of protein-based products.

Link: https://www.biorxiv.org/content/10.1101/589333v1.full

#biolearning #rnn #Harvard #sequence #protein
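UniRep maps a variable-length amino-acid sequence to a fixed-length vector that downstream models can use to predict properties such as stability. The real model learns this vector with a recurrent network trained on millions of sequences. As a deliberately simple stand-in (not UniRep), the sketch below computes amino-acid composition, the fraction of each of the 20 standard residues, which has the same input/output shape: any sequence in, fixed-length vector out.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition_vector(sequence):
    """Fixed-length (20-dim) vector of residue frequencies for a protein sequence."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:  # skip non-standard symbols such as X
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


vec = composition_vector("MKTAYIAK")
print(len(vec), round(sum(vec), 6))  # → 20 1.0
```

A learned representation like UniRep's captures order and context that a composition vector discards, which is why it can support much richer property predictions.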
Exploring Weight Agnostic Neural Networks

An exploration of agents that can already perform well in their environment without needing to learn weight parameters.

Link: https://ai.googleblog.com
Code: https://github.com/google/brain-tokyo-workshop/tree/master/WANNRelease