Data Science and Engineering
This is the first Telegram platform for data scientists, machine learning specialists, developers, software engineers and IT managers to share knowledge, connect, collaborate & learn.
Automating the end-to-end lifecycle of Machine Learning applications
#CD4ML #software_engineering #ML

Discoverable and Accessible Data
Reproducible Model Training
Model Serving (Embedded model, Model as service)
Testing and Quality in Machine Learning
Experiments Tracking
Model Deployment (Multiple models, Shadow models)
Model Monitoring and Observability

https://martinfowler.com/articles/cd4ml.html
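To make the "Shadow models" item a bit more concrete, here is a minimal sketch, assuming a model is just a Python function from a feature vector to a prediction. The class name ShadowDeployment and its methods are hypothetical and are not taken from the article. The primary model answers every request. The shadow model sees the same input, but its output is only recorded for comparison and never returned.

```python
import logging
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: a function from a feature vector to a prediction.
Model = Callable[[Sequence[float]], float]

logger = logging.getLogger("shadow_deployment")


class ShadowDeployment:
    """Serve the primary model; run a shadow model on the same input for comparison only."""

    def __init__(self, primary: Model, shadow: Model) -> None:
        self.primary = primary
        self.shadow = shadow
        # (features, primary prediction, shadow prediction) for every disagreement
        self.disagreements: List[Tuple[Sequence[float], float, float]] = []

    def predict(self, features: Sequence[float]) -> float:
        result = self.primary(features)  # only this value is returned to the caller
        try:
            shadow_result = self.shadow(features)
        except Exception:
            # A failing shadow model must never break the live request.
            logger.exception("shadow model failed")
            return result
        if shadow_result != result:
            self.disagreements.append((features, result, shadow_result))
            logger.info("disagreement: primary=%s shadow=%s", result, shadow_result)
        return result


deployment = ShadowDeployment(
    primary=lambda x: float(sum(x) > 1.0),
    shadow=lambda x: float(sum(x) > 0.5),
)
print(deployment.predict([0.3, 0.4]))  # 0.0 (the primary's answer; the shadow said 1.0)
print(len(deployment.disagreements))   # 1
```

In a real setup the disagreement log would feed the monitoring and observability step, so the shadow model can be promoted once it has proven itself on live traffic.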
A great collection of practical rules for everyday DS engineering and research work.

Machine Learning in a company is 10% Data Science & 90% other challenges; this PDF offers a wealth of principles and solutions for dealing with them.

We can only recommend saving this post to your Saved Messages by forwarding it to yourself.

Link: http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

#cheatsheet #advice #practical #common #shouldbesaved
Estimating the success of re-identifications in incomplete datasets using generative models

99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes, suggesting that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR.

This is a serious privacy concern and a problem for Data Engineering, especially for those working with anonymized personal information. The paper provides a way to estimate how likely a person is to be correctly re-identified in an anonymized dataset, which can be useful for people who work for government or security companies.

https://www.reddit.com/r/science/comments/chko43/9998_of_americans_would_be_correctly_reidentified/
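For intuition on why a handful of attributes is enough, here is a minimal sketch that measures sample uniqueness: the share of records whose combination of quasi-identifiers appears exactly once. This is NOT the paper's method (the paper fits a generative model to estimate re-identification success in the whole population, not just within a sample), and the records and attribute names below are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def uniqueness(records: List[Dict[str, object]], attributes: Sequence[str]) -> float:
    """Fraction of records whose values on `attributes` occur exactly once in the sample."""
    if not records:
        return 0.0
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


# Hypothetical "anonymized" records: no names, only demographics.
records = [
    {"zip": "02139", "birth_year": 1980, "gender": "F"},
    {"zip": "02139", "birth_year": 1980, "gender": "M"},
    {"zip": "02139", "birth_year": 1975, "gender": "F"},
    {"zip": "02142", "birth_year": 1980, "gender": "F"},
]
print(uniqueness(records, ["zip"]))                          # 0.25
print(uniqueness(records, ["zip", "birth_year"]))            # 0.5
print(uniqueness(records, ["zip", "birth_year", "gender"]))  # 1.0
```

Each added attribute splits the groups further, and with 15 attributes almost every real person ends up in a group of one.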

#privacy #gdpr #federatedlearning #ml
New paper on training with pseudo-labels for semantic segmentation

Semi-Supervised Segmentation of Salt Bodies in Seismic Images:
SOTA (1st place) in the TGS Salt Identification Challenge.

Github: https://github.com/ybabakhin/kaggle_salt_bes_phalanx
arXiv: https://arxiv.org/abs/1904.04445
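The general pseudo-labeling idea can be sketched as: run a trained model on unlabeled images, turn its probabilities into masks, and keep only the images the model is confident about as extra training data. This is a generic sketch, not the authors' pipeline (see the GitHub repo for that); the confidence measure and both thresholds are assumptions chosen for illustration.

```python
import numpy as np


def make_pseudo_labels(probs: np.ndarray, mask_threshold: float = 0.5,
                       min_confidence: float = 0.9):
    """probs: (N, H, W) foreground probabilities predicted on unlabeled images.

    Returns the binary masks of the confident images and the boolean selection.
    """
    masks = (probs > mask_threshold).astype(np.uint8)
    # Per-pixel confidence is max(p, 1 - p); image confidence is its mean over pixels.
    image_conf = np.maximum(probs, 1.0 - probs).mean(axis=(1, 2))
    keep = image_conf >= min_confidence
    return masks[keep], keep


# Three hypothetical 4x4 predictions: confident salt, uncertain, confident background.
probs = np.stack([np.full((4, 4), 0.97), np.full((4, 4), 0.55), np.full((4, 4), 0.02)])
pseudo, keep = make_pseudo_labels(probs)
print(keep.tolist())  # [True, False, True]
print(pseudo.shape)   # (2, 4, 4)
```

The kept images and their masks would then be mixed with the labeled set for another round of training.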

#GCPR2019 #Segmentation #CV
Unified rational protein engineering with sequence-only deep representation learning

UniRep learns representations of amino-acid sequences that can be used to predict protein properties such as stability. In industry, that’s vital for determining production yields, reaction rates, and shelf life of protein-based products.

Link: https://www.biorxiv.org/content/10.1101/589333v1.full

#biolearning #rnn #Harvard #sequence #protein
Exploring Weight Agnostic Neural Networks

An exploration of agents that already perform well in their environment without needing to learn weight parameters.

Link: https://ai.googleblog.com
Code: https://github.com/google/brain-tokyo-workshop/tree/master/WANNRelease
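The core trick can be sketched as: give every connection in a network the same single weight, then score the topology by how well it does across several values of that shared weight. This sketch assumes a tiny hand-made topology (the real method evolves topologies and activation functions), uses negative mean squared error as a stand-in reward, and the six shared values are the ones we believe the paper uses; treat all of these as assumptions.

```python
import numpy as np

# Assumed series of shared weight values; one topology is evaluated at each of them.
SHARED_WEIGHTS = (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)

# Hypothetical fixed topology: 2 inputs -> 2 hidden -> 1 output.
# 1 marks an existing connection, 0 a missing one; there are no per-connection weights.
HIDDEN_MASK = np.array([[1.0, 0.0],
                        [1.0, 1.0]])  # shape (inputs, hidden)
OUTPUT_MASK = np.array([[1.0],
                        [1.0]])       # shape (hidden, outputs)


def forward(x: np.ndarray, w: float) -> np.ndarray:
    """Every existing connection carries the same shared weight w."""
    h = np.tanh(x @ (w * HIDDEN_MASK))
    return np.tanh(h @ (w * OUTPUT_MASK))


def evaluate(x: np.ndarray, y: np.ndarray):
    """Return (mean reward, best reward) over all shared weights; reward = -MSE."""
    rewards = [-float(np.mean((forward(x, w) - y) ** 2)) for w in SHARED_WEIGHTS]
    return float(np.mean(rewards)), max(rewards)


x = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.5], [0.5], [0.9]])
print(evaluate(x, y))
```

Scoring by the mean over all shared values (and not only the best one) is what pushes the search toward topologies whose behavior does not depend on any particular weight.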
Data science jobs are going to be in increasingly massive demand. Prepare your resume!

Internet of Things companies will dominate the 2020s.

https://www.androidauthority.com/internet-of-things-companies-1023404/

“Each year, Internet of Things companies tell us ‘this is the year of IoT,’ and each year, we get an expensive fridge that no one buys. But IoT is coming. In fact, it is already here! There are currently 6.7 billion ‘data collecting devices’ in use today, with 20 billion projected for 2020 according to Amazon. Gartner predicts there will be 25 billion connected devices by 2021, and Accenture suggests that the global IoT market will be worth $14.2 trillion in 2030 (as reported by CRN).”

“Data science jobs are going to be in increasingly massive demand for these companies. Collecting all that data from users is only useful if businesses know what to do with it, and how to infer actionable advice from it. How do you turn billions of purchases across millions of users in multiple different countries into a better marketing campaign? That’s where data science comes in.”

"One of the biggest areas of concern for Internet of Things companies is data security."
Our conceptual understanding of how to represent words and sentences in a way that best captures their underlying meanings and relationships is rapidly evolving. Among the latest milestones are BERT and ELMo.

This article covers the concepts you need to be aware of to properly get your head around BERT.

http://jalammar.github.io/illustrated-bert/
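One of those concepts is BERT's masked language model pre-training: about 15% of token positions are chosen as prediction targets; of those, 80% are replaced with [MASK], 10% with a random token and 10% are left unchanged. Below is a minimal sketch of that rule on whitespace-split words. It is an assumption-laden simplification: the real implementation works on WordPiece tokens and selects a fixed number of positions per sequence, while this sketch flips an independent coin per token.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens: Sequence[str], vocab: Sequence[str], mask_prob: float = 0.15,
                seed: Optional[int] = None) -> Tuple[List[str], List[Optional[str]]]:
    """Return (model inputs, labels); labels are None where nothing must be predicted."""
    rng = random.Random(seed)
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                        # not selected: no prediction target here
        labels[i] = token                   # selected: model must recover the original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_TOKEN          # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


tokens = "the cat sat on the mat".split()
inputs, labels = mask_tokens(tokens, vocab=["dog", "ran", "a"], seed=1)
print(inputs)
print(labels)
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that only [MASK] positions matter, which helps because [MASK] never appears during fine-tuning.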

#BERT #ELMo #NLP
Transformers and self-attention explained from scratch, with examples in Python.

#BERT #GPT_2 #Transformer #python

http://www.peterbloem.nl/blog/transformers
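As a compact companion to the post, here is a NumPy sketch of scaled dot-product self-attention with query, key and value projections. The projection matrices are random stand-ins for learned parameters, and the sizes are arbitrary; this is a single head without masking, not the full Transformer block.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray):
    """x: (t, d) sequence of t vectors. Returns outputs (t, d_v) and weights (t, t)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (t, t): how much token i attends to token j
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights              # each output is a weighted average of values


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                  # 5 tokens, embedding size 8
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)              # (5, 4) (5, 5)
```

Dividing by the square root of the key dimension keeps the dot products from growing with the embedding size, which would otherwise push the softmax toward near one-hot weights.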