Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former. To reach editors contact: @haarrp
Great example of how different approaches to feature encoding can influence the results.

Mean (likelihood) encoding for categorical variables with high cardinality and feature interactions: a comprehensive study with Python
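
For readers new to the idea, here is a minimal sketch of mean (likelihood) encoding: each category is replaced by the average target value seen for that category. The function name, the `smoothing` parameter and the toy data are assumptions made for illustration, and the smoothing formula is one common choice, not necessarily the one used in the linked study.

```python
from collections import defaultdict


def mean_encode(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    `smoothing` is a hypothetical prior weight: categories with few rows
    are pulled toward the global mean, which matters for high cardinality.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    return {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }


# Toy data, invented for this example.
cities = ["a", "a", "a", "b", "b", "c"]
clicked = [1, 1, 0, 0, 0, 1]

encoding = mean_encode(cities, clicked, smoothing=2.0)
encoded_column = [encoding[c] for c in cities]
```

In practice the encoding should be fitted on training folds only, otherwise the target leaks into the feature.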

Link: https://www.kaggle.com/vprokopev/mean-likelihood-encodings-a-comprehensive-study

#FeatureEngineering #FeatureEncoding #Kaggle
Neural Network Embeddings Explained

How deep learning can represent War and Peace as a vector

Easy-to-read #novice article about #embeddings. Basically: how to represent everything as a vector.
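
A tiny sketch of what "represent everything as a vector" means: an embedding is a lookup table from an item to a dense vector, and similarity between items becomes similarity between vectors. The vocabulary, the dimension and the random values are assumptions for illustration; in a real model the table is learned during training.

```python
import math
import random

random.seed(0)

vocab = ["war", "peace", "book"]  # toy vocabulary, made up for this example
dim = 4                           # embedding size, chosen arbitrarily

# Randomly initialized here; a neural network would learn these values.
embedding_table = {
    word: [random.uniform(-1, 1) for _ in range(dim)] for word in vocab
}


def embed(word):
    """Look up the dense vector for a word."""
    return embedding_table[word]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


similarity = cosine(embed("war"), embed("peace"))
```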

Link: https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526
Test-Driven Data Analysis

TDD is an approach to software development in which tests are an essential part of the process. Over the years, TDD has shown itself to be necessary for maintaining a good code base, and it is the most common requirement for a long-lived project.

A test-driven approach can be applied to data analysis too, through reproducible research or TDDA, which is what the link below suggests.
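
To make the idea concrete, here is a minimal sketch of a data test: constraints are declared once and every new batch of data is checked against them. It is plain Python and does not use the tdda library's own API; the `check_constraints` helper, the rule format and the sample rows are all hypothetical.

```python
def check_constraints(rows, constraints):
    """Return a list of violations; an empty list means the data passed."""
    violations = []
    for i, row in enumerate(rows):
        for field, rule in constraints.items():
            value = row.get(field)
            if value is None:
                if not rule.get("nullable", False):
                    violations.append(f"row {i}: {field} is missing")
                continue
            if not isinstance(value, rule["type"]):
                violations.append(f"row {i}: {field} has wrong type")
                continue
            if "min" in rule and value < rule["min"]:
                violations.append(f"row {i}: {field} below {rule['min']}")
            if "max" in rule and value > rule["max"]:
                violations.append(f"row {i}: {field} above {rule['max']}")
    return violations


# Hypothetical constraints and data for illustration.
constraints = {
    "age": {"type": int, "min": 0, "max": 120},
    "name": {"type": str},
}
good = [{"age": 30, "name": "Ann"}]
bad = [{"age": -5, "name": "Bob"}, {"age": 40}]

good_report = check_constraints(good, constraints)
bad_report = check_constraints(bad, constraints)
```

Running such a check at the start of every analysis plays the same role a unit test plays for code.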

Link: http://www.tdda.info

#tdda
The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care

Interesting work looking at how AI could suggest optimal treatment for sepsis. Sepsis is a life-threatening complication of infection, and many deaths could be prevented with earlier identification and more targeted therapies.

Link: https://www.nature.com/articles/s41591-018-0213-5

#medical #health
Dynamic Meta-Embeddings for Improved Sentence Representations

While one of the first steps in many NLP systems is selecting what pre-trained word embeddings to use, we argue that such a step is better left for neural networks to figure out by themselves. To that end, we introduce dynamic meta-embeddings, a simple yet effective method for the supervised learning of embedding ensembles, which leads to state-of-the-art performance within the same model class on a variety of tasks. We subsequently show how the technique can be used to shed new light on the usage of word embeddings in NLP systems.
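
One plausible reading of the abstract, as a rough sketch: vectors from several pre-trained embedding sources are projected to a common size, each projection gets a scalar attention score, and the softmax-weighted sum becomes the word representation. This is not the authors' code. All names, sizes and the random parameters are assumptions; in the paper's setting the parameters would be learned together with the downstream task.

```python
import math
import random

random.seed(0)


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def meta_embed(source_vectors, projections, attention_vector, attention_bias=0.0):
    """Combine one word's vectors from several sources into one vector."""
    projected = [matvec(p, v) for p, v in zip(projections, source_vectors)]
    scores = [
        sum(a * x for a, x in zip(attention_vector, vec)) + attention_bias
        for vec in projected
    ]
    weights = softmax(scores)
    dim = len(attention_vector)
    combined = [
        sum(w * vec[k] for w, vec in zip(weights, projected)) for k in range(dim)
    ]
    return combined, weights


# Two hypothetical embedding sources with different sizes.
vec_a = [0.2, -0.1, 0.4]
vec_b = [0.5, 0.3, -0.2, 0.1, 0.0]
common_dim = 4
projections = [rand_matrix(common_dim, 3), rand_matrix(common_dim, 5)]
attention_vector = [random.uniform(-0.5, 0.5) for _ in range(common_dim)]

combined, weights = meta_embed([vec_a, vec_b], projections, attention_vector)
```

The attention weights show which embedding source the model leans on for a given word.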

Paper: https://research.fb.com/wp-content/uploads/2018/10/Dynamic-Meta-Embeddings-for-Improved-Sentence-Representations.pdf
Link: https://research.fb.com/publications/dynamic-meta-embeddings-for-improved-sentence-representations/

P.S. Note the date of the publication

#embeddings #NLP #facebook
Really scary
An agent which learned to play Mario without rewards. Instead, it was incentivized to avoid "boredom" (that is, getting into states where it can predict what will happen next). Discovered warp levels, how to defeat bosses, etc.
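
A toy sketch of a prediction-error bonus of the kind described above: the agent gets a high intrinsic reward in states it cannot yet predict, and the reward fades as the predictor learns them. Linear functions stand in for neural networks here, and every name and value is an assumption made for illustration, not the OpenAI implementation.

```python
import random

random.seed(0)

DIM = 3

# Fixed, randomly initialized target function: it is never trained.
target_w = [random.uniform(-1, 1) for _ in range(DIM)]
# Predictor: trained to imitate the target on states the agent visits.
predictor_w = [0.0] * DIM


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def intrinsic_reward(state):
    """Squared prediction error: large for unfamiliar states."""
    error = dot(predictor_w, state) - dot(target_w, state)
    return error * error


def update_predictor(state, lr=0.1):
    """One gradient step on the squared error for a visited state."""
    error = dot(predictor_w, state) - dot(target_w, state)
    for k in range(DIM):
        predictor_w[k] -= lr * 2 * error * state[k]


familiar = [1.0, 0.0, 0.0]
novel = [0.0, 1.0, 0.0]

reward_before = intrinsic_reward(familiar)
for _ in range(50):  # the agent keeps visiting the same state
    update_predictor(familiar)
reward_after = intrinsic_reward(familiar)  # the state is now "boring"
reward_novel = intrinsic_reward(novel)     # an unvisited state still pays
```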

Link: https://blog.openai.com/reinforcement-learning-with-prediction-based-rewards/

#RL #openai
Facebook open sourced Horizon, an end-to-end applied reinforcement learning platform built on #PyTorch 1.0. Horizon uses RL to optimize systems in large-scale production environments and we're excited to make it accessible to anyone using #RL at scale.

https://code.fb.com/ml-applications/horizon/

#facebook
XNLI dataset published by Facebook AI & NYU.

A new dataset has been released recently to promote cross-lingual approaches to natural language understanding (#NLU).

This dataset builds on the commonly used Multi-Genre Natural Language Inference (MultiNLI) corpus, adding 14 languages to that English-only data set, including two low-resource languages: Swahili and Urdu.

Link: https://code.fb.com/ai-research/xlni/

#NLP #facebook
Reversible RNNs

Paper about how to reduce memory costs of GRU and LSTM networks by 10-15x without loss in performance. Also 5-10x for attention-based architectures. New paper with Matt MacKay, Paul Vicol, and Jimmy Ba, to appear at NIPS.
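
Why reversibility saves memory, as a rough sketch: if a layer's inputs can be recomputed exactly from its outputs, the activations do not have to be stored for backpropagation. The example below uses a generic additive coupling, which is a swapped-in technique and not the paper's reversible GRU/LSTM update; the functions `f` and `g` are arbitrary stand-ins.

```python
import math


def f(v):
    # Arbitrary stand-in for a learned transformation.
    return [math.tanh(0.5 * x) for x in v]


def g(v):
    # A second arbitrary stand-in transformation.
    return [math.tanh(-0.3 * x) for x in v]


def forward(x1, x2):
    """Additive coupling: each half is updated using only the other half."""
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2


def inverse(y1, y2):
    """Recover the inputs from the outputs by undoing the steps in reverse."""
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2


x1, x2 = [0.1, -0.4], [0.7, 0.2]
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)  # matches x1, x2 up to floating-point rounding
```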

Link: https://arxiv.org/abs/1810.10999

#dl #RNN #NIPS2018
Faster R-CNN and Mask R-CNN in #PyTorch 1.0

Another release from #Facebook.

Mask R-CNN Benchmark: a fast and modular implementation for Faster R-CNN and Mask R-CNN written entirely in @PyTorch 1.0. It brings up to 30% speedup compared to mmdetection during training.

Webcam demo and ipynb file are available.

Github: https://github.com/facebookresearch/maskrcnn-benchmark

#CNN #CV #segmentation #detection
Mask R-CNN Benchmark Demo