Data Science by ODS.ai 🦜
First Telegram Data Science channel. Covering all technical and popular stuff about anything related to Data Science: AI, Big Data, Machine Learning, Statistics, general Math and the applications of the former. To reach editors contact: @haarrp
Should we create an official chat for the channel to discuss links, answer common questions and flood (during nighttime)?
We count every opinion and listen to your feedback, so please vote.

We are also preparing a special event for the chat creation, so stay tuned for the announcement.
Forwarded from Karim Iskakov - канал (Vladimir Ivashkin)
I'd like to present our new paper with Yandex.Weather! We are pioneers in using a combination of satellite images, radar shots and neural networks for real-time rain forecast. Check out our video for more details!
▶️ youtu.be/9zd3VR-prYU
🔎 yandex.com/weather/nowcast
📝 arxiv.org/abs/1905.09932
📉 @loss_function_porn
Check out our friends' recent publication.
ODS breakfast in Paris! See you this Saturday at 10:30 at Malongo Café, 50 Rue Saint-André des Arts.
TabNine presented a deep learning code autocomplete tool based on the GPT-2 architecture.

The video demonstrates the concept. Hopefully, it will allow us to write code with fewer bugs, not more.

Link: https://tabnine.com/blog/deep
Something relatively similar by Microsoft: https://visualstudio.microsoft.com/ru/services/intellicode

#GPT2 #TabNine #autocomplete #product #NLP #NLU #codegeneration
A great collection of practical rules for routine DS engineering and research work.

Machine Learning in a company is 10% Data Science & 90% other challenges; this PDF provides a great many principles and solutions for dealing with them.

We can only recommend saving this post to your Saved Messages by forwarding it to yourself.

Link: http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf

#cheatsheet #advice #practical #common #shouldbesaved
YouTokenToMe, a new tool for text tokenisation from the VK team

Meet a new, enhanced tokenisation tool on steroids. It works 7-10 times faster on alphabetic languages and 40 to 50 times faster on logographic languages than the alternatives.

Under the hood (see the source) there is a C++ implementation with Python bindings, using the Byte Pair Encoding (BPE) algorithm. In terms of speed, YouTokenToMe beats #SentencePiece by Google and #fastBPE, created by a researcher from Facebook AI Research.

Github: https://github.com/vkcom/YouTokenToMe
Medium: https://medium.com/@vktech/youtokentome-a-tool-for-quick-text-tokenization-from-the-vk-team-aa6341215c5a
Byte Pair Encoding: https://arxiv.org/abs/1508.07909
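For readers new to BPE, here is a minimal pure-Python sketch of how merge rules can be learned, following the general idea described in the linked paper. It is not YouTokenToMe's C++ implementation and says nothing about its speed. The function names, the alphabetical tie-breaking rule and the toy word counts are assumptions made for illustration.

```python
from collections import Counter


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def apply_merge(pair, vocab):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged_vocab = {}
    for symbols, freq in vocab.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged_vocab[key] = merged_vocab.get(key, 0) + freq
    return merged_vocab


def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge rules from a {word: count} mapping."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        # Most frequent pair wins; ties are broken alphabetically (an assumption
        # made here so that the result is deterministic).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        vocab = apply_merge(best, vocab)
    return merges, vocab


# Toy word counts, made up for illustration.
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(toy_counts, num_merges=5)
print(merges)
```

With these toy counts the frequent suffix "est" is merged into a single symbol first, and then "low". That is the point of BPE: frequent character sequences become single tokens while rare words stay split into smaller pieces.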
Simultaneous food and facial recognition at a Foxconn factory canteen, Shenzhen, China

#video #foodlearning #facerecogniction #dl #cv #foxconn
Google AI research on learning better simulation methods for partial differential equations

New research shows how machine learning can improve high-performance computing for solving partial differential equations, with potential applications that range from modeling #climatechange to simulating fusion reactions. Learn all about it at the link below.

Link: https://ai.googleblog.com/2019/07/learning-better-simulation-methods-for.html

#PDE #DE #GoogleAI
On the concept of 'intellectual debt'

There is technical debt: when you know you should rewrite some stuff, or implement some features, but they don't seem critical at the moment. The article introduces the concept of 'intellectual debt', which arises with the broader and more common use of #MachineLearning and #DeepLearning (especially the latter). What happens when AI gives us seemingly correct answers that we wouldn't have thought of ourselves, without any theory to explain them?

Link: https://www.newyorker.com/tech/annals-of-technology/the-hidden-costs-of-automated-thinking

#Meta #common #lyrics
New dataset with adversarial examples

Natural Adversarial Examples are real-world, unmodified examples which cause classifiers to be consistently confused. The new dataset has 7,500 images, which the authors personally labeled over several months.

arXiv: https://arxiv.org/abs/1907.07174
Dataset and code: https://github.com/hendrycks/natural-adv-examples

#Dataset #Adversarial
Release of 27 pretrained models for NLP / NLU for PyTorch

Hugging Face has open-sourced a new library that contains up to 27 pretrained models for state-of-the-art NLP/NLU tasks.

Link: https://medium.com/dair-ai/pytorch-transformers-for-state-of-the-art-nlp-3348911ffa5b

#SOTA #NLP #NLU #PyTorch #opensource
Filter autoselect in VSCO by Google

#VSCO used #TensorFlow Lite to develop the 'For This Photo' feature, which uses on-device ML to suggest photo filter presets from a curated list.

YouTube: https://www.youtube.com/watch?v=fHbjfeitIvE
Link: https://medium.com/tensorflow/suggesting-presets-for-images-building-for-this-photo-at-vsco-9b94041c4ba4

#mobile #device #cv #dl
Baidu's recent paper: Hubless Nearest Neighbor Search

Hubless Nearest Neighbor Search (HNN), a new method for Bilingual Lexicon Induction, improves retrieval accuracy significantly. Empirical results show that HNN outperforms NN, ISF and other state-of-the-art methods.

Github: https://github.com/baidu-research/HNN
Paper: https://github.com/baidu-research/HNN/blob/master/doc/HNN.pdf

#ACL2019 #NLP #NLU
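To make the "hub" in the title of the post above concrete, here is a small pure-Python sketch of the plain nearest-neighbor (NN) baseline and of how hubs show up in it. This is not the HNN method from the paper, only the baseline it is compared against. The 2-D vectors, the word labels and the function names are all made up for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nn_retrieve(sources, targets):
    """Map each source word to the target word with the highest cosine similarity."""
    return {
        s_word: max(targets, key=lambda t_word: cosine(s_vec, targets[t_word]))
        for s_word, s_vec in sources.items()
    }


# Hypothetical embeddings: source-language words and target-language words
# that are assumed to live in the same (already aligned) vector space.
sources = {
    "src_a": (0.9, 0.5),
    "src_b": (0.5, 0.9),
    "src_c": (0.6, 0.8),
    "src_d": (-0.9, 0.1),
}
targets = {
    "tgt_a": (1.0, 0.0),
    "tgt_b": (0.0, 1.0),
    "tgt_c": (0.7, 0.7),  # sits "in the middle", so it attracts many queries
    "tgt_d": (-1.0, 0.0),
}

retrieved = nn_retrieve(sources, targets)
# How often each target word was returned as somebody's nearest neighbor.
hub_counts = Counter(retrieved.values())
print(retrieved)
print(hub_counts)
```

In this toy setup three of the four source words retrieve the same target, while two targets are never retrieved at all. Targets that are returned for many queries are the hubs, and they are why plain NN retrieval can hurt lexicon induction accuracy.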
Plato Research Dialogue System: A Flexible Conversational AI Platform

The Plato Research Dialogue System is a platform #Uber developed to enable experts and non-experts alike to quickly build, train, and deploy conversational AI agents.

Link: https://eng.uber.com/plato-research-dialogue-system/

#ConversationalAI #converstaion #NLP #NLU