Spark in me – Telegram

Spark in me

2.18K subscribers

973 photos

56 videos

116 files

2.74K links

Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.

Measuring feature importance properly

http://explained.ai/rf-importance/index.html

Once again stumbled upon an amazing article about measuring feature importance for any ML algorithm:
(0) Permutation importance - if re-training your model is costly, just shuffle one column of the validation set and measure how much the metric drops (no re-training needed)
(1) Drop column importance - drop a column, re-train the model, compare performance metrics
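
Both ideas can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code from the linked article: the 1-nearest-neighbour "model", the synthetic data and all function names are assumptions made up for the example, and MSE stands in for whatever metric you actually care about.

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE after shuffling each column; the model is not re-trained."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def drop_column_importance(fit, X_train, y_train, X_val, y_val):
    """Increase in validation MSE after dropping each column and re-training."""
    def val_error(cols):
        keep = lambda X: [[row[c] for c in cols] for row in X]
        predict = fit(keep(X_train), y_train)
        return mse(y_val, [predict(row) for row in keep(X_val)])
    all_cols = list(range(len(X_train[0])))
    baseline = val_error(all_cols)
    return [val_error([c for c in all_cols if c != col]) - baseline for col in all_cols]

def fit_1nn(X_train, y_train):
    """Toy 1-nearest-neighbour regressor, a hypothetical stand-in for a real model."""
    def predict(row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, r)) for r in X_train]
        return y_train[dists.index(min(dists))]
    return predict

# Synthetic data (assumption): the target depends on feature 0 only,
# feature 1 is a distractor.
rows = [[float(i), float((2 * i) % 5)] for i in range(20)]
X_train, X_val = rows[0::2], rows[1::2]
y_train = [2.0 * r[0] for r in X_train]
y_val = [2.0 * r[0] for r in X_val]

model = fit_1nn(X_train, y_train)
perm_imp = permutation_importance(model, X_val, y_val)
drop_imp = drop_column_importance(fit_1nn, X_train, y_train, X_val, y_val)
```

On this toy data dropping feature 0 hurts the validation error a lot, while dropping the distractor does not hurt at all. Note the cost difference: permutation importance only calls `predict`, drop column importance calls `fit` once per column.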

Why it is useful / caveats
(0) If you really care about understanding your domain - feature importances are a must-have
(1) All of this works only for powerful models, i.e. models that actually fit the data well
(2) Landmines include - correlated or duplicate variables, data normalization

Correlated variables
(0) For RF - correlated variables share permutation importance roughly proportionally to their correlation
(1) Drop column importance can behave unpredictably

I personally like engineering different kinds of features and doing ablation tests:
(0) Among feature sets, sharing similar purpose
(1) Within feature sets

#data_science

1.1K views, Alexander, 11:48

2018 DS/ML digest 14

Amazing article - why you do not need ML
- https://cyberomin.github.io/startup/2018/07/01/sql-ml-ai.html
- I personally love plain-vanilla SQL and in 90% of cases people under-use it
- I even wrote 90% of my JSON API on our blog in pure PostgreSQL xD
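
To make the "plain-vanilla SQL" point concrete, here is a tiny sketch of the kind of question that often gets pitched as an ML problem ("who are our best customers?") answered with a single GROUP BY. The table, column names and data are made up for illustration, and it uses SQLite from the Python standard library rather than PostgreSQL.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 120.0), ("bob", 15.0), ("alice", 80.0), ("carol", 40.0), ("bob", 5.0)],
)

# Rank customers by total spend; no model involved.
top_customers = conn.execute(
    """
    SELECT customer, SUM(amount) AS total, COUNT(*) AS n_orders
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()
conn.close()
```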

Practice / papers
(0) Interesting papers from CVPR https://towardsdatascience.com/the-10-coolest-papers-from-cvpr-2018-11cb48585a49
(1) Some down-to-earth obstacles to ML deployment https://habr.com/company/hh/blog/415437/
(2) Using synthetic data for CNNs (by Nvidia) - https://arxiv.org/pdf/1804.06516.pdf
(3) This puzzles me - so much effort and engineering spent on something ... strange and useless - http://taskonomy.stanford.edu/index.html
On paper they do a cool thing - investigate transfer learning between different domains, but in practice it is done on TF and there is no clear conclusion of any kind
(4) VAE + real datasets http://siavashk.github.io/2016/02/22/autoencoder-imagenet/ - only small Imagenet (64x64)
(5) Understanding the speed of models deployed on mobile - http://machinethink.net/blog/how-fast-is-my-model/
(6) A brief overview of multi-modal methods https://medium.com/mlreview/multi-modal-methods-image-captioning-from-translation-to-attention-895b6444256e

Visualizations / explanations
(0) Amazing website with ML explanations http://explained.ai/
(1) PCA and linear VAEs are close https://pvirie.wordpress.com/2016/03/29/linear-autoencoders-do-pca/

#deep_learning
#digest
#data_science

cyberomin.github.io

No, you don't need ML/AI. You need SQL

A while ago, I did a Twitter thread about the need to use traditional and existing tools to solve everyday business problems other than jumping on new buzzwords, sexy and often times complicated technologies.

1.1K views, Alexander, 04:51

A cool article from Ben Evans about how to think about ML

https://www.ben-evans.com/benedictevans/2018/06/22/ways-to-think-about-machine-learning-8nefy

Ways to think about machine learning — Benedict Evans

Everyone has heard of machine learning now, and every big company is working on projects around ‘AI’. We know this is a Next Big Thing. But we don’t yet have a settled sense of quite what machine learning means - what it will mean for tech companies or…

768 views, Alexander, 07:15

My recent PyTorch 0.4 Dockerfile for CV

https://gist.github.com/snakers4/72ccc3d936f04a3307d20f1810b2fa81

#deep_learning

948 views, Alexander, 07:16

Open Images Object detection on Kaggle

- https://www.kaggle.com/c/google-ai-open-images-object-detection-track#Description

- Key ideas
-- 1.2 images, high-res, 500 classes
-- decent prizes, but short time-span (2 months)
-- object detection

#deep_learning

Google AI Open Images - Object Detection Track

Detect objects in varied and complex images.

752 views, Alexander, 05:12

2018 DS/ML digest 15

What I filtered through this time

Market / news
(0) Letters by big company employees against using ML for weapons
- Microsoft
- Amazon
(1) Facebook open-sources DensePose (essentially this is Mask-RCNN)
- https://research.fb.com/facebook-open-sources-densepose/

Papers / posts / NLP
(0) One more blog post about text / sentence embeddings https://goo.gl/Zm8C2c
- key idea: a different weighting scheme

(1) One more sentence embedding calculation method
- https://openreview.net/pdf?id=SyK00v5xx ?

(2) Posts explaining NLP embeddings
- http://www.offconvex.org/2015/12/12/word-embeddings-1/ - some basics - SVD / Word2Vec / GloVe
-- SVD improves embedding quality (as compared to one-hot encoding)?
-- use log-weighting, use TF-IDF weighting (the above weighting)
- http://www.offconvex.org/2016/02/14/word-embeddings-2/ - word embedding properties
-- dimensions vs. embedding quality http://www.cs.princeton.edu/~arora/pubs/LSAgraph.jpg

(3) Spacy + Cython = 100x speed boost - https://goo.gl/9TwVqu - good to know about this as a last resort
- described use-cases:
-- you are pre-processing a large training set for a deep learning framework like PyTorch / TensorFlow
-- or you have heavy processing logic in your deep learning batch loader that slows down your training

(4) Once again stumbled upon this - https://blog.openai.com/language-unsupervised/

(5) Papers
- Simple NLP embedding baseline https://goo.gl/nGujzS
- NLP decathlon for question answering https://goo.gl/6HHi7q
- Debiasing embeddings https://arxiv.org/abs/1806.06301
- Once again, transfer learning in NLP by OpenAI - https://goo.gl/82VR4U

#deep_learning
#digest
#data_science

837 views, Alexander, edited 07:57

Forwarded from SK

http://nlp.town/blog/sentence-similarity/

824 views, Alexander, 08:11

https://www.youtube.com/watch?utm_campaign=Revue+newsletter&utm_medium=Newsletter&utm_source=NLP+News&v=3o4VzEyJ0WA

Machine Learning Research & Interpreting Neural Networks

Machine learning and neural networks change how computers and humans interact, but they can be complicated to understand. In this episode of Coffee with a Googler, Laurence Moroney (@lmoroney) sits down with Christoper Olah (@ch402) from the Google Brain…

870 views, Alexander, 14:16

Forwarded from Just links

https://twitter.com/Foone/status/1014267515696922624

You want to know something about how bullshit insane our brains are? OK, so there's a physical problem with our eyes: We move them in short fast bursts called "saccades", right? very quick, synchronized movements. The only problem is: they go all blurry and…

12 views, Alexander, 16:56

XGB - now on GPU properly?
https://twitter.com/i/web/status/1014192185510629378

Joshua Patterson

#XGBoost is faster than ever, with better scaling, on #GPU thanks to the hard work of @nvidia & @h2oai! Check out the latest paper https://t.co/P2m31idljB, and more is coming very soon! #lightgbm #catboost #GBDT

1.2K views, Alexander, 05:54

Forwarded from Админим с Буквой (bykva)

Bash shortcuts

I wrote a micro lab exercise for learning bash hotkeys.

https://medium.com/@bykvaadm/bash-shortcuts-d6f275a6ce9d

#bash_tips_and_tricks #junior

A small lab for learning the main bash hotkeys. Prepare a line like this for yourself:

17 views, Alexander, 15:33

https://youtu.be/qS4H6PEcCCA

Epicycles, complex Fourier series and Homer Simpson's orbit

Today’s video was motivated by an amazing animation of a picture of Homer…

786 views, Alexander, 08:27

Playing with VAEs and their practical use

So, I played a bit with Variational Auto Encoders (VAE) and wrote a small blog post on this topic

https://spark-in.me/post/playing-with-vae-umap-pca

Please like, share and repost!

#deep_learning
#data_science

Like this post or have something to say => tell us more in the comments or donate!

938 views, spark_comment_bot, 12:29

https://youtu.be/FwFduRA_L6Q

Convolutional Network Demo from 1989

This is a demo of "LeNet 1", the first convolutional network that could recognize handwritten digits with good speed and accuracy.

It was developed in early 1989 in the Adaptive System Research Department, headed by Larry Jackel, at Bell Labs in Holmdel…

712 views, Alexander, 04:52

A new multi-threaded addition to the pandas stack?

I read about this some time ago (when it was still in development, https://t.me/snakers4/1850) and found essentially 3 alternatives:
- just being clever about optimizing your operations + using what is essentially a multi-threaded map/reduce in pandas https://t.me/snakers4/1981
- pandas on ray
- dask (overkill)
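
For the first alternative, the general map/reduce pattern looks roughly like this. It is a standard-library sketch of the idea, not the code from the linked posts and not the pandas on ray API: the chunking helper, the function names and the "mean of a column" task are all made up for illustration, and a plain list stands in for a DataFrame column.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def chunked(values, n_chunks):
    """Split a list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]

def map_chunk(chunk):
    # "map" step: a partial aggregate (sum, count) for one chunk
    return sum(chunk), len(chunk)

def combine(a, b):
    # "reduce" step: merge two partial aggregates
    return a[0] + b[0], a[1] + b[1]

def parallel_mean(values, n_workers=4):
    """Mean of a non-empty list, computed chunk by chunk in a thread pool."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_chunk, chunked(values, n_workers)))
    total, count = reduce(combine, partials)
    return total / count

values = list(range(1, 101))
mean_value = parallel_mean(values)
```

Keep in mind that threads in CPython only give a real speed-up when the per-chunk work releases the GIL (many numpy / pandas operations do); for pure-Python work a process pool is the usual substitute.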

Links:
(0) https://rise.cs.berkeley.edu/blog/pandas-on-ray-early-lessons/
(1) https://www.reddit.com/comments/8wuz7e
(2) https://github.com/modin-project/modin

So... I ran a test in the notebook I had on hand. It works. More tests will be done in the future.
https://pics.spark-in.me/upload/2c7a2f8c8ce1dd7a86a54ec3a3dcf965.png

#data_science
#pandas

Spark in me - Internet, data science, math, deep learning, philosophy

Pandas on Ray - RISE Lab
https://rise.cs.berkeley.edu/blog/pandas-on-ray/

842 views, Alexander, edited 06:06

Disclaimer - it does not support pivot tables or complicated group_by ...

877 views, Alexander, 06:16

Yet another proxy - shadowsocks

If someone needs another proxy guide, someone with an Arabic username shared some alternative advice for proxy configuration
- http://disq.us/p/1tsy4nk (wait a bit till link resolves)

#internet
#linux

Playing with a simple SOCKS5 proxy server on Digital Ocean and Ubuntu 16

This article tells you how to start your SOCKS5 proxy with zero to little experience
Author's articles - http://spark-in.me/author/snakers41
Blog - http://spark-in.me

838 views, Alexander, edited 17:47

2018 DS/ML digest 16

Papers / posts
(0) RL now solves Quake
https://venturebeat.com/2018/07/03/googles-deepmind-taught-ai-teamwork-by-playing-quake-iii-arena/
(1) A fast.ai post about AdamW
http://www.fast.ai/2018/07/02/adam-weight-decay/
-- Adam generally requires more regularization than SGD, so be sure to adjust your regularization hyper-parameters when switching from SGD to Adam
-- Amsgrad turns out to be very disappointing
-- Refresher article http://ruder.io/optimizing-gradient-descent/index.html#nadam
(2) How to tackle new classes in CV
https://petewarden.com/2018/07/06/what-image-classifiers-can-do-about-unknown-objects/
(3) A new word in GANs?
-- https://ajolicoeur.wordpress.com/RelativisticGAN/
-- https://arxiv.org/pdf/1807.00734.pdf
(4) Using deep learning representations for search
-- https://goo.gl/R1vhTh
-- a library for fast approximate nearest-neighbor search in Python https://github.com/spotify/annoy
(5) One more paper on GAN convergence
https://avg.is.tuebingen.mpg.de/publications/meschedericml2018
(6) Switchable normalization - adds a bit to ResNet50 + pre-trained models
https://github.com/switchablenorms/Switchable-Normalization
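
On item (1): the difference between "Adam + L2 regularization" and AdamW-style decoupled weight decay is easy to see on a single scalar weight. This is a minimal sketch with illustrative hyper-parameters, written from the standard Adam update rule; it is not the fast.ai or PyTorch implementation, and the function name and arguments are made up for the example.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One Adam update for a scalar weight.

    decoupled=False: classic L2, the decay term is added to the gradient
                     and then rescaled by Adam's adaptive step.
    decoupled=True:  AdamW-style, the decay shrinks the weight directly.
    """
    if not decoupled:
        grad = grad + weight_decay * w
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias correction
    v_hat = v / (1 - beta2 ** t)
    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        w_new -= lr * weight_decay * w
    return w_new, m, v

# First step with a zero loss gradient, so only the regularizer acts.
w_l2, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.1, decoupled=False)
w_adamw, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.1, decoupled=True)
```

With L2 the decay term passes through Adam's normalization, so the weight moves by roughly `lr` no matter how small `weight_decay` is. With the decoupled variant the shrinkage is `lr * weight_decay * w`, which is why the regularization hyper-parameters need re-tuning when switching optimizers.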

Datasets
(0) Disney starts to release datasets
https://www.disneyanimation.com/technology/datasets

Market / interesting links
(0) A motion to open-source GitHub
https://github.com/dear-github/dear-github/issues/304
(1) Allegedly, the GTX 1180 is starting to appear on sale in Asia (?)
(2) Some controversy regarding Andrew Ng and self-driving cars https://goo.gl/WNW4E3
(3) National AI strategies overviewed - https://goo.gl/BXDCD7
-- Canada C$135m
-- China has the largest strategy
-- Notably - countries like Finland also have one
(4) Amazon allegedly sells face recognition to the USA https://goo.gl/eDzekn

#data_science
#deep_learning

Google’s DeepMind taught AI teamwork by playing Quake III Arena

Google’s DeepMind today shared the results of training multiple AI systems to play Capture the Flag on Quake III Arena, a multiplayer first-person shooter game. The AI played nearly 450,000 g…

948 views, Alexander, 09:04

https://blog.bradfieldcs.com/an-introduction-to-hashing-in-the-era-of-machine-learning-6039394549b0

An Introduction to Hashing in the Era of Machine Learning

In December 2017, researchers at Google and MIT published a provocative research paper about their efforts into “learned index structures”…

850 views, Alexander, 16:17

Ofc such experiments are done on toy datasets - but it's nice to know

762 views, Alexander, 04:35