Spark in me
Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.
Some useful devops things

Happy holidays everyone!
Was a bit busy with devops stuff and found some under-the-radar things that you might find useful when deploying ML applications.

- Dockerize - if you have an older application in your stack that writes logs to some random folder, you can use this to dockerize the app easily (explanation)

- Wait-for-it - if you have a distributed architecture and cannot design your app to be resilient to restarts or to one of the services being unreachable at start, you can use this

- Reverse proxy + TLS. I actually tried traefik (it is advertised as an easy one-size-fits-all solution) ... but it glitches and did not work for me. nginx, of course, works. I recently found this gem of a wrapper - it lets you run an nginx reverse proxy with TLS encryption via Let's Encrypt using just two micro-services

- Docker compose 2 vs 3. I did not find this in the docs - version 3 is not necessarily newer or better; it is just geared towards Docker's swarm mode

#devops
New embedded computing platform?

Looks like Intel has a new computing platform for ultra-compact PCs. It may fit some semi-embedded ML applications!

Intel NUC 9 Extreme Compute Element
- https://www.theverge.com/2020/1/7/21051879/intel-pc-nuc-9-extreme-ghost-canyon-element-hands-on-teardown-ces-2020
- https://pc-01.tech/razer-tomahawk/

#hardware
Deploying High Load ML Models ... in Real-Time with Batches

I have heard opinions that properly using GPUs in production is difficult because handling real-time queues / batching correctly is hard. In reality it is a chore, but you can even save money compared with a CPU-only deploy (especially if you deploy a whole workload where some parts are CPU-intensive)!

Using GPUs for your ML models has several advantages:

- 10x faster (sometimes even without batching)
- Usually you can have viable batch sizes of 10 - 100 on one GPU depending on your task

But how can you use GPUs in production?
Usually if you do not have real-time requirements or if your model / workload is large, you can get away without explicit batching.

But what if you need high load and real-time responses at the same time?

The pattern that I arrived at is:

- Use some message broker (Redis, RabbitMQ). I chose RabbitMQ because it has nice tutorials, libraries and community

- Upon accepting a workload, check it, hash it and store it locally

- Send a message to the broker with the hash / a pointer to the stored workload via the Remote Procedure Call (RPC) pattern (if you really have high load, you may need to send these messages asynchronously as well! In this case the aio-pika RPC pattern will help you)

- On the consumer side, accumulate messages into batches, and / or process them on a timeout if batch accumulation takes too long

- This has an added benefit of resilience if you write your code properly and acknowledge messages when necessary


Some useful reading:

- The RPC pattern in pika (the Python RabbitMQ client)
- Proper, truly asynchronous pika examples 1 / 2
- RPC in aio-pika (asyncio pika)
- What if you want to have RPC / batches / async client in pika at the same time?

Also, docker compose does not yet accept the `gpus` option

So there are workarounds:
- https://github.com/docker/compose/issues/6691
- https://github.com/NVIDIA/nvidia-container-runtime#docker-engine-setup

Please tell me if you would like a more detailed post on this topic.

#devops
#deep_learning
A really cool, down-to-earth developer's blog / email digest


Key differences from your typical coder's blog:

- Emails arrive in the order they were written - it tells a story
- Real examples from real life. Real fails
- No BS and sugar coating
- No 10x coder / code ninja / code guru stuff

https://codewithoutrules.com/softwareclown/

#code
PyTorch 1.4 release

TLDR - production / deploy oriented blocks and blocks to train huge networks. New cool features - pruning and quantization gain traction, and AMD support starts getting mentioned.

https://github.com/pytorch/pytorch/releases/tag/v1.4.0

- PyTorch Mobile - Build level customization
- Distributed Model Parallel Training (RPC)
- Java bindings
- End of python 2 support =)
- Pruning out-of-the box
- Learning rate schedulers (torch.optim.lr_scheduler) now support “chaining.”
- Named Tensors (out of beta?)
- AMD Support (!?)
- Quantization (!) - more modules support

Still no builds for python 3.8? =)

#deep_learning
Has anyone tried ROCm + PyTorch?
anonymous poll

What is ROCm? – 46
👍👍👍👍👍👍👍 62%

No, I have not tried it – 26
👍👍👍👍 35%

Yes, it technically works, but too early stage – 2
▫️ 3%

Yes, it works properly, even for real-life cases – 0
▫️ 0%

👥 74 people voted so far.
First 2020 ML / DS / Coding Digest


Highlights

- PyTorch 1.4 - focus on production / deploy / optimization - cool!
- Order of magnitude more efficient transformer from Google?
- Proper English ASR system comparison
- Pandas 1.0

Please like / share / repost!

https://spark-in.me/post/2020_ds_ml_digest_01

#digest
Decided to update my OpenVPN installation, since I already gave my VPN to several people

Tried pritunl - it really works out of the box; a full installation would probably take 10-15 minutes

Another valid alternative is a dockerized OpenVPN

Some time ago I wrote a plain, down-to-earth guide for Windows users on how to rent a server, create a key, etc. If you would like the same for this VPN - ping me
About AMD support ...
Forwarded from Egor
There are no prebuilt binaries, and the latest Navi series of cards is not supported. Who the hell needs that?
Soo cool!

UMAP has a dedicated built-in plotting tool!)))
Forwarded from Sava Kalbachou
The State of Native Quantization in PyTorch

Yeah, right. PyTorch 1.3 and / or 1.4 boasted native qint8 quantization support.
So cool, right?

They have 2 main tutorials:

- Memory intensive networks with linear layers (BERT) => dynamic quantization
- Convolutional networks => static quantization
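
For the first path, the API itself is a one-liner. A minimal sketch (the toy linear stack here is a made-up stand-in for BERT, purely for illustration):

```python
import torch
import torch.nn as nn

# Toy stand-in for a memory-bound, linear-heavy model (think BERT).
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8))
model.eval()

# Dynamic quantization: weights are converted to qint8 ahead of time,
# activations are quantized on the fly at inference time.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = qmodel(torch.randn(4, 32))

print(out.shape)  # torch.Size([4, 8])
```

The static path for conv nets is more involved (observers, a calibration pass, QuantStub / DeQuantStub), hence the separate tutorial.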

I have not tried vanilla BERT and / or vanilla MobileNet (please tell me if you have!), but it looks like:

- 1D convolutions are not supported yet
- Native nn.transformers dynamic quantization ... does not work

It is kind of meh, because I used their native layers (instead of Hugging Face, for example) ... to avoid this exact kind of issue! =)

Anyway, tell me if quantization worked for you in PyTorch, and meanwhile you can upvote these feature requests / bug reports if you feel like these features are useful!

- Fix nn.transformer quantization
- 1D conv support

Ofc I could also spend a couple of months fiddling with quantization myself, but it looks like this year will be more production-driven, so why do the same job they are now clearly focused on doing?)

#deep_learning
Using Voila With the Power of Vue.js

Remember the last post about python dashboard / demo solutions?

Since then we have tried Voila and Streamlit.
Streamlit is very cool, but you cannot build really custom things with it: every widget you need must be supported out of the box, and there are always issues with caching.
It is also painful to change the default appearance.

Remember that Voila "in theory has a customizable grid and CSS"?
Of course someone took care of that!

Enter ipyvuetify + voila-vuetify.
TLDR - this allows you to build Voila demos with the Vue UI library, all in 100% Python.

Problems:

- No native table widget / method to show and UPDATE pandas tables. There are solutions that load Vue UI tables, but no updates out-of-the-box
- All plotting libraries will work mostly fine
- All Jupyter widgets will work fine. But when you need a custom widget, you will have to either code it yourself or find some hack with js-links or manual HTML manipulation
- Takes some time to load, will NOT scale to hundreds / thousands of concurrent users
- ipyvuetify is poorly documented, and its examples are not very intuitive

Links:

- https://github.com/voila-dashboards/voila-vuetify
- https://github.com/mariobuikhuizen/ipyvuetify

#data_science
2020 DS / ML Digest 2

Highlights

- New STT benchmarks from FAIR
- Analysis of GPT-2 by thegradient
- Google’s Meena, a 2.6 billion parameter end-to-end trained neural conversational model (not AGI ofc)
- OpenAI now uses PyTorch
- LaserTag - cool idea on how to handle simpler s2s tasks, i.e. error correction

Please like / share / repost!

https://spark-in.me/post/2020_ds_ml_digest_02

#digest
Setting up Wi-Fi on a Headless Server

Yeah, that's a pain in the ass!
Chicken-and-egg problem - you need to install packages to set up Wi-Fi, but you need Wi-Fi to install packages.
So first you have to install the packages ... by copying them over on a USB stick.
Remember CD-ROM sneakernet? =)
Also making Wi-Fi robust to reboots is a pain.

These guides worked for me on Ubuntu 18.04.3 server:

- https://www.linuxbabe.com/command-line/ubuntu-server-16-04-wifi-wpa-supplicant
- rc.local worked instead of systemd or crontab for me https://gist.github.com/mohamadaliakbari/1cb9400984094541581fff07143e1c9d
- it is better to use your router to pin down a static IP

#linux
Some Proxy Related Tips and Hacks ... Quick Ez and in Style =)

DO is not cool anymore

First of all, let's get the elephant out of the room. Remember I recommended Digital Ocean?
Looks like they behave like a f**g corporation now - they require a selfie with your passport.
F**k this. Even AWS does not need this.


Why would you need proxies?

Scraping mostly. Circumventing anal restrictions.
Sometimes there are other legit use cases like proxying your tg api requests.


Which framework to use?

None.
One of our team tried scrapy, but there is too much hassle (imho) without any real benefits.
(apart from their corporate platform crawlera, but I still do not understand why it exists - enterprise, maybe)
Just use aiohttp, asyncio, bs4, requests, threading and multiprocessing.
And just write mostly good enough code.
If you do not need to scrape 100m pages per day or use selenium to scrape JS-heavy pages, this is more than enough.
Really. Do not buy-in into this cargo cult stuff.
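
To make "good enough code" concrete - a sketch of the whole approach using only the standard library. `fetch` and `parse` are your own functions (the names are mine, for illustration); in practice fetch with requests and parse with bs4:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_all(urls, fetch, parse, max_workers=16):
    """Fetch and parse a list of URLs with a plain thread pool.

    `fetch` and `parse` are plugged in by the caller - e.g. fetch with
    requests, parse with bs4 - no framework required.
    """
    def worker(url):
        try:
            return url, parse(fetch(url))
        except Exception as exc:  # log and move on instead of killing the pool
            return url, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(worker, urls))
```

With requests it would be called as something like `scrape_all(urls, fetch=lambda u: requests.get(u).text, parse=my_bs4_parser)`; swap the thread pool for asyncio + aiohttp only when throughput actually demands it.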


Video

For video-content there are libraries:

- youtube-dl - lots of features, a horrible Python API, a nice CLI; it really works
- pytube - was really cool and Pythonic, but its author abandoned it. Most likely he just wrote a ton of regexes that he decided not to support. Some methods still work, though

Also remember that many HTTP libraries have HTTP / SOCKS5 proxy support.
In older libraries this may be supported via environment variables.
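
For the environment-variable route specifically, the convention is `http_proxy` / `https_proxy` (or `ALL_PROXY`); Python's urllib - and libraries built on the same convention - read them at request time. A quick sketch (the address is a placeholder from the documentation IP range):

```python
import os
import urllib.request

# De-facto convention that many older libraries honour:
os.environ["http_proxy"] = "socks5://user:password@203.0.113.1:1080"
os.environ["https_proxy"] = "socks5://user:password@203.0.113.1:1080"

# urllib reads these at call time:
proxies = urllib.request.getproxies()
print(proxies["http"])  # socks5://user:password@203.0.113.1:1080
```

Note that for requests, SOCKS proxy URLs additionally need the PySocks extra (`pip install requests[socks]`), if I remember correctly.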


Where to get proxies?

The most interesting part.
There are "dedicated" services (e.g. smartproxy / luminati.io / proxymesh / bestproxy.ru / mobile proxy services).
And they probably are the only option when you scrape Amazon.
But if you need high bandwidth and many requests, such proxies usually have garbage speed
(and, morally speaking, probably 50% of them are hacked routers)
Ofc there is scrapoxy.io/ - but this is just too much!


Enter Vultr

They have always been a DO look-alike service.

I found a simple hacky way to get 10-20-40 proxies quickly.
You can just use Vultr + their Ubuntu 18.04 Docker image + a plain startup script.
That is it. Literally.

With Docker already installed, your script may look something like this:

docker run -d --name socks5_1 -p 1080:1080 -e PROXY_USER=user -e PROXY_PASSWORD=password serjs/go-socks5-proxy && \
docker run -d --name socks5_2 -p 1081:1080 -e PROXY_USER=user -e PROXY_PASSWORD=password serjs/go-socks5-proxy

There are cheaper hosting alternatives - Vultr is quite expensive.
But the startup-script feature + Docker images really save time.
They also now have the nice features - backups / snapshots / startup scripts / easy scaling / etc - without the corporate bs.
Also use my Give $100, Get $25 link!

Beware - new accounts may be limited to 5 servers per account.
You may want to create several accounts at first.

#data_science
#scraping