Spark in me
Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.
A Transformer Encoder + Attention Module that Supports Quantization

Spent some time lately with 3 pieces of code:

- The above snippets
- BERT code
- PyTorch source code

Why? Because the attention module in PyTorch's activations module (yes, multi-head attention lives there) is a clusterfuck and it does not quantize out of the box! Two major releases, and this glaring issue is still not fixed, meh.

One remark - this attention only works for encoder networks, i.e. it does not pass state, so there is no recurrence. Just keep it in mind. A simpler, one-head self-attention has been a workhorse in my work, so I do not see why 2-4 head attention would not be better (usually 2 heads are enough, though).

Anyway, I reworked the above modules a bit so that they are usable in real models and compatible with PyTorch checkpoints. Some fiddling is required to load old weights, but it is easy.

Also, the above modules were missing some crucial implementation details, so I had to add them in.

And it works on real tasks, ofc.
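
Purely for illustration, here is a minimal sketch of what such a quantization-friendly self-attention could look like (module and parameter names are mine, not taken from the snippets above). The whole trick is to build it from plain nn.Linear layers so that dynamic quantization picks them up:

import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantFriendlySelfAttention(nn.Module):
    # Encoder-only multi-head self-attention built from plain nn.Linear layers,
    # so torch.quantization.quantize_dynamic can swap them out.
    # No state is carried between calls, hence no recurrence / decoding.
    def __init__(self, hidden_size, n_heads=2, dropout=0.1):
        super().__init__()
        assert hidden_size % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = hidden_size // n_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.k_proj = nn.Linear(hidden_size, hidden_size)
        self.v_proj = nn.Linear(hidden_size, hidden_size)
        self.out_proj = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # x: (batch, seq_len, hidden_size)
        bsz, seq_len, _ = x.shape
        def split(t):
            return t.view(bsz, seq_len, self.n_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        attn = self.dropout(F.softmax(scores, dim=-1))
        out = (attn @ v).transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.out_proj(out)

model = QuantFriendlySelfAttention(hidden_size=512, n_heads=2)
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

For reference, nn.MultiheadAttention stores a fused in_proj_weight, so loading an old checkpoint into separate q/k/v layers mostly comes down to chunking that tensor into three.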

#deep_learning
Real Life Ghetto Transformer Module Pruning and Quantization

Very simple:

(0)
Set n_heads to 2, because more than 2 does not help much. There are even papers about this. On GPU the number of heads makes no difference to inference speed, but on CPU it does (see the timing sketch after this list);

(1)
The size of the inner projection layers should equal the embedding size / your network width. (Or maybe even smaller - I guess you can fit some attention in there too? lol) Usually by default this is set to 2 * width;

(2)
Still testing this, but every multi-head module I have seen in a serious code base has an out_proj layer. Trying to remove this one is an obvious ablation test;

(3)
Use out-of-the-box quantization afterwards;

(4)
Add more layers to your liking, but finding some way to reduce the sequence length obviously helps;

(5)
Ofc, just training / distilling a narrower network is also an obvious idea;
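
To illustrate point (0), a rough CPU timing sketch (sizes and iteration counts are arbitrary): with a fixed width, more heads just means more and smaller matmuls, which is roughly free on GPU but shows up on CPU.

import time
import torch
import torch.nn as nn

width, seq_len, batch = 512, 128, 8
x = torch.randn(seq_len, batch, width)  # default (seq, batch, width) layout

for n_heads in (1, 2, 4, 8):
    attn = nn.MultiheadAttention(embed_dim=width, num_heads=n_heads).eval()
    with torch.no_grad():
        attn(x, x, x)  # warm-up
        start = time.perf_counter()
        for _ in range(20):
            attn(x, x, x)
    ms = (time.perf_counter() - start) / 20 * 1e3
    print(f'{n_heads} heads: {ms:.2f} ms per forward pass')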

If you have a more advanced recipe for pruning that was proven to work in production at a real company, please ping me in private.

So far I have not seen any of these 95% pruning claims from papers (example) translate into real inference speed gains.

#deep_learning
> out_proj layer. Trying to remove this one is an obvious ablation test;

Unsurprisingly, it works.
It is not clear to me whether it is obviously better (the model size is just 25% smaller), and when testing with pre-trained models there is a drop in quality, though I am not sure how to measure it properly.
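
For reference, a minimal way to run this ablation on a module like the sketch above, assuming the attention block exposes its output projection as out_proj (not guaranteed for an arbitrary code base):

import torch.nn as nn

def drop_out_proj(model: nn.Module) -> nn.Module:
    # Replace every linear out_proj with an identity mapping; the remaining
    # weights stay intact, so some fine-tuning afterwards is to be expected.
    for module in model.modules():
        if isinstance(getattr(module, 'out_proj', None), nn.Linear):
            module.out_proj = nn.Identity()
    return model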
#offtopic

Awesome DOOM Quickstart for New Gamers

With modern computer games becoming more cringe (MMO grinding, building ecosystems, loot boxes, microtransactions, PC culture, just dumb gameplay, etc.) it is getting harder to find new cool original games. Oh come on, almost all industry titans are pumping out cringe now!

I noticed that most younger people have not played DOOM, so I wrote this guide. Enjoy. TLDR - released in 1993, DOOM is the original game that popularized first-person gaming on PC, and it still has the most active modding community ever.

https://github.com/snakers4/awesome_doom_quickstart

PS
<rant>
I hated it when they called DOOM 2016 just DOOM, so that new gamers would not find out about the OG DOOM.
</rant>

PPS
<rant>
DOOM 2016 is cool, DOOM 2020 was even cooler, but Bethesda blew it.
</rant>
2020 DS / ML Digest 8

Highlights:

- A trend towards sanity in ML research is becoming more visible?
- Linformer - transformer optimization for long sequences
- Google Translate - recent improvements
- When Does Unsupervised Machine Translation Work?
- PyTorch vs Tensorflow in production

Please like / share / repost!

https://spark-in.me/post/2020_ds_ml_digest_08

#digest
This is huge

Tectonic plates are shifting, some hidden deals were made, the world is changing

Obvious triggers - Linux gaming finally working, Ubuntu 20 being grandma-proof, Windows servers being unpopular
Sane Programming / ML / Tech Blogs

Remember this post?

It is sad that tech / ML / software engineering is 95% BS nowadays. It is also disheartening that people voluntarily choose to focus their research careers on spreading misinformation and a more polished form of BS.

But sometimes some sources just stand out as beacons of sanity. Today, a very short list of blogs:

https://0x65.dev/ - building a search engine (closed)
https://martinheinz.dev - Python, coding
https://blog.cerebralab.com - ML, coding, philosophy
https://codewithoutrules.com/softwareclown - coding, product management

I will start a new hashtag on my channel, #no_bs, for when I find some awesome examples of bullshit / bullshit rebuttals / sanity.

#no_bs
Finally, a video on an interesting topic
No approach details in the video itself, though
The approach is in essence a distillation of older methods + consistency losses
Forwarded from Админим с Буквой (bykva)
So what kinds of DNS records are there?

===> if you already know your way around DNS, feel free to skip this <===

I am pretty far removed from DNS myself: in my time I have hardly ever touched it, and when I did, it was only small stuff - A, MX, CNAME, TXT... well, whatever else there is. Turns out there is actually quite a lot.

https://doc.powerdns.com/authoritative/appendices/types.html

#🤔 #вотбывсёзнать
Forwarded from Админим с Буквой (bykva)
From the "wait, you could do it like that?" rubric

Did you know that systemd can run scheduled tasks, aka cron jobs, and even more? Well, there you go...

https://opensource.com/article/20/7/systemd-timers
I too found this out only recently, when reading that systemd was introduced to unify various Linux distros. But I was always hesitant to invest the time to write unit files.

Looks like the next time I need a cron job, I will use systemd )
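
For reference, a minimal timer is just a pair of unit files plus one enable command; the unit names and paths below are hypothetical:

# /etc/systemd/system/backup.service
[Unit]
Description=Nightly backup job

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh

# /etc/systemd/system/backup.timer
[Unit]
Description=Run backup.service every night at 03:00

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target

# enable with: systemctl enable --now backup.timer
# inspect with: systemctl list-timers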