Spark in me
Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.
A Transformer Encoder + Attention Module that Supports Quantization

Spent some time lately with 3 pieces of code:

- The above snippets
- BERT code
- PyTorch source code

Why? Because the attention module in PyTorch's activations module (yes, multi-head attention lives there) is a clusterfuck and it does not quantize out of the box! Two major releases, and this glaring issue is still not fixed, meh.

One remark - this attention only works for encoder networks, i.e. it does not pass state, so there is no recurrence. Just keep it in mind. A simpler, one-head self-attention has been a workhorse in my work, so I do not see why 2-4 head attention would not be better (usually 2 heads are enough, though).

Anyway, I reworked the above modules a bit so that they are usable in real models and compatible with PyTorch checkpoints. Some fiddling is required to load old weights, but it is easy.

Also, the above modules were missing some crucial implementation details, so I had to add them in.

And it works on real tasks, ofc.
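
Purely for illustration, here is a minimal sketch of what such a quantization-friendly self-attention could look like (module and parameter names are mine, not taken from the snippets above). The whole trick is to build it from plain nn.Linear layers so that dynamic quantization picks them up:

import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantFriendlySelfAttention(nn.Module):
    # Encoder-only multi-head self-attention built from plain nn.Linear layers,
    # so torch.quantization.quantize_dynamic can swap them out.
    # No state is carried between calls, hence no recurrence / decoding.
    def __init__(self, hidden_size, n_heads=2, dropout=0.1):
        super().__init__()
        assert hidden_size % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = hidden_size // n_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.k_proj = nn.Linear(hidden_size, hidden_size)
        self.v_proj = nn.Linear(hidden_size, hidden_size)
        self.out_proj = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # x: (batch, seq_len, hidden_size)
        bsz, seq_len, _ = x.shape
        def split(t):
            return t.view(bsz, seq_len, self.n_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))
        attn = self.dropout(F.softmax(scores, dim=-1))
        out = (attn @ v).transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.out_proj(out)

model = QuantFriendlySelfAttention(hidden_size=512, n_heads=2)
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

For reference, nn.MultiheadAttention stores a fused in_proj_weight, so loading an old checkpoint into separate q/k/v layers mostly comes down to chunking that tensor into three.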

#deep_learning
Real Life Ghetto Transformer Module Pruning and Quantization

Very simple:

(0)
Set n_heads to 2, because more than 2 does not help much. There are even papers about this. On GPU the number of heads makes no difference to inference speed, but on CPU it does (see the timing sketch after this list);

(1)
The size of the inner projection layers should equal the embedding size / your network width. (Or maybe even smaller - I guess you can fit some attention in there too? lol) Usually by default this is set to 2 * width;

(2)
Still testing this, but every multi-head module I have seen in a serious code base has an out_proj layer. Trying to remove this one is an obvious ablation test;

(3)
Use out-of-the-box quantization afterwards;

(4)
Add more layers to your liking, but finding some way to reduce the sequence length obviously helps;

(5)
Ofc, just training / distilling a narrower network is also an obvious idea;
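
To illustrate point (0), a rough CPU timing sketch (sizes and iteration counts are arbitrary): with a fixed width, more heads just means more and smaller matmuls, which is roughly free on GPU but shows up on CPU.

import time
import torch
import torch.nn as nn

width, seq_len, batch = 512, 128, 8
x = torch.randn(seq_len, batch, width)  # default (seq, batch, width) layout

for n_heads in (1, 2, 4, 8):
    attn = nn.MultiheadAttention(embed_dim=width, num_heads=n_heads).eval()
    with torch.no_grad():
        attn(x, x, x)  # warm-up
        start = time.perf_counter()
        for _ in range(20):
            attn(x, x, x)
    ms = (time.perf_counter() - start) / 20 * 1e3
    print(f'{n_heads} heads: {ms:.2f} ms per forward pass')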

If you have a more advanced recipe for pruning that was proven to work in production at a real company, please ping me in private.

So far I have not seen any of these 95% pruning claims from papers (example) translate into real inference speed gains.

#deep_learning
> out_proj layer. Trying to remove this one is an obvious ablation test;

Unsurprisingly, it works.
It is not clear to me whether it is obviously better (the model size is just 25% smaller), and when testing with pre-trained models there is a drop in quality, though I am not sure how to measure it properly.
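
For reference, a minimal way to run this ablation on a module like the sketch above, assuming the attention block exposes its output projection as out_proj (not guaranteed for an arbitrary code base):

import torch.nn as nn

def drop_out_proj(model: nn.Module) -> nn.Module:
    # Replace every linear out_proj with an identity mapping; the remaining
    # weights stay intact, so some fine-tuning afterwards is to be expected.
    for module in model.modules():
        if isinstance(getattr(module, 'out_proj', None), nn.Linear):
            module.out_proj = nn.Identity()
    return model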
#offtopic

Awesome DOOM Quickstart for New Gamers

With modern computer games becoming more cringe (MMO grinding, building ecosystems, loot boxes, microtransactions, PC culture, just dumb gameplay, etc.) it is getting harder to find new cool original games. Oh come on, almost all industry titans are pumping out cringe now!

I noticed that most younger people have not played DOOM, so I wrote this guide. Enjoy. TLDR - released in 1993, DOOM is the original game that popularized first-person gaming on PC, and it still has the most active modding community ever.

https://github.com/snakers4/awesome_doom_quickstart

PS
<rant>
I hated it when they called DOOM 2016 just DOOM, so that new gamers would not find out about the OG DOOM.
</rant>

PPS
<rant>
DOOM 2016 is cool, DOOM 2020 was even cooler, but Bethesda blew it.
</rant>
2020 DS / ML Digest 8

Highlights:

- A trend towards sanity in ML research is becoming more visible?
- Linformer - transformer optimization for long sequences
- Google Translate - recent improvements
- When Does Unsupervised Machine Translation Work?
- PyTorch vs Tensorflow in production

Please like / share / repost!

https://spark-in.me/post/2020_ds_ml_digest_08

#digest
This is huge

Tectonic plates are shifting, some hidden deals were made, the world is changing

Obvious triggers - Linux gaming finally working, Ubuntu 20 being grandma-proof, Windows servers being unpopular
Sane Programming / ML / Tech Blogs

Remember this post?

It is sad that tech / ML / software engineering is 95% BS nowadays. It is also disheartening that people voluntarily choose to focus their research careers on spreading misinformation and a more polished form of BS.

But sometimes some sources just stand out as beacons of sanity. Today, a very short list of blogs:

https://0x65.dev/ - building a search engine (closed)
https://martinheinz.dev - Python, coding
https://blog.cerebralab.com - ML, coding, philosophy
https://codewithoutrules.com/softwareclown - coding, product management

I will start a new hashtag on my channel, #no_bs, for when I find some awesome examples of bullshit / bullshit rebuttals / sanity.

#no_bs
Finally, a video on an interesting topic
No approach details in the video itself, though
The approach is in essence a distillation of older methods + consistency losses
Forwarded from Админим с Буквой (bykva)
So what kinds of DNS records are there?

===> if you already know your way around DNS, feel free to skip this <===

I am pretty far removed from DNS myself: in my time I have hardly ever touched it, and when I did, it was only small stuff - A, MX, CNAME, TXT... well, whatever else there is. Turns out there is actually quite a lot.

https://doc.powerdns.com/authoritative/appendices/types.html

#🤔 #вотбывсёзнать
Forwarded from Админим с Буквой (bykva)
From the "wait, you could do it like that?" rubric

Did you know that systemd can run scheduled tasks, aka cron jobs, and even more? Well, there you go...

https://opensource.com/article/20/7/systemd-timers
I too found this out only recently, when reading that systemd was introduced to unify various Linux distros. But I was always hesitant to invest the time to write unit files.

Looks like the next time I need a cron job, I will use systemd )
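
For reference, a minimal timer is just a pair of unit files plus one enable command; the unit names and paths below are hypothetical:

# /etc/systemd/system/backup.service
[Unit]
Description=Nightly backup job

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh

# /etc/systemd/system/backup.timer
[Unit]
Description=Run backup.service every night at 03:00

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true

[Install]
WantedBy=timers.target

# enable with: systemctl enable --now backup.timer
# inspect with: systemctl list-timers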