A Transformer Encoder + Attention Module that Supports Quantization
Spent some time lately with 3 pieces of code:
- The above snippets
- BERT code
- PyTorch source code
Why? Because the attention module in PyTorch (multi-head attention lives in the activations module, of all places) is a clusterfuck and it does not quantize out of the box! Two major releases later, and this glaring issue is still not fixed, meh.
One remark: this attention only works for encoder networks, i.e. it does not pass state in case of recurrence. Just keep this in mind. A simpler, one-head self-attention has been a workhorse in my work, so I do not see why 2-4 head attention would not be better (usually 2 heads are enough, though).
Anyway, I reworked the above modules a bit to make them workable in real models and compatible with PyTorch checkpoints. Some fiddling is required to load old weights, but it is easy.
Also, the above modules were missing some crucial implementation details, so I had to add them in.
And it works on real tasks, ofc.
#deep_learning
GitHub: Quantization error with nn.Transformer · Issue #32764 · pytorch/pytorch
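Not the module from the post above or from the linked issue, just a minimal sketch of the general idea with made-up names and sizes: keep the projections as plain nn.Linear layers and the attention math as plain tensor ops, so that dynamic quantization has something to swap out.

import torch
import torch.nn as nn

class QuantFriendlySelfAttention(nn.Module):
    # Encoder-only multi-head self-attention built from plain nn.Linear layers,
    # so that torch.quantization.quantize_dynamic can pick them up.
    def __init__(self, width, n_heads=2):
        super().__init__()
        assert width % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = width // n_heads
        self.qkv = nn.Linear(width, 3 * width)
        self.out_proj = nn.Linear(width, width)

    def forward(self, x):  # x: (batch, seq_len, width)
        b, t, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # reshape to (batch, heads, seq_len, head_dim)
        q = q.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, w)
        return self.out_proj(out)

model = QuantFriendlySelfAttention(width=256, n_heads=2)
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized(torch.randn(8, 100, 256)).shape)  # torch.Size([8, 100, 256])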
Real Life Ghetto Transformer Module Pruning and Quantization
Very simple:
(0) Set n_heads to 2, because more than that does not help much. There are even papers about this. On GPU inference there is no difference in speed regardless of the number of heads, but on the CPU there is;
(1) The size of the inner projection layers should equal the embedding size / your network width. (Or maybe even smaller, I guess you can fit some attention in there too? lol) Usually by default this is set to 2 * width;
(2) Still testing this, but any multi-head module I have seen in a serious code base has an out_proj layer. Trying to remove this one is an obvious ablation test;
(3) Use out-of-the-box quantization afterwards (see the sketch below);
(4) Add more layers to your liking, but reducing sequence length obviously helps;
(5) Ofc, just training / distilling a narrower network is also an obvious idea;
If you have a more advanced recipe for pruning that has been proven to work in production in a real company, please ping me in private.
So far I have not yet seen how all of these 95% pruning claims (example) in papers transfer into real inference speeds.
#deep_learning
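A rough sketch of points (0), (1) and (3) on stock PyTorch modules; the width and layer count are made-up, and depending on your PyTorch version the stock attention may still need the rework from the previous post before it quantizes cleanly end to end.

import torch
import torch.nn as nn

width = 256  # your network width / embedding size (made-up value)

layer = nn.TransformerEncoderLayer(
    d_model=width,
    nhead=2,                # (0) two heads
    dim_feedforward=width,  # (1) inner projection == network width
)
model = nn.TransformerEncoder(layer, num_layers=4)

# (3) out-of-the-box dynamic quantization of the linear layers
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(100, 8, width)  # (seq_len, batch, width), the default layout here
with torch.no_grad():
    print(quantized(x).shape)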
> out_proj layer. Trying to remove this one is an obvious ablation test;
Non-surprisingly, it works.
I am not sure whether it is clearly better (model size is just 25% less), and when testing with pre-trained models there is a drop in quality, though I am not sure how to measure it properly.
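Not the author's measurement, just a back-of-the-envelope sketch (with made-up sizes) of where a figure like 25% can come from for the attention block alone; the share for a whole model will obviously differ.

import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=256, num_heads=2)  # made-up sizes
total = sum(p.numel() for p in mha.parameters())
out_proj = sum(p.numel() for p in mha.out_proj.parameters())
print(f"out_proj holds {out_proj / total:.0%} of the attention module's weights")
# in_proj is 3 * w * w, out_proj is w * w, so this prints roughly 25%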
#offtopic
Awesome DOOM Quickstart for New Gamers
With modern computer games becoming more cringe (MMO grinding, building ecosystems, loot boxes, microtransactions, PC culture, just dumb gameplay, etc.), it is harder to find new, cool, original games. Oh come on, almost all industry titans are pumping out cringe now!
I noticed that most younger people have not played DOOM, so I wrote this guide. Enjoy. TLDR: released in 1993, DOOM is the original game that popularized first-person gaming on PC, and it still has the most active modding community ever.
https://github.com/snakers4/awesome_doom_quickstart
PS
<rant>
I hated when they called DOOM 2016 just DOOM, so that new gamers would not find out about the OG DOOM.
</rant>
PPS
<rant>
DOOM 2016 is cool, DOOM 2020 was even cooler, but Bethesda blew it.
</rant>
AI – the no bullshit approach – Piekniewski's blog
https://blog.piekniewski.info/2020/06/08/ai-the-no-bullshit-approach/
Memory cells in SSDs: how they work and why they fail. SLC, MLC, TLC, QLC - PC-01 | Этот компьютер
https://pc-01.tech/ssd/
Now Open STT Featured On Azure Datasets
Good news, if you use the Azure cloud!
https://azure.microsoft.com/en-us/services/open-datasets/catalog/open-speech-to-text/
2020 DS / ML Digest 8
Highlights:
- A trend towards sanity in ML research becomes more visible?
- Linformer - transformer optimization for long sequences
- Google translate - recent improvements
- When Does Unsupervised Machine Translation Work?
- PyTorch vs TensorFlow in production
Please like / share / repost!
https://spark-in.me/post/2020_ds_ml_digest_08
#digest
This is huge
Tectonic plates are shifting, some hidden deals were made, the world is changing.
Obvious triggers: Linux gaming finally working, Ubuntu 20 being grandma-proof, Windows servers not being popular.
Sane Programming / ML / Tech Blogs
Remember this post?
It is sad that tech / ML / software engineering is 95% BS nowadays. It is also disheartening that people voluntarily choose to focus their research careers on spreading misinformation and a more polished form of BS.
But sometimes some sources just stand out as beacons of sanity. Today, a very short list of such blogs:
https://0x65.dev/ - building a search engine (closed)
https://martinheinz.dev - python, coding
https://blog.cerebralab.com - ML, coding, philosophy
https://codewithoutrules.com/softwareclown - coding, product management
I will start a new hashtag on this channel, #no_bs, for when I find some awesome examples of bullshit / bullshit rebuttals / sanity.
#no_bs
Finally, a video on an interesting topic
No approach details in the video itself though
The approach is in essence a distillation of older methods + consistency losses
Forwarded from Админим с Буквой (bykva)
What DNS record types are out there, anyway?
===> if you already know your way around DNS, feel free to skip this <===
I am a person fairly far removed from DNS; in my time I have barely touched it, and when I did, it was only the basics: A, MX, CNAME, TXT... whatever else there is. It turns out there is quite a lot more.
https://doc.powerdns.com/authoritative/appendices/types.html
#🤔 #вотбывсёзнать
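For illustration only, a tiny sketch of poking at a few record types with the third-party dnspython library (not from the linked article; the domain is a placeholder):

import dns.resolver  # third-party: pip install dnspython (2.x API)

for rtype in ["A", "MX", "CNAME", "TXT", "NS", "SOA"]:
    try:
        answers = dns.resolver.resolve("example.com", rtype)
        print(rtype, [str(r) for r in answers])
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
        print(rtype, "- no records of this type")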
Forwarded from Админим с Буквой (bykva)
From the "wait, you could do that?" department.
Did you know that systemd can run scheduled tasks, a.k.a. cron jobs, and even more than that? Well, here you go..
https://opensource.com/article/20/7/systemd-timers
I, too, found out about this only recently, while reading that systemd was introduced to unify the various Linux distros. But I was always hesitant to invest the time to write unit files.
Looks like next time I need a cron job I will use systemd )
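A minimal sketch of what that looks like, with hypothetical unit names and paths (not taken from the article): a oneshot service plus a timer that triggers it.

# /etc/systemd/system/backup.service  (hypothetical)
[Unit]
Description=Nightly backup job

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh

# /etc/systemd/system/backup.timer  (hypothetical)
[Unit]
Description=Run backup.service every night at 02:00

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true

[Install]
WantedBy=timers.target

# enable and inspect:
#   systemctl enable --now backup.timer
#   systemctl list-timers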