Spark in me pinned «Ukrainian Open STT 1000 Hours Following the path of Open STT in Russian, now you can enjoy a similar dataset in Ukrainian: - Torrent Link - GitHub Link Congratulations to our Ukrainian friends for finally publishing a diverse easily downloadable dataset!…»
2021 DS / ML Digest 02
Highlights:
- Lyra: A New Very Low-Bitrate Codec for Speech Compression
- TF 3D exists
- Why are machine learning algorithms hard to tune (training with several losses)?
- The Technology Behind Cinematic Photos
- XOR trick
- Why did I leave Google, or why did I stay so long?
- Stakes create good software
Please like / share / repost!
https://spark-in.me/post/2021_ds_ml_digest_02
#digest
Yet Another Ultra Sane Blog
With all the crypto / AI / trade wars bs hype it is more and more difficult to stay sane, attached and motivated.
Following the last #no_bs gem, I have found another one, but this time just covering progress / tech / macro / finance in general.
To summarize, the author writes about global trends and is industrious enough to pull a lot of supporting stats and data.
The way he describes his blog:
"I write about dynamical systems of progress, trying to uncover the ways in which we achieve and sustain scientific, social, technological and cultural progress. Every week I write a new essay about the strange loops that determine progress, the systems underlying economics, philosophy of business, innovation and the world of technology."
Just check this out:
- https://www.strangeloopcanon.com/about
- The author was on a streak recently (last December) - https://www.strangeloopcanon.com/p/the-great-polarisation-1n
#no_bs
Forwarded from Silero News (Alexander)
Silero Models Update
Preparing to roll out a huge backlog of updates:
- Added xxsmall speed metrics (non e2e, only the acoustic / CE model)
- Ukrainian model V3 released (also xxsmall)
- Some organizational issues and updates
Please see full changes, status and plans here
Stay tuned for much much more soon!
GitHub
silero-models/changelog.md at master · snakers4/silero-models
Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple - silero-models/changelog.md at master · snakers4/silero-models
Compressed Feather in Pandas
A nifty feature in pandas I totally missed - saving not only .csv data frames compressed, but also .feather ones. Reduces file size 4-5x for repetitive data.
- Pandas to feather doc
- Pyarrow to feather doc
#data_science
A First Legitimate Use of Crypto?
Seems a bit ridiculous, but looks like a decent way for bands / artists to hold auctions for their creations:
- https://opensea.io/collection/clowncoin
PyTorch 1.8 Released
- https://pytorch.org/blog/pytorch-1.8-released/
- https://github.com/pytorch/pytorch/releases
Apart from mostly fixes, some nice quantization (still no transformer!) and ONNX improvements, I really like these additions:
(0) PyTorch Lite Interpreter - a streamlined version of the PyTorch runtime that can execute PyTorch programs on resource-constrained devices with a reduced binary size footprint. This prototype feature reduces binary sizes by up to 70% compared to the current on-device runtime. Link
(1) Starting in PyTorch 1.8, there is support for ROCm wheels, providing easy onboarding to AMD GPUs. Link
(2) New beta benchmark utils. Link
(3) New PyTorch Mobile demos
(4) New quantization API. Link
(5) New related library releases (i.e. torchaudio, torchvision) - looks like they are tied to PyTorch releases now
#deep_learning
PyTorch New Quantization API
A brief summary of why PyTorch has a new prototype API for quantization - looks like the previous API was too difficult? It wasn't really, but it required some fiddling, and non-standard layers just did not work:
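For context, here is a minimal sketch of the "fiddling" in the old eager-mode static workflow (the toy module and shapes are made up; the API lives under torch.ao.quantization in recent releases, torch.quantization back then) - note the stubs you had to weave into the model yourself:

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq  # torch.quantization in older releases


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # the eager API makes you bracket the float part manually
        self.quant = tq.QuantStub()
        self.fc = nn.Linear(8, 4)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc(x))
        return self.dequant(x)


model = Net().eval()
model.qconfig = tq.get_default_qconfig("fbgemm")  # x86 backend
prepared = tq.prepare(model)       # insert observers
prepared(torch.randn(2, 8))        # calibration pass with sample data
quantized = tq.convert(prepared)   # swap in quantized modules
out = quantized(torch.randn(2, 8))
```

Every model needs this hand edit, and any module the converter does not recognize silently stays float - which is exactly the pain the new FX-based API automates away.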
Torch FX
- https://pytorch.org/docs/master/fx.html
FX is a toolkit for developers to use to transform nn.Module instances. FX consists of three main components: a symbolic tracer, an intermediate representation, and Python code generation.
I understand that the people building PyTorch usually favour flexible toolkits (and they expose a lot to the end user), and most likely they just realized that static quantization was too complex for an average user to handle, so they wrote this as an engine for automated quantization transformations, which is cool. Designing a proper API is always a balancing act.
Over the years, I became quite good at monkey patching PyTorch code just using python's and pytorch's tools (e.g. module.named_modules()). So I wonder what the killer use case of this feature would be?
One thing comes to mind immediately - when you have the same model with static control flows and you need to create a quantized / TorchScript version of it. Now it is a pain in the ass, because it requires manually switching them back and forth (switch on, create a quantized TorchScript version, switch back, create another one, etc).
Will I use it? I guess I need to sleep on it. We ended up not using static quantization very much. Looks very cool and flexible, serves a real purpose, but usually stupid one-line hacks can do the same without learning a new tool.
So idk, what do you think? Do you like any of the examples? I like the invert one.
#deep_learning
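A tiny sketch of the three components on a toy module (the relu-to-sigmoid swap is a made-up rewrite, in the spirit of the docs' invert example) - trace, walk the IR, rewrite a node, regenerate code:

```python
import torch
import torch.fx


class Net(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1.0


# symbolic tracer: turns forward() into a GraphModule with an explicit IR
traced = torch.fx.symbolic_trace(Net())

# the IR is just a list of nodes you can inspect or rewrite
for node in traced.graph.nodes:
    if node.op == "call_function" and node.target is torch.relu:
        node.target = torch.sigmoid  # swap the op in place

traced.recompile()  # code generation: regenerate forward() from the graph

x = torch.tensor([-1.0, 0.0, 1.0])
assert torch.allclose(traced(x), torch.sigmoid(x) + 1.0)
```

The same mechanics would let a tool find every quantizable pattern and rewrite it automatically, instead of a human monkey patching modules by name.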
New Benchmarking Tool in PyTorch
https://pytorch.org/tutorials/recipes/recipes/benchmark.html#pytorch-benchmark
Looks a bit over-complicated at first glance (why provide classes for random tensor generation, I have no idea), but it has a few very nice features:
- Automated num_threads handling
- Automated CUDA synchronization
- Report generation, storing the results, comparing the results
But I suppose there is nothing wrong with just using %%timeit and manually setting num_threads.
#deep_learning
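For reference, a minimal Timer sketch (sizes and labels are arbitrary); num_threads is set per measurement instead of being global state you have to restore:

```python
import torch
import torch.utils.benchmark as benchmark

x = torch.randn(256, 256)

t = benchmark.Timer(
    stmt="x @ x",
    globals={"x": x},
    num_threads=2,         # handled per-run, no manual bookkeeping
    label="matmul 256x256",
)
m = t.timeit(50)           # a Measurement: mean / median / IQR, printable
print(m)
```

Several Measurements can then be fed to benchmark.Compare to print a side-by-side table, which is the part %%timeit cannot do for you.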
Building Your Own Supercomputer Cheap (RU)
My guest post on ODS @ habr:
- https://habr.com/ru/company/ods/blog/546808/
EDIT - some awesome comments!
#deep_learning
BIFURCATED RISER X16 TO 2X8 (SET)
Remember that there is a very limited number of motherboards with 5+ PCIE slots?
Now there are risers like this - https://riser.maxcloudon.com/ru/bifurcated-risers/25-bifurcated-riser-x16-to-2x8-set.html
Has anyone tried something similar for DL?
#deep_learning
While ThreadRipper Pro MBs are impossible to buy, this MB may be the base for our next huge server build:
- https://market.yandex.ru/product--materinskaia-plata-asrock-rack-romed8-2t/705623617
Looks a bit expensive (and it requires ECC RAM + EPYC processors), but with 7 x PCIE 4.0 16x slots and 2x10 Gbit/s Ethernet, the possibilities are limitless.
And another hack - buying used 100 GBit/s InfiniBand cards from eBay; they are cheap now in the US
#deep_learning