Spark in me
Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.
Forwarded from Silero News (Alexander)
A Streaming Interface for Silero Models EE

We have created a gRPC-based streaming interface for our EE models based on silero-vad.

Not sure if we are going to make any of this public; writing an interface that actually adds value (as opposed to one that merely exists) is difficult.

Key features:

- Unlike Google, we do not rescore full results at the end of an utterance / sentence, so all results are effectively "final";
- Therefore, "early" partial responses are a separate feature (e.g. 2 seconds after the start of an utterance);
- Automatic handling of overly long speech (e.g. 7 seconds or longer) - we have some hacks ensuring we do not cut words in the middle;
- Threading and multiprocessing;
- We had to create fast / efficient versions of silero-vad (10k or 100k params) to include in the gRPC server;
- The service also proxies VAD responses, which may be useful downstream;

Hopefully, since real people do not talk at the same time, this should roughly double hardware utilization efficiency compared to a plain HTTP interface in the case of phone calls.

In the future we will also calculate the sizing of our system using the streaming interface, i.e. how many real conversations each given configuration can actually handle.

An educated guess: if we can handle 20 queries per second, or 10 queries per 500 ms, at ~40 RTC, I suppose that would mean about 40 conversations.
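To make the streaming setup concrete: a client slices audio into fixed-size frames and ships them as they arrive, which is what makes early partials and long-utterance handling possible. Below is a minimal client-side chunker sketch; the frame size and PCM format are illustrative assumptions, not our actual protocol:

```python
# Minimal client-side chunker for a streaming STT interface.
# Assumed format for illustration: 16 kHz, 16-bit mono PCM,
# 250 ms frames => 16000 * 0.25 * 2 = 8000 bytes per frame.
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
FRAME_MS = 250
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * BYTES_PER_SAMPLE

def iter_frames(pcm: bytes):
    """Yield successive fixed-size frames; the last one may be shorter."""
    for start in range(0, len(pcm), FRAME_BYTES):
        yield pcm[start:start + FRAME_BYTES]

# Fake 1.1 s of silence instead of a real microphone or file.
audio = bytes(int(SAMPLE_RATE * 1.1) * BYTES_PER_SAMPLE)
frames = list(iter_frames(audio))
print(len(frames), len(frames[0]), len(frames[-1]))  # 5 8000 3200
```

In a real gRPC client each frame would become one streaming request message, with VAD events and partial results flowing back on the same stream.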
Forwarded from Silero News (Alexander)
Silero VAD Update

- Added a mini VAD (100k params as opposed to micro with 10k) for 8 kHz and 16 kHz;
- Added adaptive post-processing (no need for thresholds), examples coming soon;
- Micro is also available for 8k and 16k;

https://github.com/snakers4/silero-vad
Every once in a while I dig in and write another overview of the embedded device market for computer vision. This time the trigger was OAK, which I reviewed about two weeks ago. And so, another article on Habr - https://habr.com/ru/company/recognitor/blog/551552/
Forwarded from partially unsupervised
Spent the weekend productively procrastinating: not wanting to clean the apartment, I decided to tidy up my computers instead - clear out piles of files, clean up ~/data, back some things up, and so on.

I wanted to store hundreds of gigabytes of non-critical files (datasets, less important backups, etc.) in a way that is simple (sync ~/data /awesome_storage) and cheap.

My first thought was S3, but that felt too "enterprisey" for such a mundane task, and it is not particularly cheap to begin with ($0.023-0.025/Gb depending on the region, plus a bunch of suspicious footnotes). The next option was Digital Ocean Spaces, which is decent overall and gives you 250 Gb of storage and 1 Tb of traffic for five bucks ($0.02/Gb and $0.01/Gb beyond that), i.e. price-wise not much of an improvement - rather expected, since it is by now a large and, as of recently, publicly traded company. Vultr (which I use for ssh tunneling) falls into the same bucket: it copies Digital Ocean in almost everything, including pricing.

A pleasant find was BackBlaze. These folks have two products - a backup solution (plug and play for non-techies) and an S3-like storage with much lower prices ($0.005/Gb storage, the same $0.01/Gb download). It is easy to use, and there are two CLI APIs - one mimics S3, the other is their own and a bit simpler (b2 sync origin source 🚀). Right now I am uploading all sorts of junk there at a less-than-ideal ~5 Mbit/s, but the bottleneck seems to be my upstream channel.

Finally, I stumbled upon Rclone. It is an open source wrapper around 50+ storage backends, from your own FTP or SFTP to services like Dropbox and Google Drive. The above-mentioned S3, DO, Vultr and B2 are supported too. For more important backups you can, for example, sync between providers with a single command. Thanks to Rclone, my terabyte of Yandex.Disk (Yandex gifts it to former employees) is no longer sitting empty but is actively filling up with backups.
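To make the Rclone workflow concrete: after defining remotes once via `rclone config`, a sync is a single command. The remote and bucket names below (`b2remote`, `my-archive`, `yadisk`) are hypothetical examples, not anyone's actual config:

```shell
# One-off interactive setup of remotes (stored in rclone.conf):
rclone config

# Push local data to a B2 bucket (remote and bucket names are examples):
rclone sync ~/data b2remote:my-archive

# Provider-to-provider sync for the more important backups:
rclone sync b2remote:my-archive yadisk:backups

# Sanity-check what is actually stored:
rclone ls yadisk:backups | head
```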
I have also personally promoted rclone here, and I use DO Spaces for smaller archives, which works just fine except for the steep pricing.

As for BackBlaze, their pricing is nice, but I have considered them for really large backups ... and just buying hard drives is also an option. Long-term, buying drives is probably 2-3x cheaper (I just did a quick back-of-the-envelope calculation, assuming your NAS is cheap).

It depends on your use-case =)
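For the curious, here is the shape of that back-of-the-envelope calculation. Every number below is an assumption for illustration only (prices move, and drive failures are ignored beyond simple mirroring):

```python
# Toy cost comparison: S3-like storage (B2-ish pricing) vs. self-hosted drives.
# All prices are illustrative assumptions, not quotes.
tb_stored = 8                     # how much data we keep around
months = 36                       # 3-year horizon
cloud_per_gb_month = 0.005        # assumed B2-like storage price, $/GB-month

cloud_cost = tb_stored * 1000 * cloud_per_gb_month * months

drive_cost = 160.0                # assumed price of one 8 TB drive
nas_cost = 300.0                  # assumed cheap NAS
hw_cost = 2 * drive_cost + nas_cost  # two mirrored drives + the box

print(f"cloud: ${cloud_cost:.0f}, hardware: ${hw_cost:.0f}, "
      f"ratio: {cloud_cost / hw_cost:.1f}x")  # ratio ~2.3x here
```

Under these assumptions the cloud comes out roughly 2-3x more expensive over three years, which is where the estimate above comes from; egress fees and your time nudge the numbers in opposite directions.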

PS
Updated second link
Forwarded from Silero News (Alexander)
🚀 Huge update for English STT in Silero Models 🚀

New features:

❗️ Default model (jit or onnx) size is reduced by almost 50% without sacrificing quality

❗️New model flavours: jit_q (smaller quantized model), jit_skip (with exposed skip connections), jit_large (higher quality model), onnx_large

❗️ New smallest model jit_q is only 40M in size

❗️New performance benchmarks - default models are on par with previous models and Google, large models mostly outperform Google

Deprecations

⚠️ TensorFlow checkpoints discontinued;

Coming Soon

📎 CE benchmarks coming soon

📎 xlarge model coming soon

📎 Even more quality improvements coming soon with v31

📎 xsmall model was created (2x smaller than the default), but I could not quantize it; I am looking into creating an xxsmall model

📎 Still working on making EE models fully JIT-traceable

Links

- https://github.com/snakers4/silero-models
- https://github.com/snakers4/silero-models/wiki/Quality-Benchmarks#en-v3
Forwarded from Silero News (Alexander)
🚀 Huge update for English STT in Silero Models - Continued 🚀

New features:

❗️ Added xsmall models: jit_xsmall, jit_q_xsmall, onnx_xsmall
❗️ New smallest model jit_q_xsmall is only 26M in size

Links

- https://github.com/snakers4/silero-models
MLPerf Inference v1.0

- Inference Edge v1.0 https://mlcommons.org/en/inference-edge-10/
- Inference Datacenter v1.0 https://mlcommons.org/en/inference-datacenter-10/

The immediate conclusion (as expected) - enterprise kinky party. The second conclusion - they mostly compare vastly different systems (mostly HPC), which is good.

Honestly I do not really care for A100 vs. A?0 vs. Quadro vs. T4, but edge benchmarks are always rare and nice.

The most interesting spreadsheet IMO is this one.

And here I see some quite interesting stuff:

- Firefly-RK3399 looks similar to RPI 4 in performance (has anyone used it?)
- NVIDIA Jetson AGX Xavier looks ~2x faster than both of them (and probably is much more expensive and unobtainable)
- TFLite / ArmNN - but no ONNX or PyTorch on ARM, I wonder why
- int8 is very much a must-have on these devices - I see performance boosts of up to 2x
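To unpack why int8 helps: integer inference maps floats to 8-bit codes via a scale and zero-point, shrinking memory traffic 4x vs. fp32 and enabling faster integer kernels. A minimal affine quantization sketch in NumPy (not any framework's actual kernel):

```python
import numpy as np

def quantize(x, num_bits=8):
    """Affine (asymmetric) quantization of a float tensor to uint8."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map the 8-bit codes back to approximate floats."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.linspace(-1.0, 1.0, 11, dtype=np.float32)
q, scale, zp = quantize(x)
x_hat = dequantize(q, scale, zp)
print(np.abs(x - x_hat).max() <= scale)  # True: error within one step
```

Real runtimes like TFLite or ArmNN calibrate the scale and zero-point per tensor or per channel, but the underlying arithmetic is the same.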

PS

Firefly-RK3399 has a PCIE M2 slot, so theoretically you can plug in PCIE accelerator sticks there? =)
It also runs on Ubuntu?

#hardware
#deep_learning
Has anyone tried ZeroRedundancyOptimizer?

https://pytorch.org/tutorials/recipes/zero_redundancy_optimizer.html
Anonymous Poll
4%
Yes (please comment about your experience)
29%
No
67%
What is ZeroRedundancyOptimizer?
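For context: ZeroRedundancyOptimizer implements the ZeRO stage-1 idea. Instead of every DDP rank replicating the full optimizer state (e.g. Adam's moment buffers), each rank keeps state only for its own shard of the parameters and broadcasts the updated weights after the step. A toy single-process sketch of just the sharding logic (not the torch API):

```python
# Toy illustration of ZeRO stage-1 sharding: optimizer state is
# partitioned across ranks instead of replicated on every rank.
def shard_params(param_sizes, world_size):
    """Greedily assign each parameter to the least-loaded rank."""
    shards = [[] for _ in range(world_size)]
    loads = [0] * world_size
    # Largest-first keeps the shards balanced.
    for idx, size in sorted(enumerate(param_sizes), key=lambda p: -p[1]):
        rank = loads.index(min(loads))
        shards[rank].append(idx)
        loads[rank] += size
    return shards, loads

# 6 parameter tensors (sizes in elements), sharded across 4 ranks.
sizes = [512, 256, 256, 128, 64, 64]
shards, loads = shard_params(sizes, 4)
print(shards, loads)
```

The actual API from the linked recipe wraps a regular optimizer, roughly `ZeroRedundancyOptimizer(model.parameters(), optimizer_class=torch.optim.Adam, lr=...)`, and requires an initialized process group; the memory saved grows with the optimizer's per-parameter state and the world size.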
Forwarded from Silero News (Alexander)
🚀 Huge update for English STT in Silero Models - Continued 2 🚀

❗️ CE and EE benchmarks

❗️ Quantized model benchmarks

❗️ xsmall model benchmarks

https://github.com/snakers4/silero-models/wiki/Quality-Benchmarks#en-v3
2021 DS / ML Digest 04

📌 Highlights

- TalkNet 2
- A proper transformer S2S example (a breath of fresh air, really)
- The state of transformers in computer vision
- ZeRO via DeepSpeed and FairScale benchmarks
- Can Vision Transformers Learn without Natural Images?


💎 Spotlight

📌 ... goes to this blog post:

Docker security best practices - https://sysdig.com/blog/dockerfile-best-practices (this is not just another bs top-10 article - these pieces of advice actually make sense!)

📌 ... and to this paper:

Full Page Handwriting Recognition via Image to Sequence Extraction - http://arxiv.org/abs/2103.06450

There are no proper dependable modern OCR systems on the market! This paper shares our values and builds an in-the-wild system ... though the authors did it on university money and resold it to business (and no OSS either).

Please like / share / repost!

- https://spark-in.me/post/2021_ds_ml_digest_04

#digest
Einops and Einsum in PyTorch

Previously there was an attempt at making DL code more readable via named tensors (still a prototype; they "imported" a third-party library). A cool idea, but I have never really seen anyone use it (myself included).

Now a similar thing is happening with the (not exactly new) Einstein notation for deep learning:

- https://pytorch.org/docs/stable/generated/torch.einsum.html
- https://stackoverflow.com/questions/55894693/understanding-pytorch-einsum
- https://github.com/arogozhnikov/einops

Will it stick? No idea. Einsum may be a blessing for some complex code, but it is not necessarily more readable in general if you are used to the basic APIs (bmm, for example).

Also I believe it may be adopted into PyTorch as syntactic sugar.
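As a quick taste of the notation, here is bmm written as an einsum. The example uses NumPy for self-containment, but torch.einsum accepts the identical subscript string:

```python
import numpy as np

# Batched matrix multiplication: the einsum equivalent of torch.bmm.
# 'bij,bjk->bik' reads: for each batch b, contract over the shared index j.
a = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)
b = np.arange(2 * 3 * 2, dtype=float).reshape(2, 3, 2)

out = np.einsum('bij,bjk->bik', a, b)
ref = a @ b  # same result via the batched matmul operator
print(out.shape)              # (2, 2, 2)
print(np.allclose(out, ref))  # True
```

The appeal is that the index string documents the contraction explicitly, where bmm leaves the convention implicit in the docs.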

#deep_learning