Spark in me
Lost like tears in rain. DS, ML, a bit of philosophy and math. No bs or ads.
Stellar No BS Articles

A ConvNet for the 2020s - https://arxiv.org/pdf/2201.03545.pdf

#no_bs
3090 vs 3090 Ti
TL;DR - the 3090 Ti is a factory-overclocked 3090 with a 450W TDP
Forwarded from Silero News (Alexander)
High Quality Ukrainian TTS

We have published a high quality V2 Ukrainian male voice - mykyta_v2.

Key features:

- 8000, 24000, 48000 Hz sampling rates - support for ultra high quality (wideband)
- Unique high quality voice with good articulation
- Same speed as V2 models
- No automatic stress support

Link - https://github.com/snakers4/silero-models#models-and-speakers

This is the first step towards a huge new release; we typically adopt new features via a tick-tock strategy.

This is mostly a beta model; much faster models for all languages and all speakers are coming soon.
Stellar No BS Articles

ConvMixer: Patches Are All You Need?

- An extremely simple model (~30 lines of code in a readable format, <10 lines golfed) - a patch embedding layer plus some wide conv layers
- Claims competitive quality given its simplicity
- No huge resources poured into its design, unlike EffNets / RegNets / NASNets etc.

- https://github.com/locuslab/convmixer
- https://arxiv.org/pdf/2201.09792v1.pdf
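For reference, the readable ~30-line version is roughly this - a sketch in PyTorch close to the code in the repo above (hyperparameter defaults here are illustrative):

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    """Wraps a module with a skip connection: x + fn(x)."""
    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x) + x

def ConvMixer(dim, depth, kernel_size=9, patch_size=7, n_classes=1000):
    return nn.Sequential(
        # Patch embedding: a strided conv turns the image into dim-channel patches
        nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size),
        nn.GELU(),
        nn.BatchNorm2d(dim),
        *[nn.Sequential(
            # Depthwise conv mixes spatial locations, wrapped in a residual
            Residual(nn.Sequential(
                nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
                nn.GELU(),
                nn.BatchNorm2d(dim),
            )),
            # Pointwise (1x1) conv mixes channels
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(),
            nn.BatchNorm2d(dim),
        ) for _ in range(depth)],
        nn.AdaptiveAvgPool2d((1, 1)),
        nn.Flatten(),
        nn.Linear(dim, n_classes),
    )
```

That is the whole architecture - no attention, no positional embeddings, just convs.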

Looks like a resurgence of plain, working ideas. ConvNeXt, RepVGG and now this.

#no_bs
Yet Another Farcical Self-Delusion?

I do not currently work with text embedding models, but threads like this are utterly hilarious:

https://twitter.com/Nils_Reimers/status/1487014195568775173?s=20&t=z8jAsiDgoASIOqppzhNnkQ

If anyone does, please tell me if this is biased. But back when I worked with such models ... the low-key public models published by FAIR / Google worked decently, so idk.

If OpenAI is in reality as useful as Tesla car service ... well, you know =)

I can only add that when we were looking for a compact multi-language base transformer model for fine-tuning ... the best we found was dated ~2019, which I find fucking hilarious.

But of course there were several people re-uploading the most popular models from 2018 by the hundreds ... claiming to make them more compact ... just by cutting the embeddings unnecessary for a given language.
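The embedding-cutting trick itself is trivial - a hypothetical numpy sketch (function and parameter names are mine, not from any of those re-uploads):

```python
import numpy as np

def prune_embeddings(emb, keep_ids):
    """Keep only the embedding rows for token ids actually used by one
    language's corpus; returns the smaller matrix and an old-id -> new-id
    remapping for re-tokenization."""
    keep_ids = sorted(set(keep_ids))
    remap = {old: new for new, old in enumerate(keep_ids)}
    return emb[keep_ids], remap
```

Since embeddings dominate the parameter count of small multilingual models, this alone shrinks the checkpoint a lot - but it is hardly novel research.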

#no_bs
Digest 2022-01

# Speech

AI that understands speech by looking as well as hearing - https://ai.facebook.com/blog/ai-that-understands-speech-by-looking-as-well-as-hearing

HuBERT: Self-supervised representation learning for speech recognition, generation, and compression - https://ai.facebook.com/blog/hubert-self-supervised-representation-learning-for-speech-recognition-generation-and-compression

# ML

Graph neural networks - https://dyakonov.org/2021/12/30/gnn/
A Gentle Introduction to Graph Neural Networks - https://distill.pub/2021/gnn-intro/
GPT-3, Foundation Models, and AI Nationalism - https://lastweekin.ai/p/gpt-3-foundation-models-and-ai-nationalism
The Illustrated Retrieval Transformer - https://jalammar.github.io/illustrated-retrieval-transformer/
You get what you measure: New NLU benchmarks for few-shot learning and robustness evaluation - https://www.microsoft.com/en-us/research/blog/you-get-what-you-measure-new-nlu-benchmarks-for-few-shot-learning-and-robustness-evaluation/
Azure AI milestone: New foundation model Florence v1.0 advances state of the art, topping popular computer vision leaderboards - https://www.microsoft.com/en-us/research/blog/azure-ai-milestone-new-foundation-model-florence-v1-0-pushing-vision-and-vision-language-state-of-the-art/
Language modelling at scale: Gopher, ethical considerations, and retrieval - https://deepmind.com/blog/article/language-modelling-at-scale
Sequence-to-sequence learning with Transducers - https://lorenlugosch.github.io/posts/2020/11/transducer/
A contemplation of logsumexp - https://lorenlugosch.github.io/posts/2020/06/logsumexp/
Meta claims its AI improves speech recognition quality by reading lips - https://venturebeat.com/2022/01/07/meta-claims-its-ai-improves-speech-recognition-quality-by-reading-lips/
Training 100B models is fucking hard - https://github.com/bigscience-workshop/bigscience/blob/master/train/lessons-learned.md
Scaling Vision with Sparse Mixture of Experts - https://ai.googleblog.com/2022/01/scaling-vision-with-sparse-mixture-of.html
Model interpretation and data-shift diagnostics: LIME, SHAP and Shapley Flow - https://habr.com/ru/company/ods/blog/599573/
A ConvNet for the 2020s - https://arxiv.org/pdf/2201.03545.pdf
LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything - https://ai.googleblog.com/2022/01/lamda-towards-safe-grounded-and-high.html
Separating Birdsong in the Wild for Classification - https://ai.googleblog.com/2022/01/separating-birdsong-in-wild-for.html
Accurate Alpha Matting for Portrait Mode Selfies on Pixel 6 - https://ai.googleblog.com/2022/01/accurate-alpha-matting-for-portrait.html
The Gradient Update #16: China's World-leading Surveillance Research and a ConvNet for the 2020s - https://thegradientpub.substack.com/p/the-gradient-update-16-chinas-world
Does Gradient Flow Over Neural Networks Really Represent Gradient Descent? - http://www.offconvex.org/2022/01/06/gf-gd/
Does Your Medical Image Classifier Know What It Doesn’t Know? - https://ai.googleblog.com/2022/01/does-your-medical-image-classifier-know.html
Introducing Text and Code Embeddings in the OpenAI API - https://openai.com/blog/introducing-text-and-code-embeddings/
Steering Towards Effective Autonomous Vehicle Policy - https://thegradient.pub/engaging-with-disengagement/

Introducing StylEx: A New Approach for Visual Explanation of Classifiers
- https://ai.googleblog.com/2022/01/introducing-stylex-new-approach-for.html
- https://www.youtube.com/watch?v=mbrka3vBjH8
- TL;DR - very cool, but most likely requires a lot of compute
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
- [Illustration](https://scontent-arn2-1.xx.fbcdn.net/v/t39.2365-6/271815807_4636921079718503_8613393990345138136_n.gif?_nc_cat=107&ccb=1-5&_nc_sid=ad8a9d&_nc_ohc=yn27DielBOYAX8rk045&_nc_ht=scontent-arn2-1.xx&oh=00_AT8ueSOOllDdunQw26KIBUYwyoOq_b1leSPKrmSfZoeazA&oe=61F26871)
- [Link](https://ai.facebook.com/blog/the-first-high-performance-self-supervised-algorithm-that-works-for-speech-vision-and-text/)
- These are actually 3 separate models (!) - marketing lies as usual
- No clear indication, but the NLP model uses 16 GPUs, others - not specified
- The first high-performance self-supervised algorithm that works for speech, vision, and text
- Trained by predicting the model representations of the full input data given a partial view of the input
- Standard Transformer architecture with a modality-specific encoding
- The encoding of the unmasked training sample is parameterized by an exponentially moving average of the model parameters
- Training targets based on the output of the top K blocks of the teacher network for time-steps which are masked in student mode
- We apply a normalization to each block before averaging the top K blocks
- For speech representations, we use instance normalization
- For NLP and vision, parameter-less layer normalization worked well
- 800 epochs, 86M parameters and 307M parameters
- Smooth L1 loss
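Two of the mechanics above - the EMA-updated teacher and the smooth L1 objective - can be sketched in toy numpy form (names and the tau value here are illustrative, not taken from the released code):

```python
import numpy as np

def ema_update(teacher, student, tau=0.999):
    """Teacher weights track the student as an exponential moving average:
    teacher <- tau * teacher + (1 - tau) * student."""
    return {k: tau * teacher[k] + (1 - tau) * student[k] for k in teacher}

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss: quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).mean()
```

The student regresses the teacher's averaged top-K block outputs on masked timesteps with this loss; the teacher is never trained directly.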


HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
- arXiv: 2106.07447
- Offline clustering step to provide aligned target labels for a BERT-like prediction loss
- Applying the prediction loss over the masked regions only
- Relies on the consistency of the unsupervised clustering step rather than the intrinsic quality of the assigned cluster labels
- Acoustic unit discovery models to provide frame-level targets
- How to mask and where to apply the prediction loss:
- p% of the timesteps are randomly selected as start indices, and spans of l steps are masked
- cross-entropy loss computed over masked and unmasked timesteps, weighted, α parameter
- α = 1 is more resilient to the quality of cluster targets, which is demonstrated in our experiments
- Multiple clusterings, iterative refinement starting with MFCC features
- Convolutional waveform encoder, a BERT encoder, a projection layer and a code embedding layer
- BASE, LARGE, and X-LARGE - 95M, 317M, 964M
- ![image](https://user-images.githubusercontent.com/12515440/150782226-92accb43-380a-4e0f-91f5-86fdba4624ce.png)
- Convolutional encoder generates a feature sequence at a 20ms framerate for audio sampled at 16kHz (down-sampling factor of 320x: 16000 samples/s × 0.02 s = 320 samples per frame)
- After pre-training, CTC loss for ASR fine-tuning of the whole model weights except the convolutional audio encoder, which remains frozen
- CTC target vocabulary includes 26 English chars + space + apostrophe + CTC blank
- 960h of LibriSpeech + 60kh of Libri-light
- First iteration labels: 960 hour LibriSpeech training set, k-means clustering with 100 clusters on 39-dimensional MFCC features, which are 13 coefficients with the first and the second-order derivatives
- For the subsequent iterations, k-means clustering with 500 clusters on the latent features from the HuBERT model pre-trained in the previous iteration
- MiniBatchKMeans
- BASE - two iterations on the 960h on 32 GPUs (batch size of at most 87.5 seconds of audio per GPU), 250k steps
- LARGE and X-LARGE for one iteration on 60kh on 128 and 256 GPUs, respectively, for 400k steps
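The span-masking scheme above (p% of timesteps chosen as start indices, spans of l steps masked) can be sketched like this - a toy numpy version where the p and span defaults are illustrative:

```python
import numpy as np

def compute_mask(num_steps, p=0.08, span=10, rng=None):
    """HuBERT-style span masking: each timestep becomes a span start with
    probability p, and `span` consecutive steps are masked from each start
    (overlapping spans simply merge)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = np.zeros(num_steps, dtype=bool)
    for start in np.flatnonzero(rng.random(num_steps) < p):
        mask[start:start + span] = True
    return mask
```

The boolean mask then selects which frames get replaced by a learned mask embedding and which positions the (α-weighted) cross-entropy is computed over.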


#digest
Digest 2022-01

# Blogs

My first impressions of web3 - https://moxie.org/2022/01/07/web3-first-impressions.html
Dependency Risk and Funding - https://lucumr.pocoo.org/2022/1/10/dependency-risk-and-funding/
Tech questions for 2022 - https://www.ben-evans.com/benedictevans/2022/1/2/2022-questions
5 dirty tricks in competitive Data Science they won't tell you about in polite society - https://habr.com/ru/post/600067/
Proof of stake is a scam and the people promoting it are scammers - https://yanmaani.github.io/proof-of-stake-is-a-scam-and-the-people-promoting-it-are-scammers/
Bitcoin will never be a stable currency - https://yanmaani.github.io/bitcoin-will-never-be-a-stable-currency/
Understanding the SSH Encryption and Connection Process - https://www.digitalocean.com/community/tutorials/understanding-the-ssh-encryption-and-connection-process
How does Ethereum work? - https://habr.com/ru/post/407583/
New data: What developers look for in future job opportunities - https://stackoverflow.blog/2021/12/07/new-data-what-developers-look-for-in-future-job-opportunities/
On fake cryptocurrencies - https://habr.com/ru/post/544700/
Journalism, media, and technology trends and predictions 2022 - https://reutersinstitute.politics.ox.ac.uk/journalism-media-and-technology-trends-and-predictions-2022
Seoul Robotics launches Level 5 Control Tower to enable autonomous mobility - https://www.therobotreport.com/seoul-robotics-launches-level-5-control-tower-to-enable-autonomous-mobility/
How no-code AI development platforms could introduce model bias - https://venturebeat.com/2022/01/06/how-no-code-ai-development-platforms-could-introduce-model-bias/
Please stop calling admins DevOps - https://habr.com/ru/post/646581/
Fast subsets of large datasets with Pandas and SQLite - https://pythonspeed.com/articles/indexing-pandas-sqlite/
Secure your GitHub account with GitHub Mobile 2FA - https://github.blog/2022-01-25-secure-your-github-account-github-mobile-2fa/
One machine can go pretty far if you build things properly - https://rachelbythebay.com/w/2022/01/27/scale/
ML and NLP Research Highlights of 2021 - https://ruder.io/ml-highlights-2021/

#digest
Digest 2022-01

# Hardware

Using AI to Manage Internal SSD Parameters - https://thessdguy.com/using-ai-to-manage-internal-ssd-parameters/
Solidigm, SK hynix’ New SSD/Flash Subsidiary - https://thessdguy.com/solidigm-sk-hynix-new-ssd-flash-subsidiary/
EPYC 7773X benchmarks published: 64 cores, 768 MB of L3 cache - https://geekr.vercel.app/post/645203
Micron’s Tiny Little 2TB SSD - https://thessdguy.com/microns-tiny-little-2tb-ssd/
3090 Ti crazy prices - https://habr.com/ru/news/t/645881/

# Code

Object ownership across programming languages - https://codewithoutrules.com/2017/01/26/object-ownership/
Integrate-first approach - https://unstructed.tech/2022/01/10/integrate-first-approach/
Docker vs. Singularity for data processing: UIDs and filesystem access - https://pythonspeed.com/articles/containers-filesystem-data-processing/
Some more info about it - https://www.reddit.com/r/docker/comments/7y2yp2/why_is_singularity_used_as_opposed_to_docker_in/
Memory location matters for performance - https://pythonspeed.com/articles/performance-memory-locality/
"Pogromist" (a pun on "programmer"): my most epic failures of my entire career - https://habr.com/ru/post/646393/
3 Things You Might Not Know About Numbers in Python - https://davidamos.dev/three-things-you-might-not-know-about-numbers-in-python/
The fastest way to read a CSV in Pandas - https://pythonspeed.com/articles/pandas-read-csv-fast/

#digest
imodels: leveraging the unreasonable effectiveness of rules


Looks like a cool EDA / model-based data exploration tool for tabular data:

- Illustration
- https://bair.berkeley.edu/blog/2022/02/02/imodels/
- https://github.com/csinva/imodels

Not another 1-trillion-param neural network, nor more AI fairness or policy bs.
Smart shop MVP in Moscow, Russia
Looks like it will work only for low traffic
The other day VkusVill opened a smart store. It opened quite close to me, so I decided to go and see how it works. I have already reviewed the smart store from Azbuka Vkusa, so there is something to compare it with :)
https://youtu.be/JZP9z1jc54s
By the way, I like this type of automation even more
PyTorch TLDR Retrospective

I really enjoy the constructive approach and humility of this guy:

- https://soumith.ch/posts/2022/01/pytorch-retro/

This is in such a contrast with the usual day-to-day corporate bs!

Also, I just learned that there is a GitHub Stars program, which would most likely be dominated by frontend devs, but whatever.

I personally would nominate @searchivarius, @soumith and @lmcinnes.

Who would you nominate?

#no_bs

Also insert a comment here on token achievements and the fact that, despite some people sharing truly awe-inspiring stuff, it will be subverted, weaponized and milked by capitalism. But I digress.