Compressed Feather in Pandas
A nifty feature in pandas I totally missed - saving not only .csv data frames compressed, but also .feather ones. Reduces file sizes 4-5x for repetitive data.
- Pandas to feather doc
- Pyarrow to feather doc
#data_science
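Not from the original post - a minimal sketch of what the compressed round-trip looks like; the compression keyword is simply forwarded by pandas to pyarrow.feather.write_feather, and the codec / file names here are arbitrary:

import pandas as pd

df = pd.DataFrame({'label': ['cat', 'dog', 'cat'] * 100_000,
                   'score': [0.1, 0.9, 0.2] * 100_000})

# kwargs are passed through to pyarrow; 'zstd', 'lz4' and 'uncompressed' are the supported codecs
df.to_feather('data.feather', compression='zstd')

# compression is detected automatically on read
df2 = pd.read_feather('data.feather')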
A First Legitimate Use of Crypto?
Seems a bit ridiculous, but looks like a decent way for bands / artists to hold auctions for their creations:
- https://opensea.io/collection/clowncoin
PyTorch 1.8 Released
- https://pytorch.org/blog/pytorch-1.8-released/
- https://github.com/pytorch/pytorch/releases
Apart from mostly fixes and some nice quantization (still no transformer!) and ONNX improvements, I really like these additions:
(0) PyTorch Lite Interpreter is a streamlined version of the PyTorch runtime that can execute PyTorch programs on resource-constrained devices with a reduced binary size footprint. This prototype feature reduces binary sizes by up to 70% compared to the current on-device runtime in the current release. (Link)
(1) Starting in PyTorch 1.8, we have added support for ROCm wheels, providing easy onboarding for AMD GPUs. (Link)
(2) New beta benchmark utils (Link)
(3) New PyTorch Mobile demos
(4) New quantization API (link)
(5) New related library releases (i.e. torchaudio, torchvision) - looks like they are tied to PyTorch releases now
#deep_learning
PyTorch New Quantization API
A brief summary of why PyTorch has a new prototype API for quantization - looks like the previous API was too difficult? It wasn't really, but it required some fiddling, and non-standard layers just did not work.
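Not from the post - roughly what the new FX graph-mode prototype flow looked like around PyTorch 1.8 (the API has been reshuffled in later releases, so treat this as a sketch rather than a reference); the toy model and calibration data below are made up for illustration:

import torch
import torch.nn as nn
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx

float_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
calibration_batches = [torch.randn(8, 16) for _ in range(10)]

# 'fbgemm' is the x86 backend
qconfig_dict = {'': get_default_qconfig('fbgemm')}

# FX traces the model and inserts observers automatically
prepared = prepare_fx(float_model, qconfig_dict)
with torch.no_grad():
    for batch in calibration_batches:
        prepared(batch)  # calibrate the observers
quantized = convert_fx(prepared)  # int8 model
print(quantized)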
Torch FX
- https://pytorch.org/docs/master/fx.html
"FX is a toolkit for developers to use to transform nn.Module instances. FX consists of three main components: a symbolic tracer, an intermediate representation, and Python code generation."
I understand that the people building PyTorch usually favour flexible toolkits (and they expose a lot to the end user), and most likely they just realized that static quantization was too complex for an average user to handle, so they wrote this as an engine for automated quantization transformations, which is cool. Designing a proper API is always a balancing act.
Over the years, I became quite good at monkey patching PyTorch code using just Python's and PyTorch's own tools (e.g. module.named_modules()). So I wonder what the killer use case of this feature would be?
One thing comes to mind immediately - when you have the same model with static control flows and you need to create a quantized / TorchScript version of it. Right now this is a pain in the ass, because it requires manually switching them back and forth (switch on, create one quantized TorchScript version, switch back, create another one, etc.).
Will I use it? I guess I need to sleep on it. We ended up not using static quantization very much. It looks very cool and flexible and serves a real purpose, but usually stupid one-line hacks can do the same without learning a new tool.
So idk, what do you think? Do you like any of the examples? I like the invert one.
#deep_learning
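Not from the docs page verbatim - a minimal sketch of the three FX pieces in action, assuming torch >= 1.8; the Toy module and the add-to-mul rewrite are made up for illustration:

import torch
import torch.fx


class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.add(torch.relu(x), 1.0)


# symbolic tracer -> intermediate representation -> regenerated Python code
gm = torch.fx.symbolic_trace(Toy())
print(gm.graph)  # the IR
print(gm.code)   # the generated forward()

# a toy graph rewrite: replace every torch.add call with torch.mul
for node in gm.graph.nodes:
    if node.op == 'call_function' and node.target is torch.add:
        node.target = torch.mul
gm.graph.lint()
gm.recompile()

print(gm(torch.tensor([-1.0, 2.0])))  # now computes relu(x) * 1.0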
New Benchmarking Tool in PyTorch
https://pytorch.org/tutorials/recipes/recipes/benchmark.html#pytorch-benchmark
Looks a bit over-complicated at first glance (why provide classes for random tensor generation, I have no idea), but it has a few very nice features:
- Automated num_threads handling
- Automated CUDA synchronization
- Report generation, storing the results, comparing the results
But I suppose there is nothing wrong with just using %%timeit and manually setting num_threads.
#deep_learning
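Not from the recipe - a minimal sketch of the Timer from torch.utils.benchmark, assuming PyTorch >= 1.8; the shapes, num_threads and label are arbitrary:

import torch
import torch.utils.benchmark as benchmark

x = torch.randn(1024, 1024)
y = torch.randn(1024, 1024)

# Timer pins num_threads for the measurement and handles CUDA synchronization
t = benchmark.Timer(
    stmt='x @ y',
    globals={'x': x, 'y': y},
    num_threads=4,
    label='matmul 1024x1024',
)

print(t.timeit(100))          # fixed number of runs
print(t.blocked_autorange())  # picks the number of runs automatically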
Building Your Own Supercomputer Cheap (RU)
My guest post on ODS @ habr:
- https://habr.com/ru/company/ods/blog/546808/
EDIT - some awesome comments!
#deep_learning
BIFURCATED RISER X16 TO 2X8 (SET)
Remember that there is a very limited number of motherboards with 5+ PCIE slots?
Now there are risers like this - https://riser.maxcloudon.com/ru/bifurcated-risers/25-bifurcated-riser-x16-to-2x8-set.html
Has anyone tried something similar for DL?
#deep_learning
While ThreadRipper Pro MBs are impossible to buy, this MB may be the base for our next huge server build:
- https://market.yandex.ru/product--materinskaia-plata-asrock-rack-romed8-2t/705623617
Looks a bit expensive (and it uses ECC RAM + EPYC processors), but with 7 x PCIe 4.0 x16 slots and 2 x 10 Gbit/s Ethernet the possibilities are limitless.
And another hack - buying used 100 Gbit/s InfiniBand cards from eBay; they are cheap in the US now.
#deep_learning
TLDR Doom Eternal Review
The og game - once in a generation game. 11/10. Go play it. Beat ultra nightmare.
DLC 1 - epic, beautiful and ball crushing. Beat ultra nightmare if you have the balls.
DLC 2 - fun, beautiful, but a bit rushed and disappointing ending / boss. Does not up the ante for seasoned players at all.
#off_topic
Forwarded from Silero News (Alexander)
Silero VAD Micro 8 kHz
A 10k-param VAD has been added (vs. 1.1m params), for 8 kHz audio only - for when you do not want to do any resampling.
https://github.com/snakers4/silero-vad
PyTorch + AMD Inference
We were benchmarking our networks on Intel vs AMD processors using the out-of-the-box official build.
Intel is mostly better (with the same number of threads, roughly the same core speed and no overclocking). I was wondering why this is, and then I found this thread.
To be honest, I have little motivation to invest time in redoing our environment builds from scratch with OpenBLAS + CUDA (and most likely it is not worth the time, since in production there will most likely be Intel CPUs anyway).
But I wonder - does anyone in the community have dockerized dev environment builds based around CUDA + OpenBLAS? Because it looks like out of the box PyTorch ships with Intel's MKL.
#deep_learning
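Not from the post - if you want to check what your own build links against, PyTorch exposes its build configuration:

import torch

# shows the BLAS / LAPACK / oneDNN libraries the current build was compiled with
print(torch.__config__.show())
print('MKL available:', torch.backends.mkl.is_available())
print('MKL-DNN (oneDNN) available:', torch.backends.mkldnn.is_available())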
Factorized Networks
I really like the idea from this article - https://www.microsoft.com/en-us/research/blog/factorized-layers-revisited-compressing-deep-networks-without-playing-the-lottery/
Basically, you do not prune networks (which does not readily transfer into inference) or distill your teacher network into a student, but instead train a low-rank factorized version of the network, with some optimizations, from scratch.
This article even has code, but basically ... it is an older fork of fairseq imported via one commit. So good luck doing what the authors did not bother to do (providing a stand-alone implementation).
So the question is - has anyone seen a minimalist stand-alone implementation of something similar?
#deep_learning
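Not from the article - just to make the parameter arithmetic concrete: factorizing an m x n weight matrix into rank r costs r * (m + n) parameters instead of m * n, so the savings depend entirely on how small r can get. The layer size and rank below are purely illustrative:

def factorized_params(m: int, n: int, rank: int):
    # parameter count of a full m x n weight vs its rank-r factors U (m x r) and V (r x n)
    full = m * n
    factorized = rank * (m + n)
    return full, factorized

# e.g. a 1024 x 4096 projection factorized at 25% of the full rank (rank 256)
full, fact = factorized_params(1024, 4096, 256)
print(full, fact, f'{fact / full:.0%}')  # 4194304 1310720 31%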
Speeding Up Your Transformer Based Networks For Real
You know how there are a lot of pruning / distillation papers that boast 90% sparsification, but it is impossible to use this in production? Looks like I stumbled upon a decent recipe.
Well, you can take a well-trained transformer-based network and just replace all of the Linear layers with their low-rank counterparts initialized using SVD (spectral initialization).
But does it really work?
I tested it today, it worked.
How well does it work?
On a not-quite-trained network, my loss after 1 epoch of tuning was 25-30% higher with the factorized model compared to the full model. I took 25% of the eigenvalues, and they accounted for about 50% of the total.
The metrics took a significant hit, but on simpler classification tasks it should work better.
What are the benefits?
25% factorization (i.e. taking only 25% of all eigenvalues) produces a model that is 50% smaller.
Is it worth it?
The main question is whether you can take either (i) a poorly trained or (ii) a well-trained network and train it until it reaches the same numbers as the full model.
This remains to be seen.
import torch
import torch.nn as nn


class FactorizedLinear(nn.Module):
    """Drop-in replacement for an existing nn.Linear, initialized via truncated SVD."""
    def __init__(self, or_linear, dim_ratio=1.0):
        super().__init__()
        self.bias = nn.parameter.Parameter(or_linear.bias.data, requires_grad=True)
        u, vh = self.spectral_init(or_linear.weight.data, dim_ratio=dim_ratio)
        print(f'Doing SVD of tensor {or_linear.weight.shape}, U: {u.shape}, Vh: {vh.shape}')
        self.u = nn.parameter.Parameter(u, requires_grad=True)
        self.vh = nn.parameter.Parameter(vh, requires_grad=True)
        self.dim_ratio = dim_ratio
        # weight is (out_features, in_features), so u is (out, r) and vh is (r, in)
        self.out_features = u.size(0)
        self.in_features = vh.size(1)

    @staticmethod
    def spectral_init(m, dim_ratio=1):
        u, s, vh = torch.linalg.svd(m, full_matrices=False)
        u = u @ torch.diag(torch.sqrt(s))
        vh = torch.diag(torch.sqrt(s)) @ vh
        if dim_ratio < 1:
            dims = int(u.size(1) * dim_ratio)
            u = u[:, :dims]
            vh = vh[:dims, :]
            s_share = s[:dims].sum() / s.sum() * 100
            print(f'SVD eigenvalue share {s_share:.2f}%')
        return u, vh

    def extra_repr(self) -> str:
        return (f'in_features={self.in_features}, '
                f'out_features={self.out_features}, '
                f'bias=True, dim_ratio={self.dim_ratio}')

    def forward(self, x):
        # (u @ vh) reconstructs the (truncated) weight matrix of shape (out, in)
        return x @ (self.u @ self.vh).transpose(0, 1) + self.bias

#deep_learning
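Not from the original post - a hedged sketch of how one could swap all Linear layers in an existing model for FactorizedLinear via named_modules(); the helper name and dim_ratio value are just illustrative:

def factorize_linears(model, dim_ratio=0.25):
    # walk over parent modules and replace their nn.Linear children in place;
    # bias-free Linear layers are skipped, since FactorizedLinear above assumes a bias
    for _, parent in list(model.named_modules()):
        for name, child in list(parent.named_children()):
            if isinstance(child, nn.Linear) and child.bias is not None:
                setattr(parent, name, FactorizedLinear(child, dim_ratio=dim_ratio))
    return model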
Forwarded from Silero News (Alexander)
Silero TTS Released
Surprise! A quick pre-release of Silero Text-to-Speech models!
Speakers
10 voices (each available in 16 kHz and 8 kHz):
- 6 Russian voices;
- 1 English voice;
- 1 German voice, 1 Spanish voice, 1 French voice;
Why is this Different?
- One-line usage;
- A large library of voices;
- A fully end-to-end pipeline;
- Naturally sounding speech;
- No GPU or training required;
- Minimalism and lack of dependencies;
- Faster than real-time on one CPU thread (!!!);
- Support for 16 kHz and 8 kHz out of the box;
Links
- Try our TTS models here;
- Quick summary;
- Performance benchmarks;
Stay tuned for much more detailed PR releases and a torch.hub release soon!