Vol Building AGI
Past topics: speech synthesis, transformers, LSTM, recurrence
Channel name was changed to «Vol Building AGI»
o1 excels at generating GPU kernels that are harder than level 1
ARC-AGI has been solved. Apply for safety testing of o3: https://openai.com/12-days/
Heatmap of additively smoothed log probabilities of character bigrams, log p(target|source), in Common Voice uk 10.0. Every word is padded with a space on each side.

що looks like a very popular word. ї is missing entirely! You can see vowels making up distinct columns and rows.
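A minimal sketch of how such a bigram table could be computed, assuming rows index the source character and columns the target. The function name `bigram_log_probs`, the smoothing constant `alpha=1.0` and the three-word toy corpus are my own illustration, not the code behind the heatmap.

```python
import numpy as np

def bigram_log_probs(words, alpha=1.0):
    """Additively smoothed log p(target | source) over character bigrams."""
    # Pad every word with a space on each side, as described in the post.
    padded = [f" {w} " for w in words]
    alphabet = sorted(set("".join(padded)))
    index = {ch: i for i, ch in enumerate(alphabet)}
    counts = np.zeros((len(alphabet), len(alphabet)))
    for word in padded:
        for source, target in zip(word, word[1:]):
            counts[index[source], index[target]] += 1
    # Additive smoothing: add alpha to every cell, then normalize each row.
    smoothed = counts + alpha
    log_p = np.log(smoothed) - np.log(smoothed.sum(axis=1, keepdims=True))
    return alphabet, log_p

# Toy corpus (hypothetical); the real input would be the Common Voice transcripts.
alphabet, log_p = bigram_log_probs(["що", "це", "що"])
```

Passing `log_p` to `plt.imshow` with `alphabet` as tick labels should give a heatmap of the same kind.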
Plotting columns of the DFT basis with matplotlib looks very vibrant out of the box. A small enough ratio of the window size to the sampling rate allows for stripes to reveal underlying higher frequency filters at the back.


import numpy as np

def dft(size=512, rate=16000, low=50):
    # Frequencies in Hz from `low` up to (excluding) Nyquist; times in seconds.
    k = np.linspace(low, rate / 2, size, endpoint=False)
    t = np.arange(size) / rate
    return np.exp(-2j * np.pi * k[:, None] * t)
Why you want high-precision accumulation while doing low-precision computations
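A toy illustration of the effect, assuming float16 inputs and a plain sequential sum (the helper `accumulate` and the numbers are mine, not from the post). Once the float16 accumulator grows large enough, each small addend falls below half of its spacing and gets rounded away, so the sum stalls well below the true value of about 100. A float32 accumulator over the same float16 inputs stays close to it.

```python
import numpy as np

def accumulate(values, acc_dtype):
    # Sequentially add `values` into an accumulator of type `acc_dtype`.
    acc = acc_dtype(0)
    for v in values:
        acc = acc_dtype(acc + acc_dtype(v))
    return acc

values = np.full(10_000, 0.01, dtype=np.float16)  # low-precision inputs
low = accumulate(values, np.float16)   # low-precision accumulator
high = accumulate(values, np.float32)  # high-precision accumulator
print(float(low), float(high))
```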
I installed ghostty so my terminal can render images and have fragment shaders applied on the whole window.
Let's animate the process of extracting a spectrogram of 13 Mel-Frequency Cepstral Coefficients (MFCC) from an MP3 file.
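For reference, a rough NumPy-only sketch of the usual MFCC pipeline: framing, windowing, power spectrum, mel filterbank, log, DCT. Everything here is an assumption for illustration: the frame size, hop, 40 filters, the unnormalized DCT-II, and a synthetic sine wave standing in for the decoded MP3 (decoding needs an audio library, which I leave out). It is not the code used for the animation.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, rate):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(rate / 2), n_filters + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / rate)
    lower, center, upper = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (freqs - lower) / (center - lower)
    falling = (upper - freqs) / (upper - center)
    return np.clip(np.minimum(rising, falling), 0.0, None)  # (n_filters, n_fft // 2 + 1)

def mfcc(signal, rate=16000, n_fft=512, hop=160, n_filters=40, n_coeffs=13):
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # Power spectrum, then mel-band energies, then log compression.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, rate).T + 1e-10)
    # Unnormalized DCT-II over the filter axis; keep the first n_coeffs rows.
    n = np.arange(n_filters)
    basis = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_coeffs)[:, None])
    return log_mel @ basis.T  # (n_frames, n_coeffs)

# One second of a 440 Hz tone as a stand-in for real audio.
t = np.arange(16000) / 16000
features = mfcc(np.sin(2 * np.pi * 440 * t))
```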
To update Gaussian mixture models with 16384 components on 13-dimensional MFCC frames using expectation maximization, I need to initialize the mixtures. The simplest data-driven initializer for GMMs is to take cluster centroids.

I decided to compare three algorithms for clustering:

1. random sampling (usually a decent init for other algorithms, no hyperparameters)
2. minibatch Lloyd (performs EMA updates on mini batches, has one EMA weight hyperparameter)
3. Linde-Buzo-Gray (LBG) with minibatch Lloyd refinement, a classic algorithm for computing quantization codebooks. Its main trick is to start with a clustering of size 1, and progressively double the clustering size by perturbing the current centroids and refining them with k-means.
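A minimal sketch of the LBG doubling idea from item 3, under a few assumptions of mine: full-batch Lloyd refinement instead of the minibatch variant (fine for toy data, not for 11 million frames), an additive split of `eps` standard deviations, and a power-of-two codebook size. The names `assign`, `lloyd` and `lbg` are hypothetical.

```python
import numpy as np

def assign(data, centroids):
    # Nearest centroid per point and the mean squared quantization error.
    d = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), d.min(axis=1).mean()

def lloyd(data, centroids, steps=10):
    # Plain full-batch k-means refinement of the given centroids.
    for _ in range(steps):
        labels, _ = assign(data, centroids)
        for k in range(len(centroids)):
            members = data[labels == k]
            if len(members):  # leave empty clusters where they are
                centroids[k] = members.mean(axis=0)
    return centroids

def lbg(data, n_clusters, eps=1e-2, steps=10):
    # Start from one centroid (the global mean) and double until done.
    centroids = data.mean(axis=0, keepdims=True)
    offset = eps * data.std(axis=0)
    while len(centroids) < n_clusters:
        # Split every centroid into a perturbed pair, then refine.
        centroids = np.concatenate([centroids + offset, centroids - offset])
        centroids = lloyd(data, centroids, steps)
    return centroids

rng = np.random.default_rng(0)
data = rng.normal(size=(2000, 13))  # stand-in for 13d MFCC frames
codebook = lbg(data, n_clusters=8)
```

Most refinement steps run on small codebooks, which is where the compute savings would come from.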

I have 11 million frames from Common Voice uk 10.0, so the Lloyd algorithm (classical k-means) and SVD/QR are out: they require materializing matrices that are a bit too big for my MacBook.

On the plot, the x axis is the number of steps and the y axis is the quantization loss.
I found that LBG is very compute efficient: it spends most of the time running k-means for small clusterings, so it wins in terms of wall clock (ballpark 10x faster?) in my CPU-only pure-NumPy implementation with efficient L2 distance computation. It also seems to be more data efficient: I couldn't get better results with Lloyd when I ran it for more steps.
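One common way to make the L2 distance computation efficient in NumPy is to expand the square, ‖x − c‖² = ‖x‖² − 2x·c + ‖c‖², so the heavy part becomes a single matrix multiply and the (points × centroids × dims) intermediate is never materialized. Whether this matches the implementation above is an assumption; `pairwise_sq_l2` is a name made up for this sketch.

```python
import numpy as np

def pairwise_sq_l2(x, c):
    # x: (n, d) points, c: (k, d) centroids -> (n, k) squared distances.
    x_sq = np.einsum("nd,nd->n", x, x)[:, None]
    c_sq = np.einsum("kd,kd->k", c, c)[None, :]
    # Clamp tiny negatives caused by floating-point cancellation.
    return np.maximum(x_sq - 2.0 * (x @ c.T) + c_sq, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 13))
c = rng.normal(size=(64, 13))
nearest = pairwise_sq_l2(x, c).argmin(axis=1)
```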

I didn't bother tuning the EMA learning rate too much and settled at 0.9. Progressive scaling is all we need?
Alignment self-training can work even with a single utterance. In this video, expectation maximization for a GMM acoustic model with a linear chain HMM successfully finds a plausible alignment in 30 steps.

My algorithm only updates the GMM mixture coefficients in the maximization step. The expectation step uses my implementation of scaled forward-backward recursions.

1024 GMM means are pretrained using LBG, the HMM prior is a linear chain — similar to what you see on the right hand side in the plot.
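For readers who haven't seen it, here is a rough sketch of scaled forward-backward on a linear chain, not the implementation from the video. Assumptions of mine: each state either stays or moves to the next with a fixed `stay` probability, the path must start in the first state and end in the last, and `emissions[t, s]` already holds the likelihood of frame t under state s (in the post that would come from the GMM).

```python
import numpy as np

def linear_chain(n_states, stay=0.5):
    # Each state loops on itself or advances to the next; the last one absorbs.
    A = np.zeros((n_states, n_states))
    for s in range(n_states - 1):
        A[s, s], A[s, s + 1] = stay, 1.0 - stay
    A[-1, -1] = 1.0
    return A

def forward_backward(emissions, A):
    T, S = emissions.shape
    alpha, beta, scale = np.zeros((T, S)), np.zeros((T, S)), np.zeros(T)
    # Forward pass: renormalize alpha at every step to avoid underflow.
    alpha[0, 0] = emissions[0, 0]  # path must start in the first state
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * emissions[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    # Backward pass reuses the forward scaling factors.
    beta[-1, -1] = 1.0  # path must end in the last state
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (emissions[t + 1] * beta[t + 1]) / scale[t + 1]
    # Per-frame state posteriors (the soft alignment).
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
emissions = rng.random((20, 5)) + 0.1  # toy likelihoods, 20 frames, 5 states
gamma = forward_backward(emissions, linear_chain(5))
```

Each row of `gamma` is the posterior over states for one frame. It needs at least as many frames as states, otherwise no valid path exists.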