Cheng Lu and Yang Song have solved diffusion https://arxiv.org/abs/2410.11081
arXiv.org
Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
Consistency models (CMs) are a powerful class of diffusion-based generative models optimized for fast sampling. Most existing CMs are trained using discretized timesteps, which introduce...
🤯2
https://youtu.be/ZANbujPTvOY
This graduate-level benchmark was solved by o1 less than a year after its release; it was expected to stay unsolved by language models for a while.
YouTube
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Authors: David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman
We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics…
How to build AGI, Ukrainian book from 1979. Still relevant
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=QkZlKBMAAAAJ&citation_for_view=QkZlKBMAAAAJ:CHSYGLWDkRkC
English version
https://archive.org/details/modelingofthinki0000amos/mode/2up
Google
Algorithms of the Mind (Алгоритмы разума)
N. M. Amosov, 1979 - Cited by 257
👍4
Sora Turbo is generally available now in Ukraine, https://x.com/model_mechanic/status/1866183714407141603?s=46&t=qNUYWfgTfF4u1RKfN0ir3A
❤4
Mechinterp on chain of thought circuits https://arxiv.org/abs/2406.02128
arXiv.org
Iteration Head: A Mechanistic Study of Chain-of-Thought
Chain-of-Thought (CoT) reasoning is known to improve Large Language Models both empirically and in terms of theoretical approximation power. However, our understanding of the inner workings and...
Debug neural networks by casting them as neural fields https://github.com/neale/neural-canvas
GitHub
GitHub - neale/neural-canvas: creative deep learning with implicit neural representations
creative deep learning with implicit neural representations - neale/neural-canvas
ARC-AGI has been solved. Apply for safety testing of o3: https://openai.com/12-days/
OpenAI
12 Days of OpenAI
Plotting columns of the DFT basis with matplotlib looks very vibrant out of the box. A small enough ratio of the window size to the sampling rate allows for stripes to reveal underlying higher frequency filters at the back.
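import numpy as np

def dft(size=512, rate=16000, low=50):
    k = np.linspace(low, rate / 2, size, endpoint=False)  # analysis frequencies in Hz
    t = np.arange(size) / rate  # sample times in seconds
    # Row i is the complex exponential at frequency k[i]; shape (size, size).
    return np.exp(-2j * np.pi * k[:, None] * t)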
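A quick sanity check on that basis, as a sketch: projecting a pure tone onto it should peak at the row whose frequency is closest to the tone. The helper name peak_frequency and the 1 kHz test tone are my own choices for illustration, not part of the original snippet.

```python
import numpy as np

def dft(size=512, rate=16000, low=50):
    k = np.linspace(low, rate / 2, size, endpoint=False)  # frequencies in Hz
    t = np.arange(size) / rate  # sample times in seconds
    return np.exp(-2j * np.pi * k[:, None] * t)

def peak_frequency(signal, rate=16000, low=50):
    # Hypothetical helper: project one window onto the basis and
    # return the analysis frequency with the largest magnitude.
    size = len(signal)
    freqs = np.linspace(low, rate / 2, size, endpoint=False)
    magnitudes = np.abs(dft(size, rate, low) @ signal)
    return freqs[np.argmax(magnitudes)]

# A 1 kHz sine sampled at 16 kHz, one window long.
tone = np.sin(2 * np.pi * 1000.0 * np.arange(512) / 16000)
print(peak_frequency(tone))
```

The reported frequency is only as precise as the grid spacing, which is (rate / 2 - low) / size Hz per row.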
Marcus Hutter has provided a recipe to build an agent that provably solves any problem. He wrote a new book about it: https://x.com/mhutter42/status/1871426793380688255?s=46&t=qNUYWfgTfF4u1RKfN0ir3A
X (formerly Twitter)
Marcus Hutter (@mhutter42) on X
Santa Arrived! The PDF (of a colorful Xmas version) of the "Introduction to Universal AI" book is now freely available online at https://t.co/r9COEBnf4S Wishing you all joyful reading, Merry Xmas & a :-) New Year.
🔥2
I installed ghostty so my terminal can render images and have fragment shaders applied on the whole window.
👍2🤯1
Let's animate the process of extracting a spectrogram of 13 Mel-Frequency Cepstral Coefficients (MFCCs) from an MP3 file.
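For reference, a minimal NumPy sketch of the usual MFCC pipeline: frame, window, power spectrum, mel filterbank, log, DCT-II. It assumes the MP3 has already been decoded into a mono float array. The frame size, hop, filter count and the mel formula below are common defaults that I picked for illustration, not necessarily what the animation uses.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, rate):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(rate / 2), n_filters + 2))
    bin_hz = np.arange(n_fft // 2 + 1) * rate / n_fft
    lower, center, upper = edges[:-2, None], edges[1:-1, None], edges[2:, None]
    rising = (bin_hz - lower) / (center - lower)
    falling = (upper - bin_hz) / (upper - center)
    return np.clip(np.minimum(rising, falling), 0.0, None)  # (n_filters, bins)

def mfcc(samples, rate=16000, n_fft=512, hop=160, n_filters=40, n_coeffs=13):
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(samples) - n_fft) // hop
    index = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = samples[index] * np.hanning(n_fft)
    # Power spectrum, then log mel energies (small floor avoids log(0)).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, rate).T + 1e-10)
    # Unnormalized DCT-II over the filter axis; keep the first n_coeffs rows.
    n = np.arange(n_filters)
    dct = np.cos(np.pi / n_filters * (n[None, :] + 0.5) * np.arange(n_coeffs)[:, None])
    return log_mel @ dct.T  # (n_frames, n_coeffs)

# One second of noise stands in for decoded audio.
features = mfcc(np.random.default_rng(0).standard_normal(16000))
print(features.shape)
```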
To fit Gaussian mixture models with 16384 components to 13-dimensional MFCC frames using expectation maximization, I need to initialize the mixtures. The simplest data-driven initializer for GMMs is to take cluster centroids.
I decided to compare three algorithms for clustering:
1. random sampling (usually a decent init for other algorithms, no hyperparameters)
2. minibatch Lloyd (performs EMA updates on mini batches, has one hyperparameter: the EMA weight)
3. Linde-Buzo-Gray (LBG) with minibatch Lloyd refinement, a classic algorithm for computing quantization codebooks. Its main trick is to start with a clustering of size 1 and progressively double the clustering size by perturbing the current set and refining it with k-means.
I have 11 million frames from Common Voice uk 10.0, so Lloyd's algorithm (classical k-means) and SVD/QR are out: they require materializing matrices that are a bit too big for my MacBook.
On the plot, the x axis is the number of steps and the y axis is the quantization loss.
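A minimal sketch of the three initializers, not my actual implementation. Assumptions: the target size is a power of two, the split perturbation is small Gaussian noise scaled by a hypothetical eps, and decay is the EMA weight kept on the old centroid. Distances use the expansion |x|^2 - 2 x.c + |c|^2 so that only a (batch, codebook) matrix is materialized.

```python
import numpy as np

def nearest(batch, codebook):
    # Squared L2 distances via |x|^2 - 2 x.c + |c|^2.
    d2 = ((batch ** 2).sum(1)[:, None]
          - 2.0 * batch @ codebook.T
          + (codebook ** 2).sum(1)[None, :])
    return d2.argmin(1), d2.min(1)

def random_init(data, size, rng):
    # Algorithm 1: pick distinct frames as centroids.
    return data[rng.choice(len(data), size, replace=False)]

def minibatch_lloyd(data, codebook, steps, batch_size, decay, rng):
    # Algorithm 2: move each hit centroid toward its batch mean with an EMA.
    codebook = codebook.copy()
    for _ in range(steps):
        batch = data[rng.integers(0, len(data), batch_size)]
        assign, _ = nearest(batch, codebook)
        counts = np.bincount(assign, minlength=len(codebook))
        sums = np.zeros_like(codebook)
        np.add.at(sums, assign, batch)  # unbuffered scatter-add per cluster
        hit = counts > 0
        means = sums[hit] / counts[hit, None]
        codebook[hit] = decay * codebook[hit] + (1.0 - decay) * means
    return codebook

def lbg(data, target_size, steps=50, batch_size=256, decay=0.9, eps=1e-2, seed=0):
    # Algorithm 3: start from the global mean, then split and refine.
    rng = np.random.default_rng(seed)
    codebook = data.mean(0, keepdims=True)
    while len(codebook) < target_size:
        noise = eps * rng.standard_normal(codebook.shape)
        codebook = np.concatenate([codebook + noise, codebook - noise])
        codebook = minibatch_lloyd(data, codebook, steps, batch_size, decay, rng)
    return codebook

def quantization_loss(data, codebook):
    # Mean squared distance from each frame to its nearest centroid.
    return nearest(data, codebook)[1].mean()
```

Most of the refinement steps in lbg run while the codebook is still small, so each step costs far less than a step at the full size.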
I found that LBG is very compute efficient: it spends most of its time running k-means on small clusterings, so it wins on wall clock (ballpark 10x faster?) in my CPU-only pure-NumPy implementation with efficient L2 distance computation. It also seems to be more data efficient: I couldn't get better results with Lloyd when I ran it for more steps.
I didn't bother tuning the EMA learning rate too much and settled at 0.9. Progressive scaling is all we need?
👍1
Alignment self-training can work even with a single utterance. In this video, expectation maximization for a GMM acoustic model with a linear-chain HMM finds a plausible alignment in 30 steps.
My algorithm only updates the GMM mixture coefficients in the maximization step. The expectation step uses my implementation of the scaled forward-backward recursions.
The 1024 GMM means are pretrained using LBG, and the HMM prior is a linear chain, similar to what you see on the right-hand side of the plot.
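A rough sketch of one such EM step, not my actual code. Assumptions: all states share one set of Gaussians and differ only in their mixture weights, each state either stays or advances with probability stay, the path must start in the first state and end in the last, and there are at least as many frames as states. Here component_lik[t, k] is the likelihood of frame t under Gaussian k, and the function names are made up for illustration.

```python
import numpy as np

def forward_backward(emission, stay=0.5):
    # emission[t, s] > 0 is the likelihood of frame t in state s.
    T, S = emission.shape
    # Linear chain: state s can only stay in s or advance to s + 1.
    trans = stay * np.eye(S) + (1.0 - stay) * np.eye(S, k=1)
    alpha, beta, scale = np.zeros((T, S)), np.zeros((T, S)), np.zeros(T)
    alpha[0, 0] = emission[0, 0]  # the path must start in the first state
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emission[t]
        scale[t] = alpha[t].sum()  # rescale each frame to avoid underflow
        alpha[t] /= scale[t]
    beta[-1, -1] = 1.0  # the path must end in the last state
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emission[t + 1] * beta[t + 1]) / scale[t + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(1, keepdims=True)  # state posteriors per frame

def em_step(component_lik, weights):
    # component_lik: (T, K) shared Gaussian likelihoods; weights: (S, K).
    emission = component_lik @ weights.T  # (T, S)
    gamma = forward_backward(emission)    # E-step
    # Expected counts: sum_t gamma[t, s] * w[s, k] * g[t, k] / emission[t, s].
    occupancy = weights * ((gamma / emission).T @ component_lik)
    # M-step: only the mixture weights are re-estimated.
    return occupancy / occupancy.sum(1, keepdims=True), gamma

# Toy utterance: 30 frames, 3 states, 3 shared components.
lik = np.full((30, 3), 0.05)
for k in range(3):
    lik[10 * k:10 * (k + 1), k] = 0.9
w = np.full((3, 3), 1.0 / 3)
for _ in range(30):
    w, gamma = em_step(lik, w)
print(gamma.argmax(1))  # most likely state for each frame
```

With real 13-dimensional Gaussians the per-frame likelihoods are tiny, so each row of component_lik should be divided by its maximum first. A per-frame constant cancels in both gamma and the weight update.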