Vol Building AGI

Turns out the transition probabilities directly affect state duration: the expected duration of each state is E[d] = 1/(1-p), where p is the state's self-loop probability (the probability of staying in the state for one more frame).

This quantity is the mean of the geometric distribution. It arises when you flip a coin that says "stay" with probability p and count the flips up to and including the first "leave". On average that takes 1/(1-p) flips.
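
A quick way to sanity-check the formula is to simulate it. This is a minimal sketch, assuming p is the probability of staying in the state at each frame; the function names are made up for illustration.

```python
import random

def sample_duration(p_stay, rng):
    """Count the frames spent in a state that self-loops with probability p_stay."""
    frames = 1  # the state is always occupied for at least one frame
    while rng.random() < p_stay:
        frames += 1
    return frames

def mean_duration(p_stay, trials=100_000, seed=0):
    """Average many sampled durations; should approach 1 / (1 - p_stay)."""
    rng = random.Random(seed)
    return sum(sample_duration(p_stay, rng) for _ in range(trials)) / trials
```

With p_stay = 3/4 the empirical mean should land near 1/(1 - 3/4) = 4 frames.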

In the screenshot I start with a trained acoustic model but set the transition probabilities to 3/4 and 1/4 (printed at the top).

The decoder assigns one frame to each of the first two states and then spends all of its time in the final state (second alignment plot at the bottom), even though the observation map looks reasonable (see right plot). The error goes away when I initialize the transition probabilities by solving E[d] = 1/(1-p) for p, letting d be the total duration of the sequence divided by the number of states.
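
Solving E[d] = 1/(1-p) for p gives p = 1 - 1/d. A minimal sketch of that initialization, assuming p is the self-loop probability and every state gets the same share of the sequence (the helper names are hypothetical, not from my actual code):

```python
def expected_duration(p_stay: float) -> float:
    """Mean number of frames spent in a state with self-loop probability p_stay."""
    return 1.0 / (1.0 - p_stay)

def self_loop_from_duration(total_frames: int, num_states: int) -> float:
    """Pick p so that each state is expected to last total_frames / num_states frames."""
    d = total_frames / num_states
    if d < 1.0:
        raise ValueError("need at least one frame per state")
    return 1.0 - 1.0 / d
```

For example, a 300-frame sequence with 3 states gives d = 100 and a self-loop probability of 0.99.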

170 views03:03

Animating the Linde-Buzo-Gray clustering algorithm. It starts with a single cluster describing all the data (i.e. just taking the mean), doubles the codebook by applying a small perturbation to each entry, and refines the clustering using the Lloyd (aka "k-means") algorithm. I am plotting the reconstruction loss, the entropy of the cluster assignments (it should be close to log K when all entries are used equally) and the codebook utilization. At the end there are 5 dead entries in the codebook. This problem gets bigger with larger K and needs to be worked on.

Each picture is the codebook after doubling the number of clusters.
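
The loop described above can be sketched in a few lines of NumPy. This is an illustrative version, not the code behind the animation: the additive random perturbation, the fixed number of Lloyd iterations and all names are my assumptions here.

```python
import numpy as np

def lloyd(data, codebook, iters=10):
    """Refine a codebook with Lloyd (k-means) iterations; empty clusters are left as-is."""
    for _ in range(iters):
        d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
        assign = d2.argmin(1)
        for k in range(len(codebook)):
            members = data[assign == k]
            if len(members) > 0:
                codebook[k] = members.mean(0)
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook, d2.argmin(1), d2.min(1).mean()

def lbg(data, num_doublings, eps=1e-3, lloyd_iters=10, seed=0):
    """Linde-Buzo-Gray: start from the mean, then repeatedly split and refine."""
    rng = np.random.default_rng(seed)
    codebook = data.mean(0, keepdims=True)
    history = []
    for _ in range(num_doublings):
        noise = rng.normal(scale=eps, size=codebook.shape)
        codebook = np.concatenate([codebook + noise, codebook - noise])  # K -> 2K
        codebook, assign, loss = lloyd(data, codebook, lloyd_iters)
        counts = np.bincount(assign, minlength=len(codebook))
        probs = counts[counts > 0] / counts.sum()
        history.append({
            "K": len(codebook),
            "loss": float(loss),                              # mean squared reconstruction error
            "entropy": float(-(probs * np.log(probs)).sum()),  # at most log K
            "utilization": float((counts > 0).mean()),         # fraction of live entries
        })
    return codebook, history
```

Dead entries show up as utilization below 1 and entropy below log K, since this sketch never reinitializes an empty cluster.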

❤1🔥1

215 viewsedited 03:11

Vol Building AGI

Kazuki Irie’s new paper: the brain uses keys to access memories but cannot access keys themselves https://arxiv.org/abs/2501.02950

arXiv.org

Key-value memory in the brain

Classical models of memory in psychology and neuroscience rely on similarity-based retrieval of stored patterns, where similarity is a function of retrieval cues and the stored patterns. While...

❤1

211 views03:43

Vol Building AGI

Pretty cool that in a progressive superposition of MLPs, scaling the activations of the older MLPs by a single large value is enough to restore access to the old task knowledge written in the "old" MLP.

SGD seems to be incentivized to find larger weights for the second MLP in the continual training scenario so its activations always "beat" the activations of the first network.

I am referring to this experiment: https://github.com/kazuki-irie/kv-memory-brain/blob/master/Forgetting_and_recovery.ipynb

GitHub

kv-memory-brain/Forgetting_and_recovery.ipynb at master · kazuki-irie/kv-memory-brain

Official Code Repository for the paper "Key-value memory in the brain" - kazuki-irie/kv-memory-brain

❤3

289 views05:40

Vol Building AGI

When training neural networks with SGD, the learning rate critically depends on the batch size. If you change one parameter you usually need to sweep the other. This is a heat map that measures log(1-accuracy) on the test set as a function of width, batch size and learning rate.

🔥3

259 views22:12

Vol Building AGI

Decided to collect a few tokens from Telegram for the UNLP shared task pretraining. Will 1B be enough? At 350M words now.

👍3❤1

325 views05:06

Vol Building AGI

baseline.png

532.6 KB

I ran a baseline for the UNLP2025 shared task using gpt-4o-mini-2024-07-18 and gpt-4o-2024-08-06 with a basic prompt and structured outputs asking for reasoning and labels as binary flags. The macro F1 scores for technique detection are 0.32 and 0.34, respectively; the mini model tends to trade precision for extra recall.
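
For reference, here is how macro F1 over binary technique flags can be computed. This is a sketch of the textbook definition (per-label F1, then an unweighted mean), which I assume matches the task metric; it is not the official scorer, and the label names in the usage note are placeholders.

```python
def macro_f1(gold, pred, labels):
    """gold and pred are lists of label sets, one set per text."""
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)  # every label counts equally, however rare
```

Usage: `macro_f1([{"a"}, {"a", "b"}], [{"a"}, {"b"}], ["a", "b"])`. Because rare labels weigh as much as frequent ones, a model that over-predicts flags gains recall on the rare ones at the cost of precision everywhere.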

👍4

741 views05:44

Vol Building AGI

> So, everyone has already registered for the UNLP shared task, what are you waiting for?

425 views05:15

Vol Building AGI

> Register for the UNLP shared task, or you'll be a loser

😢1

430 views05:21

Vol Building AGI

> So why would you tag words in a sentence with BERT, it's just an encoder. Encoders are a limited architecture, you can't build AGI with them

461 views05:25

Vol Building AGI

> Llama is a great LLM, and that says it all.

💯1

490 views05:32

Vol Building AGI

> You say BERT is just an encoder, but then what about its top results on all sorts of benchmarks? (this one is especially weird to me: it feels like it's addressing the point but it's actually diverting to another one!)

536 viewsedited 05:34

Vol Building AGI

Leo has sent me a paper that talks about how neural networks learn Fourier features to perform addition, a recurring topic in NN literature. It's story time.

In a bar after my master's thesis defense, Professor Stefan Wolf wrote an equation on a napkin:

$ \pi/4 = \sum_{n=0}^\infty (-1)^n / (2n + 1) $

This was the shortest program to compute pi I had ever seen, so I was incredibly excited. If I spend more test-time compute on it, I get a better approximation of pi. It's not the fastest program in terms of convergence speed, but definitely one short enough that I can remember it. It is an approximation due to Leibniz.

Next morning I got interested in finding a linear RNN circuit that encodes the Leibniz approximation. A linear RNN uses a cumulative sum operation at its heart, so given a stream of input ones it outputs a count of ones so far "for free" (by construction / inductive bias).

I can express the division as a nonlinearity, but how to express sign flipping (-1)^n as n goes up?

A natural way of encoding a flipping sign is using base 2 representation: if you encode n using binary then the least significant bit will be alternating.
In a neural network we can express each bit position using a separate dimension. Consider this linear feature map:

import numpy as np

def binary(digits: int):
    "Make a basis of powers of two of dimension digits, lowest bits first"
    return 1 << np.arange(digits)

After mapping a sum into this high-dimensional space you only need to read off the leading dimension (the lowest bit), aka the highest frequency component.

What is a natural learnable approximation of representing numbers in base K? Well, it seems that it is somewhere in the basis of sinusoids of different frequencies.
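
To make the sign trick concrete, here is a minimal sketch that runs the Leibniz series with the sign read off the lowest bit of a running count. It is not the construction from my final program; the helper names and the integer division used to extract bits are assumptions made for illustration.

```python
import numpy as np

def binary(digits: int):
    "Make a basis of powers of two of dimension digits, lowest bits first"
    return 1 << np.arange(digits)

def bit_features(n, digits: int = 8):
    """Map each count in n to its binary digits, lowest bit first: shape (len(n), digits)."""
    return (n[:, None] // binary(digits)) % 2

def sign_from_lowest_bit(n, digits: int = 8):
    """(-1)^n: the lowest bit alternates 0, 1, 0, 1, ... as n counts up."""
    return 1 - 2 * bit_features(n, digits)[:, 0]

def leibniz_pi(steps: int) -> float:
    ones = np.ones(steps, dtype=np.int64)
    n = np.cumsum(ones) - 1               # running count 0, 1, 2, ... like a linear RNN state
    terms = sign_from_lowest_bit(n) / (2 * n + 1)
    return 4 * terms.sum()
```

The same alternating sign is produced by a single sinusoid, since cos(pi * n) equals (-1)^n at integer n. That is one way to see why a basis of sinusoids can stand in for the bit representation.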

❤4

471 views07:45

Vol Building AGI

If you're curious about what the final construction looks like, check out this program: https://gist.github.com/proger/ba147e3953a155d833aae084c1f0cd12

Gist

pi.py

GitHub Gist: instantly share code, notes, and snippets.

🔥1

499 views07:50

Vol Building AGI

Behind the scenes of early versions of ChatGPT by John Schulman and Barret Zoph: https://x.com/johnschulman2/status/1891539960743743756?

X (formerly Twitter)

John Schulman (@johnschulman2) on X

@barret_zoph and I recently gave a talk at Stanford on post-training and our experience working together on ChatGPT. Unfortunately the talk wasn't recorded, but here are the slides: https://t.co/7fcGmvFtUF. (If you have a recording, please let me know!)

397 views06:20

Vol Building AGI

https://youtu.be/9_PepvnqIfU

There are no authorities in science

YouTube

TURING AWARD WINNER Richard S. Sutton in Conversation with Cam Linke | No Authorities in Science

“There are no authorities in science,” says A.M. Turing Award winner Richard S. Sutton.
In this exclusive conversation, Amii Chief Scientific Advisor Richard S. Sutton and Amii CEO Cam Linke discuss the breakthroughs that shaped Reinforcement Learning, the…

👍3

380 views20:56

Vol Building AGI

Forwarded from Linkstream

science says we will remain smart as long as we want
https://www.science.org/doi/full/10.1126/sciadv.ads1560?af=R

Science Advances

Age and cognitive skills: Use it or lose it

Cognitive skills do not decline with age for those who use math and reading throughout their life.

❤2

387 views21:17

Vol Building AGI

Incompleteness? Just avoid it

https://arxiv.org/abs/1408.3821

arXiv.org

Limits on Fundamental Limits to Computation

An indispensable part of our lives, computing has also become essential to industries and governments. Steady improvements in computer hardware have been supported by periodic doubling of...

😁1

383 views17:23

Vol Building AGI

Forwarded from AI HOUSE

🎧 We invite you to watch the new episode of the AI HOUSE Podcast

Our guest is Volodymyr Kyrylov, Member of Technical Staff at OpenAI. Together with our host Roman Kyslyi, they dug into important topics in AI development, deep learning and the path to working at one of the world's best-known AI labs.

Specifically:
— how Ukrainians invented Deep Learning;
— why it was Hinton who made it work;
— how machine learning labs abroad operate;
— how Volodymyr got into OpenAI;
— self-education, studying at UCU and FIOT, a master's degree in AI, and papers.

The episode is already available to watch on the YouTube channel or to listen to on whichever podcast platform you prefer.

Leave likes and comments, <we always read them>.

Enjoy watching 👀

🏠 LinkedIn 🏠 Instagram 🏠 Podcast

❤10

380 views17:10
