Vol Building AGI – Telegram

Vol Building AGI

580 subscribers

116 photos

9 videos

12 files

199 links

Past topics: speech synthesis, transformers, LSTM, recurrence

Vol Building AGI

Forwarded from In Tensor We Trust

A new version of mlx is out: 0.0.7, link to the release.

For those who don't know what it is: it's like torch, but for Apple processors.

The photos show everything that was done.

Support for the safetensors model format gets a big like 👍🏻, because, as you know, safety comes first.

Practical uses are simple:

- ML engineering on MacBooks;
- Model inference;
- We can fine-tune models offline on MacBooks at the client's site (100% on-device ML).

86 views, 09:48

Vol Building AGI

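Awni Hannun (author of mlx) is also known as the first author of Deep Speech, the first large-scale speech recognizer that combined feed-forward layers with a bidirectional recurrent layer (plain recurrent units rather than LSTM cells) and was trained with CTC on many GPUs on thousands of hours of speech. Transcribed data was not that abundant, so the team had to hire people to read books.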

82 views, edited 09:51

Vol Building AGI

My best memory of references to Niklaus Wirth's work is from my first year at KPI, where I took a course from Prof. Marchenko. He mentioned the book Algorithms + Data Structures = Programs as recommended reading. I hated the course at the time: we were taught to describe algorithms in an arcane graphical syntax and then code them in Turbo Pascal. I rejected Pascal because I thought nobody wrote serious programs in it. I had never been exposed to high-quality Pascal code, and all the interesting code I had seen was written in C for various Unix operating systems. I dropped out of school entirely within a year.

67 views, 18:45

Vol Building AGI

I took Master’s-level courses last year and experienced pressure to stay less and less grounded in machine code. On a piece of paper, changing one symbol turns the entire computation upside down, while doing the same on a computer requires rewriting all of your code. When working through an idea you first imagine it, lay it down in natural language, and then slowly formalize it.

It turns out this method is called step-wise refinement, a program development technique that Niklaus Wirth, an ETH professor, published in 1971. That’s the paper I’ll be reading today.

http://pascal.hansotten.com/uploads/wirth/Program%20development%20by%20step-wise%20refinement%20jan%201971%20002.pdf
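
A small illustration of step-wise refinement, using the eight queens problem that Wirth's paper works through. This is only a sketch in Python, not Wirth's notation, and the function names (`solve`, `is_safe`) are made up for the example. The top-level routine is written first in terms of an abstract test ("is this square safe?"), and that test is refined into concrete code afterwards.

```python
def is_safe(queens, col):
    """Refinement of the abstract test 'the new queen is not attacked'.

    queens[r] is the column of the queen already placed in row r;
    the new queen would go into row len(queens), column col.
    """
    row = len(queens)
    for r, c in enumerate(queens):
        same_column = c == col
        same_diagonal = abs(c - col) == row - r
        if same_column or same_diagonal:
            return False
    return True


def solve(n, queens=()):
    """Top-level step: place one queen per row, backtrack when a row has no safe column."""
    if len(queens) == n:
        yield queens
        return
    for col in range(n):
        if is_safe(queens, col):
            yield from solve(n, queens + (col,))


solutions = list(solve(8))
print(len(solutions))  # 92 placements on the standard 8x8 board
```

The point is the order of writing, not the algorithm: `solve` could be written and read before `is_safe` existed, which is the "imagine, describe, then formalize" loop from the post.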

78 views, edited 18:45

Vol Building AGI

On NVIDIA GPUs, a group of 32 threads is called a *warp*.
Threads in a warp have an efficient lock-free synchronous communication method called a *shuffle*.

In this screenshot, the `__shfl_up_sync` intrinsic is used five times; at step e (e = 0..4) every thread sends the value of its register `acc` to the neighbour 2**e lanes above it, simulating propagation down a binary tree.

The next figure (from Using CUDA Warp-Level Primitives by NVIDIA) illustrates the same concept using a down-shuffle on a mini-warp of 8 threads. A thread inside a warp is also called a "lane".
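
To make the five up-shuffle steps concrete, here is a CPU-only Python simulation of the pattern. It is a sketch under assumptions: the screenshot's kernel is not reproduced here, and I assume the combine step is addition, which turns the five steps into an inclusive prefix sum. The helper names `shfl_up` and `warp_inclusive_scan` are made up; `shfl_up` only mimics what `__shfl_up_sync` does with a full mask.

```python
WARP_SIZE = 32


def shfl_up(values, delta):
    """Mimic __shfl_up_sync with a full mask.

    Lane i reads the register of lane i - delta. Lanes with i < delta
    have no source lane and get their own value back.
    """
    return [values[i - delta] if i >= delta else values[i]
            for i in range(len(values))]


def warp_inclusive_scan(acc):
    """Inclusive prefix sum over one warp using log2(len(acc)) shuffle steps."""
    acc = list(acc)
    delta = 1  # 2**e for e = 0, 1, 2, ...
    while delta < len(acc):
        received = shfl_up(acc, delta)
        # Only lanes that actually have a source lane add the received value.
        acc = [a + r if lane >= delta else a
               for lane, (a, r) in enumerate(zip(acc, received))]
        delta *= 2
    return acc


print(warp_inclusive_scan([1] * WARP_SIZE))  # lane i ends up holding i + 1
```

With 32 lanes the loop runs for delta = 1, 2, 4, 8, 16, which is the "five times" from the post.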

78 views, 13:52

Vol Building AGI

93 views, 13:52

Vol Building AGI

102 views, 13:52

Vol Building AGI

Optimizing parallel deep learning systems is a bit like navigating Tokyo by public transit

93 views, edited 11:40

Vol Building AGI

RWKV scaled to 1T tokens seems to beat Mistral (trained on 8T) on some multilingual benchmarks.

Eagle's zero-shot translation into Ukrainian is about on par with Mistral in a 2-shot setting and with Llama 2 fine-tuned on 10k examples.

https://twitter.com/RWKV_AI/status/1751797147492888651

X (formerly Twitter)

RWKV (@RWKV_AI) on X

Introducing Eagle-7B

Based on the RWKV-v5 architecture, bringing into opensource space, the strongest
- multi-lingual model
(beating even mistral)
- attention-free transformer today
(10-100x+ lower inference)

With comparable English performance with…

🔥2

120 views, edited 12:42

Vol Building AGI

No reason to use transformer decoders any more for LLMs :)

109 views, 12:45

Vol Building AGI

RNNs are faster to train, faster in inference and are more data efficient.

👍3

109 views, 12:46

Vol Building AGI

Arpa count tables? RNN weight matrices? Decision trees? Suffix arrays!

https://arxiv.org/abs/2401.17377
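
The appeal of a suffix array over precomputed count tables is that one index answers count queries for n-grams of any length. A toy sketch of that idea, assuming a tiny whitespace-tokenized corpus and a naive construction by sorting suffixes (the function names are hypothetical, and a real system would build and store the array far more efficiently):

```python
def build_suffix_array(tokens):
    """Naive construction: sort suffix start positions by the suffix they begin."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def count_ngram(tokens, sa, ngram):
    """Count occurrences of ngram using two binary searches over the suffix array."""
    ngram = list(ngram)
    n = len(ngram)

    def first_index(strictly_greater):
        # First position in sa whose length-n prefix is >= ngram
        # (or > ngram when strictly_greater is True).
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = tokens[sa[mid]:sa[mid] + n]
            if prefix < ngram or (strictly_greater and prefix == ngram):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return first_index(True) - first_index(False)


tokens = "the cat sat on the mat the cat ran".split()
sa = build_suffix_array(tokens)
print(count_ngram(tokens, sa, ["the", "cat"]))  # 2
# Count-based next-token estimate: count("the cat sat") / count("the cat")
print(count_ngram(tokens, sa, ["the", "cat", "sat"]) / count_ngram(tokens, sa, ["the", "cat"]))  # 0.5
```

All suffixes that start with the same n-gram sit next to each other in the sorted order, so a count is just the width of that range.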

🔥1

115 views, 09:37

Vol Building AGI

wandb is in a good mood today:

❤2

115 views, edited 22:22

Vol Building AGI

https://twitter.com/DlCountdown/status/1764278990011813975

The NeurIPS conference submission deadline is in late May; workshop deadlines will probably be in August.

X (formerly Twitter)

AI Conference DL Countdown (@DlCountdown) on X

The NeurIPS deadline has been announced:
May 22nd, 8PM UTC

99 views, edited 14:07

Vol Building AGI

https://x.com/mlstreettalk/status/1765701266221522986

This is what you learn as a side note in our Machine Learning course at USI. Glad Yann communicates this message to a large audience. Recurrent neural nets can do anything, but gradient descent won’t find everything.

X (formerly Twitter)

Machine Learning Street Talk (@MLStreetTalk) on X

In 2021 on MLST the legendary @ylecun argued that RNNs were Turing Complete. In 2024, he came to the dark side! What do you think? 👇

102 views, edited 11:53

Vol Building AGI

Mathematics is the science of transmitting simple ideas about the regularity of the world between people. It is a programming language in which you concisely describe your thought so that it loads into your colleagues' minds with absolute precision.

Yehor has started a channel where we learn to improve the skill of precisely communicating a library of mathematical ideas among AI developers.

Join: https://t.me/applied_math_uk

Прикладна математика (Applied Mathematics)

About applied mathematics, in Ukrainian

Groups:

- https://t.me/speech_recognition_uk
- https://t.me/speech_synthesis_uk
- https://t.me/computer_vision_uk
- https://t.me/ai_work_uk
- https://t.me/nlp_uk

Discord: https://t.me/discord_uds

❤2

461 views, 12:27

Vol Building AGI

First release of Hippogriff: my implementation of the Griffin architecture, a hybrid of a local transformer with sliding-window multi-query attention (like Mistral) and linear recurrence (like Mamba/RWKV).

Inside the package you'll also find my handcrafted training loop with diagnostics for activations and weight state.

https://github.com/proger/hippogriff

GitHub - proger/hippogriff: Griffin MQA + Hawk Linear RNN Hybrid

Griffin MQA + Hawk Linear RNN Hybrid. Contribute to proger/hippogriff development by creating an account on GitHub.
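
For readers new to linear recurrences, a minimal sketch of the general idea in plain Python. This is not code from Hippogriff, and the gating and other details of the actual layer are omitted; all names here are made up for illustration. The recurrence is h_t = a_t * h_{t-1} + b_t. Because each step is an affine map and affine maps compose associatively, the same outputs can be computed either step by step with a constant-size state (handy at inference) or with a prefix scan (handy for parallel training).

```python
def linear_recurrence(a, b, h0=0.0):
    """Sequential form: one scalar of state, one multiply-add per step."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def combine(left, right):
    """Compose two affine maps h -> a*h + b, applying left first and right second."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def linear_recurrence_scan(a, b, h0=0.0):
    """Same outputs via a prefix scan over (a_t, b_t) pairs.

    The doubling loop is run serially here; on parallel hardware every
    position within one pass could be updated at the same time.
    """
    pairs = list(zip(a, b))
    step = 1
    while step < len(pairs):
        pairs = [combine(pairs[i - step], pairs[i]) if i >= step else pairs[i]
                 for i in range(len(pairs))]
        step *= 2
    return [a_t * h0 + b_t for a_t, b_t in pairs]


a = [0.5] * 8
b = [1.0] * 8
print(linear_recurrence(a, b))
print(linear_recurrence_scan(a, b))  # same values as the line above
```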

👍3

494 views, edited 14:03

Vol Building AGI

https://twitter.com/OfirPress/status/1767282605794136148

X (formerly Twitter)

Ofir Press (@OfirPress) on X

When a student sadly tells me that the idea we've been working on for weeks was just arXived, I say:

"Great! We've just gotten *strong* confirmation that our thinking was in the right direction. We've had the initial work done for us. Lets figure out how…

99 views, 07:49

Vol Building AGI

I love MATLAB/Octave. Its plotting experience is so smooth compared to matplotlib! NumPy and torch modeled their array APIs on MATLAB, so there is very little new to remember when moving over from Python.

🤯1

111 views, 09:09

Vol Building AGI

To train transformers, you need a lot of diverse data. Let's use online RL to generate data!

Check out my new repo, control: a Soft Actor-Critic implementation that produces experience trajectories.

https://github.com/proger/control

🔥2

98 views, 12:19

Vol Building AGI

Bayesian Flow Networks (BFNs) link iterative denoising diffusion and recursive estimation of distribution parameters.

In my new post, I contrast autoregressive generative modeling (prevalent in language) with recursive Bayesian estimation of all parameters jointly.

https://proger.github.io/posts/bfn/normal.html

Bayesian Flow Networks

This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference in the light...
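
A minimal sketch of what "recursive estimation of distribution parameters" means in the Gaussian case, using the standard conjugate update for a normal mean with known precision. It is an illustration under assumptions, not code from the post or the paper: the accuracy `alpha` is held constant here (the paper varies it over time), there is no neural network in the loop, and all names are made up. Each noisy observation `y` of a hidden value `x` arrives with precision `alpha`; the belief is a normal with mean `mu` and precision `rho`.

```python
import random


def bayesian_update(mu, rho, y, alpha):
    """Posterior of a normal belief (mean mu, precision rho) after seeing
    y ~ Normal(x, 1/alpha): precisions add, means are precision-weighted."""
    rho_new = rho + alpha
    mu_new = (rho * mu + alpha * y) / rho_new
    return mu_new, rho_new


random.seed(0)
x = 0.7              # hidden value we are trying to recover
mu, rho = 0.0, 1.0   # standard normal prior
alpha = 4.0          # assumed constant accuracy of each noisy observation

for _ in range(200):
    y = random.gauss(x, alpha ** -0.5)  # noisy observation with std 1/sqrt(alpha)
    mu, rho = bayesian_update(mu, rho, y, alpha)

print(mu, rho)  # mu ends up close to x; rho has grown to 1 + 200 * alpha
```

Unlike autoregressive sampling, nothing is committed along the way: every step only sharpens the belief about the same value.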

103 views, 13:29