Vol Building AGI
Past topics: speech synthesis, transformers, LSTM, recurrence
When making matplotlib figures, it's important to match the font to the rest of the paper: https://x.com/giffmana/status/1632506730897653761
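A minimal sketch of how that can be done with matplotlib's `rcParams`. The helper name `use_paper_fonts`, the serif family, and the 10 pt size are assumptions for illustration; use whatever family and size your paper template actually sets.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt


def use_paper_fonts(family="serif", size=10):
    """Set global font options so figure text resembles the paper body text.

    `family` and `size` are placeholders: match them to your template.
    """
    plt.rcParams.update({
        "font.family": family,
        "font.size": size,
        "axes.labelsize": size,
        "legend.fontsize": size - 1,
        "xtick.labelsize": size - 1,
        "ytick.labelsize": size - 1,
        "mathtext.fontset": "cm",  # Computer Modern style math, close to LaTeX defaults
    })


def make_figure(path):
    """Draw a tiny example plot and save it at roughly single-column width."""
    fig, ax = plt.subplots(figsize=(3.3, 2.2))
    ax.plot([0, 1, 2], [0, 1, 4], label="example")
    ax.set_xlabel("step")
    ax.set_ylabel("value")
    ax.legend()
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)


if __name__ == "__main__":
    use_paper_fonts()
    make_figure("figure.pdf")
```

Sizing the figure to the final column width matters as much as the family: if the figure is rescaled when included in the paper, the font size no longer matches.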
I had a chance to talk with robotics + computer vision researchers. Their development focus is always on real-time computation: maps and terrain models are built in real time. Maybe it makes sense to think the same way about training multimodal dialogue systems? Working with chatbots trained on random texts collected by who knows whom, who knows when, is not that interesting.
https://www.youtube.com/watch?v=1t7AWa4SMlo

I got excited and built a fast weight programmer loop into the script demo I shared before. It uses a read-forget-update loop to remember patterns coming from the microphone.

Check it out — it learns music and speech on the fly.
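For readers new to the idea, here is a minimal sketch of one read-forget-update step in the delta-rule style. This is not the code from the demo: the class name `FastWeightMemory`, the key normalization, and the `lr` parameter are assumptions for illustration.

```python
import numpy as np


class FastWeightMemory:
    """Hypothetical fast weight memory: a matrix W that maps keys to values."""

    def __init__(self, key_dim, value_dim, lr=1.0):
        self.W = np.zeros((value_dim, key_dim))
        self.lr = lr  # write strength; 1.0 overwrites the old value completely

    def read(self, key):
        k = key / (np.linalg.norm(key) + 1e-8)
        return self.W @ k

    def step(self, key, value):
        k = key / (np.linalg.norm(key) + 1e-8)
        old = self.W @ k                         # read: what is stored under this key
        self.W -= self.lr * np.outer(old, k)     # forget: remove the old association
        self.W += self.lr * np.outer(value, k)   # update: write the new association
        return old


if __name__ == "__main__":
    mem = FastWeightMemory(key_dim=4, value_dim=2)
    key = np.array([1.0, 0.0, 0.0, 0.0])
    mem.step(key, np.array([0.5, -0.5]))
    mem.step(key, np.array([2.0, 3.0]))  # same key: the old value is replaced
    print(mem.read(key))
```

With `lr=1.0` and unit-norm keys, writing to an existing key replaces its value instead of adding to it, which is what separates this loop from a plain Hebbian outer-product update.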
How 32 GPU threads cooperate in storing a single 16x16 tile for maximal memory coalescing. This is a swizzled layout from ThunderKittens. The labels on the right image should be transposed for the row-major format.
The format above is required for using the mma.m16n8k16 instruction in PTX 8.5: https://docs.nvidia.com/cuda/parallel-thread-execution/#warp-level-matrix-fragment-mma-16816-float
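To make the thread-to-element mapping concrete, here is a small Python sketch of the register fragment layout for the 16x16 A operand of `mma.m16n8k16` with 16-bit floats, as I read the row/column formulas in the linked PTX documentation. It covers only which lane holds which element, not the shared-memory swizzle from ThunderKittens, and the function names are my own.

```python
def a_fragment_coords(lane, i):
    """Return (row, col) of the i-th A-fragment element (0..7) held by `lane` (0..31)."""
    group_id = lane >> 2      # 8 groups of 4 consecutive lanes
    tid_in_group = lane % 4
    # Elements 0-1 and 4-5 sit in the top 8 rows, the others in the bottom 8 rows.
    row = group_id if (i < 2 or 4 <= i < 6) else group_id + 8
    # Elements 0-3 sit in the left 8 columns, elements 4-7 in the right 8 columns.
    col = tid_in_group * 2 + (i & 1) + (8 if i >= 4 else 0)
    return row, col


def owner_table():
    """Build a 16x16 table of (lane, element index) and check each cell is owned once."""
    table = [[None] * 16 for _ in range(16)]
    for lane in range(32):
        for i in range(8):
            r, c = a_fragment_coords(lane, i)
            assert table[r][c] is None, "each tile element must have one owner"
            table[r][c] = (lane, i)
    return table


if __name__ == "__main__":
    for row in owner_table():
        print(" ".join(f"{lane:2d}" for lane, _ in row))
```

Each lane ends up with eight values arranged as four adjacent pairs, so 32 lanes times 8 values cover the 256 elements of the tile exactly once.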
Cool example of deep learning research going in circles: the short conv1d was reintroduced with QRNN, brought to linear transformers in H3, and made mainstream in Mamba. This convolution makes the network learn faster in the beginning, improves recall, and allows a single-layer linear RNN to solve associative retrieval. I spent about a month figuring that property out.
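For reference, a minimal NumPy sketch of what I mean by a short convolution here: a depthwise causal conv1d with a handful of taps applied along the time axis. The function name and the kernel size of 4 in the usage example are assumptions for illustration, and this is not the implementation from any of the papers mentioned.

```python
import numpy as np


def short_causal_conv1d(x, w):
    """Depthwise causal convolution along time.

    x: array of shape (T, D), a sequence of T steps with D channels.
    w: array of shape (K, D), K taps per channel; w[K-1] multiplies the
       current step and w[0] multiplies the step K-1 positions in the past.
    """
    T, D = x.shape
    K = w.shape[0]
    # Left-pad with zeros so that output step t only sees inputs at steps <= t.
    xp = np.concatenate([np.zeros((K - 1, D)), x], axis=0)
    y = np.zeros((T, D))
    for j in range(K):
        y += w[j] * xp[j:j + T]
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 3))
    w = rng.normal(size=(4, 3))  # kernel size 4 is an illustrative choice
    print(short_causal_conv1d(x, w).shape)
```

The key detail is that each output step mixes the current token with only a few previous ones, so a key at one step and the value that follows it can land in the same feature vector before the recurrence sees them.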

This convolution block has made its way into Noam Shazeer's latest transformer variant, and people are now writing papers making statements about its expressive power.

The conclusion the authors make:

> An important parameterization to explore is replacing the short convolutions within CAT with SSMs

We've been there! Next thing you know you get Conv-SSM-Attention, and that is called Recurrent Gemma 😄

https://arxiv.org/abs/2407.05591
Hello from ICML. I am at the tutorial on Data Attribution at Scale. We are studying how to relate model outputs to training inputs. Here are the notes:

https://ml-data-tutorial.org/
The next tutorial, by Zeyuan Allen-Zhu, is on Physics of Language Models. We study how to apply the scientific method to language models, with examples that include curation of synthetic data, mechanistic probing, and more.

Tutorial website: https://physics.allen-zhu.com/

Announcement: https://x.com/zeyuanallenzhu/status/1813150298363601102
Use a world model to interpolate between two learning algorithms
Yurii moves the locomotive
https://x.com/hamelhusain/status/1824452022890119658?

Programming languages are not the problem; time to check your writing skills.
I asked ChatGPT what a Bogdan marshrutka looks like. It guessed the balconies and air conditioners perfectly.