The format above is required for using the mma.m16n8k16 instruction in PTX 8.5: https://docs.nvidia.com/cuda/parallel-thread-execution/#warp-level-matrix-fragment-mma-16816-float
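A rough sketch of what that layout means, assuming the 16x16 A operand in the .f16/.bf16 case as described in the linked PTX section. The name `a_fragment_coord` is made up for illustration. Each of the 32 lanes holds 8 elements a0..a7, and the function returns which (row, col) of the tile element `i` of a given lane corresponds to:

```cpp
// Hypothetical helper, not part of any NVIDIA API.
// Assumes the A fragment of mma.m16n8k16 with .f16/.bf16 inputs (16 rows x 16 columns).
struct Coord {
    int row;
    int col;
};

// lane: 0..31 (lane id within the warp), i: 0..7 (index of a_i held by that lane).
constexpr Coord a_fragment_coord(int lane, int i) {
    const int group_id = lane >> 2;        // 8 groups of 4 lanes
    const int thread_in_group = lane % 4;  // position inside the group
    // a0, a1, a4, a5 sit in the top 8 rows; a2, a3, a6, a7 in the bottom 8 rows.
    const int row = group_id + (((i >> 1) & 1) ? 8 : 0);
    // Each lane owns adjacent column pairs; a4..a7 are shifted 8 columns right.
    const int col = thread_in_group * 2 + (i & 1) + (i >= 4 ? 8 : 0);
    return Coord{row, col};
}
```

A quick sanity check on the mapping: 32 lanes times 8 elements is 256, exactly the number of entries in a 16x16 tile, so every entry should be owned by exactly one (lane, i) pair.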
Turing complete programs in transformers by construction give length generalization out of the box
https://arxiv.org/abs/2407.03310
arXiv.org
Universal Length Generalization with Turing Programs
Length generalization refers to the ability to extrapolate from short training sequences to long test sequences and is a challenge for current large language models. While prior work has proposed...
Cool example of deep learning research going in circles: short conv1d has been reintroduced with QRNN, reintroduced to linear transformers in H3, and made mainstream in Mamba. This convolution makes the network learn faster early in training, improves recall, and allows a single-layer linear RNN to solve associative retrieval. I spent about a month figuring that property out.
This convolution block has made its way into Noam Shazeer's latest transformer variant, and people are now writing papers making statements about its expressive power.
The conclusion the authors make:
> An important parameterization to explore is replacing the short convolutions within CAT with SSMs
We've been there! Next thing you know you get Conv-SSM-Attention, and that is called Recurrent Gemma 😄
https://arxiv.org/abs/2407.05591
arXiv.org
On the Power of Convolution Augmented Transformer
The transformer architecture has catalyzed revolutionary advances in language modeling. However, recent architectural recipes, such as state-space models, have bridged the performance gap....
👍2
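For reference, a minimal sketch of what "short conv1d" means in the message above, assuming a causal depthwise convolution with a small kernel applied along the sequence before the recurrent layer. The function name and the nested-vector layout are chosen for illustration only and are not taken from QRNN, H3, or Mamba code:

```cpp
#include <cstddef>
#include <vector>

// Causal depthwise short convolution (illustrative sketch).
// x: [T][C] sequence of T tokens with C channels.
// w: [C][K] one small kernel per channel; w[c][k] weighs the token k steps back.
// Returns y with y[t][c] = sum_k w[c][k] * x[t - k][c], treating x before t = 0 as zero.
std::vector<std::vector<float>> short_causal_conv1d(
    const std::vector<std::vector<float>>& x,
    const std::vector<std::vector<float>>& w) {
    const std::size_t T = x.size();
    const std::size_t C = w.size();
    std::vector<std::vector<float>> y(T, std::vector<float>(C, 0.0f));
    for (std::size_t t = 0; t < T; ++t) {
        for (std::size_t c = 0; c < C; ++c) {
            // Only look at the current and past tokens, never the future.
            for (std::size_t k = 0; k < w[c].size() && k <= t; ++k) {
                y[t][c] += w[c][k] * x[t - k][c];
            }
        }
    }
    return y;
}
```

With a kernel like `{0, 1}` the layer is just a shift by one token, so the recurrent layer sees the previous token next to the current one. That is one plausible reading of why it helps with associative retrieval, stated here as an assumption rather than a result from the linked papers.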
Compiler Explorer supports CUDA C++. See how nvcc emits SASS for thread barriers: https://godbolt.org/z/613zseoGW
godbolt.org
Compiler Explorer - CUDA C++ (NVCC 12.4.1)
#include <cooperative_groups.h>
#include <cuda/barrier>
using barrier = cuda::barrier<cuda::thread_scope_block>;
__global__ void square(int* array, int n) {
__shared__ barrier bar;
auto block = cooperative_groups::this_thread_block();
if (…
Hello from ICML. I am at the tutorial on Data Attribution at Scale. We are studying how to relate model outputs to training inputs. Here are the notes:
https://ml-data-tutorial.org/
ml-data-tutorial.org
Data Attribution at Scale | ICML 2024
Notes accompanying our ICML tutorial
👍4
The next tutorial, by Zeyuan Allen-Zhu, is on Physics of Language Models. We study how to apply the scientific method to the study of language models, with examples. The examples include curation of synthetic data, mechanistic probing, and more.
Tutorial website: https://physics.allen-zhu.com/
Announcement: https://x.com/zeyuanallenzhu/status/1813150298363601102
Allen-Zhu
Physics of Language Models
The concept of Physics of Language Models was jointly conceived and designed by ZA and Xiaoli Xu.
🔥2
Nando de Freitas giving love to OpenAI https://x.com/nandodf/status/1816530449830936805
X (formerly Twitter)
Nando de Freitas (@NandoDF) on X
@OpenAI has been the most inspiring and most impactful AI organisation in the history of humankind. I say this as someone who’s competed with them for nearly a decade. They made my life far more interesting than I could have dreamt of.
The people at @OpenAI…
Vol Building AGI
The next tutorial by Zeyuan Allen-Zhu is on Physics of Language Models. We study how to apply scientific method to the study of language models with examples. The examples include curation of synthetic data, mechanistic probing, and more. Tutorial website:…
The tutorial video is now up https://youtu.be/yBL7J0kgldU?si=II4_C2fCaUQnkNyS
YouTube
ICML 2024 Tutorial: Physics of Language Models
Project page (with further readings): https://physics.allen-zhu.com/
Abstract: We divide "intelligence" into multiple dimensions (like language structures, knowledge, reasoning, etc.). For each dimension, we create synthetic data for LLM pretraining to understand…
🤯1
https://x.com/HumansNoContext/status/1821925436512665639
How to tune hyperparameters. Notice how smoothly the behavior changes when the speed is varied.
X (formerly Twitter)
NO CONTEXT HUMANS (@HumansNoContext) on X
Honestly this was unexpectedly fun to watch
😁1
https://x.com/hamelhusain/status/1824452022890119658?
Programming languages are not the problem, time to check your writing skills.
I asked ChatGPT what a Bohdan marshrutka (minibus) looks like. It guessed the balconies and air conditioners perfectly.
😁3
Cheng Lu and Yang Song have solved diffusion https://arxiv.org/abs/2410.11081
arXiv.org
Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models
Consistency models (CMs) are a powerful class of diffusion-based generative models optimized for fast sampling. Most existing CMs are trained using discretized timesteps, which introduce...
🤯2
https://youtu.be/ZANbujPTvOY
This graduate-level benchmark was solved by o1 less than a year after its release. For a while it was assumed to be unsolvable by language models.
YouTube
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Authors: David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman
We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics…