baseline.png
I ran a baseline for the UNLP 2025 shared task using gpt-4o-mini-2024-07-18 and gpt-4o-2024-08-06 with a basic prompt and structured outputs asking for reasoning and labels as binary flags. The macro F1 scores for technique detection are 0.32 and 0.34; the mini model tends to trade precision for extra recall.
👍4
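For reference, here is roughly how macro F1 over binary technique flags is computed. This is a minimal sketch: the technique names and the toy labels below are made up for illustration and are not the shared task's data or its official scorer.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 for one label; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: list[dict[str, bool]], pred: list[dict[str, bool]]) -> float:
    """Average the per-label F1 scores, weighting every label equally."""
    labels = sorted(gold[0])
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g[label] and p[label] for g, p in pairs)
        fp = sum((not g[label]) and p[label] for g, p in pairs)
        fn = sum(g[label] and (not p[label]) for g, p in pairs)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)


# Hypothetical flags for three texts and two techniques.
gold = [
    {"loaded_language": True, "whataboutism": False},
    {"loaded_language": False, "whataboutism": True},
    {"loaded_language": True, "whataboutism": False},
]
# An over-eager predictor: perfect recall, but it pays for it in precision.
pred = [
    {"loaded_language": True, "whataboutism": True},
    {"loaded_language": True, "whataboutism": True},
    {"loaded_language": True, "whataboutism": False},
]
print(macro_f1(gold, pred))
```

Because every label counts equally, a rare technique with poor precision drags the macro score down as much as a frequent one.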
> So why would you tag words in a sentence with BERT, it's just an encoder. Encoders are a limited architecture, you can't build AGI with them
> You say BERT is just an encoder, but then what about its top results on all sorts of benchmarks? (this one is especially weird to me: it feels like it's addressing the point but it's actually diverting to another one!)
Leo has sent me a paper that talks about how neural networks learn Fourier features to perform addition, a recurring topic in NN literature. It's story time.
In a bar after my master's thesis defense, Professor Stefan Wolf wrote an equation on a napkin:
$ \pi/4 = \sum_{n=0}^\infty (-1)^n / (2n + 1) $
This was the shortest program to compute pi I had ever seen, so I was incredibly excited. If I spend more test-time compute on it, I get a better approximation of pi. It's not the fastest program in terms of convergence speed, but it's definitely one short enough that I can remember it. It is an approximation due to Leibniz.
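Read literally as a program, the napkin formula looks something like this. A minimal sketch: the function name `leibniz_pi` is mine, and the loop is the series itself with no convergence tricks.

```python
import math


def leibniz_pi(terms: int) -> float:
    """Approximate pi with the first `terms` terms of the Leibniz series."""
    total = 0.0
    for n in range(terms):
        total += (-1) ** n / (2 * n + 1)
    return 4 * total


# More terms (more test-time compute) give a better approximation.
for terms in (10, 1_000, 100_000):
    approx = leibniz_pi(terms)
    print(terms, approx, abs(approx - math.pi))
```

The error shrinks roughly like one over the number of terms, which is why it is short to remember but slow to converge.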
The next morning I got interested in finding a linear RNN circuit that encodes the Leibniz approximation. A linear RNN uses a cumulative sum operation at its heart, so given a stream of input ones it outputs a count of ones so far "for free" (by construction / inductive bias).
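The counting behaviour can be sketched with the simplest possible linear recurrence. This is an illustration under my own assumptions (a scalar state with both weights fixed to 1), not the circuit from the final construction.

```python
def linear_rnn_count(inputs):
    """Run h_t = h_{t-1} + x_t and return every hidden state."""
    h = 0
    states = []
    for x in inputs:
        # Recurrent weight and input weight are both fixed to 1,
        # which makes the recurrence a plain cumulative sum.
        h = 1 * h + 1 * x
        states.append(h)
    return states


print(linear_rnn_count([1, 1, 1, 1, 1]))  # [1, 2, 3, 4, 5]
```

The states count from 1, so the series index n that starts at 0 would be the state minus one.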
I can express the division as a nonlinearity, but how do I express the sign flip (-1)^n as n goes up?
A natural way of encoding a flipping sign is to use a base-2 representation: if you encode n in binary, the least significant bit alternates.
In a neural network we can express each bit position using a separate dimension. Consider this linear feature map:
import numpy as np

def binary(digits: int):
    "Make a basis of powers of two of dimension digits, lowest bits first"
    return 1 << np.arange(digits)
After mapping a sum into this high-dimensional space you only need to read off the leading dimension, i.e. the lowest bit, aka the highest-frequency component.
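Reading the sign off the leading dimension might look like the sketch below. The helpers `bits` and `sign` are hypothetical names of mine, and using a bitwise AND against the basis is my assumption about how to do the mapping; the final construction may do it differently.

```python
import numpy as np


def binary(digits: int):
    "Make a basis of powers of two of dimension digits, lowest bits first"
    return 1 << np.arange(digits)


def bits(n: int, digits: int = 8):
    """Expand n into its binary digits, lowest bit first (hypothetical helper)."""
    return (n & binary(digits)) > 0


def sign(n: int) -> int:
    """(-1)^n read off the lowest bit: even n gives +1, odd n gives -1."""
    return 1 - 2 * int(bits(n)[0])


print([sign(n) for n in range(6)])  # [1, -1, 1, -1, 1, -1]
```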
What is a natural learnable approximation of representing numbers in base K? Well, it seems that it is somewhere in the basis of sinusoids of different frequencies.
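To make the link between bits and sinusoids concrete, here is a small sketch for base 2. The function names and the half-step phase offset are my own choices for illustration: bit k of n is a square wave with period 2^(k+1), and the sign of a sinusoid with the same period recovers it.

```python
import math


def bit_from_sinusoid(n: int, k: int) -> int:
    """Recover bit k of n from the sign of a sinusoid with period 2**(k + 1)."""
    period = 2 ** (k + 1)
    # The half-step offset keeps integer n away from the zero crossings.
    wave = math.sin(2 * math.pi * (n + 0.5) / period)
    return int(wave < 0)


def alternating_sign(n: int) -> float:
    """cos(pi * n) equals (-1)^n at integer n: the highest-frequency feature."""
    return math.cos(math.pi * n)


for n in range(8):
    recovered = [bit_from_sinusoid(n, k) for k in range(3)]
    print(n, recovered, round(alternating_sign(n)))
```

For the sign flip alone, only the highest-frequency sinusoid is needed; the lower frequencies play the role of the higher bits.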
❤4
If you're curious about what the final construction looks like, check out this program: https://gist.github.com/proger/ba147e3953a155d833aae084c1f0cd12
Gist
pi.py
🔥1
Behind the scenes of early versions of ChatGPT by John Schulman and Barret Zoph: https://x.com/johnschulman2/status/1891539960743743756?
X (formerly Twitter)
John Schulman (@johnschulman2) on X
@barret_zoph and I recently gave a talk at Stanford on post-training and our experience working together on ChatGPT. Unfortunately the talk wasn't recorded, but here are the slides: https://t.co/7fcGmvFtUF. (If you have a recording, please let me know!)
Forwarded from Linkstream
science says we will remain smart as long as we want
https://www.science.org/doi/full/10.1126/sciadv.ads1560?af=R
Science Advances
Age and cognitive skills: Use it or lose it
Cognitive skills do not decline with age for those who use math and reading throughout their life.
❤2
Forwarded from AI HOUSE
🎧 We invite you to watch the new episode of the AI HOUSE Podcast
Our guest is Volodymyr Kyrylov, Member of Technical Staff at OpenAI. Together with our host Roman Kyslyi, they dove into important topics in AI development, deep learning, and the path to working at one of the world's best-known AI labs.
Namely:
- how Ukrainians invented Deep Learning;
- why it worked out for Hinton in particular;
- how machine learning labs abroad operate;
- how Volodymyr got into OpenAI;
- self-education, studying at UCU and FIOT, a master's degree in AI, and papers.
The episode is already available on the YouTube channel, and you can also listen to it on the podcast platform of your choice.
Like and leave comments, we always read them.
Enjoy watching 👀
❤10
An archaeological artifact: Jeremy Howard says that his general-purpose language LSTM, ULMFiT (Universal Language Model Fine-tuning for Text Classification), was the motivation for building GPT-1. During adaptation to the final task, instead of LoRA they trained the whole network with different learning rates for different layers.
https://x.com/jeremyphoward/status/1906478657100755011
❤2
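The "different learning rates for different layers" part can be sketched as a schedule over layers. A minimal sketch: the 2.6 divisor between adjacent layers is the value I recall from the ULMFiT paper, while the function name, the layer count, and the top learning rate are made up for illustration.

```python
def discriminative_lrs(top_lr: float, num_layers: int, decay: float = 2.6) -> list[float]:
    """Per-layer learning rates, first layer first; the last layer gets top_lr."""
    return [top_lr / decay ** (num_layers - 1 - layer) for layer in range(num_layers)]


# Hypothetical 4-layer model: lower layers move more slowly than the top one.
for layer, lr in enumerate(discriminative_lrs(top_lr=0.01, num_layers=4)):
    print(layer, lr)
```

In a training framework these values would typically be passed as per-parameter-group learning rates, one group per layer.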
What should be added to the LLM? https://t.co/XKB4XxjREV
OpenAI
Open model feedback
We’re planning to release our first open language model since GPT‑2 in the coming months. We’re excited to collaborate with developers, researchers, and the broader community to gather inputs and make this model as useful as possible.
Some very good MLP kernels https://github.com/triton-lang/triton/pull/6429
Which architecture is better? The one that changes the exponent of the power-law scaling law: https://x.com/_katieeverett/status/1926722325073801612
X (formerly Twitter)
Katie Everett (@_katieeverett) on X
There were so many great replies to this thread, let's do a Part 2!
For scaling laws between loss and compute, where loss = a * flops ^ b + c, which factors change primarily the constant (a) and which factors can actually change the exponent (b)?
https…
👍5
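Why the exponent matters more than the constant can be seen from the form loss = a * flops ^ b + c quoted in the thread. The sketch below uses made-up coefficients (they are not fitted to any real model) and compares how much compute each kind of improvement saves at two target losses.

```python
def loss(flops: float, a: float, b: float, c: float) -> float:
    """Power-law scaling curve: loss = a * flops ** b + c, with b < 0."""
    return a * flops ** b + c


def flops_to_reach(target: float, a: float, b: float, c: float) -> float:
    """Invert the curve: compute needed to hit `target` loss (target > c)."""
    return ((target - c) / a) ** (1 / b)


# Made-up coefficients, for illustration only.
base = dict(a=100.0, b=-0.10, c=1.5)
better_constant = dict(base, a=50.0)   # improves only the constant a
better_exponent = dict(base, b=-0.12)  # improves the exponent b

for target in (2.5, 2.0):
    baseline = flops_to_reach(target, **base)
    saved_by_a = baseline / flops_to_reach(target, **better_constant)
    saved_by_b = baseline / flops_to_reach(target, **better_exponent)
    print(target, saved_by_a, saved_by_b)
```

With these numbers the smaller constant saves the same factor of compute at every target, while the better exponent saves a factor that keeps growing as the target loss gets lower.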
CUDA kernels are being replaced by CUDA virtual machines https://x.com/bfspector/status/1927435524416958871
X (formerly Twitter)
Benjamin F Spector (@bfspector) on X
(1/5) We’ve never enjoyed watching people chop Llamas into tiny pieces.
So, we’re excited to be releasing our Low-Latency-Llama Megakernel! We run the whole forward pass in single kernel.
Megakernels are faster & more humane. Here’s how to treat your Llamas…
🤯5