https://www.youtube.com/watch?v=FWMkuOyUZHI
Сегодня в 20:00 Увидимся. Оценим 100-ый выпуск Neuropunks 3 LP!
Сегодня в 20:00 Увидимся. Оценим 100-ый выпуск Neuropunks 3 LP!
❤🔥16 3
PIR тренировка окончена.
Запустил бенчмарки для сравнения с GPT-2 (OpenAI)
╔══════════════════════════════════════════════════════════════╗
║ PIR 270M PRETRAIN ║
║ ЗАВЕРШЁН! ║
╠══════════════════════════════════════════════════════════════╣
║ ✅ Step: 39,000 / 39,000 (100%) ║
║ 📉 Final Loss: 1.1040 ║
║ ⏱️ Total Time: 26.07 hours ║
║ 💾 Tokens Trained: 4.00B ║
║ 🏆 PPL: 3.0 ║
╚══════════════════════════════════════════════════════════════╝
Запустил бенчмарки для сравнения с GPT-2 (OpenAI)
🔥5❤3 1
● PIR 270M vs GPT-2 — Полное сравнение
Дата: 17 января 2026
Автор: Павел Попович Техножнец
---
Модели
┌───────────────────────┬────────────────────┬─────────────────┬─────────────────┐
│ │ PIR 270M │ GPT-2 124M │ GPT-2 345M │
├───────────────────────┼────────────────────┼─────────────────┼─────────────────┤
│ Параметры │ 272M │ 124M │ 345M │
├───────────────────────┼────────────────────┼─────────────────┼─────────────────┤
│ Архитектура │ Parallel Scan O(T) │ Attention O(T²) │ Attention O(T²) │
├───────────────────────┼────────────────────┼─────────────────┼─────────────────┤
│ Hardware для обучения │ 1x A100 │ 8x A100 │ 8x A100 │
├───────────────────────┼────────────────────┼─────────────────┼─────────────────┤
│ Время обучения │ ~50 часов │ ~4 дня │ ~4 дня │
└───────────────────────┴────────────────────┴─────────────────┴─────────────────┘
---
Качество (Language Modeling)
┌──────────────────┬──────────┬────────────┬────────────┐
│ Метрика │ PIR 270M │ GPT-2 124M │ GPT-2 345M │
├──────────────────┼──────────┼────────────┼────────────┤
│ WikiText-2 PPL ↓ │ 5.14 │ ~30 │ ~22 │
├──────────────────┼──────────┼────────────┼────────────┤
│ Val Loss ↓ │ 1.10 │ ~2.85 │ ~2.5 │
└──────────────────┴──────────┴────────────┴────────────┘
↓ = меньше лучше
---
Скорость inference (A100)
┌─────────────────┬──────────────┬───────────────┬───────────────┐
│ Метрика │ PIR 270M │ GPT-2 124M │ GPT-2 345M │
├─────────────────┼──────────────┼───────────────┼───────────────┤
│ Forward 512 tok │ 20,550 tok/s │ ~25,000 tok/s │ ~15,000 tok/s │
├─────────────────┼──────────────┼───────────────┼───────────────┤
│ Генерация │ 57 tok/s │ ~70 tok/s │ ~45 tok/s │
├─────────────────┼──────────────┼───────────────┼───────────────┤
│ ms/token │ 17.5 ms │ ~14 ms │ ~22 ms │
└─────────────────┴──────────────┴───────────────┴───────────────┘
PIR сравним по скорости, но с линейной сложностью O(T)
Разнообразие генераций
┌──────────────┬──────────┬────────────┬────────────┐
│ Метрика │ PIR 270M │ GPT-2 124M │ GPT-2 345M │
├──────────────┼──────────┼────────────┼────────────┤
│ Distinct-1 ↑ │ 0.39 │ ~0.35 │ ~0.38 │
├──────────────┼──────────┼────────────┼────────────┤
│ Distinct-2 ↑ │ 0.82 │ ~0.75 │ ~0.78 │
└──────────────┴──────────┴────────────┴────────────┘
↑ = больше лучше
Reasoning (base models, без SFT)
┌───────────┬──────────┬───────────────┬────────────┐
│ Тест │ PIR 270M │ GPT-2 124M │ GPT-2 345M │
├───────────┼──────────┼───────────────┼────────────┤
│ LAMBADA ↑ │ 1.6% │ ~1-2% │ ~3-5% │
├───────────┼──────────┼───────────────┼────────────┤
│ BoolQ ↑ │ 25% │ ~50% (random) │ ~50% │
└───────────┴──────────┴───────────────┴────────────┘
Base models не обучены отвечать на вопросы
Главный результат
┌──────────────────┬──────────┬──────────┐
│ │ PIR 270M │ GPT-2 │
├──────────────────┼──────────┼──────────┤
│ PPL лучше в │ 4-6 раз │ baseline │
├──────────────────┼──────────┼──────────┤
│ GPU для обучения │ 1 │ 8 │
├──────────────────┼──────────┼──────────┤
│ Сложность │ O(T) │ O(T²) │
└──────────────────┴──────────┴──────────┘
Вывод: PIR 270M показывает PPL в 4-6 раз лучше GPT-2 при обучении на 1 GPU вместо 8.
Архитектура без attention работает.
Победа. Следующий шаг это обучение диалогам - мы на правильном пути.
🔥7🤯1
Первые итоги Fine-tuning'а:
[Q] What is consciousness?
[Q] What is consciousness?
[A] The physical activity of a brain. The brain is responsible for processing and recording the brain's flow, which can be seen as a function of memory. It is often used to describe emotions such as emotions or feelings of pain or other stress...
👏8☃1👍1
Техножнец
Первые итоги Fine-tuning'а: [Q] What is consciousness? [A] The physical activity of a brain. The brain is responsible for processing and recording the brain's flow, which can be seen as a function of memory. It is often used to describe emotions such…
[Q] What is blockchain?
[A] A Bitcoin Blockchain(ТОЧКА)org is a digital currency that allows you to create and use bitcoin, while using bitcoins, the process can be used for transactions by people who are interested in buying or selling.
❤5☃1🆒1
[Q] What is machine learning?
[A] This paper is a computer science course that uses computers to read and write. It is the best of all tools, including machine learning,
machine learning algorithms, data science, software development and other technologies.
☃1
Вот это жесть...как спать теперь...оно что работает чтоли...я не могу понять...точнее не могу поверить
🔥8🎉4🍾3❤1
[Q] Explain democracy.
[A] The political process is a system of government, and it is not intended to be used in any way. It can lead to a more democratic society and an effective system of governance... every citizen must first vote for the people to vote, which will also have a right to control their own party.
● [Q] Who was Albert Einstein?
[A] The Einstein experiment is the answer to this question. It suggests that the theory of relativity and quantum mechanics is not a new concept, but rather a set of principles.
The first two principles are:
1. What is the nature of physics?
A fundamental principle in mathematical logic, or an intuitive understanding of reality, is a real problem because it does not exist within all circumstances. The idea of the universe
🤔1🎄1 1 1
[Q] Describe a beautiful sunset.
[A] The sun was shining on the sky, its bright sky as it spread through the darkness.
🦄2 2🎄1
[Q] How does the internet work?
[A] A. The internet has a lot of advantages, including convenience and privacy. There are many different kinds of websites that can help you
get started with your business. However, there are several popular online services like Amazon.com, Amazon, Amazon Instant Video, and Netflix.
😁11☃1👀1
[Q] What is a database?
[A] The database name contains information about the number of records collected and the names of all documents in the database. It's usually used to generate thousands of data, such as a database or an application, which is typically stored on the same machine. The system will have to be used for storing records from a database...
☃1
[Q] What is Python programming?
[A] Python is a programming language developed by Google, which allows developers to create and develop software that is able to use algorithms in order to understand and analyze data. It uses Java, JavaScript, and CSS, as well as more advanced applications for web development.
😁12🎄2
[Q] Explain the theory of relativity.
[A] The gravitational force of gravity is a process in which gravity forces the world to perceive it, while also maintaining its position as the mass of matter. This can occur by changing the velocity of light, and then taking into account the rate at which point and space are used...
😁3🎄2