In the upcoming version of Tact, we've introduced over 100 updates, including new features, dozens of bug fixes, and more than 20 optimizations that reduce gas consumption. I want to highlight these optimizations because, until now, the most common reason for not using Tact has been its gas inefficiency.
For this release, we benchmarked our optimizations using the most common smart contract on TON—the Jetton. We compared our standard implementation against the reference FunC version, which is currently the most widely used. I'm happy to say that we’ve actually outperformed it!
On average, a Jetton compiled with this upcoming Tact version consumes just 95% of the gas used by the reference FunC implementation. This isn’t a cherry-picked result—these optimizations apply universally to all smart contracts, affecting them to roughly the same degree. Essentially, Tact is now on par with FunC in terms of gas efficiency—or even surpasses it in many cases—all while offering a much higher level of abstraction, enabling developers to build scalable, maintainable, and extensible projects with ease.
And this is just the beginning. We have a lot more to share in the coming months.
In addition to my previous post, I want to share my thoughts on why Tact will become the ultimate language for TON, surpassing Tolk, its primary competitor. Below, I outline three key points in separate posts.
1. We have an excellent and consistently expanding team.
A team is always more effective than an individual, especially when dealing with complex and broad projects such as developing a programming language. Tact is not limited to just the compiler itself; it encompasses multiple tools, each of which must be regularly maintained to keep up with the latest language features and user feedback.
Managing all these components is significantly easier and more efficient with a larger team. Additionally, comprehensive code reviews, which are crucial in compiler development, become more manageable with more team members involved. Since we're developing a language intended for smart contracts, even a minor error can result in substantial financial losses. Therefore, we cannot simply introduce new features or modify existing functionality without thorough validation and extensive testing. Having multiple engineers carefully review each change greatly reduces the risk of mistakes.
We have continuously hired new engineers to address weaknesses and expand our team's expertise and capabilities. Our team consists of skilled engineers with extensive knowledge in various fields, combining their expertise to strengthen our overall capability. This growth strategy will be instrumental in Tact's success. As evidenced by recent statistics, our productivity has significantly increased over the past few months. This month alone, we averaged merging five pull requests per day. This figure only includes the main Tact monorepo—comprising the compiler and documentation—and does not account for numerous additional tools.
2. We love the community, and the community loves us
The community truly appreciates Tact—especially newcomers. Although we've faced some resistance from long-time developers, attitudes are positively shifting as we continue to improve and enhance our language.
All past developer feedback polls and objective analyses of smart contracts deployed on the mainnet clearly illustrate Tact's steadily increasing adoption rate. At the beginning of 2024, only 8.7% of unique smart contracts on the mainnet were written in Tact. By the end of the year, this figure grew to 32.9%. Essentially, every third unique smart contract deployed on the mainnet today is written using Tact.
This upward trend shows no signs of slowing down, particularly after our recent major release—the largest in Tact's history. It introduced numerous new features and improvements, including significant gas optimizations, making Tact even more appealing to developers for their projects.
An essential part of this growth is our genuine love for the community. We welcome all kinds of feedback—whether through comments, chat messages, or GitHub issues. We actively implement features requested by our users and prioritize resolving the issues they report.
Last but certainly not least, our documentation is outstanding. When covering blockchain-specific topics, our documentation is frequently more comprehensive and accurate than even the official TON documentation, which has yet to catch up with ours. Regarding Tact itself, every feature is thoroughly documented, complete with numerous examples, specifications, details, and important warnings. And we're constantly expanding and refining it!
3. Technological Advantages and Clear Path Forward
Currently, Tact compiles to FunC (the language from which Tolk was forked), rather than directly into assembly. Despite this intermediate compilation step, we've already made significant strides in optimizing gas efficiency. Specifically, for common use cases and typical developer-written contracts (without extreme, manual, low-level optimizations), contracts written in Tact now consume slightly less gas compared to the same logic written directly in FunC.
We achieved this through sophisticated, built-in low-level optimizations embedded into our compiler and standard library. Essentially, Tact automatically applies many optimizations that would otherwise require specialized, manual effort—allowing developers to focus purely on the logic, readability, and architecture of their smart contracts, rather than the complexities of gas optimization.
Our future plans are even more ambitious: once we eliminate the dependency on FunC and transition to direct compilation into assembly, we'll unlock even deeper and more powerful optimization possibilities. This will further widen the performance and efficiency gap between Tact and alternatives like Tolk.
When it comes to features, Tact is already well ahead of Tolk. From day one, Tact was purposefully designed to provide developers with a smooth and intuitive experience, rich tooling, and the ability to effortlessly create maintainable, secure, and scalable smart contracts. Our upcoming release—Tact 2.0—scheduled for later this year, will further enhance this foundation, introducing even more innovative features, optimizations, and architectural improvements.
While Tolk is also moving towards better usability and feature enhancements, Tact's inherent design principles and dedicated roadmap position it uniquely to remain the leading choice for TON smart contract development.
Here are charts illustrating the statistics mentioned above with specific numbers.
I benchmarked 52 models across 12 different prompts, with 500 generations per combination, resulting in many interesting charts.
https://gusarich.com/blog/measuring-llm-entropy/
Measuring and Analyzing Entropy in Large Language Models
A detailed benchmarking study exploring entropy and randomness across 52 large language models using diverse prompting strategies, revealing notable biases and significant variability influenced by model architectures and prompt engineering.
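For readers curious what a measurement like this looks like in practice, here is a minimal sketch of estimating per-prompt entropy, assuming the OpenAI Python SDK; the model name, prompt, and sample count are placeholders, not the exact setup from the post.

```python
import math
from collections import Counter

from openai import OpenAI

client = OpenAI()

def sample_answers(model: str, prompt: str, n: int = 100) -> list[str]:
    """Collect n independent completions of the same prompt."""
    answers = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=1.0,  # default sampling temperature
        )
        answers.append(resp.choices[0].message.content.strip())
    return answers

def shannon_entropy(samples: list[str]) -> float:
    """Entropy (in bits) of the empirical distribution over distinct answers."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Placeholder model and prompt, not the ones from the benchmark itself.
answers = sample_answers("gpt-4o-mini", "Pick a random number between 1 and 10. Reply with the number only.")
print(f"{shannon_entropy(answers):.3f} bits over {len(set(answers))} distinct answers")
```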
I ran a relatively simple black-box fuzzing experiment on the Tact compiler with a fresh approach, using only documentation as input. It found 10 real issues for just $80.
https://gusarich.com/blog/fuzzing-with-llms/
Documentation-Driven Compiler Fuzzing with Large Language Models
A fresh and simple black-box approach to fuzzing compilers using large language models to generate test cases from documentation and specification.
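To give a rough idea of the approach, here is a minimal sketch under my own assumptions: feed documentation excerpts to an LLM, ask it for small test programs, run the real compiler on them, and set aside anything that looks like a compiler bug. The docs layout, prompt, model name, `tact` CLI invocation, and the bug heuristic are all illustrative, not the exact pipeline from the post.

```python
import pathlib
import re
import subprocess

from openai import OpenAI

client = OpenAI()

PROMPT = (
    "You are testing a compiler. Based on the documentation excerpt below, "
    "write one small, self-contained Tact contract that exercises these "
    "features in an unusual but plausible way. Reply with code only.\n\n{chunk}"
)

def generate_case(chunk: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": PROMPT.format(chunk=chunk)}],
    )
    text = resp.choices[0].message.content.strip()
    return re.sub(r"^```\w*\n|\n```$", "", text)  # strip markdown fences if present

def looks_like_compiler_bug(source: str, workdir: pathlib.Path) -> bool:
    case_file = workdir / "case.tact"
    case_file.write_text(source)
    # Assumed CLI invocation; the real one may differ.
    result = subprocess.run(["tact", str(case_file)], capture_output=True, text=True)
    # Illustrative heuristic: ordinary diagnostics are expected,
    # internal errors or crashes are worth a human look.
    return "internal" in result.stderr.lower() or result.returncode not in (0, 1)

workdir = pathlib.Path("fuzz-out")
workdir.mkdir(exist_ok=True)
for i, doc in enumerate(pathlib.Path("docs").glob("**/*.md")):  # assumed docs layout
    case = generate_case(doc.read_text()[:4000])
    if looks_like_compiler_bug(case, workdir):
        (workdir / f"finding-{i}.tact").write_text(case)  # triage manually later
```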
Multitasking in 2025
People tend to multitask more and more as technology and society evolve. And this behavior only becomes stronger as AI integrates into our daily lives. We now consume multiple sources of information and do multiple things at the same time. But for some, that can be very hard — our brains work differently.
The key shift is that now you can actually delegate part of your cognitive load to AI, instantly and effectively, freeing up mental space for more things. If you use AI for two tasks, you can easily handle both at once. Send a message in one window, and while waiting for the result, switch to something else, send a message there too, then switch back and see the first result. Repeat. You’re basically doubling your speed. What were you doing before while waiting for a reply anyway? Scrolling social media? What if you did something else instead?
I've never been good at multitasking. Even with simple things — like talking while doing something physical — I often just stop thinking about one task until I finish the other. I could be putting milk in the fridge while talking to someone and just... stop, fridge wide open, until I finish the sentence, and only then finally put the milk in.
But even with that, I’ve still managed to multitask effectively over the past few weeks thanks to AI. Most of the time when I work now, I handle two things at once — whether it’s job stuff, studies, writing, or some boring online things like booking hotels and planning trips. I often have multiple ChatGPT windows open at the same time, doing different things. And I like it — I like how I can literally get so much more done in the same amount of time.
Of course, not all my work is multitasked. Sometimes I enter long stretches of deep focus on just one task — and even then, AI still helps a lot. It boosts your efficiency even when you’re doing only one thing at a time.
People who are already good at multitasking — and constantly generating ideas in their head — will experience this the most. What if, instead of just writing a fresh idea into your notes, you could instantly open a new tab and start implementing it, without even losing focus on other things? It’s incredible. And it’ll only get better as AI systems evolve. Remember: we’re still early.
Many jobs will eventually transform into manager-like work — but instead of managing people, you'll be managing multiple AI agents at once. Even with today’s AI, you can do so much more, in both quality and quantity. One of the best skills to develop right now is the ability to think and read fast — it directly boosts your efficiency. You don’t need to master specific hard skills. Instead, learn how to learn. Learn how to adapt.
It has been a great experience. Initially, I was mostly involved with language features and the compiler itself. During our team's early months, this was necessary due to our smaller size. However, as our team expanded rapidly with new talented engineers, I was recently able to shift my focus to tasks that are now more interesting to me.
Currently, I am focused on LLM-powered fuzzing for ensuring security and documentation quality. We have achieved incredible results with this approach, and a new blog post will soon be published, sharing insights into the efficiency of different models and the fuzzing methodology overall.
I also plan to leverage my expertise in DeFi and smart contracts, gained over several years of successfully implementing and auditing large-scale solutions, to support the team with our DeFi libraries and best-practice implementations of standard smart contracts.
Since I now have much greater freedom in my activities and our team has significantly more ongoing projects, I feel better than ever about working full-time and striving to make TON the best blockchain for developers and Tact the best language for building on it.
I've spent months running LLM-powered fuzzing at production scale—processing billions of tokens, discovering practical scaling laws, and developing effective deduplication strategies. Here’s what I learned along the way:
https://gusarich.com/blog/billions-of-tokens-later/
Billions of Tokens Later: Scaling LLM Fuzzing in Practice
Lessons learned from scaling documentation-driven black-box fuzzing pipelines to billions of tokens, practical deduplication strategies, discovered scaling laws, and initial explorations into white-box fuzzing for future expansion.
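As one concrete illustration of what deduplication can mean at that scale, here is a minimal sketch that collapses each crash log into a normalized signature before counting it as a new finding; the normalization rules are my own assumptions, not the exact strategy described in the post.

```python
import hashlib
import re

def crash_signature(stderr: str) -> str:
    """Collapse a crash log to a stable signature: drop file paths, source
    positions, and hex addresses, then hash the first few remaining lines."""
    normalized = []
    for line in stderr.strip().splitlines()[:5]:
        line = re.sub(r"/[^\s:]+", "<path>", line)         # file paths
        line = re.sub(r"\b\d+:\d+\b", "<loc>", line)       # line:column positions
        line = re.sub(r"0x[0-9a-fA-F]+", "<addr>", line)   # memory addresses
        normalized.append(line)
    return hashlib.sha256("\n".join(normalized).encode()).hexdigest()[:16]

seen: dict[str, str] = {}

def is_new_finding(case_id: str, stderr: str) -> bool:
    """Keep only the first test case observed for each distinct signature."""
    sig = crash_signature(stderr)
    if sig in seen:
        return False
    seen[sig] = case_id
    return True
```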
The Complexity Threshold of AI
We see dozens of new LLMs heavily tuned for software engineering tasks, and they're becoming very good at it, very quickly. As models evolved, I started using them more and more for writing code, eventually reaching a point where I almost completely stopped writing code myself. The last time I wrote code manually (or rather, with AI-assisted tab completions) was around four months ago. However, once tasks become larger and more complex, these models quickly become inefficient. They seem to have a certain complexity threshold, beyond which their efficiency rapidly declines.
I was mostly using AI either to quickly take projects "from 0 to 1" by iterating on MVPs, or to build small Python scripts for working with LLMs and data. About a week ago, I needed to rapidly build another MVP while iterating on ideas, so I used Claude Code and completed the whole thing within a single day. I wanted to keep developing it, but the code became so messy that changing or adding anything was nearly impossible. Even minor adjustments, like updating the UI, caused other parts to break. At that point, I decided I was done with this MVP and needed to re-implement everything from scratch with better structure and architecture.
When I started the second implementation attempt, with my "plan" ready, I gave it to Claude Code and watched it fail for hours. It was producing code, but it didn't match my vision and wasn't working as expected. Many architectural and code-level issues remained. I tried slightly adjusting the plan and re-implementing everything multiple times over the span of three days, but it didn't help. After three unsuccessful attempts, I almost lost hope.
But then I decided to spend more time refining the specification itself before starting the implementation. I spent two days writing and iteratively refining the specification through feedback loops with the smartest models, while also giving my own feedback on each part. It covered almost everything needed for implementation, from high-level architecture to logic for specific use cases and solutions for the hardest implementation parts that I struggled with the most. Suddenly, when I gave this new specification to the agent and started slowly implementing things one by one, it just worked.
I told the agent to implement the next thing, waited 10 minutes, tested the implementation, asked it to fix any issues, and then moved to the next step. A few times during those two days, I also asked another agent to carefully read the entire codebase and strictly compare it against the specification, then passed its feedback back to the first agent to resolve any differences. After about two days, the project was mostly finished. It still has some rough edges (which are easy to address), and I haven't thoroughly *tested* everything yet (I even decided not to write any automated tests at all at this stage), but all the core functionality just worked. When I asked Claude to change something, it usually did so accurately, without breaking other parts.
The thing is that AI is much better at following instructions than at coming up with practical stuff on its own. Smarter models can handle more complex tasks independently, but in larger projects, their limit of "acceptable complexity per step" is lower. Therefore, when working on a bigger and more complex project, it's important to keep all individual steps at the same level of complexity and depth as you would when working on smaller projects. The complexity of each individual step matters more than the complexity of the whole project.
My impression of GPT-5
This was an extremely anticipated release. Literally the whole AI bubble waited for it and watched closely. It's been 2 years since GPT-4, and people expected something extraordinary. Me too.
I raised my expectations for GPT-5 in the past few months - hoping that it would basically be "o4" but under a new name. And I expected a capability jump similar to the jump from o1 to o3.
I was also watching the whole rollout extremely closely and had tried out GPT-5 before the official release for a few days. First, when it was being tested on LMArena under the codenames "Zenith" and "Summit", and another time when it was available on Perplexity due to a bug.
I didn't try it out heavily on real tasks in those days, but I still sent many prompts for testing purposes. And I had a "taste" of it at that time. It felt similar to o3 in vibe, but smarter, more precise, and just generally better. My expectations rose again after trying it out there. I was almost sure it would be an "o4" moment.
But then the day of the release came, and I was watching the livestream. It was boring. I turned it off halfway through. And I didn't even try GPT-5 that day and went to sleep. I was already disappointed with the boring presentation.
The next day I was scrolling X and looking at feedback from other people, and it was mostly bad. People said it was either on par with o3, or a step down.
Then I finally tried it out. And it actually felt better than o3. And GPT-5 pro felt better than o3 pro. I still can't exactly say *how much better* they are, but it's definitely noticeable and significant in many scenarios.
It's hard to notice a difference in simple casual chats, but once you give it something complex or put it in an agentic environment - you'll see how it just does a better job than o3 could, and in many cases - much better than any other model could.
It also translates to agentic coding. For a whole month beforehand I was extensively using Claude 4 Opus for coding in Claude Code full-time, and it was great. I liked that model and its taste. It was nice coding with it. But honestly, it was pissing me off very often.
And so I tried downloading the Codex CLI with GPT-5 inside. The UX of the CLI itself is poor compared to Claude Code at the moment. It is not that developer-friendly. But after trying to code with GPT-5 the same way I did with Opus, I started to notice how GPT-5 was just better.
It's not always about the quality of the code, and definitely not about the speed. My first impression is that it not only writes better code overall, but that it's much better at instruction following and tool calling. Those are the things that people liked the most about Claude models. And I liked that too. And many people thought that no model would match Claude in these metrics anytime soon.
The thing is that GPT-5 just follows your instructions extremely precisely. And it doesn't do things you don't ask it to do. Claude was pissing me off so much by starting to go off track from instructions in long coding sessions, or even in simple queries when it just did something I did not ask it to do. GPT-5 is just better in this regard. Sometimes it follows instructions so well that you understand that your instructions were bad.
And it works so well with long context. I can mention something once early in a coding session, and then I just see how it still remembers, references, and follows that for so long. Opus was missing those things very often, especially in long sessions.
It might sound like too much ass-licking for OpenAI, but that's my honest experience with GPT-5. I was sceptical too, especially after seeing that boring livestream and seeing so much hate on X. But after trying all of it out myself, I was really amazed. Is it "o4" level? I'm not sure. More like o3.5.
Why did many people have a bad first impression of GPT-5?
Actually, the reason behind that is absurdly stupid. OpenAI fucked up with UX. That's it. The model is actually good; all variants of it are. But OpenAI rushed the release for some reason, and their goal of making the UX better made it worse for a lot of users.
The key detail here was the model router that they added to ChatGPT so that users don't have to manually choose a model, and it can just choose the appropriate one on its own. For example, if you ask it how to pronounce a word, that can easily be answered with a non-thinking model, with lower latency and the same accuracy. But if you give it a math problem, ask something about coding, or just generally give it a task that requires more reasoning - it is better processed by a thinking variant of the model.
And the idea is good, especially for the average user who doesn't know much about how these models work and doesn't want to think about which model to choose for every query. But the implementation was very bad in the first couple of days, and OpenAI confirmed that themselves. The router was working poorly, often not choosing a thinking model for complex queries when it was needed. On top of that, the information about which model actually answered was hidden, so when your request (as a free/plus user) was routed to "GPT-5 mini", you had no way of knowing. There isn't even a "mini" model in the model picker.
And another factor is the "reasoning effort" parameter that OpenAI's models have. It determines how hard the model thinks before answering. If you need quicker answers, choose "low"; if you need more reasoning for more complex tasks, use "medium" or "high". And the thing is that in ChatGPT you can't choose that setting yourself. It's part of the router, too. And the information about this setting is also hidden from users.
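For comparison, the API does expose this setting directly. Here is a minimal sketch using the OpenAI Python SDK's Responses API; treat the exact model name and the set of accepted effort values as assumptions on my part.

```python
from openai import OpenAI

client = OpenAI()

# A quick lookup: low effort keeps latency down.
quick = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    input="How do you pronounce 'gnocchi'?",
)

# A harder task: high effort lets the model think longer before answering.
thorough = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "high"},
    input="Prove that the sum of two odd integers is even.",
)

print(quick.output_text)
print(thorough.output_text)
```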
It turned out that most of the requests from free/plus users were processed either by a non-thinking model, or by a thinking model but with the "low" reasoning effort setting. And sometimes also by "mini" models when limits for the regular ones were exhausted. And the performance is, expectedly, bad under those circumstances.
So, even when paying users tried out GPT-5, they were often getting bad results. And that was their impression of GPT-5. And that's why there was so much hate for it online.
But OpenAI is fixing those problems, and some are already fixed. So, if you tried out GPT-5 in the first couple of days and didn't like it, consider trying it out again now or in a few days, as it might be much better.