ReasoningBank
AI agents currently have one major limitation: they don't learn from their own experience or from the results of completed tasks. Once the model is trained, all we can do is tune our prompts or enrich the context with domain data via RAG.
Researchers from Google started exploring how to overcome this limitation and introduced a concept called ReasoningBank.
The overall idea is simple:
1. The agent writes down the results of successful or failed tasks into a dedicated Markdown file.
2. During task execution, the agent searches the ReasoningBank and pulls relevant memories into the context.
3. Then it uses an LLM-as-a-judge approach to self-evaluate the result, analyze the trajectory of reasoning, and extract success insights or failure reasons.
Each file has the following structure (very similar to skills):
- Title: identifier of the core strategy.
- Description: short summary of the memory item.
- Content: reasoning steps, decision explanation, or operational insights extracted from past experience.
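The memory item structure and the retrieval step above can be sketched roughly as follows. This is a minimal illustration, not the implementation from the paper: the `MemoryItem` class, the `retrieve` function and the sample memories are hypothetical, and the naive word-overlap scoring is a stand-in chosen only for brevity.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    title: str        # identifier of the core strategy
    description: str  # short summary of the memory item
    content: str      # reasoning steps or insights from past experience


def retrieve(bank: list[MemoryItem], query: str, top_k: int = 1) -> list[MemoryItem]:
    """Return up to top_k items whose title/description share words with the query."""
    query_words = set(query.lower().split())

    def overlap(item: MemoryItem) -> int:
        text = f"{item.title} {item.description}".lower()
        return len(query_words & set(text.split()))

    ranked = sorted(bank, key=overlap, reverse=True)
    # Items with no shared words are dropped so they do not pollute the context.
    return [item for item in ranked[:top_k] if overlap(item) > 0]


# Hypothetical memories, one from a web task and one from a coding task.
bank = [
    MemoryItem(
        "Check pagination first",
        "Listing pages may hide results behind pagination",
        "Before concluding that an item is missing, open the next page.",
    ),
    MemoryItem(
        "Reproduce before patching",
        "Failing test should be reproduced before editing code",
        "Run the failing test, read the trace, and only then change the code.",
    ),
]

hits = retrieve(bank, "fix the failing test in the parser")
# The retrieved memories are what gets pulled into the agent context.
context = "\n".join(f"{m.title}: {m.content}" for m in hits)
```

Every retrieved item costs context tokens, so `top_k` is the knob that trades recall against context size.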
To be honest, benchmark results compared to other agent memory approaches do not look extremely impressive:
ReasoningBank without scaling outperformed memory-free agents by 8.3% on WebArena and 4.6% on SWE-Bench-Verified.
At the same time, this approach adds even more data to the context. And context, as we know, directly affects both model behavior quality and usage cost.
The official paper contains interesting research details, including particular prompts and measurements.
From my perspective, the idea and its implementation are very similar to skills or other long-term agent memories (e.g. in Claude Code). But the overall direction of making agents capable of learning from their own experience looks really promising.
#ai #engineering #news
AI Engineering
I strongly believe that if you want to use any technology effectively, you need to understand how it works under the hood. Especially in software engineering.
So if you haven't looked into LLM internals yet, I'd highly recommend reading AI Engineering by Chip Huyen. The book was published in December 2024. And as AI is moving extremely fast, you might think it's already outdated. Yes and no.
The book focuses on fundamentals, and they don't really change that fast. You won't find hype topics like skills, harnesses, or agent orchestration there. But for building a structured understanding of how AI works, you don't actually need them.
What I personally found useful:
🔸 Core LLM concepts: tokenization, training and post-training processes, dataset preparation. This part is very similar to the Machine Learning Crash Course from Google.
🔸 Model evaluation: a quite complex but interesting topic about model outputs and how to compare them. The book covers ranking, model specialization, public benchmarks and the AI-as-a-judge approach.
🔸 Prompt engineering: a good reference on context and prompting. Additionally, the author describes different security aspects of using prompts; that part really extended my thinking about what can go wrong.
🔸 Finetuning: a deep dive into different ways to optimize models. You need to be a good mathematician to understand this part. So I was really glad I'm not an ML engineer (huge respect to all ML experts, it's really hard).
🔸 User feedback: basic patterns for collecting feedback, what to measure and why, common pitfalls.
To sum up, this book is really great for structuring your knowledge about modern AI systems. Once you have that foundation, it becomes much easier to navigate all the new tools, patterns and paradigms that appear almost every month.
#booknook #ai #engineering
Amazon
AI Engineering: Building Applications with Foundation Models
Ralph Loop
The non-deterministic nature of AI sometimes produces very interesting engineering solutions. One such example is the Ralph (or Ralph Wiggum) Loop.
This is an AI coding pattern inspired by The Simpsons character Ralph Wiggum, known for saying weird things with high confidence.
The idea is simple: The agent can be dumb in a single iteration. But if it keeps retrying with feedback long enough, it eventually converges.
The loop steps:
Start a new agent/subagent -> load task + memory -> execute 1 selected task -> run validation -> save learnings -> commit progress -> repeat
The loop finishes when all tasks have passes:true or it reaches the maximum number of iterations (default is 10).
But the real value of the technique is not in retries, it's in the context engineering strategy under the hood:
🔸 One loop executes only one task. It keeps the agent focused.
🔸 Each iteration starts a new agent session. State lives outside the context, keeping it clean between iterations. State is stored in git history, progress.txt and long-term memory files.
🔸 Tasks are delegated to subagents. The main context is not polluted with task execution details and validations.
🔸 AGENTS.md is updated on each iteration. It is a live artifact that contains discovered patterns, learnings and conventions so future iterations can benefit from those findings and do not repeat previous mistakes.
🔸 AGENTS.md contains explicit validations for the feedback loop. It usually defines linters and typechecks, build and test execution commands.
Ralph Loop is a really powerful pattern to get things done: it just repeats the task until it succeeds, making agent execution more reliable. "Deterministically bad" but effective.
But this approach only works if you have good task decomposition, clear completion criteria, and mature SDLC practices with strong validations and feedback loops. Otherwise the agent will just generate a ton of mess.
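The loop described above can be sketched roughly like this. It is a minimal illustration under stated assumptions, not the original Ralph implementation: `Task`, `ralph_loop`, `flaky_agent` and `second_try_passes` are hypothetical names, the agent and the validation are plain callables instead of a real coding agent and a real test run, and a Python list stands in for progress.txt and git history.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    name: str
    passes: bool = False  # mirrors the passes:true flag from the text


def ralph_loop(
    tasks: list[Task],
    run_agent: Callable[[Task, list[str]], str],
    validate: Callable[[Task, str], bool],
    max_iterations: int = 10,
) -> list[str]:
    """Run one task per iteration until all tasks pass or the limit is hit."""
    progress: list[str] = []  # external state, stands in for progress.txt
    for iteration in range(1, max_iterations + 1):
        pending = [t for t in tasks if not t.passes]
        if not pending:
            break  # every task has passes=True
        task = pending[0]  # exactly one task per iteration
        # A "fresh session": the agent only sees a copy of the external state.
        output = run_agent(task, list(progress))
        task.passes = validate(task, output)
        progress.append(f"iteration {iteration}: {task.name} passes={task.passes}")
    return progress


# Demo with a fake agent that only succeeds on its second attempt per task.
attempts: dict[str, int] = {}


def flaky_agent(task: Task, memory: list[str]) -> str:
    attempts[task.name] = attempts.get(task.name, 0) + 1
    return f"{task.name} attempt {attempts[task.name]}"


def second_try_passes(task: Task, output: str) -> bool:
    return output.endswith("attempt 2")


tasks = [Task("add-endpoint"), Task("write-tests")]
progress = ralph_loop(tasks, flaky_agent, second_try_passes)
```

The `validate` callable is where the linters, typechecks and tests would plug in. Without a strict check there, the loop has no signal to converge on.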
#ai #engineering #patterns
Geoffrey Huntley
Ralph Wiggum as a "software engineer"
How Ralph Wiggum went from 'The Simpsons' to the biggest name in AI right now - Venture Beat
Here's a cool little field report from a Y Combinator hackathon event where they put Ralph Wiggum to the test.
"We Put a Coding Agent in a While Loop and It Shipped
Skill Packaging
Engineering teams are actively building internal collections of skills for agents: code review, troubleshooting, design preparation, onboarding, security practices.
And it looks great until you hit the question: how do you distribute those skills across dozens of teams and multiple harnesses? For Claude you need to put skills into .claude, for Cursor into .cursor, for Gemini into .gemini, etc. And things become even messier when you need to roll out updates.
To solve this problem, big companies mostly build their own in-house solutions. Smaller companies usually just copy files from some shared repository and manage this complexity manually.
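To make the manual "copy from a shared repository" approach concrete, here is a minimal sketch. The `HARNESS_DIRS` mapping and the `copy_skills` helper are hypothetical. Only the top-level folder names come from the text above, and the `skills` subfolder inside each of them is an assumption.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical mapping: harness name -> skills folder inside the target repo.
HARNESS_DIRS = {
    "claude": Path(".claude") / "skills",
    "cursor": Path(".cursor") / "skills",
    "gemini": Path(".gemini") / "skills",
}


def copy_skills(shared_skills: Path, target_repo: Path, harnesses: list[str]) -> list[Path]:
    """Copy every skill folder from the shared repo into each harness folder."""
    installed = []
    for harness in harnesses:
        for skill in sorted(p for p in shared_skills.iterdir() if p.is_dir()):
            destination = target_repo / HARNESS_DIRS[harness] / skill.name
            # Overwrites files from a previous copy but never deletes stale ones.
            shutil.copytree(skill, destination, dirs_exist_ok=True)
            installed.append(destination)
    return installed


# Demo inside a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    shared = Path(tmp) / "shared-skills"
    (shared / "code-review").mkdir(parents=True)
    (shared / "code-review" / "SKILL.md").write_text("# Code review skill\n")
    repo = Path(tmp) / "my-service"
    repo.mkdir()

    installed = copy_skills(shared, repo, ["claude", "cursor"])
    copied_files = [p / "SKILL.md" for p in installed if (p / "SKILL.md").exists()]
    relative = [p.relative_to(repo).as_posix() for p in installed]
```

The weak spots are easy to see even in this toy version: no versioning, no cleanup of removed skills, and the copy has to be repeated in every repo for every update.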
I don't like reinventing the wheel, so when my team faced the same problem, we started looking for an existing solution we could reuse. And the only actively maintained tool we managed to find was apm by Microsoft.
APM is a package manager for prompts, skills, and MCPs. In other words, it's Maven or Go modules for agents.
APM package structure:
my-package/
├── apm.yml
└── .apm/
    ├── instructions/
    │   └── my.instructions.md
    ├── skills/
    │   └── my-skill/
    │       └── SKILL.md
    ├── agents/
    └── prompts/
To install the package in a target repo you need to define apm.yml with a list of required dependencies:
name: my-project
version: 1.0.0
targets:
  - claude
  - copilot
dependencies:
  apm:
    - <git-address>/my-package
    - <git-address>/another-package
  mcp: []
After that you just run:
apm install
and the required skills will be installed into the corresponding harness folders (.claude, .copilot, etc.).
The tool works with GitHub and on-prem git installations like GitLab.
APM is not perfect. It had some unpleasant (but not critical) issues, and sometimes you can really feel that it was heavily vibe-coded in Python.
But despite all that, the tool actually works: you have a spec to define your skills/prompts packages, and you can distribute and update them with a simple apm update. And on top of the apm dependencies format it's pretty easy to vibe-code your own internal skills marketplace.
#ai #engineering #agents
GitHub
GitHub - microsoft/apm: Agent Package Manager
The Fearless Organization
The most dangerous teams are the quiet teams. There are no disagreements, no bad news, no conflicts. It looks like harmony until a real incident happens.
That's the topic of the book The Fearless Organization: Creating Psychological Safety in the Workplace for Learning, Innovation, and Growth by Amy C. Edmondson.
Amy is a professor of Leadership and Management at the Harvard Business School. She has studied the phenomenon of psychological safety and its impact on team performance for many years across different organizations.
She defines psychological safety as follows:
"a belief that one will not be punished or humiliated for speaking up with ideas, questions, concerns, or mistakes, and that the team is safe for inter-personal risk taking."
What it means in practice:
- people are not afraid to ask questions
- they do not hide problems
- they are not afraid to look stupid
- they do not avoid conflicts
- they can freely express their opinions and bring suggestions
Why does it matter?
There is a good example from the book that explains that. Imagine a doctor prescribes treatment for a child. A nurse notices that doctors usually prescribe drug A in such cases, but this time it is missing.
In a team with high psychological safety, the nurse will clarify this with the doctor and may help prevent a medical error.
In a team with low psychological safety, she may be afraid to ask. And the consequences can be dramatic.
The core idea is simple. But the book contains a lot of real stories where a low level of psychological safety led to dramatic results (e.g. the Volkswagen emissions scandal, pilot mistakes that caused plane crashes). The author repeatedly highlights that the more complex and critical the profession is, the more important psychological safety becomes.
How does this relate to our daily work?
We as leaders are responsible for the psychological climate in the team: how well we listen to people, accept different opinions, react to questions, mistakes, or bad news. It's our daily routine that either helps the team become more effective, or leads people to hide problems and the real state of things.
Overall, I really liked the book. It explains the idea in simple language with many real examples. And, importantly for me, all arguments and recommendations are supported by sociological research, experiments and practical psychology.
So psychological safety is not just an idea. It is a proven behavioral model and set of practices that can actually help leaders build better teams.
P.S. One of the best real examples of psychological safety is Pixar. I wrote about it earlier in my overview of Creativity, Inc.: Overcoming the Unseen Forces That Stand in the Way of True Inspiration: parts 1, 2, 3.
#booknook #softskills #leadership
Agent Readiness Framework
A few weeks ago I wrote that adopting coding agents requires strong engineering practices.
Test stability, linting, documentation, and security controls matter much more than a particular harness or model.
The Agent Readiness framework is an attempt to formalize these criteria for a particular repository and define how much autonomy can be safely delegated to agents.
The framework evaluates repos across 8 dimensions:
- Style & Validation
- Build System
- Testing
- Documentation
- Dev Environment
- Code Quality
- Observability
- Security & Governance
Based on these dimensions, the framework defines 5 levels of repo maturity:
🔸 Level 1: Functional. Basic checks: README, linters, unit tests.
🔸 Level 2: Documented. Detailed documentation and basic automations: AGENTS.md, reproducible dev env, contribution guides.
🔸 Level 3: Standardized. E2E tests, observability, security scanning, maintained documentation.
🔸 Level 4: Optimized. Fast validation loops, canary deployments, build optimization. The process is optimized for fast feedback.
🔸 Level 5: Autonomous. Task decomposition, multi-service orchestration, self-healing logic, auto-remediation.
The idea is simple: the higher the maturity level, the more predictable and reliable the agent results. But looking at these levels, I can see that most repos are actually somewhere between Level 1 and Level 3.
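To show how per-level checks could roll up into a single maturity level, here is a minimal sketch. The scoring rule is an assumption made for illustration, not something stated above: a level counts as reached only if it and all lower levels pass at least `threshold` of their checks. The `maturity_level` function and the sample check results are hypothetical.

```python
LEVELS = ["Functional", "Documented", "Standardized", "Optimized", "Autonomous"]


def maturity_level(checks: dict[int, list[bool]], threshold: float = 0.8) -> int:
    """Return the highest level reached, or 0 if even Level 1 is not met.

    Assumed rule: levels are cumulative, so the first level that falls
    below the threshold (or has no checks at all) stops the climb.
    """
    achieved = 0
    for level in range(1, len(LEVELS) + 1):
        results = checks.get(level, [])
        if not results or sum(results) / len(results) < threshold:
            break
        achieved = level
    return achieved


# Hypothetical scan results: level -> pass/fail for each check on that level.
repo_checks = {
    1: [True, True, True],               # README, linter, unit tests
    2: [True, True, True, True, False],  # 4 of 5 documentation checks pass
    3: [True, False],                    # E2E tests exist, no security scanning
}

level = maturity_level(repo_checks)
label = LEVELS[level - 1] if level else "Not ready"
```

With these inputs the repo lands on Level 2, and the failed Level 3 checks double as the improvement plan.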
The framework authors also provide a tool to automatically measure these criteria and the maturity level, but it's available only after registration and relies on proprietary APIs. You can find scanned examples at https://factory.ai/agent-readiness.
There is also an open-source alternative: https://github.com/kodustech/agent-readiness. The project doesn't look active, but it gets the job done. It analyzes the repo and generates a report with the overall maturity level, findings for each dimension, and suggestions for improvements. Some rules are not very accurate. It looks like the project was mainly designed for Python and JS code verification. But anyway, the tool gives you a good sense of what to pay attention to in your codebase.
What I like about this framework is that it shows that agent effectiveness is actually limited by the maturity of engineering practices. It also provides measurable and actionable results that are easy to convert into an improvement plan for a particular repo.
#ai #engineering
How Anthropic Writes Skills
Last week Anthropic published lessons learned on how they build agent skills internally. It's quite interesting to read recommendations from the company that introduced the concept in the first place.
Key ideas:
🔸 Don't be obvious. The model already knows how to code. A skill should provide instructions that change default agent behavior, not repeat the data the model was trained on.
🔸 Build a gotchas section. Add common mistakes and lessons learned. This helps the agent avoid repeating the same failures.
🔸 Use progressive disclosure. A skill is not just a SKILL.md. It can include additional files that are loaded on demand, reducing context overload.
🔸 Don't be too specific. Give the agent the information it needs, but leave it the flexibility to adapt to the situation.
🔸 Separate configuration from instructions. Store setup data in config.json or collect required input from the user.
🔸 Write the description for the model, not for humans. A description should help the model understand when the skill should be invoked.
🔸 Use long-term memory. Skills can maintain their own data in a subdirectory and reuse it across executions.
🔸 Automate where possible. Not everything should be a prompt. Some actions can be automated with helper scripts and functions.
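The progressive disclosure idea from the list above can be illustrated with a small sketch. It is not how any real harness loads skills: `read_description` and `load_on_demand` are hypothetical helpers, and the `release-notes` skill with its `references/gotchas.md` file is invented for the demo. The only assumption about the format is that SKILL.md starts with a front matter block containing a `description` field.

```python
import tempfile
from pathlib import Path


def read_description(skill_dir: Path) -> str:
    """Return only the `description` field from the SKILL.md front matter."""
    lines = (skill_dir / "SKILL.md").read_text().splitlines()
    if not lines or lines[0].strip() != "---":
        return ""
    for line in lines[1:]:
        if line.strip() == "---":  # end of front matter, stop before the body
            break
        if line.startswith("description:"):
            return line.split(":", 1)[1].strip()
    return ""


def load_on_demand(skill_dir: Path, relative_path: str) -> str:
    """Read an extra skill file only when the agent actually needs it."""
    return (skill_dir / relative_path).read_text()


# Demo with a throwaway skill folder.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "release-notes"
    (skill / "references").mkdir(parents=True)
    (skill / "SKILL.md").write_text(
        "---\n"
        "name: release-notes\n"
        "description: Use when the user asks to draft release notes from merged PRs.\n"
        "---\n"
        "Read references/gotchas.md before writing anything.\n"
    )
    (skill / "references" / "gotchas.md").write_text("- Never list reverted PRs.\n")

    # Cheap step: only the description is needed to decide whether to invoke the skill.
    description = read_description(skill)
    # Expensive step: the gotchas file enters the context only after invocation.
    gotchas = load_on_demand(skill, "references/gotchas.md")
```

The description is written as a trigger condition ("Use when...") rather than a human-oriented summary, which matches the advice to write it for the model.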
Unfortunately, the article doesn't provide any guidelines on how to evaluate skill effectiveness. It's still not clear how to tell whether a skill actually works, how to compare two versions of the same skill, or how to detect that a skill is no longer useful.
The recommendations are based mostly on observations of how popular internal skills are structured. That's useful but not measurable.
So despite the fact that skills are the most powerful agent extension now, evaluating them remains one of the hardest engineering problems.
But that's another story.
#ai #engineering
Claude
Lessons from building Claude Code: How we use skills | Claude by Anthropic
What we learned building and scaling hundreds of skills internally at Anthropic.
Project Hail Mary
"AI won't take your job. Someone using AI will."
This quote caught my attention and made me watch A Leader's Guide to Advanced Team Structures in an Agentic World from the recent AWS Summit Sydney.
It's a very sobering talk on the current state of the industry, AI adoption, and the future of engineering roles.
The central question of the talk is: "How should we build teams to work in this new AI world?"
To answer it, the author proposes a framework based on four elements:
Economics
The market has changed. Timelines are compressing. A small team of senior engineers can replace an entire existing product. This creates real risks for businesses that fail to adapt in time. That's why AI decisions should be driven by economics and business value, not hype.
Talent
Previously, career growth in tech was mostly about writing code and building features. Today, the most valuable skill is understanding the business, customers, and product. In other words, AI rewards expert generalists. One person can now handle analysis, backend, and frontend, which reduces collaboration overhead and the need for large teams of deep specialists.
Another interesting point is the future of junior engineers. The speaker argues that we must keep the junior pipeline alive. Otherwise, we won't have senior expertise in 2034.
Structure
Current IT operations are optimized for determinism. But agents are non-deterministic. So the operating model has to shift: accept variance in execution, focus on outcomes, and put guardrails around the things you actually care about. The best operating model for this is platform engineering.
Governance
The author highlights several areas that organizations need to address:
- Agent Identity Management. Every agent should have a verifiable identity traced to a named human.
- Risk Assessment. Clearly define what an agent is allowed to do and ensure it operates within those boundaries.
- Multi-agent Coordination. Control what happens when agents disagree, escalate, or exhibit emergent behavior we don't expect.
- Deskilling Prevention. Employees should maintain core skills even if agents automate routine work. Someone still needs to validate results, audit actions, and take responsibility for decisions.
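The first two points can be illustrated with a small sketch. This is not from the talk: the names `AgentIdentity`, `authorize`, and `ActionNotAllowed`, the example owner address, and the action names are all hypothetical, chosen only to show an agent tied to a named human and restricted to an explicit allow-list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    owner: str                       # the named human accountable for this agent
    allowed_actions: frozenset[str]  # explicit boundary for the agent


class ActionNotAllowed(Exception):
    """Raised when an agent attempts something outside its boundary."""


def authorize(agent: AgentIdentity, action: str, audit_log: list[str]) -> None:
    """Record the attempt, then reject anything not explicitly allowed."""
    allowed = action in agent.allowed_actions
    verdict = "allowed" if allowed else "denied"
    audit_log.append(f"{agent.agent_id} (owner: {agent.owner}) {action}: {verdict}")
    if not allowed:
        raise ActionNotAllowed(f"{agent.agent_id} may not perform {action!r}")


# Hypothetical agent and actions, for illustration only.
audit_log: list[str] = []
reviewer = AgentIdentity(
    agent_id="code-reviewer-01",
    owner="jane.doe@example.com",
    allowed_actions=frozenset({"read_repo", "comment_on_pr"}),
)

authorize(reviewer, "comment_on_pr", audit_log)
try:
    authorize(reviewer, "merge_pr", audit_log)
except ActionNotAllowed as exc:
    # A denied action is escalated to the accountable human instead of retried.
    audit_log.append(f"escalated to {reviewer.owner}: {exc}")
```

Every attempt lands in the audit log together with the owner, so a human can later validate what the agent did and take responsibility for it.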
Overall, the talk is a good reality check on what is actually happening and how businesses and teams need to change to remain successful. Much of it resonates with my own observations, so I definitely recommend watching the full video.
#ai #leadership #engineering
A Few Words About Context
New major model releases regularly promise bigger context windows. Sounds great until you realize it's mostly marketing. A bigger context window doesn't mean better results. It often means more data, more noise, and more AI slop.
According to multiple studies, models effectively use only about 30–50% of their available context. For example, a model with a 200K-token context window may already show noticeable quality loss at around 50K tokens.
Why this happens:
🔸 Context rot. Output quality gradually degrades as the context grows.
🔸 Reasoning shift. The model spends less effort on reasoning. The answers sound more confident, but their quality often gets worse.
🔸 The lost-in-the-middle effect. Information in the middle of the context can be overlooked during later reasoning.
🔸 Attention dilution. The model's attention is spread across different instructions, making it harder to focus on what actually matters.
The practical takeaway is simple: keep your context clean.
🔸 Start a new conversation for each new task (/new in Claude).
🔸 During long-running tasks, use /compact regularly to collapse intermediate reasoning and keep only the important things.
🔸 Store large data in long-term memory or relevant documentation, and bring it into the context only when it's actually needed.
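If you build your own agent loop, the same habit can be automated with a simple budget check. The sketch below is only an illustration: `ContextBudget` and `estimate_tokens` are hypothetical names, the 4-characters-per-token estimate is a coarse heuristic, and the 30% threshold is an assumption taken from the lower end of the range mentioned above.

```python
from dataclasses import dataclass, field

# Assumption: roughly 4 characters per token, a coarse heuristic for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Very rough token estimate; a real tokenizer would be more accurate."""
    return max(1, len(text) // CHARS_PER_TOKEN)


@dataclass
class ContextBudget:
    window_tokens: int           # advertised context window, e.g. 200_000
    effective_percent: int = 30  # assumed usable share of the window
    messages: list[str] = field(default_factory=list)

    @property
    def limit(self) -> int:
        """Number of tokens we are willing to keep in the context."""
        return self.window_tokens * self.effective_percent // 100

    @property
    def used(self) -> int:
        """Estimated tokens currently held in the context."""
        return sum(estimate_tokens(m) for m in self.messages)

    def add(self, message: str) -> bool:
        """Store the message and report whether it is time to compact."""
        self.messages.append(message)
        return self.used > self.limit


budget = ContextBudget(window_tokens=200_000)
needs_compaction = budget.add("x" * 400_000)  # a very large tool output
print(budget.limit, budget.used, needs_compaction)
```

When `add` returns `True`, that is the signal to summarize the history or start a fresh conversation rather than keep appending.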
Useful references:
- https://www.morphllm.com/context-rot
- https://www.zenml.io/llmops-database/context-rot-evaluating-llm-performance-degradation-with-increasing-input-tokens
- https://arxiv.org/html/2601.11564v1
#engineering #ai #tips