Agent made me do this
56 subscribers
42 photos
2 videos
84 links
Latest agentic engineering news, freshly delivered directly from my twitter feed.
May contain subjective opinions.
Download Telegram
Good morning, little goblins

Research story from OpenAI on why gpt loves goblins and other creatures 👹
Spoiler - it comes from SFT and reward signals for “nerdy” personality setting

https://openai.com/index/where-the-goblins-came-from/
Tons of new features in Codex app, and also this really nice dynamic UI

https://chatgpt.com/codex/for-work/
Agent made me do this
https://telegra.ph/Lessons-from-3-days-of-using-goal-on-OpenClaw-05-03
This is the golden advice for any agentic workflow, not only for new “/goal”
I’ve come to something similar by trying and failing, but this is much more structured and useful than I could ever write
5.5 Instant - non-thinking version of 5.5 and a replacement for 5.3 Instant

Incremental bump in performance, model got bit smarter and also more concise - nice for a quick chatting

https://openai.com/index/gpt-5-5-instant/
Anthropic partnered up with SpaceX to use all compute capacity at Colossus 1 data center, which results in doubled rate limits and removed peak hours limit

They’re talking about 300 megawatts and 220k GPUs btw
👍1
Codex + iOS is next level

serve-sim - streams simulator to a local webpage, so you can open browser page in Codex app and let it do the work.

https://github.com/EvanBacon/serve-sim
TIL both codex and claude read only a fraction of the file, hence hallucinations when you don’t expect it

https://x.com/badlogicgames/status/2052499245903593736
Finally!
🔥1
Codex in the ChatGPT mobile app is a fully-featured mobile experience for getting work done with Codex. When you connect to any of your machines where Codex is running (whether that’s your laptop, a dedicated Mac mini, or a managed remote environment), the app loads the live state from that environment so you can work fluidly across active threads, approvals, plugins, and project context.

This is more than the ability to remotely control a single task or dispatch new tasks to your computer. From your phone, you can work across all of your threads, review outputs, approve commands, change models, or start something new. Your files, credentials, permissions, and local setup stay on the machine where Codex is operating, while updates flow back to your phone in real time, including screenshots, terminal output, diffs, test results, and approvals.

Under the hood, Codex uses a secure relay layer that keeps trusted machines reachable across devices without exposing them directly to the public internet. That relay also keeps active session state and context synced anywhere you’re signed in with ChatGPT.


https://openai.com/index/work-with-codex-from-anywhere/
And cherry on top - computer use from mobile app 🤌
👍2
TIL - cache prewarm is real
If you want to cut time-to-first-token when using Claude via API - send your system prompt before the user prompt with
max_tokens=0   

Claude writes it to cache, but skips output generation
When user request lands, it’ll hit a warm cache

More on this - https://platform.claude.com/docs/en/build-with-claude/prompt-caching#pre-warming-the-cache
Every two weeks I see a new tool, that claims x2/x5/x10/x100 over previous implementations
And still, from my personal experience, the least tools you expose to the model - the better it performs

Our engineering nature is to strive for better tools, but there’s a big roadblock - RL
My belief is that until we solve Continuous Learning in some form, models will continue to perform only on tools from their RL environments

Anyway, looks cool, check it out:
https://github.com/MinishLab/semble
👍2
Composer 2.5 - new model from Cursor
Based on Kimi K2.5, same as Composer 2

Given that all the improvements from previous version came only from RL - very impressive
Mostly agree.
Show this to me one year ago, and I’d never believe you 🥲