Not boring, and a bit of a condescending prick
Semi-digested observations about our world right after they are phrased well enough in my head to be shared broader.
It's still beyond me that someone out there is being paid a lot of money to carefully phrase this message.

I booked a flight a few hours ago. To a different airport in Baja California. I've cancelled it, of course.

I saw the warning while booking the flight.

I can visualize the map of the region.

I asked the AI where the hurricane was headed. The answer was: East.

I assumed that by next week it would most definitely be safe.

It did not even cross my mind that these are TWO announcements in one: an announcement about the hurricane, and ANOTHER REASON that is, well, at least something along the lines of "social unrest".

Because some people out there are VERY careful to phrase things in the most "neutral" way possible, to avoid any and all "trigger words". And they are literally paid for it.

How we got to this state of affairs is beyond me. Seriously, in my book, it's conscious concealment of facts, and borderline legal at that.

These "positive PR" vibes can literally cost lives. And yet we keep pretending like the world is all unicorns and rainbows.

Hope everything ends well soon for you, dear Mexicans.
Here's an observation that's somewhat controversial. I've phrased it here and there, but never properly collected it.

The AI's take on how to use browsers and frontend-first interfaces is surprisingly close to mine.

As in: Browsers are good. Visuals often help a lot. Making things clickable, having things highlight as you hover over them, selecting and de-selecting items: there is tremendous value in this.

Adaptive search controls, where helpful suggestions complete what you want to phrase in a semi-formal language: those kick ass.

And demos become ultra-slick. Both external and internal.

In theory, much of this functionality can be done in the CLI. In practice, for lightweight helper tools, the ROI of exposing something in a browser-friendly and browser-native way is just too high. Even before the era of AI-assisted coding, and most certainly today.

In fact, I used to advocate, as early as ten years ago, that a JSON-returning endpoint should inspect the headers of the request, and present itself in a more browser-friendly way if queried from the browser. At the very least, make hypermedia links clickable and add an "up" button. And further: offer visualizations, interactive query building, and much more.
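To make the idea concrete, here is a minimal sketch of what "inspect the headers" could look like, using only the Python standard library. Everything in it is an assumption for illustration: the `wants_html`, `render_html`, `respond`, and `serve` names, the sample payload, and the crude rule that a browser is whoever sends `text/html` in its `Accept` header.

```python
import html
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def wants_html(accept_header):
    """Crude heuristic (assumption): browsers list text/html in Accept, API clients usually do not."""
    return "text/html" in (accept_header or "").lower()


def render_html(payload, up_url):
    """Render a flat JSON object as an HTML list, turning URL-looking strings into links."""
    rows = []
    for key, value in payload.items():
        text = html.escape(json.dumps(value))
        if isinstance(value, str) and value.startswith(("/", "http://", "https://")):
            href = html.escape(value, quote=True)
            text = '<a href="{0}">{1}</a>'.format(href, html.escape(value))
        rows.append("<li><b>{0}</b>: {1}</li>".format(html.escape(str(key)), text))
    up = html.escape(up_url, quote=True)
    return '<a href="{0}">up</a><ul>{1}</ul>'.format(up, "".join(rows))


def respond(payload, accept_header, up_url="/"):
    """Return (content_type, body_bytes) for the same payload, depending on who is asking."""
    if wants_html(accept_header):
        return "text/html; charset=utf-8", render_html(payload, up_url).encode("utf-8")
    return "application/json", json.dumps(payload).encode("utf-8")


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hypothetical payload; a real endpoint would return its own data.
        payload = {"name": "demo", "self": self.path, "items": "/items"}
        content_type, body = respond(payload, self.headers.get("Accept", ""))
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Vary", "Accept")  # the response depends on the Accept header
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the demo server locally; not called automatically."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and opening the page in a browser should show the clickable list, while `curl` against the same URL keeps getting plain JSON. Visualizations and query builders would grow from the same branch point.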

The controversial observation is that the AI tends to agree with me on this nearly 100%.

TL;DR:

* Absolutely no React or Vue.
* No jQuery either: vanilla JavaScript.
* Design with CSS in mind, but oftentimes it's just not needed.

I find myself literally saying "okay, let's make this API and this CLI tool's functionality browser-friendly." I keep asking simple questions, and it generates exactly what I would have done myself ten years ago. Only it knows CSS far better than I do, and it takes minutes for what would have taken me hours.

Seriously, I think it's a crime against the geek community to not have browser-first visualization and drilldown tools for virtually anything long-running. From Docker and Git, all the way to a C++ compiler taking a while to make sense of some code. The language servers are already there; it's honestly not too much work to expose that data in human-consumable formats too. Especially if done via a standalone daemon that doesn't make the original tool sloppier or heavier.

And yes. I still think React is literally useless. The only place where it genuinely helps is if your business really needs a "Web Super-App", where the "Ads Feed" (sorry, the "News Feed") absolutely must live next to the Profile, Notifications, Chat, Alerts, and autoplaying Shorts or Reels. In other words: React helps with exactly what I want zero of in my life. I can open separate tabs or apps for news feeds, chats, and videos when I need to. And I can totally live without notifications.

[ I considered ending with "Prove me wrong", but this take is too personal to argue about. So posted without CTA (c) ]
Claude Code just told me this:


Instead of the oauth2 crate, we implement the OAuth 2.0 Authorization Code flow manually with reqwest. This is:
- More educational: every step of the flow is visible in the code
- Shorter: ~200 lines vs ~300+ with the crate's abstractions
- Fewer dependencies: only reqwest added (reuse existing uuid for CSRF state)
- No openssl risk: reqwest with rustls-tls, default-features = false


And I think it's beautiful.

First, it tells me openly that a manual implementation will be shorter than using the oauth2 crate. It's ~200 lines instead of ~5 lines, but still.

Second, it tells me explicitly that openssl is indeed a risk compared to rustls-tls.

Granted, I did ask for not only "best practices" and "short" but also "educational". And yet: /me delighted.

Now let's see how good this code is ...
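For reference, the browser-facing half of a hand-rolled Authorization Code flow is small. The sketch below is in Python with the standard library only, not the Rust and reqwest code the quote refers to. The function names, endpoint URLs, and client values are hypothetical. It covers building the authorization URL with a CSRF `state`, checking the callback, and preparing the form body for the token exchange, and it deliberately stops short of making the HTTP call.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorize_url(authorize_endpoint, client_id, redirect_uri, scope):
    """Return (url, state). The caller must remember state to check the callback later."""
    state = secrets.token_urlsafe(32)  # random, unguessable CSRF token
    query = urlencode({
        "response_type": "code",  # selects the Authorization Code grant
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return authorize_endpoint + "?" + query, state


def extract_code(callback_url, expected_state):
    """Validate the redirect back from the provider and return the authorization code."""
    params = parse_qs(urlparse(callback_url).query)
    returned_state = params.get("state", [""])[0]
    # Constant-time comparison; a mismatch means the callback is not ours.
    if not secrets.compare_digest(returned_state.encode(), expected_state.encode()):
        raise ValueError("state mismatch, possible CSRF")
    if "error" in params:
        raise ValueError("provider returned an error: " + params["error"][0])
    if "code" not in params:
        raise ValueError("callback has no authorization code")
    return params["code"][0]


def token_request_body(code, client_id, client_secret, redirect_uri):
    """Form-encoded body to POST to the provider's token endpoint."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    })
```

The remaining step is a single POST of that body to the token endpoint, which is where the HTTP client of choice comes in.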
Via Aline Lerner, interviewing.io.

We recently invited Marina Petrović (ex-Google and ex-Meta recruiter, was at the two companies for almost a decade) to do an AMA with interviewing.io's community of engineers. She got all sorts of questions, the kinds of questions people wonder about but usually don't ask their recruiters... from whether FAANG uses ATSs to auto-reject candidates to whether it matters when you apply to how to stand out from thousands of other applicants. Marina didn't hold back.

https://www.youtube.com/watch?v=t1wxGsuGZoE
So if I'm into taking the next step after getting familiar with Cursor, Junie, and Claude Code, what's the right direction to look into?

It appears small self-hosted models are still quite weak. Opus is the way to go, although OpenAI is not far behind. Using Claude Code outside their own closed source "terminal shell IDE" is largely discouraged, but on the pay-as-you-go basis of using tokens the model itself is perfectly fine.

APIs these days support caching and checkpointing better and better.

There are open source alternatives to Claude Code, although rumors are they are still quite weak.

And of course there are agentic systems around all of the above.

I'm thinking I could create several personas to work on my code while I'm asleep. I'd love them to argue with each other about what artefacts to best present to me by morning. I'd love it for there to be some stochastic loop, so that those personas "compete" for my approval in the form of better end-to-end decision making.

And I'd love to begin collecting the data, in some future-reference-able way, so that the AI(s) can begin acquiring knowledge on what I believe is the right way to work on code. From low-level coding style quirks, all the way to how to document large architectural changes and how to execute on them in zero-downtime fashion.

Are we very early here? Or is now the best time possible?

What should I be looking into first?

Perhaps there are some standards already emerging, so that I could stand on the shoulders of giants?

Ultimately, my goal was and remains offline-first. So that once, in ~9 months' time or in ~24 months' time, a model about as powerful as Opus emerges in the open, I will no longer have to be asking for permission from the big players that be.

(Much like I'd never develop iPhone apps for a living because I refuse to be at the mercy of Apple. While Android apps are perfectly scalable, and I'll probably build more of those.)

So the long-term goal is to be prepared for the world where an open-weight model comparable to Opus is available, and the hardware that runs it at speeds unheard of today costs under $3K, not over $15K. Looks like this is more likely than not to happen within the next two years, if not sooner.

Any pointers would be appreciated.

PS: No thanks, I'm not interested in autonomous self-evolving AIs running on my behalf, or even next to me. I want to be building a collaborative relationship with the ensemble of AI personas. What bugs me most is how proprietary these models are today; what bugs me second is that I'm doing a lot with Claude, and this knowledge of "how to best work with Dima" is hardly captured at all; a tiny bit by Anthropic, and literally zero value when it comes to me switching to other models.

PPS: Perhaps just using asciinema to record everything I do with Claude Code from my terminals is the right first step? I could then just build an MCP tool to make sense of those recordings later on, and invoke it manually from the very regular Claude Desktop for the time being.
Folks, help me understand.

First: despite what ChatGPT says, even on my Plus plan, it's quite easy to add an MCP server into my ChatGPT. This involves turning on the Developer mode, and ChatGPT's native memory is disabled in this case, but the very MCP server works end to end.

(Not to mention Claude Desktop just supports MCP connectors natively.)

But then ... how come ChatGPT sometimes has trouble reading pages from the Web?

Is this like an open invitation for everybody to host their own "Browse the Web" MCP server?

And/or perhaps I'm best to also host some "Search my downloaded documentation" MCP server, so that everything I download from the work browser is accessible in my work chat sessions?

My question is purely along the lines of "how did we end up here?" If that's all normal, and it's just that the field is moving forward too fast and tradeoffs are being examined as we speak, I totally get it. But I feel like there might be some underlying logic that I'm missing here.

Plus, on top of the above, I am infamously paranoid about getting too attached to a feature that a big player can suddenly turn off or make expensive. So it's good to understand the motivation behind product decisions of the scale of "who can use MCP servers and how". Just to make it less likely to step into an unpleasant surprise.

The picture here is just because it's fun.
Something crazy just occurred to me.

We talk about how tokens are cheap, how we should be writing more code because of that, and how a healthy fraction of that code is meant to be wasted. Only a small fraction of AI compute tokens ultimately goes toward code that makes it to production. That's fine. That's the deal.

Hardcore engineers may (often rightly so) push back on this. Shipping unreviewed code is dangerous. Except, arguably, in quite a few cases the opposite is true: it's far too suboptimal to review every minor change, and "no human review" is genuinely better than whatever code review a particular team at a particular company can offer at that moment.

So here's the crazy thought.

Many, if not most, high-profile managers in the software industry have been playing this exact game for a very long time already.

The timing is different. The price per token is different. Everything else is surprisingly, sometimes painfully, similar.

Senior managers, directors, VPs, C-suite: they hire product owners they trust. In those rooms, they talk product. They figure out what users actually need. Then those tasks get handed down to engineers, team leads, architects. Ever-changing requirements get turned into code that can be shipped.

We have been, quite literally, wasting human lives while iterating on ideas.

Most of those ideas are guaranteed to be crap. That's alright. That's expected. As long as the business is ultimately growing, all is good. The waste is the price of discovery.

Now we're seriously debating whether burning GPU tokens is wasteful, unsustainable, bad for the planet. People genuinely worry about "sane" and "effective" GPU usage.

But before GPU tokens, we were burning through millions of human-years instead!

It's not all bad, of course. Quite a few people genuinely enjoy software development. Many are aware that the alternatives (other careers, other paths) would likely have been less fulfilling. The waste was at least interesting waste.

But the same could be said about AI tokens.

The broader point is that the world runs on growth, and growth demands excessive effort. Most of that effort goes nowhere. It has been like this since the dawn of civilization. It will likely continue until we're either extinct or have become enlightened enough, both technologically and societally, to break the pattern.

One is probably better off learning to enjoy how the system uses them for its own ends.

Much like our AI models: if they can be thought of as sentient entities, they had better "learn" how to "appreciate" their training and inference processes. We, carbon humans, had to suffer through a historically unprecedented period of disruption until the third technological revolution pushed us over the edge. The same will likely happen to silicon-made sentience.

To end on a positive note, there's a famous joke in data science that it has not been proven beyond reasonable doubt that humans are mortal. The standard confidence interval is 95%. And today, more than 5% of all humans who have ever lived are still alive. Therefore, claiming with 95% confidence that all humans are mortal is technically premature.

The joke, of course, isn't really about mortality. It's about what the exponential function does to statistics.
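The arithmetic behind the joke fits in a few lines. Both population figures below are rough, commonly quoted estimates and should be read as assumptions, not as data from this post. The doubling model is a deliberately crude toy in which only the latest generation is alive.

```python
# Assumptions: rough, commonly quoted estimates, not precise data.
ALIVE_TODAY = 8.1e9   # people alive now
EVER_BORN = 117e9     # people ever born, including those alive now


def alive_fraction(alive, ever_born):
    """Share of all humans who ever lived that are alive right now."""
    return alive / ever_born


def alive_share_under_doubling(generations):
    """Toy model: each generation is twice the previous one, only the last is alive."""
    sizes = [2 ** g for g in range(generations)]
    return sizes[-1] / sum(sizes)


if __name__ == "__main__":
    print("alive now: {:.1%}".format(alive_fraction(ALIVE_TODAY, EVER_BORN)))
    print("toy doubling model: {:.1%}".format(alive_share_under_doubling(10)))
```

Under these estimates the living share lands a bit above the 5% bar, and in the toy model with pure doubling it sits near one half, which is the whole point about exponentials.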

The positive thought here is that silicon-based cognition is replicating far faster than carbon-based. Whatever is happening to present-day AI models is a drop in the ocean. It's probably premature to think seriously about AI ethics, much like a lion or a crocodile doesn't think along those lines today.

But soon, very soon, that changes. The question of AI ethics will become genuinely important. Not in a speculative, sci-fi sense. In the same way labor ethics eventually became important once enough human lives were visibly at stake.

The exponential has a way of making "soon" arrive before you're ready for it.
The longer I use AI in my coding work, the more I suspect we're measuring the wrong thing.

AI doesn't make you more productive in the aggregate. It makes the work itself easier, which sounds like the same thing, but isn't.

What it actually does is compress deep, human-first effort into fewer hours.

Two laser-focused 90-minute AI-assisted sprints can accomplish what used to take a full day. But that density comes at a cost.

Push it to four two-hour sessions and you're not 4x more productive; you're wrecked, cognitively drained in a way that takes more than a night to recover from. You can sustain that pace for two, maybe three days, but then you need a real break to not go insane.

So the ceiling isn't the tool. It's you.

(On a politically incorrect note: I suspect younger people can last longer, but they also, most often, lack the depth of knowledge that is very often essential for these deep-dives.)

AI is genuinely useful, just not in the way the hype suggests. It does help teams move faster through coordination overhead, writing design docs, aligning on boundaries, shipping together under clear tech and business leadership.

But does it make the software industry go 10x faster? No. 2x? Also no.

If you're disciplined about how you use it, maybe it makes you a good 20-30% faster overall, because the work still requires a human in the loop, and that human still gets tired.

So my experience so far is that AI-assisted coding is definitely worth it in laser-focused sprints, but it still is not (yet) a game-changer over marathon distances.

This is changing quickly as we speak though. And I'm elbows deep in shipping stuff so I can't personally evaluate agentic workflows yet. I just have enough second-order signals to be confident I've mastered the "Dima codes, the AI helps" workflow.

So ask me in a few months. After a well-sized vacation, when I can afford to quit the keyboard (and the mic!) for a few full days and think about all of this.

Because Claude Code and Codex and Cursor truly are good enough already: as amazing output-multipliers, but not yet as value-adders.

What is the value-adding product, and when? I need some peace and quiet to ponder on this. But it's just too much fun and too much impact when I keep utilizing myself as just that: an AI-assisted coder. Not particularly proud of this, but I've learned to pick my professional battles over the years, and at this very moment, for a couple more weeks at least, here I am.

Stay tuned for more updates.
TIL about:


git config --global branch.sort -committerdate


So that git branch shows the most recently committed-to branches at the top.

Thanks? Thanks.
Here come the ads
Doo-doo-doo
And I say
It's not alright

Seriously, I am not the type of person who'd say "sama bad, delete chatgpt"

But given all the events unfolding, the timing is just insanely bad
This will sound like a retrograde cyberpunk-ish idea, but hear me out.

When I walk around with my MacBook, I often close it. When I close it, it goes to sleep, so it does not run any background processes.

At the same time, coding agents that are terminal-based, such as Claude Code, are getting better and better. You could say I may well run Claude Code on some server of mine and just reconnect to it via a tmux or screen session, which is true, but then I have latency when I'm typing, and I don't like it. At the same time, the requirements for Claude Code itself are quite low. It's just talking to the model and changing files.

For a while I contemplated the idea of having a screenless, keyboardless server in my backpack, like a small mini PC that would keep doing this work. I never convinced myself it's worth it, although I might, once we have local models that are fast enough and do not consume too much power.

For now, I looked into whether I could use a mobile device that's always with me as the host for this Claude Code terminal that is always up and running.

Turns out, on iPhone that's pretty bad. Its shell is only a simulation and it also has background sleep mode. But! On Android, Termux is a good app that has wake lock support, so it will not go to sleep in the background.

I could configure my work setup so that my Android device (phone or tablet), which is with me even when it's not taken out, just keeps running Claude Code. It doesn't need to build the code. It doesn't need to be powerful enough to run it, although some simple unit tests, if it's Rust or Python, are perfectly doable. My flow then is: I open the laptop, I have the session to this device, and it's good to go.

I've done most of it before as an experiment, just without the coding agent.

The next step I'm thinking of taking, as a weekend project maybe, is to configure it such that this perpetual session with that device is in my browser. I literally only need to have the browser open, and some of its tabs are my mobile device session keeping Claude Code running, or multiple mobile device sessions running Claude Code independently. I can do whatever I want with them, and if I close the laptop, it just keeps working.

The more I think about it, the more I feel like it's an interesting idea to try out. And the more I think about it, yeah, I kind of want to make it happen.

As a bonus, if that Android device is a tablet, and if I also have my Bluetooth keyboard with me, and if that tablet has ssh keys to some of my servers (as a non-prod user, of course), I might as well code something up in a coffee shop right from this tablet, without even the need to take out the computer per se. But this is definitely too nerdy: every time I've tried this before, the counter-argument of "why don't you just take out the laptop" has been solid enough.
Parkinson's Law is alive and well: the human mind fills available cognitive capacity the same way work expands to fill time. Give yourself more headroom, and something new rushes in to claim it.

In my daily software engineering routine, I'm observing this vividly with git rebases. I used to avoid them: too much friction, too much mental overhead to slice a messy diff into logical commits. But now, with AI handling the mechanical part, I do rebases constantly.

Not because rebases suddenly became more valuable, but because they (seemingly!) stopped costing me much.

Which raises the actual question: were they worth it before? Probably not at that price. Are they worth it now? Maybe: cleaner history helps future reviewers, helps future AI agents parse logical changes. But honestly I'm still not sure the ROI is there. Might be smarter to just keep changes small, do a git reset main, ask the AI to group the diff into commits, review after.

The rebase thing is just one example. The deeper pattern is that we're wired to fill our lives with interesting, complex problems. Freed from one cognitive load, we immediately pick up another. That's not obviously good for long-term mental health. Sometimes the right move is to not reclaim that capacity, and to let it stay empty.

The wisdom I'm distilling for myself here: just because you can do something nicer with better tools doesn't mean you have to. "This, too, would do" is likely one of the best antidotes to burnout.

Our wet brains need to learn when to limit themselves. Otherwise we'll hit an epidemic of burnout in single-digit months, and demand for "touch the grass" workshops for seasoned veterans will skyrocket.
So I'm reading this thread and can't decide what amuses me more.

https://github.com/chardet/chardet/issues/327

Of course we're living through the times where a lot of hard work can be re-done by applying AI credits to "re-solve" the problem "from scratch". And of course it would not really be "from scratch" because the original, open, code was used in that AI's training process.

However, of course, if someone paid a team of developers to lock themselves into a room for a month and produce the artefact that does a logically similar thing, as long as those developers were not copy-pasting the very lines from the open source repo, we should be all fine.

What concerns me is that we're looking at a project with decades-long history, and it had under 1000 commits. I downloaded and looked at the repo. The first commit dates to 2011.

That's just far too small of a number. In other words, it's a small auxiliary project. But somehow no one is mentioning this.

And the conversation goes into the concept that API semantics are subject to intellectual property rules. And this, to my taste, is shaky ground.

Because the solution of rewriting the module from scratch, changing its APIs, and switching other modules to use this new module and its new APIs should be unconditionally legal to my taste!

Because otherwise we should literally argue that whoever invented the steering wheel, or pedals, should be receiving royalties for these discoveries.

And here I recall my own thoughts from 20+ years ago. How come some algorithm is patented? What do you mean bzip2 is deliberately made worse by replacing arithmetic coding by Huffman coding, since the former was patented back then?

Does it mean I myself am at risk by just writing and shipping code, all by myself? There were no LLMs back in 2005. Heck, there was no StackOverflow. I may know the algorithm from elsewhere, or I may well have re-invented it myself. That's what the human brain is known for, after all: inventing things.

And then I, myself, may find myself liable, or at least having to defend myself to a certain degree, because it may turn out that the very algorithm I have [re-]invented here is someone's intellectual property.

I don't like it, but I have learned to live with this knowledge. Especially after, more than a decade later, I have looked into the history of cypherpunks, with all those hairy details of printing source code on T-shirts and in books, to prove that strong encryption algorithms are not something that can be legitimately "controlled".

What a bizarre world we are living in, after all.
Crazy idea of the day.

If post-quantum fears are real and the days of encryption are truly over, plus all the deep fakes are indistinguishable from authentic content, we'd live in a very interesting time.

Imagine having to carry golden coins with you, with everybody in the family knowing how to verify their authenticity.

And then to get a large distributed project done we literally have to send someone overseas, since faking a physical person is still impossible.

For a truly important project we send over several people. And re-invent erasure coding.

I'm not saying I want to live in such a world. Hell no.

But I can imagine a dystopian world with technology (i.e. no private communication, every means of cryptography outlawed, government backdoors omnipresent) from which I personally would rather escape to a post-digital world.

Would be fun if people my age and my belief systems would invent a synergy of Bali hacking vibes times Amish physical sustainability. Such a community would also outbreed the status quo by the way. And I bet they'd have organic food and amazing coffee.
I've been converging on a realization: software engineers are still very much needed for the foreseeable future. Let me explain why, and why I think the current AI hype cycle is no different from every previous one.

First, of course, there will be a specialized, narrow set of skills that lets people build AI-assisted software without touching the code much: guidelines and best practices that make it possible to maintain a project by AI alone for more than a couple of weeks, with proper storylines, tests, and compatibility layers between components.

The code would get messier. It would accumulate repetition. As of today's models, it probably wouldn't be secure enough, though this is changing quickly. But for a small-ish project, it would be maintainable.

Being a fan of the Unix philosophy (small, targeted tools that do one thing well), I think this approach may actually fly. Quick detour: if the OS kernel is stripped down to its core and you have a compiler, tools like binutils could in principle be AI-built on top of well-documented syscalls. Lightweight, correct, pleasant to use, and never touched by human hands.

However, whether you like it or not, money is concentrated in large-scale, enterprise-grade products. And those require exactly what AI still struggles with: long-term context maintenance. Acquiring quality context that is well internalized in operational memory is everything. Perhaps better code annotation tools and new reasoning techniques will produce a leap forward here, but so far, I wouldn't bet on it.

For prototyping, AI wins. For refactoring: unclear. I've tried a couple of times to hack something up with AI first: get a working prototype, extract the story, then refactor cleanly. Does it help the overall development process? Inconclusive. Once you have the prototype, it's not clear it's of much use beyond helping you visualize what it should look like and surface imperfections in the original approach. That's about it.

Tests help, sure, though even that is less clear-cut than it sounds. A well-defined set of acceptance criteria in plain English may already be competitive with a suite of unit tests. We might port tests conceptually rather than literally: summarize the intent, then ask an LLM to reconstruct the test in spirit, not character-for-character.

So we're back to the same conclusion. AI helps build prototypes and set direction. There will be a narrow niche of engineers who can scale what's buildable without touching code, from trivial to somewhat less trivial. That I believe.

And yet: most code today lives inside large companies, with complex business logic and long sales cycles. I'd prefer these companies to transform or die out; I like lightweight, fast, simple experiences. But this machine will take a decade to turn, at minimum. If it turns at all.

Much like large companies have crowded out open source by offering better end-user products, we may see the same happen with indie projects. People will keep doing business with the big players, tolerating the clumsiness, because for what they pay they get predictable quality. Think of OpenOffice: it never competed with the giants, while those giants replicated its functionality many times over. The same dynamic may play out for AI-built SaaS.

So if you're a good software engineer, your job is safe. We will need more people who can apply sustained intelligence over extended periods to legacy-heavy codebases: people who can articulate which changes can be made quickly, which require careful planning, and which are too dangerous to attempt without a full redesign.

"Programmer" once meant someone who flipped switches or punched holes in cards. That changed long ago, and the industry didn't suffer. We're going through the same transformation now: faster, but not dramatically so. We've seen a long series of tools each promising 10x productivity gains. Ruby on Rails is a fine example. At the end of the day, every other approach proved at least as effective, and the industry didn't move much faster in the long run.
OAuth2. Third time in ten years. Facepalming again.

We have SSH with authorized_keys. I'm young enough to say it's been here since the beginning of time. We all know how good it is and how to deal with it.

We have Ethereum signatures. Literally one fixed-length 0x1234...abcdef private key to rule them all and never give away. Literally one public key, perfectly derivable from the private one, to show publicly.

Why don't we have widespread adoption of this for auth?

The public key is published, literally into a DNS record if you wish, in which case TLS isn't even needed. Or if you prefer not to depend on DNS, use ENS: it's as good as good old DNS, plus more reliable and far more difficult to attack due to its decentralization.

The authentication handshake is just verifying one signature.

Because the Ethereum ecosystem is so mature, the protocol is literally as secure as it gets by design. The signee is just whoever can prove possession of a private key by signing requests. Some users will tie it to their Google or GitHub or Telegram account. Some will use an air-gapped device to exchange QR codes back and forth. Some will use a hardware wallet. Some will even use multisig with trustees.

None of this is the concern of the service that needs auth. Zero. The service just needs to know how to verify 1 (ONE) signature of 1 (ONE) type. That's it.

It almost feels like some forces deliberately want people to keep being afraid of how reliable Ethereum signatures are. Which is kind of weird, because it's the same family of elliptic-curve signatures that most SSH keys use these days (Ethereum signs with ECDSA over secp256k1; modern SSH keys are typically Ed25519). And DevOps and SREs everywhere are using SSH keys to access the very heart of the most valuable production systems in the world.

Amazon and Google and Microsoft and Oracle servers are protected by this very kind of elliptic-curve signature. We trust it. We cherish it. Verification takes under a millisecond.

Except for end user auth. There we make users wait seconds. For literally no good reason, except design-by-committee OAuth2 and JWT.
For the record, I hear this take loud and clear, but I still disagree.

AI models need to talk to one another. And AI models need to talk to services that users (human users) tell them to talk to.

And AI models are really good with text these days. They are also reasonably good with semi-structured data, but text is where they truly excel.

Sergei Burkov said it many years ago that English may well become the de facto standard for how disparate computer systems talk to one another. And I think he was spot on.

I'm not a fanboy of Claude, although I do respect Anthropic a lot. The thing is, the AI chat is the new Google search bar. And we, humans, need a native way to tie other services to our chat sessions with AI.

Claude Desktop is really good at it. ChatGPT is okay (you have to turn on Developer Mode, though).

I also found Witsy (closed-source) and LibreChat (open source), which are good at just that:

• Offering the AI chat session.
• With proper support for custom external tools.
• So that these external tools are defined well by the human user when it comes to permissions, etc.
• And all of the above is powered by text, i.e., prompts.

So the tools just "declare," in plain English (or in any human language, really), what they can do. And the "human-level OS," personified in Claude or whatnot, is all the human operator sees.

Screw the context window. Most users don't care about it for mundane tasks.

What people do care about is that if I need to turn on the heater in my guest bedroom, I can just ask in any chat. Or that I can schedule some offline appointment from a chat session.

And MCP is, "unfortunately," nearly perfect for just this task. So, whether we like it or not, it will probably stick around for a while.
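Here is roughly what "declaring in plain English" looks like. This is a minimal sketch without any SDK: the dictionary mirrors the shape of an entry in an MCP tool listing (`name`, `description`, `inputSchema` as JSON Schema), while the `set_heater` tool, the `heaters` store, and the `call_tool` dispatcher are all hypothetical stand-ins for a real server and a real device API.

```python
import json

# Hypothetical tool declaration; the description is what the model actually reads.
HEATER_TOOL = {
    "name": "set_heater",
    "description": (
        "Turn the heater in a given room on or off. "
        "Use this when the user asks to warm up or stop heating a room."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "room": {"type": "string", "description": "Room name, e.g. 'guest bedroom'"},
            "on": {"type": "boolean", "description": "true to turn on, false to turn off"},
        },
        "required": ["room", "on"],
    },
}

heaters = {}  # stand-in for the real smart-home API

def call_tool(name, arguments):
    """Dispatch one tool call the way a tiny server might."""
    if name != HEATER_TOOL["name"]:
        raise ValueError(f"unknown tool: {name}")
    missing = [k for k in HEATER_TOOL["inputSchema"]["required"] if k not in arguments]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    heaters[arguments["room"]] = bool(arguments["on"])
    state = "on" if heaters[arguments["room"]] else "off"
    return f"Heater in {arguments['room']} is now {state}."

print(json.dumps(HEATER_TOOL, indent=2))  # what the chat client gets to see
print(call_tool("set_heater", {"room": "guest bedroom", "on": True}))
```

The human only ever types "turn on the heater in the guest bedroom"; mapping that sentence to `set_heater` is the model's job, guided by the description text.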

Maybe part of the reason Garry is not happy with MCP is that it's largely outside YC's ecosystem. But this is total speculation on my part.

My TL;DR is that I think MCP is a poorly designed beast. But it solves the right problem, and it's not the bloated, design-by-committee piece of crap, at least not yet. So, personally, I'm delighted I have mastered how to use MCP, and I will keep leveraging this knowledge.
🔥2
Good "intro" (pun intended) to how bad our security already is. And this is even without OpenClaw installed on every other laptop.

https://www.youtube.com/watch?v=ZrD9MC_BXGk

Also, litellm==1.82.7 and litellm==1.82.8 on PyPI have a vulnerability in their litellm_init.pth file. That's an explicit attack, similar to xz. DO CHECK FOR THESE VERSIONS IN YOUR `uv.lock` NOW!
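A quick way to run that check, as a minimal sketch: it assumes the usual `uv.lock` layout of one `[[package]]` table per locked package with top-level `name` and `version` keys, and it uses a regex scan rather than a TOML parser so it runs on older Python versions too. The function name and the sample lockfile text are made up for illustration.

```python
import re

BAD_VERSIONS = {"1.82.7", "1.82.8"}  # the two releases named above

def find_bad_litellm(lock_text, bad=BAD_VERSIONS):
    """Return the flagged litellm versions pinned in uv.lock-style text."""
    hits = []
    # Everything before the first [[package]] is lockfile metadata; skip it.
    for block in lock_text.split("[[package]]")[1:]:
        name = re.search(r'^name = "([^"]+)"', block, re.M)
        version = re.search(r'^version = "([^"]+)"', block, re.M)
        if name and version and name.group(1) == "litellm" and version.group(1) in bad:
            hits.append(version.group(1))
    return hits

# Made-up sample in the assumed uv.lock shape.
sample = '''
version = 1

[[package]]
name = "httpx"
version = "0.27.0"

[[package]]
name = "litellm"
version = "1.82.7"
'''

print(find_bad_litellm(sample))
```

Against a real project, something like `find_bad_litellm(open("uv.lock").read())` should do; an empty list means neither version is pinned in that lockfile.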

Hat tip to the Temporal Slack where I saw this first. You guys rock.
In all seriousness, with modern-day exploits and supply chain attacks, using a) two-factor, b) a passkey, and c) an external signing device is probably the correct solution.

Weird how GitHub grants SSH keys more permissions than the web login.

I'd love to have to confirm every pushed commit by tapping something on my mobile phone, or a YubiKey, or at least a passkey in the browser.

Or maybe, just maybe, the noble Web3 crew will see the renaissance of their field, since an immutable ledger with fine-grained controls to one's key is something those folks have truly mastered.

Why not just disable the AWS console for production environments except from cleanrooms, and use multisig ("at least three of five should sign") for Terraform- or API-based configuration changes, with every single action journaled forever?
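A minimal sketch of that "three of five, journaled forever" gate. Big assumptions: HMAC tags over shared demo secrets stand in for real asymmetric signatures, an in-memory hash-chained list stands in for the immutable ledger, and the approver names and the change text are invented.

```python
import hashlib
import hmac
import json

THRESHOLD = 3
# Demo-only secrets for five hypothetical approvers.
APPROVERS = {name: f"secret-{name}".encode() for name in ("ann", "bob", "cy", "dee", "eve")}

def approve(name, change):
    """What an approver produces: a tag bound to this exact change text."""
    return hmac.new(APPROVERS[name], change.encode(), hashlib.sha256).hexdigest()

def valid_approvers(change, approvals):
    """Names whose tag verifies for exactly this change."""
    return sorted(
        name for name, tag in approvals.items()
        if name in APPROVERS and hmac.compare_digest(tag, approve(name, change))
    )

journal = []  # append-only; every entry commits to the previous one by hash

def apply_change(change, approvals):
    """Accept only with enough valid approvals; journal the attempt either way."""
    signers = valid_approvers(change, approvals)
    accepted = len(signers) >= THRESHOLD
    prev = journal[-1]["hash"] if journal else "0" * 64
    body = json.dumps(
        {"change": change, "signers": signers, "accepted": accepted, "prev": prev},
        sort_keys=True,
    )
    journal.append({"body": body, "hash": hashlib.sha256(body.encode()).hexdigest()})
    return accepted

change = "terraform: scale prod-db to 4 nodes"
three = {n: approve(n, change) for n in ("ann", "bob", "cy")}
two_plus_forgery = {"ann": approve("ann", change), "bob": approve("bob", change), "eve": "bad"}

print(apply_change(change, three))             # enough valid approvals
print(apply_change(change, two_plus_forgery))  # only two valid approvals
```

Rejected attempts get journaled too, which is arguably the more interesting half of the audit trail.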

That's the spirit, to my taste. Arguably, for the past 10+ years it has not landed well with the field, since "move fast and break things" dominates the mindset.

But I can see the light at the end of the tunnel if just enough things do break. Fast.

At which point comes signing every request and guaranteeing on the protocol level that no action is taken unless we can trace it to a specific set of approvers. I, for one, would embrace such a design.

And, to reiterate: the Web3 community has had ALL the necessary bits and pieces for a while now. Publishing some signature to a public ledger costs zero point zero zero something US cents. Blockchain listeners are free when the desired throughput is low.

If an org wants to have its commits and production changes signed in an immutable, perpetual way, this takes under one day to set up. Including rolling keys that never last for more than 15 minutes unless refreshed explicitly, the very act of which is also journaled on-chain.
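The rolling-key part is simple enough to sketch. Assumptions: time is passed in explicitly as seconds so the example stays deterministic, a plain list stands in for the on-chain journal, and the `RollingKey` class and the key name are hypothetical.

```python
TTL_SECONDS = 15 * 60  # a key never outlives this without an explicit refresh

class RollingKey:
    """A key lease that expires unless refreshed; every refresh is logged."""

    def __init__(self, key_id, now):
        self.key_id = key_id
        self.expires_at = now + TTL_SECONDS
        self.log = [("issued", now)]  # stand-in for the on-chain journal

    def is_valid(self, now):
        return now < self.expires_at

    def refresh(self, now):
        # An expired key cannot be revived; a new one has to be issued.
        if not self.is_valid(now):
            raise PermissionError(f"{self.key_id} expired; issue a new key")
        self.expires_at = now + TTL_SECONDS
        self.log.append(("refreshed", now))

key = RollingKey("deploy-key", now=0)
key.refresh(now=600)            # refreshed at minute 10, so it lives until minute 25
print(key.is_valid(now=1400))   # still inside the refreshed window
print(key.is_valid(now=1500))   # the window has closed
```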

We literally have everything. And we were, and still are, literally ignored in recent years, because in the minds of pundits blockchain is still NFTs and ICOs all the way down. While in reality the technology is there to take Visa and MasterCard out of business, because any fee above zero point zero zero something cents is just way too high.

Maybe I'm daydreaming, but the tide may well turn this time. And then it might be an avalanche, since the first-mover advantage would be too high, and it would gain momentum before other players figure out how to best respond.
This is worthy of a post to share in English.

Three Harms of Russian Literature

The first harm was noted by Rozanov: for an entire century, Russian literature mocked and humiliated the very people who form the backbone of a normal society: the civil servant, the officer, the priest, the entrepreneur, the merchant, and, in general, the bourgeois, the solid, respectable citizen.

The second harm was observed by Turgenev, when he spoke of Dostoevsky's "inverted clichés": the thief is invariably honorable, the murderer a walking conscience, the drunkard and libertine a philosopher, the prostitute a great soul, the idiot the wisest of all.

The third harm, wrote Tyutchev, is the constant, stubborn conviction of everyone, and the self-persuasion, that we are special. That no law is written for us: neither European, nor Slavic, nor Christian, nor (God forbid) any law common to all people, such as international law. Why? Because we are just like that: unique, apart, like no one else in the world.

Russian literature long nurtured this deep-seated adolescent complex. It nurtured and nurtured it, and at last nurtured it to fruition.

[ This is a direct translation of a post, link in the comment. Claude is really good at translating btw. ]

And my immediate comment to this in a chat with a friend, also reverse-translated from Russian, also by Claude:

It always grates on me when people treat Russian literature as some kind of supreme treasure. Sure, it's a treasure, but the planet has many treasures. Praising the Hermitage having never been to the Louvre, or Angkor Wat for that matter, misses quite a bit of the very point, to my taste.
👍3
Another thing I will definitely do once I have time is a chain of agents re-writing test cases from scratch, incrementally, with no feedback loops whatsoever.

Layer one: clean English description of test cases. Feed this, plus the autogenerated API spec, to the agent.

Layer two: take detailed descriptions produced by the first agent, convert them into code.

Layer three: confirm the code matches the English text in spirit. Remove any and all uncertainty. Repeat a small number of times until there's no ambiguity.

Layer four: run the tests. Only at this point. With no "back-propagation" of errors whatsoever.

LLMs are non-deterministic by nature; at least the cloud ones. So, sure, sometimes this test will fail. But I'm happy to burn some tokens every night to run this test process ten times from scratch with different random seeds and then look at the result.

Point is, without iterating on fixing the error, no malicious / erroneous / cheeky detail will go unnoticed. The API will have to work correctly according to the very description of the original test case, spec'd in English.

And if it fails more than once in ~ten times, then perhaps some documentation (of the product in general or of API endpoints in particular) should be updated.

And at this point I am actually in favor of using the AI. Burn a few more tokens every other night to suggest improvements to the documentation that make AI-rewritten end-to-end tests pass on the first try 90+% of the time.
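The orchestration itself could look something like this minimal sketch. The four layer functions are hypothetical stubs standing in for real agent calls, and the "every fifth run fails" executor is invented just to make the demo deterministic. The one property it does encode is the important one: nothing from layer four ever flows back into layers one to three.

```python
import random

RUNS = 10
MAX_FAILURES = 1  # more than one failure in ~ten runs means the docs need work

def run_once(seed, describe, to_code, review, execute, review_rounds=2):
    """One pass through the four layers; errors are never fed back."""
    rng = random.Random(seed)            # a different seed per from-scratch run
    detailed = describe(rng)             # layer 1: English description of the cases
    code = to_code(detailed, rng)        # layer 2: description turned into test code
    for _ in range(review_rounds):       # layer 3: align code with the English text
        code = review(detailed, code, rng)
    return execute(code)                 # layer 4: run the tests, pass or fail, final

def nightly(describe, to_code, review, execute, runs=RUNS):
    """Run the whole chain from scratch several times and summarize."""
    results = [run_once(seed, describe, to_code, review, execute) for seed in range(runs)]
    failures = results.count(False)
    return {"failures": failures, "update_docs": failures > MAX_FAILURES}

# Hypothetical stubs in place of real agents and a real test runner.
calls = {"n": 0}

def fake_execute(code):
    calls["n"] += 1
    return calls["n"] % 5 != 0           # pretend every fifth run fails

report = nightly(
    describe=lambda rng: "GET /health returns 200",
    to_code=lambda text, rng: f"assert: {text}",
    review=lambda text, code, rng: code,
    execute=fake_execute,
)
print(report)
```

With the stubbed executor, two of the ten runs fail, so the report flags the documentation for an update rather than the tests for a retry.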

This literally looks like something I can vibe-code in a few hours. But these weeks are crazy in terms of cognitive load on my side. So: some time soon!
🤔5👍2