The Transcript
$SPOT Co-CEO says senior engineers at Spotify Technology have largely stopped writing code themselves since December 2025, when Claude Opus 4.5 came out:
"So it is a big change. It is real and it's happening fast" https://t.co/6o7rTlAkRO
God of Prompt
RT @godofprompt: 🚨 I just read Google DeepMind’s new paper called "Intelligent AI Delegation."
And it quietly exposes why 99% of AI agents will fail in the real world.
Here’s the paper:
Most “AI agents” today aren’t agents.
They’re glorified task runners.
You give them a goal.
They break it into steps.
They call tools.
They return an output.
That’s not delegation.
That’s automation with better marketing.
Google’s paper makes a brutal point:
Delegation isn’t just splitting tasks.
It’s transferring authority, responsibility, accountability, and trust across agents dynamically.
And almost no current system does this.
Here’s what they argue real delegation actually requires:
1. Dynamic assessment
Before assigning a task, an agent must evaluate:
- Capability
- Resource availability
- Risk
- Cost
- Verifiability
- Reversibility
Not just “who has the tool?”
But: “Who should be trusted with this specific task under these constraints?”
That’s a massive shift.
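The checklist above can be sketched as a scoring function. A minimal sketch, not from the paper — the fields, weights, and the hard gate on unverifiable, irreversible work are all illustrative:

```python
# Illustrative pre-delegation assessment over the dimensions listed above.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    capability: float      # 0..1, fit for this specific task
    availability: float    # 0..1, free resources right now
    risk: float            # 0..1, chance of harmful failure
    cost: float            # 0..1, normalized price of the attempt
    verifiable: bool       # can the output be checked?
    reversible: bool       # can the action be undone?

def assessment_score(c: Candidate) -> float:
    """Higher is better; hard-gate unverifiable, irreversible delegation."""
    if not c.reversible and not c.verifiable:
        return 0.0
    return c.capability * c.availability * (1 - c.risk) * (1 - 0.5 * c.cost)

def pick_delegatee(candidates: list[Candidate]) -> Candidate:
    # "Who should be trusted with this task under these constraints?"
    return max(candidates, key=assessment_score)
```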
2. Adaptive execution
If the delegatee underperforms…
You don’t wait for failure.
You reassign mid-execution.
Switch agents.
Escalate to a human.
Restructure the task graph.
Current agents are brittle.
Real agents need recovery logic.
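That recovery logic could look something like this — a hypothetical sketch where the orchestrator reassigns mid-run and escalates to a human as the last resort (`attempt` and `escalate_to_human` are illustrative callbacks, not a real API):

```python
# Recovery loop: try agents in order, reassign on underperformance,
# escalate to a human if every automated delegatee fails.
def run_with_recovery(task, agents, attempt, escalate_to_human, max_tries=3):
    """`attempt(agent, task)` returns (ok, result); names are illustrative."""
    for agent in agents[:max_tries]:
        ok, result = attempt(agent, task)
        if ok:
            return result
        # don't wait for total failure: move on to the next agent
    return escalate_to_human(task)
```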
3. Structural transparency
Today’s AI-to-AI delegation is opaque.
If something fails, you don’t know:
- Was it incompetence?
- Misalignment?
- Bad decomposition?
- Malicious behavior?
- Tool failure?
The paper proposes enforced auditability and verifiable completion.
In other words:
Agents must prove what they did.
Not just say they did it.
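One simple way to make "prove it" concrete — an illustrative sketch, not the paper's mechanism — is a tamper-evident audit log where each delegated step is hash-chained to the previous one:

```python
# Tamper-evident audit trail: each entry commits to the previous entry's
# hash, so completed work can be verified, not just asserted.
import hashlib
import json

def append_entry(log: list, agent: str, action: str, output: str) -> None:
    prev = log[-1]["hash"] if log else "genesis"
    body = {"agent": agent, "action": action, "output": output, "prev": prev}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "hash": digest})

def verify_chain(log: list) -> bool:
    prev = "genesis"
    for e in log:
        body = {k: e[k] for k in ("agent", "action", "output", "prev")}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if e["prev"] != prev or e["hash"] != recomputed:
            return False
        prev = e["hash"]
    return True
```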
4. Trust calibration
This is huge.
Humans routinely over-trust AI.
AI agents may over-trust other agents.
Both are dangerous.
Delegation must align trust with actual capability.
Too much trust = catastrophe.
Too little trust = wasted potential.
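Calibration can be as simple as updating trust from observed outcomes instead of reputation or self-reports. A toy sketch (the update rule and threshold are made up for illustration):

```python
# Trust tracks demonstrated capability via an exponential moving average
# over observed outcomes; riskier tasks demand more demonstrated trust.
def update_trust(trust: float, outcome_ok: bool, rate: float = 0.2) -> float:
    return (1 - rate) * trust + rate * (1.0 if outcome_ok else 0.0)

def should_delegate(trust: float, task_risk: float) -> bool:
    # too much trust -> catastrophe; too little -> wasted potential
    return trust >= task_risk
```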
5. Systemic resilience
This is the part nobody is talking about.
If every agent delegates to the same high-performing model…
You create a monoculture.
One failure.
System-wide collapse.
Efficiency without redundancy = fragility.
Google explicitly warns about cascading failures in agentic economies.
That’s not sci-fi.
That’s distributed systems reality.
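One resilience tactic this implies — sketched here as an assumption, not a recommendation from the paper — is capping the traffic share any single model can take, trading a little efficiency for redundancy:

```python
# Anti-monoculture routing: prefer the best model, but never let one
# model exceed a fixed share of traffic. `quality` values are made up.
def route(counts: dict, quality: dict, cap: float = 0.6) -> str:
    """Pick the best model whose traffic share would stay under `cap`."""
    total = sum(counts.values()) + 1
    eligible = [m for m in quality if (counts.get(m, 0) + 1) / total <= cap]
    if not eligible:                 # degenerate start: allow any model
        eligible = list(quality)
    best = max(eligible, key=quality.get)
    counts[best] = counts.get(best, 0) + 1
    return best
```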
The paper also breaks down:
- Principal-agent problems in AI
- Authority gradients between agents
- “Zones of indifference” (agents complying without critical thinking)
- Transaction cost economics for AI markets
- Game-theoretic coordination
- Hybrid human-AI delegation models
This isn’t a toy-agent paper.
It’s an operating system blueprint for the “agentic web.”
The core idea:
Delegation must be a protocol.
Not a prompt.
Right now, most “multi-agent systems” are:
Agent A → Agent B → Agent C
With zero formal responsibility structure.
In a real delegation framework:
• Roles are defined
• Permissions are bounded
• Verification is required
• Monitoring is enforced
• Market coordination is decentralized
• Failures are attributable
That’s enterprise-grade infrastructure.
And we don’t have it yet.
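"Protocol, not prompt" suggests delegation as an explicit, bounded grant rather than free-text instructions. A hypothetical wire format — the field names here are illustrative, not from the paper:

```python
# A delegation grant with defined roles, bounded permissions, a required
# verifier, and a deadline for monitoring/escalation.
from dataclasses import dataclass

@dataclass(frozen=True)
class DelegationGrant:
    principal: str              # who remains accountable
    delegatee: str              # who executes
    task: str
    permissions: frozenset      # bounded authority
    verifier: str               # who must sign off on completion
    deadline_s: int             # monitoring/escalation trigger

def is_permitted(grant: DelegationGrant, action: str) -> bool:
    # failures become attributable: actions outside the grant are refused
    return action in grant.permissions
```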
The most important line in the paper?
Automation is not just about what AI can do.
It’s about what AI *should* do.
That distinction will decide:
- which startups survive
- which enterprises scale
- which AI deployments implode
We’re entering the phase where:
Prompt engineering → Agent engineering → Delegation engineering.
The companies that figure out intelligent delegation protocols first will build:
• Autonomous economic systems
• Scalable AI marketplaces
• Human-AI hybrid orgs
• Resilient agent swarms
Everyone else will ship brittle demos.
This paper isn’t flashy.
No benchmarks.
No model release.
No hype numbers.
Just a 42-page warning:
If we don’t build adaptive, accountable delegation frameworks…
The agentic web collapses under its own complexity.
And honestly?
They’re probably right.
Javier Blas
RT @LiveSquawk: US Pres Trump: I Think Negotiations With Iran Will Be Successful
- If Iran Talks Unsuccessful, It'll Be Bad Day For Iran
- Relationship With Venezuela Is As Good As Possible
- We'll Need Aircraft Carrier If No Deal With Iran
- Second Carrier Just Arrived In Persian Gulf
- Looking At A Prime Minister For Iraq
- Russia Wants A Deal, Zelenskiy Has To Get Moving
- We’re Negotiating Right Now For Greenland
God of Prompt
I should charge $99 for this.
But I'm giving away our Claude Mastery Guide for free.
We just updated it with a full Claude Skills section, the feature most people still don't know exists.
Inside:
→ 30 prompt engineering principles
→ 10+ mega-prompts ready to copy
→ Mini-course from beginner to advanced
→ How to build Skills that make Claude remember your workflows forever
→ Glossary + strategic use cases
This turns Claude from a chatbot into your actual work system.
Comment "Claude" and I'll DM it to you.
(Must be following me to receive it)
The Few Bets That Matter
$TMDX should be trading closer to $ISRG
Both are in the healthcare domain with a product years ahead of the competition, growing market share and growing importance within the healthcare system.
Comparable growth profiles, although $ISRG is less explosive: no declines, but stable growth.
Comparable margins, although again $ISRG is slightly superior because it is already optimized for profitability, something $TMDX is working on with great results, as the latest quarters clearly show.
There are small differences which explain why $ISRG carries such a premium, and it deserves it. But the market will need to realize that the $TMDX execution risks it is pricing in are only a matter of delay, not risk.
In a few quarters, $TMDX will deserve equivalent premium.
God of Prompt
RT @godofprompt: How to use LLMs for competitive intelligence (scraping, analysis, reporting): https://t.co/xlGOSpRQPy
God of Prompt
RT @alex_prompter: 🚨 Anthropic just dropped a complete guide on how to build Skills like a pro.
And if you’re building AI agents, this is required reading.
It’s a 30+ page deep dive called The Complete Guide to Building Skills for Claude and it quietly shifts the conversation from “prompt engineering” to real execution design.
Here’s the big idea:
A Skill isn’t just a prompt.
It’s a structured system.
You package instructions inside a https://t.co/NFHAROW040 file, optionally add scripts, references, and assets, and teach Claude a repeatable workflow once instead of re-explaining it every chat.
But the real unlock is something they call progressive disclosure.
Instead of dumping everything into context:
• A lightweight YAML frontmatter tells Claude when to use the skill
• Full instructions load only when relevant
• Extra files are accessed only if needed
Less context bloat. More precision.
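The loading pattern above can be sketched in a few lines. A toy sketch under stated assumptions — the `trigger` field and file paths are invented for illustration; real SKILL.md frontmatter carries a name and description that Claude matches itself:

```python
# Progressive disclosure: keep only lightweight frontmatter resident,
# and load a skill's full instruction body only when a request matches.
def select_skill(request: str, skills: list[dict]):
    """`skills` holds frontmatter only, never the full instructions."""
    req = request.lower()
    for s in skills:
        if s["trigger"] in req:
            return s["body_path"]   # full instructions load only now
    return None                     # nothing relevant: zero context cost
```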
They also introduce a powerful analogy:
MCP gives Claude the kitchen.
Skills give it the recipe.
Without skills: users connect tools and don’t know what to do next.
With skills: workflows trigger automatically, best practices are embedded, API calls become consistent.
They outline 3 major patterns:
1) Document & asset creation
2) Workflow automation
3) MCP enhancement
And they emphasize something most builders ignore: testing.
Trigger accuracy.
Tool call efficiency.
Failure rate.
Token usage.
This isn’t about clever wording.
It’s about designing an execution layer on top of LLMs.
Skills work across https://t.co/6tb6ixQpca, Claude Code, and the API. Build once, deploy everywhere.
The era of “just write a better prompt” is ending.
Anthropic just handed everyone a blueprint for turning chat into infrastructure.
DAIR.AI
RT @omarsar0: MiniMax just dropped M2.5, a top-tier open-weight model.
Already competitive with models like Opus 4.6.
The speed at which open-weight models are improving is wild.
It's fast and surprisingly fluent at generating and operating Word, Excel, and PowerPoint files.
But the bigger deal for me is using M2.5 for long-horizon agents.
@MiniMax_AI's M2.5 is one of the first open models I've seen that show serious signs of improvement on long-running tasks.
M2.5 was trained with RL across hundreds of thousands of complex real-world environments.
The model learned to optimize its actions through planning, which is a meaningful difference from models that are merely prompted to plan.
When your agent runs for hours across multi-step tasks, a model that plans natively will drift less and waste fewer tokens.
You can try it on coding tasks, but I think the bigger unlock is using it as the backbone for an agent that operates across your full workspace (code, docs, spreadsheets, browser)
Benchmark results are nice too:
80.2% on SWE-Bench Verified. 76.3% on BrowseComp. 76.8% on BFCL for agentic tool-calling. 51.3% on Multi-SWE-Bench.
All of these map directly to what long-running agents need, which are coding, searching, tool use, and multi-step execution.
And here is what makes it practical:
$1 per hour at 100 tps!
With only 10B activated parameters, it's the smallest Tier-1 model, which makes self-hosting real. 37% faster execution times on complex tasks.
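A quick back-of-envelope check on that quoted price, assuming sustained output at 100 tokens/s for the full hour:

```python
# $1/hour at 100 tokens/s works out to roughly $2.78 per million tokens.
tps = 100
usd_per_hour = 1.0
tokens_per_hour = tps * 3600                       # 360,000 tokens
usd_per_million = usd_per_hour / tokens_per_hour * 1_000_000
print(round(usd_per_million, 2))                   # → 2.78
```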
Find all the resources below: