God of Prompt
RT @godofprompt: 🚨 Holy shit… Stanford just published the most uncomfortable paper on LLM reasoning I’ve read in a long time.

This isn’t a flashy new model or a leaderboard win. It’s a systematic teardown of how and why large language models keep failing at reasoning even when benchmarks say they’re doing great.

The paper does one very smart thing upfront: it introduces a clean taxonomy instead of more anecdotes. The authors split reasoning into non-embodied and embodied.

Non-embodied reasoning is what most benchmarks test and it’s further divided into informal reasoning (intuition, social judgment, commonsense heuristics) and formal reasoning (logic, math, code, symbolic manipulation).

Embodied reasoning is where models must reason about the physical world, space, causality, and action under real constraints.

Across all three categories (informal, formal, and embodied), the same failure patterns keep showing up.

> First are fundamental failures baked into current architectures. Models generate answers that look coherent but collapse under light logical pressure. They shortcut, pattern-match, or hallucinate steps instead of executing a consistent reasoning process.

> Second are application-specific failures. A model that looks strong on math benchmarks can quietly fall apart in scientific reasoning, planning, or multi-step decision making. Performance does not transfer nearly as well as leaderboards imply.

> Third are robustness failures. Tiny changes in wording, ordering, or context can flip an answer entirely. The reasoning wasn’t stable to begin with; it just happened to work for that phrasing.
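To make that robustness point concrete, here is a minimal sketch of the kind of perturbation test the paper argues for: ask the same question in several equivalent phrasings and check whether the answer flips. The `ask_model` function is a hypothetical stand-in for whatever LLM client you use; nothing here comes from the paper itself.

```python
# Minimal robustness probe (sketch): same question, different surface forms.
# `ask_model` is a hypothetical placeholder, not an API from the paper.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def robustness_probe(variants: list[str]) -> dict:
    """Ask semantically identical prompts and report whether the answers agree."""
    answers = [ask_model(v).strip().lower() for v in variants]
    return {
        "answers": answers,
        "stable": len(set(answers)) == 1,  # True only if no rephrasing flips the answer
    }

variants = [
    "Alice is taller than Bob. Bob is taller than Carol. Who is shortest?",
    "Bob is taller than Carol. Alice is taller than Bob. Who is shortest?",
    "Given that Alice > Bob and Bob > Carol in height, name the shortest person.",
]
# robustness_probe(variants)  # every variant should yield 'carol'
```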

One of the most disturbing findings is how often models produce unfaithful reasoning. They give the correct final answer while providing explanations that are logically wrong, incomplete, or fabricated.

This is worse than being wrong, because it trains users to trust explanations that don’t correspond to the actual decision process.
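One way to operationalize "unfaithful reasoning" on simple arithmetic tasks is to re-execute the model's own stated steps and compare them with its final answer; if they disagree, the explanation is not what produced the answer. This is a generic sketch under an assumed "a op b = c" step format, not the paper's evaluation protocol.

```python
import re

# Sketch of a faithfulness check: do the model's stated arithmetic steps
# actually reproduce its final answer? The "a op b = c" step format is an
# assumption for illustration, not a format defined by the paper.
STEP = re.compile(r"(-?\d+(?:\.\d+)?)\s*([+\-*/])\s*(-?\d+(?:\.\d+)?)\s*=\s*(-?\d+(?:\.\d+)?)")

def steps_are_faithful(explanation: str, final_answer: float, tol: float = 1e-6) -> bool:
    """True only if every stated step is arithmetically correct and the last
    step's result matches the final answer the model reported."""
    results = []
    for a, op, b, claimed in STEP.findall(explanation):
        a, b, claimed = float(a), float(b), float(claimed)
        actual = {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]
        if abs(actual - claimed) > tol:
            return False                      # the step itself is wrong
        results.append(claimed)
    return bool(results) and abs(results[-1] - final_answer) <= tol

# Correct final answer, fabricated intermediate step -> flagged as unfaithful.
print(steps_are_faithful("12 * 4 = 50, then 50 - 4 = 46", 46))  # False
print(steps_are_faithful("12 * 4 = 48, then 48 - 2 = 46", 46))  # True
```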

Embodied reasoning is where things really fall apart. LLMs systematically fail at physical commonsense, spatial reasoning, and basic physics because they have no grounded experience.

Even in text-only settings, as soon as a task implicitly depends on real-world dynamics, failures become predictable and repeatable.

The authors don’t just criticize. They outline mitigation paths: inference-time scaling, analogical memory, external verification, and evaluations that deliberately inject known failure cases instead of optimizing for leaderboard performance.
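Of those mitigation paths, external verification is the easiest one to sketch: have the model emit something a program can check, and only accept its answer when the check passes. The example below verifies a claimed root of a polynomial by substitution; the task and function names are illustrative, not taken from the paper.

```python
# Sketch of external verification: gate the model's claimed answer behind an
# independent checker instead of trusting it. Names are illustrative.

def verify_root(coeffs: list[float], claimed_root: float, tol: float = 1e-9) -> bool:
    """Check whether claimed_root satisfies c[0]*x^n + ... + c[n] = 0
    by direct substitution (Horner evaluation)."""
    value = 0.0
    for c in coeffs:
        value = value * claimed_root + c
    return abs(value) <= tol

def accept_if_verified(claimed_root: float, coeffs: list[float]):
    """Return the answer only if the checker confirms it; otherwise reject
    (and, in a real system, re-prompt, escalate, or fall back)."""
    return claimed_root if verify_root(coeffs, claimed_root) else None

# x^2 - 5x + 6 = 0 has roots 2 and 3; a model claiming 4 gets rejected.
print(accept_if_verified(3.0, [1.0, -5.0, 6.0]))  # 3.0
print(accept_if_verified(4.0, [1.0, -5.0, 6.0]))  # None
```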

But they’re very clear that none of these are silver bullets yet.

The takeaway isn’t that LLMs can’t reason.

It’s more uncomfortable than that.

LLMs reason just enough to sound convincing, but not enough to be reliable.

And unless we start measuring how models fail, not just how often they succeed, we'll keep deploying systems that pass benchmarks, fail silently in production, and explain themselves with total confidence while doing the wrong thing.

That’s the real warning shot in this paper.

Paper: Large Language Model Reasoning Failures
Startup Archive
Steve Jobs on his strategy for saving Apple from bankruptcy

Apple was on the verge of bankruptcy when Steve Jobs returned to the company in July of 1997. The clip below is from a CNBC interview three months later.

When asked about his strategy for turning the company around, Jobs shared the following advice:

“Somebody taught me a long time ago a very valuable lesson which is if you do the right things on the top line, the bottom line will follow. And what they meant by that was: if you get the right strategy, if you have the right people, and if you have the right culture at your company, you’ll do the right products. You’ll do the right marketing. You’ll do the right things logistically and in manufacturing and distribution. And if you do all those things right, the bottom line will follow.”

Video source: @CNBC (1997)
Brady Long
RT @thisdudelikesAI: This is revolutionary

Being able to access all of these models in the same Figma-like collab screen makes creating marketing videos as a team at least 60% easier.

@TopviewAIhq https://t.co/goKJdDCEOa

The @figma for AI Content Creation is finally here.

Meet Topview 4.0: The world’s first collaborative AI video creation board.

- Real-time Collaboration: Create, review, and iterate with your team.
- Seamless Flow: Prompt → Image → Video → Avatar in one tab.
- All Top Models: Seedance, VEO, Sora, Kling, Nano Banana & more.

Use code: LPS3TEDZ for 20% off!!!
- TopviewAI
DAIR.AI
RT @omarsar0: Agentic Video Editing

This is crazy!

I just asked Claude Code to build me an entire agent-powered video editing app.

~10K lines of code.

Uses Claude Agent SDK + Claude Opus 4.6.

It's really good.

Runs locally. Highly customizable.

You can just build things. https://t.co/P8y6F0uKZK
The Few Bets That Matter
$TMDX received unconditional FDA approval for its heart trial a few weeks after the one for lungs.

Practically, this means the next-gen OCS for both heart and lungs were approved in their current form; no upgrades are required to move forward.

That’s a major green light.

The next phase consists of enrolling patients in the clinical studies. The lung study has already started, under predefined trial rules (patient count, conditions, endpoints), and the heart study is about to follow.

This phase is designed to prove clinically, with real cases and a clean dataset, that outcomes using OCS are superior to cold storage; the data can also be compared against other technologies, even if that isn't the main objective.

Management said this approval was imminent back in January. Now we’re there.

$TMDX remains one of the most overlooked healthcare plays as the market rotates out of tech and toward safer names.
Brady Long
RT @bigaiguy: Video creation just isn't the same anymore. It's wild.

Collaborating with ppl virtually used to be a huge pain. Creatively and logistically. @TopviewAIhq is one of the first I've seen really change that.

--> Figma-like collab board + All major models in 1 place (good pricing) https://t.co/v29UDag5rw

The @figma for AI Content Creation is finally here.

Meet Topview 4.0: The world’s first collaborative AI video creation board.

- Real-time Collaboration: Create, review, and iterate with your team.
- Seamless Flow: Prompt → Image → Video → Avatar in one tab.
- All Top Models: Seedance, VEO, Sora, Kling, Nano Banana & more.

Use code: LPS3TEDZ for 20% off!!!
- TopviewAI
The Few Bets That Matter
A few $NBIS thoughts on the market, earnings, stock reaction & closing my position.

The bull case from here stock-wise is that the market rewards higher spending, after punishing Google, Amazon & Microsoft for it.
https://t.co/aMp8Ww91zM

It could happen with a short-term narrative shift. I'm curious what would trigger it, but maybe it does.

Company-wise, I have no issue at all. $NBIS is a great company, well managed, with a likely bright future. But I don't invest in maybes - or I try to avoid them as much as I can.

That said, in hindsight I closed my position a bit in a rush. Not because the stock ran; since I pushed most of that cash into $ALAB, which ran even more, my returns have actually been better.
https://t.co/GMEAfNMLCj

But I didn't follow my system; I got impatient and a bit emotional.

My system made $NBIS a clear sell on Thursday night, combined with $GOOG earnings and the market’s negative reaction to higher spending. Didn’t smell real good.

But the sell became a hold Friday after a massive green day. And I usually know better than to sell on sentiment before the end of a week. I didn’t expect that, but my expectations shouldn’t drive decisions anyway.

Once again, I’m better off return-wise.
But that’s not the point. Following the system is.

I still believe my read of the market is the right one, but only time will confirm or deny it. I’m fine either way and wish nothing but green dildos to holders. I wouldn’t be confident medium term but who knows.

I’ll watch from the sidelines. Sometimes, it’s better not to be involved.

$NBIS Nebius reports earnings this week. The key question is how many additional GW of capacity they add to the pipeline.

Last Thursday, $IREN announced an additional 1.6GW of capacity secured in Oklahoma.

$CIFR reports at the end of the month. https://t.co/HQi8BEeFQ3
- Sam Badawi