God of Prompt
🚨 Holy shit… Stanford just published the most uncomfortable paper on LLM reasoning I’ve read in a long time.

This isn’t a flashy new model or a leaderboard win. It’s a systematic teardown of how and why large language models keep failing at reasoning even when benchmarks say they’re doing great.

The paper does one very smart thing upfront: it introduces a clean taxonomy instead of more anecdotes. The authors split reasoning into non-embodied and embodied.

Non-embodied reasoning is what most benchmarks test, and it splits further into informal reasoning (intuition, social judgment, commonsense heuristics) and formal reasoning (logic, math, code, symbolic manipulation).

Embodied reasoning is where models must reason about the physical world, space, causality, and action under real constraints.

Across all three categories (informal, formal, and embodied), the same failure patterns keep showing up.

> First are fundamental failures baked into current architectures. Models generate answers that look coherent but collapse under light logical pressure. They shortcut, pattern-match, or hallucinate steps instead of executing a consistent reasoning process.

> Second are application-specific failures. A model that looks strong on math benchmarks can quietly fall apart in scientific reasoning, planning, or multi-step decision making. Performance does not transfer nearly as well as leaderboards imply.

> Third are robustness failures. Tiny changes in wording, ordering, or context can flip an answer entirely. The reasoning wasn’t stable to begin with; it just happened to work for that phrasing.

One of the most disturbing findings is how often models produce unfaithful reasoning. They give the correct final answer while providing explanations that are logically wrong, incomplete, or fabricated.

This is worse than being wrong, because it trains users to trust explanations that don’t correspond to the actual decision process.

Embodied reasoning is where things really fall apart. LLMs systematically fail at physical commonsense, spatial reasoning, and basic physics because they have no grounded experience.

Even in text-only settings, as soon as a task implicitly depends on real-world dynamics, failures become predictable and repeatable.

The authors don’t just criticize. They outline mitigation paths: inference-time scaling, analogical memory, external verification, and evaluations that deliberately inject known failure cases instead of optimizing for leaderboard performance.
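To make that last idea concrete, here's a minimal sketch of a failure-injection eval (my own illustration, not code from the paper): re-ask each benchmark item under perturbations that should not change the answer, like reordering options or rewording the prompt, and count any flip as a robustness failure. `query_model` is a hypothetical stand-in for whatever LLM API you call.

```python
# Minimal robustness-eval sketch: re-ask the same question under
# perturbations that should not change the answer, and flag flips.
# query_model() is a hypothetical stand-in for your LLM API call.

def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def variants(question: str, options: list[str]) -> list[str]:
    """Prompt variants that leave the correct answer unchanged."""
    opts_fwd = ", ".join(options)
    opts_rev = ", ".join(reversed(options))
    return [
        f"{question}\nOptions: {opts_fwd}\nReply with one option.",
        f"{question}\nOptions: {opts_rev}\nReply with one option.",  # reordered
        f"Think carefully. {question}\nOptions: {opts_fwd}\nReply with one option.",  # reworded
    ]

def is_robust(question: str, options: list[str]) -> bool:
    answers = {query_model(v).strip().lower() for v in variants(question, options)}
    return len(answers) == 1  # any flip across variants is a robustness failure
```

The point is to report stability under perturbation as a first-class metric next to plain accuracy, so the leaderboard can't hide the brittleness.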

But they’re very clear that none of these are silver bullets yet.

The takeaway isn’t that LLMs can’t reason.

It’s more uncomfortable than that.

LLMs reason just enough to sound convincing, but not enough to be reliable.

And unless we start measuring how models fail, not just how often they succeed, we’ll keep deploying systems that pass benchmarks, fail silently in production, and explain themselves with total confidence while doing the wrong thing.

That’s the real warning shot in this paper.

Paper: Large Language Model Reasoning Failures
tweet
Michael Fritzell (Asian Century Stocks)
RT @anonymous3nibrv: Stronger yen could be the catalyst.
It'd be hard not to hold 'at least some' growth
as people de-risk from weak yen beneficiaries.
I'd be comfortable half-kellying 6562 Genie.
Will study a bit and do a writeup tomorrow morning.

Japan really is a two-tier market… I feel more comfortable with the growth index at this point
- Michael Fritzell (Asian Century Stocks)
tweet
Javier Blas
CHART OF THE DAY: Vitol, the world's largest independent oil trader, shifts its view on global peak oil demand: higher and later.

Now, Vitol sees a peak by the "mid" 2030s (previously it anticipated "early" 2030s). Peak demand seen at ~112m b/d (previously "almost" ~110m b/d). https://t.co/67qVesRu2r
tweet
Jukan
SK Chairman Choi Tae-won and Jensen Huang Hold ‘Chicken & Beer Summit’… Ultra-Close Ties on HBM4

SK Group Chairman Choi Tae-won, currently visiting the United States, met with Nvidia CEO Jensen Huang for a chicken-and-beer gathering in Silicon Valley. Observers say the discussion likely went beyond negotiating supply volumes for sixth-generation High Bandwidth Memory (HBM4) to include strategic cooperation on building next-generation artificial intelligence (AI) data centers.

According to U.S. semiconductor industry sources on February 8 (local time), Chairman Choi met with CEO Huang on February 5 at 99 Chicken, a Korean fried chicken restaurant in Santa Clara, California. Also present were Choi’s second daughter, Choi Min-jung, CEO of Integral Health, and Huang’s daughter, Madison Huang, Senior Director of Nvidia’s Robotics Division.

Chairman Choi has been in the U.S. since February 3 for a series of meetings with big tech companies including Nvidia and Meta. The two sides are reported to have exchanged broad views on topics including SOCAMM (a low-power DRAM module for servers)—the next-generation server memory module expected to succeed HBM—and NAND flash memory supply. A semiconductor industry insider interpreted the meeting as follows: “SOCAMM is the next battleground that will reshape the power architecture of AI servers. This is a signal that SK Group intends to leverage HBM as a powerful lever to make a full-scale entry into the next-generation AI infrastructure market.”

Chairman Choi’s plan to establish an AI investment entity tentatively called “AI Company” in the U.S. is also closely linked to these efforts. The new entity plans to combine data center operations with AI startup incubation and investment, and is set to launch a data center pilot project in North America within the year.

SK and Nvidia: ‘Close Cooperation’ Extending Beyond HBM4 to AI Solutions

Nvidia’s next AI accelerator coming in H2 — SK secures 55% of the HBM4 to be installed

Every time SK Group Chairman Choi Tae-won meets Nvidia CEO Jensen Huang, major shifts follow in the global semiconductor market. That was the case in May 2021 as well. Shortly after the two titans met at Nvidia’s U.S. headquarters, the two companies and Taiwan foundry giant TSMC joined forces to form an “AI Triple Alliance.”

That is why the global semiconductor industry is closely watching the “chicken summit” between the two leaders on February 5 in Silicon Valley. Industry observers predict the meeting will not only accelerate SK Hynix’s supply of HBM4 to Nvidia, but also bring closer to reality the supply of enterprise SSDs (eSSDs) and AI data center solutions through SK’s Silicon Valley subsidiary “AI Company.”

SK Hynix Executives in Attendance
According to semiconductor industry sources on February 9, key executives from SK Group’s memory semiconductor business, including SK Hynix CEO Kwak Noh-jung, attended the meeting. Observers believe discussions covered the supply of HBM, a core component of Nvidia’s AI accelerators. Nvidia’s next-generation AI accelerator “Vera Rubin,” set for release in the second half of this year, will use HBM4 with a capacity of 288 gigabytes (GB) per unit.

Given that HBM production (4 months) and TSMC’s packaging (approximately 2–3 months) take 6–7 months in total, the industry expects SK Hynix—which has the industry’s largest HBM production capacity (monthly wafer input of 150,000 as of 2025)—to begin full-scale mass production soon.

SK: “On-Schedule HBM4 Supply”
This year’s HBM market landscape has shifted from last year, when SK Hynix virtually monopolized the supply of 12-layer HBM3E for Nvidia. Samsung Electronics passed Nvidia’s quality test for its 12-layer HBM3E in September of last year and this month began the industry’s first mass production and shipment of HBM4. Samsung’s HBM4 reportedly achieves speeds of 11.7 gigabits per second (Gb/s), exceeding Nvidia’s required level of 10–11 Gb/s.

SK Hynix agreed with Nvidia at the end of last year to supply “more than 55%” of required HBM4 volumes and is preparing for full-scale production on schedule. Notably, SK Hynix’s HBM4 is known to deliver comparable performance despite using processes a generation or more behind Samsung Electronics—which employs 4-nanometer foundry technology and 10nm sixth-generation (1c) DRAM—relying instead on 12nm foundry and 1b DRAM.

Chairman Choi reportedly promised CEO Huang “supply without disruption” at the meeting. The industry also speculates that discussions touched on seventh-generation HBM (HBM4E), set to see its market open in earnest in 2027, as well as customized HBM (cHBM).

Potential for Comprehensive AI Solution Collaboration
Observers believe Chairman Choi also brought to the table cooperation plans with Nvidia extending beyond HBM to AI semiconductors, servers, and data centers. Chairman Choi envisions SK Group’s future as a “comprehensive AI solutions provider” and has decided to rename SK Hynix’s U.S.-based NAND flash subsidiary Solidigm to “AI Company,” transforming it into a dedicated entity overseeing SK Group’s AI investment and solutions business. The business scope of AI Company is expected to encompass not only SK Hynix’s AI semiconductor operations but also SK Telecom’s AI technology and solutions capabilities.

Whether SK Hynix and Nvidia will deepen their cooperation on eSSDs is also drawing attention. In January, Nvidia announced that it would apply a new memory solution called “ICMS” to Vera Rubin. A full Vera Rubin set will contain 9,600 terabytes (TB) of eSSD—a 16-fold increase in demand compared to existing products. Observers also speculate that the two sides discussed plans to supply Nvidia with SK Group’s comprehensive AI solutions, spanning everything from AI data center design to semiconductor and server delivery.
tweet
Javier Blas
COLUMN: British oil giant BP should suspend its $750 million quarterly share buybacks to give incoming CEO Meg O'Neill (who arrives in April) extra financial breathing room.

@Opinion $BP
https://t.co/23vNRD4Nsw
tweet
Brady Long
🚨 I watched a senior engineer at Anthropic build a feature in 4 hours that would've taken me 3 days.

He wasn't coding faster. He was running 8 Claude instances in parallel—each solving different parts simultaneously.

The future of coding isn't writing code. It's orchestrating AI swarms.

Here's the framework:
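The framework itself isn't in this excerpt, so here's a purely illustrative sketch of the core idea (not the author's or Anthropic's setup): decompose a feature into independent subtasks and fire one Claude call per subtask concurrently. It assumes the official `anthropic` Python SDK, an ANTHROPIC_API_KEY in the environment, and a hypothetical subtask decomposition.

```python
# Illustrative sketch only (not the author's framework): decompose a
# feature into independent subtasks and run one Claude call per subtask
# in parallel. Assumes the official `anthropic` SDK; the subtasks and
# feature decomposition here are hypothetical.
import asyncio
from anthropic import AsyncAnthropic

client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

SUBTASKS = [
    "Write the SQL migration for a notifications table.",
    "Write a FastAPI endpoint that lists a user's notifications.",
    "Write pytest cases for that endpoint.",
]

async def solve(task: str) -> str:
    msg = await client.messages.create(
        model="claude-sonnet-4-20250514",  # swap in any current model id
        max_tokens=1024,
        messages=[{"role": "user", "content": task}],
    )
    return msg.content[0].text

async def main() -> None:
    # asyncio.gather is the whole trick: N instances working simultaneously
    results = await asyncio.gather(*(solve(t) for t in SUBTASKS))
    for task, out in zip(SUBTASKS, results):
        print(f"--- {task}\n{out}\n")

asyncio.run(main())
```

The speedup comes entirely from the subtasks being independent; anything with cross-task dependencies still has to be sequenced (or merged) by the human orchestrating it.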
tweet
Michael Fritzell (Asian Century Stocks)
RT @nishantkumar07: Updated: January hedge fund returns table, now includes Rokos, Pharo, Walleye + others. Macro popped, multistrats mostly did their steady grind https://t.co/cHF359ud8U
tweet
Michael Fritzell (Asian Century Stocks)
RT @InvestInJapan: @imuvill is creating the "Japanese Berkshire". His company Kaihou just acquired a 31% stake in Jiban Net.
As always, I will be rooting for him!!! https://t.co/R1HTNsUUCc
tweet
Moon Dev
I feel bad for phds who literally built their whole personality around being smart

Now retards like me are lapping them with 8 Claude codes while sipping coffee
tweet
God of Prompt
RIP "act as an expert" and basic prompting.

A former OpenAI engineer just exposed "Prompt Contract" - the internal technique that makes LLMs actually obey you.

Works on ChatGPT, Claude, Gemini, everything.

Here's how to use it right now: https://t.co/6ZDCFs5JvK
tweet
Jukan
Lately, I’ve been feeling very conflicted.

I want to show off the knowledge I have to others, but at the same time, it really irritates me when people take my knowledge and rework it or redistribute it.

Why do I feel this way?
tweet