The Transcript
ON Semi CEO: "We...met expectations in Q4 25 as we saw increasing signs of stabilization in our key markets..."
CFO: "With our major investment cycle behind us and new technologies ramping, we continue to strengthen our financial foundation..."
$ON: +4% AH https://t.co/xtyMMpToCk
tweet
The Transcript
Upwork misses on earnings
CEO: "2025 marked the year we rebuilt Upwork for the age of human-plus-AI collaboration, turning global change into a definitive tailwind..."
CFO: "We expect 2026 to be a year of accelerating growth."
$UPWK: -25% AH https://t.co/qEBKqKf8Qx
tweet
The Transcript
ON Semi CEO: "We...met expectations in Q4 25 as we saw increasing signs of stabilization in our key markets..."
CFO: "With our major investment cycle behind us and new technologies ramping, we continue to strengthen our financial foundation..."
$ON: -4% AH https://t.co/Nnc2LOzoM5
tweet
Dimitry Nakhla | Babylon Capital®
RT @Fred_Abyss: @DimitryNakhla Man you are probably Top3 of all X accounts anyway but lately you have been REALLY cooking with the quality. One banger tweet after another. Just WOW.
tweet
God of Prompt
RT @godofprompt: 🚨 Holy shit… Stanford just published the most uncomfortable paper on LLM reasoning I’ve read in a long time.
This isn’t a flashy new model or a leaderboard win. It’s a systematic teardown of how and why large language models keep failing at reasoning even when benchmarks say they’re doing great.
The paper does one very smart thing upfront: it introduces a clean taxonomy instead of more anecdotes. The authors split reasoning into non-embodied and embodied.
Non-embodied reasoning is what most benchmarks test and it’s further divided into informal reasoning (intuition, social judgment, commonsense heuristics) and formal reasoning (logic, math, code, symbolic manipulation).
Embodied reasoning is where models must reason about the physical world, space, causality, and action under real constraints.
Across all three, the same failure patterns keep showing up.
> First are fundamental failures baked into current architectures. Models generate answers that look coherent but collapse under light logical pressure. They shortcut, pattern-match, or hallucinate steps instead of executing a consistent reasoning process.
> Second are application-specific failures. A model that looks strong on math benchmarks can quietly fall apart in scientific reasoning, planning, or multi-step decision making. Performance does not transfer nearly as well as leaderboards imply.
> Third are robustness failures. Tiny changes in wording, ordering, or context can flip an answer entirely. The reasoning wasn’t stable to begin with; it just happened to work for that phrasing.
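To make that third failure mode concrete, here is a minimal robustness probe in Python. This is my own sketch, not something from the paper, and query_model is a hypothetical callable (prompt string in, answer string out) standing in for whatever LLM API you use: ask semantically equivalent phrasings of one question and flag any disagreement in the final answers.

# Minimal robustness probe: equivalent phrasings of the same question should
# yield the same final answer; any disagreement is a robustness failure of the
# kind described above. query_model is a hypothetical prompt -> answer callable.

def normalize(answer: str) -> str:
    # Crude normalization so "5 PM." and "5 pm" compare equal.
    return answer.strip().lower().rstrip(".")

def robustness_probe(query_model, phrasings):
    answers = {p: normalize(query_model(p)) for p in phrasings}
    return {"consistent": len(set(answers.values())) == 1, "answers": answers}

phrasings = [
    "A train leaves at 3pm and the trip takes 2 hours. When does it arrive?",
    "The trip takes 2 hours. The train leaves at 3pm. When does it arrive?",
    "If a train departs at 3pm and travels for 2 hours, what is the arrival time?",
]
# report = robustness_probe(query_model, phrasings)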
One of the most disturbing findings is how often models produce unfaithful reasoning. They give the correct final answer while providing explanations that are logically wrong, incomplete, or fabricated.
This is worse than being wrong, because it trains users to trust explanations that don’t correspond to the actual decision process.
Embodied reasoning is where things really fall apart. LLMs systematically fail at physical commonsense, spatial reasoning, and basic physics because they have no grounded experience.
Even in text-only settings, as soon as a task implicitly depends on real-world dynamics, failures become predictable and repeatable.
The authors don’t just criticize. They outline mitigation paths: inference-time scaling, analogical memory, external verification, and evaluations that deliberately inject known failure cases instead of optimizing for leaderboard performance.
But they’re very clear that none of these are silver bullets yet.
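Of those mitigation paths, external verification is the easiest to picture. Here is a minimal sketch, assuming (my assumption, not the paper's protocol) that the model can be prompted to return both a final answer and a small arithmetic expression justifying it; the answer is trusted only if independently evaluating the expression reproduces it.

import ast
import operator

# Independently evaluate a small arithmetic expression without using eval().
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str) -> float:
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval"))

def verify(model_output: dict, tol: float = 1e-9) -> bool:
    # model_output is assumed to look like {"answer": 42, "check_expr": "6 * 7"}.
    return abs(safe_eval(model_output["check_expr"]) - model_output["answer"]) <= tol

# verify({"answer": 42, "check_expr": "6 * 7"})  -> True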
The takeaway isn’t that LLMs can’t reason.
It’s more uncomfortable than that.
LLMs reason just enough to sound convincing, but not enough to be reliable.
And unless we start measuring how models fail, not just how often they succeed, we'll keep deploying systems that pass benchmarks, fail silently in production, and explain themselves with total confidence while doing the wrong thing.
That’s the real warning shot in this paper.
Paper: Large Language Model Reasoning Failures
tweet
The Transcript
Tuesday's earnings
Before Open: $KO $SPOT $DDOG $CVS $BP $RACE $SPGI $HAS $DUK $MAR $OSCR $AZN $FISV
After Close: $HOOD $ALAB $F $LYFT $NET $AEIS $EW $UPST $ZG $GXO $AIG $GILD $MAT https://t.co/IdG6NoIR5T
tweet
Brady Long
RT @unusual_whales: OpenAI's Greg Brockman says this ad, originally leaked on Reddit, is "fake news."
Company officials told Business Insider the ad wasn't real and "Not OpenAI, not connected to us at all." https://t.co/FebxbLas9G
tweet
DAIR.AI
Great paper on improving efficiency of reasoning models.
Long chain-of-thought reasoning is powerful but fundamentally limited.
The longer a model reasons, the more expensive it gets. It's well known that self-attention scales quadratically with sequence length, context windows impose hard ceilings, and critical early information fades as traces grow longer.
But what if a model could reason indefinitely without hitting any of those walls?
This new research introduces InftyThink+, an RL framework that teaches models to break reasoning into iterative rounds connected by self-generated summaries. Instead of one massive chain-of-thought, the model reasons in bounded segments, compresses its progress into a summary, and continues fresh.
Iterative reasoning only works when the model makes good decisions about when to summarize, what to preserve, and how to continue. Previous methods used supervised learning or fixed heuristics to handle these decisions. InftyThink+ treats them as a sequential decision problem optimized end-to-end with trajectory-level RL.
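A rough Python sketch of that loop, under my own assumptions (model.generate, the prompt strings, and the FINAL ANSWER marker are hypothetical stand-ins, not the paper's actual interface):

# Iterative reasoning with self-generated summaries: each round reasons within
# a bounded token budget, then compresses its progress into a summary that
# seeds the next round, so per-round context stays small.

def iterative_reason(question, model, max_rounds=8, tokens_per_round=2048):
    summary = ""  # compressed state carried between rounds
    for _ in range(max_rounds):
        prompt = (f"Question: {question}\n"
                  f"Progress so far: {summary}\n"
                  "Continue reasoning. Write 'FINAL ANSWER:' when done.")
        trace = model.generate(prompt, max_tokens=tokens_per_round)
        if "FINAL ANSWER:" in trace:
            return trace.split("FINAL ANSWER:")[-1].strip()
        # The decisions the RL stage optimizes live here: what to preserve from
        # this round's trace and the prior summary, and how to continue.
        summary = model.generate(
            "Summarize the facts and partial results needed to finish:\n"
            f"{summary}\n{trace}",
            max_tokens=256,
        )
    return None  # round budget exhausted without a final answer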
Training proceeds in two stages. A supervised cold-start teaches the basic iterative format. Then RL optimizes the full trajectory, learning strategic summarization and continuation policies through reward signals.
The results on DeepSeek-R1-Distill-Qwen-1.5B: InftyThink+ improves accuracy on AIME24 by 21 percentage points, outperforming conventional long chain-of-thought RL by an additional 9 points. On the out-of-distribution GPQA benchmark, it gains 5 points over the baseline and 4 points over vanilla RL. On AIME25, inference latency drops by 32.8% compared to standard reasoning. RL training itself speeds up by 18.2%.
A key finding: RL doesn't just make the model reason longer. It teaches the model to generate better summaries. When researchers replaced RL-trained summaries with external ones from a separate LLM, performance dropped. After RL training, the model's own summaries become tightly coupled with its downstream reasoning in ways external summarizers can't replicate.
The approach also decouples reasoning depth from wall-clock time. After RL, InftyThink+ extends reasoning depth while keeping latency nearly flat on several benchmarks. Standard reasoning sees latency balloon as depth increases.
Reasoning models today are bounded by context windows and crushed by quadratic attention costs. InftyThink+ removes both constraints by teaching models to reason in compressed iterations, enabling theoretically infinite-horizon reasoning with bounded compute per step.
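A back-of-envelope way to see the compute claim (my arithmetic, not numbers from the paper): if attention cost grows roughly quadratically in trace length, then splitting one trace of length L into k bounded rounds of length about L/k gives

\[ \text{cost}_{\text{single}} \propto L^2, \qquad \text{cost}_{\text{iterative}} \propto k \cdot \left(\tfrac{L}{k}\right)^2 = \tfrac{L^2}{k}, \]

so, ignoring the short summaries, total attention work drops by roughly a factor of k and the per-round memory footprint stays bounded no matter how many rounds the model runs.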
Paper: https://t.co/VWM71BzXUf
Learn to build effective AI Agents in our academy: https://t.co/LRnpZN7L4c
tweet
Moon Dev
openclaw for tradingview is one of the biggest unlocks i've ever seen for traders
tradingview (and trading) will never be the same https://t.co/JVtCrJQAgG
tweet