Offshore
Video
Michael Fritzell (Asian Century Stocks)
How can I bet on Kyrgyzstan or Kazakhstan ski tourism? Air Astana?

A friend: CZ, this a-hole is spreading FUD again…

Me:

(Video from today, in Kyrgyzstan. No AI) https://t.co/owjYTv6N58
- CZ 🔶 BNB
tweet
Quiver Quantitative
We were recently asked about new congressional stock trades which caught our eye.

Viasat stock has now risen 516% since we reported on it.

Rheinmetall stock has risen 239%. https://t.co/cqRS4zPkIg
tweet
Clark Square Capital
RT @ClarkSquareCap: Sharing a new project: the Special Situations Digest.

Check out the (free) link below. https://t.co/NT0wb21Sxl
tweet
memenodes
How the gun in my drawer looks at me every time I get liquidated https://t.co/W6L9h1TelZ
tweet
Dimitry Nakhla | Babylon Capital®
20 Quality Compounders Return on Capital Employed (ROCE) >30% over LTM 💸

1. $NFLX 30%
2. $TSM 30%
3. $CTAS 31%
4. $BLK 33%
5. $PM 34%
6. $VLO 35%
7. $NVR 36%
8. $V 38%
9. $KLAC 42%
10. $ASML 43%
11. $MTD 44%
12. $LRCX 46%
13. $STX 51%
14. $MA 60%
15. $IDXX 62%
16. $APP 63%
17. $AAPL 65%
18. $BKNG 68%
19. $NVDA 81%
20. $FICO 89%
___

A higher ROCE ratio indicates more efficient capital usage 👇🏽

ROCE = Operating Profit (EBIT) ÷ Capital Employed

Operating Profit (EBIT) = profit before interest and taxes

Capital Employed = total capital used in the business*

*Commonly calculated as Total Assets − Current Liabilities or Equity + Long-term Debt
___

Imagine a car wash business:

You invest $1,000,000 to build it (land, equipment, machines)

Each year, the car wash generates $200,000 in operating profit (before interest & taxes)

ROCE = $200,000 ÷ $1,000,000 = 20%

This means: For every dollar tied up in the business, the company generates 20 cents of operating profit per year
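
The car wash arithmetic can be reproduced in a few lines. This is a minimal sketch of the two formulas quoted above; the function names are illustrative and not taken from any library.

```python
def roce(ebit: float, capital_employed: float) -> float:
    """Return on capital employed as a fraction (0.20 means 20%)."""
    if capital_employed <= 0:
        raise ValueError("capital employed must be positive")
    return ebit / capital_employed


def capital_employed_from_assets(total_assets: float, current_liabilities: float) -> float:
    """Capital employed via the 'Total Assets - Current Liabilities' definition."""
    return total_assets - current_liabilities


def capital_employed_from_funding(equity: float, long_term_debt: float) -> float:
    """Capital employed via the 'Equity + Long-term Debt' definition."""
    return equity + long_term_debt


# Car wash figures from the example: $200,000 EBIT on $1,000,000 invested.
car_wash = roce(ebit=200_000, capital_employed=1_000_000)
print(f"Car wash ROCE: {car_wash:.0%}")  # Car wash ROCE: 20%
```

The two capital-employed helpers will generally give different numbers for a real balance sheet, so a ROCE comparison across companies only makes sense when the same definition is used throughout.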
tweet
God of Prompt
RT @godofprompt: 🚨 Holy shit… Stanford just published the most uncomfortable paper on LLM reasoning I’ve read in a long time.

This isn’t a flashy new model or a leaderboard win. It’s a systematic teardown of how and why large language models keep failing at reasoning even when benchmarks say they’re doing great.

The paper does one very smart thing upfront: it introduces a clean taxonomy instead of more anecdotes. The authors split reasoning into non-embodied and embodied.

Non-embodied reasoning is what most benchmarks test and it’s further divided into informal reasoning (intuition, social judgment, commonsense heuristics) and formal reasoning (logic, math, code, symbolic manipulation).

Embodied reasoning is where models must reason about the physical world, space, causality, and action under real constraints.

Across all three, the same failure patterns keep showing up.

> First are fundamental failures baked into current architectures. Models generate answers that look coherent but collapse under light logical pressure. They shortcut, pattern-match, or hallucinate steps instead of executing a consistent reasoning process.

> Second are application-specific failures. A model that looks strong on math benchmarks can quietly fall apart in scientific reasoning, planning, or multi-step decision making. Performance does not transfer nearly as well as leaderboards imply.

> Third are robustness failures. Tiny changes in wording, ordering, or context can flip an answer entirely. The reasoning wasn’t stable to begin with; it just happened to work for that phrasing.
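
One way to probe the robustness failure described above is to ask the same question in several phrasings and measure how often the answers agree. The sketch below is only an illustration: `toy_model` is a hypothetical stand-in that keys on surface wording, not a real LLM, and the metric is a simple majority-agreement rate rather than anything taken from the paper.

```python
from collections import Counter
from typing import Callable, Sequence


def answer_consistency(model: Callable[[str], str], prompts: Sequence[str]) -> float:
    """Fraction of paraphrased prompts that yield the most common answer."""
    if not prompts:
        raise ValueError("need at least one prompt")
    answers = [model(p).strip().lower() for p in prompts]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in: answers only when it sees the word 'sum' (brittle by design)."""
    return "9" if "sum" in prompt.lower() else "unknown"


paraphrases = [
    "What is the sum of 4 and 5?",
    "Compute the sum of 5 and 4.",
    "Add 4 to 5. What do you get?",
]
score = answer_consistency(toy_model, paraphrases)
print(f"consistency: {score:.2f}")  # consistency: 0.67
```

A score well below 1.0 on prompts that mean the same thing is the signal of interest here: the answer depended on the phrasing, not on the question.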

One of the most disturbing findings is how often models produce unfaithful reasoning. They give the correct final answer while providing explanations that are logically wrong, incomplete, or fabricated.

This is worse than being wrong, because it trains users to trust explanations that don’t correspond to the actual decision process.

Embodied reasoning is where things really fall apart. LLMs systematically fail at physical commonsense, spatial reasoning, and basic physics because they have no grounded experience.

Even in text-only settings, as soon as a task implicitly depends on real-world dynamics, failures become predictable and repeatable.

The authors don’t just criticize. They outline mitigation paths: inference-time scaling, analogical memory, external verification, and evaluations that deliberately inject known failure cases instead of optimizing for leaderboard performance.

But they’re very clear that none of these are silver bullets yet.

The takeaway isn’t that LLMs can’t reason.

It’s more uncomfortable than that.

LLMs reason just enough to sound convincing, but not enough to be reliable.

And unless we start measuring how models fail, not just how often they succeed, we’ll keep deploying systems that pass benchmarks, fail silently in production, and explain themselves with total confidence while doing the wrong thing.

That’s the real warning shot in this paper.

Paper: Large Language Model Reasoning Failures
tweet
Dimitry Nakhla | Babylon Capital®
RT @DimitryNakhla: I don’t think many investors truly appreciate how deep the moats at $SPGI and $MCO really are.

These aren’t just data businesses; they’re embedded gatekeepers in global capital markets, with network effects, regulatory reliance, & decades of trust that are hard to replicate.
tweet
Benjamin Hernandez😎
⚡ The "Electronic Giant" Choice
Recommendation: $AXTI ~$28.20

AXT Inc. is a "Buy" rated powerhouse with a $1.56B valuation. Today's +17.19% rally is backed by a massive 6.97M shares traded.

Reason calling it: High institutional turnover at $28.20 suggests a long-term bottom. https://t.co/dGsp8x98EG
tweet
DAIR.AI
What if you could get multi-agent performance from a single model?

Multi-agent debate systems are powerful. Multiple LLMs can critique each other's reasoning, catch errors, and converge on better answers.

However, the cost scales linearly with the number of agents. Five agents means 5x the compute. Twenty agents means 20x and so on.

But the intelligence gained from debate doesn't have to stay locked behind a compute wall.

This new research introduces AgentArk, a framework that distills the reasoning capabilities of multi-agent debate into a single LLM through trajectory extraction and targeted fine-tuning.

This work addresses an important problem: multi-agent systems are effective but expensive at inference time. AgentArk moves that cost to training time, letting a single model carry the reasoning depth of an entire agent team.

The key idea: run multi-agent debate offline to generate high-quality reasoning traces, then train a smaller model to internalize those patterns.

Five agents debate, one student learns.

AgentArk tests three distillation methods. RSFT uses supervised fine-tuning on correct trajectories. DA filters for diverse reasoning paths. PAD, their strongest method, preserves the full structure of multi-agent deliberation, capturing how agents verify intermediate steps and localize errors.
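
The RSFT idea as described here, keeping only debate trajectories that reach the correct answer and fine-tuning on them, can be sketched as a simple filter. Everything below is an assumption made for illustration: the `Trajectory` layout, the field names, and the prompt/completion format are hypothetical and are not AgentArk's actual data schema or code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trajectory:
    question: str
    reasoning: str      # hypothetical: a debate transcript or one agent's trace
    final_answer: str


def normalize(answer: str) -> str:
    """Light normalization so ' 8 ' and '8' compare equal."""
    return answer.strip().lower()


def select_correct(trajectories: List[Trajectory],
                   references: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep trajectories whose final answer matches the reference; emit SFT pairs."""
    examples = []
    for t in trajectories:
        ref = references.get(t.question)
        if ref is not None and normalize(t.final_answer) == normalize(ref):
            examples.append({
                "prompt": t.question,
                "completion": f"{t.reasoning}\nAnswer: {t.final_answer}",
            })
    return examples


debate_outputs = [
    Trajectory("2 + 2 * 3?", "Multiply first: 2 * 3 = 6, then add 2.", "8"),
    Trajectory("2 + 2 * 3?", "Add first: 2 + 2 = 4, then times 3.", "12"),
]
sft_data = select_correct(debate_outputs, {"2 + 2 * 3?": "8"})
print(len(sft_data))  # 1
```

The resulting list would then feed an ordinary supervised fine-tuning run on the student model; the DA and PAD variants mentioned above would need extra structure (diversity filtering, preserved deliberation steps) that this sketch does not attempt.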

The results across 120 experiments:

> PAD achieves a 4.8% average gain over single-agent baselines, with in-domain improvements reaching up to 30%.

> On reasoning quality metrics, PAD scores highest in intermediate verification (4.07 vs 2.41 baseline) and reasoning coherence (3.96 vs 1.88 baseline).

> The distilled models also transfer: trained on math, they improve on TruthfulQA with ROUGE-L jumping from 0.613 to 0.657.

Scaling from Qwen3-32B teachers down to Qwen3-0.6B students, the framework holds up. Even sub-billion parameter models absorb meaningful reasoning improvements from multi-agent debate.

Paper: https://t.co/cyPTig221s

Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
tweet
Quiver Quantitative
JUST IN: Someone on Polymarket has bet $100K that the US will strike Iran today.

They will win $4,000,000 if it happens.

Insider or gambler? https://t.co/p70QMgWPo1
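
The stake and payout quoted above imply a market price for the outcome. A minimal sketch of that arithmetic follows, assuming a binary market where each winning share settles at $1 and ignoring fees and slippage. The post does not say whether the $4,000,000 is the total return or the profit on top of the stake, so both readings are computed.

```python
def implied_probability(stake: float, payout: float, payout_is_profit: bool = False) -> float:
    """Implied probability of a binary bet: stake divided by total return if it wins."""
    if stake <= 0 or payout <= 0:
        raise ValueError("stake and payout must be positive")
    total_return = stake + payout if payout_is_profit else payout
    return stake / total_return


stake, payout = 100_000, 4_000_000

# Reading 1: $4M is the total amount returned on a win.
p_total = implied_probability(stake, payout)
# Reading 2: $4M is profit, so the total return is $4.1M.
p_profit = implied_probability(stake, payout, payout_is_profit=True)

print(f"{p_total:.2%}")   # 2.50%
print(f"{p_profit:.2%}")  # 2.44%
```

Either way the bet was placed at roughly 2.5 cents on the dollar, which is what a 40x payoff corresponds to.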
tweet
God of Prompt
RT @rryssf_: MIT researchers just mass-published evidence that the next paradigm after reasoning models isn't bigger context windows ☠️

Recursive Language Models (RLMs) let the model write code to examine, decompose, and recursively call itself over its own input.
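
The recursion pattern can be illustrated with a toy sketch. This is not the RLM method itself: in the description above the model writes its own code to decide how to decompose the input, whereas here the split rule is fixed (halve at whitespace) and `toy_model` and `combine_counts` are hypothetical stand-ins for model calls.

```python
from typing import Callable, List


def recursive_call(model: Callable[[str, str], str],
                   combine: Callable[[str, List[str]], str],
                   query: str, text: str, max_chars: int = 20) -> str:
    """Answer `query` over `text`, splitting and recursing when text exceeds max_chars."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    if len(text) <= max_chars:
        return model(query, text)          # base case: input fits in one call
    mid = len(text) // 2
    split = text.rfind(" ", 0, mid)        # prefer a whitespace boundary
    if split <= 0:
        split = mid                        # fallback: hard split at the midpoint
    parts = [text[:split], text[split:]]
    partial = [recursive_call(model, combine, query, p, max_chars) for p in parts]
    return combine(query, partial)


def toy_model(query: str, chunk: str) -> str:
    """Hypothetical stand-in for an LLM call: count occurrences of the query word."""
    return str(chunk.split().count(query))


def combine_counts(query: str, partial: List[str]) -> str:
    """Hypothetical aggregation step: sum the per-chunk counts."""
    return str(sum(int(p) for p in partial))


log = " ".join(["ok", "error", "ok", "ok", "error", "ok", "ok",
                "ok", "error", "ok", "ok", "error", "ok"])
result = recursive_call(toy_model, combine_counts, "error", log)
print(result)  # 4
```

The point of the pattern is that no single call ever sees more than `max_chars` of input, yet the final answer covers the whole text.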

the results are genuinely wild. here's the full breakdown:
tweet