DAIR.AI
RT @dair_ai: What if you could get multi-agent performance from a single model?
Multi-agent debate systems are powerful. Multiple LLMs can critique each other's reasoning, catch errors, and converge on better answers.
However, the cost scales linearly with the number of agents. Five agents means 5x the compute. Twenty agents means 20x, and so on.
But the intelligence gained from debate doesn't have to stay locked behind a compute wall.
This new research introduces AgentArk, a framework that distills the reasoning capabilities of multi-agent debate into a single LLM through trajectory extraction and targeted fine-tuning.
This work addresses an important problem: multi-agent systems are effective but expensive at inference time. AgentArk moves that cost to training time, letting a single model carry the reasoning depth of an entire agent team.
The key idea: run multi-agent debate offline to generate high-quality reasoning traces, then train a smaller model to internalize those patterns.
Five agents debate, one student learns.
AgentArk tests three distillation methods. RSFT uses supervised fine-tuning on correct trajectories. DA filters for diverse reasoning paths. PAD, their strongest method, preserves the full structure of multi-agent deliberation, capturing how agents verify intermediate steps and localize errors.
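To make the recipe concrete, here's a minimal sketch of the offline debate-then-distill loop, in the spirit of the RSFT variant. Everything here (the `call_model` stub, the prompt wording, the correctness check) is a hypothetical stand-in, not AgentArk's actual code.

```python
import json

def call_model(prompt: str) -> str:
    # Stand-in for a real LLM call; swap in your inference client here.
    raise NotImplementedError

def run_debate(question: str, n_agents: int = 5, rounds: int = 2) -> list[str]:
    """Each agent answers independently, then revises after reading the others."""
    answers = [call_model(f"Question: {question}\nAnswer step by step.")
               for _ in range(n_agents)]
    for _ in range(rounds):
        answers = [
            call_model(
                f"Question: {question}\n"
                "Other agents said:\n"
                + "\n---\n".join(a for j, a in enumerate(answers) if j != i)
                + f"\n\nYour previous answer:\n{answers[i]}\n"
                "Critique the others, then revise your answer."
            )
            for i in range(n_agents)
        ]
    return answers

def collect_rsft_data(dataset, out_path: str) -> None:
    """RSFT-style filtering: keep only trajectories whose final answer matches gold."""
    with open(out_path, "w") as f:
        for question, gold in dataset:
            for trace in run_debate(question):
                if gold in trace:  # crude check; real pipelines verify more carefully
                    f.write(json.dumps({"prompt": question, "completion": trace}) + "\n")
```

PAD differs mainly in what gets serialized into the training target: the full multi-round deliberation transcript, critiques included, rather than just a correct final trajectory.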
The results across 120 experiments:
> PAD achieves a 4.8% average gain over single-agent baselines, with in-domain improvements reaching up to 30%.
> On reasoning quality metrics, PAD scores highest in intermediate verification (4.07 vs 2.41 baseline) and reasoning coherence (3.96 vs 1.88 baseline).
> The distilled models also transfer: trained on math, they improve on TruthfulQA with ROUGE-L jumping from 0.613 to 0.657.
Scaling from Qwen3-32B teachers down to Qwen3-0.6B students, the framework holds up. Even sub-billion parameter models absorb meaningful reasoning improvements from multi-agent debate.
Paper: https://t.co/cyPTig221s
Learn to build effective AI agents in our academy: https://t.co/LRnpZN7L4c
DAIR.AI
RT @omarsar0: This is a great read if you are building complex applications with Claude Code and Codex.
Most AI coding agents can generate a frontend.
But building a real full-stack application is a completely different story.
The gap between generating a landing page and shipping a working app with a functional backend, database, and API layer remains wide.
Most coding agents default to mock data, fake endpoints, and frontend-only implementations.
But real-world web development requires all three layers working together.
This new research introduces FullStack-Agent, a multi-agent system designed for end-to-end full-stack web development with two key innovations: Development-Oriented Testing and Repository Back-Translation.
The system uses three specialized agents. A Planning Agent generates structured frontend and backend designs in JSON. A Backend Coding Agent implements server logic with a dedicated debugging tool that sends HTTP requests and validates responses. A Frontend Coding Agent builds the UI against real backend APIs with a tool that monitors terminal and browser console errors dynamically.
Development-Oriented Testing validates code during generation, not after. Each agent gets real-time execution feedback, catching integration failures as they happen rather than at the end of a long generation chain.
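As a rough illustration of the idea, a development-oriented test can be as simple as probing each freshly generated endpoint and feeding concrete failures back into the agent's context. The route, payload, and expected fields below are invented for the sketch; the paper's tooling is presumably richer.

```python
# Hedged sketch: after the backend agent emits an endpoint, probe it immediately
# instead of waiting for an end-of-run test suite.
import requests

def check_endpoint(base_url: str, route: str, payload: dict,
                   required_fields: list[str]) -> list[str]:
    """Send a request to a freshly generated endpoint and report concrete failures."""
    errors = []
    try:
        resp = requests.post(f"{base_url}{route}", json=payload, timeout=5)
    except requests.RequestException as exc:
        return [f"{route}: request failed ({exc})"]
    if resp.status_code >= 400:
        return [f"{route}: HTTP {resp.status_code}"]
    try:
        body = resp.json()
    except ValueError:
        return [f"{route}: response is not JSON"]
    for field in required_fields:
        if field not in body:
            errors.append(f"{route}: missing field '{field}'")
    return errors

# Feedback like this goes straight back into the coding agent's context, e.g.:
# errors = check_endpoint("http://localhost:8000", "/api/todos",
#                         {"title": "test"}, ["id", "title"])
```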
Repository Back-Translation solves the training data problem. An information-gathering agent reads real open-source repositories and extracts development patterns. A trajectory agent then reproduces those repositories from scratch given only the extracted plans, generating high-quality training examples grounded in real codebases.
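Back-translation here is essentially a two-step loop: condense a real repository into a development plan, then have a model rebuild the project from that plan alone, yielding a grounded (plan, trajectory) training pair. A minimal sketch, with the plan format, file filter, and `call_model` stub all assumptions:

```python
# Sketch of repository back-translation (helpers and plan format are assumptions).
from pathlib import Path

def call_model(prompt: str) -> str:
    raise NotImplementedError  # stand-in for an LLM call

def extract_plan(repo_dir: str, max_files: int = 20) -> str:
    """Information-gathering step: condense real source files into a development plan."""
    sources = list(Path(repo_dir).rglob("*.py"))[:max_files]  # illustrative filter
    listing = "\n\n".join(f"# {p}\n{p.read_text()[:2000]}" for p in sources)
    return call_model("Summarize this repository into a structured development plan "
                      "(features, routes, schema), without copying code:\n" + listing)

def reproduce_repo(plan: str) -> str:
    """Trajectory step: rebuild the project from the plan alone; the resulting
    (plan, trajectory) pair becomes a grounded training example."""
    return call_model("Implement the following project plan from scratch:\n" + plan)
```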
The results on their FullStack-Bench (647 frontend, 604 backend, 389 database test cases): FullStack-Dev achieves 64.7% frontend accuracy, 77.8% backend accuracy, and 77.9% database accuracy. That's an 8.7%, 38.2%, and 15.9% improvement over the strongest baseline, respectively.
After training a Qwen3-Coder-30B model on 2K crawled and 8K augmented trajectories, frontend accuracy improved by 9.7% and backend accuracy by 9.5% in just two rounds.
The bottleneck in AI-assisted web development isn't frontend generation. It's building functional backends and databases that actually work together. FullStack-Agent closes that gap with execution-grounded testing and real-world training data.
Paper: https://t.co/b2g041Pvrb
Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
Michael Fritzell (Asian Century Stocks)
RT @ai: TSMC is going to manufacture advanced AI semiconductors in Japan. This is a big deal for supply chain diversification. For decades, cutting-edge chip fabrication was concentrated in Taiwan. Now TSMC is building serious capacity in Japan, Arizona, and Germany.
https://t.co/EHng9bVPd6
Michael Fritzell (Asian Century Stocks)
RT @origoinvest: Korea discount still in full force despite 121% stock market run-up https://t.co/PWUwvwnqSL
@DanielSLoeb1 makes a strong case for SK Square currently trading at a 47% discount to a NAV where its primary asset is SK Hynix
3 ways of winning here beyond your typical NAV discount story:
1) DDRAM supercycle
2) Explicit NAV discount reduction targets
3) Compound gains through margin driven buybacks
Haven't been involved but this looks very compelling
$402340.KS $000660.KS - Origo
God of Prompt
RT @godofprompt: Prompting is foundational infrastructure, not a surface-level trick; it only appears to die because success means integration. https://t.co/KidFQLFwnO
Jukan
"Samsung Expected to Turn Non-Memory Business Profitable Next Year" – Kiwoom Securities
Kiwoom Securities forecast on the 10th that Samsung Electronics' non-memory division "will return to operating profit next year," maintaining a 'Buy' rating and a target price of 210,000 won.
Analyst Park Yu-ak of the firm said, "Yield improvements in the 2nm second-generation (SF2P) process to be used for the Exynos 2700, growing customer demand for cost reduction, and improved benchmark performance will lead to greater market share for the Exynos 2700." He added, "The Exynos 2700 will enter full-scale mass production on Samsung Foundry's SF2P process in the second half of this year, and is expected to achieve around 50% adoption within the Galaxy S27."
Accordingly, Kiwoom Securities estimates Samsung Electronics' non-memory division will post revenue of 36.4 trillion won next year, up 21% year-over-year, with operating profit turning to a 1.8 trillion won surplus.
Park noted, "As the market raises its expectations for non-memory earnings, this will serve as additional upside momentum for the stock."
Kiwoom Securities projects Samsung Electronics' total revenue and operating profit this year at 500 trillion won and 173 trillion won, respectively, up 50% and 295% year-over-year. This is based on expectations that commodity DRAM and NAND prices will rise 109% and 105%, respectively, over the same period. The firm also believes the surge in commodity DRAM prices and profitability will create favorable conditions for price negotiations on sixth-generation High Bandwidth Memory (HBM4).
Michael Fritzell (Asian Century Stocks)
Newsletter readers loved the myopia feature, and the one on Korean reforms. 75% "loved it" is what I'm aiming for. https://t.co/9sLszNnjQ5
Michael Fritzell (Asian Century Stocks)
As much as I love Nintendo, the momentum is not showing up in Google search query data. More of a margin story once the DRAM issue is over, plus the second movie in April and Super Mario Odyssey 2 in October. https://t.co/ROA5s96zqW