Mathematician Terence Tao, a recipient of the Fields Medal, now has a YouTube channel: https://www.youtube.com/@TerenceTao27
In his latest video, he tests Claude 3.7 Sonnet and o4-mini-high's ability to formalize a proof in Lean. He concludes that LLMs like Claude and o4-mini can accelerate the process of formalizing proofs in Lean. They can successfully convert informal proof steps into Lean code.
🔥9🤣3
THE “NAMES IN BOXES” (100 PRISONERS) PUZZLE
INTRODUCTION: FROM HOPELESS TO HOPEFUL
Imagine a life‑or‑death puzzle where the odds are so astronomically stacked against you that success seems impossible. Yet a simple, coordinated strategy can transform this near‑certain failure into a significant, very real chance of success. This walk‑through explains the puzzle, the clever plan, and the mathematics that turn despair into opportunity.
1. THE RULES IN ONE BREATH
- 100 prisoners are numbered 1–100.
- 100 boxes are numbered 1–100. Each box hides exactly one prisoner’s number (all numbers appear once).
- One prisoner at a time may enter the room and open up to 50 boxes.
- If every prisoner finds their own number, everyone goes free; if even one fails, they all lose.
- The prisoners may agree on a strategy beforehand but cannot communicate once the process starts.
2. WHY “JUST GUESSING” IS HOPELESS
If a prisoner opens 50 random boxes, their chance of success is 50⁄100 = 1⁄2.
For all 100 prisoners to succeed this way the probability is (1⁄2)^100 ≈ 7.9 × 10^-31.
3. A SIMPLE, STARTLINGLY GOOD PLAN
Start at your own label. Prisoner k first opens Box k.
Follow the breadcrumb.
- If Box k contains number m (where m ≠ k), next open Box m.
- If Box k contains k, you’re done — success!
Continue until:
- you find your own number (success), or
- you have opened 50 boxes (failure).
That’s the whole strategy.
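The strategy above can be sketched as a short simulation. This is a minimal sketch, not part of the original puzzle statement: the function names (`prisoner_succeeds`, `everyone_succeeds`, `estimate_success_rate`) and the convention that `boxes[b]` holds the slip inside Box b (with index 0 unused) are assumptions made for illustration.

```python
import random

def prisoner_succeeds(boxes, prisoner, max_opens=50):
    """Follow the chain of slips, starting at the prisoner's own box.

    boxes[b] is the number hidden in Box b; index 0 is unused so that
    labels run 1..n as in the puzzle.
    """
    current = prisoner
    for _ in range(max_opens):
        found = boxes[current]
        if found == prisoner:
            return True          # found own number
        current = found          # next open the box named on the slip
    return False                 # ran out of allowed openings

def everyone_succeeds(boxes, max_opens=50):
    """The group wins only if every single prisoner succeeds."""
    n = len(boxes) - 1
    return all(prisoner_succeeds(boxes, p, max_opens) for p in range(1, n + 1))

def estimate_success_rate(n=100, max_opens=50, trials=1000, seed=0):
    """Monte Carlo estimate of the group's success rate under a uniform shuffle."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        slips = list(range(1, n + 1))
        rng.shuffle(slips)
        wins += everyone_succeeds([0] + slips, max_opens)
    return wins / trials
```

Calling `estimate_success_rate()` should return a value in the neighbourhood of 0.3, far above the (1⁄2)^100 of random guessing. The exact figure depends on the seed and the number of trials.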
4. WHY “FOLLOWING THE ARROWS” WORKS (AN INFORMAL PROOF)
1) Arrow‑paths must form cycles. Picture each box label (1–100) with an outgoing arrow to the label found inside. Within 100 steps you must revisit a label, because there are only 100 labels. Since each label has exactly one incoming arrow (from the unique slip pointing to it), the first label revisited on your path must be your starting label, closing a loop. Every box lies on exactly one such closed loop (cycle).
2) Your starting box is on the cycle you trace. Beginning at Box k and following arrows keeps you on that loop.
3) Your number is on the same cycle. The loop closes with an arrow back into Box k, and that arrow comes from the box holding slip k. If slip k were outside this loop, it would send a second arrow into Box k, contradicting the “one slip each” rule.
Consequence: If the loop containing Box k has length L, Prisoner k will open L boxes and succeed as long as L ≤ 50. Therefore everyone succeeds iff every loop has length ≤ 50.
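The consequence can be checked directly by splitting an arrangement into its loops. The sketch below is illustrative only: `cycle_lengths` and `group_wins` are hypothetical helper names, and `boxes[b]` is assumed to hold the slip inside Box b with index 0 unused.

```python
def cycle_lengths(boxes):
    """Return the length of every loop in the arrangement.

    boxes[b] is the number hidden in Box b; index 0 is unused.
    """
    n = len(boxes) - 1
    seen = [False] * (n + 1)
    lengths = []
    for start in range(1, n + 1):
        if seen[start]:
            continue             # this box already belongs to a counted loop
        length = 0
        current = start
        while not seen[current]:
            seen[current] = True
            current = boxes[current]
            length += 1
        lengths.append(length)
    return lengths

def group_wins(boxes, max_opens):
    """Everyone succeeds exactly when no loop is longer than max_opens."""
    return max(cycle_lengths(boxes)) <= max_opens

# Six boxes with loops (1 2), (3) and (4 5 6).
example = [0, 2, 1, 3, 5, 6, 4]
print(cycle_lengths(example))   # [2, 1, 3]
print(group_wins(example, 3))   # True
print(group_wins(example, 2))   # False
```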
5. CALCULATING THE ODDS OF SUCCESS
5.1 The deciding factor: They win iff no cycle is longer than 50.
5.2 How likely are long cycles? For a random shuffle of 100 items, the probability that it contains a cycle of length K (> 50) is 1⁄K.
5.3 Only one long cycle is possible. Two cycles longer than 50 would need more than 100 boxes.
5.4 Chance of failure: Pr(failure) = (1⁄51) + (1⁄52) + … + (1⁄100) ≈ 0.688 (68.8 %).
5.5 Chance of success: Pr(success) = 1 − Pr(failure) ≈ 1 − 0.688 = 0.312 → about 31.2 %.
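The sum in 5.4 can be evaluated exactly with rational arithmetic. This is a small sketch; the function name `failure_probability` is an assumption, and the formula it uses only holds when the opening limit is at least half the number of boxes (so that at most one long cycle can exist).

```python
from fractions import Fraction

def failure_probability(n=100, limit=50):
    """Pr(some cycle is longer than `limit`) = sum of 1/K for K = limit+1 .. n.

    Only valid when limit >= n / 2, since then at most one long cycle fits.
    """
    if 2 * limit < n:
        raise ValueError("formula requires limit >= n / 2")
    return sum(Fraction(1, k) for k in range(limit + 1, n + 1))

failure = failure_probability()
print(float(failure))        # about 0.6882
print(float(1 - failure))    # about 0.3118
```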
6. WHY THE CHANCE OF ONE LONG LOOP OF LENGTH K IS 1⁄K (K > 50)
How many shuffles of N = 100 boxes contain a cycle of length K (where K > 50)?
Step ①: Choose K numbers for the cycle. There are "N choose K" ways (or C(N,K)) to select which K numbers will form this cycle.
Step ②: Arrange these K numbers into a cycle. To form a cycle with K chosen numbers, fix one number (say, the smallest) in place. The remaining K-1 numbers can be arranged in (K-1)! ways to complete the cycle. (Fixing one prevents overcounting rotations as distinct cycles).
Step ③: Arrange the other N-K numbers. These remaining numbers can be arranged in the remaining N-K boxes in (N-K)! ways. These will form other, shorter cycles.
The number of shuffles containing a cycle of length K is the product of these choices. (No shuffle is counted twice, because for K > N/2 a shuffle can contain at most one cycle of length K.)
Number of shuffles = C(N, K) * (K-1)! * (N-K)!
Since C(N, K) = N! / ( K! * (N-K)! ), this simplifies to:
C(N, K) * (K-1)! * (N-K)! = (N! / ( K! * (N-K)! )) * (K-1)! * (N-K)! = N! / K.
Step ④: Calculate Probability
The total number of possible shuffles is N!.
So, Pr(K-cycle) = (Shuffles with a K-cycle) / (Total shuffles)
= (N! / K) / N!
= 1/K.
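The 1/K result can be sanity-checked by brute force on a small case. The sketch below enumerates every shuffle of N = 6 items (an assumed toy size, chosen so that all 720 permutations can be listed) and counts those containing a cycle of length K. The helper names are hypothetical, and items are labelled 0..N-1 here for convenience.

```python
from fractions import Fraction
from itertools import permutations

def has_cycle_of_length(perm, k):
    """perm[i] is the item that i points to; items are labelled 0..n-1."""
    n = len(perm)
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        length = 0
        current = start
        while not seen[current]:
            seen[current] = True
            current = perm[current]
            length += 1
        if length == k:
            return True
    return False

def fraction_with_cycle(n, k):
    """Exact share of all n! shuffles that contain a cycle of length k."""
    perms = list(permutations(range(n)))
    hits = sum(has_cycle_of_length(p, k) for p in perms)
    return Fraction(hits, len(perms))

for k in (4, 5, 6):
    print(k, fraction_with_cycle(6, k))   # 1/4, 1/5, 1/6
```

For K ≤ N/2 the formula no longer applies, because a shuffle can then hold two cycles of length K and the product in Step ③ would count it twice. With N = 6 and K = 3 the true share is 5/18, not 1/3.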
👍8💊7
US vs. Houthis
The Houthis nearly shot down several F-16s and an F-35 (despite its stealth capability).
So the US decided to end the campaign.
In those first 30 days, the Houthis shot down seven American MQ-9 drones (around $30 million each), hampering Central Command's ability to track and strike the militant group. Several American F-16s and an F-35 fighter jet were nearly struck by Houthi air defences, making real the possibility of American casualties, multiple US officials said.
That possibility became reality when two pilots and a flight deck crew member were injured in the two episodes involving the F/A-18 Super Hornets, which fell into the Red Sea from the aircraft carrier Harry S. Truman within 10 days of each other.
Source: https://www.nytimes.com/2025/05/12/us/politics/trump-houthis-bombing.html [no paywall: https://archive.is/0aTiI]
🫡10🤣7
VDT: a solution to decision theory https://www.lesswrong.com/posts/LcjuHNxubQqCry9tT/vdt-a-solution-to-decision-theory
...no comprehensive decision theory that resolves all decision theory dilemmas has yet been formalized. This paper at long last resolves this dilemma, by introducing a new decision theory: VDT.
❤5😁3👀1
Links for 2025-05-14
AI
1. Sakana AI presents Continuous Thought Machines, mimicking brain-like neural timing for dynamic, process-based reasoning. https://sakana.ai/ctm/
2. Seed-Coder: Let the Code Model Curate Data for Itself https://github.com/ByteDance-Seed/Seed-Coder
3. “In order to break past the "pre-training ceiling," we will need to continually collect and invent new tasks and environments, likely based on systems grounded to real-world applications with both humans and models in the loop.” https://x.com/MinqiJiang/status/1921176396228952253
4. How far can reasoning models scale? It appears that the rapid scaling of reasoning training, like the jump from o1 to o3, will likely slow down in a year or so. https://epochai.substack.com/p/how-far-can-reasoning-models-scale
5. Slow corporations as an intuition pump for AI R&D automation https://www.lesswrong.com/posts/hMSuXTsEHvk4NG6pm/slow-corporations-as-an-intuition-pump-for-ai-r-and-d
6. 9 Years to AGI? OpenAI’s Dan Roberts Reasons About Emulating Einstein https://www.youtube.com/watch?v=_rjD_2zn2JU
7. Tool-using LLMs can learn to reason—without reasoning traces. https://arxiv.org/abs/2505.00024
8. Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait https://arxiv.org/abs/2505.04616
9. UniVLA: Learning to Act Anywhere with Task-centric Latent Actions https://arxiv.org/abs/2505.06111
10. Multi-agent Embodied AI: Advances and Future Directions https://arxiv.org/abs/2505.05108
11. DanceGRPO: Unleashing GRPO on Visual Generation https://arxiv.org/abs/2505.07818
12. MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering https://arxiv.org/abs/2505.07782
13. Learning from Peers in Reasoning Models https://arxiv.org/abs/2505.07787
14. Dynamic Byte Latent Transformer: An alternative to traditional tokenization https://ai.meta.com/blog/meta-fair-updates-perception-localization-reasoning/
15. Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs https://arxiv.org/abs/2503.01307
16. “Muscle Mem is a cache system for AI agents, allowing them to learn and efficiently replay complex behaviors.” https://www.youtube.com/watch?v=hToIl9PRyRk
17. OpenAI enables PDF export for detailed research reports with sources and rich formatting. https://x.com/OpenAI/status/1921998278628901322
18. Norway’s giant sovereign wealth fund said it would scale back hiring thanks to AI improvements. https://www.bloomberg.com/news/articles/2025-05-13/norway-wealth-fund-s-use-of-ai-triggers-hiring-freeze-ceo-says [no paywall: https://archive.is/HDb4t]
19. From 12 weeks to 10 minutes: How Novo Nordisk accelerates time to value with GenAI and MongoDB https://www.mongodb.com/solutions/customer-case-studies/novo-nordisk
20. Saudi Arabia and NVIDIA to Build AI Factories to Power Next Wave of Intelligence for the Age of Reasoning https://nvidianews.nvidia.com/news/saudi-arabia-and-nvidia-to-build-ai-factories-to-power-next-wave-of-intelligence-for-the-age-of-reasoning
21. Microsoft is offering to give up some of its equity stake in OpenAI's new for-profit in exchange for continued access to new models developed beyond 2030 https://www.ft.com/content/8d9e5149-7e4f-4886-a035-9d200204972a [no paywall: https://archive.is/1NIP8]
22. Republicans push for a decadelong ban on states regulating AI https://www.theverge.com/news/666288/republican-ai-state-regulation-ban-10-years
Miscellaneous
1. A protein from tiny tardigrades may help cancer patients tolerate radiation therapy https://news.mit.edu/2025/tiny-tardigrades-protein-may-help-cancer-patients-tolerate-radiation-therapy-0226
2. Apple to Support Brain-Implant Control of Its Devices https://www.wsj.com/tech/apple-brain-computer-interface-9ec69919 [no paywall: https://archive.is/Xw3Wo]
3. Mass spectrometry method identifies pathogens within minutes instead of days https://phys.org/news/2025-05-mass-spectrometry-method-pathogens-minutes.html
👍2
Google AlphaEvolve: a Gemini-powered coding agent for algorithm discovery.
It’s able to:
🔘 Design faster matrix multiplication algorithms
🔘 Find new solutions to open math problems
🔘 Make data centers, chip design and AI training more efficient across Google.
The system uses:
🔵 LLMs: To synthesize information about problems as well as previous attempts to solve them - and to propose new versions of algorithms
🔵 Automated evaluation: To address the broad class of problems where progress can be clearly and systematically measured.
🔵 Evolution: Iteratively improving the best algorithms found, and re-combining ideas from different solutions to find even better ones.
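The propose, evaluate, evolve loop described in these three points can be illustrated with a toy sketch. This is emphatically not AlphaEvolve's code: everything below is an assumption made for illustration. A random crossover-and-mutation function (`propose`) stands in for the LLM, a simple distance-to-target score (`evaluate`) stands in for the automated evaluator, and the candidates are short lists of numbers rather than programs.

```python
import random

def evolve(evaluate, propose, seeds, generations=300, keep=8, seed=0):
    """Keep a small pool of the best candidates and repeatedly add new proposals."""
    rng = random.Random(seed)
    population = sorted(seeds, key=evaluate, reverse=True)[:keep]
    for _ in range(generations):
        parent_a = rng.choice(population)
        parent_b = rng.choice(population)
        child = propose(parent_a, parent_b, rng)      # stand-in for an LLM proposal
        population.append(child)
        population.sort(key=evaluate, reverse=True)   # automated evaluation
        del population[keep:]                         # only the best survive
    return population[0]

# Toy problem (hypothetical): find a vector close to TARGET.
TARGET = [3.0, -1.0, 2.0]

def evaluate(candidate):
    """Higher is better: negative squared distance to the target."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))

def propose(parent_a, parent_b, rng):
    """Recombine two parents coordinate by coordinate, then mutate one entry."""
    child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, 0.5)
    return child

seeds = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
best = evolve(evaluate, propose, seeds)
print(best, evaluate(best))
```

Because the best candidate is never discarded, the score of the returned candidate can only match or beat the best seed. The approach relies on the evaluator being cheap and objective, which matches the class of problems the post says the system targets.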
Over the past year, Google has deployed algorithms discovered by AlphaEvolve across Google’s computing ecosystem, including data centers, software and hardware.
It’s been able to:
🔧 Optimize data center scheduling
🔧 Assist in hardware design
🔧 Enhance AI training and inference
They applied AlphaEvolve to a fundamental problem in computer science: discovering algorithms for matrix multiplication. It managed to identify multiple new algorithms.
This is a significant advance over their previous system, AlphaTensor, which AlphaEvolve outperforms with a more general approach.
Google also applied AlphaEvolve to over 50 open problems in analysis ✍️, geometry 📐, combinatorics ➕ and number theory 🔂, including the kissing number problem.
🔵 In 75% of cases, it rediscovered the best solution known so far.
🔵 In 20% of cases, it improved upon the previously best known solutions, thus yielding new discoveries.
Read more: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
👍6🔥2🥴2
Links for 2025-05-15
AI
1. Whole-Body Pose Control for Legged Manipulation https://www.youtube.com/watch?app=desktop&v=D0bvsit_jqE
2. AI headphones translate multiple speakers at once, cloning their voices in 3D sound https://www.washington.edu/news/2025/05/09/ai-headphones-translate-multiple-speakers-at-once-cloning-their-voices-in-3d-sound/
3. CrowdStrike CEO announces 5% of workforce to be slashed globally, citing artificial intelligence efficiencies created in the business https://www.theguardian.com/technology/2025/may/09/crowdstrike-to-cut-jobs-and-use-ai
4. Early evidence on human + Ai in accounting: “We document significant productivity gains among AI adopters, including a 55% increase in weekly client support and a reallocation of approximately 8.5% of accountant time from routine data entry toward high-value tasks such as business communication and quality assurance. AI usage further corresponds to improved financial reporting quality, evidenced by a 12% increase in general ledger granularity and a 7.5-day reduction in monthly close time.” https://marginalrevolution.com/marginalrevolution/2025/05/early-evidence-on-human-ai-in-accounting.html
5. LLMs Get Lost In Multi-Turn Conversation https://arxiv.org/abs/2505.06120
6. Revealing economic facts: LLMs know more than they say https://arxiv.org/abs/2505.08662
7. Why OpenAI projects only $174B of revenue by 2030? https://www.lesswrong.com/posts/Yhfgygybmkyfgs64k/untitled-draft-fqpt
8. Yudkowsky and Soares Announce Major New Book: “If Anyone Builds It, Everyone Dies” https://intelligence.org/2025/05/15/yudkowsky-and-soares-announce-major-new-book-if-anyone-builds-it-everyone-dies/
Miscellaneous
1. Ozempic is a miracle drug. This looks like another: SGLT2 inhibitors may be a miracle cure for kidney disease – but also for liver disease, dementia, respiratory diseases, and even, very possibly, for old age. https://www.worksinprogress.news/p/everything-drugs
👍2
Scientists have been publishing climate models since ~1970.
A good way to evaluate their skill is to compare what they expected to happen in the years after the model was published to observed climate changes.
It turns out most models were pretty spot-on.
Source: https://x.com/hausfath/status/1922794856054702160
🤡5👍4🤗1
How to Spot and Lock-In “Free-Money” Gaps Between Betting Sites
---
1. What’s an arbitrage in prediction markets?
Think of buying a product cheaply in one shop and instantly reselling it at a higher price next door—guaranteed profit, zero risk.
In betting, the “product” is a bet on each possible outcome. The prices are given as decimal odds (for example, odds of 3.0 mean a €1 bet returns €3 if it wins, for a €2 profit). If the odds across different bookmakers for all possible outcomes don’t add up correctly, you can place bets on every outcome in the right proportions and finish with more money than you started—no matter which outcome occurs.
---
Lemma (Constant-payout stakes)
This lemma provides the formula for calculating stakes that ensure a constant profit across all outcomes, a core principle for achieving risk-free arbitrage.
Let O_i be the decimal odds on n mutually exclusive, exhaustive outcomes labeled 1 through n. The implied probability of outcome i according to the bookmaker is 1 divided by O_i. In a fair market without a bookmaker’s margin, the sum over i from 1 to n of (1/O_i) would equal 1.
An arbitrage opportunity arises when:
S = sum over i from 1 to n of (1/O_i) < 1.
If this holds, then for any total capital T > 0, staking:
B_i = T / (S * O_i) for i = 1, ..., n
yields the same payoff, equal to T/S, regardless of which outcome occurs.
Proof.
* Payoff for outcome i: B_i * O_i = T/S.
* Total stakes: sum over i=1 to n of B_i = (T/S) * sum over i of (1/O_i) = (T/S) * S = T.
This shows:
1. The payoff T/S is the same for any outcome.
2. The total amount staked equals the initial capital T.
---
Corollary (Arbitrage)
Profit = (T/S) - T = T * (1/S - 1) > 0 because 0 < S < 1.
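The lemma and corollary translate into a few lines of code. This is a minimal sketch: the function name `constant_payout_stakes` and the example odds are assumptions for illustration, and it ignores real-world frictions such as fees, stake limits and odds that move before all bets are placed.

```python
def constant_payout_stakes(odds, total):
    """Split `total` across outcomes so that every outcome pays the same amount.

    odds  : decimal odds O_i, one per outcome
    total : capital T to be staked in full
    Returns (stakes, payoff, profit); profit is positive only when S < 1.
    """
    s = sum(1.0 / o for o in odds)                # S = sum of implied probabilities
    stakes = [total / (s * o) for o in odds]      # B_i = T / (S * O_i)
    payoff = total / s                            # same for every outcome
    return stakes, payoff, payoff - total

# Hypothetical three-outcome event with S < 1.
stakes, payoff, profit = constant_payout_stakes([3.2, 3.6, 3.1], total=100.0)
print([round(b, 2) for b in stakes], round(payoff, 2), round(profit, 2))
```

If S turns out to be 1 or more, the same formula still equalises the payoff, but the profit is zero or negative, so there is no arbitrage to lock in.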
---
Full proof that price discrepancies across markets generate arbitrage
Theorem (Cross-market prediction-market arbitrage)
Let M_1 through M_k be k betting venues on the same event with outcomes labeled 1 through n. For each outcome i, let O_bar_i be the best available odds—that is, the maximum odds for outcome i across all k markets. Define:
S_bar = sum over i from 1 to n of (1/O_bar_i).
If S_bar < 1, then risk-free profit is attainable:
1. Market aggregation. By construction, no other bookmaker offers better odds for outcome i than O_bar_i.
2. Stake selection. With budget T > 0, wager:
B_i = T / (S_bar * O_bar_i).
3. State-independent payoff. Whichever outcome occurs, return = B_i * O_bar_i = T/S_bar.
4. Positive profit. Initial outlay = T; payoff = T/S_bar. Profit = T * (1/S_bar - 1) > 0 if 0 < S_bar < 1.
5. Risk-free. Payoff is identical across all outcomes, so variance is zero and no outcome can eliminate the gain.
End of proof.
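The steps of the theorem can be sketched in code as follows. The venue names, the odds, and the function name `cross_market_arbitrage` are all hypothetical, and the sketch assumes every venue quotes decimal odds for the same outcomes in the same order.

```python
def cross_market_arbitrage(markets, total):
    """Pick the best odds per outcome across venues and size the stakes.

    markets : dict mapping venue name -> list of decimal odds (one per outcome)
    total   : budget T
    Returns None when S_bar >= 1 (no arbitrage), otherwise a dict with the
    chosen venue, odds and stake for each outcome plus the payoff and profit.
    """
    n = len(next(iter(markets.values())))
    picks = []
    for i in range(n):
        # Step 1: best (maximum) odds for outcome i across all venues.
        venue = max(markets, key=lambda name: markets[name][i])
        picks.append((venue, markets[venue][i]))
    s_bar = sum(1.0 / o for _, o in picks)
    if s_bar >= 1.0:
        return None
    # Step 2: stakes B_i = T / (S_bar * O_bar_i).
    bets = [(venue, o, total / (s_bar * o)) for venue, o in picks]
    payoff = total / s_bar                 # Step 3: same in every state
    return {"bets": bets, "payoff": payoff, "profit": payoff - total}

# Neither venue is an arbitrage on its own, but the best odds combined are.
markets = {"VenueA": [2.10, 1.75], "VenueB": [1.80, 2.05]}
result = cross_market_arbitrage(markets, total=100.0)
for venue, odds, stake in result["bets"]:
    print(venue, odds, round(stake, 2))
print(round(result["payoff"], 2))          # about 103.7
```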
🍾1
Month-by-month question volume for Mathematics Stack Exchange and Stack Overflow, with vertical dashed lines marking the public announcements of GPT-3 (June 2020), ChatGPT (November 2022), and o1-preview (September 2024).
(Note: I used o3 to both extract the data from Stack Exchange Data Explorer and to generate the graph.)
👍6❤1
Google presents LightLab: https://nadmag.github.io/LightLab/
Controlling Light Sources in Images with Diffusion Models
🔥12
One of Putin's most impressive achievements is bootstrapping German killer drone startups.
For ten years, Germany fiercely debated whether to purchase one or two armed drones from Israel. Even when operated remotely by humans, armed drones were deemed extremely unethical.
But thanks to Putin, those concerns were nuked from orbit.
Germany now has several startups that focus on mass-producing fully autonomous combat drones:
1. Helsing:
Helsing builds drones with a 100 km range that are fully resistant to electronic warfare thanks to onboard AI for autonomous target search and identification. In the absence of a control link, the HX-2 can independently re-acquire and strike targets, making it “immune to hostile EW”. They have multi-purpose, anti-tank, and anti-structure ammunition.
Notably, the HX-2 is swarm-capable – it integrates with Helsing’s Altra reconnaissance-strike software platform so that multiple HX-2s (and other ISR assets or artillery) can coordinate in swarming attacks.
Tens of thousands can be produced in so-called Resilience Factories. Rather than a single plant, these are distributed, rapidly scalable factories across Europe to mass-produce drones in wartime.
(Field tests are underway in Ukraine.)
2. Stark Defense:
Their OWE-V "Virtus" kamikaze drone was designed with combat experience in Ukraine in mind. It features a vertical takeoff and landing capability, autonomous targeting, a range of up to 100 kilometers, speeds of up to 250 kilometers per hour, and a 5-kilogram payload. It can also operate without a connection. This autonomy is enabled by an onboard AI “core” for navigation and targeting, which lets Virtus identify, track, and precisely engage targets even in GPS-denied or jammed environments.
A standout aspect of Virtus is its swarm capability. Stark has built a proprietary control software called “Minerva” that enables a single operator to manage a large number of drones simultaneously, coordinating them as a swarm.
(Field tests are underway in Ukraine.)
3. Quantum-Systems:
Their drones – notably the Vector – have seen extensive use in Ukraine for frontline reconnaissance and targeting. The Vector boasts an endurance of up to 3 hours and an operational range of ~30–40 km (extendable via mesh networks or satcom).
The Vector can navigate by camera (visually) and detect/classify objects autonomously, even at night or in poor visibility. This is a game-changer for GPS-denied environments: if GPS is jammed, the drone’s AI-driven vision module provides signal-independent positioning and guidance. It also automatically recognizes and tracks targets of interest (vehicles, etc.) in its video feed via onboard AI, reducing the cognitive load on operators.
More:
- Alpine Eagle (Air-to-Air Counter-Drone Interceptors): It combines a swarm of interceptor drones and sensors to detect, track, and physically engage hostile UAVs.
- Dedrone (Counter-Drone Systems with AI Detection): The latest DedroneDefender 2 is described as an “AI-powered” jammer rifle that precisely targets a drone’s control link even beyond visual range.
P.S.
Established German defense firms like Rheinmetall are also developing various related systems:
- Rheinmetall’s activities now span drone production, loitering munitions, and layered C-UAS defenses. A cornerstone is the Oerlikon Skynex short-range air defense system. Skynex is a networked system that can tie together various sensors (radars, EO) with effectors like the 35 mm revolver gun firing programmable airburst rounds or effector modules (e.g. missiles or high-power lasers).
- Hensoldt’s “Elysion” Mission Core – AI is used to filter and fuse radar tracks, visual sightings, and RF detections of drones, then quickly recommend the best effector or jamming response.
- Diehl Defence unveiled the “Sky Sphere” C-UAS system with a novel anti-drone interceptor called CICADA. CICADA is described as an “eMissile” – essentially a very small surface-to-air missile tailored to engage Class 1 & 2 drones (small quadcopters up to larger UAVs).
🔥17🤷♂4❤3👎2👍1😁1🤓1
Links for 2025-05-17
AI
1. OpenAI introduced Codex: It is a software engineering agent that runs in the cloud and does tasks for you, like writing a new feature or fixing a bug. https://openai.com/index/introducing-codex/
2. With AI, researchers predict the location of virtually any protein within a human cell https://news.mit.edu/2025/researchers-predict-protein-location-within-human-cell-using-ai-0515
3. Large Language Models Are More Persuasive Than Incentivized Human Persuaders https://arxiv.org/abs/2505.09662
4. A faster way to compute the product of a matrix and its transpose has been discovered. This operation is performed trillions of times every minute globally. Estimated energy savings: 5-10%. Estimated time savings: 5%. The discovery was made by combining machine-learning-based search and combinatorial optimization. https://arxiv.org/abs/2505.09814
5. How Fast Can Algorithms Advance Capabilities? https://www.lesswrong.com/posts/qhjNejRxbMGQp4wHt/how-fast-can-algorithms-advance-capabilities-or-epoch
6. “Today, the radiology department has an A.I. team of 40 people including A.I. scientists, radiology researchers, data analysts and software engineers. They have developed a series of A.I. tools, from tissue analyzers to disease predictors.” https://www.nytimes.com/2025/05/14/technology/ai-jobs-radiologists-mayo-clinic.html [no paywall: https://archive.is/cw1Zt]
7. Microsoft’s CEO on How AI Will Remake Every Company, Including His https://www.bloomberg.com/news/features/2025-05-15/microsoft-ceo-satya-nadella-on-his-ai-efforts-and-openai-partnership [no paywall: https://archive.is/s7Peb]
8. Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models https://arxiv.org/abs/2505.10554
9. "transformers keep track of distinctions in anticipated distribution over the entire future, beyond distinctions in next token predictions, even though the transformer is only trained explicitly on next token prediction!" https://www.lesswrong.com/posts/gTZ2SxesbHckJ3CkF/transformers-represent-belief-state-geometry-in-their
10. AI-Generated Law https://www.schneier.com/blog/archives/2025/05/ai-generated-law.html
11. Generating the Funniest Joke with RL (according to GPT-4.1) https://www.lesswrong.com/posts/xMGmibZpPDnawjHXk/generating-the-funniest-joke-with-rl-according-to-gpt-4-1
12. How AI wrappers and meta-apps are reshaping industries https://every.to/context-window/rise-of-the-ai-wrappers
13. Google hits 150 million users for subscription service with help of AI https://www.reuters.com/business/google-hits-150-million-users-subscription-service-with-help-ai-2025-05-15/ [no paywall: https://archive.is/uvfMH]
14. Nvidia plans Shanghai research centre in new commitment to China https://www.ft.com/content/c886a4c0-da75-4ea7-8230-6ffd18815fa4 [no paywall: https://archive.is/5F79f]
I was born at a time when, if you had a question, the most you could do was go to the local library. Later, the internet, especially search engines like Google, made it possible to access humanity's collective wisdom. However, this still limited you to the part of human knowledge that was accessible in your language, and it often required you to invest considerable time reading many websites or asking questions on specialized forums.
With modern AI search, we have overcome these last hurdles. You can now search the whole internet and have the research done for you in a few minutes.
Case in point, I came across this photo on Facebook [see comments], accompanied by a brief text, but there were no sources cited. I asked o3 to verify and research the photo and text using original Japanese sources:
What you’re looking at
The bright-red loops in your photo are U-shaped steel dampers (U 型鋼材ダンパー) – a type of yielding “shock-absorber” that forms part of the base-isolation layer for Miyazaki Prefecture’s new Disaster-Management Building (県防災庁舎, 2020).
They are purposely slender so that, during a strong quake, the steel yields and bends, converting seismic energy into harmless plastic deformation instead of transmitting it to the floors above.
Verifying the location
- Multiple social-media posts shot in the 地下駐車場 (underground car-park) of the prefectural complex show the same red devices and the same concrete finish as your photo.
- The prefecture’s own feature on the building explicitly highlights its 「地下免震装置」 (underground isolation devices); the accompanying photo on the page matches the arrangement seen in your picture.
What the dampers actually do (and what they don’t)
Common description: “The concrete columns sit on those metal loops.”
Engineering reality: Vertical gravity loads are still carried by hidden bearing plates / sliding or rubber isolators directly under the column stub. The U-dampers are bolted to steel plates on either side of that bearing and take only horizontal shear.
Common description: “They look too flimsy to hold thousands of tons.”
Engineering reality: A single NSU-type U-damper (the model family used here) has a yield shear of ~300–600 kN and can cycle ±300 mm repeatedly. A cluster of 4-8 units at each column easily provides several meganewtons of resistance.
Common description: “They just cushion the blow.”
Engineering reality: More precisely, they dissipate energy and add damping; the bearings next to them lengthen the building’s natural period, and the dampers calm the residual motion. The combination allows the super-structure to stay elastic even in a seismic-intensity-7 (震度7) event, the highest level on the Japanese intensity scale, as noted in the Asahi Shimbun report (免震オイルダンパー12基で震度7対応).
Bottom line
- Yes – the photo is genuine and was taken in the Miyazaki Prefectural Disaster-Management Building’s car-park isolation layer.
- The red loops are shock-absorbers, but they don’t carry the building’s weight; they dissipate horizontal energy while hidden bearings handle gravity and provide the sliding motion.
- Far from being fragile, each loop is sized to yield in a controlled way many times before needing replacement, keeping the upper structure essentially undamaged after even severe earthquakes.
Ingenious indeed – just a bit more nuanced than the social-media caption!
Japanese sources (for deeper reading):
1. https://www.pref.miyazaki.lg.jp/contents/org/honbu/hisho/komiya/202010/sp.html
2. https://www.eng.nipponsteel.com/steelstructures/wp/wp-content/themes/nssmc/files/download/catalog_sud.pdf
3. https://www.asahi.com/articles/ASN836RD1N83TNAB00L.html
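The "several meganewtons" figure in the answer above follows from simple multiplication of the quoted ranges. The sketch below assumes the units in a cluster act in parallel and reach yield together, so their yield shears add; that is a simplification for illustration, not a design calculation, and the function name is hypothetical.

```python
def cluster_yield_shear_mn(units, yield_shear_kn):
    """Combined yield shear of `units` identical dampers, in meganewtons.

    Assumes the dampers act in parallel and yield together (a simplification).
    """
    return units * yield_shear_kn / 1000.0  # 1 MN = 1000 kN


# Ranges quoted in the text: 4-8 units per column, ~300-600 kN per unit.
low = cluster_yield_shear_mn(4, 300)
high = cluster_yield_shear_mn(8, 600)
print(f"cluster resistance: {low:.1f} to {high:.1f} MN")
```

Under these assumptions a cluster provides roughly 1.2 to 4.8 MN of horizontal resistance.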
👍13
Can ChatGPT Pass a Top-Tier Engineering Coursework?
An Imperial College engineering professor gave four LLMs a problem set that graduate students had two months to solve.
He had teaching assistants grade the results blindly, alongside real submissions.
Meta AI and Claude failed. ChatGPT ranked 27 out of 36 students, while Gemini 2.5 Pro ranked 4 out of 36.
Source: https://www.youtube.com/watch?v=lSbnMBb6INA
🔥3🥱3❤1
I just tested NotebookLM because it can now generate audio in German, and I was blown away by the results.
I fed it an extremely dry German document about opening a bank account that was full of legal jargon. NotebookLM transformed it into an extremely natural-sounding conversation between two people who explained everything in easy-to-understand language.
Try it yourself for free: https://notebooklm.google/
❤6
Links for 2025-05-20
AI
1. Researchers have developed "ShiQ," a new AI training method to improve Large Language Models (LLMs). It adapts Q-learning, a type of Reinforcement Learning, to make LLMs more efficient and capable, especially for learning from existing data. https://arxiv.org/abs/2505.11081
2. Modern large-language-model training still leans on Adam-style optimizers, even though they eat memory and force lots of hyper-parameter fiddling. Frank-Wolfe-inspired alternatives such as Muon and Scion showed promise, but the maths never quite matched the code. Gluon closes that gap and, in the process, turns these ideas into a practical, theory-backed replacement for Adam. https://arxiv.org/abs/2505.13416
3. Improving Assembly Code Performance with Large Language Models via Reinforcement Learning https://arxiv.org/abs/2505.11480
4. Robin: A multi-agent system for automating scientific discovery https://arxiv.org/abs/2505.13400
5. OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning https://one-two-vla.github.io/
6. Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity https://arxiv.org/abs/2505.11107
7. Survey on scaffolded language models that learn beyond parametric update https://arxiv.org/abs/2410.16392
8. Model Merging in Pre-training of Large Language Models https://arxiv.org/abs/2505.12082
9. Is Compression Really Linear with Code Intelligence? https://arxiv.org/abs/2505.11441
10. Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models https://arxiv.org/abs/2505.10543
11. Vertical AI Agents Could Be 10X Bigger Than SaaS https://www.youtube.com/watch?v=ASABxNenD_U
12. AI conjures up potential new antibody drugs in a matter of months https://www.science.org/content/article/ai-conjures-potential-new-antibody-drugs-matter-months [no paywall: https://archive.is/HF9ye]
13. “There are growing signs that artificial intelligence poses a real threat to a substantial number of the jobs that normally serve as the first step for each new generation of young workers.” https://www.nytimes.com/2025/05/19/opinion/linkedin-ai-entry-level-jobs.html [no paywall: https://archive.is/SHk0Z]
14. “…the rating that managers gave to briefs done with ChatGPT was nearly 10 per cent higher than for those done by mere humans.” https://www.ft.com/content/f8cac59b-b467-4c83-86fe-6fae065559b5 [no paywall: https://archive.is/z4wun]
15. Anthropic closes $2.5 billion credit facility as Wall Street continues plunging money into AI boom https://www.cnbc.com/2025/05/16/anthropic-ai-credit-facility.html
Miscellaneous
1. Could we catch the next outbreak before anyone gets sick? Airborne biosensors that can detect viruses in real time. https://press.asimov.com/articles/contagion
2. Jensen Huang says the NVLink Spine moves more data than the traffic of the entire internet. 130 TB/s, fully meshed across 5,000 coaxial cables. https://www.youtube.com/live/TLzna9__DnI?si=6iK1PSU4-WuTsjFz&t=2357
3. Yet another paper finds that Ashkenazi Jews are mostly Italian. https://www.nature.com/articles/s41467-025-59936-3
4. Notes on deep atheism by Michael Nielsen https://michaelnotebook.com/deepatheism/index.html [response: https://x.com/jkcarlsmith/status/1924557034545881205]
5. North Korean Navy Diver Reacts to US Navy SEALs Training https://www.youtube.com/watch?v=pf7R5UHmu2Y
👍4🥴2❤1👏1
We know AI time horizons on software tasks are currently ~1.5 hours and doubling every 4-7 months, but what about other domains? Here's a preliminary result comparing METR's task suite (orange line) to benchmarks in other domains, all of which have some kind of grounding in human data.
By Thomas Kwa: https://www.lesswrong.com/posts/Zr37dY5YPRT6s56jY/thomas-kwa-s-shortform?commentId=KaaSfntGEBBgadvrF
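To get a feel for what "~1.5 hours, doubling every 4-7 months" implies, here is a naive extrapolation. It assumes the exponential trend simply continues, which the post does not claim; the function name and the 12-month lookahead are hypothetical choices for illustration.

```python
def projected_horizon_hours(months_ahead, current_hours=1.5, doubling_months=7.0):
    """Time horizon after `months_ahead` months under a constant doubling time.

    Pure extrapolation: h(t) = h0 * 2 ** (t / doubling_time).
    """
    return current_hours * 2 ** (months_ahead / doubling_months)


# Compare the fast (4-month) and slow (7-month) ends of the quoted range.
for doubling in (4.0, 7.0):
    hours = projected_horizon_hours(12, doubling_months=doubling)
    print(f"doubling every {doubling:.0f} months: ~{hours:.1f} hours after one year")
```

With a 4-month doubling time the horizon reaches 12 hours after a year; with a 7-month doubling time it reaches roughly 5 hours.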
🤔2