CLEVER: A Curated Benchmark for Formally Verified Code Generation https://arxiv.org/abs/2505.13938
arXiv.org
CLEVER: A Curated Benchmark for Formally Verified Code Generation
We introduce CLEVER, a high-quality, curated benchmark of 161 problems for end-to-end verified code generation in Lean. Each problem consists of (1) the task of generating a...
Solving the fractional quantum Hall problem with self-attention neural network https://journals.aps.org/prb/abstract/10.1103/PhysRevB.111.205117
Physical Review B
Solving the fractional quantum Hall problem with self-attention neural network
We introduce an attention-based fermionic neural network (FNN) to variationally solve the problem of two-dimensional Coulomb electron gas in magnetic fields, a canonical platform for fractional quantum Hall (FQH) liquids, Wigner crystals, and other unconventional…
Forwarded from Hacker News
I used o3 to find a remote zeroday in the Linux SMB implementation (Score: 161+ in 6 hours)
Link: https://readhacker.news/s/6v2yL
Comments: https://readhacker.news/c/6v2yL
Sean Heelan's Blog
How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation
In this post I’ll show you how I found a zeroday vulnerability in the Linux kernel using OpenAI’s o3 model. I found the vulnerability with nothing more complicated than the o3 API…
Seed1.5-VL Technical Report https://github.com/ByteDance-Seed/Seed1.5-VL/blob/main/Seed1.5-VL-Technical-Report.pdf
GitHub
Seed1.5-VL/Seed1.5-VL-Technical-Report.pdf at main · ByteDance-Seed/Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks. ...
Generalized Hall Conductivities in Local Commuting Projector Models: Generalized Symmetries and Protected Surface Modes https://arxiv.org/abs/2505.20384
arXiv.org
Generalized Hall Conductivities in Local Commuting Projector...
Hall conductivities are important characterizations of phases of matter. It is known that nonzero Hall conductivities are difficult to realize in local commuting projector lattice models due to...
Pauli Propagation: A Computational Framework for Simulating Quantum Systems https://arxiv.org/abs/2505.21606
arXiv.org
Pauli Propagation: A Computational Framework for Simulating Quantum Systems
Classical methods to simulate quantum systems are not only a key element of the physicist's toolkit for studying many-body models but are also increasingly important for verifying and challenging...
Disturbing news about the d=2+ε expansion https://arxiv.org/abs/2505.21611
arXiv.org
Disturbing news about the $d=2+ε$ expansion
The $O(N)$ Non-Linear Sigma Model (NLSM) in $d=2+ε$ has long been conjectured to describe the same conformal field theory (CFT) as the Wilson-Fisher (WF) $O(N)$ fixed point obtained from...
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents https://arxiv.org/abs/2505.20411
arXiv.org
SWE-rebench: An Automated Pipeline for Task Collection and...
LLM-based agents have shown promising capabilities in a growing range of software engineering (SWE) tasks. However, advancing this field faces two critical challenges. First, high-quality training...
Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions https://arxiv.org/abs/2505.18492
arXiv.org
Enumerate-Conjecture-Prove: Formally Solving Answer-Construction...
Mathematical reasoning lies at the heart of artificial intelligence, underpinning applications in education, program verification, and research-level mathematical discovery. Mathematical...
Quantized Transport of Disordered Superconducting Fractional Quantum Hall Edges https://arxiv.org/abs/2505.20398
arXiv.org
Quantized Transport of Disordered Superconducting $ν=2/3$...
The $ν=2/3$ fractional quantum Hall (FQH) edge states, which have counter-propagating modes, are known to flow under relevant neutral disorders into a stable Kane-Fisher-Polchinski (KFP)...
Generalization Bias in Large Language Model Summarization https://arxiv.org/abs/2504.00025
arXiv.org
Generalization Bias in Large Language Model Summarization of...
Artificial intelligence chatbots driven by large language models (LLMs) have the potential to increase public science literacy and support scientific research, as they can quickly summarize...
Forwarded from Hacker News
A Lean companion to Analysis I (Score: 150+ in 6 hours)
Link: https://readhacker.news/s/6vp2P
Comments: https://readhacker.news/c/6vp2P
What's new
A Lean companion to “Analysis I”
Almost 20 years ago, I wrote a textbook in real analysis called “Analysis I”. It was intended to complement the many good available analysis textbooks out there by focusing more on foun…
Forwarded from Боря программирует
Training superhuman coding models at Cursor
I stumbled on a video where the folks from Cursor discuss all sorts of things about LLMs. Usually everything said in podcasts like this is very superficial, so that no secrets get given away by accident. Here, surprisingly, they mentioned quite a lot of technical detail.
A short list of the topics covered:
- How do you do RL when there is no single correct answer?
- What do you do if the probability of getting a "correct" answer is very small?
- How do you make the model able to find its way around a large project?
- How do you support long context?
- How do you do credit assignment for a memory tool?
- How Cursor can train on user data.
- Why relying on likes/dislikes of answers is a bad idea.
- What infrastructure is needed for large RL training runs.
Judging by the view count, it is not very interesting to watch unless you work on this yourself. But I liked it!
How to factor 2048 bit RSA integers with less than a million noisy qubits https://arxiv.org/abs/2505.15917
arXiv.org
How to factor 2048 bit RSA integers with less than a million noisy qubits
Planning the transition to quantum-safe cryptosystems requires understanding the cost of quantum attacks on vulnerable cryptosystems. In Gidney+Ekerå 2019, I co-published an estimate stating...
Scene-Centric Unsupervised Panoptic Segmentation https://openaccess.thecvf.com/content/CVPR2025/html/Hahn_Scene-Centric_Unsupervised_Panoptic_Segmentation_CVPR_2025_paper.html
A 2D-CFT Factory: Critical Lattice Models from Competing Anyon Condensation Processes in SymTO/SymTFT https://arxiv.org/abs/2506.05324
arXiv.org
A 2D-CFT Factory: Critical Lattice Models from Competing Anyon...
In this paper, we introduce a "CFT factory": a novel algorithm of methodically generating 2D lattice models that would flow to 2D conformal fixed points in the infrared. These 2D models are...