Just links

CLEVER: A Curated Benchmark for Formally Verified Code Generation https://arxiv.org/abs/2505.13938

CLEVER: A Curated Benchmark for Formally Verified Code Generation

We introduce CLEVER, a high-quality, curated benchmark of 161 problems for end-to-end verified code generation in Lean. Each problem consists of (1) the task of generating a...

5.8K views · 18:35

Just links

Solving the fractional quantum Hall problem with self-attention neural network https://journals.aps.org/prb/abstract/10.1103/PhysRevB.111.205117

Physical Review B

Solving the fractional quantum Hall problem with self-attention neural network

We introduce an attention-based fermionic neural network (FNN) to variationally solve the problem of two-dimensional Coulomb electron gas in magnetic fields, a canonical platform for fractional quantum Hall (FQH) liquids, Wigner crystals, and other unconventional…

6.0K views · 06:11

Just links

Forwarded from Hacker News

I used o3 to find a remote zeroday in the Linux SMB implementation (Score: 161+ in 6 hours)

Link: https://readhacker.news/s/6v2yL
Comments: https://readhacker.news/c/6v2yL

Sean Heelan's Blog

How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation

In this post I’ll show you how I found a zeroday vulnerability in the Linux kernel using OpenAI’s o3 model. I found the vulnerability with nothing more complicated than the o3 API…

6.1K views · 20:32


Just links

Seed1.5-VL Technical Report https://github.com/ByteDance-Seed/Seed1.5-VL/blob/main/Seed1.5-VL-Technical-Report.pdf

GitHub

Seed1.5-VL/Seed1.5-VL-Technical-Report.pdf at main · ByteDance-Seed/Seed1.5-VL

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks. ...

5.2K views · 08:55

Just links

https://unitaryhack.dev/bounties/

unitaryhack.dev

Hackathon Bounties

All unitaryHACK participating bounties

6.1K views · 04:54

Just links

Generalized Hall Conductivities in Local Commuting Projector Models: Generalized Symmetries and Protected Surface Modes https://arxiv.org/abs/2505.20384

arXiv.org

Generalized Hall Conductivities in Local Commuting Projector...

Hall conductivities are important characterizations of phases of matter. It is known that nonzero Hall conductivities are difficult to realize in local commuting projector lattice models due to...

6.5K views · 12:14

Just links

Pauli Propagation: A Computational Framework for Simulating Quantum Systems https://arxiv.org/abs/2505.21606

arXiv.org

Pauli Propagation: A Computational Framework for Simulating Quantum Systems

Classical methods to simulate quantum systems are not only a key element of the physicist's toolkit for studying many-body models but are also increasingly important for verifying and challenging...

4.8K views · 09:02

Just links

Disturbing news about the d=2+ε expansion https://arxiv.org/abs/2505.21611

arXiv.org

Disturbing news about the $d=2+ε$ expansion

The $O(N)$ Non-Linear Sigma Model (NLSM) in $d=2+ε$ has long been conjectured to describe the same conformal field theory (CFT) as the Wilson-Fisher (WF) $O(N)$ fixed point obtained from...

5.4K views · 09:06

Just links

SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents https://arxiv.org/abs/2505.20411

arXiv.org

SWE-rebench: An Automated Pipeline for Task Collection and...

LLM-based agents have shown promising capabilities in a growing range of software engineering (SWE) tasks. However, advancing this field faces two critical challenges. First, high-quality training...

6.3K views · 10:53

Just links

https://odyssey.world/introducing-interactive-video

odyssey.world

AI video you can both watch and interact with in real-time

A research preview of interactive video, generated by AI in real-time.

7.5K views · 11:48

Just links

Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions https://arxiv.org/abs/2505.18492

arXiv.org

Enumerate-Conjecture-Prove: Formally Solving Answer-Construction...

Mathematical reasoning lies at the heart of artificial intelligence, underpinning applications in education, program verification, and research-level mathematical discovery. Mathematical...

7.7K views · 07:07

Just links

Quantized Transport of Disordered Superconducting Fractional Quantum Hall Edges https://arxiv.org/abs/2505.20398

arXiv.org

Quantized Transport of Disordered Superconducting $ν=2/3$...

The $ν=2/3$ fractional quantum Hall (FQH) edge states, which have counter-propagating modes, are known to flow under relevant neutral disorders into a stable Kane-Fisher-Polchinski (KFP)...

8.0K views · 12:55

Just links

Generalization Bias in Large Language Model Summarization of... https://arxiv.org/abs/2504.00025

arXiv.org

Generalization Bias in Large Language Model Summarization of...

Artificial intelligence chatbots driven by large language models (LLMs) have the potential to increase public science literacy and support scientific research, as they can quickly summarize...

7.9K views · 10:52

Just links

Forwarded from Hacker News

A Lean companion to Analysis I (Score: 150+ in 6 hours)

Link: https://readhacker.news/s/6vp2P
Comments: https://readhacker.news/c/6vp2P

What's new

A Lean companion to “Analysis I”

Almost 20 years ago, I wrote a textbook in real analysis called “Analysis I”. It was intended to complement the many good available analysis textbooks out there by focusing more on foun…

7.6K views · 05:44


Just links

Forwarded from Боря программирует

Training superhuman coding models at Cursor

I stumbled upon a video where the Cursor team discusses all sorts of things about LLMs. Statements in podcasts like this are usually very superficial, so that no secrets slip out by accident. Here, surprisingly, they mentioned quite a lot of technical detail.

A short list of the topics covered:
- How do you do RL when there is no single correct answer?
- What do you do when the probability of getting a "correct" answer is very small?
- How do you get the model to find its way around a large project?
- How do you support long context?
- How do you do credit assignment for a memory tool?
- How Cursor can train on user data.
- Why relying on likes/dislikes of responses is a bad idea.
- What infrastructure is needed for large RL training runs.

Judging by the view count, it is not very interesting to watch unless you work on this yourself. But I liked it!

6.2K views · 18:22

Just links

https://mlcommons.org/benchmarks/training/

MLCommons

Benchmark MLPerf Training | MLCommons Version 2.0 Results

The MLPerf benchmark suite measures how fast machine learning systems can train models to a target quality metric (v2.0 results).

5.9K views · 16:08

Just links

How to factor 2048 bit RSA integers with less than a million noisy qubits https://arxiv.org/abs/2505.15917

arXiv.org

How to factor 2048 bit RSA integers with less than a million noisy qubits

Planning the transition to quantum-safe cryptosystems requires understanding the cost of quantum attacks on vulnerable cryptosystems. In Gidney+Ekerå 2019, I co-published an estimate stating...