Self Supervised Boy
@selfsupervised
160 subscribers
9 photos
56 links
Posting links to papers I read. Right now I'm mostly interested in things around LLMs, AI agents, and ML4Code. That is subject to change.
@martolod
Self Supervised Boy
https://arxiv.org/abs/2509.26476
arXiv.org
Regression Language Models for Code
We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to...
Self Supervised Boy
https://arxiv.org/abs/2510.01123
arXiv.org
Rethinking Thinking Tokens: LLMs as Improvement Operators
Reasoning training incentivizes LLMs to produce long chains of thought (long CoT), which among other things, allows them to explore solution strategies with self-checking. This results in higher...
Self Supervised Boy
https://arxiv.org/abs/2406.18665v4
arXiv.org
RouteLLM: Learning to Route LLMs with Preference Data
Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost. More...
Self Supervised Boy
https://arxiv.org/abs/2403.12031
arXiv.org
RouterBench: A Benchmark for Multi-LLM Routing System
As the range of applications for Large Language Models (LLMs) continues to grow, the demand for effective serving solutions becomes increasingly critical. Despite the versatility of LLMs, no...
Self Supervised Boy
https://arxiv.org/abs/2510.02375
arXiv.org
Pretraining with hierarchical memories: separating long-tail and...
The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge...
Self Supervised Boy
https://arxiv.org/abs/2510.05445
arXiv.org
AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative...
Large language models (LLMs) and agent-based frameworks have advanced rapidly, enabling diverse applications. Yet, with the proliferation of models and agentic strategies, practitioners face...
Self Supervised Boy
https://arxiv.org/abs/2510.12773
arXiv.org
Dr.LLM: Dynamic Layer Routing in LLMs
Large Language Models (LLMs) process every token through all layers of a transformer stack, causing wasted computation on simple queries and insufficient flexibility for harder ones that need...
Self Supervised Boy
https://arxiv.org/abs/2510.18148v1
arXiv.org
Extracting Rule-based Descriptions of Attention Features in Transformers
Mechanistic interpretability strives to explain model behavior in terms of bottom-up primitives. The leading paradigm is to express hidden states as a sparse linear combination of basis vectors,...
Self Supervised Boy
https://arxiv.org/abs/2510.18147v1
arXiv.org
LLMs Encode How Difficult Problems Are
Large language models exhibit a puzzling inconsistency: they solve complex problems yet frequently fail on seemingly simpler ones. We investigate whether LLMs internally encode problem difficulty...
Self Supervised Boy
https://arxiv.org/abs/2510.21614v1
arXiv.org
Huxley-Gödel Machine: Human-Level Coding Agent Development by an...
Recent studies operationalize self-improvement through coding agents that edit their own codebases. They grow a tree of self-modifications through expansion strategies that favor higher software...
Self Supervised Boy
https://arxiv.org/abs/2601.05167
arXiv.org
RelayLLM: Efficient Reasoning via Collaborative Decoding
Deploying Large Language Models (LLMs) for complex reasoning is often hindered by high computational costs and latency, while resource-efficient Small Language Models (SLMs) typically lack the necessary...
Self Supervised Boy
https://arxiv.org/abs/2601.03335v1
arXiv.org
Digital Red Queen: Adversarial Program Evolution in Core War with LLMs
Large language models (LLMs) are increasingly being used to evolve solutions to problems in many domains, in a process inspired by biological evolution. However, unlike biological evolution, most...
Self Supervised Boy
https://arxiv.org/abs/2601.04786v1
arXiv.org
AgentOCR: Reimagining Agent History via Optical Self-Compression
Recent advances in large language models (LLMs) enable agentic systems trained with reinforcement learning (RL) over multi-turn interaction trajectories, but practical deployment is bottlenecked...
Self Supervised Boy
https://arxiv.org/abs/2601.05106
arXiv.org
Token-Level LLM Collaboration via FusionRoute
Large language models (LLMs) exhibit strengths across diverse domains. However, achieving strong performance across these domains with a single general-purpose model typically requires scaling to...
Self Supervised Boy
https://arxiv.org/abs/2601.07582v1
arXiv.org
ES-Mem: Event Segmentation-Based Memory for Long-Term Dialogue Agents
Memory is critical for dialogue agents to maintain coherence and enable continuous adaptation in long-term interactions. While existing memory mechanisms offer basic storage and retrieval...
Self Supervised Boy
https://arxiv.org/abs/2601.09503v1
arXiv.org
What Do LLM Agents Know About Their World? Task2Quiz: A Paradigm...
Large language model (LLM) agents have demonstrated remarkable capabilities in complex decision-making and tool-use tasks, yet their ability to generalize across varying environments remains a...
Self Supervised Boy
https://arxiv.org/abs/2601.10343v1
arXiv.org
OctoBench: Benchmarking Scaffold-Aware Instruction Following in...
Modern coding scaffolds turn LLMs into capable software agents, but their ability to follow scaffold-specified instructions remains under-examined, especially when constraints are heterogeneous...
Self Supervised Boy
https://arxiv.org/abs/2601.10245v1
arXiv.org
TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step...
Multi-step reasoning tasks like mathematical problem solving are vulnerable to cascading failures, where a single incorrect step leads to complete solution breakdown. Current LLM routing methods...
Self Supervised Boy
https://arxiv.org/abs/2601.10639v1
arXiv.org
STEM: Scaling Transformers with Embedding Modules
Fine-grained sparsity promises higher parametric capacity without proportional per-token compute, but often suffers from training instability, load balancing, and communication overhead. We...
Self Supervised Boy
Forwarded from Just links
Time Horizon 1.1
https://metr.org/blog/2026-1-29-time-horizon-1-1/
metr.org
Time Horizon 1.1
We're releasing a new version of our time horizon estimates (TH1.1), using more tasks and a new eval infrastructure.