Linkstream

Oops Tencent guys beat me to distilling agent behaviors from LLMs into "regular" code 😅

We present PORTAL, a novel framework for developing artificial intelligence agents capable of playing thousands of 3D video games through language-guided policy generation.

https://zhongwen.one/projects/portal/

👍1

464 views, 19:38

Linkstream

^ that was the original idea that led me to write a custom benchmark
https://t.me/notatky/1292

χаотичні нотатки

ok chat i've made a reasoning LLM benchmark that can't be saturated (inspired by AidanBench), what models should I test?

currently I test on 200 easiest tasks solvable with pen and paper in seconds but the problem is NP complete and the number of tasks is…

174 views, edited 19:56

Linkstream

https://ezyang.github.io/ai-blindspots/

AI Blindspots – Blindspots in LLMs I've noticed while AI coding

regardless of whether you use AI, almost all of these are simply good coding practices to adhere to anyway :) and there's quite a list, so take a look

🔥2

209 views, 09:39

Linkstream

https://thingofthings.wordpress.com/2017/01/10/meditation-for-people-who-hate-meditating/
interesting.

Thing of Things

Meditation For People Who Hate Meditating

[content warning: exercise] I hate mindfulness. Hate it, hate it, hate it. The ten minutes I spend meditating is easily the least pleasant ten minutes of my day. Unfortunately, I am also a borderli…

👍1🤯1

189 views, 21:57

Linkstream

if you add the x-middleware-subrequest header to your request to a next.js website, it gets treated as if it has already passed through the auth wall middleware. cozy php vibes

https://security.snyk.io/vuln/SNYK-JS-NEXT-9508709

Learn more about npm with Snyk Open Source Vulnerability Database

Improper Authorization in next | CVE-2025-29927 | Snyk

Critical severity (9.3) Improper Authorization in next | CVE-2025-29927
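
A toy sketch of the bug class, to make the point concrete. This is not Next.js source code: the handler names, the cookie check, and the header value "1" are all made up for illustration. Only the header name comes from the post. The idea is that a marker header meant for internal use gets trusted even when the client sends it.

```python
# Toy model of "trusting a client-supplied internal header".
# NOT Next.js code; every function here is hypothetical.
INTERNAL_HEADER = "x-middleware-subrequest"  # header name from the post


def auth_middleware(headers: dict[str, str]) -> int:
    """Return an HTTP status: 200 if the request may proceed, 401 otherwise."""
    if headers.get("cookie") == "session=valid":  # hypothetical session check
        return 200
    return 401


def vulnerable_handler(headers: dict[str, str]) -> int:
    # Flaw: the header is taken as proof that middleware already ran,
    # although anyone can put it on an incoming request.
    if INTERNAL_HEADER in headers:
        return 200
    return auth_middleware(headers)


def patched_handler(headers: dict[str, str]) -> int:
    # One possible mitigation (an assumption, not the official fix):
    # drop the header from external requests before any routing happens.
    cleaned = {k: v for k, v in headers.items() if k.lower() != INTERNAL_HEADER}
    return auth_middleware(cleaned)
```

In this toy, `vulnerable_handler({"x-middleware-subrequest": "1"})` lets an unauthenticated request through, while `patched_handler` sends the same request to the auth check.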

🤣2😁1

191 views, edited 10:19

Linkstream

that box is something.
https://x.com/austinbv/status/1903276706699546958

X (formerly Twitter)

Austin Vance (@austinbv) on X

DAAUUUMMMM! Deep Seek R1 - 4bit on a single Mac Studio 512gb.

18.26 Tokens per second with MLX.

Took over a minute to load the model but I sped that up. Generation was great!

thanks @awnihannun mlx is the future.

👍1

240 views, 21:11

Linkstream

instantly wishlisted. a portable Raman spectrometer is like a sixth sense, except it's real

Also, "Minimal ignition risk" is an absolutely lovely detail 😂

https://fixupx.com/jwt0625/status/1904562738833367531?s=46

🧵 Thread • FixupX

outside five sigma (@jwt0625)

Portable handheld standoff Raman spectrometer from Pendar.
First time seeing one that could do it from this far.

⚡1❤1

236 views, 22:04

Linkstream

build the code that builds physical objects!
also a thread with a lot of solid experience

https://x.com/seveibar/status/1905443905979715725?s=46

X (formerly Twitter)

Seve (@seveibar) on X

13 things I would have told myself before building an autorouter 🧵

I’ve spent about a year working on an autorouter for tscircuit (an open-source electronics CAD kernel written in Typescript). If I could go back a year, these are the 13 things I would tell…

👍1👾1

238 views, 06:01

Linkstream

glimpse of ASI (narrow domain, still interesting as a workflow)

https://www.pnas.org/doi/10.1073/pnas.2406675122

PNAS

Bridging the human–AI knowledge gap through concept discovery and transfer in AlphaZero | PNAS

AI systems have attained superhuman performance across various domains. If the hidden knowledge encoded in these highly capable systems can be leve...

213 views, 15:11

Linkstream

backpropagation-free training, this time much more efficient. small datasets tho.
https://arxiv.org/abs/2503.24322

arXiv.org

NoProp: Training Neural Networks without Full Back-propagation or...

The canonical deep learning approach for learning requires computing a gradient term at each block by back-propagating the error signal from the output towards each learnable parameter. Given the...

🔥4

197 views, 12:59

Linkstream

Karpathy delivers: while LLMs ship slop 90% of the time they empower individuals with access to skill diversity previously available to corporations only.

the catch? your job now is to filter thru all that slop.
No, you can't delegate that.

https://x.com/karpathy/status/1909308143156240538?s=46

👍3

232 views, edited 00:39

Linkstream

llm "hallucinations", human anxiety and a lot of other things have one thing in common: trying to do something when the optimal decision would be to do nothing this time
https://www.evanmiller.org/attention-is-off-by-one.html

www.evanmiller.org

Attention Is Off By One

Let’s fix these pesky Transformer outliers using Softmax One and QuietAttention.
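
A minimal sketch of the "do nothing" option, assuming the Softmax One definition from the linked post: exp(x_i) / (1 + sum_j exp(x_j)). The function names and the max-shift used for numerical stability are my own choices, not taken from the article. The extra 1 in the denominator lets the weights sum to less than one, so a head with nothing useful to attend to can stay close to silent.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Standard softmax: the weights always sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_one(scores: list[float]) -> list[float]:
    """Softmax with an extra 1 in the denominator: exp(x_i) / (1 + sum_j exp(x_j)).

    Shifting by m = max(max(scores), 0) keeps every exponent <= 0,
    so neither the exp(x_j) terms nor the "+1" term can overflow.
    """
    m = max(max(scores), 0.0)
    exps = [math.exp(s - m) for s in scores]
    total = math.exp(-m) + sum(exps)  # exp(-m) is the shifted "+1"
    return [e / total for e in exps]
```

With all scores strongly negative, for example `[-10.0, -10.0]`, `softmax` still returns `[0.5, 0.5]`, while the `softmax_one` weights are tiny and sum to far less than one. With a clearly dominant score the two behave almost the same.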

👍5

179 views, 11:16

Linkstream

this looks like a nice middle ground between rigid electrode arrays and [very finicky] external EM field/electricity BCI interfaces.

still needs surgery but i guess we might eventually get to implants that just grow into you [just speculation though]

https://actu.epfl.ch/news/soft-brainstem-implant-delivers-high-resolution--2/

actu.epfl.ch

Soft brainstem implant delivers high-resolution hearing

EPFL researchers have developed a flexible auditory brainstem implant (ABI) that closely conforms to the curved surface of the brainstem. The technology has successfully demonstrated high-resolution “prosthetic hearing” in macaques.

189 views, 11:04

Linkstream

i'd say if you take the model far enough OOD from the sequence of chat sentences it will do whatever you want

in other words, the space of jailbreaks is infinite.

https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/

HiddenLayer | Security for AI

Novel Universal Bypass for All Major LLMs

HiddenLayer’s latest research uncovers a universal prompt injection bypass impacting GPT-4, Claude, Gemini, and more, exposing major LLM security gaps.

👍1

429 views, 16:19

Linkstream

Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

nice paper, what's interesting is that graph databases make almost no difference in their tests, compared to just a vector store of facts distilled from history
https://mem0.ai/research

214 views, 00:02

Linkstream

https://github.com/jessevig/bertviz is an interactive llm visualizer; be sure to play with it while i'm trying to find the time to >_<

🔥1

227 views, 00:32

Linkstream

programmable matter/software-defined labs are the future of R&D
https://arxiv.org/abs/2408.09171

arXiv.org

Chemputer and Chemputation -- A Universal Chemical Compound...

Chemputation reframes synthesis as the programmable execution of reaction code on a universally re-configurable hardware graph. Here we prove that a chemputer equipped with a finite, but...

221 views, edited 21:01

Linkstream

self-contained software engineering company as a benchmark
https://arxiv.org/abs/2412.14161

arXiv.org

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

We interact with computers on an everyday basis, be it in everyday life or work, and many aspects of work can be done entirely with access to a computer and the Internet. At the same time, thanks...

217 views, 10:49

Linkstream

turns out pre-training on 4chan is an important part of aligning the LLMs
https://arxiv.org/abs/2505.04741

arXiv.org

When Bad Data Leads to Good Models

In large language model (LLM) pretraining, data quality is believed to determine model quality. In this paper, we re-examine the notion of "quality" from the perspective of pre- and post-training...

👍1😱1

168 views, 00:56

Linkstream

Terence Tao shows how to use LLMs for math.

Interesting because making formal spec drafts from informal descriptions to feed into discrete solvers is the way to get results that are exact, not approximate.
https://www.youtube.com/watch?v=zZr54G7ec7A

YouTube

Formalizing a proof in Lean using Claude and o4

Following on from the previous video at https://www.youtube.com/watch?v=cyyR7j2ChCI, I now attempt to formalize a different proof of the same assertion using the large language models Claude 3.7 Sonnet and o4-mini-high, after giving them the informal and…

👍3

311 views, edited 13:21

Linkstream

for all (surveyed) large enough models, embeddings converge to projections legible/interpretable without access to the model itself, not even as a blackbox 🤯
https://arxiv.org/abs/2505.12540

arXiv.org

Harnessing the Universal Geometry of Embeddings

We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised approach...

🔥3

184 views, 17:52
