Oops Tencent guys beat me to distilling agent behaviors from LLMs into "regular" code 😅
https://zhongwen.one/projects/portal/
We present PORTAL, a novel framework for developing artificial intelligence agents capable of playing thousands of 3D video games through language-guided policy generation.
👍1
^ that was the original idea that led me to write a custom benchmark
https://t.me/notatky/1292
Telegram
χаотичні нотатки
ok chat i've made a reasoning LLM benchmark that can't be saturated (inspired by AidanBench), what models should I test?
currently I test on 200 easiest tasks solvable with pen and paper in seconds but the problem is NP complete and the number of tasks is…
https://ezyang.github.io/ai-blindspots/
regardless if you use AI, almost all of these are simply good coding practices to adhere to anyway :) and there's quite a list, so take a look
AI Blindspots – Blindspots in LLMs I've noticed while AI coding
🔥2
https://thingofthings.wordpress.com/2017/01/10/meditation-for-people-who-hate-meditating/
interesting.
Thing of Things
Meditation For People Who Hate Meditating
[content warning: exercise] I hate mindfulness. Hate it, hate it, hate it. The ten minutes I spend meditating is easily the least pleasant ten minutes of my day. Unfortunately, I am also a borderli…
👍1🤯1
if you add an x-middleware-subrequest header to a request to a next.js website, the request is treated as if it has already passed through the auth wall middleware. cozy php vibes
https://security.snyk.io/vuln/SNYK-JS-NEXT-9508709
Learn more about npm with Snyk Open Source Vulnerability Database
Improper Authorization in next | CVE-2025-29927 | Snyk
Critical severity (9.3) Improper Authorization in next | CVE-2025-29927
🤣2😁1
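a minimal defensive sketch for the bypass above, assuming some Python layer (a gateway or proxy) gets to see request headers before they are forwarded to the next.js app. that setup and the names `BLOCKED_HEADER` and `sanitize_headers` are illustrative assumptions, not taken from the advisory. it only drops the client-supplied header; upgrading next to a patched release is still the actual fix.

```python
# Header that lets a request skip next.js middleware when supplied by a client.
BLOCKED_HEADER = "x-middleware-subrequest"


def sanitize_headers(headers):
    """Return a copy of the request headers without the blocked header.

    HTTP header names are case-insensitive, so compare in lowercase.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() != BLOCKED_HEADER
    }


# Usage: clean incoming headers before forwarding the request upstream.
incoming = {"X-Middleware-Subrequest": "middleware", "Accept": "*/*"}
forwarded = sanitize_headers(incoming)
```

stripping rather than rejecting keeps legitimate traffic flowing; rejecting with a 4xx would be the stricter choice.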
that box is something.
https://x.com/austinbv/status/1903276706699546958
X (formerly Twitter)
Austin Vance (@austinbv) on X
DAAUUUMMMM! Deep Seek R1 - 4bit on a single Mac Studio 512gb.
18.26 Tokens per second with MLX.
Took over a minute to load the model but I sped that up. Generation was great!
thanks @awnihannun mlx is the future.
👍1
instantly wishlisted. portable Raman spectrometer is like a sixth sense, except it's real
Also, "Minimal ignition risk" is an absolutely lovely detail 😂
https://fixupx.com/jwt0625/status/1904562738833367531?s=46
🧵 Thread • FixupX
outside five sigma (@jwt0625)
Portable handheld standoff Raman spectrometer from Pendar.
First time seeing one that could do it from this far.
⚡1❤1
build the code that builds physical objects!
also a thread with a lot of solid experience
https://x.com/seveibar/status/1905443905979715725?s=46
X (formerly Twitter)
Seve (@seveibar) on X
13 things I would have told myself before building an autorouter 🧵
I’ve spent about a year working on an autorouter for tscircuit (an open-source electronics CAD kernel written in Typescript). If I could go back a year, these are the 13 things I would tell…
👍1👾1
glimpse of ASI (narrow domain, still interesting as a workflow)
https://www.pnas.org/doi/10.1073/pnas.2406675122
PNAS
Bridging the human–AI knowledge gap through concept discovery and transfer in AlphaZero | PNAS
AI systems have attained superhuman performance across various domains. If the hidden
knowledge encoded in these highly capable systems can be leve...
backpropagation-free training, this time much more efficient. small datasets tho.
https://arxiv.org/abs/2503.24322
arXiv.org
NoProp: Training Neural Networks without Full Back-propagation or...
The canonical deep learning approach for learning requires computing a gradient term at each block by back-propagating the error signal from the output towards each learnable parameter. Given the...
🔥4
Karpathy delivers: while LLMs ship slop 90% of the time, they give individuals access to a diversity of skills previously available only to corporations.
the catch? your job now is to filter thru all that slop.
No, you can't delegate that.
https://x.com/karpathy/status/1909308143156240538?s=46
👍3
llm "hallucinations", human anxiety and a lot of other things have one thing in common: trying to do something when the optimal decision would be to do nothing this time
https://www.evanmiller.org/attention-is-off-by-one.html
www.evanmiller.org
Attention Is Off By One
Let’s fix these pesky Transformer outliers using Softmax One and QuietAttention.
👍5
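a small sketch of the "do nothing" option, as i read the linked post: Softmax One adds 1 to the softmax denominator, i.e. exp(x_i) / (1 + sum_j exp(x_j)), so an attention head can put almost no weight anywhere. the function names and the max-shift for numerical stability are my own choices, not from the article.

```python
import math


def softmax(scores):
    """Standard softmax: outputs always sum to 1, so the head must attend somewhere."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_one(scores):
    """Softmax with +1 in the denominator: outputs may sum to less than 1."""
    # The +1 behaves like an extra logit fixed at 0, so shift by max(scores, 0)
    # and add the shifted term exp(0 - m) to the denominator.
    m = max(max(scores), 0.0)
    exps = [math.exp(s - m) for s in scores]
    denom = math.exp(-m) + sum(exps)
    return [e / denom for e in exps]


# Usage: with all scores strongly negative, softmax still spreads a full unit
# of attention, while softmax_one lets the total shrink toward zero.
weak_scores = [-20.0, -20.0]
forced = softmax(weak_scores)
optional = softmax_one(weak_scores)
```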
this looks like a nice middle ground between rigid electrode arrays and [very finicky] external EM field/electricity BCIs.
still needs surgery but i guess we might eventually get to implants that just grow into you [just a speculation though]
https://actu.epfl.ch/news/soft-brainstem-implant-delivers-high-resolution--2/
actu.epfl.ch
Soft brainstem implant delivers high-resolution hearing
EPFL researchers have developed a flexible auditory brainstem implant (ABI) that closely conforms to the curved surface of the brainstem. The technology has successfully demonstrated high-resolution “prosthetic hearing” in macaques.
i'd say if you take the model far enough OOD from the sequence of chat sentences it will do whatever you want
in other words, the space of jailbreaks is infinite.
https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/
HiddenLayer | Security for AI
Novel Universal Bypass for All Major LLMs
HiddenLayer’s latest research uncovers a universal prompt injection bypass impacting GPT-4, Claude, Gemini, and more, exposing major LLM security gaps.
👍1
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
nice paper, what's interesting is that graph databases make almost no difference in their tests, compared to just a vector store of facts distilled from history
https://mem0.ai/research
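to make "just a vector store of facts distilled from history" concrete, here is a toy sketch. everything in it is an assumption for illustration: it is not Mem0's API, the facts are written by hand (in the paper an LLM distills them from the conversation), and the "embedding" is a bag-of-words count vector standing in for a real embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class FactStore:
    """Hypothetical in-memory store: facts go in, nearest facts come out."""

    def __init__(self):
        self.facts = []  # list of (fact text, vector) pairs

    def add(self, fact):
        self.facts.append((fact, embed(fact)))

    def search(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.facts, key=lambda f: cosine(q, f[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Usage: store distilled facts, then retrieve the most relevant one.
store = FactStore()
store.add("user is allergic to peanuts")
store.add("user lives in Lisbon")
store.add("user prefers dark mode")
best = store.search("where the user lives")
```

a graph store would add explicit edges between entities on top of this; the flat version only ranks independent facts by similarity.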
https://github.com/jessevig/bertviz is an interactive llm visualizer; be sure to play with it while i'm trying to find the time to >_<
🔥1
programmable matter/software-defined labs are the future of R&D
https://arxiv.org/abs/2408.09171
arXiv.org
Chemputer and Chemputation -- A Universal Chemical Compound...
Chemputation reframes synthesis as the programmable execution of reaction code on a universally re-configurable hardware graph. Here we prove that a chemputer equipped with a finite, but...
self-contained software engineering company as a benchmark
https://arxiv.org/abs/2412.14161
arXiv.org
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
We interact with computers on an everyday basis, be it in everyday life or work, and many aspects of work can be done entirely with access to a computer and the Internet. At the same time, thanks...
turns out pre-training on 4chan is an important part of aligning the LLMs
https://arxiv.org/abs/2505.04741
arXiv.org
When Bad Data Leads to Good Models
In large language model (LLM) pretraining, data quality is believed to determine model quality. In this paper, we re-examine the notion of "quality" from the perspective of pre- and post-training...
👍1😱1
Terence Tao shows how to use LLMs for math.
Interesting because making formal spec drafts from informal descriptions to feed into discrete solvers is the way to get results that are exact, not approximate.
https://www.youtube.com/watch?v=zZr54G7ec7A
YouTube
Formalizing a proof in Lean using Claude and o4
Following on from the previous video at https://www.youtube.com/watch?v=cyyR7j2ChCI, I now attempt to formalize a different proof of the same assertion using the large language models Claude 3.4 Sonnet and o4-mini-high, after giving them the informal and…
👍3
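a tiny illustration of what "informal description to formal spec" means, in Lean 4. this is not from the video; the theorem names are made up for the example and the statements are deliberately trivial, proved by lemmas from Lean's core library.

```lean
-- Informal claim: "adding zero to a natural number changes nothing".
theorem add_zero_example (n : Nat) : n + 0 = n :=
  Nat.add_zero n

-- Informal claim: "the order of addition does not matter".
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

once a statement is in this form, the checker either accepts the proof or rejects it, which is where the "exact, not approximate" part comes from.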
for all (surveyed) large enough models, embeddings converge to projections legible/interpretable without access to the model itself, not even as a blackbox 🤯
https://arxiv.org/abs/2505.12540
arXiv.org
Harnessing the Universal Geometry of Embeddings
We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised approach...
🔥3