Hacker News
24.1K subscribers
118K links
Top stories from https://news.ycombinator.com (with 100+ score)
Contribute to the development here: https://github.com/phil-r/hackernewsbot
Also check https://t.me/designer_news

Contacts: @philr
Sci-Net (🔥 Score: 153+ in 2 hours)

Link: https://readhacker.news/s/6uA6T
Comments: https://readhacker.news/c/6uA6T
The Awful German Language (1880) (Score: 151+ in 13 hours)

Link: https://readhacker.news/s/6uzbP
Comments: https://readhacker.news/c/6uzbP
Show HN: Visual flow-based programming for Erlang, inspired by Node-RED (Score: 153+ in 4 hours)

Link: https://readhacker.news/s/6uABz
Comments: https://readhacker.news/c/6uABz

Hi there,
Erlang-RED has been my project for the last couple of months, and I would love to get some feedback from the HN community.
The idea is to take advantage of Erlang's message passing and low-overhead processes to get true concurrency in Node-RED flows, and also to bring low-code visual flow-based programming to Erlang.
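For readers new to flow-based programming, here is a minimal sketch of the idea, with Python threads and queues standing in for Erlang's processes and mailboxes (an illustration only; none of this is Erlang-RED code): every node in a visual flow runs concurrently and reacts to messages arriving in its inbox.

import queue
import threading

def spawn_node(handler, inbox, outbox=None):
    # A flow node: apply handler to each message and forward the result.
    # A None message is a shutdown sentinel, propagated downstream.
    def loop():
        while True:
            msg = inbox.get()
            if msg is None:
                if outbox is not None:
                    outbox.put(None)
                return
            result = handler(msg)
            if outbox is not None:
                outbox.put(result)
    t = threading.Thread(target=loop)
    t.start()
    return t

# Wire a three-node flow, Node-RED style: inject -> transform -> debug.
q1, q2 = queue.Queue(), queue.Queue()
t1 = spawn_node(str.upper, q1, q2)                 # transform node
t2 = spawn_node(lambda m: print("debug:", m), q2)  # debug sink
for payload in ("hello", "flow"):                  # inject node
    q1.put(payload)
q1.put(None)                                       # shut the flow down
t1.join(); t2.join()

Erlang's edge over a sketch like this is that its processes are far cheaper than OS threads, so every node in a large flow can genuinely run concurrently.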
MIT asks arXiv to take down preprint of paper on AI and scientific discovery (Score: 153+ in 6 hours)

Link: https://readhacker.news/s/6uAF4
Comments: https://readhacker.news/c/6uAF4
I'm Peter Roberts, immigration attorney, who does work for YC and startups. AMA (Score: 153+ in 7 hours)

Link: https://readhacker.news/c/6uAEf

I'll be here for the next 5-6 hours. As usual, there are countless topics given the rapidly changing immigration landscape, and I'll be guided by whatever you're concerned with. Please remember that I can't provide legal advice on specific cases because I won't have access to all the facts. Please stick to a factual discussion in your questions, and I'll try to do the same in my answers.
Edit: I am taking a break now and will return later this afternoon/evening to respond to any comments and answer any questions. Thank you everyone for a great and engaged AMA so far.
Show HN: KVSplit – Run 2-3x longer contexts on Apple Silicon (🔥 Score: 150+ in 2 hours)

Link: https://readhacker.news/s/6uBAK
Comments: https://readhacker.news/c/6uBAK

I discovered that in LLM inference, keys and values in the KV cache have very different quantization sensitivities. Keys need higher precision than values to maintain quality.
I patched llama.cpp to enable different bit-widths for keys vs. values on Apple Silicon. The results are surprising:
- K8V4 (8-bit keys, 4-bit values): 59% memory reduction with only 0.86% perplexity loss
- K4V8 (4-bit keys, 8-bit values): 59% memory reduction but 6.06% perplexity loss
- Both configurations use the same total number of bits, but K8V4 suffers ~7× less perplexity degradation
This means you can run LLMs with 2-3× longer context on the same Mac. Memory usage scales with sequence length, so savings compound as context grows.
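The 2-3× figure follows from simple arithmetic. A rough sizing sketch (the model dimensions here are assumed, roughly TinyLlama-like; they are not from the post):

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, key_bits, val_bits):
    # Elements in the K cache (and, equally, in the V cache)
    elems = seq_len * n_layers * n_kv_heads * head_dim
    return elems * (key_bits + val_bits) / 8

dims = dict(n_layers=22, n_kv_heads=4, head_dim=64)
fp16 = kv_cache_bytes(8192, key_bits=16, val_bits=16, **dims)
k8v4 = kv_cache_bytes(8192, key_bits=8, val_bits=4, **dims)
print(f"FP16 @8K ctx: {fp16 / 2**20:.0f} MiB")                    # 176 MiB
print(f"K8V4 @8K ctx: {k8v4 / 2**20:.0f} MiB")                    # 66 MiB
print(f"ideal reduction: {1 - k8v4 / fp16:.1%}")                  # 62.5%
print(f"context multiplier at fixed memory: {fp16 / k8v4:.2f}x")  # 2.67x

The ideal reduction is 62.5%; the measured 59% comes in a bit lower, consistent with the per-block scale metadata that quantized formats carry.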
Implementation was straightforward:
1. Added --kvq-key and --kvq-val flags to llama.cpp
2. Applied existing quantization logic separately to K and V tensors (a toy version is sketched after this list)
3. Validated with perplexity metrics across context lengths
4. Used Metal for acceleration (with -mlong-calls flag to avoid vectorization issues)
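A toy version of step 2 in Python/NumPy (illustration only: this is not llama.cpp's ggml code, and the quantizer below is a simplified stand-in for its block-wise Q8_0/Q4_0 formats). The point is just that the same rounding logic can be applied to K and V at independent bit-widths:

import numpy as np

def fake_quant(x, bits):
    # Symmetric per-row round-trip quantization: quantize, then dequantize.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=-1, keepdims=True) / qmax
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
K = rng.standard_normal((128, 64)).astype(np.float32)  # toy key cache
V = rng.standard_normal((128, 64)).astype(np.float32)  # toy value cache

for kb, vb in [(8, 4), (4, 8)]:
    k_err = np.abs(fake_quant(K, kb) - K).mean()
    v_err = np.abs(fake_quant(V, vb) - V).mean()
    print(f"K{kb}V{vb}: mean |K err| {k_err:.4f}, mean |V err| {v_err:.4f}")

On random data the two mixes are mirror images; the ~7× quality gap shows up in real attention, where key error perturbs the softmax logits and gets amplified, while value error only blends the output linearly.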
Benchmarked on an M4 MacBook Pro running TinyLlama with 8K context windows. Compatible with Metal/MPS and optimized for Apple Silicon.
GitHub: https://github.com/dipampaul17/KVSplit