Hacker News
24.1K subscribers
118K links
Top stories from https://news.ycombinator.com (with 100+ score)
Contribute to the development here: https://github.com/phil-r/hackernewsbot
Also check https://t.me/designer_news

Contacts: @philr
Palette lighting tricks on the Nintendo 64 (Score: 151+ in 5 hours)

Link: https://readhacker.news/s/6uDgM
Comments: https://readhacker.news/c/6uDgM
Mystical (Score: 151+ in 7 hours)

Link: https://readhacker.news/s/6uDJF
Comments: https://readhacker.news/c/6uDJF
Experts have it easy (2024) (Score: 152+ in 14 hours)

Link: https://readhacker.news/s/6uEt7
Comments: https://readhacker.news/c/6uEt7
Show HN: I modeled the Voynich Manuscript with SBERT to test for structure (🔥 Score: 156+ in 2 hours)

Link: https://readhacker.news/s/6uFKt
Comments: https://readhacker.news/c/6uFKt

I built this project as a way to learn more about NLP by applying it to something weird and unsolved.
The Voynich Manuscript is a 15th-century book written in an unknown script. No one’s been able to translate it, and many think it’s a hoax, a cipher, or a constructed language. I wasn’t trying to decode it — I just wanted to see: does it behave like a structured language?
I stripped a handful of common suffix-like endings (aiin, dy, etc.) to isolate what looked like root forms. I know that’s a strong assumption, and I call it out directly in the repo, but it helped clarify the clustering. From there, I used SBERT embeddings and KMeans to group similar roots, inferred part-of-speech-like (POS-like) roles based on position and frequency, and built a Markov transition matrix to visualize cluster-to-cluster flow.
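For readers who want a feel for the pipeline, here is a minimal sketch of two of the steps described above: suffix stripping and the cluster-to-cluster transition matrix. It is not the code from the repo. The suffix list contains only the two endings named in the post, the sample tokens are illustrative, and the `toy_clusters` lookup is a hypothetical stand-in for the SBERT + KMeans step, which would assign the cluster IDs in the actual project.

```python
# Hypothetical suffix list: the post names only "aiin" and "dy" ("etc." is left unfilled).
SUFFIXES = ("aiin", "dy")


def strip_suffix(word, suffixes=SUFFIXES, min_root=1):
    """Remove the longest matching suffix, keeping at least min_root characters."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= min_root:
            return word[: -len(suf)]
    return word


def transition_matrix(labels, n_clusters):
    """Row-normalized counts of transitions labels[i] -> labels[i + 1]."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no outgoing transitions stay all zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Illustrative token sequence, not taken from the repo.
tokens = ["qokaiin", "chedy", "daiin", "shedy", "qokedy", "chedy"]
roots = [strip_suffix(t) for t in tokens]

# Stand-in for SBERT embeddings + KMeans: a hard-coded root -> cluster lookup.
toy_clusters = {"qok": 0, "qoke": 0, "che": 1, "she": 1, "d": 2}
labels = [toy_clusters[r] for r in roots]

matrix = transition_matrix(labels, n_clusters=3)
for row in matrix:
    print(row)
```

Each row of `matrix` gives the estimated probability of moving from one cluster to each other cluster at the next token, which is the quantity a flow visualization would plot.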
It’s not translation. It’s not decryption. It’s structural modeling — and it revealed some surprisingly consistent syntax across the manuscript, especially when broken out by section (Botanical, Biological, etc.).
GitHub repo: https://github.com/brianmg/voynich-nlp-analysis
Write-up: https://brig90.substack.com/p/modeling-the-voynich-manuscrip...
I’m new to the NLP space, so I’m sure there are things I got wrong — but I’d love feedback from people who’ve worked with structured language modeling or weird edge cases like this.