Stuff
Show HN: Chat with 19 years of HN
30 by vercantez | 16 comments on Hacker News.
Hey HN, we loaded a BigQuery dataset of all of Hacker News (every comment, story, and user) into camelAI. You can ask questions like:
• “When does dang tend to comment during the day?”
• “Which domains have gained the most submissions since 2015, year-over-year?”
• “How has average comment length changed each January since 2007?”
• “Top five users who link to arXiv papers the most.”
It's behind a log-in to prevent abuse, but it is free to use for 10 messages, with no payment info required. We use OpenAI o3 or Claude Sonnet 3.7 for the agent, which can be really expensive. Would love feedback, especially on graph/chart quality and o3 vs. Sonnet.
State of the Art PFAS [pdf]
4 by paulmist | 0 comments on Hacker News.
How the Sun Enterprise 10000 was born (2007)
20 by robin_reala | 7 comments on Hacker News.
Google Logo Ligature Bug
54 by cubefox | 4 comments on Hacker News.
Building my childhood dream PC
3 by todsacerdoti | 1 comment on Hacker News.
Pluto Flyover from New Horizons
17 by dxs | 2 comments on Hacker News.
Show HN: I modeled the Voynich Manuscript with SBERT to test for structure
17 by brig90 | 0 comments on Hacker News.
I built this project as a way to learn more about NLP by applying it to something weird and unsolved. The Voynich Manuscript is a 15th-century book written in an unknown script. No one’s been able to translate it, and many think it’s a hoax, a cipher, or a constructed language. I wasn’t trying to decode it — I just wanted to see: does it behave like a structured language? I stripped a handful of common suffix-like endings (aiin, dy, etc.) to isolate what looked like root forms. I know that’s a strong assumption — I call it out directly in the repo — but it helped clarify the clustering. From there, I used SBERT embeddings and KMeans to group similar roots, inferred POS-like roles based on position and frequency, and built a Markov transition matrix to visualize cluster-to-cluster flow. It’s not translation. It’s not decryption. It’s structural modeling — and it revealed some surprisingly consistent syntax across the manuscript, especially when broken out by section (Botanical, Biological, etc.). GitHub repo: https://ift.tt/K8vz1fI Write-up: https://ift.tt/xStcDgu... I’m new to the NLP space, so I’m sure there are things I got wrong — but I’d love feedback from people who’ve worked with structured language modeling or weird edge cases like this.
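The suffix stripping and cluster-to-cluster transition steps described in this post can be sketched roughly as follows. This is a minimal illustration, not the code from the author's repo: the function names, the example tokens, and the cluster labels are assumptions, and the SBERT embedding and KMeans steps are left out (the labels are assumed to come from them).

```python
def strip_suffix(word, suffixes=("aiin", "dy")):
    """Drop the first matching suffix, keeping at least one root character.

    The two suffixes are the ones named in the post; the full list used
    in the project is not given there.
    """
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def transition_matrix(labels, n_clusters):
    """Row-normalized matrix of transitions between consecutive cluster labels.

    labels: cluster ids in reading order (assumed to come from clustering
    the root embeddings). Rows with no outgoing transitions stay all zero.
    """
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for current, following in zip(labels, labels[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical usage: illustrative tokens and made-up cluster labels.
roots = [strip_suffix(w) for w in ["qokaiin", "chedy", "dy"]]
flow = transition_matrix([0, 1, 2, 0, 1, 0], n_clusters=3)
```

Each row of `flow` gives the estimated probability of moving from one cluster to each of the others, which is the quantity a cluster-to-cluster flow diagram would visualize.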
Show HN: Model2vec-Rs – Fast Static Text Embeddings in Rust
6 by Tananon | 0 comments on Hacker News.
Hey HN! We’ve just open-sourced model2vec-rs, a Rust crate for loading and running Model2Vec static embedding models with zero Python dependency. This lets you embed text at (very) high throughput, for example in a Rust-based microservice or CLI tool. It can be used for semantic search, retrieval, RAG, or any other text embedding use case.
Main features:
- Rust-native inference: Load any Model2Vec model from Hugging Face or your local path with StaticModel::from_pretrained(...).
- Tiny footprint: The crate itself is only ~1.7 MB, with embedding models between 7 and 30 MB.
Performance (benchmarked single-threaded on a CPU):
- Python: ~4650 embeddings/sec
- Rust: ~8000 embeddings/sec (~1.7× speedup)
This is our first open-source project in Rust, so it would be great to get some feedback!
Spaced Repetition Memory System
26 by gasull | 0 comments on Hacker News.
Show HN: Buckaroo – The data table UI for Notebooks
22 by paddy_m | 2 comments on Hacker News.
Buckaroo is my open source project. It is a dataframe viewer that has the basic features we expect in a modern table: scroll, search, and sort. In addition, summary stats and histograms are available. Buckaroo supports Pandas and Polars dataframes and works in Jupyter, Marimo, VSCode, and Google Colab notebooks. All of this is extensible. I think of Buckaroo as a framework for building table UIs, and an initial data exploration app built on top of that framework. AG-Grid is used for the core table display, and it has been customized with a declarative layer so you don't have to pass JS functions around for customizations. On the Python side there is a framework for adding summary stats (with a small DAG for dependencies). There is also an entire low-code UI for point-and-click selection of common commands (such as drop column). The low-code UI also generates a Python function that accomplishes the same tasks. This is built on top of JLisp, a small lisp interpreter that reads JSON-flavored lisp. Auto Cleaning looks at columns and heuristically suggests common cleaning operations. The operations are added to the low-code UI, where they can be edited. Multiple cleaning strategies can be applied and the best fit retained. Autocleaning without a UI and multiple strategies is very opaque. Since this runs heuristically (not with an LLM), it’s fast and data stays local. I'm eager to hear feedback from data scientists and other users of dataframes/notebooks.
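To give a feel for what "a small lisp interpreter that reads JSON-flavored lisp" can mean, here is a toy sketch. It is not Buckaroo's JLisp: the expression syntax (a JSON array whose first element names an operation) and the `add`/`sub`/`mul` operation names are assumptions made up for this illustration.

```python
import json
import operator

# Hypothetical operation table; a real interpreter would expose
# dataframe commands here instead of arithmetic.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def evaluate(expr):
    """Evaluate a nested list expression; non-list values evaluate to themselves."""
    if isinstance(expr, list):
        if not expr:
            raise ValueError("empty expression")
        name, *args = expr
        if name not in OPS:
            raise ValueError(f"unknown operation: {name!r}")
        # Evaluate arguments first, then apply the (binary) operation.
        return OPS[name](*[evaluate(arg) for arg in args])
    return expr


def run(source):
    """Parse a JSON string and evaluate it as a lisp-style expression."""
    return evaluate(json.loads(source))


result = run('["add", 1, ["mul", 2, 3]]')  # 1 + (2 * 3)
```

Because programs are plain JSON, a UI can build and edit them as data and serialize them without a custom parser, which is presumably part of the appeal of this kind of design.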
Ditching Obsidian and building my own
22 by williamsss | 12 comments on Hacker News.
Show HN: Racketmeter – Measure Badminton String Tension Using Sound Frequency
4 by zhacker | 0 comments on Hacker News.
Racketmeter lets badminton players measure string tension using the sound frequency produced when tapping the racket strings. It's 100% free, works in your browser on mobile and desktop, and requires no sign-up or installation. I built it to solve a personal problem. I started playing badminton regularly in 2016 and quickly learned that players often ask stringers to string rackets at specific tensions (like 22 or 26 lbs). But after a few stringing jobs, I began to feel like the tension was inconsistent. Other players told me they just tap the strings and go by ear where "sharper sound meant higher tension." One day while tuning my guitar, I could see exact sound frequencies on my tuner app. That’s when it clicked. It should be possible to build a tuner for badminton strings as well! I searched online and found some tension-frequency data shared by professional stringers, but it wasn’t clean or comprehensive. So I visited 5 or 6 local stringers, gave them a frequency measuring app, and asked them to record racket head size, string thickness, tension, and sound frequency for each job. Some asked for a small payment, but most helped for free. Within a week, I had over 200 solid data points. I trained a simple regression model using that data and validated it with newly strung rackets. It turned out to be surprisingly accurate. I shared it with friends and fellow players, and it started to spread in badminton forums. There was another app that launched a few months later with big celebrity endorsements, but it was less accurate, harder to use, and required in-app purchases. Mine wasn't built to compete, but it ended up being more useful. I originally released it as a mobile app, but constant changes in Google Play policies kept taking it down. So I rebuilt it as a simple browser-based tool. Would love feedback, suggestions for improvements, or ideas on how to sustain it without cluttering it with ads or paywalls. Let me know what you think.
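The link between tap frequency and tension can be illustrated with a very small model. For an ideal vibrating string, frequency grows with the square root of tension, so tension is roughly proportional to frequency squared. The sketch below fits that one-parameter relation by least squares. It is an assumption-laden simplification, not Racketmeter's model: the post describes a regression that also uses head size and string thickness, and the function names and calibration numbers here are placeholders.

```python
def fit_scale(samples):
    """Least-squares fit of k in the model tension = k * frequency**2.

    samples: iterable of (frequency_hz, tension_lbs) calibration pairs.
    """
    samples = list(samples)
    numerator = sum((f ** 2) * t for f, t in samples)
    denominator = sum(f ** 4 for f, _ in samples)
    if denominator == 0:
        raise ValueError("need at least one sample with nonzero frequency")
    return numerator / denominator


def predict_tension(frequency_hz, k):
    """Estimate tension from a measured tap frequency under the same model."""
    return k * frequency_hz ** 2


# Placeholder calibration pairs for illustration only, not measured data.
calibration = [(1000.0, 20.0), (1100.0, 24.2)]
k = fit_scale(calibration)
estimate = predict_tension(1200.0, k)
```

A real racket string bed behaves more like a woven membrane than a single string, which is one reason an empirically fitted model with extra features, as the post describes, is the more practical route.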
The RISC OS GUI
21 by rbanffy | 2 comments on Hacker News.
An Uplifting Origin of 86 (2001)
12 by susam | 1 comment on Hacker News.