Naur's "Programming as Theory Building" and LLMs replacing human programmers
6 by bertman | 0 comments on Hacker News.
Nationwide Power Outages Also Disrupt Internet Traffic in Portugal and Spain
23 by emot | 1 comment on Hacker News.
Deep dive into how DOS games do copy protection by making themselves unwinnable
13 by abra0 | 2 comments on Hacker News.
Optery (YC W22) – Engineering Team Lead and Engineers with Node.js (U.S., Latam)
1 by beyondd | 0 comments on Hacker News.
Reports of widespread power cuts in Spain and Portugal
363 by lleims | 443 comments on Hacker News.
All of Spain is without power. All systems shut down immediately and are not coming back. Apparently the same has happened in Portugal.
Tiny-LLM – a course of serving LLM on Apple Silicon for systems engineers
7 by sarkory | 0 comments on Hacker News.
Show HN: Autarkie – Instant grammar fuzzing using Rust macros
14 by r9295 | 1 comment on Hacker News.
Reanimation of the original Logic Theorist, the first AI, in IPL-V
9 by abrax3141 | 1 comment on Hacker News.
Show HN: Web-eval-agent – Let the coding agent debug itself
11 by neversettles | 2 comments on Hacker News.
Hey HN! We've been building an MCP server to help AI-assisted web app developers by using browser agents to test whether changes made by an AI inside an editor actually work. We've been testing it on scenarios like verifying new flows in a UI, or checking that sending a chat request triggers a response. The idea is to let your coding agent both code and evaluate whether what it did was correct. Here's a short demo with Cursor: https://youtu.be/_AoQK-bwR0w

When building apps, we found the hardest part of AI-assisted coding isn't the coding; it's the tedious point-and-click testing to see if things work. We got tired of this loop: open the app, click through flows, stare at the network tab, copy console errors to the editor, repeat. It felt obvious this should be AI-assisted too. If you can vibe-code, you should be able to vibe-test! Some agents like Cline and Windsurf have browser integrations, but Cline's (via Anthropic Computer Use) felt slow and only reported console logs, and Windsurf's didn't work reliably yet. We got so tired of manually testing that we decided to fix it.

Our MCP server sits between your IDE agent (Cursor/Windsurf/Cline/Continue) and a Playwright-powered browser-use agent. It spins up the browser, navigates your app per instructions from the IDE agent, and sends back steps, console events, and network events so the IDE agent can assess the app's state. We proxy Browser-use's original Claude calls and swap in Gemini 2.0 Flash, cutting latency from ~8s → ~3s per step. We also cap console/network logs at 10,000 characters to stay within context limits, and filter out irrelevant logs (e.g., noisy XHR requests).

At the end, the browser agent outputs a summary like:

    Web Evaluation Report for http://localhost:5173
    Task: delete an API key and evaluate UX
    Steps: Home → Login → API Keys → Create Key → Delete Key
    Flow tested successfully; UX had problems X, Y, Z...
    Console (8)... Network (13)... Timeline of events (57) ...

This lets the coding agent recognize console and network errors, or any issues with clicking around, and fix them before returning to the user. (There's a longer example in the README at https://ift.tt/cQafJ9W .)

Try it in Cursor / Cline / Windsurf / Claude Desktop (macOS/Linux):

    curl -LSf https://ift.tt/bdGKYE4 -o install.sh
    less -N install.sh   # inspect if you'd like
    bash install.sh      # installs uv + jq + Playwright + server

Then, in Cursor/Cline/Windsurf/Continue, craft a prompt using the web_eval_agent tool. (For Windows, there's a 4-line manual install in the README.)

What we want to do next: pause/go for OAuth screens; save/load browser auth states; Playwright step recording for automated test and regression-test creation; supporting Loveable / v0 / Bolt.new sites by offering a web version.

We'd love to hear your feedback, especially if you've experienced the pain of manually testing your web app after making changes from inside your IDE, or if you've tried any alternative MCP tools for this that have worked well. Try it out if you feel it'd be helpful for your workflow: https://ift.tt/cQafJ9W

(Note: the server hits our operative.sh proxy to cover Gemini tokens. The MCP server itself is OSS; Anthropic base-URL support is coming soon. Free tier included; heavy users can grab the $10 plan to offset our model bill.)

Let us know what you think! Thanks for reading!
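To make the log handling described above concrete, here is a minimal TypeScript sketch of capping console/network output at 10,000 characters and filtering noisy XHR entries. The event shape and names are hypothetical illustrations, not taken from the project's code:

    // Hypothetical event shape; the real MCP server's types may differ.
    interface BrowserEvent {
      kind: "console" | "network";
      text: string;
    }

    const MAX_LOG_CHARS = 10_000; // the cap mentioned in the post

    // Drop noisy XHR entries, then truncate the rest to fit context limits.
    function summarizeEvents(events: BrowserEvent[]): string {
      const relevant = events.filter(
        (e) => !(e.kind === "network" && /xhr/i.test(e.text))
      );
      const joined = relevant.map((e) => `[${e.kind}] ${e.text}`).join("\n");
      return joined.length > MAX_LOG_CHARS
        ? joined.slice(0, MAX_LOG_CHARS) + "\n…(truncated)"
        : joined;
    }

The point of the cap is that the summary is fed back into the IDE agent's context window, so raw logs have to be filtered and bounded before they are returned.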
Why Pale Blue Dot generates feelings of cosmic insignificance
9 by rbanffy | 5 comments on Hacker News.
Show HN: A pure WebGL image editor with filters, crop and perspective correction
4 by axelMI | 0 comments on Hacker News.
I'm working on a pure JS WebGL image editor with effects, filters, crop & perspective correction, etc. My goal is to give the community an open-source solution, as unfortunately most comparable apps are closed source. Try it out at https://ift.tt/d0Qn2gD ( https://ift.tt/XDvLlcI )
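As a rough illustration of how a pure-WebGL filter like the ones described works (a generic sketch, not code from this project), a fragment shader samples the image texture and transforms each pixel, here scaling brightness:

    // Generic brightness filter: scales each pixel's RGB by a uniform factor.
    const fragmentShaderSrc = `
      precision mediump float;
      uniform sampler2D u_image;
      uniform float u_brightness;
      varying vec2 v_texCoord;
      void main() {
        vec4 color = texture2D(u_image, v_texCoord);
        gl_FragColor = vec4(color.rgb * u_brightness, color.a);
      }
    `;

    // Compile the shader against a WebGL context obtained from a <canvas>.
    function compileFragmentShader(gl: WebGLRenderingContext): WebGLShader {
      const shader = gl.createShader(gl.FRAGMENT_SHADER)!;
      gl.shaderSource(shader, fragmentShaderSrc);
      gl.compileShader(shader);
      if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
        throw new Error(gl.getShaderInfoLog(shader) ?? "shader compile failed");
      }
      return shader;
    }

Crop and perspective correction work the same way in principle: the image lives in a texture, and each operation is a shader pass over it rather than a pixel loop on the CPU.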
Show HN: Sim Studio – Open-Source Agent Workflow GUI
3 by waleedlatif1 | 0 comments on Hacker News.
Hi HN! We're Emir and Waleed, and we're building Sim Studio ( https://simstudio.ai ), an open-source drag-and-drop UI for building and managing multi-agent workflows as a directed graph. You can define how agents interact with each other, use tools, and handle complex logic like branching, loops, transformations, and conditional execution. Our repo is https://ift.tt/ho4zuRv , docs are at https://ift.tt/C4XMmfU , and we have a demo here: https://youtu.be/JlCktXTY8sE?si=uBAf0x-EKxZmT9w4

Building reliable, multi-step agent systems with current frameworks often gets complicated fast. In OpenAI's "practical guide to building agents", they claim that the non-declarative approach and single multi-step agents are the best path forward, but from experience and experimentation, we disagree. Debugging these implicit flows across multiple agent calls and tool uses is painful, and iterating on the logic or prompts becomes slow. We built Sim Studio because we believe defining the workflow explicitly and visually is the key to building more reliable and maintainable agentic applications.

In Sim Studio, you design the entire architecture, comprising agent blocks that have system prompts, a variety of models (hosted and local via Ollama), tools with granular tool-use control, and structured output. We have plenty of pre-built integrations that you can use as standalone blocks or as tools for your agents. The nodes are all connected with if/else conditional blocks, LLM-based routing, loops, and branching logic for specialized agents.

Also, the visual graph isn't just for prototyping; it's actually executable. You can run simulations of a workflow 1, 10, or 100 times to see how any small change to a system prompt, underlying model, or tool call impacts the workflow's overall performance. You can trigger workflows manually, deploy them as an API and interact via HTTP, or schedule them to run periodically. They can also be set up to trigger on incoming webhooks and deployed as standalone chat instances that can be password- or domain-protected. We have granular trace spans, logs, and observability built in, so you can easily compare and contrast performance across different model providers and tools. All of this enables a tighter feedback loop and significantly faster iteration.

So far, users have built deep research agents to detect application fraud, chatbots to interface with their internal HR documentation, and agents to automate communication between manufacturing facilities.

Sim Studio is Apache 2.0 licensed and fully open source. We're excited about bringing a visual, workflow-centric approach to agent development. We think it makes building robust, complex agentic workflows far more accessible and reliable. We'd love to hear the HN community's thoughts!
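To picture the "deploy as an API" flow, a deployed workflow would presumably be invoked with a plain HTTP request along these lines. The route, auth scheme, and payload shape here are assumptions for illustration, not taken from Sim Studio's docs:

    // Hypothetical call to a deployed Sim Studio workflow; the endpoint,
    // auth header, and body shape are assumed, not from the project's docs.
    async function runWorkflow(workflowId: string, input: unknown): Promise<unknown> {
      const res = await fetch(
        `https://simstudio.ai/api/workflows/${workflowId}/execute`, // assumed route
        {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            Authorization: "Bearer YOUR_API_KEY", // placeholder credential
          },
          body: JSON.stringify({ input }),
        }
      );
      if (!res.ok) throw new Error(`Workflow failed: ${res.status}`);
      return res.json();
    }

The same deployed graph could then be driven by a scheduler or a webhook instead of a manual trigger, which is what makes the executable-graph approach more than a prototyping aid.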