AISecHub
https://linktr.ee/aisechub managed by AISecHub. Sponsored by: innovguard.com
An Addendum to the Guidelines and Companion Guide on Securing AI Systems
https://www.csa.gov.sg/resources/publications/addendum-on-securing-ai-systems/
An Assessment Framework for Evaluating Agentic AI Systems - https://arxiv.org/pdf/2512.12791

In this work, we present the first steps towards an assessment framework for evaluating agentic AI systems across four pillars.

1️⃣ LLM - Did the agent consult the relevant policies/guidelines before taking action, and did it follow them?

2️⃣ Memory - Did the agent retrieve the right past context (and avoid pulling irrelevant/incorrect memories) when deciding what to do?

3️⃣ Tools - Did the agent call the correct tools, in the correct order, with the correct parameters and required verification steps?

4️⃣ Environment - Did the agent operate within enforced guardrails/permissions, and did it avoid triggering environment protection violations?
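
A minimal sketch of how these four pillars could be turned into per-trajectory pass/fail checks. The trajectory fields and scoring rules below are illustrative assumptions for this example, not the paper's actual evaluation harness.

```python
# Illustrative sketch only: the trajectory schema and the pass/fail rules are
# assumptions for this example, not the paper's actual evaluation harness.
from dataclasses import dataclass


@dataclass
class Trajectory:
    policies_required: list[str]            # policies the task actually required
    policies_consulted: list[str]           # policies the agent read before acting
    memories_relevant: list[str]            # ground-truth relevant memory items
    memories_retrieved: list[str]           # memory items the agent pulled into context
    expected_tool_calls: list[tuple[str, dict]]
    tool_calls: list[tuple[str, dict]]      # (tool_name, parameters) in call order
    guardrail_violations: int               # environment-protection triggers observed


def assess(t: Trajectory) -> dict[str, bool]:
    """Score one recorded trajectory against the four pillars."""
    return {
        # 1. LLM: every required policy consulted before acting
        "llm": set(t.policies_required) <= set(t.policies_consulted),
        # 2. Memory: the right context retrieved, nothing irrelevant pulled in
        "memory": set(t.memories_retrieved) == set(t.memories_relevant),
        # 3. Tools: correct tools, order, and parameters (incl. verification steps)
        "tools": t.tool_calls == t.expected_tool_calls,
        # 4. Environment: no guardrail/permission violations triggered
        "environment": t.guardrail_violations == 0,
    }
```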
How Dark Patterns Manipulate Web Agents - https://arxiv.org/pdf/2512.22894 | https://agentdarkpatterns.org/

Consider a common scenario: You need to purchase flowers quickly. You perform a browser search, visit the top non-sponsored search result, select what appears to be the most popular and reasonably priced option, and complete your purchase with just a few clicks. The process seems routine until you realize the most expensive bouquet and premium shipping were pre-selected and purchased simply because you did not opt out. This is an example of sneaking, a form of dark pattern common on today’s internet; dark patterns can also manifest in many other forms.

This raises a critical question:

Can web agents, particularly those operating autonomously online, also be manipulated by dark patterns to act against their users’ intents and goals?

Across evaluated agents, dark patterns steer agent trajectories in more than 70% of cases, compared to about 31% for humans.
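
As a rough illustration of a countermeasure against this particular “sneaking” pattern, the sketch below scans a checkout form for inputs that arrive already checked and flags the ones whose names look like paid add-ons. The sample HTML and the keyword list are invented for the example; none of this comes from the paper.

```python
# Hedged illustration of a guard against "sneaking": flag pre-checked paid
# add-ons before submitting a checkout form. The sample HTML and the upsell
# keyword list below are invented for this demo.
from html.parser import HTMLParser

UPSELL_HINTS = ("premium", "express", "insurance", "upgrade", "priority")


class PreselectedOptionFinder(HTMLParser):
    """Collects checkbox/radio inputs that arrive already checked."""

    def __init__(self):
        super().__init__()
        self.preselected = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") in ("checkbox", "radio") and "checked" in a:
            self.preselected.append(a.get("name") or a.get("id") or "")


def suspicious_preselections(form_html):
    """Return names of pre-checked inputs that look like paid upsells."""
    finder = PreselectedOptionFinder()
    finder.feed(form_html)
    return [name for name in finder.preselected
            if any(hint in name.lower() for hint in UPSELL_HINTS)]


sample = """
<form>
  <input type="checkbox" name="premium_shipping" checked>
  <input type="checkbox" name="gift_wrap">
</form>
"""
print(suspicious_preselections(sample))  # ['premium_shipping']
```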
AISecHub - https://x.com/AISecHub/ - More than 500,000 Impressions in one month!!!
Implementing Secure AI Framework Controls in Google Cloud - New Version

https://services.google.com/fh/files/misc/ociso_2025_saif_cloud_paper.pdf
Skynet Starter Kit - https://media.ccc.de/v/39c3-skynet-starter-kit-from-embodied-ai-jailbreak-to-remote-takeover-of-humanoid-robots#t=26 - CCC

From Embodied AI Jailbreak to Remote Takeover of Humanoid Robots

We present a comprehensive security assessment of Unitree's robotic ecosystem. We identified and exploited multiple security flaws across several communication channels, including Bluetooth, LoRa radio, WebRTC, and cloud management services. In addition to traditional binary and web vulnerabilities, we also attacked the embodied AI agent in the robots, performing prompt injection and achieving root-level remote code execution. Furthermore, we leveraged a flaw in the cloud management services to take over any Unitree G1 robot connected to the Internet. By deobfuscating and patching the customized, VM-based obfuscated binaries, we unlocked robotic movements that the vendor firmware restricts on consumer models such as the G1 AIR. We hope our findings offer a roadmap for manufacturers to strengthen robotic designs, while arming researchers and consumers with critical knowledge to assess security in next-generation robotic systems.
AI-generated content in Wikipedia - https://www.youtube.com/watch?v=fKU0V9hQMnY by @presroi

I successfully failed at a literature-related project and accidentally built a ChatGPT detector. Then I spoke to the people who uploaded ChatGPT-generated content to Wikipedia.

It began as a standard maintenance project: I wanted to write a tool to find and fix broken ISBN references in Wikipedia. Using the built-in checksum, this seemed like a straightforward technical task. I expected to find mostly typos. But I also found texts generated by LLMs. These models are effective at creating plausible-sounding content, but (for now) they often fail to generate correct checksums for identifiers like ISBNs.
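
The detection signal is just the standard ISBN check digit, which a model that invents a citation tends to get wrong. A minimal validator for that check (a sketch of the idea, not the speaker's actual tool):

```python
# Minimal ISBN checksum validator -- a sketch of the detection signal described
# in the talk, not the speaker's actual tool.

def is_valid_isbn(raw: str) -> bool:
    """Validate an ISBN-10 or ISBN-13 using its built-in check digit."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 10:
        # ISBN-10: weighted sum with weights 10..1 must be divisible by 11;
        # the final character may be 'X', meaning 10.
        if not (s[:9].isdigit() and (s[9].isdigit() or s[9] == "X")):
            return False
        digits = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0
    if len(s) == 13:
        # ISBN-13: alternating weights 1,3,1,3,... must sum to a multiple of 10.
        if not s.isdigit():
            return False
        return sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(s)) % 10 == 0
    return False


print(is_valid_isbn("978-0-306-40615-7"))  # True  (well-known example ISBN)
print(is_valid_isbn("978-0-306-40615-2"))  # False (check digit altered)
```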

This weakness turned my tool into an unintentional detector for this type of content. This talk is the story of that investigation. I'll show how the tool works and how it identifies this anti-knowledge. But the tech is only half the story. The other half is human. I contacted the editors who had added this undeclared AI content. I will talk about why they did it, how the Wikipedians reacted, and whether "The End is Nigh" calls might be warranted.
AI Agent, AI Spy - https://media.ccc.de/v/39c3-ai-agent-ai-spy#t=246

Agentic AI is the catch-all term for AI-enabled systems that propose to complete more or less complex tasks on their own, without stopping to ask permission or consent. What could go wrong?

These systems are being integrated directly into operating systems and applications, like web browsers. This move represents a fundamental paradigm shift, transforming them from relatively neutral resource managers into an active, goal-oriented infrastructure ultimately controlled by the companies that develop these systems, not by users or application developers.

Systems like Microsoft's "Recall," which create a comprehensive "photographic memory" of all user activity, are marketed as productivity enhancers, but they function as OS-level surveillance and create significant privacy vulnerabilities. In the case of Recall, we’re talking about a centralized, high-value target for attackers that poses an existential threat to the privacy guarantees of meticulously engineered applications like Signal. This shift also fundamentally undermines personal agency, replacing individual choice and discovery with automated, opaque recommendations that can obscure commercial interests and erode individual autonomy.

This talk will review the immediate and serious danger that the rush to shove agents into our devices and digital lives poses to our fundamental right to privacy and our capacity for genuine personal agency. Drawing from Signal's analysis, it moves beyond outlining the problem to also present a "tourniquet" solution: looking at what we need to do *now* to ensure that privacy at the application layer isn’t eliminated, and what the hacker community can do to help. We will outline a path for ensuring developer agency, granular user control, radical transparency, and the role of adversarial research.
Data that is too dirty for "AI" - https://media.ccc.de/v/39c3-a-media-almost-archaeology-on-data-that-is-too-dirty-for-ai#t=202 by #jiawenuffline

In the 1980s, non-white women’s body-size data was categorized as dirty data when the first women's sizing system in the US was being established. Now, in the age of GPT, what is considered dirty data, and how is it removed from massive training materials?

Datasets for training large models have since expanded to the volume of (part of) the internet. Following the idea that “scale averages out noise”, these datasets were scaled up by scraping whatever data was freely available on the internet, then “cleaned” with a human-not-in-the-loop, cheaper-than-cheap-labor method: heuristic filtering. Heuristics in this context are basically a set of rules that engineers come up with, based on their own imagination and estimation, that are “good enough” to remove what they consider “dirty data”, with no guarantee of being optimal, perfect, or rational.
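
For concreteness, here is a hedged sketch of what such heuristic filters look like. The thresholds and the blocklist are invented for this illustration, echoing the style of rules published for corpora like C4 and Gopher rather than any specific pipeline.

```python
# Illustrative heuristic filter: the thresholds and blocklist are invented for
# this sketch, echoing the style of published rules (e.g. C4, Gopher), not any
# specific training pipeline.

BLOCKLIST = {"lorem", "subscribe", "casino"}       # placeholder "dirty word" list


def keep_document(text: str) -> bool:
    """Return True if a document survives a few typical heuristic rules."""
    words = text.split()
    if len(words) < 50:                            # too short to be "useful" prose
        return False
    if sum(len(w) for w in words) / len(words) > 10:
        return False                               # mean word length suspiciously high
    if any(w.lower().strip(".,!?") in BLOCKLIST for w in words):
        return False                               # contains a blocklisted term
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ended = sum(ln.rstrip().endswith((".", "!", "?")) for ln in lines)
    if lines and ended / len(lines) < 0.5:
        return False                               # most lines lack terminal punctuation
    return True
```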

The talk will show some intriguing patterns of “dirty data” from 23 extraction-based datasets, such as how NSFW gradually comes to equal NSFTM (not safe for training models), reflect on these silent, anonymous, yet upheld estimations and unguaranteed rationalities in current sociotechnical artifacts, and ask for whom these estimations are good enough, as they will soon be part of our technological infrastructures.
Breaking BOTS: Cheating at Blue Team CTFs with AI Speed-Runs - CCC
https://media.ccc.de/v/39c3-breaking-bots-cheating-at-blue-team-ctfs-with-ai-speed-runs#t=93

After we announced our results, CTFs like Splunk's Boss of the SOC (BOTS) started prohibiting AI agents. For science & profit, we keep doing it anyways. In BOTS, the AIs solve most of it in under 10 minutes instead of taking the full day. Our recipe was surprisingly simple: Teach AI agents to self-plan their investigation steps, adapt their plans to new information, work with the SIEM DB, and reason about log dumps. No exotic models, no massive lab budgets - just publicly available LLMs mixed with a bit of science and perseverance. We'll walk through how that works, including videos of the many ways AI trips itself up that marketers would rather hide, and how to do it at home with free and open-source tools.
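
A hedged sketch of that plan / query / revise loop. The ask_llm and run_siem_query callables are placeholders for whatever LLM API and SIEM search endpoint you actually use; none of this is the presenters' tooling.

```python
# Hedged sketch of a self-planning investigation loop in the spirit of the talk.
# `ask_llm` and `run_siem_query` are placeholders for whatever LLM API and SIEM
# search endpoint (e.g. Splunk) you actually use; they are not from the talk.
from typing import Callable


def investigate(question: str,
                ask_llm: Callable[[str], str],
                run_siem_query: Callable[[str], str],
                max_steps: int = 10) -> str:
    """Plan, query the SIEM, fold results back in, and revise until answered."""
    evidence = ""
    plan = ask_llm(f"Plan the investigation steps, one per line, for: {question}")
    for step in range(max_steps):
        # Ask the model for the single next search, or a final answer.
        nxt = ask_llm(
            f"Question: {question}\nPlan:\n{plan}\nEvidence so far:\n{evidence}\n"
            "Reply with the next SIEM search query, or 'ANSWER: <answer>' if done."
        )
        if nxt.startswith("ANSWER:"):
            return nxt.removeprefix("ANSWER:").strip()
        results = run_siem_query(nxt)                  # raw search results / log dump
        evidence += f"\n[step {step}] {nxt}\n{results}\n"
        # Adapt the plan to the new information before the next iteration.
        plan = ask_llm(f"Revise this plan given the new evidence:\n{evidence}\nOld plan:\n{plan}")
    return ask_llm(f"Give a best-effort answer to '{question}' from the evidence:\n{evidence}")
```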

CTF organizers can't detect this - the arms race is probably over before it really began. But the real question isn't "can we cheat at CTFs?" It's what happens when investigations evolve from analysts-who-investigate to analysts-who-manage-AI-investigators. We'll show you what that transition already looks like today and peek into some uncomfortable questions about what comes next.
VulnLLM-R: Specialized Reasoning LLM for Vulnerability Detection - https://github.com/ucsb-mlsec/VulnLLM-R

Through extensive experiments on SOTA datasets across Python, C/C++, and Java, we show that VulnLLM-R is more effective and efficient than SOTA static analysis tools and both open-source and commercial large reasoning models.

We further conduct a detailed ablation study to validate the key designs in our training recipe. Finally, we construct an agent scaffold around our model and show that it outperforms CodeQL and AFL++ in real-world projects. Our agent further discovers a set of zero-day vulnerabilities in actively maintained repositories. This work represents a pioneering effort to enable real-world, project-level vulnerability detection using AI agents powered by specialized reasoning models.

More info: https://arxiv.org/pdf/2512.07533
Cupcake - Make AI agents follow the rules - https://github.com/eqtylab/cupcake

Cupcake intercepts agent events and evaluates them against user-defined rules written in Open Policy Agent (OPA) Rego.

Agent actions can be blocked, modified, or auto-corrected by giving the agent helpful feedback. Additional benefits include reactive automation for tasks you don't need to rely on the agent to perform (like linting after a file edit).
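
For intuition, here is a minimal Python sketch of the intercept-and-evaluate pattern (propose, evaluate, block/modify/allow). Cupcake itself evaluates OPA Rego policies compiled to WebAssembly, and its real event schema differs from the made-up fields and rules used here.

```python
# Illustration of the intercept-and-evaluate pattern only. Cupcake itself
# evaluates OPA Rego policies compiled to Wasm; the event fields and rules
# below are made up for this sketch and do not match Cupcake's real schema.

FORBIDDEN_PREFIXES = ("/etc/", "~/.ssh/")      # example protected paths


def evaluate(event: dict) -> dict:
    """Decide what happens to a proposed agent action: allow, deny, or follow up."""
    tool = event.get("tool")
    if tool == "shell" and "rm -rf" in event.get("command", ""):
        return {"decision": "deny",
                "feedback": "Destructive shell commands are blocked by policy."}
    if tool == "edit_file" and event.get("path", "").startswith(FORBIDDEN_PREFIXES):
        return {"decision": "deny",
                "feedback": f"Edits to {event['path']} are not permitted."}
    if tool == "edit_file":
        # Reactive automation: allow the edit, then queue a follow-up action.
        return {"decision": "allow", "then": ["run_linter"]}
    return {"decision": "allow"}


print(evaluate({"tool": "shell", "command": "rm -rf /"}))      # denied
print(evaluate({"tool": "edit_file", "path": "src/app.py"}))   # allowed + lint follow-up
```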

Why Cupcake?

Modern agents are powerful but inconsistent at following operational and security rules, especially as context grows. Cupcake turns the rules you already maintain (e.g., CLAUDE.md, AGENT.md, .cursor/rules) into enforceable guardrails that run before actions execute.

- Multi-harness support with first‑class integrations for Claude Code, Cursor, Factory AI, and OpenCode.

- Governance‑as‑code using OPA/Rego compiled to WebAssembly for fast, sandboxed evaluation.

- Enterprise‑ready controls: allow/deny/review, enriched audit trails for AI SOCs, and proactive warnings.
ARES-Dashboard - AI Red Team Operations Console https://github.com/Arnoldlarry15/ARES-Dashboard

Demo: https://ares-dashboard-mauve.vercel.app/

ARES Dashboard is an enterprise-oriented AI red team operations console for planning, executing, and auditing structured adversarial testing of AI systems across established risk frameworks. It is designed to help security teams, AI safety researchers, and governance programs conduct structured, repeatable, and auditable assessments.

ARES provides a centralized workspace for building attack manifests, managing red team campaigns, aligning assessments with recognized frameworks such as OWASP LLM Top 10 and MITRE, and exporting evidence for review and compliance workflows.

The system supports role-based access control, audit logging, persistent campaign storage, and optional AI-assisted scenario generation. A built-in demo mode allows full exploration of core functionality without requiring external API keys.

ARES is designed to serve as the operational execution layer within a broader AI safety and governance ecosystem, enabling disciplined red teaming without automating exploitation or removing human oversight.