ML Research Hub

✨ GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs

📝 Summary:
GateBreaker is presented as the first framework to compromise the safety of MoE LLMs by identifying and disabling roughly 3% of the safety neurons in expert layers. Doing so raises the attack success rate from 7.4% to 64.9% across eight LLMs, and the attack generalizes to VLMs, which indicates that safety vulnerabilities are both concentrated and transferable.

🔹 Publication Date: Dec 24

🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2512.21008
• PDF: https://arxiv.org/pdf/2512.21008

==================================

For more data science resources:
✓ https://t.me/DataScienceT

#LLM #AIsecurity #MoELLMs #AIvulnerability #GateBreaker
