Agent Safety

Agent safety is a critical concern when deploying LLM-based agents in the real world: agents must remain aligned, controllable, and recoverable even under uncertainty. Although this direction is relatively underexplored, four strategies emerge as promising. First, establish rigorous permission management and controlled environmental affordances so that agents operate safely when they invoke tools, execute physical operations, access critical business assets, or interact with sensitive data. Second, architect agents from security-hardened components, e.g., scaffolds fortified against hijacking and code injection, memory modules that enforce confidentiality, and trustworthy MCP (Model Context Protocol) channels secured through cryptographic verification. Third, build system resilience by serializing security states into snapshots, which enables restoration and rollback to the last verified configuration, or fallback to hardened policies, upon detecting anomalies, goal drift, or unauthorized actions. Fourth, introduce supervisory agents when building multi-agent systems to prevent collusion or rogue behaviors arising from information asymmetry and cascade amplification.
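
As a rough illustration of the first and third strategies, the sketch below gates tool calls behind an allowlist and restores the last snapshot when an unauthorized call is detected. The class `GuardedAgent`, its methods, and the tool names are hypothetical and chosen only for this example; they do not come from any particular agent framework. A deep copy of an in-memory dictionary stands in for serializing real security state, which in practice would likely be persisted and verified.

```python
from __future__ import annotations

from copy import deepcopy
from dataclasses import dataclass, field


class PermissionDenied(Exception):
    """Raised when the agent requests a tool outside its allowlist."""


@dataclass
class GuardedAgent:
    """Hypothetical agent wrapper: allowlisted tools plus snapshot/rollback."""

    allowed_tools: set[str]
    state: dict = field(default_factory=dict)
    _snapshots: list[dict] = field(default_factory=list, repr=False)

    def snapshot(self) -> None:
        # Record the current state as the last verified configuration.
        self._snapshots.append(deepcopy(self.state))

    def rollback(self) -> None:
        # Restore the most recent snapshot; do nothing if none was taken.
        if self._snapshots:
            self.state = deepcopy(self._snapshots[-1])

    def invoke(self, tool: str, update: dict) -> None:
        # Permission check happens before any side effect on the state.
        if tool not in self.allowed_tools:
            self.rollback()
            raise PermissionDenied(f"tool not permitted: {tool}")
        # Assumption: a tool call's effect is modeled as a state update.
        self.state.update(update)


agent = GuardedAgent(allowed_tools={"search"})
agent.invoke("search", {"last_query": "weather"})
agent.snapshot()  # mark this configuration as verified
agent.invoke("search", {"last_query": "news"})

try:
    agent.invoke("shell", {"cmd": "ls"})  # not on the allowlist
except PermissionDenied as err:
    print("blocked:", err)

print(agent.state)  # state has been rolled back to the verified snapshot
```

Rolling back on every denied call is one possible policy; a deployment might instead fall back to a hardened policy or escalate to a supervisory agent, as described above.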

If you find this work helpful, please consider citing our paper:

@article{hu2025hands,
  title={Hands-on LLM-based Agents: A Tutorial for General Audiences},
  author={Hu, Shuyue and Ren, Siyue and Chen, Yang and Mu, Chunjiang and Liu, Jinyi and Cui, Zhiyao and Zhang, Yiqun and Li, Hao and Zhou, Dongzhan and Xu, Jia and others},
  journal={Hands-on},
  volume={21},
  pages={6},
  year={2025}
}