Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/99999/fk4hb0v79x
Title: Certifiable AI Security against Localized Corruption Attacks
Authors: Xiang, Chong
Advisors: Mittal, Prateek
Contributors: Electrical and Computer Engineering Department
Keywords: Adversarial Patch Attacks
AI Security
Certifiable Robustness
Localized Corruption Attacks
Retrieval Corruption Attacks
Subjects: Computer science
Issue Date: 2025
Publisher: Princeton, NJ : Princeton University
Abstract: Building secure and robust AI models has proven to be difficult. Nearly all defenses, including those published at top-tier venues and recognized with prestigious awards, can be circumvented by adaptive attackers, who adjust their attack strategies once they learn the underlying defense algorithms. This dissertation studies one of the most challenging problems in AI security: how can we design defenses with formal robustness guarantees that remain effective against future adaptive attacks? We target the concept of certifiable robustness, which aims to establish a provable lower bound on model robustness against all possible attacks within a given threat model. Specifically, we study the threat of localized corruption attacks, in which the attacker arbitrarily corrupts part of the input to induce inaccurate model predictions at inference time. This is one of the most practical and common threats to AI models across a wide range of tasks, architectures, and data modalities. To mitigate localized attacks across different settings, we develop two generalizable defense principles, small receptive fields and input masking, and leverage them to design six certifiably robust algorithms. We structure this dissertation into three parts. The first part studies robust image classification and introduces three algorithms: PatchGuard, PatchCleanser, and PatchCURE. The second part studies robust object detection, presenting two algorithms: DetectorGuard and ObjectSeeker. The final part examines text generation with large language models, detailing a robust retrieval-augmented generation algorithm named RobustRAG. Notably, the algorithms presented in this dissertation scale effectively to large AI models such as ViT and Llama and are evaluated on large, realistic datasets such as ImageNet and Natural Questions. Several defenses achieve high certifiable robustness while maintaining benign model utility close to that of undefended models (e.g., within a 1% difference). These results represent one of the few notable advances in AI security over the past few years, and we hope they inspire researchers to reflect on how we approach the challenges of securing AI systems.
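As a rough formalization of the certification goal described in the abstract (a sketch in generic notation, not taken from the dissertation itself): a certified defense must guarantee a correct prediction for every input the attacker can produce within the localized threat model,

\[
\forall\, \mathbf{x}' \in \mathcal{A}(\mathbf{x}):\quad f(\mathbf{x}') = y,
\]

where $f$ is the defended model, $y$ is the ground-truth label or output for the clean input $\mathbf{x}$, and $\mathcal{A}(\mathbf{x})$ denotes the set of all inputs obtained by arbitrarily corrupting a bounded local region of $\mathbf{x}$ (e.g., an adversarial patch in an image or a corrupted passage in a retrieval set). The provable lower bound on robustness is then the fraction of evaluation points for which this guarantee can be certified.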
URI: http://arks.princeton.edu/ark:/99999/fk4hb0v79x
Type of Material: Academic dissertations (Ph.D.)
Language: en
Appears in Collections: Electrical Engineering

Files in This Item:
File: Xiang_princeton_0181D_15343.pdf
Size: 17.83 MB
Format: Adobe PDF

