Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/99999/fk4hb0v79x
Full metadata record
DC Field: Value
dc.contributor.advisor: Mittal, Prateek
dc.contributor.author: Xiang, Chong
dc.contributor.other: Electrical and Computer Engineering Department
dc.date.accessioned: 2025-02-11T15:40:10Z
dc.date.available: 2025-02-11T15:40:10Z
dc.date.created: 2024-01-01
dc.date.issued: 2025
dc.identifier.uri: http://arks.princeton.edu/ark:/99999/fk4hb0v79x
dc.description.abstract: Building secure and robust AI models has proven to be difficult. Nearly all defenses, including those published at top-tier venues and recognized with prestigious awards, can be circumvented by adaptive attackers, who adjust their attack strategies once they learn about the underlying defense algorithms. This dissertation studies one of the most challenging problems in AI security: How can we design defenses with formal robustness guarantees that remain effective against future adaptive attacks? We target the concept of certifiable robustness, which aims to establish a provable lower bound on model robustness against all possible attacks within a given threat model. Specifically, we study the threat of localized corruption attacks, in which the attacker arbitrarily corrupts part of the input to induce inaccurate model predictions at inference time. It is one of the most practical and common threats to AI models across a wide range of tasks, architectures, and data modalities. To mitigate localized attacks across different settings, we develop two generalizable defense principles, small receptive fields and input masking, and leverage them to design six certifiably robust algorithms. We structure this dissertation into three parts. The first part studies robust image classification and introduces three algorithms: PatchGuard, PatchCleanser, and PatchCURE. The second part addresses robust object detection, presenting two algorithms: DetectorGuard and ObjectSeeker. The final part examines text generation with large language models, detailing a robust retrieval-augmented generation algorithm named RobustRAG. Notably, the algorithms presented in this dissertation scale effectively to large AI models like ViT and Llama, and are evaluated on large realistic datasets like ImageNet and Natural Questions. Several defenses achieve high certifiable robustness while maintaining benign model utility close to that of undefended models (e.g., within a 1% difference). These results represent one of the few notable advancements in AI security over the past few years, and we hope they inspire researchers to reflect on how we approach the challenges of securing AI systems.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: Princeton, NJ : Princeton University
dc.subject: Adversarial Patch Attacks
dc.subject: AI Security
dc.subject: Certifiable Robustness
dc.subject: Localized Corruption Attacks
dc.subject: Retrieval Corruption Attacks
dc.subject.classification: Computer science
dc.title: Certifiable AI Security against Localized Corruption Attacks
dc.type: Academic dissertations (Ph.D.)
pu.date.classyear: 2025
pu.department: Electrical and Computer Engineering
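
The abstract above names input masking as one of the two defense principles developed in the dissertation. The following is a minimal, hypothetical Python sketch of a masked-voting classifier that illustrates that general idea under simplified assumptions; it is not the dissertation's PatchGuard, PatchCleanser, or PatchCURE algorithm, and the names (generate_masks, masked_vote, classify_fn) and parameters are assumptions made here for illustration only.

    import numpy as np

    def generate_masks(img_h, img_w, mask_size, stride):
        """Enumerate square masks spaced by `stride`. Any adversarial patch of
        size at most (mask_size - stride + 1) that lies fully inside the image
        is completely covered by at least one mask."""
        tops = list(range(0, img_h - mask_size + 1, stride))
        lefts = list(range(0, img_w - mask_size + 1, stride))
        # make sure the masks also reach the bottom / right image edges
        if tops[-1] != img_h - mask_size:
            tops.append(img_h - mask_size)
        if lefts[-1] != img_w - mask_size:
            lefts.append(img_w - mask_size)
        masks = []
        for top in tops:
            for left in lefts:
                m = np.ones((img_h, img_w), dtype=bool)
                m[top:top + mask_size, left:left + mask_size] = False  # False = masked-out pixels
                masks.append(m)
        return masks

    def masked_vote(image, classify_fn, masks):
        """Classify every one-masked copy of `image` and take a majority vote.
        `image` is an H x W x C array; `classify_fn` is an assumed undefended
        classifier mapping an image to a label. `unanimous=True` means every
        masked prediction agreed, the kind of agreement condition that
        mask-based certification procedures build on."""
        preds = [classify_fn(image * m[..., None]) for m in masks]
        labels, counts = np.unique(preds, return_counts=True)
        majority = labels[int(np.argmax(counts))]
        unanimous = len(labels) == 1
        return majority, unanimous

The intuition behind this kind of construction is that at least one mask removes the entire corrupted region, so the corresponding masked prediction is unaffected by the attack; when all masked predictions agree, that agreement can be used as evidence toward a robustness certificate.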
Appears in Collections: Electrical Engineering

Files in This Item:
File: Xiang_princeton_0181D_15343.pdf (17.83 MB, Adobe PDF)


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.