Please use this identifier to cite or link to this item: http://arks.princeton.edu/ark:/99999/fk4hb0v79x
Full metadata record
DC Field: Value
dc.contributor.advisor: Mittal, Prateek
dc.contributor.author: Xiang, Chong
dc.contributor.other: Electrical and Computer Engineering Department
dc.date.accessioned: 2025-02-11T15:40:10Z
dc.date.available: 2025-02-11T15:40:10Z
dc.date.created: 2024-01-01
dc.date.issued: 2025
dc.identifier.uri: http://arks.princeton.edu/ark:/99999/fk4hb0v79x
dc.description.abstract: Building secure and robust AI models has proven to be difficult. Nearly all defenses, including those published at top-tier venues and recognized with prestigious awards, can be circumvented by adaptive attackers, who adjust their attack strategies once they learn about the underlying defense algorithms. This dissertation studies one of the most challenging problems in AI security: How can we design defenses with formal robustness guarantees that remain effective against future adaptive attacks? We target the concept of certifiable robustness, which aims to establish a provable lower bound on model robustness against all possible attacks within a given threat model. Specifically, we study the threat of localized corruption attacks, in which the attacker arbitrarily corrupts part of the input to induce inaccurate model predictions at inference time. It is one of the most practical and common threats to AI models across a wide range of tasks, architectures, and data modalities. To mitigate localized attacks across different settings, we develop two generalizable defense principles, small receptive fields and input masking, and leverage them to design six certifiably robust algorithms. We structure this dissertation into three parts. The first part studies robust image classification and introduces three algorithms: PatchGuard, PatchCleanser, and PatchCURE. The second part addresses robust object detection, presenting two algorithms: DetectorGuard and ObjectSeeker. The final part examines text generation with large language models, detailing a robust retrieval-augmented generation algorithm named RobustRAG. Notably, the algorithms presented in this dissertation scale effectively to large AI models like ViT and Llama, and are evaluated on large realistic datasets like ImageNet and Natural Questions. Several defenses achieve high certifiable robustness while maintaining benign model utility close to that of undefended models (e.g., within a 1% difference). These results represent one of the few notable advancements in AI security over the past few years, and we hope they inspire researchers to reflect on how we approach the challenges of securing AI systems.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.publisher: Princeton, NJ : Princeton University
dc.subject: Adversarial Patch Attacks
dc.subject: AI Security
dc.subject: Certifiable Robustness
dc.subject: Localized Corruption Attacks
dc.subject: Retrieval Corruption Attacks
dc.subject.classification: Computer science
dc.title: Certifiable AI Security against Localized Corruption Attacks
dc.type: Academic dissertations (Ph.D.)
pu.date.classyear: 2025
pu.department: Electrical and Computer Engineering
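
The abstract above names input masking as one of the two defense principles developed in the dissertation. The following is a minimal, hypothetical Python sketch of a masked-voting classifier that illustrates that general idea under simplified assumptions; it is not the dissertation's PatchGuard, PatchCleanser, or PatchCURE algorithm, and the names (generate_masks, masked_vote, classify_fn) and parameters are assumptions made here for illustration only.

    import numpy as np

    def generate_masks(img_h, img_w, mask_size, stride):
        """Enumerate square masks spaced by `stride`. Any adversarial patch of
        size at most (mask_size - stride + 1) that lies fully inside the image
        is completely covered by at least one mask."""
        tops = list(range(0, img_h - mask_size + 1, stride))
        lefts = list(range(0, img_w - mask_size + 1, stride))
        # make sure the masks also reach the bottom / right image edges
        if tops[-1] != img_h - mask_size:
            tops.append(img_h - mask_size)
        if lefts[-1] != img_w - mask_size:
            lefts.append(img_w - mask_size)
        masks = []
        for top in tops:
            for left in lefts:
                m = np.ones((img_h, img_w), dtype=bool)
                m[top:top + mask_size, left:left + mask_size] = False  # False = masked-out pixels
                masks.append(m)
        return masks

    def masked_vote(image, classify_fn, masks):
        """Classify every one-masked copy of `image` and take a majority vote.
        `image` is an H x W x C array; `classify_fn` is an assumed undefended
        classifier mapping an image to a label. `unanimous=True` means every
        masked prediction agreed, the kind of agreement condition that
        mask-based certification procedures build on."""
        preds = [classify_fn(image * m[..., None]) for m in masks]
        labels, counts = np.unique(preds, return_counts=True)
        majority = labels[int(np.argmax(counts))]
        unanimous = len(labels) == 1
        return majority, unanimous

The intuition behind this kind of construction is that at least one mask removes the entire corrupted region, so the corresponding masked prediction is unaffected by the attack; when all masked predictions agree, that agreement can be used as evidence toward a robustness certificate.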
Appears in Collections: Electrical Engineering

Files in This Item:
File: Xiang_princeton_0181D_15343.pdf (17.83 MB, Adobe PDF)


Items in Dataspace are protected by copyright, with all rights reserved, unless otherwise indicated.