Research Themes

My research builds on a principled, algorithmic foundation to advance both the offensive and defensive fronts of security and privacy in ML and AI pipelines. My group works across three complementary themes, namely AI for Security, Security of AI, and Scalable Encrypted Systems. Check out my full list of publications.

AI for Security

AI-Enabled Security Analysis

Papers: IEEE S&P 2024, USENIX Security 2025 (LLMmap), arXiv (Hacking Back the AI-Hacker)
Awards: Finalist for "Best Cryptographic Attack" at Pwnie Awards 2024; Finalist for "Most Epic Achievement" at Pwnie Awards 2025
Vulneranility Disclosure: Cloudflare Inc.
Media Coverage: Dark Reading - Schneier on Security

Description: We study how AI can be deployed as a tool for security analysis when paired with a careful examination of the algorithmic structure of the target system. For Cloudflare's MIGP protocol (second-generation compromised-credential checking), our cryptanalysis of the protocol's tagging mechanism identifies a blindspot in the cryptographic protocol, and a neural network trained on this leakage converts the leakage into recovery of encrypted breached passwords, achieving roughly 62% recovery with a single online query and 90% with fifty. LLMmap treats LLM identification as a discrimination problem over response distributions, where a small set of probes selected to bring out distinguishing behaviors across alignment, refusal, and malformed input regimes suffices to identify the underlying model with high accuracy. Mantis recasts prompt injection from an attack vector into a defense by embedding adversarial prompts into decoy services, so that LLM-driven attackers are diverted into traps or self-compromise.

Security of AI

Manipulating the Use of Machine Learning in Systems

Papers: NeurIPS 2021, SIGMOD 2022, VLDB 2024, USENIX Security 2026 (AI Oops)
Media coverage: The Register - Schneier on Security

Description: Modern data and IT systems increasingly rely on learned components whose efficiency guarantees rest on assumptions about the input distribution. Our SIGMOD 2022 paper, the very first attack on a learned system, formalizes the dependence of learned indexes on the cumulative distribution function and uses convex optimization to construct poisoning sets that maximally distort it under a budget constraint, inflating lookup cost by up to 300×. Our VLDB 2024 work formulates adversarial workload generation against dynamic learned indexes as a knapsack-style combinatorial optimization, yielding both space and time complexity attacks with slowdowns up to 1,600×. Our NeurIPS 2021 work studies adversarial examples for k-NN classifiers through higher-order Voronoi diagrams from computational geometry, reducing the search for minimal perturbations to a geometric problem with provable optimality guarantees. Extending the line to LLM-driven infrastructure, AIOpsDoom presents the first formal security analysis of LLM-based AIOps systems and introduces telemetry injection, while AIOpsShield defends by exploiting the structured nature of telemetry to separate authentic from injected signals.

The Two Paths to Privacy-Preserving Learning, Cryptographic and Hardware-Assisted

Papers: CCS 2024 (AITIA), arXiv (TENNOR)

Description: Privacy-preserving training and inference admit two extremes, namely cryptographic protection that assumes no trust in the stack, and hardware-assisted execution inside a trusted component. AITIA addresses the cryptographic side with the first secure two-party protocol for bivariate causal discovery. Rather than retrofitting an existing regression learner into MPC, we co-design BSGD-SVR, a budgeted stochastic gradient descent variant for support vector regression that removes data-dependent branching, enforces oblivious updates, and bounds support vector growth, reducing training complexity and yielding 3.6×–340× speedups over baselines. TENNOR addresses the hardware side by enabling doubly-oblivious training of wide neural networks (output layers up to 325K neurons) inside Intel TDX. The key algorithmic step recasts sparse neuron activation as a retrieval problem over a locality-sensitive hashing structure and introduces the first multi-probe scheme for rank-based LSH, eliminating the storage blow-up of multi-table designs while preserving retrieval quality.

Foundations of Safety

Papers: arXiv (Safe Language Generation in the Limit)

Description: Safety and alignment in generative AI are typically studied empirically, tied to specific architectures. We take a different approach, grounding these questions in formal learning theory. Building on Kleinberg and Mullainathan's result that generation in the limit is solvable even though identification is not, we establish rigorous guarantees for safe language generation that are all-encompassing in the learning-theoretic sense: they apply to any learner within this model, independently of the underlying deep learning architecture.

Scalable Encrypted Systems

Leakage-Abuse Attacks and Quantifying Privacy in Searchable Encryption

Papers: IEEE S&P 2019, IEEE S&P 2020, IEEE S&P 2021, CCS 2022, arXiv (LAMa)

Description: Searchable encryption schemes balance query functionality with confidentiality, but the access patterns they reveal can be exploited to recover the underlying plaintext. The attacks in this line draw on tools from computational geometry, statistical learning theory, and combinatorial optimization to exploit the leakage of k-NN queries, queries from non-uniform distributions, and response-hiding range schemes. On the defensive side, our CCS 2022 work introduces leakage inversion. By treating leakage as a function and analyzing its preimage, we define the reconstruction space as the set of databases consistent with the observed leakage and derive closed-form expressions and bounds on its entropy and diameter, yielding privacy quantities that depend only on the scheme. Most recently, LAMa formalizes frequency matching as cryptanalysis by encoding the matching constraints as an SMT instance, extending the analysis to encrypted range queries in arbitrary dimensions and giving the first guarantees that match the information-theoretic limits of reconstruction.

Privacy-Preserving Data Structures

Papers: ESORICS 2016, IEEE EuroS&P 2017, IEEE EuroS&P 2019

Description: Classical data structures are designed for correctness and efficiency, but security-sensitive deployments must also withstand adversarial inputs and avoid leaking information through their internal state. Our ESORICS 2016 work gives a hash table based on linear probing that is simultaneously weakly history-independent, in the sense that its memory layout reveals only the current set rather than the insertion sequence, and secure against collision-timing attacks, combining two properties that earlier constructions achieved only in isolation. Our EuroS&P 2017 work introduces auditable data structures, a model that bridges weak and strong history-independence by supporting observations at arbitrary times while relaxing the restriction that the structure cannot react to them, leveraging the practical fact that data owners typically know when audits occur and yielding constructions far more efficient than strongly history-independent ones. Our EuroS&P 2019 work initiates the study of adversarial inputs against secure similarity-approximation protocols based on sketching, exploiting the common randomness shared by participants to perturb inputs offline before any secure protocol is executed and steer the approximation toward an attacker-chosen output, demonstrating attacks on widely deployed sketches and proposing mitigations.

Cross-Disciplinary Synergies

Auditing Anonymization Tools for Clinical Data

Papers: CCS 2025
Vulneranility Disclosure: ARX Anonymization Tool
Media Coverage: Mason Spirit Magazine

Description: In collaboration with the School of Nursing and the Mason and Partners Clinics, which serve uninsured and refugee populations in Northern Virginia, we audited ARX, a widely used open-source anonymization tool endorsed in policy guidelines by the UK Anonymisation Network and the European Medicines Agency. Our Combinatorial Refinement Attacks are the first against ARX's local recoding implementation that succeed without auxiliary information. We cast the released equivalence classes as a linear program whose constraints encode the dependencies induced by ARX's greedy utility maximization, and solve it to prune the space of plausible re-identifications by up to 39,000× on real clinical datasets. The analysis reveals the counterintuitive fact that larger k can amplify leakage rather than mitigate it. We disclosed the vulnerabilities to the ARX developers, who updated their documentation and code in response.

Algorithms and Graphs

Papers: WSDM 2014, COCOA 2015, JGAA 2016, DMKD 2016, CGF 2017

Description: This line of work includes sampling-based estimators for betweenness centrality with concentration bounds derived via VC-dimension arguments, visualization techniques for directed acyclic graphs that exploit dominance and orthogonality, and adaptive probe scheduling for rapid event detection.