My research builds on a principled, algorithmic foundation to advance both the offensive and defensive fronts of security and privacy in ML and AI pipelines. My group works across three complementary themes, namely AI for Security, Security of AI, and Scalable Encrypted Systems. Check out my full list of publications.
AI for Security
AI-Enabled Security Analysis
Papers:
IEEE S&P 2024,
USENIX Security 2025 (LLMmap),
arXiv (Hacking Back the AI-Hacker)
Awards: Finalist for "Best Cryptographic Attack" at Pwnie Awards 2024;
Finalist for "Most Epic Achievement" at Pwnie Awards 2025
Vulneranility Disclosure: Cloudflare Inc.
Media Coverage: Dark Reading - Schneier on
Security
Description:
We study how AI can be deployed as a tool for security analysis when
paired with a careful examination of the algorithmic structure of
the target system. For Cloudflare's MIGP protocol
(second-generation compromised-credential checking), our
cryptanalysis of the protocol's tagging mechanism identifies
a blindspot in the cryptographic protocol, and a neural network trained on this
leakage converts the leakage into recovery of encrypted breached
passwords, achieving roughly 62% recovery with a single online
query and 90% with fifty. LLMmap treats LLM identification as a
discrimination problem over response distributions, where a small
set of probes selected to bring out distinguishing behaviors
across alignment, refusal, and malformed input regimes suffices to
identify the underlying model with high accuracy. Mantis recasts
prompt injection from an attack vector into a defense by embedding
adversarial prompts into decoy services, so that LLM-driven
attackers are diverted into traps or self-compromise.
Security of AI
Manipulating the Use of Machine Learning in Systems
Papers:
NeurIPS 2021,
SIGMOD 2022,
VLDB 2024,
USENIX Security 2026 (AI Oops)
Media coverage:
The Register -
Schneier
on Security
Description:
Modern data and IT systems increasingly rely on learned components
whose efficiency guarantees rest on assumptions about the input
distribution. Our SIGMOD 2022 paper, the very first attack on
a learned system, formalizes the dependence of learned indexes on
the cumulative distribution function and uses convex optimization to
construct poisoning sets that maximally distort it under a budget
constraint, inflating lookup cost by up to 300×. Our VLDB 2024 work
formulates adversarial workload generation against dynamic learned
indexes as a knapsack-style combinatorial optimization, yielding
both space and time complexity attacks with slowdowns up to 1,600×.
Our NeurIPS 2021 work studies adversarial examples for k-NN
classifiers through higher-order Voronoi diagrams from computational
geometry, reducing the search for minimal perturbations to a
geometric problem with provable optimality guarantees. Extending
the line to LLM-driven infrastructure, AIOpsDoom presents the first
formal security analysis of LLM-based AIOps systems and introduces
telemetry injection, while AIOpsShield defends by exploiting the
structured nature of telemetry to separate authentic from injected
signals.
The Two Paths to Privacy-Preserving Learning, Cryptographic and Hardware-Assisted
Papers:
CCS 2024 (AITIA),
arXiv (TENNOR)
Description:
Privacy-preserving training and inference admit two extremes,
namely cryptographic protection that assumes no trust in the stack,
and hardware-assisted execution inside a trusted component. AITIA
addresses the cryptographic side with the first secure two-party
protocol for bivariate causal discovery. Rather than retrofitting
an existing regression learner into MPC, we co-design BSGD-SVR, a
budgeted stochastic gradient descent variant for support vector
regression that removes data-dependent branching, enforces
oblivious updates, and bounds support vector growth, reducing
training complexity and yielding 3.6×–340× speedups over baselines.
TENNOR addresses the hardware side by enabling doubly-oblivious
training of wide neural networks (output layers up to 325K neurons)
inside Intel TDX. The key algorithmic step recasts sparse neuron
activation as a retrieval problem over a locality-sensitive hashing
structure and introduces the first multi-probe scheme for
rank-based LSH, eliminating the storage blow-up of multi-table
designs while preserving retrieval quality.
Foundations of Safety
Papers:
arXiv (Safe Language Generation in the Limit)
Description:
Safety and alignment in generative AI are typically studied empirically,
tied to specific architectures. We take a different approach, grounding
these questions in formal learning theory. Building on Kleinberg and
Mullainathan's result that generation in the limit is solvable even
though identification is not, we establish rigorous guarantees for safe
language generation that are all-encompassing in the learning-theoretic
sense: they apply to any learner within this model, independently of the
underlying deep learning architecture.
Scalable Encrypted Systems
Leakage-Abuse Attacks and Quantifying Privacy in Searchable Encryption
Papers:
IEEE S&P 2019,
IEEE S&P 2020,
IEEE S&P 2021,
CCS 2022,
arXiv (LAMa)
Description:
Searchable encryption schemes balance query functionality with
confidentiality, but the access patterns they reveal can be
exploited to recover the underlying plaintext. The attacks in this
line draw on tools from computational geometry, statistical
learning theory, and combinatorial optimization to exploit the
leakage of k-NN queries, queries from non-uniform distributions,
and response-hiding range schemes. On the defensive side, our CCS
2022 work introduces leakage inversion. By treating leakage as a
function and analyzing its preimage, we define the
reconstruction space as the set of databases consistent with
the observed leakage and derive closed-form expressions and bounds
on its entropy and diameter, yielding privacy quantities that
depend only on the scheme. Most recently, LAMa formalizes
frequency matching as cryptanalysis by encoding the matching
constraints as an SMT instance, extending the analysis to encrypted
range queries in arbitrary dimensions and giving the first
guarantees that match the information-theoretic limits of
reconstruction.
Privacy-Preserving Data Structures
Papers:
ESORICS 2016,
IEEE EuroS&P 2017,
IEEE EuroS&P 2019
Description:
Classical data structures are designed for correctness and
efficiency, but security-sensitive deployments must also withstand
adversarial inputs and avoid leaking information through their
internal state. Our ESORICS 2016 work gives a hash table based on
linear probing that is simultaneously weakly history-independent,
in the sense that its memory layout reveals only the current set
rather than the insertion sequence, and secure against
collision-timing attacks, combining two properties that earlier
constructions achieved only in isolation. Our EuroS&P 2017
work introduces auditable data structures, a model that bridges
weak and strong history-independence by supporting observations at
arbitrary times while relaxing the restriction that the structure
cannot react to them, leveraging the practical fact that data
owners typically know when audits occur and yielding constructions
far more efficient than strongly history-independent ones. Our
EuroS&P 2019 work initiates the study of adversarial inputs
against secure similarity-approximation protocols based on
sketching, exploiting the common randomness shared by participants
to perturb inputs offline before any secure protocol is executed
and steer the approximation toward an attacker-chosen output,
demonstrating attacks on widely deployed sketches and proposing
mitigations.
Cross-Disciplinary Synergies
Auditing Anonymization Tools for Clinical Data
Papers:
CCS 2025
Vulneranility Disclosure: ARX Anonymization Tool
Media Coverage: Mason Spirit
Magazine
Description:
In collaboration with the School of Nursing and the Mason and
Partners Clinics, which serve uninsured and refugee populations in
Northern Virginia, we audited ARX, a widely used open-source
anonymization tool endorsed in policy guidelines by the UK
Anonymisation Network and the European Medicines Agency. Our
Combinatorial Refinement Attacks are the first against ARX's local
recoding implementation that succeed without auxiliary information.
We cast the released equivalence classes as a linear program whose
constraints encode the dependencies induced by ARX's greedy utility
maximization, and solve it to prune the space of plausible
re-identifications by up to 39,000× on real clinical datasets. The
analysis reveals the counterintuitive fact that larger k can
amplify leakage rather than mitigate it. We disclosed the
vulnerabilities to the ARX developers, who updated their
documentation and code in response.
Algorithms and Graphs
Papers:
WSDM 2014,
COCOA 2015,
JGAA 2016,
DMKD 2016,
CGF 2017
Description:
This line of work includes sampling-based estimators for
betweenness centrality with concentration bounds derived via
VC-dimension arguments, visualization techniques for directed
acyclic graphs that exploit dominance and orthogonality, and
adaptive probe scheduling for rapid event detection.
