Ph.D. student in Computer Science · Stevens Institute of Technology · Expected 2028
Current AI agents are powerful but fundamentally unverifiable. My work focuses on making them reliable for high-stakes domains.
I'm a Ph.D. student in Computer Science at
Stevens Institute of Technology,
advised by Prof. William Eiers.
My research improves the reliability of large language models with neurosymbolic methods, focusing on systems that can provide hard correctness guarantees. I study how to bring these guarantees to high-stakes domains where LLMs are increasingly deployed, including finance, healthcare, and cloud infrastructure.
Outside research, I enjoy biking, hiking, and running.
We show that disagreements between independently-generated LLM formalizations of the same requirement reliably signal latent ambiguity, and that SMT-generated counterexamples can localize and repair it.
We show that SMT-based fault localization, which pinpoints the exact policy statement at fault before invoking the LLM, more than doubles repair rates over LLM-only baselines on real-world AWS policies.
We show that LLMs fail to reason about policy semantics with reliability-grade precision, even when given their own explanations as input, and that pairing automata-derived characterizations with LLM simplification resolves this gap.
Projects
VERIMED
VERIMED audits natural-language software requirements before they become formal models or implementations. It checks for ambiguity, inconsistency, vacuousness, redundancy, and safety violations, producing solver-backed evidence instead of subjective review.
Key insight: ambiguity in a requirement causes independent LLMs to produce contradictory formalizations — a disagreement SMT can detect and localize with a concrete counterexample.
CloudFix
CloudFix repairs AWS IAM access-control policies with a verified LLM workflow. It localizes the faulty policy statement with SMT, asks an LLM for a targeted patch, and formally checks the result before accepting it.
AWS IAMPolicy repairFault localizationQuacky + Z3
Evaluated on 282 real-world AWS IAM policies84.0% repair rate vs. 48.2% baseline — a 36-point improvement on 10-request tasks54.3% vs. 22.3% baseline on harder 30-request tasks — more than double the baseline
Key insight: SMT fault localization identifies which specific policy statement is wrong before the LLM is asked to repair it, dramatically narrowing the repair search space.
POLICYSUMMARIZER
POLICYSUMMARIZER characterizes what cloud access-control policies actually allow. It converts AWS, Azure, or GCP policies into automata, simplifies the result with an LLM, and verifies the summary against the formal representation before accepting it.
Non-reasoning LLMs top out at 45.8% semantic equivalence, highlighting deployment reliability limitsReasoning LLMs reach 86.4%–93.7% — highlighting the gap between model classesFormal summarizer achieves 0.889 mean similarity on semantic request descriptions
Key insight: automata-based equivalence checking verifies that a natural-language summary is semantically correct, not just fluent — guaranteeing the description matches what the policy actually enforces.
News
2026
May 2026Preprint released, Neurosymbolic Auditing of Natural-Language Software Requirements (arXiv).
Mar 2026Presented CloudFix at SANER 2026, Limassol, Cyprus.
Mar 2026Passed Ph.D. oral qualifying exam.
2025
Dec 2025Paper accepted to SANER 2026 (Research Track).
2024
May 2024Passed Ph.D. written qualifying exam.
2021
Aug 2021Selected as a Global UGRAD Scholar by the U.S. Department of State.