Bethel Hall

Bethel Hall (ቤቴል ሆል)

Ph.D. student in Computer Science · Stevens Institute of Technology · Expected 2028
Current AI agents are powerful but fundamentally unverifiable. My work focuses on making them reliable for high-stakes domains.

I'm a Ph.D. student in Computer Science at Stevens Institute of Technology, advised by Prof. William Eiers. My research improves the reliability of large language models with neurosymbolic methods, focusing on systems that can provide hard correctness guarantees. I study how to bring these guarantees to high-stakes domains where LLMs are increasingly deployed, including finance, healthcare, and cloud infrastructure.

Outside research, I enjoy biking, hiking, and running.

Publications

Neurosymbolic Auditing of Natural-Language Software Requirements

Bethel Hall, William Eiers
Preprint, arXiv:2605.13817

CloudFix: Automated Policy Repair for Cloud Access Control Policies Using Large Language Models SANER 2026

Bethel Hall, Owen Ungaro, William Eiers
SANER 2026 (Research Track)

Exploring Large Language Models for Access Control Policy Synthesis and Summarization

Adarsh Vatsa, Bethel Hall, William Eiers
Preprint, arXiv:2510.20692

Projects

VERIMED

VERIMED audits natural-language software requirements before they become formal models or implementations. It checks for ambiguity, inconsistency, vacuousness, redundancy, and safety violations, producing solver-backed evidence instead of subjective review.

Natural-language requirements SMT Z3 Counterexample-guided repair
100% formalization rate — 64/64 requirements converted without manual intervention QA accuracy lifted from 55.4% to 98.5% using SMT counterexample feedback (ABZ hemodialysis benchmark, n=64) Caught 12 latent ambiguities in real hemodialysis machine specifications
Method
Results
Key insight: ambiguity in a requirement causes independent LLMs to produce contradictory formalizations — a disagreement SMT can detect and localize with a concrete counterexample.

CloudFix

CloudFix repairs AWS IAM access-control policies with a verified LLM workflow. It localizes the faulty policy statement with SMT, asks an LLM for a targeted patch, and formally checks the result before accepting it.

AWS IAM Policy repair Fault localization Quacky + Z3
Evaluated on 282 real-world AWS IAM policies 84.0% repair rate vs. 48.2% baseline — a 36-point improvement on 10-request tasks 54.3% vs. 22.3% baseline on harder 30-request tasks — more than double the baseline
Method
Results
Key insight: SMT fault localization identifies which specific policy statement is wrong before the LLM is asked to repair it, dramatically narrowing the repair search space.

POLICYSUMMARIZER

POLICYSUMMARIZER characterizes what cloud access-control policies actually allow. It converts AWS, Azure, or GCP policies into automata, simplifies the result with an LLM, and verifies the summary against the formal representation before accepting it.

LLM evaluation Neurosymbolic AI Regex simplification Human-AI collaboration
Non-reasoning LLMs top out at 45.8% semantic equivalence, highlighting deployment reliability limits Reasoning LLMs reach 86.4%–93.7% — highlighting the gap between model classes Formal summarizer achieves 0.889 mean similarity on semantic request descriptions
Method
Results
Key insight: automata-based equivalence checking verifies that a natural-language summary is semantically correct, not just fluent — guaranteeing the description matches what the policy actually enforces.

News