John Zachary Fitch
Agent tooling - systems performance - privacy-first infrastructure
I build tools that make agents reliable in real codebases: fast retrieval, verifiable edits, reusable skills/plugins, and execution environments you can reason about.
Recent highlight: I investigated and helped fix a subtle Codex CLI release-build regression in which a pre-main hardening step stripped LD_* / DYLD_* environment variables, so tool subprocesses launched without them. In CUDA/Conda/MKL/HPC-style setups, this could silently force slow fallback paths (often 10x+, sometimes 100x+ when GPU workflows fell back to CPU).
Featured Impact (Jan 2026)
Ghost in the Codex Machine - OpenAI Codex
A security-hardening routine ran before main() in release builds and stripped LD_* / DYLD_* environment variables. In certain CUDA/Conda/MKL environments, that made critical dynamic libraries effectively "disappear" inside tool subprocesses, quietly pushing workflows onto slow fallback paths. I reproduced the issue with a minimal harness, attached representative measurements, and worked with the Codex maintainers to ship the upstream fix (credited in release notes). Because this lives in the CLI substrate, it improves the baseline for every tool call and removes a hard-to-diagnose failure mode.
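A minimal Python sketch of the failure mode, assuming a simple prefix filter over the environment. The names `LOADER_PREFIXES`, `strip_loader_env`, and `child_sees` are hypothetical, and this is not the Codex code. It only illustrates that once LD_* / DYLD_* are removed from an environment, every child process spawned with it starts without them, so the dynamic loader never searches the paths those variables pointed at.

```python
from __future__ import annotations

import os
import subprocess
import sys

# Variable families the hardening step removed (hypothetical constant name).
LOADER_PREFIXES = ("LD_", "DYLD_")


def strip_loader_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env with every LD_* / DYLD_* variable removed."""
    return {k: v for k, v in env.items() if not k.startswith(LOADER_PREFIXES)}


def child_sees(var: str, env: dict[str, str]) -> str | None:
    """Spawn a Python child process with `env` and report its value of `var`."""
    code = f"import os; print(os.environ.get({var!r}, ''))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() or None


if __name__ == "__main__":
    # A stand-in path; in a real CUDA/Conda/MKL setup this points at vendor libraries.
    base = dict(os.environ, LD_LIBRARY_PATH="/opt/example/lib")
    print(child_sees("LD_LIBRARY_PATH", base))                    # /opt/example/lib
    print(child_sees("LD_LIBRARY_PATH", strip_loader_env(base)))  # None
```

Nothing fails loudly here: the child simply runs with a shorter library search path, which is why the real regression showed up as a slowdown rather than an error.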
Impact (representative; varies by environment)
| Workload | Before | After | Why this happens |
|---|---|---|---|
| MKL/BLAS (repro harness) | ~2.71s | ~0.239s | Losing LD_LIBRARY_PATH forces a slow BLAS fallback |
| CUDA workflows | 11x-300x slower | Restored to baseline | Missing CUDA libraries can trigger a CPU fallback in downstream tooling |
What this demonstrates
- Systems debugging under real-world constraints (pre-main execution, release-only behavior, silent failures)
- Performance engineering with reproducible measurement
- Security tradeoff reasoning that preserves developer usability
- Upstream collaboration: issue -> fix -> verification -> shipped release notes
What I Build
- Agent tooling: local-first retrieval, MCP tool surfaces, and patch application that stays deterministic and debuggable.
- Systems performance: mmap indices, custom binary formats, and fast search over large file trees.
- ML in the product surface: WebGPU client-side inference and evaluation-aware UX.
- Privacy-first infrastructure: declarative NixOS deployments with post-quantum security and DNS/cert automation.
Selected Work
pyghidra-lite
Token-efficient MCP server for tool-driven program analysis. Listed in the official MCP registry as io.github.johnzfitch/pyghidra-lite (v0.1.1, active).
SpecHO v2
161-dimensional linguistic fingerprinting for AI-generated text detection (algorithm-first design with a tiered runtime).
definitelynot.ai
Unicode-security-aware sanitizer defending against Trojan Source attacks, BiDi control-character abuse, and homoglyphs.
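A minimal sketch of two of these checks using only Python's standard `unicodedata` and `re` modules. The function names are hypothetical and this is not the definitelynot.ai code. The BiDi set covers the embedding, override, isolate, and mark characters abused in Trojan Source (CVE-2021-42574); the mixed-script test is a rough heuristic that reads a letter's script from the first word of its Unicode name, not the confusables algorithm from UTS #39.

```python
import re
import unicodedata

# Directional formatting characters that can reorder how source code is displayed.
BIDI_CONTROLS = frozenset({
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
    "\u200e", "\u200f", "\u061c",                      # LRM, RLM, ALM
})


def find_bidi_controls(text: str) -> list[tuple[int, str]]:
    """Return (offset, Unicode name) for every BiDi control character in text."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


def strip_bidi_controls(text: str) -> str:
    """Remove all BiDi control characters, leaving the visible text."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


def letter_scripts(word: str) -> set[str]:
    """Heuristic: approximate each letter's script by the first word of its name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in word if ch.isalpha()}


def mixed_script_words(text: str) -> list[str]:
    """Flag words whose letters come from more than one script (possible homoglyphs)."""
    return [w for w in re.findall(r"\w+", text) if len(letter_scripts(w)) > 1]
```

For example, `mixed_script_words("login at p\u0430ypal now")` flags only the word containing the Cyrillic `а`. The name-based heuristic also flags legitimate mixed-script text (for instance, Japanese mixing kanji and hiragana), which is why production tools rely on the UTS #39 data files instead.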
Infrastructure Snapshot
NixOS + Post-Quantum Security
I run production infrastructure on dedicated bare metal with declarative NixOS configuration, authoritative DNS, automated wildcard certificates via ACME DNS-01 challenges (RFC 2136 dynamic updates), and a post-quantum VPN layer (WireGuard + Rosenpass). I optimize for data sovereignty, reliability, and a small, auditable surface area.
Highlights
- Multi-IP, multi-subnet DNS redundancy and DDoS resilience
- Caddy with HTTP/3, rootless containers (Podman), and reproducible deployments
- Post-quantum key exchange for both SSH and the VPN overlay
- Encrypted secrets and automated backups
Now
- Looking for: roles building agent runtimes, developer tools, retrieval systems, and privacy/security foundations.
- Operating style: evidence-first, performance-aware, pragmatic about tradeoffs.