Agent Tooling
Local-first tooling that makes agents reliable in real repositories: deterministic retrieval, verifiable edits, MCP servers, and reusable skill packaging.
My bias is pragmatic. The tooling around the model should be reliable; the model itself doesn't have to be.
What that means for the substrate:
- Determinism over magic (same inputs produce the same outputs).
- Auditability over vibes (you can inspect what changed and why).
- Privacy-first defaults (local processing whenever possible).
Retrieval That Scales (Local-First Hybrid Search)
llmx
What: Local-first codebase indexer with hybrid retrieval — BM25 keyword ranking combined with neural embeddings (mdbr-leaf-ir) running locally via WebGPU/WASM, fused via Reciprocal Rank Fusion. No embedding service required; embeddings run in-browser/on-device or can be skipped entirely for BM25-only mode. Deterministic chunking and content hashing make exports reproducible.
Why it matters: Most agent fights in large repos are lost on retrieval, not on intelligence. llmx hands the agent a working map: hybrid scoring, deterministic chunks, semantic quality without a remote oracle in the call chain.
Verifiable Editing and Reproducible Builds (Codex Toolchain)
codex-xtreme (includes codex-patcher)
What: An interactive wizard for producing optimized, patched Codex binaries, backed by a verified patch application engine.
Why it matters: The edit-test loop made explicit: apply code modifications reliably, then build and run them in an isolated, reproducible sandbox. Every patch is auditable, ensuring full control over the compiled binary.
Primary repo (pin this on GitHub): codex-xtreme
Patch engine (linked inside): codex-patcher
Packaging Domain Expertise for Agents
burn-plugin
What: Claude Code plugin for the Burn deep learning framework, with reusable skills/workflows and evidence-backed references.
Why it matters: Packages the Burn deep-learning framework's reference material and workflows into a portable Claude Code plugin. Enables developers to consistently reuse the same workflows and references without manually re-reading the source documentation.
Skill Systems (Available on Request)
cwork
What: A context compiler that assembles "base capabilities + domain primer + project context" into a minimal, task-specific prompt package.
Why it matters: Assembles base capabilities, domain knowledge, and specific project context into a minimal, task-specific prompt package. This turns ad-hoc prompting into a repeatable, structured pipeline that minimizes token waste and keeps agents focused.
Agent Hardening (Security Boundaries and Observability)
claude-warden ★57
What: Defense-in-depth security hooks for Claude Code: SSRF protection (blocks RFC1918 / link-local / metadata endpoints), MCP output compression, OTEL tracing exported to Grafana/Loki, per-session subagent budgets, and quiet-overrides that cap verbose command output before it floods context.
Why it matters: Default Claude Code can incinerate tokens on noisy command output, leak internal network topology via SSRF probes, spawn unbounded subagents, and produce traces nobody can inspect. Warden seals each leak and turns the runtime into something you can audit after the fact.
MCP Servers (Structured Tool APIs)
pyghidra-lite ★32
What: Token-efficient MCP server that exposes a structured "tool surface" for program analysis workflows (compact output by default, opt-in verbosity).
Registry: Official MCP registry listing: io.github.johnzfitch/pyghidra-lite (v0.1.1, status: active, published 2026-01-29).
Why it matters: Good agents are tool-driven. An MCP server provides a structured, predictable interface. pyghidra-lite compacts Ghidra's highly verbose output into a token-efficient form so that binary analysis workflows fit comfortably within model context limits.
LLM Desktop Workflow (Anthropic Ecosystem)
claude-cowork-linux ★236
What: Run the official Claude Desktop app's Cowork mode natively on Linux using compatibility stubs and a bubblewrap sandbox.
Why it matters: Makes Claude Desktop a first-class Linux application without sacrificing isolation. Avoids a virtual machine layer by sandboxing the application via bubblewrap directly on the host.
Note: Unofficial community project; no proprietary Claude code is committed.
Professional UX (No Emojis)
Iconics
What: Semantic icon library (8k+ icons) designed to replace emojis with consistent PNG icons and meaning-based search.
Why it matters: Documentation is a product surface. Consistent iconography is cleaner and more professional than informal emojis. Meaning-based search and deterministic exports ensure the same query always returns the same icon across docs and agent contexts alike.
Closing thought: Algorithms and invariants first. Model intelligence on top.