Stop patching every agent for local models.
Relay is a lightweight compatibility gateway for llama.cpp and other local inference servers. It exposes OpenAI- and Anthropic-compatible APIs; normalizes streaming responses, tool calls, model listings, and error formats; and lets real agents talk to your local models without custom glue code.
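Because the gateway speaks the OpenAI wire format, an agent only needs to point its existing HTTP calls at Relay. A minimal sketch of what such a request looks like, assuming Relay listens on `http://localhost:8080` (the actual host and port depend on your configuration; `local-model` is a placeholder name):

```python
import json
import urllib.request

# Hypothetical local Relay endpoint; adjust host/port to your setup.
RELAY_URL = "http://localhost:8080/v1/chat/completions"

# A standard OpenAI-style chat completion payload. An agent written
# against the OpenAI API sends exactly this shape; the gateway relays
# it to the local inference server.
payload = {
    "model": "local-model",  # placeholder model identifier
    "messages": [{"role": "user", "content": "Hello from a local agent"}],
    "stream": False,
}

req = urllib.request.Request(
    RELAY_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Local servers typically ignore the key, but OpenAI clients send one.
        "Authorization": "Bearer unused-local-key",
    },
    method="POST",
)
print(req.get_method(), req.full_url)
```

In practice most agents never build requests by hand: any OpenAI-compatible SDK can be redirected by setting its base URL to the gateway, which is the point of not needing custom glue code.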