
Stop patching every agent for local models.

Relay is a lightweight compatibility gateway for llama.cpp and local inference servers. It exposes OpenAI- and Anthropic-compatible APIs, normalizes streaming, tools, models, and errors, and lets real agents talk to your local models without custom glue code.

"Almost compatible" is where agents break.

Local model servers speak familiar protocols until the details matter: stream events, tool calls, fields, errors, and capability metadata. Relay makes the boundary actually compatible.

OpenAI-compatible endpoints differ subtly across local servers: header conventions, field presence, and error shapes don't match what SDKs expect.

Anthropic clients expect different message shapes, streaming event orders, and tool-call structures than what upstream servers return.

Mismatched tool calls, model IDs, SSE chunk framing, and capability metadata often break agent loops outright or degrade them silently.

Relay normalizes the boundary instead of forcing every client, SDK, and agent to special-case each local inference server.
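
As a minimal sketch of what that buys you, here is the official OpenAI Node SDK pointed at Relay's base URL instead of the hosted API. The port matches the quick start below; the model alias and the API key are placeholders, not fixed names.

```ts
import OpenAI from "openai";

// Assumption: Relay listens on 127.0.0.1:1234 (see the quick start below).
const client = new OpenAI({
  baseURL: "http://127.0.0.1:1234/v1",
  apiKey: "sk-local", // placeholder; a local gateway typically ignores it
});

// Stream a completion exactly as you would against the hosted API.
const stream = await client.chat.completions.create({
  model: "local-model", // hypothetical alias; list real IDs via GET /v1/models
  messages: [{ role: "user", content: "Hello from a local agent" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```

The agent code stays unchanged; only the base URL moves from the hosted API to Relay.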

Clients (OpenAI / Anthropic): SDK requests, tools, streams
Relay (normalize the boundary): messages, SSE, tools, errors
Local upstream (llama.cpp / vLLM): native server shape
Relay sits between agent SDKs and local inference servers, translating protocol details in both directions.

A single compatibility surface across protocols

OpenAI chat completions
Anthropic messages
Streaming / SSE
Tool call shapes
Model aliases & capabilities
Upstream errors & status
Health / readiness
Deployment config
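
For the Anthropic side of that surface, a hedged sketch: this assumes Relay serves the Anthropic Messages API on the same host and port as its OpenAI endpoints, and the model alias is again hypothetical.

```ts
import Anthropic from "@anthropic-ai/sdk";

// Assumption: Relay exposes the Anthropic Messages API on the same
// host/port as its OpenAI endpoints; "local-model" is a hypothetical alias.
const anthropic = new Anthropic({
  baseURL: "http://127.0.0.1:1234",
  apiKey: "local", // placeholder for clients that require a key
});

const msg = await anthropic.messages.create({
  model: "local-model",
  max_tokens: 256,
  messages: [{ role: "user", content: "Summarize this repo in one line." }],
});

// content is an array of blocks; text blocks carry the reply.
console.log(msg.content);
```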
Why not llama.cpp directly?

llama.cpp is excellent at inference. Relay is not trying to replace it. Relay handles the compatibility layer around it: API shapes, streaming behavior, tool semantics, model metadata, and observability. Think of it as the protocol adapter that sits between raw inference and real-world agent tooling.

```bash
# 1. Start your local model server
llama-server --model ./model.gguf --host 127.0.0.1 --port 8080

# 2. Clone and start Relay
git clone https://github.com/achuthanmukundan00/relay.git
cd relay && npm install && cp .env.example .env && npm run dev

# 3. Verify
curl http://127.0.0.1:1234/v1/models
```
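
As a further smoke test, a sketch of a non-streaming chat completion through Relay, assuming the same port as the verify step and a hypothetical `local-model` alias (Node 18+ for the built-in fetch):

```ts
// Assumption: Relay listens on 127.0.0.1:1234 as in the verify step above;
// "local-model" is a placeholder alias, not a fixed name.
const res = await fetch("http://127.0.0.1:1234/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "local-model",
    messages: [{ role: "user", content: "ping" }],
  }),
});
console.log(JSON.stringify(await res.json(), null, 2));
```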
Using Synax? Relay can act as the local model gateway underneath it. Synax is the agent UX and runtime; Relay is the compatibility boundary that makes local inference actually work with it.