Perception Claude Proxy
OpenAI-compatible bridge that lets Perception (and any other OpenAI-format client) talk to a Claude subscription via the Claude Agent SDK.
Why this exists
Perception speaks OpenAI's chat-completions format. Claude doesn't, and the official Anthropic API bills per token, which adds up fast when you're running long reverse-engineering (RE) sessions with big memory dumps in the context.
This proxy sits between Perception and the Claude Agent SDK and translates in both directions. The SDK runs against your existing Claude subscription (Pro / Max), so you pay a flat monthly rate instead of per-token. Tool calls, streaming, thinking blocks, and image input all get translated transparently. Perception thinks it's talking to OpenAI; Claude thinks it's talking to Claude Code.
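To give a feel for what "translates in both directions" involves, here is a minimal, hypothetical sketch of the request direction only. It reshapes an OpenAI chat-completions body into the shape of Anthropic's Messages API: `system` messages are lifted into a top-level field, and base64 `image_url` data URLs become image blocks. This is not the proxy's code. The real bridge feeds the Claude Agent SDK, also handles tool calls and streaming, and may differ in detail. `toAnthropic`, `convertPart`, and the 4096 `max_tokens` default are illustrative assumptions.

```javascript
// Hypothetical sketch: reshape an OpenAI chat-completions body into an
// Anthropic Messages-style payload. Tool calls and streaming are omitted.

// Convert one OpenAI content part into an Anthropic content block.
function convertPart(part) {
  if (part.type === "text") {
    return { type: "text", text: part.text };
  }
  if (part.type === "image_url") {
    // Only base64 data URLs are handled here: data:<media_type>;base64,<data>
    const match = /^data:([^;]+);base64,(.*)$/s.exec(part.image_url.url);
    if (!match) throw new Error("only base64 data URLs are supported in this sketch");
    return {
      type: "image",
      source: { type: "base64", media_type: match[1], data: match[2] },
    };
  }
  throw new Error(`unsupported content part: ${part.type}`);
}

function toAnthropic(openaiBody) {
  const systemParts = [];
  const messages = [];
  for (const msg of openaiBody.messages) {
    if (msg.role === "system") {
      // Anthropic takes the system prompt as a top-level field, not a message.
      systemParts.push(
        typeof msg.content === "string" ? msg.content : msg.content.map((p) => p.text).join("")
      );
      continue;
    }
    // Plain-string content becomes a single text block.
    const content =
      typeof msg.content === "string"
        ? [{ type: "text", text: msg.content }]
        : msg.content.map(convertPart);
    messages.push({ role: msg.role, content });
  }
  return {
    model: openaiBody.model,
    max_tokens: openaiBody.max_tokens ?? 4096, // hypothetical default
    system: systemParts.join("\n\n"),
    messages,
  };
}
```

The response direction is the mirror image: Claude's text, thinking, and tool-use blocks get folded back into OpenAI `choices[].message` (or `choices[].delta` when streaming).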
What it does
- OpenAI-compatible endpoints – `/v1/chat/completions` (streaming + non-streaming) and `/v1/models`. Drop-in replacement for an OpenAI base URL.
- Persistent SDK sessions – an LRU pool reuses warm Claude sessions across turns. Same conversation prefix + tool list = cache hit, dropping warm-turn latency by 50–70%.
- Per-project context – hijacks Perception's `update_notes` tool and routes notes to `<workspace>/.proxy/context.md` instead of the IDE's single global pool. Notes for one game don't bleed into another. Workspace root is auto-detected from absolute paths the model sees.
- Auto-fallback – on overload (`529` / `rate_limit_error`), idle timeout, or the SDK's ~120K-token "Usage Policy" refusal, the proxy walks `opus → sonnet → haiku` automatically. Perception sees a status note in the stream.
- Thinking visibility – extended-thinking deltas surface as `<think>...</think>` blocks (default), `reasoning_content` deltas, or both.
- Image input – data URLs and `image_url` fields are decoded and passed through as native Claude image blocks.
- 1M-context beta on by default – long RE conversations don't get clipped on plans where Opus isn't auto-upgraded.
- Input-size guards – per-tool-result, per-message, and total-prompt char caps clip oversized history (e.g. a 5MB memory dump) before it reaches the SDK, preserving the tail so the active query stays intact.
- Two-tier timeouts – hard wall-clock cap plus an idle-token cap that resets on every chunk and catches silently stuck upstreams.
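The input-size guards are the least obvious item above, so here is a hypothetical sketch of tail-preserving clipping. The cap values in `CAPS` are placeholders, not the proxy's defaults (those live in `.env.example`). Satisfying the total cap by dropping whole oldest messages is an assumption about strategy, as are the names `clipTail` and `clipHistory`. The invariant it illustrates is the one described above: oversized content loses its head, never its tail, so the active query survives.

```javascript
// Hypothetical sketch of tail-preserving input clipping.
// Cap values are placeholders, not the proxy's real defaults.
const CAPS = {
  toolResult: 20_000, // per tool result
  message: 50_000, // per ordinary message
  total: 200_000, // whole prompt
};

// Keep the last `max` characters of `text`, prefixed with a marker saying how
// much was cut. The marker itself is not counted against the cap.
function clipTail(text, max) {
  if (text.length <= max) return text;
  const dropped = text.length - max;
  return `[...clipped ${dropped} chars...]\n` + text.slice(text.length - max);
}

// messages: [{ role, content: string }]; role "tool" marks a tool result.
// Assumes the system prompt is handled separately and is not in this array.
function clipHistory(messages, caps = CAPS) {
  // 1. Per-item caps: tool results (e.g. memory dumps) get the tighter limit.
  const clipped = messages.map((m) => ({
    ...m,
    content: clipTail(m.content, m.role === "tool" ? caps.toolResult : caps.message),
  }));

  // 2. Total cap: drop the oldest messages first, never the final one
  //    (the active query).
  let total = clipped.reduce((n, m) => n + m.content.length, 0);
  while (total > caps.total && clipped.length > 1) {
    total -= clipped.shift().content.length;
  }

  // 3. If the last message alone is still too big, clip its head as well.
  if (total > caps.total) {
    const last = clipped[clipped.length - 1];
    last.content = clipTail(last.content, caps.total);
  }
  return clipped;
}
```

Clipping from the head rather than the tail is deliberate: in a chat transcript the most recent content is what the model needs to answer, while old dumps are usually safe to lose.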
Tools the model gets
On top of whatever tools Perception already exposes, the proxy bundles a curated set of Claude Agent SDK tools so the model isn't capped at the IDE workspace:
- Read / Glob / Grep – executed inside the proxy via an internal text-tool protocol. This is what gives the model access to every drive root (`C:\`, `D:\`, ...) instead of just your IDE workspace. On by default.
- Bash – shell exec. Opt-in.
- Write / Edit – mutating filesystem ops. Opt-in.
- WebSearch – the SDK's web-search tool. Opt-in.
- EXTRA_SDK_TOOLS – comma-separated extra SDK tool names for advanced users.
The dangerous ones are off by default for obvious reasons. Flip them on in .env when you trust the workflow.
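A hypothetical sketch of how these toggles could combine into the list of SDK tools handed to the model. `ENABLE_BASH_TOOL`, `ENABLE_WRITE_TOOLS`, and `EXTRA_SDK_TOOLS` are the names used in this README; `ENABLE_WEB_SEARCH` is a placeholder for the WebSearch toggle, whose real name is in `.env.example`. Treating `"1"` or `"true"` as "on" is also an assumption.

```javascript
// Hypothetical sketch: derive the SDK tool allowlist from .env-style toggles.
// ENABLE_WEB_SEARCH is a placeholder name; check .env.example for the real one.
function isOn(value) {
  // This sketch treats "1" or "true" as enabled; anything else is off.
  return value === "1" || value === "true";
}

function buildToolList(env) {
  const tools = ["Read", "Glob", "Grep"]; // read-only tools, on by default
  if (isOn(env.ENABLE_BASH_TOOL)) tools.push("Bash");
  if (isOn(env.ENABLE_WRITE_TOOLS)) tools.push("Write", "Edit");
  if (isOn(env.ENABLE_WEB_SEARCH)) tools.push("WebSearch");

  // EXTRA_SDK_TOOLS is comma-separated; trim entries and skip duplicates.
  for (const name of (env.EXTRA_SDK_TOOLS ?? "").split(",")) {
    const trimmed = name.trim();
    if (trimmed && !tools.includes(trimmed)) tools.push(trimmed);
  }
  return tools;
}
```

Called as `buildToolList(process.env)`, an empty environment yields only the three read-only tools, which matches the "dangerous ones off by default" stance.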
Setup
Requirements:
- Node.js 18+ (20.6+ recommended for `--env-file` support)
- A working Claude subscription (Pro or Max)
- Claude Code installed and signed in once on the machine. The Agent SDK reuses its credentials.
Install:
git clone https://github.com/sinistercodes/claude-proxy
cd claude-proxy
npm install
cp .env.example .env # optional; defaults are sane
node server.js
The proxy listens on `:4001` by default.
Point Perception at it:
- Base URL: `http://localhost:4001/v1`
- API key: anything non-empty (it's not validated; the proxy uses your Claude subscription)
- Model: `sonnet`, `opus`, or `haiku`
That's it. Send a request and you should see it stream back through Perception.
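To check the wiring without Perception, any OpenAI-format client will do. Below is a minimal sketch using the global `fetch` built into Node 18+. It assumes the proxy is on the default port and that its non-streaming responses follow the standard OpenAI `choices[0].message.content` shape. `buildChatRequest` and `ask` are illustrative names, not part of the proxy.

```javascript
// Sketch of an OpenAI-format request against the proxy.
// Assumes the proxy is running locally on the default port (4001).
function buildChatRequest(
  prompt,
  { baseUrl = "http://localhost:4001/v1", model = "sonnet", stream = false } = {}
) {
  return {
    url: `${baseUrl}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Any non-empty key works; the proxy does not validate it.
        Authorization: "Bearer not-checked",
      },
      body: JSON.stringify({ model, stream, messages: [{ role: "user", content: prompt }] }),
    },
  };
}

// Non-streaming call using the global fetch available in Node 18+.
async function ask(prompt, options) {
  const { url, init } = buildChatRequest(prompt, options);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`proxy returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Usage: `ask("Say hello", { model: "haiku" }).then(console.log)`. Haiku keeps the smoke test cheap and fast.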
Useful flags:
- `node server.js --verbose` – enables `[proxy]` log lines
- `node server.js --live` – coloured live request feed on stdout
- `LOG_REQUESTS=1` – dumps every request body to `logs/requests/` (last 50, rotated)
Common tweaks (in .env):
- `CLAUDE_MODEL=opus` – default to Opus instead of Sonnet
- `CLAUDE_THINKING=max` – crank thinking budget to 32K tokens
- `ENABLE_BASH_TOOL=1` – let the model run shell commands
- `ENABLE_WRITE_TOOLS=1` – let the model write/edit files
- `CLAUDE_FALLBACK=0` – pin the model, no auto-downgrade on overload
See .env.example in the repo for the full annotated list (~30 vars covering sessions, timeouts, tool toggles, input clipping, betas, observability).
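`CLAUDE_MODEL` and `CLAUDE_FALLBACK` interact: together they decide which models a request may end up on. Here is a hypothetical sketch of that interaction, built only from what this README states (Sonnet as default, `opus → sonnet → haiku` as the walk order, `CLAUDE_FALLBACK=0` to pin). `fallbackChain`, `runWithFallback`, and the caller-supplied `isRetryable` predicate are illustrative, not the proxy's internals.

```javascript
// Hypothetical sketch: which models a request may be tried on, in order.
const CHAIN = ["opus", "sonnet", "haiku"];

function fallbackChain(env) {
  const model = env.CLAUDE_MODEL || "sonnet"; // Sonnet is the default
  if (env.CLAUDE_FALLBACK === "0") return [model]; // pinned: no downgrade
  const start = CHAIN.indexOf(model);
  // Unknown model names are pinned rather than guessed at.
  return start === -1 ? [model] : CHAIN.slice(start);
}

// Try each model in turn. `callModel(model)` performs the request and
// `isRetryable(err)` decides whether an error (overload, idle timeout,
// Usage Policy refusal) justifies stepping down to the next model.
async function runWithFallback(env, callModel, isRetryable) {
  let lastErr;
  for (const model of fallbackChain(env)) {
    try {
      return await callModel(model);
    } catch (err) {
      lastErr = err;
      if (!isRetryable(err)) throw err; // e.g. a bad request: a smaller model won't help
    }
  }
  throw lastErr;
}
```

Note that starting on Sonnet never climbs up to Opus; fallback only walks downhill from the configured model.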
Debug endpoints
Bound to loopback only. Useful when something looks weird:
- `GET /debug/status` – runtime config + session count
- `GET /debug/last-exchange` – last full request/response
- `GET /debug/last-request` – last logged request body
- `GET /debug/exchanges` – recent exchange ring
- `GET /debug/tools` – tool catalog from the last request
- `GET /debug/workspace` – resolved per-project workspace cache
- `POST /debug/workspace/reset` – clear the workspace cache
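"Bound to loopback only" can be enforced per request even though the main API shares the same port. Below is a hypothetical sketch for a plain Node `http` server; `isLoopback` and `rejectNonLocalDebug` are illustrative names, and the proxy may implement this differently. It accounts for Node reporting IPv4 clients on a dual-stack socket as `::ffff:a.b.c.d`, and treats all of `127.0.0.0/8` plus `::1` as loopback.

```javascript
// Hypothetical sketch: gate /debug/* routes to loopback clients only.
function isLoopback(remoteAddress) {
  if (!remoteAddress) return false;
  // Strip the IPv4-mapped IPv6 prefix Node uses on dual-stack sockets.
  const addr = remoteAddress.startsWith("::ffff:") ? remoteAddress.slice(7) : remoteAddress;
  return addr === "::1" || /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(addr);
}

// Returns true if the request was rejected and already answered.
function rejectNonLocalDebug(req, res) {
  if (req.url.startsWith("/debug/") && !isLoopback(req.socket.remoteAddress)) {
    res.writeHead(403, { "Content-Type": "text/plain" });
    res.end("debug endpoints are loopback-only\n");
    return true;
  }
  return false;
}
```

In a handler this would run first: `http.createServer((req, res) => { if (rejectNonLocalDebug(req, res)) return; /* route as usual */ })`.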