sinister - Perception Claude Proxy

OpenAI-compatible bridge that lets Perception (and any other OpenAI-format client) talk to a Claude subscription via the Claude Agent SDK.

# Why this exists

Perception speaks OpenAI's chat-completions format. Claude doesn't, and the official Anthropic API charges per token, which adds up fast when you're running long reverse-engineering (RE) sessions with big memory dumps in the context.

This proxy sits between Perception and the Claude Agent SDK and translates in both directions. The SDK runs against your existing Claude subscription (Pro / Max), so you pay a flat monthly rate instead of paying per token. Tool calls, streaming, thinking blocks, and image input all get translated transparently. Perception thinks it's talking to OpenAI; Claude thinks it's talking to Claude Code.
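
To give a feel for the request-side translation, here is a minimal sketch of turning OpenAI-style chat messages into Claude-style content blocks. The function names (toClaudeMessage, toClaudeRequest) are made up for illustration and are not taken from the proxy's source. The block shapes follow the Anthropic Messages API; what the proxy actually hands to the Agent SDK may look different. It assumes system messages carry plain strings and that images arrive as base64 data URLs.

```javascript
// Hypothetical helper (not from the proxy's source): convert one
// OpenAI-style message into a Claude-style message with content blocks.
function toClaudeMessage(msg) {
  // OpenAI allows either a plain string or an array of typed parts.
  const parts = typeof msg.content === "string"
    ? [{ type: "text", text: msg.content }]
    : msg.content;

  const content = parts.map((part) => {
    if (part.type === "text") {
      return { type: "text", text: part.text };
    }
    if (part.type === "image_url") {
      // This sketch only handles base64 data URLs.
      const m = /^data:([^;,]+);base64,(.+)$/.exec(part.image_url.url);
      if (!m) throw new Error("expected a base64 data URL");
      return {
        type: "image",
        source: { type: "base64", media_type: m[1], data: m[2] },
      };
    }
    throw new Error(`unsupported content part: ${part.type}`);
  });

  return { role: msg.role, content };
}

// Hypothetical helper: Claude takes the system prompt as a separate
// field, so system messages are pulled out of the message list.
// Assumes system messages have string content.
function toClaudeRequest(messages) {
  const system = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n\n");
  const rest = messages
    .filter((m) => m.role !== "system")
    .map(toClaudeMessage);
  return { system, messages: rest };
}
```

Tool calls, streaming deltas, and thinking blocks need their own mappings and are left out here.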

# What it does

OpenAI-compatible endpoints – /v1/chat/completions (streaming + non-streaming) and /v1/models. Drop-in replacement for an OpenAI base URL.
Persistent SDK sessions – an LRU pool reuses warm Claude sessions across turns. Same conversation prefix + tool list = cache hit, dropping warm-turn latency by 50–70%.
Per-project context – hijacks Perception's update_notes tool and routes notes to <workspace>/.proxy/context.md instead of the IDE's single global pool. Notes for one game don't bleed into another. Workspace root is auto-detected from absolute paths the model sees.
Auto-fallback – on overload (529 / rate_limit_error), idle timeout, or the SDK's ~120K-token "Usage Policy" refusal, the proxy walks opus → sonnet → haiku automatically. Perception sees a status note in the stream.
Thinking visibility – extended-thinking deltas surface as <think>...</think> blocks (default), reasoning_content deltas, or both.
Image input – data URLs and image_url fields are decoded and passed through as native Claude image blocks.
1M-context beta on by default – long RE conversations don't get clipped on plans where Opus isn't auto-upgraded.
Input-size guards – per-tool-result, per-message, and total-prompt char caps clip oversized history (e.g. a 5MB memory dump) before it reaches the SDK, preserving the tail so the active query stays intact.
Two-tier timeouts – hard wall-clock cap plus an idle-token cap that resets on every chunk and catches silently stuck upstreams.
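
The input-size guards are the easiest item above to show in code. This is a rough sketch of tail-preserving clipping under a per-message and a total character cap. The names (clipTail, clipHistory), the marker text, and the budget-walking strategy are assumptions for illustration, not the proxy's actual implementation. It also assumes every message has string content.

```javascript
// Hypothetical helper: keep only the last `maxChars` characters of `text`.
// A marker is prepended when there is room for it, so the model can tell
// that earlier content was cut.
function clipTail(text, maxChars, marker = "[clipped]\n") {
  if (text.length <= maxChars) return text;
  // Too small to fit the marker: return the bare tail.
  if (maxChars <= marker.length) return text.slice(text.length - maxChars);
  return marker + text.slice(text.length - (maxChars - marker.length));
}

// Hypothetical helper: walk the history newest-first and spend a total
// character budget, so the most recent messages (the active query)
// survive and the oldest ones are clipped or dropped.
function clipHistory(messages, { perMessage, total }) {
  let budget = total;
  const out = [];
  for (let i = messages.length - 1; i >= 0; i--) {
    if (budget <= 0) break; // nothing left: older messages are dropped
    const limit = Math.min(perMessage, budget);
    const content = clipTail(messages[i].content, limit);
    budget -= content.length;
    out.unshift({ ...messages[i], content });
  }
  return out;
}
```

Walking newest-first is one way to make sure the cap never eats the question the user just asked. A 5MB dump from twenty turns ago gets clipped first.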

# Tools the model gets

On top of whatever tools Perception already exposes, the proxy bundles a curated set of Claude Agent SDK tools so the model isn't capped at the IDE workspace:

Read / Glob / Grep – executed inside the proxy via an internal text-tool protocol. This is what gives the model access to every drive root (C:\, D:\, ...) instead of just your IDE workspace. On by default.
Bash – shell exec. Opt-in.
Write / Edit – mutating filesystem ops. Opt-in.
WebSearch – the SDK's web-search tool. Opt-in.
EXTRA_SDK_TOOLS – comma-separated extra SDK tool names for advanced users.

The dangerous ones are off by default for obvious reasons. Flip them on in .env when you trust the workflow.
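
As a sketch of how the toggles could combine into a final tool list: the function below is hypothetical (enabledSdkTools is not a name from the proxy) and assumes "1" is the value that switches a toggle on. The WebSearch toggle is left out because its variable name isn't given here; check .env.example for it.

```javascript
// Hypothetical helper: derive the bundled SDK tool list from an
// environment object such as process.env.
function enabledSdkTools(env) {
  // Assumption: a toggle counts as on only when set to "1".
  const on = (name) => env[name] === "1";

  // Read / Glob / Grep are on by default.
  const tools = ["Read", "Glob", "Grep"];

  if (on("ENABLE_BASH_TOOL")) tools.push("Bash");
  if (on("ENABLE_WRITE_TOOLS")) tools.push("Write", "Edit");

  // EXTRA_SDK_TOOLS is a comma-separated list of extra tool names.
  const extra = (env.EXTRA_SDK_TOOLS || "")
    .split(",")
    .map((name) => name.trim())
    .filter(Boolean);
  for (const name of extra) {
    if (!tools.includes(name)) tools.push(name); // skip duplicates
  }
  return tools;
}
```

With an empty environment this returns only the three read-only tools, which matches the "dangerous ones are off by default" rule.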

# Setup

Requirements:

Node.js 18+ (20.6+ recommended for --env-file support)
A working Claude subscription (Pro or Max)
Claude Code installed and signed in once on the machine. The Agent SDK reuses its credentials.

Install:

git clone https://github.com/sinistercodes/claude-proxy
cd claude-proxy
npm install
cp .env.example .env    # optional; defaults are sane
node server.js

The proxy listens on :4001 by default.

Point Perception at it:

Base URL: http://localhost:4001/v1
API key: anything non-empty (it's not validated; the proxy uses your Claude subscription)
Model: sonnet, opus, or haiku

That's it. Send a request and you should see it stream back through Perception.
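
If you want to smoke-test the proxy without Perception, a small Node script does the job. This is a sketch that assumes the default port, a non-streaming request, and the standard OpenAI response shape (choices[0].message.content). The helper names buildChatRequest and ask are made up for illustration. It uses the global fetch that ships with Node 18+.

```javascript
const BASE_URL = "http://localhost:4001/v1";

// Build the URL and fetch options for a non-streaming chat completion.
function buildChatRequest(prompt, model = "sonnet") {
  return {
    url: `${BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Any non-empty key works; the proxy does not validate it.
        Authorization: "Bearer anything",
      },
      body: JSON.stringify({
        model,
        stream: false,
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Send the prompt through the proxy and return the reply text.
// Assumes an OpenAI-style response body.
async function ask(prompt) {
  const { url, init } = buildChatRequest(prompt);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`proxy returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

With the proxy running, ask("Say hello in five words.").then(console.log) should print the model's reply.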

Useful flags:

node server.js --verbose – prints [proxy] log lines
node server.js --live – coloured live request feed on stdout
LOG_REQUESTS=1 – environment variable rather than a flag; dumps request bodies to logs/requests/ (rotated, keeping the last 50)

Common tweaks (in .env):

CLAUDE_MODEL=opus – default to Opus instead of Sonnet
CLAUDE_THINKING=max – crank thinking budget to 32K tokens
ENABLE_BASH_TOOL=1 – let the model run shell commands
ENABLE_WRITE_TOOLS=1 – let the model write/edit files
CLAUDE_FALLBACK=0 – pin the model, no auto-downgrade on overload

See .env.example in the repo for the full annotated list (~30 vars covering sessions, timeouts, tool toggles, input clipping, betas, observability).

# Debug endpoints

Bound to loopback only. Useful when something looks weird:

GET  /debug/status           runtime config + session count
GET  /debug/last-exchange    last full request/response
GET  /debug/last-request     last logged request body
GET  /debug/exchanges        recent exchange ring
GET  /debug/tools            tool catalog from the last request
GET  /debug/workspace        resolved per-project workspace cache
POST /debug/workspace/reset  clear the workspace cache
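
A tiny helper for poking these endpoints from a script on the same machine. It is a sketch with made-up names (debugUrl, debugCall) and assumes the default port. The response format isn't documented here, so the body is returned as raw text instead of being parsed.

```javascript
// Loopback address plus the default port; adjust if you changed the port.
const DEBUG_BASE = "http://127.0.0.1:4001";

// Build the full URL for a debug path such as "/debug/status".
function debugUrl(path) {
  return new URL(path, DEBUG_BASE).toString();
}

// Call a debug endpoint and return the raw response body as text.
// Use method "POST" for /debug/workspace/reset.
async function debugCall(path, method = "GET") {
  const res = await fetch(debugUrl(path), { method });
  if (!res.ok) throw new Error(`${method} ${path} returned HTTP ${res.status}`);
  return res.text();
}
```

For example, debugCall("/debug/status").then(console.log) prints the runtime config, and debugCall("/debug/workspace/reset", "POST") clears the workspace cache.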