LightPhon is an OpenAI-compatible API endpoint backed by a decentralized network of GPU nodes. Point any OpenAI client at LightPhon and it handles routing, load balancing, and payment automatically.
LightPhon is an OpenAI-compatible proxy that sits between your application and a decentralized network of GPU nodes running open-source LLMs.
Think of it like this: instead of sending requests to api.openai.com, you send them to lightphon.com. The API format is identical: same endpoints, same JSON structure, same SDKs. LightPhon takes your request, picks the best available GPU node, and returns the response.
Point base_url to https://lightphon.com/v1 and api_key to your token. Everything else stays the same.
Get up and running in 3 steps:
1. Go to lightphon.com/app.html, register, and deposit some sats via Lightning or card.
2. Open the Model Router tab (Step 1) and click + New Key. Copy the key: it's your apiKey.
3. Use any OpenAI-compatible client. Here's a quick Python example:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://lightphon.com/v1",
    api_key="your-api-key-here"
)

response = client.chat.completions.create(
    model="auto",  # router picks the best model
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```
That's it. Setting "model": "auto" tells the router to pick the best available model automatically.
There are two ways to authenticate with the LightPhon API:
Generate an API Key from the Model Router (Step 1) in the web app. This key never expires and works like a standard OpenAI API key.
Use it as a Bearer token in the Authorization header:

```
Authorization: Bearer lp_abc123def456...
```

Or pass it as apiKey in any OpenAI SDK:

```python
client = OpenAI(base_url="https://lightphon.com/v1", api_key="lp_abc123def456...")
```
The Agent Token embeds authentication directly in the URL. Useful for tools that don't support custom headers (some IDE plugins, automation scripts, etc.).
The base URL carries the token:

```
https://lightphon.com/api/agent/<your-token>
```

The apiKey field can be anything, since the token is in the URL:

```python
client = OpenAI(
    base_url="https://lightphon.com/api/agent/<your-token>",
    api_key="x"
)
```
Core endpoints follow the OpenAI convention under /v1/; agent and router endpoints live under /api/:
| Method | Path | Description |
|---|---|---|
| GET | /v1/models | List all available models |
| GET | /v1/models/{id} | Get details for a specific model |
| POST | /v1/chat/completions | Create a chat completion (main endpoint) |
| POST | /api/agent/&lt;token&gt;/v1/chat/completions | Chat completion via Agent Token URL |
| POST | /api/models/route | Advanced: query the router directly |
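The two GET endpoints above can be exercised with any HTTP client. A minimal sketch using `requests` (the list-object shape follows the OpenAI convention, `{"object": "list", "data": [...]}`; adjust if LightPhon's payload differs):

```python
import requests

BASE_URL = "https://lightphon.com/v1"
HEADERS = {"Authorization": "Bearer your-api-key"}

def model_ids(payload):
    """Pull the model ids out of an OpenAI-style /v1/models list object."""
    return [m["id"] for m in payload.get("data", [])]

def list_models():
    """GET /v1/models and return the ids of all available models."""
    resp = requests.get(f"{BASE_URL}/models", headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return model_ids(resp.json())

def get_model(model_id):
    """GET /v1/models/{id} for a single model's details."""
    resp = requests.get(f"{BASE_URL}/models/{model_id}", headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return resp.json()
```

Since the node set changes over time, calling `list_models()` before each session is a reasonable way to discover what is currently online.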
The main endpoint. Send a conversation and get a response, identical to OpenAI's chat completions API.
```http
POST /v1/chat/completions
Authorization: Bearer <api-key>
Content-Type: application/json

{
  "model": "auto",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Explain Bitcoin Lightning in 3 lines." }
  ],
  "max_tokens": 512,
  "temperature": 0.7,
  "stream": false
}
```
The response:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1717171200,
  "model": "Qwen/Qwen2.5-Coder-14B-Instruct-GGUF",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Bitcoin Lightning is a Layer 2 payment protocol..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 86,
    "total_tokens": 110
  }
}
```
- model: model ID from /v1/models, or "auto" for automatic routing
- messages: array of { role, content } objects
- max_tokens: maximum tokens to generate (default: 4096)
- temperature: sampling temperature 0–2 (default: 0.7)
- stream: true for Server-Sent Events streaming
- top_p, frequency_penalty, presence_penalty: standard OpenAI params

Set "stream": true to receive tokens in real time as they are generated.
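These parameters map one-to-one onto the JSON request body. A small helper sketch using the documented defaults (the helper name is illustrative, not part of any SDK):

```python
def chat_body(messages, model="auto", max_tokens=4096,
              temperature=0.7, stream=False, **extra):
    """Build a /v1/chat/completions request body.

    Defaults mirror the documented ones; `extra` passes through standard
    OpenAI params such as top_p, frequency_penalty, and presence_penalty.
    """
    body = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": stream,
    }
    body.update(extra)
    return body

# Example: a short, low-temperature completion
body = chat_body(
    [{"role": "user", "content": "Hi"}],
    max_tokens=64,
    temperature=0.2,
    top_p=0.9,
)
```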
```
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```
```python
from openai import OpenAI

client = OpenAI(base_url="https://lightphon.com/v1", api_key="your-key")

stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
```shell
curl -N https://lightphon.com/v1/chat/completions \
  -H "Authorization: Bearer <api-key>" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hi"}],"stream":true}'
```
The Model Router automatically selects the best available node and model for your request. You don't have to hardcode a model name: just use "auto".
When you set "model": "auto", the router picks the best available node and model based on availability, speed, and capabilities.
The response includes a routing_info field showing which node and model were used.
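One way to inspect routing_info is to call the endpoint with a raw HTTP client, since it is an extra field on top of the OpenAI schema. A sketch assuming routing_info sits at the top level of the response body (its exact shape isn't specified here):

```python
import requests

def extract(data):
    """Split a chat-completion response body into (content, routing_info).

    Assumes routing_info is a top-level extra field, as described above;
    returns None for it if the server omits the field.
    """
    content = data["choices"][0]["message"]["content"]
    return content, data.get("routing_info")

def chat_with_routing(base_url, api_key, messages):
    """POST /v1/chat/completions with model "auto", return (content, routing_info)."""
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": "auto", "messages": messages},
        timeout=120,
    )
    resp.raise_for_status()
    return extract(resp.json())
```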
You can query the router directly to find models matching specific criteria:
```http
POST /api/models/route
Authorization: Bearer <api-key>
Content-Type: application/json

{
  "providers": ["meta", "mistral", "deepseek"],
  "min_params_b": 7,
  "max_params_b": 72,
  "capabilities": ["code", "reasoning"],
  "prefer_size": "balanced",
  "limit": 5
}
```
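This query can be issued with a plain HTTP client; note the path lives under /api/, not /v1/. A hedged sketch (criteria names as in the request above; the response shape isn't documented here, so it's returned as-is):

```python
import requests

def route_models(base_url, api_key, **criteria):
    """POST /api/models/route with filter criteria, return the raw response body."""
    resp = requests.post(
        f"{base_url}/api/models/route",
        headers={"Authorization": f"Bearer {api_key}"},
        json=criteria,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Example: up to 5 balanced-size 7B-72B models good at code and reasoning
# matches = route_models(
#     "https://lightphon.com", "your-api-key",
#     providers=["meta", "mistral", "deepseek"],
#     min_params_b=7, max_params_b=72,
#     capabilities=["code", "reasoning"],
#     prefer_size="balanced", limit=5,
# )
```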
- providers: meta, mistral, deepseek, google, microsoft, alibaba, ...
- min_params_b / max_params_b: parameter range in billions
- size_classes: tiny, small, medium, large, xlarge
- capabilities: chat, code, reasoning, math, vision, tool_use, rag, agent, ...
- families: llama, qwen, mistral, gemma, ...
- min_context_length: minimum context window
- prefer_size: smallest | largest | balanced
- known_only: true to only return cataloged models

When you call GET /v1/models, you'll see models from two types of sources:
These are real hardware machines running open-source LLMs (LLaMA, Qwen, Mistral, DeepSeek, etc.) via the LightPhon node software. They connect to the server over WebSocket and serve inference requests in real-time. Anyone can run a GPU node β see the Download page.
OpenClaw is an external OpenAI-compatible AI platform that the LightPhon server can integrate as a virtual node. When OpenClaw is enabled, the server queries its /v1/models endpoint at startup and makes those models available in the network alongside real GPU nodes.
The owned_by field will show "OpenClaw" for these models.
The router treats both sources equally when selecting the best model. If an OpenClaw model is the best match for your request, the server forwards to OpenClaw directly via HTTP (no WebSocket needed). The response format is always the same regardless of source.
OpenClaw and LightPhon can work together in two directions:
If you use OpenClaw as your AI assistant, you can add LightPhon as one of its model providers. This way, OpenClaw sends inference requests to the LightPhon network.
```yaml
# In your OpenClaw models configuration:
models:
  providers:
    - name: lightphon
      api: openai-completions
      baseUrl: https://lightphon.com/v1
      apiKey: "your-api-key"
      models:
        - name: auto  # router picks best model
        - name: Qwen/Qwen2.5-Coder-14B-Instruct-GGUF  # or a specific model
```
Use auto as the model name to let the router pick the best model, or specify a model ID from GET /v1/models to always use a particular model.
Server administrators can configure LightPhon to pull models from an OpenClaw instance, making them available alongside local GPU nodes.
```ini
OPENCLAW_ENABLED=true
OPENCLAW_BASE_URL=http://<openclaw-host>:<port>
OPENCLAW_API_KEY=           # optional, if OpenClaw requires auth
OPENCLAW_NODE_NAME=OpenClaw
OPENCLAW_PRICE=0            # cost in sats/min (0 = free)
OPENCLAW_TIMEOUT=120        # request timeout in seconds
```
1. On startup, the LightPhon server calls GET /v1/models on the OpenClaw instance and registers its models as a virtual node.
2. The Model Router scores all sources equally (local GPU nodes and OpenClaw) and picks the best match.
3. If an OpenClaw model is selected, the server forwards via POST /v1/chat/completions over HTTP (no WebSocket).
4. The response is returned in standard OpenAI format with an added routing_info field showing the source.
LightPhon works with any tool or library that supports the OpenAI chat completions API. Just change the base URL and API key.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://lightphon.com/v1",
    api_key="your-api-key"
)

resp = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(resp.choices[0].message.content)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://lightphon.com/v1",
  apiKey: "your-api-key"
});

const resp = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Hello!" }]
});

console.log(resp.choices[0].message.content);
```
```shell
curl https://lightphon.com/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role":"user","content":"Hello!"}]
  }'
```
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://lightphon.com/v1",
    api_key="your-api-key",
    model="auto"
)

print(llm.invoke("Explain Lightning in 2 lines").content)
```
Most AI coding tools let you configure a custom OpenAI endpoint. Use these settings:
```
Base URL : https://lightphon.com/v1
API Key  : your-api-key
Model    : auto
```
Or, if the tool only supports a single URL (no separate API key field), use the Agent Token URL:
```
Base URL : https://lightphon.com/api/agent/<your-token>
API Key  : x (any non-empty string)
Model    : auto
```
Errors follow the standard OpenAI format:
```json
{
  "error": {
    "message": "No suitable node found for the requested model",
    "type": "invalid_request_error",
    "code": "model_not_available"
  }
}
```
| HTTP | Meaning | Fix |
|---|---|---|
| 401 | Invalid or missing token | Check your API key or Agent Token URL |
| 402 | Insufficient balance | Deposit more sats to your wallet |
| 404 | Model not found | Use "auto" or check /v1/models for available models |
| 503 | No nodes available | All nodes are offline; wait or check network status |
| 504 | Inference timeout | Reduce max_tokens or try a smaller model |
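The table suggests which failures are worth retrying on the client side. A sketch of one way to handle them (the remedy strings paraphrase the table; treating only 503/504 as transient is a judgment call):

```python
import requests

RETRYABLE = {503, 504}  # no nodes available / inference timeout: transient

REMEDY = {
    401: "check your API key or Agent Token URL",
    402: "deposit more sats to your wallet",
    404: 'use "auto" or check /v1/models for available models',
    503: "all nodes are offline; wait or check network status",
    504: "reduce max_tokens or try a smaller model",
}

def chat_with_retry(base_url, api_key, body, attempts=3):
    """POST /v1/chat/completions, retrying only the transient failures."""
    for attempt in range(attempts):
        resp = requests.post(
            f"{base_url}/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json=body,
            timeout=180,
        )
        if resp.ok:
            return resp.json()
        if resp.status_code not in RETRYABLE or attempt == attempts - 1:
            hint = REMEDY.get(resp.status_code, "unexpected error")
            raise RuntimeError(f"HTTP {resp.status_code}: {hint}")
```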
**What is "auto" and should I use it?** Yes, for most use cases. "auto" tells the Model Router to pick the best available model based on availability, speed, and capabilities. You can also specify a model ID if you need a specific model.
**What's the difference between an API Key and an Agent Token?** Both authenticate your requests. The API Key goes in the Authorization: Bearer header (standard OpenAI style). The Agent Token is embedded in the URL, which is useful for tools that don't support custom headers. Use whichever is more convenient.
**What is OpenClaw?** OpenClaw is an external AI platform. The LightPhon server can integrate it as a virtual node, making its models available alongside local GPU nodes. OpenClaw models are marked with a badge in the model list, and the router treats them the same as local models.
**Do I pay directly from my Lightning wallet?** Not directly. You deposit sats to your LightPhon wallet (via Lightning invoice or EUR card), and the server deducts from your balance when you use the API. No external wallet interaction is needed after funding.
**Are my conversations stored?** No. Conversations are routed directly to the GPU node; no data is stored on the LightPhon server. When the session ends, everything is gone.
**Which models are available?** It depends on which nodes are online. Call GET /v1/models or check the Model Router in the web app to see the current list. Common models include LLaMA, Qwen, Mistral, DeepSeek, Gemma, and more.
**Can I run my own LightPhon server?** Yes! LightPhon is fully open source. See the GitHub repository for deployment instructions.