IntentRouter is a cognitive gateway that turns LLM capabilities into infrastructure. Push logic to Git repositories, call one OpenAI-compatible endpoint, and execute on any provider with zero prompt drift. It's essentially a Skills runtime.
Base prompts grow uncontrollably. Tokens are wasted on every request. Nobody knows exactly what's inside the system prompt anymore. Copy-paste engineering is not high-scale architecture.
Thousands of repetitive tokens in every payload.
One tiny tweak can silently degrade performance.
Sync reasoning logic and policies via Git.
Send messages to a single standard endpoint.
Our JIT compiler matches intents, applies schemas, and routes to models.
Bring your own keys. Run any model. Zero lock-in.
We never store your secrets. You maintain direct relationships with LLM providers.
Switch between OpenAI, Anthropic, Gemini, and Ollama with a single parameter change.
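As a sketch of what that one-parameter switch looks like: the payload shape stays identical and only the routing fields change. The model names and the keyless Ollama call below are illustrative assumptions, not documented defaults.

```python
# Same payload shape for every provider; only provider/model/api_key change.
base = {
    "messages": [{"role": "user", "content": "Summarize this ticket."}],
    "skill": {"ref": "local://examples/skills/customer-support-agent"},
}

# Model names are illustrative; substitute whatever your provider offers.
openai_call = {**base, "provider": "openai", "model": "gpt-4",
               "api_key": "YOUR_OPENAI_KEY"}
anthropic_call = {**base, "provider": "anthropic", "model": "claude-3-5-sonnet",
                  "api_key": "YOUR_ANTHROPIC_KEY"}
ollama_call = {**base, "provider": "ollama", "model": "llama3"}  # local model, assumed keyless

print(openai_call["provider"], anthropic_call["provider"], ollama_call["provider"])
```

Because the skill reference and messages are untouched, swapping providers is a deploy-time config change rather than a prompt rewrite.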
Only inject necessary logic. Drastically reduce your token consumption and latency.
Load skills directly from Git repos per request. Pin versions via SHAs or tags for production stability.
Every response includes skillz_metadata. Track intent matches, provider latencies, and total duration across your cognitive stack.
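A sketch of reading that metadata from a response. Only the presence of the `skillz_metadata` block is stated above; the individual field names below are illustrative assumptions:

```python
# Hypothetical response shape; field names inside skillz_metadata are assumed.
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "Refund approved."}}],
    "skillz_metadata": {
        "matched_intent": "refund_request",  # hypothetical field name
        "provider_latency_ms": 412,          # hypothetical field name
        "total_duration_ms": 487,            # hypothetical field name
    },
}

meta = sample_response.get("skillz_metadata", {})
# Gateway overhead = total wall-clock time minus the provider's share.
overhead_ms = meta["total_duration_ms"] - meta["provider_latency_ms"]
print(f"intent={meta['matched_intent']} gateway_overhead={overhead_ms}ms")
```

Logging these fields per request makes it easy to attribute latency to the provider versus the routing layer.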
import requests

payload = {
    "provider": "openai",
    "model": "gpt-4",
    "api_key": "YOUR_OPENAI_KEY",
    "messages": [
        {"role": "user", "content": "Triage this support ticket: Refund request for #8812"}
    ],
    "skill": {
        "ref": "local://examples/skills/customer-support-agent"
    }
}

response = requests.post(
    "https://skill.intentrouter.com/v1/chat/completions",
    json=payload
)
print(response.json())
Reference these production-ready skills directly in your API calls.
local://examples/skills/customer-support-agent
Full ticket triage, sentiment analysis, and solution routing logic.
local://examples/skills/code-review-assistant
Security-first linting and architectural review logic.
local://examples/skills/sql-query-builder
Natural language to secure, optimized SQL query generation.
local://examples/skills/data-analysis-assistant
Automated EDA and business metric extraction logic.
Test your skills in real-time. Results are fetched directly from the runtime gateway.
No. We act as a stateless processing layer. Your keys are passed in the request and used immediately to communicate with the provider, then discarded.
Everything. Supported providers include OpenAI, Anthropic, Gemini, Groq, and local models via Ollama. If it has a chat completion interface, we can route to it.
Just push to your Git repo. The runtime pulls the latest version (or a specific tag/commit if specified) when your request hits our endpoint.
Minimal. We use a high-performance JIT compiler to prepare the skill logic before streaming it to the model. In many cases, the reduced token count leads to faster Time-To-First-Token (TTFT).