The Resource Management API lets you manage your Infersec infrastructure programmatically. Create models, deploy inference sources, and expose endpoints - all via a REST API authenticated with your Infersec API key.
For the full interactive API reference, see the API documentation.
Authentication
All Resource Management endpoints require an API key. Pass it via the Authorization header or the x-api-key header:
Authorization: Bearer <your-api-key>
x-api-key: <your-api-key>
Create API keys in the console under Administration - API Keys.
Base URL
https://api.infersec.ai
Models
Models are LLM definitions sourced from HuggingFace. They define which weights are available for inference.
List models
GET /api/v1/models
Returns all models in your account.
Create a model
POST /api/v1/models
Register a new model from HuggingFace:
{
"name": "My Qwen Model",
"provider": "huggingface",
"slug": "barozp/qwen3.6-28b-reap20-a3b-gguf",
"format": "gguf"
}
The slug must be a fully qualified HuggingFace repo (owner/repo). The format determines which engine can serve the model - gguf models use llama.cpp, all others use vllm.
Update a model
PATCH /api/v1/models/:modelID
Delete a model
DELETE /api/v1/models/:modelID
Inference Sources
An inference source represents a single compute instance (your hardware) running one model on one engine. Sources are connected to your machine via Conduit.
List sources
GET /api/v1/sources
Create a source
POST /api/v1/sources
{
"name": "My GPU Server",
"engine": "llama.cpp",
"modelID": "<model-id>",
"contextLength": 32768,
"parallelism": 1,
"quantizationLabel": "Q4_K_M"
}
engine-llama.cpporvllmcontextLength- max context window (model-dependent)parallelism- number of concurrent request slotsquantizationLabel- quantization variant (GGUF only)
Update a source
PATCH /api/v1/sources/:sourceID
Delete a source
DELETE /api/v1/sources/:sourceID
Inference Endpoints
An inference endpoint exposes a public URL that routes requests to one or more inference sources.
List endpoints
GET /api/v1/endpoints
Create an endpoint
POST /api/v1/endpoints
{
"name": "Production API",
"sourceIDs": ["<source-id-1>", "<source-id-2>"],
"routingMethod": "round-robin",
"enabled": true
}
Routing methods:
first-available- routes to the first online sourceround-robin- distributes across all online sources
Update an endpoint
PATCH /api/v1/endpoints/:endpointID
Delete an endpoint
DELETE /api/v1/endpoints/:endpointID
Using endpoints
Once an endpoint is created and enabled, it exposes OpenAI and Anthropic-compatible APIs:
# OpenAI compatible
POST /api/inferencing/:endpointID/oai/v1/chat/completions
# Anthropic compatible
POST /api/inferencing/:endpointID/anthropic/v1/messages
See Getting Started for full usage examples with various clients.
Full reference
For complete request/response schemas and the interactive try-it-out console, visit the API documentation.