The Resource Management API lets you manage your Infersec infrastructure programmatically. Create models, deploy inference sources, and expose endpoints - all via a REST API authenticated with your Infersec API key.

For the full interactive API reference, see the API documentation.

Authentication

All Resource Management endpoints require an API key. Pass it via the Authorization header or the x-api-key header:

Authorization: Bearer <your-api-key>
x-api-key: <your-api-key>

Create API keys in the console under Administration - API Keys.

Base URL

https://api.infersec.ai

Models

Models are LLM definitions sourced from HuggingFace. They define which weights are available for inference.

List models

GET /api/v1/models

Returns all models in your account.

Create a model

POST /api/v1/models

Register a new model from HuggingFace:

{
    "name": "My Qwen Model",
    "provider": "huggingface",
    "slug": "barozp/qwen3.6-28b-reap20-a3b-gguf",
    "format": "gguf"
}

The slug must be a fully qualified HuggingFace repo (owner/repo). The format determines which engine can serve the model - gguf models use llama.cpp, all others use vllm.

Update a model

PATCH /api/v1/models/:modelID

Delete a model

DELETE /api/v1/models/:modelID

Inference Sources

An inference source represents a single compute instance (your hardware) running one model on one engine. Sources are connected to your machine via Conduit.

List sources

GET /api/v1/sources

Create a source

POST /api/v1/sources
{
    "name": "My GPU Server",
    "engine": "llama.cpp",
    "modelID": "<model-id>",
    "contextLength": 32768,
    "parallelism": 1,
    "quantizationLabel": "Q4_K_M"
}
  • engine - llama.cpp or vllm
  • contextLength - max context window (model-dependent)
  • parallelism - number of concurrent request slots
  • quantizationLabel - quantization variant (GGUF only)

Update a source

PATCH /api/v1/sources/:sourceID

Delete a source

DELETE /api/v1/sources/:sourceID

Inference Endpoints

An inference endpoint exposes a public URL that routes requests to one or more inference sources.

List endpoints

GET /api/v1/endpoints

Create an endpoint

POST /api/v1/endpoints
{
    "name": "Production API",
    "sourceIDs": ["<source-id-1>", "<source-id-2>"],
    "routingMethod": "round-robin",
    "enabled": true
}

Routing methods:

  • first-available - routes to the first online source
  • round-robin - distributes across all online sources

Update an endpoint

PATCH /api/v1/endpoints/:endpointID

Delete an endpoint

DELETE /api/v1/endpoints/:endpointID

Using endpoints

Once an endpoint is created and enabled, it exposes OpenAI and Anthropic-compatible APIs:

# OpenAI compatible
POST /api/inferencing/:endpointID/oai/v1/chat/completions

# Anthropic compatible
POST /api/inferencing/:endpointID/anthropic/v1/messages

See Getting Started for full usage examples with various clients.

Full reference

For complete request/response schemas and the interactive try-it-out console, visit the API documentation.