Introduction

Infersec is a secure AI provider at its core. It provides its service and peripheral tools to allow everyone from hobbyists through to large organisations to build secure inferencing APIs using hardware they control. Anything from Macbooks through to nVidia rack servers can be used with Conduit — our glue that connects private customer hardware and infrastructure to Infersec's secure and privacy-focused platform. Infersec then allows users to expose authenticated APIs that routes prompts and other requests through to intelligently-selected Conduit inference sources.

Inference endpoints created with Infersec are highly compatible, supporting both OpenAI and Anthropic-style API schemas.

Why Infersec?

Own your compute — run models on hardware you control. No third-party hosting, no shared infrastructure.
Privacy by design — the platform never logs prompt content or tool-call payloads. Your data stays in a tight loop between your machines and your consumers.
OpenAI and Anthropic compatible — inference endpoints expose standard /v1/chat/completions and /v1/messages interfaces. Drop-in compatible with most AI tooling.
Intelligent routing — distribute requests across multiple inference sources using first-available or round-robin policies. Inference sources are selected based on health and priority.
EU-based — infrastructure and data handling operate under EU regulations.

Core concepts

Models

A model is a registered LLM definition pulled from HuggingFace. Models are format-specific (GGUF, safetensors, pytorch, AWQ, GPTQ) and determine which engine can serve them. GGUF models run on llama.cpp; all others run on vLLM.

Inference Sources

An inference source represents a single compute instance — your hardware — running one model on one engine. Inference sources are created in the console and connected to your machine via Conduit. Each inference source reports its state (downloading files, booting, online, error) back to the platform.

Inference Endpoints

An inference endpoint is the public-facing URL that consumers connect to. It routes incoming requests to one or more inference sources using a configurable routing method. Inference endpoints are OpenAI and Anthropic-compatible, authenticated via API keys, and can be enabled or paused at any time.

Conduit

Conduit is the self-hostable agent that bridges your local hardware to the Infersec cloud. It downloads model files, manages the LLM engine lifecycle, and proxies inference requests. Conduit runs as a lightweight Node.js process or Docker container on your machine.

Architecture overview

  Consumer (Opencode, AnythingLLM, etc.)
       │
       ▼
  Infersec API (cloud)
       │
       ├── Endpoint routing
       │
       ▼
  Intelligent Routing
       │
       ├──▶ Conduit agent → llama.cpp / vLLM → Inference Source A
       └──▶ Conduit agent → llama.cpp / vLLM → Inference Source B

Requests arrive at the Infersec API, which selects an available inference source based on the inference endpoint's routing policy. The request is dispatched to the target Conduit agent via Redis, streamed through the local LLM engine, and returned to the consumer.

Next steps

Ready to set up your first pipeline? Read the Getting Started guide.