# Cortex.cpp
Cortex.cpp is a local AI engine for running and customizing LLMs. It can be deployed as a standalone server or integrated into apps like Jan.ai.
Cortex's roadmap is to eventually support full OpenAI API equivalence.
It includes a Drogon server with request queues, model orchestration logic, hardware telemetry, and more, for production environments.
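For orientation, here is a minimal sketch of what a Drogon-based entry point can look like. The route, port, and handler below are illustrative assumptions, not Cortex's actual defaults:

```cpp
#include <drogon/drogon.h>

int main() {
    // Illustrative health-check route; Cortex's real endpoints live in
    // dedicated controllers rather than an inline lambda like this.
    drogon::app().registerHandler(
        "/healthz",
        [](const drogon::HttpRequestPtr &,
           std::function<void(const drogon::HttpResponsePtr &)> &&callback) {
            auto resp = drogon::HttpResponse::newHttpResponse();
            resp->setBody("ok");
            callback(resp);
        },
        {drogon::Get});

    // Port 3928 is an assumption for illustration.
    drogon::app().addListener("0.0.0.0", 3928).run();
}
```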
This guide walks you through how Cortex.cpp is designed, the codebase structure, and future plans.
## Usage
See Quickstart
## Interface
## Architecture
## Code Structure
```
├── app/
│   ├── controllers/
│   ├── models/
│   ├── services/
│   ├── ?engines/
│   │   ├── llama.cpp
│   │   ├── tensorrt-llm
│   │   └── ...
│   └── ...
├── CMakeLists.txt
├── config.json
├── Dockerfile
├── docker-compose.yml
├── README.md
└── ...
```
The `cortex-cpp` folder contains stateless implementations, most of which call into `llama.cpp` or `tensorrt-llm`, depending on the engine selected at runtime.
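That dispatch pattern looks roughly like the following sketch; the type and function names here are hypothetical, not Cortex's actual interfaces:

```cpp
#include <iostream>
#include <memory>
#include <string>

// Hypothetical engine abstraction: each inference backend implements the
// same stateless interface, and the server picks one per request.
struct InferenceEngine {
    virtual ~InferenceEngine() = default;
    virtual std::string ChatCompletion(const std::string &prompt) = 0;
};

struct LlamaCppEngine : InferenceEngine {
    std::string ChatCompletion(const std::string &prompt) override {
        return "[llama.cpp] completion for: " + prompt;  // stub
    }
};

struct TensorRtLlmEngine : InferenceEngine {
    std::string ChatCompletion(const std::string &prompt) override {
        return "[tensorrt-llm] completion for: " + prompt;  // stub
    }
};

// Runtime dispatch: select a backend by engine name.
std::unique_ptr<InferenceEngine> MakeEngine(const std::string &name) {
    if (name == "tensorrt-llm") return std::make_unique<TensorRtLlmEngine>();
    return std::make_unique<LlamaCppEngine>();  // default engine
}

int main() {
    auto engine = MakeEngine("llamacpp");
    std::cout << engine->ChatCompletion("Hello") << "\n";
}
```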
Here you will find the implementations of the stateless endpoints:
- `/chat/completion`
- `/audio`
- `/fine_tuning`
- `/embeddings`
- `/load_model`
- `/unload_model`
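For example, a `/chat/completion` request can be issued against a running server like this; the host, port, and model id below are assumptions, and the sketch happens to use Drogon's own `HttpClient` for brevity:

```cpp
#include <drogon/drogon.h>
#include <json/json.h>
#include <iostream>

int main() {
    // Host/port are assumptions for illustration.
    auto client = drogon::HttpClient::newHttpClient("http://127.0.0.1:3928");

    Json::Value body;
    body["model"] = "llama3-8b";  // hypothetical model id
    Json::Value msg;
    msg["role"] = "user";
    msg["content"] = "Hello!";
    body["messages"].append(msg);

    auto req = drogon::HttpRequest::newHttpJsonRequest(body);
    req->setMethod(drogon::Post);
    req->setPath("/chat/completion");

    // Send once the event loop is running, then print the raw response.
    drogon::app().getLoop()->queueInLoop([client, req]() {
        client->sendRequest(req, [](drogon::ReqResult result,
                                    const drogon::HttpResponsePtr &resp) {
            if (result == drogon::ReqResult::Ok && resp)
                std::cout << resp->getBody() << std::endl;
            drogon::app().quit();
        });
    });
    drogon::app().run();
}
```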
The folder also contains core hardware and model-management logic, such as CPU instruction-set detection and loading multiple models concurrently.
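As an illustration, instruction-set detection for choosing an engine build might look like the following sketch; the variant names are hypothetical, and `__builtin_cpu_supports` is a GCC/Clang builtin for x86 targets:

```cpp
#include <cstdio>

// Sketch: pick the most capable engine build the host CPU can run.
// The variant names are illustrative, not Cortex's actual artifacts.
const char *PickLlamaCppVariant() {
    if (__builtin_cpu_supports("avx512f")) return "llamacpp-avx512";
    if (__builtin_cpu_supports("avx2"))    return "llamacpp-avx2";
    if (__builtin_cpu_supports("avx"))     return "llamacpp-avx";
    return "llamacpp-noavx";  // conservative fallback
}

int main() {
    std::printf("selected engine variant: %s\n", PickLlamaCppVariant());
}
```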