Cortex Basic Usage

Cortex exposes an API server that runs at http://localhost:39281 by default.

The port can be changed in .cortexrc via the apiServerPort parameter.
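
For example, a minimal .cortexrc might look like the following sketch (the file is YAML; only apiServerPort is documented above, so treat the other key and the exact file location as assumptions):


# ~/.cortexrc — minimal sketch; apiServerHost is an assumption
apiServerHost: 127.0.0.1
apiServerPort: 39281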

Server

Start Cortex Server


# By default, the server starts on port 39281
cortex
# Start the server on a custom address and port
cortex -a <address> -p <port_number>
# Set a custom data folder
cortex --dataFolder <dataFolderPath>
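
Once the server is running, you can verify that it responds by calling any endpoint, for example the model list documented later in this guide:


# Quick check that the API server is up (endpoint documented below)
curl --request GET \
--url http://127.0.0.1:39281/v1/models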

Terminate Cortex Server


curl --request DELETE \
--url http://127.0.0.1:39281/processManager/destroy
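
After the destroy request, further API calls should fail to connect; a quick illustrative check:


# The connection should now be refused (illustrative check)
curl --max-time 2 http://127.0.0.1:39281/v1/models || echo "server stopped"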

Engines

Cortex currently supports three industry-standard engines: llama.cpp, ONNXRuntime, and TensorRT-LLM.

By default, Cortex installs the llama.cpp engine, which runs on most laptops, desktops, and operating systems.

For more information, see Engine Management.

List available engines


curl --request GET \
--url http://127.0.0.1:39281/v1/engines

Install an Engine (e.g., llama-cpp)


curl --request POST \
--url http://127.0.0.1:39281/v1/engines/install/llama-cpp
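
Once the installation finishes, re-running the engine list above should include llama-cpp. Piping through jq (if installed) makes the JSON easier to read:


# Confirm the engine is installed (re-uses the list endpoint above; jq is optional)
curl -s http://127.0.0.1:39281/v1/engines | jq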

Manage Models

Pull Model


curl --request POST \
--url http://127.0.0.1:39281/v1/models/pull \
--header 'Content-Type: application/json' \
--data '{
"model": "tinyllama:gguf",
"id": "my-custom-model-id"
}'

If a download is interrupted, re-sending the same request resumes it and fetches the remaining model files.

The downloaded models are saved to the Cortex Data Folder.
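
For example, re-issuing the pull request for the same model picks up where the interrupted download left off:


# Re-run the same pull to resume an interrupted download
curl --request POST \
--url http://127.0.0.1:39281/v1/models/pull \
--header 'Content-Type: application/json' \
--data '{
"model": "tinyllama:gguf"
}'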

Stop Model Download


curl --request DELETE \
--url http://127.0.0.1:39281/v1/models/pull \
--header 'Content-Type: application/json' \
--data '{
"taskId": "tinyllama:1b-gguf"
}'

List Models


curl --request GET \
--url http://127.0.0.1:39281/v1/models
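
To list just the model identifiers, you can filter the response with jq (the .data[].id path assumes an OpenAI-style response shape, which is an assumption here):


# Print only the model ids (assumes an OpenAI-style {"data": [{"id": ...}]} body)
curl -s http://127.0.0.1:39281/v1/models | jq -r '.data[].id'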

Delete Model


curl --request DELETE \
--url http://127.0.0.1:39281/v1/models/tinyllama:1b-gguf

Run Models

Start Model


# Start the model
curl --request POST \
--url http://127.0.0.1:39281/v1/models/start \
--header 'Content-Type: application/json' \
--data '{
"model": "tinyllama:1b-gguf"
}'

Create Chat Completion


# Invoke the chat completions endpoint
curl --request POST \
--url http://localhost:39281/v1/chat/completions \
--header 'Content-Type: application/json' \
--data '{
"messages": [
{
"role": "user",
"content": "Write a Haiku about cats and AI"
}
],
"model": "tinyllama:1b-gguf",
"stream": false
}'
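
Setting "stream": true makes the endpoint stream tokens back as they are generated (as server-sent events, assuming the usual OpenAI-compatible streaming behavior):


# Same request, but streaming the response as it is generated
curl --request POST \
--url http://localhost:39281/v1/chat/completions \
--header 'Content-Type: application/json' \
--data '{
"messages": [
{
"role": "user",
"content": "Write a Haiku about cats and AI"
}
],
"model": "tinyllama:1b-gguf",
"stream": true
}'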

Stop Model


curl --request POST \
--url http://127.0.0.1:39281/v1/models/stop \
--header 'Content-Type: application/json' \
--data '{
"model": "tinyllama:1b-gguf"
}'
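
Putting it all together, a minimal end-to-end session built only from the endpoints above might look like the sketch below (it reuses the tinyllama:1b-gguf tag from the start/stop examples and assumes each step completes before the next is issued):


#!/bin/sh
# End-to-end sketch: pull, start, chat, and stop a model via the Cortex API
BASE=http://127.0.0.1:39281

# Pull the model (resumes automatically if a previous download was interrupted)
curl -s --request POST --url $BASE/v1/models/pull \
--header 'Content-Type: application/json' \
--data '{"model": "tinyllama:1b-gguf"}'

# Start the model
curl -s --request POST --url $BASE/v1/models/start \
--header 'Content-Type: application/json' \
--data '{"model": "tinyllama:1b-gguf"}'

# Request a chat completion
curl -s --request POST --url $BASE/v1/chat/completions \
--header 'Content-Type: application/json' \
--data '{"messages": [{"role": "user", "content": "Write a Haiku about cats and AI"}], "model": "tinyllama:1b-gguf", "stream": false}'

# Stop the model
curl -s --request POST --url $BASE/v1/models/stop \
--header 'Content-Type: application/json' \
--data '{"model": "tinyllama:1b-gguf"}'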