Cortex Basic Usage

Cortex exposes an API server that runs at http://localhost:39281 by default.

The port can be changed in .cortexrc via the apiServerPort parameter.
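
For example, a minimal .cortexrc might look like the following sketch (the file is YAML; only apiServerPort is documented above, so treat the other key and the exact file location as assumptions):


# ~/.cortexrc — minimal sketch; apiServerHost is an assumption
apiServerHost: 127.0.0.1
apiServerPort: 39281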

Server

Start Cortex Server


# By default, the server starts on port 39281
cortex
# Start the server on a custom address and port
cortex -a <address> -p <port_number>
# Set a custom data folder
cortex --dataFolder <dataFolderPath>
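
Once the server is running, you can verify that it responds by calling any endpoint, for example the model list documented later in this guide:


# Quick check that the API server is up (endpoint documented below)
curl --request GET \
--url http://127.0.0.1:39281/v1/models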

Terminate Cortex Server


curl --request DELETE \
--url http://127.0.0.1:39281/processManager/destroy
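
After the destroy request, further API calls should fail to connect; a quick illustrative check:


# The connection should now be refused (illustrative check)
curl --max-time 2 http://127.0.0.1:39281/v1/models || echo "server stopped"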

Engines

Cortex currently supports three industry-standard engines: llama.cpp, ONNXRuntime, and TensorRT-LLM.

By default, Cortex installs the llama.cpp engine, which runs on most laptops, desktops, and operating systems.

For more information, see Engine Management.

List available engines


curl --request GET \
--url http://127.0.0.1:39281/v1/engines

Install an Engine (e.g., llama-cpp)


curl --request POST \
--url http://127.0.0.1:39281/v1/engines/install/llama-cpp
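
Once the installation finishes, re-running the engine list above should include llama-cpp. Piping through jq (if installed) makes the JSON easier to read:


# Confirm the engine is installed (re-uses the list endpoint above; jq is optional)
curl -s http://127.0.0.1:39281/v1/engines | jq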

Manage Models

Pull Model


curl --request POST \
--url http://127.0.0.1:39281/v1/models/pull \
--header 'Content-Type: application/json' \
--data '{
"model": "tinyllama:gguf",
"id": "my-custom-model-id"
}'

If a download is interrupted, re-sending the same request resumes it and fetches the remaining model files.

The downloaded models are saved to the Cortex Data Folder.
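
For example, re-issuing the pull request for the same model picks up where the interrupted download left off:


# Re-run the same pull to resume an interrupted download
curl --request POST \
--url http://127.0.0.1:39281/v1/models/pull \
--header 'Content-Type: application/json' \
--data '{
"model": "tinyllama:gguf"
}'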

Stop Model Download


curl --request DELETE \
--url http://127.0.0.1:39281/v1/models/pull \
--header 'Content-Type: application/json' \
--data '{
"taskId": "tinyllama:1b-gguf"
}'

List Models


curl --request GET \
--url http://127.0.0.1:39281/v1/models
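
To list just the model identifiers, you can filter the response with jq (the .data[].id path assumes an OpenAI-style response shape, which is an assumption here):


# Print only the model ids (assumes an OpenAI-style {"data": [{"id": ...}]} body)
curl -s http://127.0.0.1:39281/v1/models | jq -r '.data[].id'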

Delete Model


curl --request DELETE \
--url http://127.0.0.1:39281/v1/models/tinyllama:1b-gguf

Run Models

Start Model


# Start the model
curl --request POST \
--url http://127.0.0.1:39281/v1/models/start \
--header 'Content-Type: application/json' \
--data '{
"model": "tinyllama:1b-gguf"
}'

Create Chat Completion


# Invoke the chat completions endpoint
curl --request POST \
--url http://localhost:39281/v1/chat/completions \
--header 'Content-Type: application/json' \
--data '{
"messages": [
{
"role": "user",
"content": "Write a Haiku about cats and AI"
}
],
"model": "tinyllama:1b-gguf",
"stream": false
}'
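
Setting "stream": true makes the endpoint stream tokens back as they are generated (as server-sent events, assuming the usual OpenAI-compatible streaming behavior):


# Same request, but streaming the response as it is generated
curl --request POST \
--url http://localhost:39281/v1/chat/completions \
--header 'Content-Type: application/json' \
--data '{
"messages": [
{
"role": "user",
"content": "Write a Haiku about cats and AI"
}
],
"model": "tinyllama:1b-gguf",
"stream": true
}'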

Stop Model


curl --request POST \
--url http://127.0.0.1:39281/v1/models/stop \
--header 'Content-Type: application/json' \
--data '{
"model": "tinyllama:1b-gguf"
}'
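
Putting it all together, a minimal end-to-end session built only from the endpoints above might look like the sketch below (it reuses the tinyllama:1b-gguf tag from the start/stop examples and assumes each step completes before the next is issued):


#!/bin/sh
# End-to-end sketch: pull, start, chat, and stop a model via the Cortex API
BASE=http://127.0.0.1:39281

# Pull the model (resumes automatically if a previous download was interrupted)
curl -s --request POST --url $BASE/v1/models/pull \
--header 'Content-Type: application/json' \
--data '{"model": "tinyllama:1b-gguf"}'

# Start the model
curl -s --request POST --url $BASE/v1/models/start \
--header 'Content-Type: application/json' \
--data '{"model": "tinyllama:1b-gguf"}'

# Request a chat completion
curl -s --request POST --url $BASE/v1/chat/completions \
--header 'Content-Type: application/json' \
--data '{"messages": [{"role": "user", "content": "Write a Haiku about cats and AI"}], "model": "tinyllama:1b-gguf", "stream": false}'

# Stop the model
curl -s --request POST --url $BASE/v1/models/stop \
--header 'Content-Type: application/json' \
--data '{"model": "tinyllama:1b-gguf"}'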