model.yaml

warning

🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.

Cortex.cpp uses a model.yaml file to specify the configuration for running a model. Models can be downloaded from the Cortex Model Hub or Hugging Face repositories. Once downloaded, the model data is parsed and stored in the models folder.

model.list

The model.list file acts as a registry for all model files used by Cortex.cpp. It tracks every downloaded and imported model, one entry per model. Each time a model is downloaded or imported, Cortex.cpp automatically appends an entry to model.list in the following format:


# Downloaded model
<model-id> <author_repo-id> <branch-name> <path-to-model.yaml> <model-alias>
# Imported model
<model-id> local imported <path-to-model-id.yaml> <model-alias>
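
The entry format above can be read back with a few lines of Python. This is a minimal sketch, not Cortex.cpp's own parser: the names `ModelListEntry` and `parse_model_list_line` are hypothetical, and it assumes fields are separated by whitespace, that paths contain no spaces, and that lines starting with `#` are comments.

```python
from typing import NamedTuple, Optional


class ModelListEntry(NamedTuple):
    """One line of model.list (hypothetical helper type)."""
    model_id: str
    author_repo_id: str  # "local" for imported models
    branch_name: str     # "imported" for imported models
    yaml_path: str
    alias: str

    @property
    def is_imported(self) -> bool:
        # Imported models use the fixed markers "local imported".
        return self.author_repo_id == "local" and self.branch_name == "imported"


def parse_model_list_line(line: str) -> Optional[ModelListEntry]:
    """Parse one line; return None for blank lines and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    return ModelListEntry(*fields)
```

Both entry kinds have five fields, so a single parser covers them; the `is_imported` property only inspects the second and third fields.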

model.yaml High Level Structure

Here is an example of the model.yaml format:


# BEGIN GENERAL METADATA
model: gemma-2-9b-it-Q8_0 ## Model ID which is used for request construct - should be unique between models (author / quantization)
name: Llama 3.1 ## metadata.general.name
version: 1 ## metadata.version
sources: ## can be universal protocol (models://) OR absolute local file path (file://) OR https remote URL (https://)
- models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf # for downloaded model from HF
- files://C:/Users/user/Downloads/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf # for imported model
# END GENERAL METADATA
# BEGIN INFERENCE PARAMETERS
## BEGIN REQUIRED
stop: ## tokenizer.ggml.eos_token_id
  - <|end_of_text|>
  - <|eot_id|>
  - <|eom_id|>
## END REQUIRED
## BEGIN OPTIONAL
stream: true # Default true?
top_p: 0.9 # Ranges: 0 to 1
temperature: 0.6 # Ranges: 0 to 1
frequency_penalty: 0 # Ranges: 0 to 1
presence_penalty: 0 # Ranges: 0 to 1
max_tokens: 8192 # Should default to the context length
## END OPTIONAL
# END INFERENCE PARAMETERS
# BEGIN MODEL LOAD PARAMETERS
## BEGIN REQUIRED
prompt_template: |+ # tokenizer.chat_template
  <|begin_of_text|><|start_header_id|>system<|end_header_id|>
  {system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>
  {prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
## END REQUIRED
## BEGIN OPTIONAL
ctx_len: 0 # llama.context_length | 0 or undefined = loaded from model
ngl: 33 # Undefined = loaded from model
## END OPTIONAL
# END MODEL LOAD PARAMETERS
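
The comments on `ctx_len` and `max_tokens` in the example describe fallback behavior: a `ctx_len` of 0 (or no value) means the context length comes from the model, and `max_tokens` should default to the context length. The sketch below illustrates that fallback chain. The function `resolve_lengths` and the `model_ctx_len` argument are hypothetical, and treating `max_tokens: 0` as unset is an assumption, not documented Cortex.cpp behavior.

```python
from typing import Any, Dict, Tuple


def resolve_lengths(config: Dict[str, Any], model_ctx_len: int) -> Tuple[int, int]:
    """Return (ctx_len, max_tokens) after applying the documented fallbacks.

    model_ctx_len stands in for the value read from the model file
    (llama.context_length in the example's comments).
    """
    # 0, None, or a missing key all fall back to the model's own value.
    ctx_len = config.get("ctx_len") or model_ctx_len
    # max_tokens falls back to the resolved context length (assumption: 0 = unset).
    max_tokens = config.get("max_tokens") or ctx_len
    return ctx_len, max_tokens
```

With the example values (`ctx_len: 0`, `max_tokens: 8192`), the context length is taken from the model and `max_tokens` stays at 8192.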

The model.yaml is composed of three high-level sections:

Cortex Meta


model: gemma-2-9b-it-Q8_0
name: Llama 3.1
version: 1
sources:
- models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf
- files://C:/Users/user/Downloads/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf

Cortex Meta consists of essential metadata that identifies the model within Cortex.cpp. The required parameters include:

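| Parameter | Description | Required |
|-----------|-------------|----------|
| model | The model ID used to construct requests. It should be unique across models, for example by including the author or quantization. | Yes |
| name | The general name of the model (metadata.general.name). | Yes |
| version | The version number of the model. | Yes |
| sources | The source files of the model. | Yes |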
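
Each `sources` entry carries a scheme prefix that says where the file comes from. As a rough illustration, the sketch below classifies an entry by that prefix. The function name `classify_source` and the labels `"hub"`, `"local"`, and `"remote"` are hypothetical. The example on this page writes the local scheme as `file://` in a comment and `files://` in the entry itself, so the sketch accepts both rather than guessing which one Cortex.cpp expects.

```python
def classify_source(source: str) -> str:
    """Return "hub", "local", or "remote" for one sources entry."""
    scheme, sep, _rest = source.partition("://")
    if not sep:
        raise ValueError(f"source has no scheme: {source!r}")
    kinds = {
        "models": "hub",    # universal protocol, e.g. a Hugging Face download
        "file": "local",    # absolute local path (spelling used in the comment)
        "files": "local",   # absolute local path (spelling used in the entry)
        "https": "remote",  # remote URL
    }
    try:
        return kinds[scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme {scheme!r} in {source!r}") from None
```

A caller could use the returned label to decide whether a file needs downloading or is already on disk.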

Inference Parameters


stop:
  - <|end_of_text|>
  - <|eot_id|>
  - <|eom_id|>
stream: true
top_p: 0.9
temperature: 0.6
frequency_penalty: 0
presence_penalty: 0
max_tokens: 8192

Inference parameters define how the results will be produced. Only stop is required; the other parameters are optional:

| Parameter | Description | Required |
|-----------|-------------|----------|
| top_p | The cumulative probability threshold for token sampling. | No |
| temperature | Controls the randomness of predictions by scaling logits before applying softmax. | No |
| frequency_penalty | Penalizes new tokens based on their existing frequency in the sequence so far. | No |
| presence_penalty | Penalizes new tokens based on whether they appear in the sequence so far. | No |
| max_tokens | Maximum number of tokens in the output. | No |
| stream | Enables or disables streaming mode for the output (true or false). | No |
| stop | Specifies the stopping condition for the model, which can be a word, a letter, or a specific text. | Yes |
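
The ranges noted in the example's comments (0 to 1 for `top_p`, `temperature`, `frequency_penalty`, and `presence_penalty`) and the required `stop` entry can be checked before a model is run. The following is a minimal sketch of such a check, assuming the parameters have already been loaded into a dictionary and that `stop` is a list as in the example. The function `check_inference_params` is hypothetical and is not part of Cortex.cpp.

```python
from typing import Any, Dict, List

# Keys whose documented range in the example is 0 to 1.
UNIT_RANGE_KEYS = ("top_p", "temperature", "frequency_penalty", "presence_penalty")


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_inference_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the parameters look valid."""
    problems = []
    stop = params.get("stop")
    if not isinstance(stop, list) or not stop:
        problems.append("stop is required and must be a non-empty list")
    for key in UNIT_RANGE_KEYS:
        if key in params:
            value = params[key]
            if not _is_number(value) or not 0 <= value <= 1:
                problems.append(f"{key} must be a number between 0 and 1")
    if "stream" in params and not isinstance(params["stream"], bool):
        problems.append("stream must be true or false")
    if "max_tokens" in params:
        max_tokens = params["max_tokens"]
        if not _is_number(max_tokens) or int(max_tokens) != max_tokens or max_tokens <= 0:
            problems.append("max_tokens must be a positive integer")
    return problems
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem in a model.yaml at once.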

Model Load Parameters


prompt_template: |+
  <|begin_of_text|><|start_header_id|>system<|end_header_id|>
  {system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>
  {prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
ctx_len: 0
ngl: 33

Model load parameters control how Cortex.cpp loads and runs the model. Only prompt_template is required; the other parameters are optional:

| Parameter | Description | Required |
|-----------|-------------|----------|
| ngl | Number of model layers to offload to the GPU. | No |
| ctx_len | Context length (maximum number of tokens). | No |
| prompt_template | Template for formatting the prompt, including system messages and instructions. | Yes |
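
The `prompt_template` in the example contains two placeholders, `{system_message}` and `{prompt}`. The sketch below shows one way those placeholders might be filled in. It is an illustration only: the function `render_prompt` is hypothetical, and how Cortex.cpp performs the substitution internally is not described on this page.

```python
import re

# The template from the example model.yaml, written as a Python string.
TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
    "{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
    "{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n"
)


def render_prompt(template: str, system_message: str, prompt: str) -> str:
    """Replace {system_message} and {prompt} in a single pass."""
    values = {"system_message": system_message, "prompt": prompt}
    # A single regex pass means placeholder-like text inside the inserted
    # values is left untouched instead of being substituted again.
    return re.sub(
        r"\{(system_message|prompt)\}",
        lambda match: values[match.group(1)],
        template,
    )


if __name__ == "__main__":
    print(render_prompt(TEMPLATE, "You are a helpful assistant.", "Hello!"))
```

A regex pass is used instead of `str.format` so that any other braces in a template or in user text do not raise errors or get interpreted as fields.
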
info

You can download all the supported model formats from the following: