warning

🚧 Cortex Platform is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.

`cortex benchmark`

info

This CLI command calls the following API endpoint:

This command benchmarking your hardware to analyze the selected model performance on your system.

Usage


cortex benchmark [options] [model_id]

For example, it will return the following:


## JSON Format
 results: [
    {
      round: 1,
      results: [
        {
          tokens: 2048,
          token_length: 3567,
          latency: 12012,
          resourceChange: { cpuLoad: -45.543634551575586, mem: -0.22862459579142327 },
          tpot: 3.3675357443229603,
          throughput: 296.95304695304696,
          ttft: 182
        }
      ],
      hardwareChanges: { cpuLoad: 204.51297399539473, mem: 0.93911874639132 }
    }
  ],
  metrics: {
    p50: {
      latency: 12012,
      tpot: 3.3675357443229603,
      throughput: 296.95304695304696,
      ttft: 182
    },
    p75: {
      latency: 12012,
      tpot: 3.3675357443229603,
      throughput: 296.95304695304696,
      ttft: 182
    },
    p95: {
      latency: 12012,
      tpot: 3.3675357443229603,
      throughput: 296.95304695304696,
      ttft: 182
    }
  },
  model: {
    modelId: 'tinyllama',
    engine: 'llamacpp',
    status: 'running',
    duration: '2h 38m 44s',
    ram: '-',
    vram: '-'
  }
}
## Table Format
Results:
Round 1:
┌─────────┬────────┬──────────────┬─────────┬────────────────────────────────────────────────────────────┬───────────────────┬────────────────────┬──────┐
│ (index) │ tokens │ token_length │ latency │ resourceChange                                             │ tpot              │ throughput         │ ttft │
├─────────┼────────┼──────────────┼─────────┼────────────────────────────────────────────────────────────┼───────────────────┼────────────────────┼──────┤
│ 0       │ 2048   │ 3461         │ 12021   │ { cpuLoad: -37.98941038167731, mem: -0.30508369223866116 } │ 3.473273620340942 │ 287.91281923300886 │ 248  │
└─────────┴────────┴──────────────┴─────────┴────────────────────────────────────────────────────────────┴───────────────────┴────────────────────┴──────┘
Metrics:
┌─────────┬─────────┬───────────────────┬────────────────────┬──────┐
│ (index) │ latency │ tpot              │ throughput         │ ttft │
├─────────┼─────────┼───────────────────┼────────────────────┼──────┤
│ p50     │ 12021   │ 3.473273620340942 │ 287.91281923300886 │ 248  │
│ p75     │ 12021   │ 3.473273620340942 │ 287.91281923300886 │ 248  │
│ p95     │ 12021   │ 3.473273620340942 │ 287.91281923300886 │ 248  │
└─────────┴─────────┴───────────────────┴────────────────────┴──────┘

info

The JSON benchmark file is located on ~cortex\benchmark\outpout.json.
This command uses a model that has been downloaded to your file system. Downloads a model by using the pull or run command.

Options

Option	Description	Required	Default value	Example
`model_id`	The model identifier you want to benchmark.	No	`Prompt to select from the available models`	`mistral`
`-n`, `--num_rounds <num_rounds>`	Number of rounds to run the benchmark.	No	`10`	`-n 20`
`-c`, `--concurrency <concurrency>`	Number of concurrent requests to run the benchmark.	No	`false`	`-c 5`
`-o`, `--output <output>`	Output format for the benchmark results. Choices are `json` or table format.	No	`json`	`-o json`
`-h`, `--help`	Display help information for the command.	No	-	`-h`

Usage​

Options​

Usage

Options