warning
🚧 Cortex Platform is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
cortex benchmark
info
This CLI command calls the following API endpoint:
This command benchmarks your hardware to analyze how the selected model performs on your system.
Usage
cortex benchmark [options] [model_id]
For example, the command returns output similar to the following, in either JSON or table format:
## JSON Format
results: [
  {
    round: 1,
    results: [
      {
        tokens: 2048,
        token_length: 3567,
        latency: 12012,
        resourceChange: { cpuLoad: -45.543634551575586, mem: -0.22862459579142327 },
        tpot: 3.3675357443229603,
        throughput: 296.95304695304696,
        ttft: 182
      }
    ],
    hardwareChanges: { cpuLoad: 204.51297399539473, mem: 0.93911874639132 }
  }
],
metrics: {
  p50: { latency: 12012, tpot: 3.3675357443229603, throughput: 296.95304695304696, ttft: 182 },
  p75: { latency: 12012, tpot: 3.3675357443229603, throughput: 296.95304695304696, ttft: 182 },
  p95: { latency: 12012, tpot: 3.3675357443229603, throughput: 296.95304695304696, ttft: 182 }
},
model: {
  modelId: 'tinyllama',
  engine: 'llamacpp',
  status: 'running',
  duration: '2h 38m 44s',
  ram: '-',
  vram: '-'
}
}

## Table Format
Results:
Round 1:
┌─────────┬────────┬──────────────┬─────────┬────────────────────────────────────────────────────────────┬───────────────────┬────────────────────┬──────┐
│ (index) │ tokens │ token_length │ latency │                       resourceChange                       │       tpot        │     throughput     │ ttft │
├─────────┼────────┼──────────────┼─────────┼────────────────────────────────────────────────────────────┼───────────────────┼────────────────────┼──────┤
│    0    │  2048  │     3461     │  12021  │ { cpuLoad: -37.98941038167731, mem: -0.30508369223866116 } │ 3.473273620340942 │ 287.91281923300886 │ 248  │
└─────────┴────────┴──────────────┴─────────┴────────────────────────────────────────────────────────────┴───────────────────┴────────────────────┴──────┘
Metrics:
┌─────────┬─────────┬───────────────────┬────────────────────┬──────┐
│ (index) │ latency │       tpot        │     throughput     │ ttft │
├─────────┼─────────┼───────────────────┼────────────────────┼──────┤
│   p50   │  12021  │ 3.473273620340942 │ 287.91281923300886 │ 248  │
│   p75   │  12021  │ 3.473273620340942 │ 287.91281923300886 │ 248  │
│   p95   │  12021  │ 3.473273620340942 │ 287.91281923300886 │ 248  │
└─────────┴─────────┴───────────────────┴────────────────────┴──────┘
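The sample figures above appear to follow two simple relationships: `tpot` matches `latency / token_length`, and `throughput` matches `token_length` divided by the latency in seconds. The Python sketch below recomputes both values from the sample rows. It is inferred from the sample numbers only, not from the Cortex source; the function name `derived_metrics` is hypothetical, and treating `latency` as milliseconds is an assumption.

```python
def derived_metrics(token_length: int, latency_ms: float) -> dict:
    """Recompute tpot and throughput from one benchmark result row.

    Assumption: latency is reported in milliseconds.
    """
    return {
        # Time per output token, in ms per token.
        "tpot": latency_ms / token_length,
        # Tokens per second over the whole request.
        "throughput": token_length / (latency_ms / 1000.0),
    }


# Values copied from the JSON and table samples above.
json_row = {"token_length": 3567, "latency": 12012}
table_row = {"token_length": 3461, "latency": 12021}

json_metrics = derived_metrics(json_row["token_length"], json_row["latency"])
table_metrics = derived_metrics(table_row["token_length"], table_row["latency"])

print(json_metrics)
print(table_metrics)
```

With a single round, as in the samples, the p50, p75, and p95 rows all repeat that round's values, which is why the three percentile rows are identical.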
Options
Option | Description | Required | Default value | Example |
---|---|---|---|---|
model_id | The identifier of the model you want to benchmark. | No | Prompts you to select from the available models | mistral |
-n, --num_rounds <num_rounds> | Number of benchmark rounds to run. | No | 10 | -n 20 |
-c, --concurrency <concurrency> | Number of concurrent requests to send during the benchmark. | No | false | -c 5 |
-o, --output <output> | Output format for the benchmark results: json or table. | No | json | -o json |
-h, --help | Display help information for the command. | No | - | -h |
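The options in the table can be combined in a single invocation, following the `cortex benchmark [options] [model_id]` usage pattern. The Python sketch below assembles such a command line from the documented flags. The helper `build_benchmark_command` is hypothetical and not part of Cortex; it only builds the argument list and does not run anything.

```python
import shlex
from typing import List, Optional


def build_benchmark_command(
    model_id: Optional[str] = None,
    num_rounds: Optional[int] = None,
    concurrency: Optional[int] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble `cortex benchmark [options] [model_id]` from the documented options."""
    cmd = ["cortex", "benchmark"]
    if num_rounds is not None:
        cmd += ["-n", str(num_rounds)]
    if concurrency is not None:
        cmd += ["-c", str(concurrency)]
    if output is not None:
        # The options table lists json and table as the only choices.
        if output not in ("json", "table"):
            raise ValueError("output must be 'json' or 'table'")
        cmd += ["-o", output]
    if model_id is not None:
        # The model identifier goes last; omitting it leaves the choice to the CLI prompt.
        cmd.append(model_id)
    return cmd


cmd = build_benchmark_command("mistral", num_rounds=20, concurrency=5, output="table")
print(shlex.join(cmd))  # prints: cortex benchmark -n 20 -c 5 -o table mistral
```

Passing the resulting list to `subprocess.run` would execute the benchmark, assuming the `cortex` binary is installed and on your `PATH`.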