
Benchmarking

warning

🚧 Cortex Platform is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.

Benchmark is a feature that measures and analyzes the performance of a specific AI model on your hardware. It shows how the hardware affects the model's response time and overall throughput in different scenarios.

Usage

cortex benchmark mistral

Collected Data

Hardware Data

Metric Name | Data Type | Example Value | Description
--- | --- | --- | ---
cpu | Object | {"avgLoad": 0.26, "currentLoad": 10.714285714285714, ...} | CPU usage details, including average and current load.
gpu | Array of Objects | [{"vendor": "Apple", "model": "Apple M3 Pro", ...}] | Details about the GPU, including vendor and model.
mem | Object | {"total": 38654705664, "free": 368312320, "used": 38286393344, ...} | Memory details, including total, free, and used memory.
resourceChange | Object | {"cpu": null, "mem": 0.15513102213541669} | Changes in resource usage, such as CPU and memory.
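As a rough illustration of how a hardware record like the one above might be consumed, the following Python sketch parses the example values and derives a few readable numbers. The JSON layout is assembled from the example column only (the elided `...` fields are left out), and `summarize_hardware` is a hypothetical helper, not part of Cortex.

```python
import json

# Example values copied from the hardware table; fields hidden behind "..."
# in the documentation are not reproduced here.
sample = json.loads("""
{
  "cpu": {"avgLoad": 0.26, "currentLoad": 10.714285714285714},
  "gpu": [{"vendor": "Apple", "model": "Apple M3 Pro"}],
  "mem": {"total": 38654705664, "free": 368312320, "used": 38286393344},
  "resourceChange": {"cpu": null, "mem": 0.15513102213541669}
}
""")

GIB = 1024 ** 3  # bytes per gibibyte (assumes mem values are in bytes)


def summarize_hardware(data):
    """Hypothetical helper: reduce a hardware record to a few summary values."""
    mem = data["mem"]
    return {
        "gpu_models": [gpu["model"] for gpu in data["gpu"]],
        "mem_total_gib": mem["total"] / GIB,
        "mem_used_fraction": mem["used"] / mem["total"],
        # A null entry in resourceChange becomes None after parsing.
        "cpu_change": data["resourceChange"]["cpu"],
    }


summary = summarize_hardware(sample)
print(summary)
```

Treating the memory values as bytes is an assumption; the source does not state the unit.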

Model Data

The benchmark records the model ID together with its runtime parameters:

  • max_length
  • temperature
  • kv-cache size (TBU)

Model Inference Performance

Metric Name | Data Type | Example Value | Description
--- | --- | --- | ---
tokens | Integer | 2048 | The total number of tokens processed.
token_length | Integer | 78 | The length of each token processed.
latency | Integer | 662 ms | The overall time it takes for the model to generate the full response for a user. Calculated as: latency = TTFT + TPOT * (the number of tokens to be generated).
tpot (Time per Output Token) | Float | 8.487179487179487 | Time to generate an output token for each user querying our system.
throughput | Float | 117.82477341389728 tokens/s | The number of output tokens per second an inference server can generate across all users and requests.
ttft (Time to First Token) | Integer | 257 ms | How quickly users start seeing the model's output after entering their query.
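The latency formula in the table can be sketched in Python as follows. The function name `estimate_latency_ms` is hypothetical, TPOT is assumed to be in milliseconds (the source gives no unit), and the token count of 100 is an illustrative input rather than a measured value.

```python
def estimate_latency_ms(ttft_ms, tpot_ms, output_tokens):
    """Apply latency = TTFT + TPOT * (number of tokens to be generated)."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_ms + tpot_ms * output_tokens


# TTFT and TPOT are the example values from the table;
# output_tokens=100 is a made-up input for illustration.
latency_ms = estimate_latency_ms(
    ttft_ms=257,
    tpot_ms=8.487179487179487,
    output_tokens=100,
)
print(f"estimated latency: {latency_ms:.1f} ms")
```

With zero output tokens the estimate reduces to TTFT alone, which matches the intuition that TTFT is the fixed cost before generation starts.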

Data per Round of Testing

Each round of testing reports the following summary statistics for the collected metrics:

  • P50 (50th Percentile / Median): The value below which 50% of the data points fall.
  • P75 (75th Percentile): The value below which 75% of the data points fall.
  • P95 (95th Percentile): The value below which 95% of the data points fall.
  • avg (Average / Mean): The arithmetic mean of all the data points.
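The statistics listed above could be computed along these lines. This is a minimal Python sketch that assumes linear interpolation between ranks; the source does not say which percentile method Cortex uses. The `percentile` and `summarize` helpers and the sample latencies are hypothetical.

```python
import statistics


def percentile(values, p):
    """Percentile with linear interpolation between ranks (p in [0, 100])."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)                         # index at or below the rank
    upper = min(lower + 1, len(ordered) - 1)  # next index, clamped to the end
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def summarize(values):
    """Return the P50, P75, P95, and mean of one metric across rounds."""
    return {
        "p50": percentile(values, 50),
        "p75": percentile(values, 75),
        "p95": percentile(values, 95),
        "avg": statistics.fmean(values),
    }


# Made-up latencies (ms) from five benchmark rounds, for illustration only.
latencies_ms = [640, 655, 662, 671, 900]
stats = summarize(latencies_ms)
print(stats)
```

A single slow round (900 ms here) pulls the average and P95 up while leaving the median unchanged, which is why percentiles are reported alongside the mean.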
note

Learn more about Benchmarking capabilities: