Concepts

Workloads

A workload is the central concept in Apache Solr Benchmark. It defines:

  • The data to load (corpora — compressed NDJSON files)
  • The collections to create and configure
  • The operations to run (bulk indexing, search queries, commits, etc.)
  • The challenges (test procedures) that sequence those operations

Workloads are defined in a workload.json file. Pre-built workloads for Apache Solr are at https://github.com/janhoy/solr-benchmark-workloads.
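The relationship between these four parts can be sketched as a small data model. This is a conceptual illustration only, not the workload.json schema: the class names, field names, and the `undefined_operations` helper are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Challenge:
    name: str
    # Names of operations, in the order the challenge runs them.
    schedule: list = field(default_factory=list)


@dataclass
class Workload:
    # Hypothetical field names; they mirror the four bullet points above,
    # not the real keys of workload.json.
    corpora: list        # data sets to load
    collections: list    # Solr collections to create
    operations: dict     # operation name -> operation type
    challenges: list     # Challenge objects that sequence the operations

    def undefined_operations(self):
        """Map each challenge name to scheduled operations the workload lacks."""
        missing = {}
        for challenge in self.challenges:
            unknown = [op for op in challenge.schedule if op not in self.operations]
            if unknown:
                missing[challenge.name] = unknown
        return missing


workload = Workload(
    corpora=["docs"],
    collections=["products"],
    operations={"index": "bulk-index", "flush": "commit", "query": "search"},
    challenges=[
        Challenge("index-then-search", ["index", "flush", "query"]),
        Challenge("search-only", ["query", "warm-cache"]),
    ],
)
print(workload.undefined_operations())
```

The point of the sketch is that a challenge does not define work itself; it only sequences operations that the workload already declares, which is why a reference to an undeclared operation is reported.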

Challenges (Test Procedures)

A challenge (also called a test procedure) is a named configuration within a workload that specifies a particular benchmark scenario. A workload can have multiple challenges; you select one with --challenge when running the benchmark.

Pipelines

A pipeline is a sequence of high-level phases that a benchmark run executes:

Pipeline | Description
benchmark-only | Run against an existing Solr cluster; no provisioning
docker | Start a Solr cluster via Docker, then benchmark, then tear down
from-distribution | Download and install Solr, benchmark, tear down
from-sources | Build Solr from source, install, benchmark, tear down

Collections

A collection is the Solr equivalent of an OpenSearch index — a logical grouping of documents distributed across shards. Collections are defined in the workload’s "collections" array and are created before benchmarking begins.
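For orientation, creating a collection by hand goes through Solr's Collections API. The sketch below only builds the request URL for the CREATE action; it is not necessarily how Apache Solr Benchmark creates collections internally, and the host, collection name, and configset name are placeholder values.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor, configset):
    """Build a Solr Collections API CREATE URL (no request is sent)."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": configset,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


# Placeholder values for illustration only.
url = create_collection_url("http://localhost:8983/solr", "products", 2, 1, "products_conf")
print(url)
```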

Configsets

A configset is a named set of Solr configuration files (primarily the schema file, managed-schema.xml or schema.xml, and solrconfig.xml) stored in ZooKeeper. Every collection references a configset. To use a custom configset, supply it through your workload’s configset-path. See the Apache Solr Reference Guide for more information.

Operations

Operations are the individual benchmarking actions. Built-in operations include:

Operation | Description
bulk-index | Index a batch of documents from a corpus
search | Execute a Solr query
commit | Issue a hard commit to Solr
optimize | Issue an optimize (force-merge) command
create-collection | Create a Solr collection
delete-collection | Delete a Solr collection
raw-request | Execute an arbitrary Solr Admin API request

Schedules

A schedule controls how an operation executes: number of iterations, target throughput (ops/s), warmup iterations, and parallel client count.
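How these four settings interact can be sketched as a pacing plan for a single client. The sketch rests on assumptions that the text above does not state: that target throughput is a total shared evenly across clients, that iteration counts apply per client, and that warmup iterations run first. The `Schedule` class and `planned_starts` function are hypothetical names used only here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Schedule:
    iterations: int
    warmup_iterations: int
    clients: int
    # Assumed to be total ops/s across all clients; None means unthrottled.
    target_throughput: Optional[float] = None


def planned_starts(schedule):
    """Return (start_time_s, is_warmup) per request for one client.

    Returns None when unthrottled, because each request then simply starts
    when the previous response arrives and there is no fixed plan.
    """
    if schedule.target_throughput is None:
        return None
    # Each client gets an equal share of the total rate (assumption).
    interval = schedule.clients / schedule.target_throughput
    total = schedule.warmup_iterations + schedule.iterations
    return [(i * interval, i < schedule.warmup_iterations) for i in range(total)]


schedule = Schedule(iterations=3, warmup_iterations=2, clients=4, target_throughput=8.0)
starts = planned_starts(schedule)
print(starts)
```

With 4 clients sharing 8 ops/s, each client is planned to send one request every 0.5 seconds, and the first two of its five requests count as warmup.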

Corpora

Corpora are the datasets used by workloads. Each corpus references one or more data files (gzip-compressed NDJSON). Apache Solr Benchmark downloads corpora from the workload repository or a configured data URL.
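To make the file format concrete, the sketch below writes a tiny gzip-compressed NDJSON file (one JSON document per line) and reads it back in batches, using only the Python standard library. The file name, document fields, and batch size are made up for the example, and the helper functions are not part of Apache Solr Benchmark.

```python
import gzip
import json
import os
import tempfile


def iter_documents(path):
    """Yield one parsed JSON document per non-empty line of a .gz NDJSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def batches(docs, size):
    """Group an iterable of documents into lists of at most `size` items."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Write a five-document sample corpus to a temporary directory.
corpus_path = os.path.join(tempfile.mkdtemp(), "docs.json.gz")
with gzip.open(corpus_path, "wt", encoding="utf-8") as handle:
    for i in range(5):
        handle.write(json.dumps({"id": str(i), "title": f"doc {i}"}) + "\n")

batch_sizes = [len(batch) for batch in batches(iter_documents(corpus_path), 2)]
print(batch_sizes)
```

Streaming line by line keeps memory use flat regardless of corpus size, which is the main reason NDJSON suits bulk indexing.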

Facets

Facets are Solr’s aggregation mechanism — the Solr equivalent of OpenSearch aggregations. When using the Converter Tool, OpenSearch aggregation expressions are translated into Solr facet syntax.
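As an illustration of the correspondence, the sketch below maps an OpenSearch `terms` aggregation onto the equivalent request shape in Solr's JSON Facet API. It is a hand-written example covering one aggregation type, not the Converter Tool's implementation; the function name and the sample field are hypothetical.

```python
def terms_aggs_to_solr_facets(aggs):
    """Translate OpenSearch terms aggregations into Solr JSON Facet definitions.

    Only plain `terms` aggregations are handled; anything else raises.
    """
    facets = {}
    for name, body in aggs.items():
        if set(body) != {"terms"}:
            raise ValueError(f"unsupported aggregation in {name!r}")
        terms = body["terms"]
        facet = {"type": "terms", "field": terms["field"]}
        if "size" in terms:
            # OpenSearch "size" corresponds to the Solr facet "limit".
            facet["limit"] = terms["size"]
        facets[name] = facet
    return facets


opensearch_aggs = {"by_category": {"terms": {"field": "category", "size": 5}}}
solr_request = {"query": "*:*", "facet": terms_aggs_to_solr_facets(opensearch_aggs)}
print(solr_request)
```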


Metrics

At the end of each benchmark run, Apache Solr Benchmark prints a summary table and saves it to disk. The table covers these metrics for every task in the challenge:

Metric | Description
Throughput | Operations completed per second
Service time | Round-trip time from client request to client receipt of response
Latency | Service time plus any queue waiting time (differs from service time only when target-throughput is set)
Error rate | Fraction of operations that returned an error

How Apache Solr Benchmark defines service time and latency

These terms are often used interchangeably in the industry but have distinct meanings in Apache Solr Benchmark:

Metric | Common definition | Apache Solr Benchmark definition
Service time | Server processing time, excluding network | Time from when the HTTP client sends the request to when it receives the full response, including network latency, load balancer overhead, and serialization/deserialization
Latency | Service time plus network latency | Service time plus any time the request spent waiting in a local queue before being dispatched (the wait is non-zero only when target-throughput is configured)

Processing time

Processing time measures the overhead that Apache Solr Benchmark itself adds during a request, such as setting up the request context or dispatching to the client library. It is distinct from service time and is excluded from service time measurements. Use it to gauge the benchmarking tool’s own footprint.

Service time

Service time is measured from the moment the HTTP client sends the request until the moment it receives the complete response. It includes:

  • Network round-trip time
  • Load balancer overhead (if any)
  • Server processing time
  • Serialization and deserialization on both ends
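The measurement boundary described above can be sketched as a timer wrapped tightly around the send-and-receive call. This is a minimal illustration, not the tool's actual instrumentation: `timed_call` is a hypothetical helper, and `fake_request` stands in for a real HTTP call by sleeping for 50 ms.

```python
import time


def timed_call(send_request):
    """Return (response, service_time_s) for one request.

    The clock starts just before the request is sent and stops once the
    full response is back, so everything in between is counted.
    """
    start = time.perf_counter()
    response = send_request()
    service_time = time.perf_counter() - start
    return response, service_time


def fake_request():
    # Stand-in for network, load balancer, server work, and (de)serialization.
    time.sleep(0.05)
    return {"status": 0}


response, service_time = timed_call(fake_request)
print(f"service time: {service_time * 1000:.1f} ms")
```

Work done before the timer starts, such as preparing the request, falls outside this window.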

Latency

Latency is service time plus any time the request spent waiting in a local queue before being sent. A queue only builds up when you set target-throughput on a task and the cluster cannot keep up with the requested rate. In that case, subsequent requests must wait for an earlier request to complete, adding queue time to the total latency.

When no target-throughput is set, or when the cluster handles every request as fast as it arrives, latency equals service time.

Throughput

Throughput is the rate at which Apache Solr Benchmark issues requests, assuming that responses are returned instantaneously. It is not a measure of how many requests completed; it is a measure of how quickly requests were dispatched.

The two benchmark modes

Pure throughput mode (target-throughput not set): Requests are issued as fast as possible — each client sends one request, waits for the response, then sends the next. Latency equals service time.

Throughput-throttled mode (target-throughput set): Requests are issued at a target rate (in ops/s). If you set a rate higher than the cluster can sustain, requests pile up in the local queue and latency grows. Set target-throughput to a value you know is achievable; see Target throughput for practical guidance.