Installing Apache Solr Benchmark
You can install Apache Solr Benchmark directly on a host running Linux or macOS. This page provides general hardware considerations and step-by-step installation instructions.
Choosing appropriate hardware
When selecting a host, consider which workloads you want to run. To see a list of available benchmark workloads, visit the solr-benchmark-workloads repository on GitHub. Make sure that the Solr Benchmark host has enough free storage space to store the compressed data corpus and the fully decompressed data once benchmarking begins.
Use the following table to estimate the minimum free space required (compressed + uncompressed):
| Workload name | Document count | Compressed size | Uncompressed size |
|---|---|---|---|
| eventdata | 20,000,000 | 756.0 MB | 15.3 GB |
| geonames | 11,396,503 | 252.9 MB | 3.3 GB |
| geopoint | 60,844,404 | 482.1 MB | 2.3 GB |
| http_logs | 247,249,096 | 1.2 GB | 31.1 GB |
| noaa | 33,659,481 | 949.4 MB | 9.0 GB |
| nyc_taxis | 165,346,692 | 4.5 GB | 74.3 GB |
| pmc | 574,199 | 5.5 GB | 21.7 GB |
| so | 36,062,278 | 8.9 GB | 33.1 GB |
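You can turn the table into a quick pre-flight check. The following Python sketch adds the compressed and uncompressed sizes for a workload and compares the total with the free space on the filesystem that holds your home directory, which is where Solr Benchmark stores its data (under ~/.solr-benchmark/). The helper names are hypothetical. Converting MB to GB assumes decimal units (1 GB = 1000 MB), which the table does not specify.

```python
import os
import shutil

# (compressed GB, uncompressed GB) copied from the table above.
# Assumption: MB values are converted with 1 GB = 1000 MB.
WORKLOAD_SIZES_GB = {
    "eventdata": (0.756, 15.3),
    "geonames": (0.2529, 3.3),
    "geopoint": (0.4821, 2.3),
    "http_logs": (1.2, 31.1),
    "noaa": (0.9494, 9.0),
    "nyc_taxis": (4.5, 74.3),
    "pmc": (5.5, 21.7),
    "so": (8.9, 33.1),
}


def required_gb(workload: str) -> float:
    """Minimum free space for one workload: compressed corpus + decompressed data."""
    compressed, uncompressed = WORKLOAD_SIZES_GB[workload]
    return round(compressed + uncompressed, 4)


def has_enough_space(workload: str, path: str = "~") -> bool:
    """Compare the requirement with the free space on the filesystem containing `path`."""
    free_gb = shutil.disk_usage(os.path.expanduser(path)).free / 1000**3
    return free_gb >= required_gb(workload)


if __name__ == "__main__":
    for name in ("geonames", "nyc_taxis"):
        print(f"{name}: needs {required_gb(name)} GB, enough space: {has_enough_space(name)}")
```

For example, nyc_taxis needs roughly 78.8 GB free (4.5 GB + 74.3 GB).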
Your Solr Benchmark host should use solid-state drives (SSDs) for storage. Spinning-disk hard drives introduce performance bottlenecks that make benchmark results unreliable.
Prerequisites
Before installing Solr Benchmark, ensure the following software is available on your host:
- Python 3.10 or later: required for all pipelines.
- pip: the Python package manager.
- Git 2.3 or later: required to fetch workloads from a remote repository.
- Docker: required for the --pipeline=docker pipeline, which starts a Solr cluster automatically before the run.
- JDK 21: required for the --pipeline=from-distribution pipeline, which downloads and installs a Solr release locally.
Checking software dependencies
If your system Python is older than 3.10, consider using pyenv to install a newer version alongside it and switch between Python versions on your host.
- Check that Python 3.10 or later is installed:
  python3 --version
- Check that pip is installed and functional:
  pip --version
- Check that Git 2.3 or later is installed:
  git --version
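If you prefer to run these checks in one step, the Python sketch below performs them programmatically. It is a hypothetical helper script, not part of Solr Benchmark. It runs pip through the current interpreter (python -m pip) rather than whichever pip is first on PATH, and it parses the git --version output with a regular expression that assumes the usual "git version X.Y.Z" format. The __future__ import lets the script load on an older interpreter so it can report the outdated version instead of failing on the type hints.

```python
from __future__ import annotations  # lets the type hints parse on Python < 3.10

import re
import shutil
import subprocess
import sys


def parse_git_version(output: str) -> tuple[int, ...]:
    """Extract the numeric version from `git --version` output, e.g. 'git version 2.39.2'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"unrecognized git version string: {output!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def check_prerequisites() -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")

    # `python -m pip` checks the pip that belongs to this interpreter.
    pip = subprocess.run([sys.executable, "-m", "pip", "--version"],
                         capture_output=True, text=True)
    if pip.returncode != 0:
        problems.append("pip is not available for this Python interpreter")

    if shutil.which("git") is None:
        problems.append("git not found on PATH")
    else:
        out = subprocess.run(["git", "--version"], capture_output=True, text=True).stdout
        if parse_git_version(out) < (2, 3):
            problems.append(f"Git 2.3+ required, found: {out.strip()}")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("MISSING:", issue)
    if not issues:
        print("All prerequisites satisfied.")
```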
Installing on Linux and macOS
Apache Solr Benchmark is not yet published on PyPI. Install it directly from the source repository.
Clone the repository and install in editable mode:
git clone https://github.com/janhoy/solr-benchmark.git
cd solr-benchmark
pip install -e .
After the installation completes, verify it is working:
solr-benchmark --version
Virtual environment (recommended)
Install Solr Benchmark inside a virtual environment to avoid dependency conflicts with other Python packages on your system:
git clone https://github.com/janhoy/solr-benchmark.git
cd solr-benchmark
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
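To confirm that the virtual environment is active and that the solr-benchmark command on your PATH is the one installed into it, you can run a small check like the following with the environment's python. It relies only on the standard library (sys.prefix, sys.base_prefix, sysconfig.get_path). The function names are illustrative, and the check assumes the console script is installed into the interpreter's default scripts directory (.venv/bin on Linux and macOS).

```python
import shutil
import sys
import sysconfig
from pathlib import Path


def in_virtualenv() -> bool:
    """Inside a venv, sys.prefix points at the venv and sys.base_prefix at the base install."""
    return sys.prefix != sys.base_prefix


def cli_in_active_env(name: str = "solr-benchmark") -> bool:
    """True if the `name` executable found on PATH lives in this interpreter's scripts directory."""
    found = shutil.which(name)
    if found is None:
        return False
    scripts_dir = Path(sysconfig.get_path("scripts")).resolve()
    return Path(found).resolve().parent == scripts_dir


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("solr-benchmark installed in this environment:", cli_in_active_env())
```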
Developer install
To install the development and test dependencies as well, run the following from the repository root:
pip install -e ".[develop]"
Upgrading
To pick up the latest changes, pull from the repository and reinstall:
cd solr-benchmark
git pull
pip install -e .
Starting a Solr cluster for benchmarking
Solr Benchmark can start and stop a Solr cluster for you as part of a benchmark run using two built-in pipelines:
- --pipeline=docker: pulls the official solr Docker image and starts a single-node Solr cluster before the run. No JDK is required.
- --pipeline=from-distribution: downloads a Solr release archive, installs it locally, and starts a cluster. JDK 21 must be available on the host.
If you already have a running Solr cluster, use --pipeline=benchmark-only and point Solr Benchmark at it with --target-hosts.
See the run command reference for the full list of pipeline options and flags.
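Before you choose a pipeline, you can check that the host has the tools it needs. The Python sketch below encodes the requirements listed above: Docker for the docker pipeline, a JDK 21 java binary for from-distribution, and nothing extra for benchmark-only. The mapping and function names are hypothetical. Two details are assumptions rather than Solr Benchmark behavior: docker info is used as a proxy for a reachable Docker daemon, and the java -version parsing expects the common version "21.0.2" output format.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# Hypothetical mapping: pipeline name -> executables it needs on the host.
PIPELINE_TOOLS = {
    "docker": ["docker"],
    "from-distribution": ["java"],
    "benchmark-only": [],  # uses an existing cluster reached via --target-hosts
}


def parse_java_major(version_output: str) -> int | None:
    """Return the major version from `java -version` output, or None if unrecognized."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and older report versions like "1.8.0_392"; the second field is the major.
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def missing_for_pipeline(pipeline: str) -> list[str]:
    """List the problems that would block `pipeline` on this host (empty list = ready)."""
    problems = [f"{tool} not found on PATH"
                for tool in PIPELINE_TOOLS[pipeline] if shutil.which(tool) is None]
    if problems:
        return problems
    if pipeline == "docker":
        # Assumption: `docker info` exits non-zero when the daemon is not reachable.
        if subprocess.run(["docker", "info"], capture_output=True).returncode != 0:
            problems.append("Docker daemon is not reachable")
    elif pipeline == "from-distribution":
        # `java -version` writes its version banner to stderr, not stdout.
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
        major = parse_java_major(result.stderr)
        if major != 21:  # the prerequisites call for JDK 21 specifically
            problems.append(f"expected JDK 21, found major version {major}")
    return problems


if __name__ == "__main__":
    for name in PIPELINE_TOOLS:
        print(f"--pipeline={name}:", missing_for_pipeline(name) or "ready")
```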
Directory structure
After running Solr Benchmark for the first time, all related files are stored under ~/.solr-benchmark/:
~/.solr-benchmark/
├── benchmark.ini
├── benchmarks/
│ ├── data/
│ │ └── nyc_taxis/
│ ├── distributions/ # populated by --pipeline=from-distribution
│ │ └── solr-9.10.1.tgz
│ ├── test-runs/
│ │ └── <run-id>/
│ │ └── test_run.json
│ └── workloads/
│ └── default/
│ └── nyc_taxis/
├── logging.json
└── logs/
└── benchmark.log
- benchmark.ini: main configuration file. See Configuring.
- benchmarks/data/: downloaded and decompressed workload data corpora.
- benchmarks/distributions/: cached Solr release archives (only present when using the from-distribution pipeline).
- benchmarks/test-runs/: one subdirectory per run, each containing test_run.json with the computed results for that run.
- benchmarks/workloads/: cached workload definitions fetched from the workload repository.
- logging.json: logging configuration. See Logging.
- logs/: benchmark run logs, useful for diagnosing errors.
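Because every run writes its results to benchmarks/test-runs/<run-id>/test_run.json, you can collect them with a few lines of Python. This sketch assumes the default ~/.solr-benchmark/ location described in this section. This page does not describe the structure of test_run.json, so the example prints only each run's top-level keys. The function name is illustrative.

```python
import json
import os
from pathlib import Path

# Default location described above; adjust if your files live elsewhere.
TEST_RUNS_DIR = Path(os.path.expanduser("~/.solr-benchmark/benchmarks/test-runs"))


def load_test_runs(test_runs_dir: Path = TEST_RUNS_DIR) -> dict[str, object]:
    """Map each run ID (subdirectory name) to the parsed contents of its test_run.json."""
    runs: dict[str, object] = {}
    if not test_runs_dir.is_dir():
        return runs
    for run_dir in sorted(test_runs_dir.iterdir()):
        result_file = run_dir / "test_run.json"
        if result_file.is_file():  # skip runs that have no results file
            with result_file.open(encoding="utf-8") as f:
                runs[run_dir.name] = json.load(f)
    return runs


if __name__ == "__main__":
    for run_id, data in load_test_runs().items():
        keys = sorted(data) if isinstance(data, dict) else type(data).__name__
        print(run_id, keys)
```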
Next steps
- Configuring: customize benchmark.ini for your environment.
- Running workloads: run your first full benchmark.