After benchmarking version-to-version performance across a few runtimes (Node.js and Python), we got a lot of requests to run Java next. We reran the suite from scratch for Eclipse Temurin Java 18 through 25 on an Apple M4 Mac mini, and expanded it with a synthetic application benchmark alongside a larger set of microbenchmarks.
JDK versions tested
- Java 18.0.2.1 (Temurin)
- Java 19.0.2 (Temurin)
- Java 20.0.2 (Temurin)
- Java 21.0.10 (Temurin, LTS)
- Java 22.0.2 (Temurin)
- Java 23.0.2 (Temurin)
- Java 24.0.2 (Temurin)
- Java 25.0.2 (Temurin, LTS)
Synthetic application benchmark
This part of the suite is a synthetic application-style workload, meant to be closer to a real service request than a single microbenchmark. Each operation mixes CPU work, allocations, and light data-structure work:
- Picks a pre-generated request payload (fixed-size bytes)
- Scans a few fields (id, ts, user) out of the payload
- Computes a SHA-256 digest of the payload plus a few fields
- Touches a ConcurrentHashMap cache with a controlled hit rate
- Base64-encodes the digest and builds a small response string
- Allocates extra bytes per operation to apply GC pressure (256 bytes per op in this run)
The harness reports throughput, latency, garbage collection activity, and CPU and memory sampling.
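To make the shape of each operation concrete, here is a minimal sketch of one such operation in plain Java. This is our illustration, not the harness's actual code: the class and method names are ours, the field "scan" is reduced to a fixed-offset slice, and the constants (1024-byte payload, 50,000-entry cache bound, 256-byte extra allocation) follow the configuration listed below.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class SyntheticOp {
    private static final ConcurrentHashMap<Long, byte[]> CACHE = new ConcurrentHashMap<>();

    // One operation: scan fields, hash, touch the cache, encode, allocate.
    static String runOnce(byte[] payload, long key) {
        try {
            // "Scan" a few fields; a fixed-offset slice stands in for id/ts/user parsing.
            byte[] id = new byte[8];
            System.arraycopy(payload, 0, id, 0, 8);

            // SHA-256 over the payload plus the scanned field.
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(payload);
            md.update(id);
            byte[] digest = md.digest();

            // Cache touch, bounded to a 50,000-entry key range.
            CACHE.computeIfAbsent(key % 50_000, k -> digest);

            // Extra allocation for GC pressure (256 bytes per op in this run).
            byte[] garbage = new byte[256];
            garbage[0] = digest[0];

            // Base64-encode the digest and build a small response string.
            return "resp:" + Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1024];
        ThreadLocalRandom.current().nextBytes(payload);
        System.out.println(runOnce(payload, 42L));
    }
}
```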
Benchmark configuration
- Threads: 10
- Synthetic application benchmark: warmup 20s, measure 10m, 5 repeats per JDK, with cooldown between tests
- Microbenchmarks: warmup 5s, measure 10s, 5 repeats per JDK, with cooldown between tests
- Heap profiles: -Xms512m -Xmx512m, -Xms2g -Xmx2g, -Xms4g -Xmx4g
- Payload: 1024 bytes (pool 1024)
- Cache: size 50,000, key space 200,000, hit attempt rate 0.8
- Extra allocation: 256 bytes per op
- Compression is disabled in the synthetic application benchmark; micro_deflate measures Deflater throughput
Synthetic application benchmark results
These charts show the synthetic application benchmark results. Summary bar charts show 512m, 2g, and 4g heap profiles for each Java version, ordered left to right. The time series charts stay on -Xms2g -Xmx2g so the lines stay readable.
Throughput by heap size
Average throughput in M ops/s across 5 repeats.
Request latency by heap size
Average per-request latency in microseconds across 5 repeats.
Garbage collection time by heap size
Average total garbage collection time during the measurement window in milliseconds.
CPU and memory
CPU utilization by heap size
Average process CPU utilization as a percent of all available cores.
Resident memory by heap size
Average resident set size in MiB.
Java heap used by heap size
Average Java heap used in MiB, not heap capacity.
Non-heap memory by heap size
Average non-heap memory used in MiB.
Time series
These charts show one sample per second during the measurement phase, using heap profile -Xms2g -Xmx2g. Each line is a Java version, and values are the median across 5 repeats for each second. The smoothing control defaults to 45s for this longer run; set it to 1s for raw values. The x-axis is minutes into measurement.
Heap used over time
Heap used (MiB) sampled once per second. Each line is a Java version. Values are the median across 5 repeats for each second.
RSS over time
Per-second RSS (MiB) during measurement.
CPU utilization over time
Per-second process CPU utilization (percent of all cores). Some series start at t=1s due to initial sample availability.
Microbenchmark results
Each microbenchmark chart shows 512m, 2g, and 4g heap profiles for each Java version, ordered left to right. Values are average throughput in M ops/s across 5 repeats.
JSON parsing
Repeated parsing of a small JSON payload.
JSON serialization
Repeated serialization of a small object to JSON.
SHA-256 hashing
Repeated SHA-256 digest calculation on the same payload size.
Base64 encoding and decoding
Repeated base64 encode and decode on the same payload size.
Regular expression matching
Repeated field extraction from a short string using a regular expression.
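As a rough illustration of this style of micro, a field-extraction loop can be reduced to a precompiled pattern applied to a short string. The pattern and input here are ours, not the suite's:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMicro {
    // Precompiled once, as a benchmark would do outside the measured loop.
    private static final Pattern USER = Pattern.compile("user=(\\w+)");

    // Extract the user field from a short line, or null if absent.
    static String extractUser(String line) {
        Matcher m = USER.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractUser("id=7 ts=123 user=alice"));
    }
}
```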
Sorting array of numbers
Repeated sorting of an int[] array.
Concurrent hash map updates
High-churn ConcurrentHashMap get and put workload.
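A sketch of what a high-churn get/put loop might look like follows. The shape (random keys in a bounded key space, put on miss) is our assumption about the workload, not the suite's exact code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class MapChurn {
    // Random gets against a bounded key space; a miss inserts the key.
    static long churn(ConcurrentHashMap<Integer, Integer> map, int keySpace, int ops) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        long hits = 0;
        for (int i = 0; i < ops; i++) {
            int key = rnd.nextInt(keySpace);
            Integer v = map.get(key);
            if (v != null) hits++;
            else map.put(key, key);
        }
        return hits;
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        long hits = churn(map, 200_000, 1_000_000);
        System.out.println("hits=" + hits + " size=" + map.size());
    }
}
```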
Deflate compression
Repeated Deflater compression throughput on the same payload size.
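One compression pass with java.util.zip.Deflater looks roughly like this; the payload and buffer sizes here are ours, chosen to match the 1024-byte payload used elsewhere in the suite:

```java
import java.util.zip.Deflater;

public class DeflateMicro {
    // Compress one payload with Deflater; returns the compressed length in bytes.
    static int compressOnce(byte[] payload, byte[] out) {
        Deflater d = new Deflater(Deflater.DEFAULT_COMPRESSION);
        d.setInput(payload);
        d.finish();
        int n = 0;
        while (!d.finished()) {
            n += d.deflate(out, n, out.length - n);
        }
        d.end(); // release native resources
        return n;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1024]; // all zeros, so it compresses well
        byte[] out = new byte[2048];
        int n = compressOnce(payload, out);
        System.out.println("compressed " + payload.length + " -> " + n + " bytes");
    }
}
```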
How the tests were performed
- Host: Mac mini (Apple M4, 10 cores, 16 GB RAM), macOS 26.3 (arm64)
- JDKs: Eclipse Temurin Docker images for Java 18–25 (arm64)
- Container: LinuxKit (Docker) environment with 10 available processors
- Heaps: -Xms512m -Xmx512m, -Xms2g -Xmx2g, -Xms4g -Xmx4g
- Runs: 5 repeats per JDK and heap profile, with cooldown between tests
- Statistics: summary charts use averages across those 5 repeats, and time series use the median value at each second across the same 5 repeats
- Harness: purpose-built warmup + measurement runner (not JMH)
Conclusion
In this rerun, synthetic application throughput generally trends higher in the newest releases, with a few releases in the middle that land closer to older versions. Microbenchmark results are more mixed, so the charts above are the best way to see which hot paths matter most for your workload.
Let us know what you would like us to benchmark next.
Happy Benchmark Tuesday!