RepoFlow Team · Mar 10, 2026

Java 18 to 25 Benchmarks: How Performance Evolved Over Time

Synthetic application and microbenchmark results for Eclipse Temurin Java 18 to 25 on Apple M4

After benchmarking version-to-version performance across a few runtimes (like Node.js and Python), we got a lot of requests to run Java next. We reran the suite from scratch for Eclipse Temurin Java 18 through 25 on an Apple M4 Mac mini, and expanded it with a synthetic application benchmark alongside a larger set of microbenchmarks.

JDK versions tested

  1. Java 18.0.2.1 (Temurin)
  2. Java 19.0.2 (Temurin)
  3. Java 20.0.2 (Temurin)
  4. Java 21.0.10 (Temurin, LTS)
  5. Java 22.0.2 (Temurin)
  6. Java 23.0.2 (Temurin)
  7. Java 24.0.2 (Temurin)
  8. Java 25.0.2 (Temurin, LTS)

Synthetic application benchmark

This part of the suite is a synthetic, application-style workload, meant to be closer to a real service request than a single microbenchmark. Each operation mixes CPU work, allocations, and light data structure work:

  1. Picks a pre-generated request payload (fixed-size bytes)
  2. Scans a few fields (id, ts, user) out of the payload
  3. Computes a SHA-256 digest of the payload plus a few fields
  4. Touches a ConcurrentHashMap cache with a controlled hit rate
  5. Base64 encodes the digest and builds a small response string
  6. Allocates extra bytes per operation to apply GC pressure (256 bytes per op in this run)
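The per-operation steps above can be sketched in plain Java. This is a minimal illustration, not the harness's actual code; the class name, pool size, and cache seeding are assumptions:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class SyntheticOp {
    // Pre-generated payload pool (pool size here is illustrative; the run used a pool of 1024)
    private static final byte[][] PAYLOADS = new byte[8][1024];
    private static final ConcurrentHashMap<Long, String> CACHE = new ConcurrentHashMap<>();

    static String runOnce() throws Exception {
        // 1. pick a pre-generated request payload
        byte[] payload = PAYLOADS[ThreadLocalRandom.current().nextInt(PAYLOADS.length)];
        // 2. scan a field out of the payload (stand-in for id/ts/user extraction)
        long id = payload[0] & 0xFFL;
        // 3. SHA-256 digest of the payload plus the field
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(payload);
        md.update(Long.toString(id).getBytes(StandardCharsets.UTF_8));
        byte[] digest = md.digest();
        // 4. touch the cache (a real harness would control the hit rate here)
        String cached = CACHE.computeIfAbsent(id, k -> "v" + k);
        // 5. Base64-encode the digest and build a small response string
        String response = id + ":" + cached + ":" + Base64.getEncoder().encodeToString(digest);
        // 6. extra allocation to apply GC pressure (256 bytes per op in this run)
        byte[] pressure = new byte[256];
        pressure[0] = (byte) id; // keep the array briefly live
        return response;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce());
    }
}
```

Keeping the payload pool and cache as long-lived statics while each operation allocates short-lived digests, strings, and pressure bytes is what makes the workload exercise the young generation the way a request-serving process would.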

The harness reports throughput, latency, garbage collection activity, and CPU and memory sampling.

Benchmark configuration

  1. Threads: 10
  2. Synthetic application benchmark: warmup 20s, measure 10m, 5 repeats per JDK, with cooldown between tests
  3. Microbenchmarks: warmup 5s, measure 10s, 5 repeats per JDK, with cooldown between tests
  4. Heap profiles: -Xms512m -Xmx512m, -Xms2g -Xmx2g, -Xms4g -Xmx4g
  5. Payload: 1024 bytes (pool 1024)
  6. Cache: size 50,000, key space 200,000, hit attempt rate 0.8
  7. Extra allocation: 256 bytes per op
  8. Compression is disabled in the synthetic application benchmark, and micro_deflate measures Deflater throughput

Synthetic application benchmark results

These charts show the synthetic application benchmark results. Summary bar charts show 512m, 2g, and 4g heap profiles for each Java version, ordered left to right. The time series charts stay on -Xms2g -Xmx2g so the lines stay readable.

Throughput by heap size

Average throughput in M ops/s across 5 repeats.

Higher is better

Request latency by heap size

Average per-request latency in microseconds across 5 repeats.

Lower is better

Garbage collection time by heap size

Average total garbage collection time during the measurement window in milliseconds.

Lower is better

CPU and memory

CPU utilization by heap size

Average process CPU utilization as a percent of all available cores.

Lower is better

Resident memory by heap size

Average resident set size in MiB.

Lower is better

Java heap used by heap size

Average Java heap used in MiB, not heap capacity.

Lower is better

Non-heap memory by heap size

Average non-heap memory used in MiB.

Lower is better

Time series

These charts show one sample per second during the measurement phase, using heap profile -Xms2g -Xmx2g. Each line is a Java version. Values are the median across 5 repeats for each second. The smoothing control defaults to 45s for this longer run; set it to 1s for raw values. The x-axis shows minutes into measurement.
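The centered moving average used for smoothing can be sketched as follows; the class and method names are illustrative, not the chart code itself:

```java
import java.util.Arrays;

public class Smoother {
    // Centered moving average: each point becomes the mean of its in-range
    // neighbors within +/- window/2 seconds. Edges average over fewer samples.
    static double[] smooth(double[] samples, int window) {
        int half = window / 2;
        double[] out = new double[samples.length];
        for (int i = 0; i < samples.length; i++) {
            int lo = Math.max(0, i - half);
            int hi = Math.min(samples.length - 1, i + half);
            double sum = 0;
            for (int j = lo; j <= hi; j++) sum += samples[j];
            out[i] = sum / (hi - lo + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] s = smooth(new double[]{1, 2, 3, 4, 5}, 3);
        System.out.println(Arrays.toString(s)); // [1.5, 2.0, 3.0, 4.0, 4.5]
    }
}
```

A window of 1 leaves every value untouched, which is why setting the control to 1s shows raw samples.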

Heap used over time

Heap used (MiB) sampled once per second. Each line is a Java version. Values are the median across 5 repeats for each second.

Lower is better

Smoothing window: smooths the line using a centered moving average across nearby seconds. Default is 45s; 1s shows raw values.

RSS over time

Per-second RSS (MiB) during measurement.

Lower is better

CPU utilization over time

Per-second process CPU utilization (percent of all cores). Some series start at t=1s due to initial sample availability.

Lower is better

Microbenchmark results

Each microbenchmark chart shows 512m, 2g, and 4g heap profiles for each Java version, ordered left to right. Values are average throughput in M ops/s across 5 repeats.

JSON parsing

Repeated parsing of a small JSON payload.

Higher is better

JSON serialization

Repeated serialization of a small object to JSON.

Higher is better

SHA-256 hashing

Repeated SHA-256 digest calculation on the same payload size.

Higher is better
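A hedged sketch of what this microbenchmark's hot loop might look like; the loop count and sink accumulator are illustrative, not the harness's exact code:

```java
import java.security.MessageDigest;

public class Sha256Bench {
    // Repeated digests of the same payload, accumulating one byte into a sink
    // so the JIT cannot dead-code-eliminate the work.
    static long digestLoop(byte[] payload, int iterations) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] d = md.digest(payload); // digest() also resets the instance for reuse
            sink += d[0];
        }
        return sink;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(digestLoop(new byte[1024], 10_000));
    }
}
```

Reusing one MessageDigest instance keeps per-iteration allocation down to the digest array itself, so the loop measures hashing rather than object churn.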

Base64 encoding and decoding

Repeated base64 encode and decode on the same payload size.

Higher is better
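One iteration of that encode-then-decode round trip can be sketched as below; the class and method names are illustrative:

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Bench {
    // Encode the payload to a Base64 string, decode it back, and check the round trip.
    static boolean roundTrip(byte[] payload) {
        String encoded = Base64.getEncoder().encodeToString(payload);
        byte[] decoded = Base64.getDecoder().decode(encoded);
        return Arrays.equals(payload, decoded);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new byte[]{1, 2, 3})); // prints true
    }
}
```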

Regular expression matching

Repeated field extraction from a short string using a regular expression.

Higher is better
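A sketch of this kind of extraction loop, assuming a hypothetical `id=<digits>` field format; the pattern is illustrative, not the harness's actual regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexBench {
    // Compile the pattern once; a throughput loop that recompiled per match
    // would mostly measure Pattern.compile instead of matching.
    private static final Pattern ID = Pattern.compile("id=(\\d+)");

    static String extract(String line) {
        Matcher m = ID.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("user=alice id=42 ts=123")); // prints 42
    }
}
```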

Sorting array of numbers

Repeated sorting of an int[] array.

Higher is better
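A sketch of one sorting iteration; copying before sorting is the key detail, since sorting the same array in place would leave every iteration after the first working on already-sorted input:

```java
import java.util.Arrays;

public class SortBench {
    // Sort a fresh copy each repeat so every iteration does full work on unsorted data.
    static int[] sortedCopy(int[] data) {
        int[] copy = Arrays.copyOf(data, data.length);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedCopy(new int[]{5, 1, 4, 2, 3}))); // [1, 2, 3, 4, 5]
    }
}
```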

Concurrent hash map updates

High-churn ConcurrentHashMap get and put workload.

Higher is better
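The churn pattern can be sketched single-threaded as below; the real workload runs across the benchmark's 10 threads, and the key space here is illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class ChmBench {
    // High-churn get/put over a bounded key space; returns how many puts replaced an entry.
    static long churn(int ops, int keySpace) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        long replaced = 0;
        for (int i = 0; i < ops; i++) {
            int key = ThreadLocalRandom.current().nextInt(keySpace);
            if (map.put(key, i) != null) replaced++; // non-null means the key already existed
            map.get(key); // paired read, as in a get-and-put workload
        }
        return replaced;
    }

    public static void main(String[] args) {
        System.out.println(churn(1_000, 10));
    }
}
```

Bounding the key space well below the operation count is what keeps the map "high churn": most puts overwrite a live entry rather than growing the table.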

Deflate compression

Repeated Deflater compression throughput on the same payload size.

Higher is better
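A single compression pass with `java.util.zip.Deflater` can be sketched as below; the buffer sizing and settings are illustrative, since the harness's Deflater configuration is not shown here:

```java
import java.util.zip.Deflater;

public class DeflateBench {
    // Compress the payload in one pass and report the compressed size;
    // a throughput loop would repeat this on the same payload size.
    static int compressedSize(byte[] payload) {
        Deflater deflater = new Deflater();
        deflater.setInput(payload);
        deflater.finish();
        byte[] out = new byte[payload.length * 2 + 64]; // generous buffer for small payloads
        int n = deflater.deflate(out);
        deflater.end(); // release native zlib resources
        return n;
    }

    public static void main(String[] args) {
        System.out.println(compressedSize(new byte[1024])); // zero-filled input compresses well
    }
}
```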

How the tests were performed

  1. Host: Mac mini (Apple M4, 10 cores, 16 GB RAM), macOS 26.3 (arm64)
  2. JDKs: Eclipse Temurin Docker images for Java 18–25 (arm64)
  3. Container: LinuxKit (Docker) environment with 10 available processors
  4. Heaps: -Xms512m -Xmx512m, -Xms2g -Xmx2g, -Xms4g -Xmx4g
  5. Runs: 5 repeats per JDK and heap profile, with cooldown between tests
  6. Statistics: summary charts use averages across those 5 repeats, and time series use the median value at each second across the same 5 repeats
  7. Harness: purpose-built warmup + measurement runner (not JMH)
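The per-second median across repeats described in the statistics item can be sketched as follows; the class and method names are illustrative:

```java
import java.util.Arrays;

public class MedianAcrossRepeats {
    // samples[repeat][second] -> median value at each second across all repeats.
    static double[] perSecondMedian(double[][] samples) {
        int seconds = samples[0].length;
        double[] out = new double[seconds];
        double[] column = new double[samples.length];
        for (int s = 0; s < seconds; s++) {
            for (int r = 0; r < samples.length; r++) column[r] = samples[r][s];
            Arrays.sort(column);
            int mid = column.length / 2;
            out[s] = (column.length % 2 == 1)
                    ? column[mid]                              // odd count: middle value
                    : (column[mid - 1] + column[mid]) / 2.0;   // even count: mean of middle two
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] repeats = {{1, 10}, {3, 30}, {2, 20}, {5, 50}, {4, 40}}; // 5 repeats, 2 seconds
        System.out.println(Arrays.toString(perSecondMedian(repeats))); // [3.0, 30.0]
    }
}
```

With 5 repeats the median is simply the third-highest value at each second, which keeps one outlier run from dragging a whole line up or down.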

Conclusion

In this rerun, synthetic application throughput generally trends higher in the newest releases, with a few releases in the middle that land closer to older versions. Microbenchmark results are more mixed, so the charts above are the best way to see which hot paths matter most for your workload.

Let us know what you would like us to benchmark next.
Happy Benchmark Tuesday!
