RepoFlow Team · Mar 10, 2026

Java 18 to 25 Benchmarks: How Performance Evolved Over Time

Synthetic application and microbenchmark results for Eclipse Temurin Java 18 to 25 on Apple M4

After benchmarking version-to-version performance across a few runtimes (like Node.js and Python), we got a lot of requests to run Java next. We reran the suite from scratch for Eclipse Temurin Java 18 through 25 on an Apple M4 Mac mini, and expanded it with a synthetic application benchmark alongside a larger set of microbenchmarks.

JDK versions tested

  1. Java 18.0.2.1 (Temurin)
  2. Java 19.0.2 (Temurin)
  3. Java 20.0.2 (Temurin)
  4. Java 21.0.10 (Temurin, LTS)
  5. Java 22.0.2 (Temurin)
  6. Java 23.0.2 (Temurin)
  7. Java 24.0.2 (Temurin)
  8. Java 25.0.2 (Temurin, LTS)

Synthetic application benchmark

This part of the suite is a synthetic, application-style workload, meant to be closer to a real service request than a single microbenchmark. Each operation mixes CPU work, allocations, and light data structure work:

  1. Picks a pre-generated request payload (fixed-size bytes)
  2. Scans a few fields (id, ts, user) out of the payload
  3. Computes a SHA-256 digest of the payload plus a few fields
  4. Touches a ConcurrentHashMap cache with a controlled hit rate
  5. Base64 encodes the digest and builds a small response string
  6. Allocates extra bytes per operation to apply GC pressure (256 bytes per op in this run)
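The per-operation steps above can be sketched in plain Java. This is a minimal illustration, not the harness's actual code; the class name, pool size, and cache seeding are assumptions:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class SyntheticOp {
    // Pre-generated payload pool (pool size here is illustrative; the run used a pool of 1024)
    private static final byte[][] PAYLOADS = new byte[8][1024];
    private static final ConcurrentHashMap<Long, String> CACHE = new ConcurrentHashMap<>();

    static String runOnce() throws Exception {
        // 1. pick a pre-generated request payload
        byte[] payload = PAYLOADS[ThreadLocalRandom.current().nextInt(PAYLOADS.length)];
        // 2. scan a field out of the payload (stand-in for id/ts/user extraction)
        long id = payload[0] & 0xFFL;
        // 3. SHA-256 digest of the payload plus the field
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(payload);
        md.update(Long.toString(id).getBytes(StandardCharsets.UTF_8));
        byte[] digest = md.digest();
        // 4. touch the cache (a real harness would control the hit rate here)
        String cached = CACHE.computeIfAbsent(id, k -> "v" + k);
        // 5. Base64-encode the digest and build a small response string
        String response = id + ":" + cached + ":" + Base64.getEncoder().encodeToString(digest);
        // 6. extra allocation to apply GC pressure (256 bytes per op in this run)
        byte[] pressure = new byte[256];
        pressure[0] = (byte) id; // keep the array briefly live
        return response;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runOnce());
    }
}
```

Keeping the payload pool and cache as long-lived statics while each operation allocates short-lived digests, strings, and pressure bytes is what makes the workload exercise the young generation the way a request-serving process would.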

The harness reports throughput, latency, garbage collection activity, and CPU and memory sampling.

Benchmark configuration

  1. Threads: 10
  2. Synthetic application benchmark: warmup 20s, measure 10m, 5 repeats per JDK, with cooldown between tests
  3. Microbenchmarks: warmup 5s, measure 10s, 5 repeats per JDK, with cooldown between tests
  4. Heap profiles: -Xms512m -Xmx512m, -Xms2g -Xmx2g, -Xms4g -Xmx4g
  5. Payload: 1024 bytes (pool 1024)
  6. Cache: size 50,000, key space 200,000, hit attempt rate 0.8
  7. Extra allocation: 256 bytes per op
  8. Compression is disabled in the synthetic application benchmark, and micro_deflate measures Deflater throughput

Synthetic application benchmark results

These charts show the synthetic application benchmark results. Summary bar charts show 512m, 2g, and 4g heap profiles for each Java version, ordered left to right. The time series charts stay on -Xms2g -Xmx2g so the lines stay readable.

Throughput by heap size

Average throughput in M ops/s across 5 repeats.

Higher is better

Request latency by heap size

Average per-request latency in microseconds across 5 repeats.

Lower is better

Garbage collection time by heap size

Average total garbage collection time during the measurement window in milliseconds.

Lower is better

CPU and memory

CPU utilization by heap size

Average process CPU utilization as a percent of all available cores.

Lower is better

Resident memory by heap size

Average resident set size in MiB.

Lower is better

Java heap used by heap size

Average Java heap used in MiB, not heap capacity.

Lower is better

Non-heap memory by heap size

Average non-heap memory used in MiB.

Lower is better

Time series

These charts show one sample per second during the measurement phase, using heap profile -Xms2g -Xmx2g. Each line is a Java version. Values are the median across 5 repeats for each second. The smoothing control defaults to 45s for this longer run; set it to 1s for raw values. The x-axis shows minutes into measurement.
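The centered moving average used for smoothing can be sketched as follows; the class and method names are illustrative, not the chart code itself:

```java
import java.util.Arrays;

public class Smoother {
    // Centered moving average: each point becomes the mean of its in-range
    // neighbors within +/- window/2 seconds. Edges average over fewer samples.
    static double[] smooth(double[] samples, int window) {
        int half = window / 2;
        double[] out = new double[samples.length];
        for (int i = 0; i < samples.length; i++) {
            int lo = Math.max(0, i - half);
            int hi = Math.min(samples.length - 1, i + half);
            double sum = 0;
            for (int j = lo; j <= hi; j++) sum += samples[j];
            out[i] = sum / (hi - lo + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] s = smooth(new double[]{1, 2, 3, 4, 5}, 3);
        System.out.println(Arrays.toString(s)); // [1.5, 2.0, 3.0, 4.0, 4.5]
    }
}
```

A window of 1 leaves every value untouched, which is why setting the control to 1s shows raw samples.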

Heap used over time

Heap used (MiB) sampled once per second. Each line is a Java version. Values are the median across 5 repeats for each second.

Lower is better

Smoothing window: smooths the line using a centered moving average across nearby seconds. Default is 45s; 1s shows raw values.

RSS over time

Per-second RSS (MiB) during measurement.

Lower is better

CPU utilization over time

Per-second process CPU utilization (percent of all cores). Some series start at t=1s due to initial sample availability.

Lower is better

Microbenchmark results

Each microbenchmark chart shows 512m, 2g, and 4g heap profiles for each Java version, ordered left to right. Values are average throughput in M ops/s across 5 repeats.

JSON parsing

Repeated parsing of a small JSON payload.

Higher is better

JSON serialization

Repeated serialization of a small object to JSON.

Higher is better

SHA-256 hashing

Repeated SHA-256 digest calculation on the same payload size.

Higher is better
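A hedged sketch of what this microbenchmark's hot loop might look like; the loop count and sink accumulator are illustrative, not the harness's exact code:

```java
import java.security.MessageDigest;

public class Sha256Bench {
    // Repeated digests of the same payload, accumulating one byte into a sink
    // so the JIT cannot dead-code-eliminate the work.
    static long digestLoop(byte[] payload, int iterations) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] d = md.digest(payload); // digest() also resets the instance for reuse
            sink += d[0];
        }
        return sink;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(digestLoop(new byte[1024], 10_000));
    }
}
```

Reusing one MessageDigest instance keeps per-iteration allocation down to the digest array itself, so the loop measures hashing rather than object churn.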

Base64 encoding and decoding

Repeated base64 encode and decode on the same payload size.

Higher is better
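One iteration of that encode-then-decode round trip can be sketched as below; the class and method names are illustrative:

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64Bench {
    // Encode the payload to a Base64 string, decode it back, and check the round trip.
    static boolean roundTrip(byte[] payload) {
        String encoded = Base64.getEncoder().encodeToString(payload);
        byte[] decoded = Base64.getDecoder().decode(encoded);
        return Arrays.equals(payload, decoded);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new byte[]{1, 2, 3})); // prints true
    }
}
```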

Regular expression matching

Repeated field extraction from a short string using a regular expression.

Higher is better
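A sketch of this kind of extraction loop, assuming a hypothetical `id=<digits>` field format; the pattern is illustrative, not the harness's actual regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexBench {
    // Compile the pattern once; a throughput loop that recompiled per match
    // would mostly measure Pattern.compile instead of matching.
    private static final Pattern ID = Pattern.compile("id=(\\d+)");

    static String extract(String line) {
        Matcher m = ID.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("user=alice id=42 ts=123")); // prints 42
    }
}
```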

Sorting array of numbers

Repeated sorting of an int[] array.

Higher is better
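A sketch of one sorting iteration; copying before sorting is the key detail, since sorting the same array in place would leave every iteration after the first working on already-sorted input:

```java
import java.util.Arrays;

public class SortBench {
    // Sort a fresh copy each repeat so every iteration does full work on unsorted data.
    static int[] sortedCopy(int[] data) {
        int[] copy = Arrays.copyOf(data, data.length);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedCopy(new int[]{5, 1, 4, 2, 3}))); // [1, 2, 3, 4, 5]
    }
}
```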

Concurrent hash map updates

High-churn ConcurrentHashMap get and put workload.

Higher is better
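The churn pattern can be sketched single-threaded as below; the real workload runs across the benchmark's 10 threads, and the key space here is illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class ChmBench {
    // High-churn get/put over a bounded key space; returns how many puts replaced an entry.
    static long churn(int ops, int keySpace) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        long replaced = 0;
        for (int i = 0; i < ops; i++) {
            int key = ThreadLocalRandom.current().nextInt(keySpace);
            if (map.put(key, i) != null) replaced++; // non-null means the key already existed
            map.get(key); // paired read, as in a get-and-put workload
        }
        return replaced;
    }

    public static void main(String[] args) {
        System.out.println(churn(1_000, 10));
    }
}
```

Bounding the key space well below the operation count is what keeps the map "high churn": most puts overwrite a live entry rather than growing the table.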

Deflate compression

Repeated Deflater compression throughput on the same payload size.

Higher is better
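A single compression pass with `java.util.zip.Deflater` can be sketched as below; the buffer sizing and settings are illustrative, since the harness's Deflater configuration is not shown here:

```java
import java.util.zip.Deflater;

public class DeflateBench {
    // Compress the payload in one pass and report the compressed size;
    // a throughput loop would repeat this on the same payload size.
    static int compressedSize(byte[] payload) {
        Deflater deflater = new Deflater();
        deflater.setInput(payload);
        deflater.finish();
        byte[] out = new byte[payload.length * 2 + 64]; // generous buffer for small payloads
        int n = deflater.deflate(out);
        deflater.end(); // release native zlib resources
        return n;
    }

    public static void main(String[] args) {
        System.out.println(compressedSize(new byte[1024])); // zero-filled input compresses well
    }
}
```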

How the tests were performed

  1. Host: Mac mini (Apple M4, 10 cores, 16 GB RAM), macOS 26.3 (arm64)
  2. JDKs: Eclipse Temurin Docker images for Java 18–25 (arm64)
  3. Container: LinuxKit (Docker) environment with 10 available processors
  4. Heaps: -Xms512m -Xmx512m, -Xms2g -Xmx2g, -Xms4g -Xmx4g
  5. Runs: 5 repeats per JDK and heap profile, with cooldown between tests
  6. Statistics: summary charts use averages across those 5 repeats, and time series use the median value at each second across the same 5 repeats
  7. Harness: purpose-built warmup + measurement runner (not JMH)
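The per-second median across repeats described in the statistics item can be sketched as follows; the class and method names are illustrative:

```java
import java.util.Arrays;

public class MedianAcrossRepeats {
    // samples[repeat][second] -> median value at each second across all repeats.
    static double[] perSecondMedian(double[][] samples) {
        int seconds = samples[0].length;
        double[] out = new double[seconds];
        double[] column = new double[samples.length];
        for (int s = 0; s < seconds; s++) {
            for (int r = 0; r < samples.length; r++) column[r] = samples[r][s];
            Arrays.sort(column);
            int mid = column.length / 2;
            out[s] = (column.length % 2 == 1)
                    ? column[mid]                              // odd count: middle value
                    : (column[mid - 1] + column[mid]) / 2.0;   // even count: mean of middle two
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] repeats = {{1, 10}, {3, 30}, {2, 20}, {5, 50}, {4, 40}}; // 5 repeats, 2 seconds
        System.out.println(Arrays.toString(perSecondMedian(repeats))); // [3.0, 30.0]
    }
}
```

With 5 repeats the median is simply the third-highest value at each second, which keeps one outlier run from dragging a whole line up or down.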

Conclusion

In this rerun, synthetic application throughput generally trends higher in the newest releases, with a few releases in the middle that land closer to older versions. Microbenchmark results are more mixed, so the charts above are the best way to see which hot paths matter most for your workload.

Let us know what you would like us to benchmark next.
Happy Benchmark Tuesday!
