StateVec Bank HA Kafka Performance Run

This report presents the results of the 2026-05-12 Bank HA runner as a public engineering note.

The run used a three-node Kafka 4.2 KRaft cluster with replication factor 3 and one partition per benchmark topic. It is a measurement for this hardware, Incus placement, workload, and configuration. It is not a general product throughput ceiling.

The downloadable result package contains the CSV summaries, tx sequence integrity data, the original report, and the generated charts.

Test setup

The run used an Incus-based environment on one physical host:

CPU: AMD Ryzen 9 9950X3D, 16 cores / 32 threads.
Memory: 249 GiB visible to Linux.
Coordination: 3 etcd containers.
Log and publication path: 3 Kafka 4.2 KRaft brokers, RF=3, one partition per benchmark topic.
StateVec Bank runtime: one leader container and one standby container.
Workload: mixed_bank_flow from bank_ha_host_runner.

The storage was ordinary consumer NVMe, split by role:

infra containers on Lexar NM1090 PRO;
Bank leader on Crucial T710;
Bank standby on WD_BLACK SN8100.

The container sizing was intentionally modest:

Kafka brokers: 3 containers, each 2C / 8GiB.
etcd: 3 containers, each 1C / 1GiB.
Bank leader and standby: each 4C / 16GiB.

The benchmark path was:

host runner
    |
    | commands
    v
Kafka RF=3 command topic
    |
    v
Bank leader
    |
    | committed TxResult / TxLog
    v
Kafka RF=3 txlog topic
    |
    v
Bank standby durable + apply

Bank leader -> Kafka RF=3 publication topic -> host publication observer

Each benchmark case used a distinct run id and distinct command, replication, and publication topics.
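
As a minimal sketch of per-case topic isolation, the snippet below derives a run id and three topic names for every cell of the matrix. The naming scheme, the `CaseTopics` type, and `topics_for_case` are hypothetical and only illustrate the idea; the report does not show how bank_ha_host_runner actually names its run ids or topics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseTopics:
    run_id: str
    command: str
    replication: str
    publication: str


def topics_for_case(compression: str, target_tps: int) -> CaseTopics:
    # Hypothetical naming scheme, not the real runner's.
    run_id = f"bankha-20260512-{compression}-{target_tps}"
    return CaseTopics(
        run_id=run_id,
        command=f"{run_id}.command",
        replication=f"{run_id}.txlog",
        publication=f"{run_id}.publication",
    )


COMPRESSIONS = ("none", "snappy", "lz4")
TARGETS = (5_000, 10_000, 20_000, 50_000, 100_000, 150_000)

matrix = [topics_for_case(c, t) for c in COMPRESSIONS for t in TARGETS]

# No topic name is shared between cases, so one case cannot read
# another case's commands, TxLog records, or publications.
all_topics = {
    name
    for case in matrix
    for name in (case.command, case.replication, case.publication)
}
```

With 18 cases and three topics per case, the set holds 54 distinct names.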

What was measured

The report keeps four signals separate:

Measured TPS: average command throughput over measured windows.
Publication latency: latency from event sent to publication observed, measured on sampled events.
Tx sequence completeness: leader committed tx, standby durable tx, and standby applied tx all reach the expected sequence.
Standby lag p99: sampled p99 standby replication lag in transactions.

The low-latency target for this run was:

p95 publication latency < 7 ms;
p99 publication latency < 10 ms;
tx sequence completeness must pass.
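
The target can be read as a single predicate over one benchmark case. The sketch below is an illustration, not the runner's code; the function name is an assumption, and the example values are taken from the throughput and latency tables in this report.

```python
def meets_low_latency_target(p95_ms: float, p99_ms: float, tx_seq_ok: bool) -> bool:
    """Return True only when all three conditions of the target hold."""
    return p95_ms < 7.0 and p99_ms < 10.0 and tx_seq_ok


# snappy at the 100,000 target: p95 6.78 ms, p99 7.33 ms, tx_seq passed.
snappy_100k = meets_low_latency_target(6.78, 7.33, True)

# none at the 100,000 target: p95 6.85 ms passes, p99 58.15 ms does not.
none_100k = meets_low_latency_target(6.85, 58.15, True)
```

A case that meets the p95 bound but misses the p99 bound, such as the uncompressed 100,000 case, does not meet the target.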

Each case used one warmup window and five measured windows. Warmup is excluded from the summaries.
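
A minimal sketch of the warmup exclusion follows. It assumes that "average" means total commands divided by total elapsed time across the measured windows; the report does not say whether the runner uses that or a mean of per-window rates. The window length and the counts in the example are made up for illustration.

```python
from typing import Sequence, Tuple

# Each window is (commands_completed, duration_seconds).
Window = Tuple[int, float]


def measured_tps(windows: Sequence[Window], warmup_windows: int = 1) -> float:
    """Average TPS over the measured windows, skipping the leading warmup."""
    measured = windows[warmup_windows:]
    if not measured:
        raise ValueError("no measured windows after warmup")
    total_commands = sum(count for count, _ in measured)
    total_seconds = sum(seconds for _, seconds in measured)
    return total_commands / total_seconds


# Hypothetical case: a slow warmup window followed by five measured windows.
example_windows = [(80_000, 20.0)] + [(100_000, 20.0)] * 5
example_tps = measured_tps(example_windows)
```

The slow warmup window does not pull the average down, because it never enters the sum.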

Throughput and latency

Compression none

Target	Avg TPS	p50 ms	p95 ms	p99 ms	p95<7	p99<10	tx_seq
5,000	4,975	5.17	6.42	6.74	yes	yes	yes
10,000	9,956	5.28	6.21	6.43	yes	yes	yes
20,000	19,910	5.39	6.31	6.55	yes	yes	yes
50,000	49,787	5.74	66.80	84.01	no	no	yes
100,000	99,105	5.83	6.85	58.15	yes	no	yes
150,000	148,033	4.82	5.78	74.20	yes	no	yes

Compression snappy

Target	Avg TPS	p50 ms	p95 ms	p99 ms	p95<7	p99<10	tx_seq
5,000	4,975	5.05	6.35	6.68	yes	yes	yes
10,000	9,956	5.38	6.33	6.62	yes	yes	yes
20,000	19,901	5.53	6.45	6.78	yes	yes	yes
50,000	49,782	5.60	6.50	7.81	yes	yes	yes
100,000	99,299	5.92	6.78	7.33	yes	yes	yes
150,000	148,360	4.95	47.72	76.31	no	no	yes

Compression lz4

Target	Avg TPS	p50 ms	p95 ms	p99 ms	p95<7	p99<10	tx_seq
5,000	4,973	5.31	6.49	6.77	yes	yes	yes
10,000	9,958	5.37	6.30	6.60	yes	yes	yes
20,000	19,911	5.58	6.49	6.85	yes	yes	yes
50,000	49,790	5.66	6.60	134.46	yes	no	yes
100,000	99,244	5.92	6.77	7.37	yes	yes	yes
150,000	147,966	4.97	5.83	8.19	yes	yes	yes

Tx sequence completeness

All 18 Kafka matrix runs passed the tx sequence completeness gate. The gate checks:

leader_committed_tx_seq >= expected_tx_seq
standby_durable_tx_seq >= expected_tx_seq
standby_applied_tx_seq >= expected_tx_seq

Compression	Target	Expected	Leader	Standby durable	Standby applied	Result
none	150,000	18,000,100	18,000,100	18,000,100	18,000,100	yes
snappy	150,000	18,000,100	18,000,100	18,000,100	18,000,100	yes
lz4	150,000	18,000,100	18,000,100	18,000,100	18,000,100	yes

The full tx sequence table is included in the downloadable package.
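
The three gate conditions in this section can be sketched as one check. The field names mirror the conditions listed above; the `TxSeqSnapshot` type and the function name are assumptions for illustration, not the runner's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxSeqSnapshot:
    expected_tx_seq: int
    leader_committed_tx_seq: int
    standby_durable_tx_seq: int
    standby_applied_tx_seq: int


def tx_seq_complete(s: TxSeqSnapshot) -> bool:
    """Pass only if leader and standby have all reached the expected sequence."""
    return (
        s.leader_committed_tx_seq >= s.expected_tx_seq
        and s.standby_durable_tx_seq >= s.expected_tx_seq
        and s.standby_applied_tx_seq >= s.expected_tx_seq
    )


# Final values reported for the 150,000 target (same for all three modes).
final_150k = TxSeqSnapshot(18_000_100, 18_000_100, 18_000_100, 18_000_100)

# Hypothetical failure: the standby has persisted the log but not applied it all.
apply_behind = TxSeqSnapshot(18_000_100, 18_000_100, 18_000_100, 17_999_000)
```

Checking durable and applied separately matters: a standby can have the full TxLog on disk and still be behind on applied state.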

Resource profile

The resource data uses process CPU percent, process memory, and disk IO time sampled from the Incus containers. Kafka CPU and IO are averaged across the three broker containers.

For snappy, the most consistent compression mode in this run:

Target	Leader CPU %	Standby CPU %	Kafka CPU %	Leader IO	Standby IO	Kafka IO	Lag p99 tx
5,000	22.1	63.6	23.5	83.8	629.3	187.3	0
10,000	24.9	82.4	24.0	119.0	198.4	156.3	82
20,000	30.6	73.4	24.6	130.0	415.5	180.9	887
50,000	42.0	75.3	26.3	133.5	287.1	260.1	2,005
100,000	67.0	44.5	28.5	133.1	562.1	413.8	24,368
150,000	98.1	29.3	39.8	180.2	838.9	475.4	4,910,043

Lag p99 tx is sampled standby replication lag in transactions. It is a runtime pressure signal, not a tx sequence failure. Final leader, standby durable, and standby applied sequences all reached the expected sequence.
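
To show how a sampled p99 lag can be large while the final sequences still match, here is a sketch using the nearest-rank percentile. Two things are assumptions: that a lag sample is the leader's committed sequence minus the standby's applied sequence at sampling time, and that the runner uses nearest-rank. The report states neither, and the sample values are invented.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[int], pct: int) -> int:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical lag samples in transactions: mostly caught up, two bursts.
lag_samples = [0] * 98 + [500, 9_000]
lag_p99 = percentile_nearest_rank(lag_samples, 99)

# The last sample taken after the run drains can still be zero,
# which is what the completeness gate looks at.
final_lag = 0
```

The percentile summarizes pressure during the run; the gate only looks at where the sequences end up.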

What this result says

The useful result is not a single headline TPS number. The run shows where pressure appears while keeping state completeness explicit:

The matrix contains 18 completed Kafka runs: 6 target rates times 3 compression modes.
13 of 18 runs met both p95 < 7 ms and p99 < 10 ms while passing tx sequence completeness.
Snappy was the most consistent compression mode through 100K TPS.
LZ4 produced the cleanest 150K TPS result in this matrix: 147,966 measured TPS, p95 5.83 ms, p99 8.19 ms, and tx sequence completeness passed.
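Kafka broker CPU was not saturated in the stable low-latency range; in the snappy runs the broker average stayed between 23.5 and 28.5 percent through 100K. At higher targets, leader CPU (98.1 percent at 150K with snappy) and standby durable IO became the more visible pressure points.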
All final state completeness checks passed across leader committed state, standby durable TxLog, and standby applied state.
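
The 13 of 18 count can be recomputed from the p95 and p99 columns of the three latency tables. The snippet below is a small cross-check written for this note, not part of the result package; every case passed tx sequence completeness, so only the latency bounds are evaluated.

```python
# (target, p95_ms, p99_ms) per compression mode, copied from the tables above.
RESULTS = {
    "none": [
        (5_000, 6.42, 6.74), (10_000, 6.21, 6.43), (20_000, 6.31, 6.55),
        (50_000, 66.80, 84.01), (100_000, 6.85, 58.15), (150_000, 5.78, 74.20),
    ],
    "snappy": [
        (5_000, 6.35, 6.68), (10_000, 6.33, 6.62), (20_000, 6.45, 6.78),
        (50_000, 6.50, 7.81), (100_000, 6.78, 7.33), (150_000, 47.72, 76.31),
    ],
    "lz4": [
        (5_000, 6.49, 6.77), (10_000, 6.30, 6.60), (20_000, 6.49, 6.85),
        (50_000, 6.60, 134.46), (100_000, 6.77, 7.37), (150_000, 5.83, 8.19),
    ],
}


def passes(p95_ms: float, p99_ms: float) -> bool:
    return p95_ms < 7.0 and p99_ms < 10.0


passes_by_mode = {
    mode: sum(passes(p95, p99) for _, p95, p99 in rows)
    for mode, rows in RESULTS.items()
}
total_passes = sum(passes_by_mode.values())
total_runs = sum(len(rows) for rows in RESULTS.values())
```

This gives 3 passing cases for none, 5 for snappy, and 5 for lz4, which sums to 13 of 18.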

Until a newer run changes the evidence, the LZ4 150K result is the clean public performance anchor for this Kafka HA setup.