This note compares two ways to run a high-throughput business state engine on AWS.

It is an EC2 instance cost model, not a full production bill. Local instance SSD is included by choosing EC2 instance families with instance-store NVMe. It does not include EBS, snapshots, cross-AZ traffic or other data transfer, load balancers, monitoring, NAT gateways, support plans, Reserved Instances, Savings Plans, or managed-service markups.

Assumptions

Region and billing:

  • AWS region: us-east-1.
  • Pricing model: Linux On-Demand EC2.
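  • Month length: 730 hours.
  • Rounding: line, monthly, and annual totals are computed from unrounded hourly rates and rounded to the cent at the end, so a line total can differ by a few cents from the rounded per-node price multiplied by the node count.
  • Kafka, etcd, the Raft nodes, and StateVec are all assumed to be self-managed on EC2 instances.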
  • Local SSD is modeled through instance-store EC2 families. There is no separate per-GB local SSD line item; the storage requirement changes the instance type.

Instance mapping:

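+-----------------------------+--------------+------+--------+------------+----------+---------+
| Logical need                | AWS instance | vCPU | Memory | Local NVMe | Hourly   | Monthly |
+-----------------------------+--------------+------+--------+------------+----------+---------+
| 2C / 4G coordination        | c6id.large   | 2    | 4 GiB  | 118 GB     | $0.10080 | $73.58  |
| 4C / 16G + 200 GB local SSD | m6id.xlarge  | 4    | 16 GiB | 237 GB     | $0.23730 | $173.23 |
| 8C / 16G + 400 GB local SSD | c6id.2xlarge | 8    | 16 GiB | 474 GB     | $0.40320 | $294.34 |
+-----------------------------+--------------+------+--------+------------+----------+---------+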

Price references:

  • c6id.large: 2 vCPU / 4 GiB / 118 GB local NVMe, about $0.1008/hr.
  • m6id.xlarge: 4 vCPU / 16 GiB / 237 GB local NVMe, about $0.2373/hr.
  • c6id.2xlarge: 8 vCPU / 16 GiB / 474 GB local NVMe, about $0.4032/hr.
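The Monthly column is the hourly On-Demand rate multiplied by the 730-hour month. A minimal Python sketch of that conversion, using `Decimal` so cent rounding is exact; the rates are the snapshot values listed above, and `monthly_usd` is a hypothetical helper name:

```python
from decimal import Decimal, ROUND_HALF_UP

# Billing month used throughout this note.
HOURS_PER_MONTH = Decimal("730")

# Linux On-Demand us-east-1 rates from the snapshot above; re-check before use.
HOURLY_USD = {
    "c6id.large": Decimal("0.1008"),
    "m6id.xlarge": Decimal("0.2373"),
    "c6id.2xlarge": Decimal("0.4032"),
}


def monthly_usd(instance_type: str) -> Decimal:
    """Monthly On-Demand cost of one instance, rounded half-up to the cent."""
    raw = HOURLY_USD[instance_type] * HOURS_PER_MONTH
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


for name in HOURLY_USD:
    print(f"{name:<14} ${monthly_usd(name)}")
```

Running it reproduces the $73.58, $173.23, and $294.34 per-node figures in the mapping table.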

Option A: Raft, 16 Shards

In this document, "Raft" does not mean a specific open-source implementation. It refers to a distributed business architecture that gradually evolves toward sharded Raft replicated state machines: more state is partitioned into shards, each shard gets a replicated group, and event delivery is built around those groups.

Target setup:

  • Peak throughput target: 100k TPS.
  • Raft: 16 shards.
  • Each shard: 3 nodes.
  • Each Raft node target: 4C / 16G.
  • Each Raft node local SSD target: 200 GB, modeled as m6id.xlarge.
  • Kafka cluster: 3 brokers.
  • Each Kafka broker: 8C / 16G + 400 GB local SSD, modeled as c6id.2xlarge.

This model represents the partitioned Raft business engine described above: each shard has its own replicated state-machine group, and the system as a whole needs coordination and event delivery around those shard groups. The cost driver is the per-shard replica footprint (3 nodes per shard) multiplied by the shard count.

No private topology or domain-specific service layout is assumed here.

Node count:

+----------------------+----------+-------------------------+--------------+
| Component            | Count    | Modeled size            | Instance     |
+----------------------+----------+-------------------------+--------------+
| Raft nodes           | 16 * 3   | 4C / 16G + 200 GB SSD   | m6id.xlarge  |
| Kafka brokers        | 3        | 8C / 16G + 400 GB SSD   | c6id.2xlarge |
+----------------------+----------+-------------------------+--------------+

Monthly EC2 instance cost:

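+----------------------+----------+----------------+---------------+
| Component            | Count    | Monthly / node | Monthly total |
+----------------------+----------+----------------+---------------+
| Raft nodes           | 48       | $173.23        | $8,314.99     |
| Kafka brokers        | 3        | $294.34        | $883.01       |
+----------------------+----------+----------------+---------------+
| Total                | 51       |                | $9,198.00     |
+----------------------+----------+----------------+---------------+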

Annualized:

$9,198.00 * 12 = $110,376.00 / year
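The Option A arithmetic can be reproduced as a small bill of materials. This Python sketch assumes the snapshot hourly rates and the 730-hour month used in this note; `OPTION_A`, `bill`, and `to_cents` are hypothetical names. It keeps full precision until the final rounding, which is why the Raft line comes to $8,314.99 rather than 48 * $173.23 = $8,315.04, and why the lines sum exactly to $9,198.00:

```python
from decimal import Decimal, ROUND_HALF_UP

HOURS_PER_MONTH = Decimal("730")
CENT = Decimal("0.01")

# On-Demand us-east-1 hourly rates used in this note (snapshot; re-check).
HOURLY_USD = {"m6id.xlarge": Decimal("0.2373"), "c6id.2xlarge": Decimal("0.4032")}

# (component, node count, instance type) for Option A.
OPTION_A = [
    ("Raft nodes", 16 * 3, "m6id.xlarge"),
    ("Kafka brokers", 3, "c6id.2xlarge"),
]


def to_cents(amount: Decimal) -> Decimal:
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def bill(components):
    """Return unrounded per-component monthly costs and their total."""
    lines = {
        name: count * HOURLY_USD[itype] * HOURS_PER_MONTH
        for name, count, itype in components
    }
    return lines, sum(lines.values(), Decimal(0))


lines, monthly = bill(OPTION_A)
for name, cost in lines.items():
    print(f"{name:<14} ${to_cents(cost):,}")
print(f"Monthly total  ${to_cents(monthly):,}")
print(f"Annual total   ${to_cents(monthly * 12):,}")
```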

Option B: StateVec HA Queue Shape

Target setup:

  • etcd: 3 nodes.
  • Each etcd node: 2C / 4G, modeled as c6id.large.
  • Kafka cluster: 3 brokers.
  • Each Kafka broker: 8C / 16G + 400 GB local SSD, modeled as c6id.2xlarge.
  • StateVec: 3 nodes.
  • Each StateVec node target: 4C / 16G.
  • Each StateVec node local SSD target: 400 GB, modeled as c6id.2xlarge.

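The StateVec runtime target is still 4C / 16G. Local instance SSD capacity is fixed by the EC2 instance type, and the 4C / 16G type used for the Raft nodes, m6id.xlarge, has only 237 GB of local NVMe, short of the 400 GB target. Of the three instance types modeled here, only c6id.2xlarge meets both requirements: it provides 16 GiB of memory and 474 GB of local NVMe, at the price of 4 more vCPUs than the runtime needs.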
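The selection rule behind this choice is "the cheapest instance type whose vCPU, memory, and local NVMe all meet the target". A minimal Python sketch, restricted to the three types in this note (a real search would cover the full EC2 catalog); `InstanceType` and `cheapest_fit` are hypothetical names, and prices are the snapshot values above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: int
    mem_gib: int
    nvme_gb: int
    hourly_usd: float


# The three instance-store types from the mapping table only.
CANDIDATES = [
    InstanceType("c6id.large", 2, 4, 118, 0.1008),
    InstanceType("m6id.xlarge", 4, 16, 237, 0.2373),
    InstanceType("c6id.2xlarge", 8, 16, 474, 0.4032),
]


def cheapest_fit(vcpu: int, mem_gib: int, nvme_gb: int) -> InstanceType:
    """Cheapest candidate meeting every minimum (hypothetical helper)."""
    fits = [
        t for t in CANDIDATES
        if t.vcpu >= vcpu and t.mem_gib >= mem_gib and t.nvme_gb >= nvme_gb
    ]
    if not fits:
        raise ValueError("no candidate instance type meets the requirement")
    return min(fits, key=lambda t: t.hourly_usd)


print(cheapest_fit(4, 16, 200).name)  # Raft node target -> m6id.xlarge
print(cheapest_fit(4, 16, 400).name)  # StateVec node target -> c6id.2xlarge
```

Because local NVMe is fixed per instance type, raising the SSD target from 200 GB to 400 GB moves the node from m6id.xlarge to c6id.2xlarge even though the CPU and memory targets are unchanged.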

Node count:

+----------------------+----------+-------------------------+--------------+
| Component            | Count    | Modeled size            | Instance     |
+----------------------+----------+-------------------------+--------------+
| etcd nodes           | 3        | 2C / 4G                 | c6id.large   |
| Kafka brokers        | 3        | 8C / 16G + 400 GB SSD   | c6id.2xlarge |
| StateVec nodes       | 3        | 4C / 16G + 400 GB SSD   | c6id.2xlarge |
+----------------------+----------+-------------------------+--------------+

Monthly EC2 instance cost:

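+----------------------+----------+----------------+---------------+
| Component            | Count    | Monthly / node | Monthly total |
+----------------------+----------+----------------+---------------+
| etcd nodes           | 3        | $73.58         | $220.75       |
| Kafka brokers        | 3        | $294.34        | $883.01       |
| StateVec nodes       | 3        | $294.34        | $883.01       |
+----------------------+----------+----------------+---------------+
| Total                | 9        |                | $1,986.77     |
+----------------------+----------+----------------+---------------+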

Annualized:

$1,986.768 * 12 = $23,841.22 / year (from the unrounded monthly total)
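The same calculation for Option B also shows where $23,841.22 comes from: annualizing the unrounded monthly total ($1,986.768 * 12) gives $23,841.22, while multiplying the already-rounded $1,986.77 by 12 would give $23,841.24. A minimal Python sketch under the same assumptions (snapshot rates, 730-hour month; `OPTION_B` and `to_cents` are hypothetical names):

```python
from decimal import Decimal, ROUND_HALF_UP

HOURS_PER_MONTH = Decimal("730")
CENT = Decimal("0.01")
HOURLY_USD = {"c6id.large": Decimal("0.1008"), "c6id.2xlarge": Decimal("0.4032")}

# (component, node count, instance type) for Option B.
OPTION_B = [
    ("etcd nodes", 3, "c6id.large"),
    ("Kafka brokers", 3, "c6id.2xlarge"),
    ("StateVec nodes", 3, "c6id.2xlarge"),
]


def to_cents(amount: Decimal) -> Decimal:
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


# Unrounded monthly total across all components.
monthly = sum(
    (count * HOURLY_USD[itype] * HOURS_PER_MONTH for _, count, itype in OPTION_B),
    Decimal(0),
)
annual_from_unrounded = to_cents(monthly * 12)
annual_from_rounded = to_cents(to_cents(monthly) * 12)

print(f"Monthly total            ${to_cents(monthly):,}")
print(f"Annual (unrounded * 12)  ${annual_from_unrounded:,}")
print(f"Annual (rounded * 12)    ${annual_from_rounded:,}")
```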

EC2 Instance Cost Comparison

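+-------------------------+-----------+-------------+-------------+
| Architecture            | EC2 nodes | Monthly EC2 | Annual EC2  |
+-------------------------+-----------+-------------+-------------+
| Raft 16 shards + Kafka  | 51        | $9,198.00   | $110,376.00 |
| StateVec + etcd + Kafka | 9         | $1,986.77   | $23,841.22  |
+-------------------------+-----------+-------------+-------------+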

Difference:

Monthly delta = $9,198.00 - $1,986.77 = $7,211.23
Annual delta  = $110,376.00 - $23,841.22 = $86,534.78
Cost ratio    = $9,198.00 / $1,986.77 = 4.63x
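The delta and ratio lines are plain arithmetic on the two totals. This Python sketch reproduces them from the comparison table and also expresses the gap as a percentage reduction (about 78.4%), a derived figure rather than a separate measurement:

```python
from decimal import Decimal, ROUND_HALF_UP

# Totals from the comparison table.
monthly_raft, monthly_statevec = Decimal("9198.00"), Decimal("1986.77")
annual_raft, annual_statevec = Decimal("110376.00"), Decimal("23841.22")

monthly_delta = monthly_raft - monthly_statevec
annual_delta = annual_raft - annual_statevec
ratio = (monthly_raft / monthly_statevec).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)
# Share of the Raft EC2 spend that the StateVec shape avoids.
reduction_pct = (monthly_delta / monthly_raft * 100).quantize(
    Decimal("0.1"), rounding=ROUND_HALF_UP
)

print(f"Monthly delta: ${monthly_delta:,}")
print(f"Annual delta:  ${annual_delta:,}")
print(f"Cost ratio:    {ratio}x")
print(f"Reduction:     {reduction_pct}%")
```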

Under these assumptions, the 16-shard Raft topology costs about 4.6x as much in EC2 instance spend as the StateVec setup; put the other way, the StateVec setup spends roughly 78% less. This holds even after local instance SSD is modeled.

Why the Shape Changes

The cost difference is mostly structural.

The Raft layout multiplies the state engine by shard:

16 shards * 3 replicas = 48 state-engine nodes

That gives every shard its own replicated execution group. It can work, but the cost grows with shard count. Operational complexity also grows with shard count: placement, balancing, hot shards, shard-level recovery, and cross-shard business workflows become part of the system design.
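To see how Option A's spend grows with shard count while the StateVec shape stays fixed, here is a Python sketch that holds everything else constant (3 replicas per shard, 3 Kafka brokers, snapshot rates, 730-hour month). Shard counts other than 16 are illustrative inputs, not configurations evaluated in this note, and `raft_monthly` is a hypothetical helper:

```python
from decimal import Decimal, ROUND_HALF_UP

HOURS_PER_MONTH = Decimal("730")
CENT = Decimal("0.01")
RAFT_NODE_HOURLY = Decimal("0.2373")     # m6id.xlarge
KAFKA_BROKER_HOURLY = Decimal("0.4032")  # c6id.2xlarge
STATEVEC_MONTHLY = Decimal("1986.77")    # Option B total, independent of shard count


def raft_monthly(shards: int, replicas: int = 3, kafka_brokers: int = 3) -> Decimal:
    """Option A monthly EC2 cost: shards * replicas Raft nodes plus a shared Kafka cluster."""
    hourly = shards * replicas * RAFT_NODE_HOURLY + kafka_brokers * KAFKA_BROKER_HOURLY
    return (hourly * HOURS_PER_MONTH).quantize(CENT, rounding=ROUND_HALF_UP)


# Each extra shard adds 3 Raft nodes, so cost is linear in shard count.
per_shard = (3 * RAFT_NODE_HOURLY * HOURS_PER_MONTH).quantize(CENT, rounding=ROUND_HALF_UP)
print(f"Marginal cost per shard: ${per_shard:,}/month")

for shards in (4, 8, 16, 32):
    cost = raft_monthly(shards)
    print(f"{shards:>2} shards ({shards * 3:>3} nodes): ${cost:,} "
          f"= {cost / STATEVEC_MONTHLY:.1f}x StateVec")
```

Doubling from 16 to 32 shards adds another 48 nodes and about $8,315 per month, while the StateVec line does not move.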

The StateVec layout keeps the critical state engine as a smaller replicated runtime group:

3 StateVec nodes + shared Kafka + etcd coordination

Kafka and etcd still exist, but the business execution core is not multiplied by 16. That is where the EC2 reduction comes from.

Reading This Correctly

This document does not say StateVec always costs $1,986.77/month, or that a Raft system always costs $9,198.00/month.

It says that for the specific setups above:

  • Raft cost scales with shards * replicas.
  • StateVec cost scales with a smaller replicated execution group plus Kafka and etcd.
  • Local instance SSD changes the EC2 instance type, but it does not change the main cost driver.

Use this as a starting point for a workload-specific AWS estimate, not as a universal price quote.

Source Notes

The local NVMe sizes come from the AWS EC2 instance type specifications for the m6id and c6id families. The hourly prices are public On-Demand price snapshots for Linux instances in us-east-1; check the AWS Pricing Calculator or the EC2 On-Demand pricing page before using these figures for procurement.