This note compares two ways to run a high-throughput business state engine on AWS.
It is an EC2 instance cost model, not a full production bill. Local instance SSD is included by choosing EC2 families with instance-store NVMe. The model excludes EBS, snapshots, cross-AZ traffic, data transfer, load balancers, monitoring, NAT, support plans, and managed-service markups, and it applies no Reserved Instance or Savings Plan discounts.
Assumptions
Region and billing:
- AWS region: us-east-1.
- Pricing model: Linux On-Demand EC2.
- Month length: 730 hours.
- Kafka, etcd, Raft, and StateVec are assumed to run on EC2 instances.
- Local SSD is modeled through instance-store EC2 families. There is no separate per-GB local SSD line item; the storage requirement changes the instance type.
Instance mapping:
| Logical need | AWS instance | vCPU | Memory | Local NVMe | Hourly | Monthly |
|---|---|---|---|---|---|---|
| 2C / 4G coordination | c6id.large | 2 | 4 GiB | 118 GB | $0.10080 | $73.58 |
| 4C / 16G + 200 GB local SSD | m6id.xlarge | 4 | 16 GiB | 237 GB | $0.23730 | $173.23 |
| 8C / 16G + 400 GB local SSD | c6id.2xlarge | 8 | 16 GiB | 474 GB | $0.40320 | $294.34 |
Price references:
- c6id.large: 2 vCPU / 4 GiB / 118 GB local NVMe, about $0.1008/hr.
- m6id.xlarge: 4 vCPU / 16 GiB / 237 GB local NVMe, about $0.2373/hr.
- c6id.2xlarge: 8 vCPU / 16 GiB / 474 GB local NVMe, about $0.4032/hr.
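The Monthly column in the instance mapping follows from the hourly rate and the 730-hour month. A minimal Python sketch of that arithmetic is below; the names `HOURLY_USD` and `monthly_usd` are illustrative and not part of any AWS tooling, and the rates are the snapshot values quoted in this note.

```python
HOURS_PER_MONTH = 730  # month length assumed in this note

# Hourly Linux On-Demand snapshots for us-east-1, as quoted above.
HOURLY_USD = {
    "c6id.large": 0.1008,
    "m6id.xlarge": 0.2373,
    "c6id.2xlarge": 0.4032,
}


def monthly_usd(instance_type: str) -> float:
    """Unrounded monthly cost of one instance at the snapshot hourly rate."""
    return HOURLY_USD[instance_type] * HOURS_PER_MONTH


for name in HOURLY_USD:
    print(f"{name}: ${monthly_usd(name):.2f} / month")
```

Keeping the unrounded value and rounding only for display avoids cent-level drift when per-node prices are later multiplied by node counts.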
Option A: Raft, 16 Shards
In this document, "Raft" does not mean a specific open-source implementation. It refers to a distributed business architecture that gradually evolves toward sharded Raft replicated state machines: more state is partitioned into shards, each shard gets a replicated group, and event delivery is built around those groups.
Target setup:
- Peak throughput target: 100k TPS.
- Raft: 16 shards.
- Each shard: 3 nodes.
- Each Raft node target: 4C / 16G.
- Each Raft node local SSD target: 200 GB, modeled as m6id.xlarge.
- Kafka cluster: 3 brokers.
- Each Kafka broker: 8C / 16G, modeled as c6id.2xlarge.
This model represents the partitioned Raft architecture described above: each shard has its own replicated state-machine group, and the full system needs coordination and event delivery around those groups. The cost driver is the replicated state-machine footprint multiplied by the shard count.
No private topology or domain-specific service layout is assumed here.
Node count:
+----------------------+----------+-------------------------+--------------+
| Component | Count | Modeled size | Instance |
+----------------------+----------+-------------------------+--------------+
| Raft nodes | 16 * 3 | 4C / 16G + 200 GB SSD | m6id.xlarge |
| Kafka brokers | 3 | 8C / 16G + 400 GB SSD | c6id.2xlarge |
+----------------------+----------+-------------------------+--------------+
Monthly EC2 instance cost:
| Component | Count | Monthly / node | Monthly total |
|---|---|---|---|
| Raft nodes | 48 | $173.23 | $8,314.99 |
| Kafka brokers | 3 | $294.34 | $883.01 |
| Total | 51 | | $9,198.00 |
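The Option A total can be reproduced with a short Python sketch. It multiplies node counts by the unrounded hourly rate and rounds only when printing; multiplying the already-rounded $173.23 by 48 would give $8,315.04, which no longer sums to the $9,198.00 total. The names `FLEET_A` and `component_monthly` are illustrative.

```python
HOURS_PER_MONTH = 730  # month length assumed in this note

# (component, node count, hourly USD snapshot) for Option A.
FLEET_A = [
    ("Raft nodes", 16 * 3, 0.2373),   # 16 shards * 3 replicas on m6id.xlarge
    ("Kafka brokers", 3, 0.4032),     # c6id.2xlarge
]


def component_monthly(count: int, hourly_usd: float) -> float:
    """Unrounded monthly cost of `count` identical instances."""
    return count * hourly_usd * HOURS_PER_MONTH


total = 0.0
for name, count, hourly in FLEET_A:
    cost = component_monthly(count, hourly)
    total += cost
    print(f"{name}: {count} nodes, ${cost:,.2f} / month")

print(f"Total: ${total:,.2f} / month, ${total * 12:,.2f} / year")
```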
Annualized:
$9,198.00 * 12 = $110,376.00 / year
Option B: StateVec HA Queue Shape
Target setup:
- etcd: 3 nodes.
- Each etcd node: 2C / 4G, modeled as c6id.large.
- Kafka cluster: 3 brokers.
- Each Kafka broker: 8C / 16G, modeled as c6id.2xlarge.
- StateVec: 3 nodes.
- Each StateVec node target: 4C / 16G.
- Each StateVec node local SSD target: 400 GB, modeled as c6id.2xlarge.
The StateVec runtime target is still 4C / 16G; c6id.2xlarge is used here because local instance SSD capacity is tied to the EC2 instance type. It provides 474 GB local NVMe, which satisfies the 400 GB local SSD assumption.
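The way the storage requirement drives the instance choice can be sketched as a small selection function: pick the cheapest instance that meets the vCPU, memory, and local NVMe minimums. This is an illustrative helper, not an AWS API; `CATALOG` holds only the three instance types used in this note, and `pick_instance` is a hypothetical name.

```python
from typing import NamedTuple, Optional


class Instance(NamedTuple):
    name: str
    vcpu: int
    mem_gib: int
    nvme_gb: int
    hourly_usd: float


# Only the three instance types used in this note; a real catalog is far larger.
CATALOG = [
    Instance("c6id.large", 2, 4, 118, 0.1008),
    Instance("m6id.xlarge", 4, 16, 237, 0.2373),
    Instance("c6id.2xlarge", 8, 16, 474, 0.4032),
]


def pick_instance(vcpu: int, mem_gib: int, nvme_gb: int) -> Optional[Instance]:
    """Return the cheapest catalog entry meeting all three minimums, or None."""
    fits = [
        i for i in CATALOG
        if i.vcpu >= vcpu and i.mem_gib >= mem_gib and i.nvme_gb >= nvme_gb
    ]
    return min(fits, key=lambda i: i.hourly_usd) if fits else None


print(pick_instance(4, 16, 200).name)  # Raft node target -> m6id.xlarge
print(pick_instance(4, 16, 400).name)  # StateVec node target -> c6id.2xlarge
```

With the same 4C / 16G compute target, raising the local SSD requirement from 200 GB to 400 GB is what moves the choice from m6id.xlarge to c6id.2xlarge.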
Node count:
+----------------------+----------+-------------------------+--------------+
| Component | Count | Modeled size | Instance |
+----------------------+----------+-------------------------+--------------+
| etcd nodes | 3 | 2C / 4G | c6id.large |
| Kafka brokers | 3 | 8C / 16G + 400 GB SSD | c6id.2xlarge |
| StateVec nodes | 3 | 4C / 16G + 400 GB SSD | c6id.2xlarge |
+----------------------+----------+-------------------------+--------------+
Monthly EC2 instance cost:
| Component | Count | Monthly / node | Monthly total |
|---|---|---|---|
| etcd nodes | 3 | $73.58 | $220.75 |
| Kafka brokers | 3 | $294.34 | $883.01 |
| StateVec nodes | 3 | $294.34 | $883.01 |
| Total | 9 | | $1,986.77 |
Annualized:
$1,986.77 * 12 = $23,841.22 / year
EC2 Instance Cost Comparison
| Architecture | EC2 nodes | Monthly EC2 | Annual EC2 |
|---|---|---|---|
| Raft 16 shards + Kafka | 51 | $9,198.00 | $110,376.00 |
| StateVec + etcd + Kafka | 9 | $1,986.77 | $23,841.22 |
Difference:
Monthly delta = $9,198.00 - $1,986.77 = $7,211.23
Annual delta = $110,376.00 - $23,841.22 = $86,534.78
Cost ratio = $9,198.00 / $1,986.77 = 4.63x
Under these assumptions, the 16-shard Raft topology costs about 4.6x as much in EC2 instances as the StateVec setup (the StateVec setup is roughly 78% cheaper), even after modeling local instance SSD.
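The comparison figures can be rechecked from the instance counts alone. The sketch below is a minimal Python version under this note's assumptions (730-hour month, snapshot hourly rates); `OPTION_A`, `OPTION_B`, and `monthly_total` are illustrative names.

```python
HOURS_PER_MONTH = 730  # month length assumed in this note
HOURLY_USD = {"c6id.large": 0.1008, "m6id.xlarge": 0.2373, "c6id.2xlarge": 0.4032}

# Instance counts per option, taken from the node-count tables.
OPTION_A = {"m6id.xlarge": 48, "c6id.2xlarge": 3}  # Raft nodes + Kafka brokers
OPTION_B = {"c6id.large": 3, "c6id.2xlarge": 6}    # etcd + Kafka + StateVec


def monthly_total(counts: dict) -> float:
    """Unrounded monthly EC2 cost for a mapping of instance type -> node count."""
    return sum(n * HOURLY_USD[t] * HOURS_PER_MONTH for t, n in counts.items())


a = monthly_total(OPTION_A)
b = monthly_total(OPTION_B)

print(f"Monthly delta: ${a - b:,.2f}")
print(f"Annual delta:  ${(a - b) * 12:,.2f}")
print(f"Cost ratio:    {a / b:.2f}x")
print(f"Saving:        {1 - b / a:.1%}")
```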
Why the Shape Changes
The cost difference is mostly structural.
The Raft layout multiplies the state engine by shard:
16 shards * 3 replicas = 48 state-engine nodes
That gives every shard its own replicated execution group. It can work, but the cost grows with shard count. Operational complexity also grows with shard count: placement, balancing, hot shards, shard-level recovery, and cross-shard business workflows become part of the system design.
The StateVec layout keeps the critical state engine as a smaller replicated runtime group:
3 StateVec nodes + shared Kafka + etcd coordination
Kafka and etcd still exist, but the business execution core is not multiplied by 16. That is where the EC2 reduction comes from.
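To make the structural point concrete, the sketch below varies only the shard count while holding this note's instance sizes and prices fixed. That is a simplifying assumption: a real sizing exercise would also resize nodes as shard count changes, and the StateVec figure is only the fixed 3-node setup modeled here. The function names are illustrative.

```python
HOURS_PER_MONTH = 730  # month length assumed in this note

M6ID_XLARGE = 0.2373   # hourly USD, Raft node
C6ID_2XLARGE = 0.4032  # hourly USD, Kafka broker and StateVec node
C6ID_LARGE = 0.1008    # hourly USD, etcd node


def raft_monthly(shards: int, replicas: int = 3, brokers: int = 3) -> float:
    """Monthly EC2 cost of the sharded Raft layout: shards * replicas + Kafka."""
    nodes = shards * replicas
    return (nodes * M6ID_XLARGE + brokers * C6ID_2XLARGE) * HOURS_PER_MONTH


def statevec_monthly(nodes: int = 3, brokers: int = 3, etcd: int = 3) -> float:
    """Monthly EC2 cost of the StateVec layout: fixed runtime group + Kafka + etcd."""
    hourly = nodes * C6ID_2XLARGE + brokers * C6ID_2XLARGE + etcd * C6ID_LARGE
    return hourly * HOURS_PER_MONTH


for shards in (4, 8, 16, 32):
    print(f"{shards:>2} shards: Raft ${raft_monthly(shards):,.2f} / month, "
          f"StateVec ${statevec_monthly():,.2f} / month")
```

Each doubling of the shard count adds a full set of replicated nodes to the Raft figure, while the StateVec figure stays constant in this simplified model.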
Reading This Correctly
This document does not say StateVec always costs $1,986.77/month, or that a Raft system always costs $9,198.00/month.
It says that for the specific setups above:
- Raft cost scales with shards * replicas.
- StateVec cost scales with a smaller replicated execution group plus Kafka and etcd.
- Local instance SSD changes the EC2 instance type, but it does not change the main cost driver.
Use this as a starting point for a workload-specific AWS estimate, not as a universal price quote.
Source Notes
The instance storage sizes are based on AWS EC2 instance type specifications for the m6id and c6id local NVMe families. The hourly prices are public On-Demand price snapshots for Linux instances in us-east-1; check current prices in the AWS Pricing Calculator before using this for procurement.