Cluster Sizing¶

Sizing Principles¶

Quine Enterprise is a backpressured system: it slows down instead of crashing when a component is overwhelmed. An undersized cluster will not lose data, but it will fall behind. The goal of sizing is to provision enough capacity for the cluster to process data at the rate your workload demands, with headroom for peaks.

Three factors drive cluster size:

Throughput requirement: the sustained and peak event rate you need to process.
Query complexity: the cost of your ingest queries, standing queries, and their outputs.
Data connectivity: how connected your graph is. Highly connected data generates more cross-host messaging and limits horizontal scaling gains. See Data Locality for techniques to mitigate this.

Because these factors interact, there is no universal formula. The guidelines below help you choose a starting size; from there, tune based on observed metrics.

Per-Host Resources¶

Quine Enterprise Hosts¶

Each Quine Enterprise host should have between 8 and 32 CPU cores. 16-core hosts are recommended for most workloads. Smaller hosts tend to be more cost-efficient per core but may produce less-stable ingest rates. Larger hosts offer more throughput per member but often with diminishing returns beyond 32 cores.

For memory, 24 GB RAM per host is typically sufficient. The JVM heap should be set to a fixed size of 12 GB (recommended) to 16 GB (maximum) using -Xms and -Xmx set to the same value. Heap sizes above 16 GB risk long garbage collection pauses and are not recommended. The remaining memory accommodates off-heap usage (network buffers, memory-mapped files) and the operating system.

Tip

Set -Xms and -Xmx to the same value to prevent the JVM from dynamically resizing the heap. Resizing triggers a full GC, which can cause a lengthy pause.

Cassandra Hosts¶

Cassandra hosts have different requirements than Quine Enterprise hosts:

CPU: 16+ cores recommended.
Memory: 32+ GB RAM. High-memory instances are recommended, especially with fewer Cassandra nodes.
Cassandra heap: Follow the Apache Cassandra documentation on heap sizing. Oversized heaps cause longer GC pauses, which Quine Enterprise experiences as persistor latency spikes. Memory beyond the heap is used by Cassandra's off-heap caches and the OS page cache, both of which improve read performance.
Storage: Local SSDs. Cassandra is I/O-intensive, and network-attached storage adds latency that directly impacts Quine Enterprise throughput. See Cassandra Setup for configuration details.

See the Recommended Operating Environment for complete hardware and OS specifications.

Choosing Your Cluster Size¶

Quine Enterprise Hosts¶

The minimum production cluster is 3 Quine Enterprise hosts. A cluster with fewer than 3 members cannot reliably resolve a network partition, because the split-brain resolver requires a majority to determine which side of a partition survives.

Beyond the minimum, horizontal scaling generally improves throughput, but the gains depend on how much cross-host messaging the workload generates. Workloads with good data locality scale nearly linearly, while highly connected graphs or workloads that frequently query across cluster members see diminishing returns.

A single Quine Enterprise host typically sustains between 2,000 and 10,000 events/sec, depending on query complexity and data connectivity. Most production use cases fall in the 2,000 to 5,000 events/sec range per host. Trivial workloads can reach up to 13,000 events/sec, but this is uncommon. To estimate a starting cluster size, divide your required sustained throughput by a conservative per-host rate for your workload. For example, if you need 15,000 events/sec and expect each host to handle around 4,000 events/sec, start with 4 hosts.

In addition to positioned members, provision at least 1 hot spare for any production deployment. See Hot Spares for guidance on how many spares to provision based on your failure scenarios.

Cassandra Hosts¶

The minimum production Cassandra cluster is 3 nodes, regardless of how small the Quine Enterprise cluster is. Three nodes are required to support replication factor 3 for data durability. They also provide resilience against garbage collection pauses. With 3 nodes and LOCAL_QUORUM consistency, reads and writes continue through the remaining 2 nodes while the third is paused.

A common starting ratio is 4:1 (one Cassandra node for every four Quine Enterprise hosts), with a floor of 3. However, the right ratio depends on your workload and how you provision the Cassandra nodes. Some deployments use fewer, larger Cassandra nodes to minimize operational overhead. Others use more, smaller nodes to spread I/O across more disks and improve resilience. A larger Cassandra cluster tolerates individual node failures with less impact on overall read and write throughput.

When deciding how many Cassandra nodes to provision, consider:

Write volume. Workloads with high write amplification (many node properties or edges updated per ingested event) put more pressure on the persistence layer and benefit from more Cassandra nodes.
Read latency sensitivity. More Cassandra nodes distribute read requests across more machines, reducing per-node load and tail latencies.
Disk sizing tradeoffs. Fewer large nodes concentrate data on fewer disks, which can become an I/O bottleneck during compaction. More smaller nodes spread compaction work and reduce its impact on request latency.

If the persistor is consistently the bottleneck (see Reading the Metrics), add Cassandra nodes or increase their resources.

Cassandra Operational Considerations¶

These practices help maintain Cassandra performance as data volume grows:

Use TTLs to bound storage growth. Configure Cassandra's time-to-live settings to expire data that is no longer needed. Without TTLs, storage grows without bound, eventually degrading compaction performance and increasing read latencies.
Monitor compaction. Cassandra periodically merges SSTables in a process called compaction. Under heavy write loads, compaction can fall behind, causing read latency to increase as more SSTables must be scanned. If pending compaction tasks are consistently growing, the Cassandra cluster needs more I/O capacity (faster disks or additional nodes).
Use LOCAL_QUORUM consistency for both reads and writes in production. This ensures data survives the loss of a single node while keeping latencies reasonable.

Evaluating Your Cluster Size¶

Whether you are sizing a new deployment or evaluating an existing one, the process starts with measurement. Before making any sizing decisions, set up metrics collection and dashboards. See Metrics Quick Start to get started, Collected Metrics for the full reference, and Recommended Alerts to alert on the signals below.

What to Measure¶

Collect the following metrics over a sustained period that includes your peak traffic window:

Metric	What It Tells You
Per-host CPU utilization (external monitoring: OS tools, Grafana, or cloud provider metrics)	How much compute capacity each host is using. High CPU is normal under load; consistently low CPU across all hosts suggests over-provisioning.
Ingest rate: `ingest.{name}.count` per stream, summed across all hosts	Your actual throughput. Compare against the rate your sources are producing. For Kafka sources, check consumer group lag to see whether offsets are falling behind.
Standing query backpressure: `shared.valve.ingest`	Non-zero means standing query result processing cannot keep up and is forcing ingest to slow down.
Persistor latency: `persistor.{query}` avg and p95	How fast the persistence layer is responding. High average latency indicates a general Cassandra bottleneck. High p95 with normal average suggests occasional slow operations, often supernodes or compaction.
Cassandra compaction backlog: `nodetool compactionstats`	Whether Cassandra writes are outpacing compaction. A growing backlog increases read latency.

Reading the Metrics¶

The metrics will point to one of several situations. Most do not require changing the Quine Enterprise cluster size:

persistor.{query} latencies are high: The persistence layer is the bottleneck. Tune persistor configuration (disable journals, use singleton snapshots), optimize Cypher queries that cause unnecessary writes, or add Cassandra resources. See Cluster Performance and Diagnosing Bottlenecks.
shared.valve.ingest is active: Standing query processing is the bottleneck. Optimize standing query output queries, refactor MultipleValues queries to use DistinctId where possible, and check for queries that traverse supernodes. See Diagnosing Bottlenecks.
CPU is low with no other bottleneck identified: The cluster may not be fully utilizing its resources. Increase ingest stream parallelism. This is the most common fix. If running multiple ingest streams on the same host, divide the total parallelism across all streams, as ingests do not backpressure each other and setting parallelism too high across all streams can cause timeouts.
CPU is balanced and high, no persistor or standing query bottleneck: The cluster is well-tuned but needs more compute. Scale vertically first (up to 32 cores per host), then horizontally by adding hosts.
CPU is consistently low, no ingest lag, no backpressure, persistor latencies well under limits: The cluster is likely over-provisioned.

If you run multiple ingest streams at different volumes, evaluate each stream individually. A high-volume stream that is falling behind can be masked by aggregate metrics that include idle capacity from low-volume streams. Check ingest.{name}.count and consumer lag per stream. The highest-volume stream is typically what drives your sizing requirement.

After applying a change, measure again during peak traffic. Each round of tuning may resolve one bottleneck and reveal another. Continue until the cluster either meets your throughput requirement or you have confirmed that the cluster size itself needs to change.

Estimating a Target Size¶

Once you have confirmed that the cluster is over-provisioned or under-provisioned (not just misconfigured), use the CPU measurements from peak traffic to estimate a target host count:

Take the average CPU utilization across all Quine Enterprise hosts during peak traffic. For example, 25 hosts averaging 30% CPU at peak.
Multiply to get your effective utilization: 25 hosts × 0.30 = 7.5 host-equivalents of work.
Divide by a target CPU utilization of around 80%, which leaves headroom for traffic spikes: 7.5 ÷ 0.80 ≈ 10 hosts.
Apply the minimum floor of 3 and round up.

The same approach works in reverse. If CPU is saturated across all hosts with no other bottleneck, estimate how many hosts would bring utilization down to around 80%: multiply current hosts by current peak CPU, then divide by 0.80.

This estimate is approximate. Changing the host count redistributes data and changes cross-host messaging patterns, so actual CPU at the new size will differ from the prediction. Treat it as a starting point, not a final answer.

Validating and Resizing¶

If you have a staging environment, deploy the candidate cluster size with the same configuration and replay production data at peak rate. Measure and tune at the new size before committing to the change.
If you do not have a staging environment, plan the resize during a maintenance window and monitor closely after the change. Be prepared to resize again if the cluster does not handle peak traffic as expected.

Resizing a Quine Enterprise cluster requires stopping all members and restarting with the new target-size. Persisted data and standing queries are preserved, but ingests must be deleted and recreated because they are defined per-member. See the Cluster Scaling procedure for details.

Warning

Do not resize based on average load. Always measure during peak traffic.

Benchmarking Tips¶

Stabilize before measuring. Ingest rates are reported as exponentially weighted moving averages, which are volatile at the start of a stream. Allow at least 10 minutes for rates to stabilize.
Isolate ingest overhead. To measure the maximum rate the cluster can consume from a source, duplicate your ingest definition but set the format to Drop. This acknowledges and discards each record without parsing or executing the ingest query. Comparing the Drop rate to the full ingest rate shows how much overhead your parsing and query logic add.