Network

Imbalanced Network, RAM, or CPU Activity

When viewing resource usage for a cluster, it’s common to see different usage profiles on different hosts. This is entirely normal. Because the graph is distributed across the cluster, some queries need to read from, or write to, specific members, so those members can show more network, RAM, or CPU activity than others.

Historical Query Latency

In a clustered environment, some queries may need to traverse multiple members to satisfy a given request. Whenever multiple machines are involved, there is a chance of clock skew across servers. When clocks are skewed, in the worst case thatDot Streaming Graph will delay the response by an amount equal to the time deviation between the systems to ensure correctness.
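The worst case described above can be estimated from measured clock offsets. The following is a minimal sketch, not thatDot Streaming Graph's implementation: the function name `worst_case_skew_delay_ms`, the member names, and the offset values are all hypothetical, and it assumes the relevant deviation is the spread between the fastest and slowest clocks among the members a query touches.

```python
from typing import Mapping


def worst_case_skew_delay_ms(clock_offsets_ms: Mapping[str, float]) -> float:
    """Return the largest pairwise clock deviation among members, in ms.

    clock_offsets_ms maps a (hypothetical) member name to its clock offset
    relative to a common reference, e.g. as reported by an NTP client.
    """
    if len(clock_offsets_ms) < 2:
        # A single member cannot be skewed relative to itself.
        return 0.0
    offsets = list(clock_offsets_ms.values())
    return max(offsets) - min(offsets)


# Illustrative offsets only; real values would come from your monitoring.
offsets = {"member-0": 0.0, "member-1": 12.5, "member-2": -3.0}
print(worst_case_skew_delay_ms(offsets))  # 15.5
```

Keeping member clocks tightly synchronized shrinks this spread, and with it the worst-case delay added to historical queries.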

Member Outage

If a cluster member actively participating in the graph becomes unavailable, some or all graph processing will pause. If the member resumes operation, graph processing will resume. However, if the cluster removes the unresponsive member, a hot spare will be promoted into service to take the place of the removed member. If the removed member resumes operation at this point, it will shut itself down and can be safely restarted to participate in the cluster again (as a hot spare or otherwise, as needed). If there are more outages than hot spares, the cluster will be effectively “down” until replacements have been provisioned for all cluster members.
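The relationship between outages and hot spares can be modeled with a small sketch. This is an illustrative model only, not the product's actual membership logic: the `Cluster` type, its fields, and `remove_member` are hypothetical names, and the model assumes the graph runs only when every member position is filled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cluster:
    target_size: int  # member positions that must be filled for the graph to run
    active: int       # positions currently filled by a serving member
    hot_spares: int   # idle members waiting to be promoted

    @property
    def operational(self) -> bool:
        return self.active == self.target_size


def remove_member(cluster: Cluster) -> Cluster:
    """Model the cluster removing one unresponsive active member."""
    if cluster.active == 0:
        raise ValueError("no active members left to remove")
    if cluster.hot_spares > 0:
        # A hot spare is promoted into the vacated position.
        return replace(cluster, hot_spares=cluster.hot_spares - 1)
    # No spare available: the position stays empty until one is provisioned.
    return replace(cluster, active=cluster.active - 1)


cluster = Cluster(target_size=3, active=3, hot_spares=1)
after_first = remove_member(cluster)       # spare promoted, still operational
after_second = remove_member(after_first)  # more outages than spares
print(after_first.operational, after_second.operational)  # True False
```

In this model, provisioning at least as many hot spares as the number of simultaneous outages you want to tolerate keeps the cluster operational.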