Optimizing Standing Queries
Query optimization involves multiple factors such as cluster configuration, data locality, edge cardinality, and query complexity. This guide provides a starting point for optimizing standing queries in your Streaming Graph deployment.
Examine the Query Plan
The first step in optimizing any query is to examine the query plan generated for it. While the system does not currently provide a direct way to view the query plan for standing queries, you can inspect the standing query state on individual cluster nodes using the debug.node procedure. This provides insight into how the query is distributed and executed across the cluster.
Example:
To identify inefficiencies in standing query execution, use:
CALL debug.node
Inspecting the output can reveal bottlenecks like high query match rates on specific nodes.
Anchor by ID Where Possible
The system is optimized for queries that match nodes by their IDs. Using a WHERE clause of the form id(...) = ... allows the query planner to perform efficient lookups and minimize traversal overhead.
Example:
Instead of:
MATCH (n {name: "Alice"})
RETURN n
Use:
MATCH (n)
WHERE id(n) = idFrom("Alice")
RETURN n
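This approach lets the query go straight to the node by its ID instead of searching on a property value, which can significantly improve performance. It only finds the intended node if that node's ID was derived from the same value (here, idFrom("Alice")) when the node was written.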
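The difference between the two forms can be illustrated outside the graph system. The Python sketch below is an analogy only: `id_from` is a hypothetical stand-in that derives a deterministic ID with `uuid.uuid5`, and the dictionary stands in for an ID-keyed node store. Neither reflects how the system's idFrom is actually implemented.

```python
import uuid

# Hypothetical stand-in for idFrom: the same inputs always yield the same ID.
def id_from(*values):
    return uuid.uuid5(uuid.NAMESPACE_OID, "|".join(map(str, values)))

# Toy node store keyed by ID, as an ID-addressed graph would be.
names = ["Alice", "Bob", "Carol"]
store = {id_from(name): {"name": name} for name in names}

def find_by_property(name):
    """Property match: nodes are inspected one by one until a hit."""
    inspected = 0
    for node in store.values():
        inspected += 1
        if node["name"] == name:
            return node, inspected
    return None, inspected

def find_by_id(name):
    """ID anchor: recompute the ID from the value, then fetch directly."""
    return store.get(id_from(name)), 1

print(find_by_property("Carol"))
print(find_by_id("Carol"))
```

The property search touches every node until it finds a match, while the ID lookup touches one entry regardless of how large the store grows.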
Split Complex Standing Queries
Complex standing queries can benefit from being decomposed into smaller, simpler queries. Each smaller query writes its result back to the graph or triggers the next query, so a complex workflow is built up in stages, much like dynamic programming, where each step reuses the stored results of earlier steps.
Example Workflow:
Define a standing query to identify events of interest:
Standing Query
MATCH (n {type: "event"})
RETURN id(n)
Output Query
MATCH (n)
WHERE id(n) IN $results
MERGE (n)-[:PROCESSED]->(:State {status: "handled"})
This strategy reduces computational overhead per query and enhances maintainability.
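The chaining pattern above can be sketched as a small in-memory simulation. This is an illustration in Python rather than the system's API: the `graph` dictionary and the functions `match_events`, `mark_processed`, and `match_handled` are hypothetical names standing in for the standing query, its output query, and a follow-up query. For brevity the sketch records state as a property instead of a separate State node.

```python
# Toy graph: node ID -> properties. All names here are illustrative.
graph = {
    1: {"type": "event"},
    2: {"type": "user"},
    3: {"type": "event"},
}

def match_events(graph):
    """Stage 1 (the standing query): a cheap, single-condition match."""
    return [node_id for node_id, props in graph.items()
            if props.get("type") == "event" and "status" not in props]

def mark_processed(graph, node_ids):
    """Stage 2 (the output query): record state on the matched nodes."""
    for node_id in node_ids:
        graph[node_id]["status"] = "handled"

def match_handled(graph):
    """Stage 3: a later simple query keyed on the state stage 2 wrote."""
    return [node_id for node_id, props in graph.items()
            if props.get("status") == "handled"]

matched = match_events(graph)
mark_processed(graph, matched)
print(matched, match_handled(graph))
```

Each stage checks a single condition, and the state written by one stage is the only thing the next stage needs to look at.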
Leverage Data Locality
To optimize query execution, ensure that related data resides on the same cluster node whenever possible. Use the locIdFrom function for custom node placement to improve data locality and reduce cross-node communication.
Example:
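Place nodes by a shared attribute. A node's ID is fixed when the node is first referenced and cannot be reassigned with SET, so compute the ID with locIdFrom when writing the node ($partitionKey and $entityKey are placeholder parameters):
MATCH (n)
WHERE id(n) = locIdFrom(0, $partitionKey, $entityKey)
SET n.partitionKey = $partitionKey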
Aligning data locality with your query patterns can significantly reduce query latency.
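To see why co-locating related nodes matters, the sketch below counts how many edges cross cluster members under two placement schemes. It is a Python simulation with assumed names (`member_for`, `cross_member_edges`), an assumed cluster size, and a simple CRC32 hash. It does not reproduce how locIdFrom or the cluster actually assigns nodes.

```python
import zlib

MEMBERS = 4  # assumed cluster size for the illustration

def member_for(key):
    """Map a key to a cluster member with a stable hash."""
    return zlib.crc32(str(key).encode()) % MEMBERS

# Each node has an ID and a partition key; edges connect nodes of one tenant.
nodes = {f"n{i}": f"tenant-{i % 3}" for i in range(30)}
edges = [(a, b) for a in nodes for b in nodes
         if a < b and nodes[a] == nodes[b]]

def cross_member_edges(placement):
    """Count edges whose endpoints live on different members."""
    return sum(placement[a] != placement[b] for a, b in edges)

by_node_id = {n: member_for(n) for n in nodes}           # placement ignores the key
by_partition = {n: member_for(nodes[n]) for n in nodes}  # placement follows the key

print(cross_member_edges(by_node_id), cross_member_edges(by_partition))
```

Placing nodes by partition key leaves no edge crossing members in this toy setup. The trade-off is that a very large partition concentrates load on a single member, so the key should be chosen with the expected partition sizes in mind.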
Additional Tips
- Minimize Supernodes: Avoid designs that result in nodes with excessive edges, as they can degrade performance. Use properties instead of edges for enumerable values when possible.
- Monitor Backpressure: Keep an eye on the standing query results queue to ensure that the system processes matches efficiently without overwhelming downstream consumers.
- Iterative Testing: After applying optimizations, validate improvements by measuring query match rates, latency, and system resource utilization.
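The backpressure tip can be made concrete with a bounded queue whose fill level is watched. The sketch below is a generic Python illustration. `ResultsQueue`, its capacity, and the 0.8 warning threshold are assumptions and not part of the product's API or its actual metrics.

```python
from collections import deque

class ResultsQueue:
    """Bounded buffer between a standing query and a downstream consumer."""

    def __init__(self, capacity, warn_ratio=0.8):
        self.capacity = capacity
        self.warn_ratio = warn_ratio
        self.items = deque()

    def offer(self, match):
        """Accept a match if there is room; otherwise tell the producer to wait."""
        if len(self.items) >= self.capacity:
            return False
        self.items.append(match)
        return True

    def poll(self):
        """Hand the oldest match to the consumer, or None when empty."""
        return self.items.popleft() if self.items else None

    def fill_ratio(self):
        return len(self.items) / self.capacity

    def under_pressure(self):
        return self.fill_ratio() >= self.warn_ratio

queue = ResultsQueue(capacity=10)
accepted = [queue.offer(i) for i in range(12)]  # producer outpaces the consumer
print(accepted.count(True), queue.under_pressure())

for _ in range(3):                              # consumer catches up
    queue.poll()
print(queue.fill_ratio(), queue.under_pressure())
```

A fill ratio that stays near the threshold suggests the consumer is the bottleneck, which is a cue to narrow the standing query or speed up the consumer before tuning anything else.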
For complex scenarios or deeper insights, contact support or refer to the broader Streaming Graph documentation.