Collected Metrics

Upgrading from a previous version?

If you are updating dashboards or alerts after an upgrade, see Upgrading for migration steps and examples.

We expose a large number of JVM and application metrics via the Dropwizard Metrics library.

Metrics can be exported through any combination of four reporters: periodic CSV files, logging (SLF4J), InfluxDB, and JMX. By default, only the JMX reporter is enabled. To enable or configure the others, see the comments on the metrics-reporters setting in the Config Ref Manual, specifically the part on the reporter types [jmx, csv, influxdb, slf4j]. Some metrics are also exposed as JSON on the HTTP endpoint Metrics: GET /api/v2/system/metrics.
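As a rough illustration of reading the JSON endpoint, the sketch below builds the request URL and parses whatever JSON comes back. The base URL (http://localhost:8080) and the helper names metrics_url and fetch_metrics are assumptions for this example, and the shape of the response is not described on this page, so the code returns the parsed value without interpreting it.

```python
import json
from urllib.request import urlopen

# Path taken from the text above; everything else here is illustrative.
METRICS_PATH = "/api/v2/system/metrics"


def metrics_url(base_url: str) -> str:
    """Join a server base URL (hypothetical, e.g. http://localhost:8080) with the metrics path."""
    return base_url.rstrip("/") + METRICS_PATH


def fetch_metrics(base_url: str, timeout: float = 5.0):
    """GET the metrics endpoint and return the decoded JSON value as-is.

    No assumption is made about the structure of the returned document.
    """
    with urlopen(metrics_url(base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a running server at the assumed address):
# metrics = fetch_metrics("http://localhost:8080")
# print(json.dumps(metrics, indent=2))
```

Polling this endpoint is only a convenience for ad hoc inspection; for continuous collection, the reporters configured through metrics-reporters are likely the better fit.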

Available Metrics

The metrics that we explicitly measure in our code are as follows.

{graph-name}
- shard.shard-{n}
  - sleep-counters: Counters that track the sleep cycle (in aggregate) of nodes on the shard
    - removed
    - slept-failure
    - slept-success
    - woken
  - sleep-timers: Timers that measure the duration of sleep and wake operations on nodes
    - slept
    - woken
  - nodes-evicted: Meter tracking node evictions from memory (only emitted when enableDebugMetrics is set)
  - unlikely: Counters that track occurrences of supposedly unlikely (and generally bad) code paths
    - wake-up-failed: Despite repeated attempts, the requested node could not be woken up.
    - wake-up-error: An unexpected error was encountered when attempting to wake up a node; will retry.
    - hard-limit-reached: A node was blocked from being woken up because the hard limit for number of active nodes has been hit; will retry.
    - actor-name-reserved
    - incomplete-shutdown: A shard did not complete shutdown cleanly.
- node: Bucketed counters
  - edge-counts: A counter for the numbers of edges on nodes, split into buckets
    - 1-7
    - 8-127
    - 128-2047
    - 2048-16383
    - 16384-infinity
  - property-counts: A counter for the numbers of properties on nodes, split into buckets
    - 1-7
    - 8-127
    - 128-2047
    - 2048-16383
    - 16384-infinity
  - property-sizes: A histogram of property sizes (in bytes) observed since startup
- ingest.{ingest-name}
  - count: Number of records ingested
  - bytes: Number of bytes ingested (aggregate data payload size)
  - query: Timer measuring the duration of ingest query executions
  - deserialization: Timer measuring the duration of ingest record deserialization
- standing-queries
  - results.{standing-query-name}: Meter of results that were produced for a named standing query on this member
  - dropped.{standing-query-name}: Counter of results dropped for a named standing query on this member because too many messages were already in flight while the standing query was backpressuring. This should be zero.
  - states.{standing-query-id}: Histogram of the size (in bytes) of persistent standing query states.
  - queue-time.{standing-query-name}: Timer measuring how long standing query results spend in the result queue before being accepted for processing
persistor: All are timers, except snapshot-sizes, which is a histogram.
- get-journal: Measures how long it takes to query a node's journal from the persistor
- get-latest-snapshot: Measures how long it takes to retrieve a node's snapshot from the persistor
- persist-event: Measures how long it takes to persist a change to a node's state.
- persist-snapshot: Measures how long it takes to persist a node's snapshot.
- set-standing-query-state: Measures how long it takes to persist standing query state.
- get-standing-query-states: Measures how long it takes to retrieve standing query states.
- snapshot-sizes: A histogram that measures the serialized size (in bytes) of a node's persisted snapshot.
shard.shard-{n}
- delivery-relay-deduplicated: Counter of deduplicated message deliveries on this shard.
shared
- valve.{name}: A gauge representing how many operations are currently pausing an ingest due to backpressuring.
cache
- {context}.insert: Timer tracking insert operations into internal caches (e.g. ingest-XYZ-deduplication, http-webpage-serve).
node
- mailbox-sizes: A counter for the sizes of message mailboxes on nodes, split into buckets
  - 1-7
  - 8-127
  - 128-2047
  - 2048-16383
  - 16384-infinity
dgn-reg
- count: Gauge measuring the number of in-memory registered DomainGraphNodes.
messaging: Timers for cross-host message delivery.
- relayTell: Timer measuring cross-host fire-and-forget message delivery
- relayAsk: Timer measuring cross-host request-response message delivery
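
The edge-counts, property-counts, and mailbox-sizes counters listed above all use the same five buckets. The following sketch shows how a raw value would map to a bucket label. It is an illustration of the bucket boundaries only, not the product's implementation; the function name bucket_label is hypothetical, and returning None for values below 1 is an assumption, since the list defines no bucket for them.

```python
from typing import Optional

# (low, high) bounds copied from the bucket names in the list above;
# high=None stands for the open-ended "infinity" bucket.
BUCKETS = [(1, 7), (8, 127), (128, 2047), (2048, 16383), (16384, None)]


def bucket_label(value: int) -> Optional[str]:
    """Return the bucket name a value falls into, or None if no bucket covers it."""
    for low, high in BUCKETS:
        if value >= low and (high is None or value <= high):
            return f"{low}-infinity" if high is None else f"{low}-{high}"
    return None  # assumption: values below 1 are not counted in any bucket


print(bucket_label(5))      # 1-7
print(bucket_label(300))    # 128-2047
print(bucket_label(50000))  # 16384-infinity
```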

Other libraries we use also export metrics via this mechanism. For example, the Cassandra client reports metrics on its usage of the Cassandra server; these can optionally be enabled in your config file: https://docs.datastax.com/en/developer/java-driver/4.17/manual/core/metrics/#enabling-specific-driver-metrics.

All metrics in Quine are also valid in Quine Enterprise!

Quine Enterprise supports multiple named graphs (namespaces). Metrics prefixed with {graph-name} substitute the graph's configured name. Quine stores all data in a single graph named quine, which is also the default graph in Quine Enterprise. This means that every metric name from Quine works unchanged in Quine Enterprise.
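
To make the placeholder substitution concrete, here is a minimal sketch that fills in a metric name template for the default graph. The helper expand_metric_name is hypothetical, the ingest name my-ingest is made up, and joining the levels of the list with dots is an assumption about how the final metric name is written.

```python
def expand_metric_name(template: str, **values: str) -> str:
    """Replace {placeholder} tokens in a metric name template.

    Keyword names use underscores in place of hyphens, so graph_name
    fills the {graph-name} placeholder.
    """
    name = template
    for key, value in values.items():
        name = name.replace("{" + key.replace("_", "-") + "}", value)
    if "{" in name or "}" in name:
        raise ValueError(f"unresolved placeholder in {name!r}")
    return name


# "quine" is the single graph in Quine and the default graph in Quine Enterprise.
# The dotted layout of the template is an assumption for illustration.
print(expand_metric_name(
    "{graph-name}.ingest.{ingest-name}.count",
    graph_name="quine",
    ingest_name="my-ingest",
))
# quine.ingest.my-ingest.count
```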