Collected Metrics
Upgrading from a previous version?
If you are updating dashboards or alerts after an upgrade, see Upgrading for migration steps and examples.
We expose a large number of JVM and application metrics via the Dropwizard Metrics library.
Metrics can be exported by periodically writing CSV files, by logging, by sending them to InfluxDB, and/or via JMX. By default, only the JMX reporter is enabled. To enable or configure the others, see the comments on the metrics-reporters setting in the Config Ref Manual, in particular the part listing the reporter types: one of [jmx, csv, influxdb, slf4j]. Some metrics are also exposed as JSON on the HTTP endpoint Metrics: GET /api/v2/system/metrics.
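As a rough illustration of reading the JSON endpoint mentioned above, here is a minimal Python sketch using only the standard library. The endpoint path comes from the text; the base address (http://localhost:8080 in the usage note), the helper names metrics_url and fetch_metrics, and the timeout are assumptions made for the example. The shape of the returned JSON is not described here, so the sketch simply returns whatever the server sends.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Path taken from the text above; everything else in this sketch is illustrative.
METRICS_PATH = "/api/v2/system/metrics"


def metrics_url(base_url: str) -> str:
    """Build the metrics endpoint URL from a server base URL (hypothetical helper)."""
    # Normalise to exactly one slash between the base URL and the path.
    return urljoin(base_url.rstrip("/") + "/", METRICS_PATH.lstrip("/"))


def fetch_metrics(base_url: str, timeout: float = 5.0):
    """GET the metrics endpoint and return the decoded JSON document as-is."""
    with urlopen(metrics_url(base_url), timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_metrics("http://localhost:8080") would request http://localhost:8080/api/v2/system/metrics, assuming that is where your instance is listening.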
Available Metrics
The metrics that we explicitly measure in our code are as follows.
- {model-name}
  - shard.shard-{n}
    - sleep-counters: Counters that track the sleep cycle (in aggregate) of nodes on the shard
      - removed
      - slept-failure
      - slept-success
      - woken
    - sleep-timers: Timers that measure the duration of sleep and wake operations on nodes
      - slept
      - woken
    - nodes-evicted: Meter tracking node evictions from memory (only emitted when enableDebugMetrics is set)
    - unlikely: Counters that track occurrences of supposedly unlikely (and generally bad) code paths
      - wake-up-failed: Despite repeated attempts, the requested node could not be woken up.
      - wake-up-error: An unexpected error was encountered when attempting to wake up a node; will retry.
      - hard-limit-reached: A node was blocked from being woken up because the hard limit for number of active nodes has been hit; will retry.
      - actor-name-reserved
      - incomplete-shutdown: A shard did not complete shutdown cleanly.
  - node: Bucketed counters
    - edge-counts: A counter for the numbers of edges on nodes, split into buckets
      - 1-7
      - 8-127
      - 128-2047
      - 2048-16383
      - 16384-infinity
    - property-counts: A counter for the numbers of properties on nodes, split into buckets
      - 1-7
      - 8-127
      - 128-2047
      - 2048-16383
      - 16384-infinity
    - property-sizes: A histogram of property sizes (in bytes) observed since startup
  - ingest.{ingest-name}
    - count: Number of records ingested
    - bytes: Number of bytes ingested (aggregate data payload size)
    - query: Timer measuring the duration of ingest query executions
    - deserialization: Timer measuring the duration of ingest record deserialization
- persistor: All are timers, except snapshot-sizes, which is a histogram.
  - get-journal: Measures how long it takes to query a node's journal from the persistor
  - get-latest-snapshot: Measures how long it takes to retrieve a node's snapshot from the persistor
  - persist-event: Measures how long it takes to persist a change to a node's state.
  - persist-snapshot: Measures how long it takes to persist a node's snapshot.
  - snapshot-sizes: A histogram that measures the serialized size (in bytes) of a node's persisted snapshot.
- shard.shard-{n}
  - delivery-relay-deduplicated: Counter of deduplicated message deliveries on this shard.
- shared
  - valve.{name}: A gauge representing how many operations are currently pausing an ingest because of backpressure.
- cache
  - {context}.insert: Timer tracking insert operations into internal caches (e.g. ingest-XYZ-deduplication, http-webpage-serve).
- node
  - mailbox-sizes: A counter for the sizes of message mailboxes on nodes, split into buckets
    - 1-7
    - 8-127
    - 128-2047
    - 2048-16383
    - 16384-infinity
- dgn-reg
  - count: Gauge measuring the number of in-memory registered DomainGraphNodes.
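To make the bucketed counters above more concrete, the following Python sketch maps an observed value (an edge count, property count, or mailbox size) to one of the listed bucket labels. It is an illustration under assumptions, not the actual implementation: the inclusive bounds are inferred from the labels themselves, the treatment of values below 1 is a guess (no bucket is listed for them), and the dot separator in metric_name is assumed from names such as shard.shard-{n}. The helper names bucket_label and metric_name are hypothetical.

```python
from typing import Optional

# Bucket labels as listed above; bounds are inferred from the labels and
# treated as inclusive. None means "no upper bound".
BUCKETS = [
    ("1-7", 1, 7),
    ("8-127", 8, 127),
    ("128-2047", 128, 2047),
    ("2048-16383", 2048, 16383),
    ("16384-infinity", 16384, None),
]


def bucket_label(value: int) -> Optional[str]:
    """Return the label of the bucket covering value, or None if none does."""
    for label, low, high in BUCKETS:
        if value >= low and (high is None or value <= high):
            return label
    # Assumption: values below 1 (such as 0) fall into no listed bucket.
    return None


def metric_name(*parts: str) -> str:
    """Join path segments with dots (assumed separator, hypothetical helper)."""
    return ".".join(parts)
```

Under these assumptions, a node with 130 edges in a model named my-model would be counted under metric_name("my-model", "node", "edge-counts", bucket_label(130)), which evaluates to my-model.node.edge-counts.128-2047.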
Other libraries we use also export metrics through this mechanism. For example, the Cassandra client reports metrics about its usage of the Cassandra server; these can optionally be enabled in your config file: https://docs.datastax.com/en/developer/java-driver/4.17/manual/core/metrics/#enabling-specific-driver-metrics.