Interpret Results

Response Payload

The response payload returned from the system has the following form:

{
  "observation": [
    "my",
    "sample",
    "observation"
  ],
  "score": 0.36231689108923804,
  "totalObsScore": 0.36231689108923804,
  "sequence": 3,
  "probability": 0.666666666666666,
  "uniqueness": 0.9943363088569088,
  "infoContent": 0.5849625007211563,
  "mostNovelComponent": {
    "index": 2,
    "value": "observation",
    "novelty": 0.5849625007211563
  }
}

The fields returned have the following meaning:

  • observation – This is the same value passed in to produce the output. It is returned here only for reference.
  • score – The overall measure of how novel this particular observation is. The value is always between 0 and 1, where 0 is entirely normal and 1 is highly novel and clearly anomalous. The score is the result of a complex analysis of the observation and other contextual data. In contrast to the next field, this score is weighted primarily by the novelty of individual components of the observation. Depending on the dataset and the corresponding observation structure (see Step 2), real-world datasets often produce exponentially fewer results at higher scores. In practice, 0.99 is often a reasonable threshold for finding only the most anomalous results, and 0.999 is likely to return half as many results. To reiterate, the actual values and results will depend on the data and the observation structure.
  • totalObsScore – While the score field is biased toward novel components, the totalObsScore field is a similar computation applied to all components of the observation. One practical use of this field is finding “anti-anomalies” with thatDot Novelty Detector: data which is very typical.
  • sequence – Each observation passed into thatDot Novelty Detector is given a unique sequence number. This value represents a total order for all observations and can be used to explore the data visualization as it was when this observation was made.
  • probability – This field represents the probability of seeing this entire observation (exactly) given all previous data when the observation was made.
  • uniqueness – A value between 0 and 1 which indicates how unique this entire observation is, given all previously observed data. A value of 1 means that this observation has never been seen before (in its entirety). Values approaching 0 indicate that this observation is incredibly common.
  • infoContent – The “Information Content”, “Shannon Information”, or “self-information” contained in this entire observation, given all prior observations. This value is measured in bits, and is an answer to the question: On average, how many “yes/no” questions would I need to ask to identify this observation, given this and all previous observations made to the system?
  • mostNovelComponent – An object describing which component of the observation was the most novel. It contains the following fields:
      • index – Which component in the list from the observation field was the most novel. This value is the zero-based index into that list.
      • value – The string from the observation field which is the most novel component. This is the value you would find by extracting the component at position index from the observation array.
      • novelty – An abstract measure of how novel this one particular (most novel) component is. The maximum theoretical value of this field is equal to the value in the infoContent field. This field is not directly a measure of information content, however; it is weighted by many additional factors. The ratio of novelty to infoContent is always between 0 and 1 and indicates how much of the total infoContent is attributable to this particular component.
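
The relationships between these fields can be checked against the sample payload. The following Python sketch is only an illustration: the summarize function and its output keys are hypothetical names, not part of the product. It also compares infoContent with the textbook definition of self-information, -log2(probability). The sample values are consistent with that definition, but whether the service computes the field exactly this way is an assumption.

```python
import json
import math

# Sample response copied from the payload shown at the top of this section.
raw = """
{
  "observation": ["my", "sample", "observation"],
  "score": 0.36231689108923804,
  "totalObsScore": 0.36231689108923804,
  "sequence": 3,
  "probability": 0.666666666666666,
  "uniqueness": 0.9943363088569088,
  "infoContent": 0.5849625007211563,
  "mostNovelComponent": {
    "index": 2,
    "value": "observation",
    "novelty": 0.5849625007211563
  }
}
"""


def summarize(result: dict) -> dict:
    """Derive a few convenience values from one response (hypothetical helper)."""
    component = result["mostNovelComponent"]
    info = result["infoContent"]
    return {
        # value should equal the observation component at position index.
        "value_matches_index": result["observation"][component["index"]] == component["value"],
        # Share of the total information content attributed to this component (0 to 1).
        "novelty_share": component["novelty"] / info if info > 0 else 0.0,
        # Textbook self-information in bits; assumed, not confirmed, to match infoContent.
        "bits_from_probability": -math.log2(result["probability"]),
    }


summary = summarize(json.loads(raw))
print(summary)
```

For the sample payload, the novelty share is 1.0 because novelty and infoContent are identical, meaning the third component accounts for all of the information content in this observation.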

Full documentation for the payload values is also included in the interactive API documentation built into each instance of thatDot Novelty Detector.

Conditioning the System Instead of Training

thatDot Novelty Detector requires no labeled training data, as it is an unsupervised process. The system produces a scored result immediately, starting with the very first observation passed in. The first results will not be very useful, however! The system adapts its scoring to the data it has seen so far in that particular novelty context, and before it has seen a representative sample of your data, the scores won’t have much to go on. thatDot therefore recommends ignoring the first results while the system is still learning a representative sample of your data.

There is no universal guidance possible for how much data to ignore, since this depends on the dataset itself and the user’s choice of observation ordering (in Step 2). In practice, we find that many users have a good intuition for how much data is representative; if not, a reasonable first estimate would be a few thousand observations. We have provided free usage tiers so that users can experiment with enough data to see useful results.
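
One way to apply this advice on the client side is to discard results from a warm-up period and then keep only results above a score threshold. The Python sketch below is a minimal illustration under stated assumptions: the anomalies function, the constant names, and the sample data are all made up for this example, the warm-up size of 3000 is just one reading of “a few thousand”, and it assumes the sequence field can be used as a count of observations seen so far.

```python
from typing import Iterable, Iterator

# Assumptions for illustration only; tune both values for your own data.
WARMUP_OBSERVATIONS = 3000  # one reading of "a few thousand observations"
SCORE_THRESHOLD = 0.99      # threshold suggested for the score field


def anomalies(
    results: Iterable[dict],
    warmup: int = WARMUP_OBSERVATIONS,
    threshold: float = SCORE_THRESHOLD,
) -> Iterator[dict]:
    """Yield results scored at or above threshold, skipping the warm-up period."""
    for result in results:
        # Assumes sequence counts observations, so early results can be skipped.
        if result["sequence"] <= warmup:
            continue
        if result["score"] >= threshold:
            yield result


# Invented results (not real output) with a tiny warm-up so the effect is visible.
sample_results = [
    {"sequence": 1, "score": 0.999},  # high score, but still in the warm-up period
    {"sequence": 2, "score": 0.50},
    {"sequence": 3, "score": 0.42},
    {"sequence": 4, "score": 0.995},  # kept: past warm-up and above threshold
    {"sequence": 5, "score": 0.98},   # dropped: below threshold
]

flagged = list(anomalies(sample_results, warmup=2, threshold=0.99))
print([r["sequence"] for r in flagged])
```

Filtering on the client keeps every observation flowing into the system, so the model continues to learn from the early data even though its early scores are ignored.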

Our professional services team is available for engagements which require deeper analysis or collaboration on specific customer datasets and use cases.

Exploring the Data

thatDot Novelty Detector includes a web-based exploration UI meant to help build an intuition for the data passed in to any particular context. By default, the exploration UI is hosted at:

http://<ip-or-domain-name>:8080/

The exploration UI exposes a simplified interactive visualization of the underlying model built from the observations passed in.