Streaming Systems¶
A streaming system processes events -- data that is continuously generated, often in high volume and at high velocity -- in real time. A streaming event source typically consists of a continuous stream of timestamped logs that record events as they happen, such as a user logging in via an identity management system or a web server logging page requests from an online application.
Architecturally, Novelty fits into a streaming event pipeline between stages, consuming from and writing back out to a streaming event processor like Kafka or Kinesis, where it can detect complex anomalies in the event stream.
Streaming vs Batch Processing¶
Event-driven applications need to process events quickly, often within the time window of a single transaction. Novelty allows you to analyze a stream of events in real time, detecting anomalies, scoring observations based on how novel they are, and routing results to external systems when significant deviations are found.
Historically, event analysis has been done in batches, and by definition batch processing does not provide real-time or near real-time results. A batch system analyzes large groups of events well after the events occurred, typically to produce metrics, trends, or data for AI models. The insights discovered by batch processing detail "what did happen" in an event stream, but they cannot describe "what is happening" while it is still happening.
Detecting anomalies in categorical data in real-time has significant implications for cyber security, fraud detection, observability, logistics, e-commerce, and any use case that must process high velocity data in real time.
Properties of a streaming system:
- Runs queries and algorithms in real time
- Treats stream as infinite -- no beginning or end
- Detects anomalies
- Engineered to process high volumes of data and, often, for fault tolerance
Deployment Patterns¶
Novelty is designed to be deployed within an event streaming data pipeline.
- A single Novelty instance, properly configured, can handle an infinite stream of data.
- Novelty currently supports RocksDB (default), Cassandra, and MapDB for persisting ingested event data. You need a storage plan with enough capacity to hold the data your application requires. Cassandra allows one Novelty host to be backed by many Cassandra systems, whereas the RocksDB and MapDB persistors use local storage.
- Novelty ingest is backpressured, which keeps it stable in a high-volume event stream.
- Observation output results are also delivered in a backpressured stream.
- Novelty analyzes streaming data to detect anomalies and score observations based on how novel they are compared to previously seen data. Results can be routed to external systems like Kafka, Kinesis, or webhooks.
Typical Novelty deployments:
- Consume events from a source (e.g. Kafka or Kinesis) to detect anomalies, score observations, and take action.
- Aggregate event streams from different source types and produce a new event stream.
- Ingest from a source, transform the events (for example, parameter isolation, reduction, or enrichment), and produce a new version of the event.
Apache Kafka¶
Apache Kafka is a distributed streaming platform that is used to build real-time streaming data pipelines and applications.
Ingesting from a Kafka topic¶
Ingesting from a Kafka topic as an event source is done by configuring a novelty stream using the KafkaInput type. Novelty supports ingest of raw bytes or JSON data that is then transformed into observations.
More information is contained in the Create Novelty Stream: POST /api/v2/ingests API documentation.
noveltyContext: my-context
transformation: my-transformation
inputStream:
  type: KafkaInput
  topics:
    - test-topic
  bootstrapServers: localhost:9092
  groupId: novelty-group
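As an example of wiring this up, the sketch below POSTs the configuration above to the Create Novelty Stream endpoint using Python's requests library. The host, port, file name, and content type are assumptions; check the API documentation for the exact values.

# A minimal sketch of creating a novelty stream via the REST API.
# The server address and content type below are assumptions, not
# documented values; consult the API documentation.
import requests

with open("kafka-input.yaml") as f:  # file holding the config above (hypothetical name)
    config = f.read()

response = requests.post(
    "http://localhost:8080/api/v2/ingests",        # host and port are assumptions
    data=config,
    headers={"Content-Type": "application/yaml"},  # content type is an assumption
)
response.raise_for_status()
print(response.text)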
Output to a Kafka topic¶
When Novelty is positioned upstream in a data pipeline as an event source for Kafka, it publishes observation results to the topic configured in the KafkaOutput output stream. Records are serialized as JSON before being published to Kafka. See the Create Novelty Stream: POST /api/v2/ingests API documentation for details.
noveltyContext: my-context
transformation: my-transformation
inputStream:
  type: KafkaInput
  topics:
    - observations
  bootstrapServers: localhost:9092
  groupId: novelty-group
outputStream:
  type: KafkaOutput
  topic: observation-results
  bootstrapServers: localhost:9092
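To verify that results are flowing, you can tail the output topic with any Kafka client. Below is a minimal sketch using the kafka-python library (one client choice among many), pointed at the broker and topic from the example above.

# A minimal sketch of consuming Novelty's JSON observation results.
# kafka-python is one client choice; any Kafka consumer works.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "observation-results",            # output topic from the example above
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for record in consumer:
    print(record.value)  # one JSON observation result per record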
Amazon Kinesis¶
Amazon Kinesis is an Amazon Web Services (AWS) offering designed to process large-scale data streams from a multitude of services in real time.
Ingesting from a Kinesis stream¶
Ingesting from a Kinesis stream as an event source is done by configuring a novelty stream using the KinesisInput type. Novelty supports ingest of raw bytes or JSON data that is then transformed into observations.
More information is contained in the Create Novelty Stream: POST /api/v2/ingests API documentation.
noveltyContext: my-context
transformation: my-transformation
inputStream:
  type: KinesisInput
  streamName: stream-feed
  credentials:
    region: your_aws_region
    accessKeyId: your_access_key_id
    secretAccessKey: your_secret
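To feed test events into the stream above, publish records with any Kinesis producer. Below is a minimal sketch using boto3; the event payload is hypothetical, and the stream name matches the example above.

# A minimal sketch of publishing a test event to the Kinesis stream
# configured above. boto3 is one client choice; the payload is hypothetical.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="your_aws_region")

event = {"user": "alice", "action": "login"}  # hypothetical event payload
kinesis.put_record(
    StreamName="stream-feed",  # stream name from the example above
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey="alice",
)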
Output to a Kinesis stream¶
Novelty publishes observation results to a stream configured in the KinesisOutput output stream. Records are serialized as JSON before being published to Kinesis. See the Create Novelty Stream: POST /api/v2/ingests API documentation for details.
noveltyContext: my-context
transformation: my-transformation
inputStream:
  type: KinesisInput
  streamName: observations
  credentials:
    region: your_aws_region
    accessKeyId: your_access_key_id
    secretAccessKey: your_secret
outputStream:
  type: KinesisOutput
  streamName: observation-results
  credentials:
    region: your_aws_region
    accessKeyId: your_access_key_id
    secretAccessKey: your_secret
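Reading results back from the output stream means iterating its shards. Below is a minimal boto3 sketch that polls the latest records from a single-shard stream; the shard ID and polling loop are simplifying assumptions (production consumers typically use the Kinesis Client Library).

# A minimal sketch of reading Novelty's JSON observation results from
# the output stream above. Assumes a single shard; production readers
# typically use the Kinesis Client Library instead of raw polling.
import json
import time
import boto3

kinesis = boto3.client("kinesis", region_name="your_aws_region")

iterator = kinesis.get_shard_iterator(
    StreamName="observation-results",
    ShardId="shardId-000000000000",  # first shard; an assumption
    ShardIteratorType="LATEST",
)["ShardIterator"]

while True:
    out = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in out["Records"]:
        print(json.loads(record["Data"]))  # one JSON observation result
    iterator = out["NextShardIterator"]
    time.sleep(1)  # simple polling interval; tune for your throughput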