Configuration

Novelty reads its configuration through Typesafe Config, which accepts options from several sources. Most commonly, configuration is provided either via Java system properties (passed as command-line options) or via a HOCON config file. HOCON is a flexible, human-readable, JSON-like format. The reference config below is in HOCON format.

# Example of setting configuration via configuration file
java \
  -Dconfig.file=novelty.conf \
  -jar novelty-0.15.0.jar

# Example of overriding configuration via system properties
java \
  -Dthatdot.novelty.webserver.port=9000 \
  -jar novelty-0.15.0.jar

# Example of overriding configuration via environment variables
CONFIG_FORCE_thatdot_novelty_webserver_port=9000 \
java \
  -Dconfig.override_with_env_vars=true \
  -jar novelty-0.15.0.jar

Memory Configuration

Novelty caches nodes in memory as needed. The in-memory-soft-node-limit and in-memory-hard-node-limit settings, together with the shard-count configuration, determine how many nodes can be cached at a time and therefore how much heap space the JVM needs for processing. The shard count defaults to 4 and does not typically need to be changed. Each shard in a cluster member retains its own cache of in-memory nodes. The soft limit (default 10000) sets the minimum size of this cache, and the hard limit (default 75000) sets the maximum. The range between the two values holds nodes that are being put to sleep while other nodes are being rehydrated into memory, so the difference between the limits determines how many nodes can be going to sleep and waking up at the same time. The default settings allow between 40,000 and 300,000 nodes in memory at a time (4 shards × 10,000 and 4 shards × 75,000). Depending on the use case and the average memory footprint of a node, these values can be adjusted up or down to make the best use of the machine's memory and the JVM heap.
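The arithmetic behind the 40,000 to 300,000 figure can be reproduced with a few lines of shell. This is only a sketch using the default values quoted above; the variable names are illustrative and are not Novelty settings.

```shell
# Defaults described above; change these to model a different configuration.
shard_count=4
soft_limit=10000
hard_limit=75000

# Each shard keeps its own cache, so the totals scale with the shard count.
min_nodes=$((shard_count * soft_limit))
max_nodes=$((shard_count * hard_limit))

# The gap between the limits is the per-shard room for nodes going to sleep or waking up.
flex_per_shard=$((hard_limit - soft_limit))

echo "nodes in memory: ${min_nodes} to ${max_nodes}"
echo "flexible range per shard: ${flex_per_shard}"
```

Multiplying the upper total by an estimate of the average node footprint for your data gives a rough starting point for sizing the heap.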

Note

The heap space requirement is primarily a function of the soft and hard node limit settings. Beyond that, leave some memory free for the operating system to serve direct memory requests, which helps maintain the performance of the host server.

To reduce Garbage Collection (GC) pauses in the JVM heap, we recommend allocating a fixed amount of memory to Novelty where possible. You can do this by setting -Xms and -Xmx to the same value, which discourages the JVM from dynamically resizing the heap. Resizing the heap triggers a full GC, which can mean a lengthy pause (during which the application is completely stopped), depending on the size of the heap. Keep in mind that large heaps take longer to garbage collect, so simply allocating as much memory as possible could hurt performance. For this reason, we do not recommend setting the heap size larger than 16GB; 12GB tends to be a good starting point for large graph implementations with high ingest requirements.
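As a sketch of the fixed-heap recommendation, the snippet below builds a launch command with matching -Xms and -Xmx values. The 12g value follows the starting point mentioned above, and the jar and config file names are reused from the earlier examples; the shell variable names are illustrative.

```shell
# Assumption: 12g is only a starting point; size the heap for your own workload.
HEAP_SIZE="12g"

# Equal -Xms and -Xmx values pin the heap size so the JVM has no reason to resize it.
LAUNCH_CMD="java -Xms${HEAP_SIZE} -Xmx${HEAP_SIZE} -Dconfig.file=novelty.conf -jar novelty-0.15.0.jar"

# Dry run: print the command for review, then paste it into a terminal to start Novelty.
echo "${LAUNCH_CMD}"
```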

Data Pipeline Security Recommendations

Data pipelines in Novelty can be adapted to almost any use case, but that flexibility brings potential for misuse. To keep Novelty, its data, and its environment as secure as possible, we recommend the following best practices:

Use the latest version of Novelty. We regularly release updates with security fixes.
Deploy Novelty behind a reverse proxy that performs TLS termination, user authentication, and HTTP request logging. This allows you to control access to your Novelty instance and gives you more information about how and when your graph is being accessed.
Configure your data source according to that data source's best practices. For example, if using a Kafka data source, keep your Kafka servers up to date and use the kafkaProperties field in the Novelty ingest configuration to enforce TLSv1.3 encryption between Novelty and your Kafka cluster.
Configure file ingest security controls to restrict file access to specific directories. Use the file-ingest.allowed-directories setting to allowlist permitted directories, and set file-ingest.resolution-mode to "static" in production environments so that only files present at startup can be ingested.
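The file ingest recommendation above might look like the following in practice. This is a sketch: the file name novelty-file-ingest.conf and the directory /var/lib/novelty/ingest are hypothetical placeholders, while the two setting names come from the reference configuration. The generated fragment is meant to be merged into your existing config file rather than used on its own.

```shell
# Hypothetical file name; substitute your own.
CONF_FILE="novelty-file-ingest.conf"

# Write a HOCON fragment that locks file ingest down for production.
cat > "${CONF_FILE}" <<'EOF'
thatdot {
  novelty {
    file-ingest {
      # Hypothetical directory: only files under it may be ingested.
      allowed-directories = ["/var/lib/novelty/ingest"]

      # "static" only allows files that were present at startup.
      resolution-mode = "static"
    }
  }
}
EOF

# Print the fragment for review before merging it into novelty.conf.
cat "${CONF_FILE}"
```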

Loglevel Configuration

Novelty uses the Logback framework for logging. The log level is configurable for several loggers. The root.loglevel Java system property sets the log level for all loggers, including those of dependencies. thatdot.loglevel sets the log level for Novelty's own loggers only.

Available log levels are ERROR, WARN, INFO, DEBUG, TRACE. See documentation for more details: https://logback.qos.ch/manual/architecture.html#effectiveLevel

Example:

thatdot {
    loglevel = INFO
}
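The levels can also be supplied on the command line. The sketch below passes root.loglevel as a Java system property, as described above, and assumes that thatdot.loglevel is accepted the same way; the chosen levels and the shell variable names are illustrative.

```shell
# Assumption: quiet dependencies, verbose Novelty logs.
ROOT_LEVEL="WARN"
NOVELTY_LEVEL="DEBUG"

# Assumption: thatdot.loglevel can be passed as a system property like root.loglevel.
LAUNCH_CMD="java -Droot.loglevel=${ROOT_LEVEL} -Dthatdot.loglevel=${NOVELTY_LEVEL} -jar novelty-0.15.0.jar"

# Dry run: print the command for review, then paste it into a terminal to start Novelty.
echo "${LAUNCH_CMD}"
```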

Reference Documentation

Uncommented values are the defaults, unless otherwise noted. Unexpected configuration keys or values in the thatdot.novelty block cause an error to be reported at startup.

A single underscore _ is used to indicate a required property with no default value. There are none of these in the default configuration.

thatdot {
  novelty {

    # # Novelty license configuration (required)
    # # Note: The license-key must be provided via environment variable or command line
    # license-key = "YOUR_LICENSE_KEY"

    # webserver binding configuration
    webserver {
      # whether the webserver should be enabled
      enabled = true

      # Hostname or address of the interface to which the HTTP server should
      # be bound - 0.0.0.0 means "all interfaces"
      # There are two special values which are interpreted dynamically:
      #   1.) "<getHostAddress>" uses the host IP found at runtime
      #   2.) "<getHostName>" uses the host DNS name found at runtime
      address = "0.0.0.0"

      # port to which the HTTP server should be bound
      # setting to `0` will choose an available port at random.
      port = 8080

      # Whether the webserver should perform TLS termination
      # this is inferred to be no/false by default, unless keystore information is provided
      # via the `SSL_KEYSTORE_PATH` and `SSL_KEYSTORE_PASSWORD` environment variables. If this
      # is set to `yes/true` but the environment variables are not set, standard java system properties
      # such as `-Djavax.net.ssl.keyStore` and `-Djavax.net.ssl.keyStorePassword` may be used to configure
      # the keystore.
      use-tls = no

      # Whether the webserver should require client certificate authentication (mTLS/mutual TLS).
      # This setting only has an effect when `use-tls` is enabled.
      use-mtls {
        # whether mTLS should be enabled (requires use-tls to be enabled)
        enabled = no

        # (optional) path and password for the trust store containing CA certificates
        trust-store = null

        # configuration for separate health endpoint binding
        health-endpoints {
          # whether to enable a separate health endpoint binding
          enabled = no

          # port on which the health endpoints will be bound
          port = 8081
        }
      }
    }

    # (optional) Configuration to use when advertising this server
    # webserver-advertise {
    #   address = "localhost"
    #   port = 8080
    #   path = null
    # }

    # storage backend / "persistor" configuration
    store {
      # store data in a local filesystem using RocksDB
      type = rocks-db

      # base folder in which RocksDB data will be stored
      filepath = "novelty.db"

      # whether to use a separate database per novelty tree
      db-per-tree = yes

      # whether to use a write-ahead log
      write-ahead-log = on

      # whether to force all writes to be fully confirmed to disk
      sync-all-writes = off

      # whether to create any directories in "filepath" that do not yet exist
      create-parent-dir = yes

      # if set, the number of nodes for which to optimize node creation latency
      # bloom-filter-size =
    }
    # store {
    #   # store data in an Apache Cassandra instance
    #   type = cassandra
    #
    #   # "host:port" strings at which Cassandra nodes can be accessed from
    #   # the application
    #   endpoints = [
    #     "localhost:9042"
    #   ]
    #
    #   # the keyspace to use
    #   keyspace = novelty
    #
    #   # whether the application should create the keyspace if it does not
    #   # yet exist
    #   should-create-keyspace = true
    #
    #   # whether the application should create tables in the keyspace if
    #   # they do not yet exist
    #   should-create-tables = true
    #
    #   # how many copies of each datum the Cassandra cluster should retain
    #   replication-factor = 1
    #
    #   # how many hosts must agree on a datum for Novelty to consider that
    #   # datum written/read
    #   write-consistency = LOCAL_QUORUM
    #   read-consistency = LOCAL_QUORUM
    #
    #   # passed through to Cassandra
    #   local-datacenter = "datacenter1"
    #
    #   # how long to wait before considering a write operation failed
    #   write-timeout = "10s"
    #
    #   # how long to wait before considering a read operation failed
    #   read-timeout = "10s"
    #
    #   # whether to use a separate database per novelty tree
    #   db-per-tree = yes
    #
    #   # maximum size in bytes for snapshot parts
    #   snapshot-part-max-size-bytes = 1000000
    #
    #   # if set, the number of nodes for which to optimize node creation latency
    #   # bloom-filter-size =
    # }
    # store {
    #   # store data in a memory-mapped local file using MapDB
    #   type = map-db
    #
    #   # base filename from which MapDB filenames will be created
    #   # filepath = _
    #
    #   # whether to create any directories in "filepath" that don't yet exist
    #   create-parent-dir = yes
    #
    #   # how many files to use
    #   number-partitions = 1
    #
    #   # whether to use a write-ahead log
    #   write-ahead-log = yes
    #
    #   # if write-ahead-log = true, how often to commit the write ahead log
    #   commit-interval = "10s"
    #
    #   # whether to use a separate database per novelty tree
    #   db-per-tree = yes
    #
    #   # if set, the number of nodes for which to optimize node creation latency
    #   # bloom-filter-size =
    # }
    # store {
    #   # do not store any data, only use the temporary node cache
    #   type = empty
    #
    #   # whether to use a separate database per novelty tree
    #   db-per-tree = yes
    # }

    # configuration for which data to save about nodes and when to do so
    persistence {
      # whether to save node journals
      journal-enabled = true

      # one of [on-node-sleep, on-node-update, never]
      snapshot-schedule = on-node-sleep

      # whether only a single snapshot should be retained per-node
      snapshot-singleton = false

      # when to save Standing Query partial results
      standing-query-schedule = on-node-sleep

      # whether effects in-memory occur before or after updates are confirmed
      # persisted to disk. Possible values: memory-first, persistor-first
      effect-order = persistor-first
    }

    # (optional) The number of nodes in a shard's cache before that shard
    # will begin to expire nodes from its cache.
    in-memory-soft-node-limit = 10000

    # (optional) A limit to the total number of nodes in a shard's cache.
    in-memory-hard-node-limit = 75000

    # threshold for high cardinality detection in novelty trees
    high-cardinality-threshold = 10000

    # configuration for statistics collection
    statistics {
      # number of top values to track for sorted statistics
      sorted-count = 250

      # number of samples to collect for statistical analysis
      sample-count = 1000
    }

    # observation reporters for novelty data
    # Each reporter receives observations and forwards them to external systems
    reporters = [
      # {
      #   type = postgresql
      #   host = localhost
      #   port = 5432
      #   ssl = false
      #   username = postgres
      #   password = postgres
      #   database = novelty
      #   schema = public
      # }
      # {
      #   type = influxdb2
      #   url = "http://localhost:8086"
      #   token = "your-token"
      #   org = "your-org"
      #   bucket = "novelty"
      # }
      # {
      #   type = kafka
      #   bootstrap-servers = "localhost:9092"
      #   topic = "novelty-observations"
      # }
    ]

    # where metrics collected by the application should be reported
    metrics-reporters = [
      # {
      #   type = jmx
      # }
      # {
      #   type = csv
      #   period = _
      #   log-directory = _
      # }
      # {
      #   type = influxdb
      #   period = _
      #   database = metrics
      #   scheme = http
      #   host = localhost
      #   port = 8086
      # }
    ]

    # Startup and shutdown timeout for the Novelty Application
    timeout = 2 m

    # the minimum amount of time a node must stay in the cache after
    # being updated
    decline-sleep-when-write-within = 100 ms

    # the minimum amount of time a node must stay in the cache after
    # being accessed
    decline-sleep-when-access-within = 0 ms

    # nodes will wait up to this amount of time before processing messages
    # when at-time is in the future
    max-catch-up-sleep-millis = 2000 ms

    # whether the application should log its current config at startup
    dump-config = no

    # default API version (v1 or v2)
    # Note: v1 is not compatible with authentication
    default-api-version = "v1"

    # Send anonymous telemetry information about Novelty feature usage
    # to help improve the product. No sensitive data is collected.
    help-make-novelty-better = true

    # configuration for the log sanitizer
    log-config {
      # whether to show potentially sensitive information in logs
      # (Novelty defaults to strict mode with sensitive data hidden)
      show-unsafe = no

      # whether to show exceptions in logs
      # (Novelty defaults to strict mode with exceptions hidden)
      show-exceptions = no

      # the redaction method to use when hiding sensitive information
      redactor {
        type = redact-hide
      }
    }

    # # (optional) authentication and/or authorization options
    # auth {
    #   # Configuration for user sessions
    #   session {
    #     # Secret key to sign session tokens, it should have at least 256 bits of entropy
    #     secret = my-secret
    #     expiration-seconds = 3600
    #     # (default: `true`) whether cookies shall be securely managed, which should always be `true` for production
    #     secure-cookies = true
    #   }
    #   # (optional) OIDC-specific configuration options
    #   oidc {
    #     # (optional) Fully described OIDC configuration
    #     full {
    #       # the configuration of the OIDC provider
    #       provider {
    #         # the "base URL", or location, of the provider
    #         location-url = "https://example.com"
    #         # the provider's authorization URL
    #         authorization-url = "https://example.com/authorize"
    #         # the path segment(s) that, when added to `location-url`, specify the OIDC login endpoint
    #         login-path = "/login"
    #         # the provider's token acquisition URL
    #         token-url = "https://example.com/token"
    #       }
    #       # the configuration of this application as an OIDC client
    #       client {
    #         # the client ID that a provider uses to refer to this application
    #         id = "client-id"
    #         # the client secret that this application may use to prove itself to the provider
    #         secret = "client-secret"
    #       }
    #     }
    #   }
    # }

    # # File ingest security configuration
    # file-ingest {
    #   # Allowlist of allowed directories for file ingests
    #   allowed-directories = ["."]
    #
    #   # File resolution mode: "static" or "dynamic"
    #   resolution-mode = "dynamic"
    # }
  }
}