Skip to content

Temporal Locality

Full Recipe

Shared by: Michael Aglietti

This recipe looks for emails sent or received by cto@company.com within a sliding window as a means of highlighting a technique for matching on temporal locality of nodes in standing queries.

Temporal Locality Recipe
1

Download Recipe

Scenario

This scenario processes records containing metadata for almost 295,000 emails are ingested for the purpose of identifying emails to/from a specific email address within a sliding 2-minute window.

{
    "from": <sender>,
    "to": [<recipients>],
    "subject": <email subject line>,
    "time": <epoch time timestamp>,
    "sequence": <sequence number>
}

Sample Data

Download the sample data to the same directory as the recipe and where Quine will be run.

Download email.json

How it Works

The recipe reads data from a sample data file using an ingest stream and parses each line into sender, receiver and message nodes to manifest a graph in Quine.

INGEST-1 processes the email.json file:

  - type: FileIngest
    path: email.json
    format:
        type: CypherJson
        query: |-
            MATCH (sender), (message)
            WHERE id(sender) = idFrom('email', $that.from)
            AND id(message) = idFrom('message', $that)

            SET sender.email = $that.from,
                sender: Email,
                message.from = $that.from,
                message.to = $that.to,
                message.subject = $that.subject,
                message.time = datetime({epochMillis: $that.time}),
                message: Message

            CREATE (sender)-[:SENT_MSG]->(message)

            WITH $that AS t, message
            UNWIND t.to AS rcv
            MATCH (receiver)
            WHERE id(receiver) = idFrom('email', rcv)

            SET receiver.email = rcv,
                receiver: Email

            CREATE (message)-[:RECEIVED_MSG]->(receiver)
POST /api/v1/ingest/INGEST-1
{
  "type": "FileIngest",
  "path": "email.json",
  "format": {
    "type": "CypherJson",
    "query":   "MATCH (sender), (message) \\nWHERE id(sender) = idFrom('email', $that.from)\\n AND id(message) = idFrom('message', $that) \\n\\nSET sender.email = $that.from,\\n sender: Email,\\n message.from = $that.from,\\n message.to = $that.to,\\n message.subject = $that.subject,\\n message.time = datetime({ epochMillis: $that.time}),\\n message: Message\\n\\nCREATE (sender)-\[:SENT\_MSG\]->(message)\\n\\nWITH $that as t, message\\nUNWIND t.to AS rcv\\nMATCH (receiver)\\nWHERE id(receiver) = idFrom('email', rcv)\\n\\nSET receiver.email = rcv,\\n receiver: Email\\n\\nCREATE (message)-\[:RECEIVED\_MSG\]->(receiver)"
  }
}

A standing query is configured to detect when emails that are sent or received by cto@company.com are within a two minute sliding window of one another.

The pattern query matches each individual (sender)-[:SENT_MSG]->(message)-[:RECEIVED_MSG]->(receiver) pattern.

- pattern:
    type: Cypher
    mode: MultipleValues
    query: |-
        MATCH (n)-[:SENT_MSG]->(m)-[:RECEIVED_MSG]->(r)
        WHERE n.email="cto@company.com" OR r.email="cto@company.com"
        RETURN id(n) as ctoId, id(m) as ctoMsgId, m.time as mTime, id(r) as recId

Once the pattern query matches the (sender)-[:SENT_MSG]->(message)-[:RECEIVED_MSG]->(receiver) pattern, an event is sent to an output query to calculate the temporal locality using the Cypher duration.between() function to establish a sliding window of interest.

duration("PT6M") > duration.between(m.time,thisMsg.time) > duration("PT4M")

The expression utilizes Cypher-defined temporal data types, based on ISO 8601 format, PT6M and PT4M to represent 6 minutes < duration >4 minutes for the window. We are able to use this because we converted the ingested epoch time formatted timestamp to datetime format in the ingest query.

message.time = datetime({epochMillis: $that.time})

Otherwise, we would have cast the data within the standing query:

AND duration("PT6M") > duration.between(datetime({epochMillis: m.time}),datetime({epochMillis: thisMsg.time})) > duration("PT4M")

Using the former pattern allows us to to express the standing query in a clean, simple manner.

    outputs:
      withinFourToSixMinuteWindow:
        type: CypherQuery
        query: |-
          MATCH (n)-[:SENT_MSG]->(m)-[:RECEIVED_MSG]->(r), (thisMsg)
           WHERE id(n) = $that.data.ctoId
             AND id(r) = $that.data.recId
             AND id(thisMsg) = $that.data.ctoMsgId
             AND id(m) <> id(thisMsg)
             AND duration("PT6M") > duration.between(m.time,thisMsg.time) > duration("PT4M")

           CREATE (m)-[:IN_WINDOW]->(thisMsg)
           CREATE (m)<-[:IN_WINDOW]-(thisMsg)

           WITH n, m, r, "http://localhost:8080/#MATCH" + text.urlencode(' (n)-[:SENT_MSG]->(m)-[:RECEIVED_MSG]->(r) WHERE strId(n)="' + strId(n) + '"AND strId(r)="' + strId(r) + '" AND  strId(m)="' + strId(m) + '" RETURN n, r, m') as URL

           RETURN URL
        andThen:
          type: PrintToStandardOut

When a complete pattern match is detected the recipe outputs a link to the console for an analyst to use when reviewing the event further inside Quine's Exploration UI.

2023-02-23 12:04:24,981 Standing query `withinFourToSixMinuteWindow` match: {"meta":{"isPositiveMatch":true,"resultId":"628d6523-4d25-4cb4-aed2-9890ecf0cb9c"},"data":{"URL":"http://localhost:8080/#MATCH%20%28n%29-%5B%3ASENT_MSG%5D-%3E%28m%29-%5B%3ARECEIVED_MSG%5D-%3E%28r%29%20WHERE%20strId%28n%29%3D%22a69876bc-8876-37e9-b776-471507a080b9%22AND%20strId%28r%29%3D%2299eeafc4-3178-3aca-8c7c-f84d0781f3a1%22%20AND%20%20strId%28m%29%3D%229e66e020-45ad-3971-92ec-60be8fa844e7%22%20RETURN%20n%2C%20r%2C%20m"}}

Running the Recipe

 java -jar quine-1.8.2.jar -r duration.yaml
Graph is ready
Running Recipe: Temporal Locality Example
Using 3 node appearances
Using 5 quick queries
Running Standing Query STANDING-1
Running Ingest Stream INGEST-1
Quine web server available at http://localhost:8080

Almost immediately positive matches will begin to stream to your console window.

2023-02-23 14:12:05,165 Standing query `withinFourToSixMinuteWindow` match: {"meta":{"isPositiveMatch":true,"resultId":"52c981a4-394f-48c3-8419-671fe5f0c1d5"},"data":{"URL":"http://localhost:8080/#MATCH%20%28n%29-%5B%3ASENT_MSG%5D-%3E%28m%29-%5B%3ARECEIVED_MSG%5D-%3E%28r%29%20WHERE%20strId%28n%29%3D%22a69876bc-8876-37e9-b776-471507a080b9%22AND%20strId%28r%29%3D%22c57f8f3b-b567-3b7e-b2a2-c516f030ce2f%22%20AND%20%20strId%28m%29%3D%221c0bc726-4a19-363d-a623-f5df12142a72%22%20RETURN%20n%2C%20r%2C%20m"}}

Copy and paste the URL into your browser (or ++"cmd+click"++ on a Mac) to open the Exploration UI and display an email event.

email event

Right click on one of the email messages and select [Node] Messages in Window to find other messages that occured within the time window.

messages in window

Continue to right click and run queries to discover other messages that we sent but not did not occur within the time window.

other email messages

Summary

This simple recipe shows how the duration.between() temporal function is able to implement a sliding window for matching events that happen close to gether in time.

Tip

Quick Queries are available by right clicking on a node.

Quick Query Node Type Description
Adjacent Nodes All Display the nodes that are adjacent to this node.
Refresh All Refresh the content stored in a node
Local Properties All Display the properties stored by the node
Messages in Window Messages Display all message nodes within the sliding window
Table of Messages in Window Messages Return a list of messages and time deltas