User-Defined Functions and Procedures

User-defined functions (UDFs) and user-defined procedures (UDPs) extend thatDot Streaming Graph with custom logic tailored to specific users and use cases. They let you reuse code that has already been written by calling it directly from within thatDot Streaming Graph instead of through an outside service. UDFs can also simplify the queries written in the system or otherwise streamline data processing for specific applications.

User-defined functions and procedures can be written in any JVM-compatible language. This documentation focuses on UDFs/UDPs written in Scala or Java, loaded into the system via the /api/v1/query/cypher/user-defined REST endpoint, and then available for use in Cypher queries.

  • User-defined functions (UDFs) are functions that take any number of arguments and are expected to produce a single value as their output. They are pure functions that cause no changes to the data stored in the graph. Results are produced synchronously.

  • User-defined procedures (UDPs) are similar to functions in that they take arguments and produce results, but UDPs produce a back-pressured stream of rows, each of which may contain multiple values. Since the results are in a stream, it is possible to do asynchronous computation (producing an output row only on completion) or to return many results.
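The difference between the two shapes can be sketched in plain Java. This is an analogy only: the class and method names below are hypothetical and are not part of the thatDot API, and a lazily pulled java.util.stream.Stream merely stands in for a back-pressured stream of rows.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java analogy only; none of these names belong to the thatDot API.
final class FunctionVersusProcedureSketch {

    // UDF-like shape: one call, one value, returned synchronously.
    static long square(long n) {
        return n * n;
    }

    // UDP-like shape: one call yields a stream of rows, each holding several values.
    // Each row is [n, n squared]; a row is only computed when the consumer pulls it.
    static Stream<List<Long>> squaresFrom(long start) {
        return Stream.iterate(start, n -> n + 1)
                     .map(n -> List.of(n, n * n));
    }

    public static void main(String[] args) {
        System.out.println(square(4)); // 16

        // The consumer decides how many rows to pull from the unbounded stream.
        List<List<Long>> rows = squaresFrom(1).limit(3).collect(Collectors.toList());
        System.out.println(rows); // [[1, 1], [2, 4], [3, 9]]
    }
}
```

Because the consumer controls how many rows are pulled, the producer never runs ahead of it, which is the property the back-pressured UDP stream provides.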

Because the UDF and UDP interfaces both run within the Quine engine’s JVM, care must be taken to implement these functions in a way that does not unduly compete with the graph’s own resource requirements. We recommend that you use these interfaces sparingly and only when necessary.

Steps to Create UDFs/UDPs

Creating user-defined functions and procedures requires:

  • writing code for your custom function or procedure in a JVM-compatible language
  • compiling that code (with the thatDot Streaming Graph JAR available as a dependency)
  • packaging the output into a JAR
  • copying that JAR file to all of the cluster members
  • loading the code in the JAR by calling the REST API endpoint

Example: Defining a math.factorial UDF

Here is the code for defining a factorial UDF in Scala and in Java. The Scala version registers itself as math.factorial.scala and the Java version as math.factorial, as set by the name field of each class. The snippets must be compiled with the full thatDot Streaming Graph JAR available; otherwise they fail to compile because of missing types. Some important requirements:

  • The UDF is defined as a class which has a public no-argument constructor
  • The UDF class is annotated with com.thatdot.quine.graph.cypher.CypherUDF
  • The UDF class extends com.thatdot.quine.graph.cypher.UserDefinedFunction or its subclass com.thatdot.quine.graph.cypher.JavaUserDefinedFunction

Scala
package com.thatdot.quine.graph.cypher

import com.thatdot.quine.model.QuineIdProvider
import com.thatdot.quine.util.Log._

@CypherUDF
final class Factorial extends UserDefinedFunction {

  // Determines what the UDF is called when used in Cypher
  val name = "math.factorial.scala"

  // Set to `true` only if the function will produce the same output given the same inputs
  val isPure = true

  // Categorical classification of the function
  val category = "Numeric"

  // Used to filter out obviously incorrect uses of the UDF at query compilation
  val signatures: Vector[UserDefinedFunctionSignature] = Vector(
    UserDefinedFunctionSignature(
      arguments = Vector("input" -> Type.Integer),
      output = Type.Integer,
      description = "Returns the factorial of a number",
    ),
  )

  // Gets called every time the UDF is called
  def call(args: Vector[Value])(implicit idProvider: QuineIdProvider, logConfig: LogConfig): Value =
    args match {
      case Vector(Expr.Integer(n)) if n < 0L => Expr.Null
      case Vector(Expr.Integer(n)) =>
        // calculate factorial
        var acc: Long = 1L
        for (i <- 1L to n)
          acc *= i
        Expr.Integer(acc)
      case _ => throw wrongSignature(args)
    }
}

Java
package com.thatdot.quine.graph.cypher;

import com.thatdot.quine.model.QuineIdProvider;
import java.util.*;

@CypherUDF
public final class JavaFactorial extends JavaUserDefinedFunction {

    // Determines what the UDF is called when used in Cypher
    private static final String name = "math.factorial";

    // Used to filter out obviously incorrect uses of the UDF at query compilation
    private static final UserDefinedFunctionSignature signature =
        UserDefinedFunctionSignature.create(
            Arrays.asList(new Argument("input", Type.integer())),
            Type.integer(),
            "Returns the factorial of a number"
        );

    // Gets called every time the UDF is called
    @Override
    public Value call(
        List<Value> args,
        QuineIdProvider idProvider
    ) throws CypherException {
        if (args.size() != 1 || !(args.get(0) instanceof Expr.Integer)) {
            throw CypherException.wrongSignature(
                name,
                Arrays.asList(Type.integer()),
                args
            );
        }

        long n = ((Expr.Integer) args.get(0)).getLong();
        if (n < 0L) return Expr.nullValue();

        // calculate factorial
        long acc = 1L;
        for (long i = 1L; i <= n; i += 1L) {
            acc *= i;
        }

        return new Expr.Integer(acc);
    }

    public JavaFactorial() {
        super(name, Arrays.asList(signature));
    }

    @Override
    public boolean isPure() {
        return true;
    }

    @Override
    public String category() {
        return "Numeric";
    }
}

To extend UserDefinedFunction, you must implement several members:

  • the name specifies how the UDF will be called in Cypher

  • the call method defines what it means to call the UDF, taking as its argument the list of Cypher values passed to the UDF (in their internal representation) and producing another Cypher value as output

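  • the signatures field specifies the function signature(s) of the UDF (used for ruling out obviously ill-typed calls at query compilation time and producing helpful errors)

  • isPure should be true only if the function produces the same output given the same inputs

  • category gives a categorical classification of the function ("Numeric" in the examples above)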
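One detail worth noting about the call logic in the examples: the product is accumulated in a 64-bit long, so it wraps silently for inputs above 20. The following is a minimal sketch of an overflow-aware version of just the arithmetic, written in plain Java with no thatDot types. The class and method names are hypothetical, and returning null on overflow (mirroring the Cypher null the examples return for negative input) is an assumption of this sketch, not documented behavior.

```java
// Plain-Java sketch of the factorial arithmetic only; not part of the thatDot API.
final class CheckedFactorialSketch {

    // Returns n!, or null when n is negative or the result does not fit in a long.
    // Treating overflow like negative input is an assumption made for illustration.
    static Long factorialOrNull(long n) {
        if (n < 0L) return null;
        long acc = 1L;
        for (long i = 2L; i <= n; i++) {
            try {
                // multiplyExact throws instead of wrapping around on overflow
                acc = Math.multiplyExact(acc, i);
            } catch (ArithmeticException overflow) {
                return null;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(factorialOrNull(5));  // 120
        System.out.println(factorialOrNull(20)); // 2432902008176640000
        System.out.println(factorialOrNull(21)); // null
    }
}
```

Stopping at the first overflow also bounds the loop at 21 iterations, so a very large input cannot tie up the JVM that the graph itself runs in.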

Assuming the above has been compiled and packaged into cypher-factorial.jar, and the JAR has been copied to the servers beside the thatDot Streaming Graph JAR, the following REST API call is enough to load the UDF into the system.

curl -X POST "http://localhost:8080/api/v1/query/cypher/user-defined" \
     -H "accept: */*" -H "Content-Type: application/json" \
     -d '["cypher-factorial.jar"]'
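The same request can be issued from JVM code. Below is a minimal sketch using the JDK's java.net.http API (Java 11 or later); the class and method names are hypothetical, and the base URL and JAR name are simply the ones from the curl command. The sketch only builds and prints the request so that it runs without a server.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; mirrors the curl command above using only JDK classes.
final class LoadUdfRequestSketch {

    // Builds a POST whose body is a JSON array containing one JAR file name.
    // The name is inserted without JSON escaping, so this assumes a plain file name.
    static HttpRequest loadRequest(String baseUrl, String jarName) {
        String body = "[\"" + jarName + "\"]";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/query/cypher/user-defined"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = loadRequest("http://localhost:8080", "cypher-factorial.jar");
        // prints: POST http://localhost:8080/api/v1/query/cypher/user-defined
        System.out.println(request.method() + " " + request.uri());
        // To send it against a running server:
        // java.net.http.HttpClient.newHttpClient()
        //     .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```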

The math.factorial function can now be used from Cypher:

curl -X POST "http://localhost:8080/api/v1/query/cypher" \
     -H "accept: application/json" -H "Content-Type: text/plain" \
     -d "RETURN math.factorial(5)"

Executing this command uses our new user-defined function and returns:

{"columns":["math.factorial(5)"],"results":[[120]]}