User-Defined Functions and Procedures
User-defined functions (UDFs) and user-defined procedures (UDPs) extend thatDot Streaming Graph with custom logic tailored to specific users and use cases. They let you reuse code that has already been written by calling it directly within thatDot Streaming Graph instead of through an outside service. UDFs can also simplify the queries written in the system or otherwise streamline the processing of data for specific applications.
User-defined functions and procedures can be written in any JVM-compatible language. This documentation focuses on UDFs/UDPs written in Scala or Java, loaded into the system via the /api/v1/query/cypher/user-defined REST endpoint, and then available for use in Cypher queries.
- User-defined functions (UDFs) are functions that take any number of arguments and are expected to produce a single value as their output. They are pure functions that cause no changes to the data stored in the graph. Results are produced synchronously.
- User-defined procedures (UDPs) are similar to functions in that they take arguments and produce results, but UDPs produce a back-pressured stream of rows, each of which may contain multiple values. Since the results are in a stream, it is possible to do asynchronous computation (producing an output row only on completion) or to return many results.
Because the UDF and UDP interfaces both run within the Quine engine’s JVM, care must be taken to implement these functions in a way that does not unduly compete with the graph’s own resource requirements. We recommend that you use these interfaces sparingly and only when necessary.
Steps to Create UDFs/UDPs
Creating user-defined functions and procedures requires:
- writing code for your custom function or procedure in a JVM-compatible language
- compiling that code (with the thatDot Streaming Graph JAR available as a dependency)
- packaging the output into a JAR
- copying that JAR file to all of the cluster members
- loading the code in the JAR by calling the REST API endpoint
Example: Defining a math.factorial UDF
Here is what the code for defining a math.factorial UDF looks like in Scala and in Java. We assume that the snippets are compiled with the full thatDot Streaming Graph JAR available (since otherwise they won’t compile due to missing types). Some important requirements:
- The UDF is defined as a class which has a public no-argument constructor
- The UDF class is annotated with com.thatdot.quine.graph.cypher.CypherUDF
- The UDF class extends com.thatdot.quine.graph.cypher.UserDefinedFunction or its subclass com.thatdot.quine.graph.cypher.JavaUserDefinedFunction
Scala:

package com.thatdot.quine.graph.cypher

import com.thatdot.quine.model.QuineIdProvider
import com.thatdot.quine.util.Log._

@CypherUDF
final class Factorial extends UserDefinedFunction {

  // Determines what the UDF is called when used in Cypher
  val name = "math.factorial.scala"

  // Set to `true` only if the function will produce the same output given the same inputs
  val isPure = true

  // Categorical classification of the function
  val category = "Numeric"

  // Used to filter out obviously incorrect uses of the UDF at query compilation
  val signatures: Vector[UserDefinedFunctionSignature] = Vector(
    UserDefinedFunctionSignature(
      arguments = Vector("input" -> Type.Integer),
      output = Type.Integer,
      description = "Returns the factorial of a number",
    ),
  )

  // Gets called every time the UDF is called
  def call(args: Vector[Value])(implicit idProvider: QuineIdProvider, logConfig: LogConfig): Value =
    args match {
      case Vector(Expr.Integer(n)) if n < 0L => Expr.Null
      case Vector(Expr.Integer(n)) =>
        // calculate factorial
        var acc: Long = 1L
        for (i <- 1L to n) acc *= i
        Expr.Integer(acc)
      case _ => throw wrongSignature(args)
    }
}
Java:

package com.thatdot.quine.graph.cypher;

import com.thatdot.quine.model.QuineIdProvider;
import java.util.*;

@CypherUDF
public final class JavaFactorial extends JavaUserDefinedFunction {

  // Determines what the UDF is called when used in Cypher
  private static String name = "math.factorial";

  // Used to filter out obviously incorrect uses of the UDF at query compilation
  private static UserDefinedFunctionSignature signature = UserDefinedFunctionSignature.create(
    Arrays.asList(new Argument("input", Type.integer())),
    Type.integer(),
    "Returns the factorial of a number"
  );

  // Gets called every time the UDF is called
  @Override
  public Value call(
    List<Value> args,
    QuineIdProvider idProvider
  ) throws CypherException {
    if (args.size() != 1 || !(args.get(0) instanceof Expr.Integer)) {
      throw CypherException.wrongSignature(
        name,
        Arrays.asList(Type.integer()),
        args
      );
    }
    long n = ((Expr.Integer) args.get(0)).getLong();
    if (n < 0L) return Expr.nullValue();

    // calculate factorial
    long acc = 1L;
    for (long i = 1L; i <= n; i += 1L) {
      acc *= i;
    }
    return new Expr.Integer(acc);
  }

  public JavaFactorial() {
    super(name, Arrays.asList(signature));
  }

  @Override
  public boolean isPure() {
    return true;
  }

  @Override
  public String category() {
    return "Numeric";
  }
}
In order to extend UserDefinedFunction, it is necessary to implement several members:
- the name specifies how the UDF will be called in Cypher
- the call method defines what it means to call the UDF, taking in as an argument the internal representation of a Cypher value and producing another Cypher value as output
- the signatures field specifies the function signature(s) of the UDF (used for ruling out obviously ill-typed calls at query compilation time and producing helpful errors)
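One detail worth keeping in mind when writing the body of call: the factorial loops shown here multiply into a 64-bit long, which silently wraps around for inputs above 20. The following is a minimal sketch of an overflow-aware version of just the arithmetic, written in plain Java with no thatDot types so it can be run on its own. The class name SafeFactorial and the choice to report overflow as an empty OptionalLong are assumptions made for illustration; inside a real UDF the empty case would presumably be mapped to a Cypher null or an error, whichever suits the application.

```java
import java.util.OptionalLong;

/** Hypothetical helper: overflow-aware factorial, independent of any thatDot types. */
final class SafeFactorial {

    /**
     * Returns n! when it fits in a signed 64-bit long, or an empty
     * OptionalLong when n is negative or the product would overflow.
     */
    static OptionalLong factorial(long n) {
        if (n < 0L) {
            return OptionalLong.empty();
        }
        long acc = 1L;
        try {
            for (long i = 2L; i <= n; i++) {
                // multiplyExact throws ArithmeticException instead of wrapping around
                acc = Math.multiplyExact(acc, i);
            }
        } catch (ArithmeticException overflow) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(acc);
    }

    public static void main(String[] args) {
        System.out.println(factorial(5));   // OptionalLong[120]
        System.out.println(factorial(20));  // OptionalLong[2432902008176640000]
        System.out.println(factorial(21));  // OptionalLong.empty
    }
}
```

Because the loop stops at the first overflow, a very large input cannot keep the calling thread busy for long, which fits the earlier advice about not competing with the graph for resources.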
Assuming the UDF class above has been compiled and packaged into cypher-factorial.jar and the JAR is copied to the servers beside the thatDot Streaming Graph JAR, the following REST API call is enough to load the UDF into the system.
curl -X POST "http://localhost:8080/api/v1/query/cypher/user-defined" \
-H "accept: */*" -H "Content-Type: application/json" \
-d '["cypher-factorial.jar"]'
The math.factorial function can now be used from Cypher:
curl -X POST "http://localhost:8080/api/v1/query/cypher" \
-H "accept: application/json" -H "Content-Type: text/plain" \
-d "RETURN math.factorial(5)"
Executing this command uses our new user-defined function and returns:
{"columns":["math.factorial(5)"],"results":[[120]]}
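For deployments that script these steps from JVM code rather than the shell, the two curl calls above can be expressed with the JDK's built-in HTTP client (Java 11 or later). This is a rough sketch only: the class name UdfRequests and its method names are hypothetical, the base URL is taken from the curl examples, and the JSON body is built by simple string concatenation, which assumes the JAR name contains no characters that need escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds the same two requests as the curl examples. */
final class UdfRequests {

    // Assumed base URL, matching the curl examples
    static final String BASE_URL = "http://localhost:8080";

    /** POST a one-element JSON array naming the JAR to load. */
    static HttpRequest loadJar(String jarName) {
        // Naive JSON construction: assumes jarName needs no escaping
        String body = "[\"" + jarName + "\"]";
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/api/v1/query/cypher/user-defined"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    /** POST a Cypher query as plain text and ask for a JSON response. */
    static HttpRequest runCypher(String query) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/api/v1/query/cypher"))
            .header("Accept", "application/json")
            .header("Content-Type", "text/plain")
            .POST(HttpRequest.BodyPublishers.ofString(query))
            .build();
    }

    public static void main(String[] args) {
        // Only builds and prints the requests; nothing is sent here
        HttpRequest load = loadJar("cypher-factorial.jar");
        HttpRequest query = runCypher("RETURN math.factorial(5)");
        System.out.println(load.method() + " " + load.uri());
        System.out.println(query.method() + " " + query.uri());
    }
}
```

To actually send a request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) against a running server.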