Feature Request: `EXISTS` In Variable-Length `MATCH` Filters

Alex Johnson
-
Feature Request: `EXISTS` In Variable-Length `MATCH` Filters

In this article, we will dive into a feature request for Memgraph that could significantly enhance the flexibility and power of graph queries, specifically the ability to use the EXISTS operator within the filter of a variable-length relationship MATCH clause. This enhancement would enable more dynamic and precise traversal step conditions, unlocking new possibilities for complex graph pattern matching.

The Challenge: Dynamic Traversal Conditions

When working with graph databases, one of the most common tasks is traversing relationships between nodes to discover patterns and connections. Memgraph, like other graph databases, provides powerful mechanisms for specifying these traversals, including variable-length relationships. However, current limitations prevent the use of EXISTS within the filter (r, n | ...) of a MATCH clause, which restricts the ability to define dynamic traversal-step conditions. This means that certain conditions, especially those requiring subquery checks, cannot be directly incorporated into the traversal logic, leading to less efficient or more complex query formulations.

Understanding Variable-Length Relationships in Memgraph

Before diving deeper into the feature request, let's clarify what variable-length relationships are and how they are used in Memgraph. A variable-length relationship allows you to specify a pattern that can span a range of hops within the graph. For instance, you might want to find all nodes that are connected to a starting node by a path of 1 to 5 relationships. This is expressed in Cypher, Memgraph's query language, using the * operator followed by a range, such as -[REL*1..5]->.

The (r, n | ...) syntax within the variable-length relationship provides a filter that can be applied at each step of the traversal. Here, r represents the relationship being traversed, and n represents the target node. The filter allows you to specify conditions that must be met for the traversal to continue along a particular path. However, the current limitation is that you cannot use EXISTS subqueries within this filter.

The Need for EXISTS in Traversal Filters

The core of the feature request is the ability to use EXISTS subqueries within the (r, n | ...) filter. The need for this arises when the conditions for continuing the traversal depend on the existence of certain patterns or relationships around the current node (n) that cannot be directly derived from n or the relationship r. In other words, sometimes you need to look beyond the immediate context of the traversal step to make a decision about whether to continue.

Consider a scenario where you are traversing a class hierarchy and want to consider only class properties during the traversal. This requires checking conditions on the properties beyond just the immediate relationship and node. Specifically, you might need to ensure that a property node is connected to a class node via a specific relationship. This kind of check necessitates the use of a subquery, which is precisely what EXISTS allows.

Minimal Example Case: Class Hierarchy Traversal

To illustrate the problem and the potential solution, let's examine a minimal example case. Imagine a graph representing a class hierarchy, where classes are connected by relationships like :P and :VALUE. Properties are connected to classes, and certain properties may have additional relationships, such as :KEY, to other nodes.

Match p = (c:Class {name: "Foo"})
 -[:P|:VALUE *BFS ..6 (r,n |
 (n:Property or n:Class) AND
 CASE WHEN n:Property
 // 'Class' node type is required, if it's a :Property. 
 THEN EXISTS { MATCH (:Class)-[:P]->(n) }
 // optionally restrict :Property key as well
 AND EXISTS { MATCH (n)-[:KEY]->(:Verb {name: "SUBCLASS_OF"}) }
 ELSE true END
 )]
 -(:Class)
RETURN p
LIMIT 100;

In this query, the goal is to traverse the class hierarchy starting from a class named "Foo", considering only class properties. The traversal should follow :P or :VALUE relationships up to a depth of 6. The (r, n | ...) filter is used to specify the conditions for each step.

The key challenge lies in the fact that when the traversal reaches a :Property node, it is necessary to check whether this property is indeed a class property. This requires verifying the existence of a relationship from a :Class node to the :Property node, which is precisely where the EXISTS subquery comes into play. Additionally, there might be a need to further restrict the properties based on their :KEY relationships, adding another layer of complexity that EXISTS can handle.

Breaking Down the Query

Let's break down the query to understand each part:

  • MATCH p = (c:Class {name: "Foo"}) ...: This starts the pattern matching from a :Class node with the name "Foo".
  • -[:P|:VALUE *BFS ..6 (r,n | ...)]-: This specifies a variable-length relationship that can be either :P or :VALUE, with a breadth-first search (BFS) traversal up to 6 hops. The (r, n | ...) filter is applied at each step.
  • (n:Property or n:Class): This condition allows the traversal to proceed to either :Property or :Class nodes.
  • CASE WHEN n:Property THEN ... ELSE true END: This is a conditional check that applies specific conditions when the target node n is a :Property.
  • EXISTS { MATCH (:Class)-[:P]->(n) }: This is the crucial part where the EXISTS subquery is used. It checks whether there is a :Class node connected to the current :Property node n via a :P relationship. This ensures that only class properties are considered.
  • AND EXISTS { MATCH (n)-[:KEY]->(:Verb {name: "SUBCLASS_OF"}) }: This adds an optional restriction to filter properties based on their :KEY relationship to a :Verb node with the name "SUBCLASS_OF".
  • -(:Class): This specifies that the traversal should end at a :Class node.
  • RETURN p LIMIT 100: This returns the paths found, limited to 100 results.

The Current Limitation

As highlighted in the original feature request, the current implementation of Memgraph does not support the use of EXISTS within the (r, n | ...) filter. This limitation prevents the direct expression of the desired traversal logic, making it necessary to find workarounds.

Workarounds and Their Shortcomings

Given the current limitations, there are a few potential workarounds, but each comes with its own set of drawbacks.

1. Subgraph Approach

One possible workaround is to create a subgraph containing only the filtered properties and then perform the traversal on this subgraph. However, this approach is cumbersome and may not be efficient for complex queries. Additionally, subgraphs may not support arbitrary queries, further limiting their applicability.

2. Adding Case-Specific Labels

Another workaround is to add additional labels to the :Property nodes, such as :ClassProperty, to indicate that they are class properties. This can be done using a separate query:

MATCH (:Class)-[:P]->(p:Property) SET p:ClassProperty;

However, this approach introduces redundant information into the graph, as the class property status can already be derived from the graph model. It also does not scale well, especially when trying to filter based on more complex conditions, such as specific verb names in the :KEY relationship.

The Ideal Solution: Native EXISTS Support

The ideal solution is to natively support the use of EXISTS within the (r, n | ...) filter. This would allow for a more natural and efficient expression of dynamic traversal conditions, eliminating the need for cumbersome workarounds and improving the overall query performance.

Benefits of Implementing the Feature

Implementing the ability to use EXISTS within the filter of variable-length relationship MATCH clauses would bring several benefits:

  • Increased Expressiveness: It would allow users to express more complex traversal conditions directly within the query, without resorting to workarounds.
  • Improved Readability: Queries would become more readable and easier to understand, as the traversal logic would be more clearly defined.
  • Enhanced Performance: Native support for EXISTS would likely lead to better query performance, as the database engine could optimize the traversal based on the subquery conditions.
  • Greater Flexibility: It would provide greater flexibility in modeling and querying complex graph patterns, unlocking new use cases and applications.

Performance Implications and Considerations

While the feature would bring significant benefits, it is essential to consider the potential performance implications. Using EXISTS within the traversal filter could potentially increase the query execution time, as each step might involve evaluating a subquery.

However, with proper optimization techniques, such as indexing and query planning, the performance impact can be minimized. Additionally, the benefits of increased expressiveness and flexibility often outweigh the potential performance overhead, especially for complex queries.

Use Cases and Applications

The ability to use EXISTS in traversal filters would be valuable in various use cases and applications, including:

  • Knowledge Graphs: Traversal of complex relationships with conditions based on external properties or relationships.
  • Semantic Web: Querying RDF graphs with dynamic constraints on traversal paths.
  • Social Networks: Finding connections between users based on specific criteria that require subquery checks.
  • Recommendation Systems: Building recommendation engines that consider complex relationships and user preferences.
  • Data Lineage: Tracking the flow of data through a system with conditions based on data transformations and dependencies.

Conclusion

The feature request to allow the use of EXISTS within the filter of variable-length relationship MATCH clauses in Memgraph is a significant one. It addresses a current limitation that hinders the expression of dynamic traversal conditions and necessitates cumbersome workarounds. Implementing this feature would enhance the expressiveness, readability, and flexibility of Memgraph queries, opening up new possibilities for complex graph pattern matching and analysis.

While there are potential performance implications to consider, proper optimization techniques can mitigate these concerns. The benefits of increased expressiveness and flexibility often outweigh the potential overhead, especially for complex queries.

This enhancement would make Memgraph an even more powerful and versatile graph database, capable of handling a wider range of use cases and applications. By enabling more natural and efficient expression of dynamic traversal conditions, Memgraph can empower users to unlock deeper insights from their graph data.

To learn more about graph databases and Cypher, you can visit Neo4j's website, a leading graph database platform.

You may also like