Cypher Tutorial
If you’re new to Cypher and just getting started with Ladybug, you’re in the right place! This tutorial will guide you through the basics of Cypher, including how to create nodes and relationships, and how to scan, copy and query your data that’s in a Ladybug database.
See the link below for the example dataset used in this tutorial.
The dataset is a social network dataset of users and posts. Download the dataset and unzip it in your current working directory.
Working with the Ladybug CLI
Ladybug’s command line interface (CLI) is a great way to get started with Ladybug. It allows you to interact with your database in a terminal and quickly test out your ideas in Cypher.
Let’s start by running the Ladybug CLI shell. Install the Ladybug CLI and then run the following command to start the CLI.
# Open an in-memory databaselbugDefine a schema
The first step in getting your data into Ladybug is creating node and relationship tables. This step is called schema definition. Follow the steps in the example database to create the node and relationship tables.
Ingest the data
The COPY FROM command is used to ingest data from various file formats into Ladybug. In this case,
the data is stored in CSV format, in the local directory tutorial_data.
COPY User FROM 'tutorial_data/node/user.csv';COPY Post FROM 'tutorial_data/node/post.csv';COPY LIKES FROM 'tutorial_data/relation/LIKES.csv';COPY FOLLOWS FROM 'tutorial_data/relation/FOLLOWS.csv';COPY POSTED FROM 'tutorial_data/relation/POSTS.csv';You should see messages in your terminal indicating how many tuples were copied into each table. An example is shown below.
┌───────────────────────────────────────────────┐│ result ││ STRING │├───────────────────────────────────────────────┤│ 20 tuples have been copied to the User table. │└───────────────────────────────────────────────┘MATCH
In Ladybug, a graph’s nodes and relationships are stored in tables. The MATCH clause is used to
find nodes that match the pattern specified in the clause. All entities in the pattern are
returned via the RETURN clause.
Match nodes
Let’s say we want to match only User nodes in the database. We can do this by specifying the label
in the MATCH clause.
MATCH (a:User) RETURN a.* LIMIT 3;Note the a.* in the RETURN clause. This is a wildcard that returns all properties of the User
node, regardless of how many properties there are.
┌──────────┬────────────────┬─────────────────────────┐│ a.userID │ a.username │ a.account_creation_date ││ INT64 │ STRING │ DATE │├──────────┼────────────────┼─────────────────────────┤│ 1 │ epicwolf202 │ 2022-09-09 ││ 2 │ silentninja637 │ 2023-01-27 ││ 3 │ stormcat597 │ 2023-02-25 │└──────────┴────────────────┴─────────────────────────┘You can also match nodes across multiple node tables, as shown below. However, it only makes sense to do this when there exists a common property between the two tables.
MATCH (a:User:Post) RETURN * LIMIT 3;Match a relationship pattern
You can match a relationship pattern by specifying the relationship in the MATCH clause.
In the below example, we match the LIKES relationship between a User node and a Post node.
MATCH (a:User)-[r:LIKES]->(b:Post) RETURN a.username, b.* LIMIT 3;┌─────────────┬──────────┬─────────────────┬──────────────┬─────────────────┐│ a.username │ b.postID │ b.creation_date │ b.like_count │ b.retweet_count ││ STRING │ INT64 │ DATE │ INT64 │ INT64 │├─────────────┼──────────┼─────────────────┼──────────────┼─────────────────┤│ epicwolf202 │ 5 │ 2022-10-26 │ 103 │ 73 ││ epicwolf202 │ 11 │ 2020-02-03 │ 448 │ 168 ││ epicwolf202 │ 18 │ 2021-03-12 │ 244 │ 166 │└─────────────┴──────────┴─────────────────┴──────────────┴─────────────────┘The user named epicwolf202 has liked the posts with IDs 5, 11 and 18.
Match on all nodes and relationships
If you want to match on an arbitrary node or relationship in the database, you can use the MATCH clause without any label
in the pattern.
// Ask for all nodes and relationships in the databaseMATCH (a)-[b]->(c) RETURN * LIMIT 3;┌──────────────────────────────────┬──────────────────────────────────┬──────────────────────────────────┐│ a │ c │ b ││ NODE │ NODE │ REL │├──────────────────────────────────┼──────────────────────────────────┼──────────────────────────────────┤│ {_ID: 0:14, _LABEL: User, use... │ {_ID: 0:0, _LABEL: User, user... │ (0:14)-{_LABEL: FOLLOWS, _ID:... ││ {_ID: 0:15, _LABEL: User, use... │ {_ID: 0:0, _LABEL: User, user... │ (0:15)-{_LABEL: FOLLOWS, _ID:... ││ {_ID: 0:17, _LABEL: User, use... │ {_ID: 0:0, _LABEL: User, user... │ (0:17)-{_LABEL: FOLLOWS, _ID:... │└──────────────────────────────────┴──────────────────────────────────┴──────────────────────────────────┘An alternate way to do this would be to leave the node and relationship fields blank, but store the path in a variable p as shown below.
// Ask for all paths in the databaseMATCH p=()-[]->() RETURN p LIMIT 3;Both the above queries return similar results.
Match multiple patterns
You can combine multiple match clauses that each specify a particular pattern.
MATCH (a:User)-[:Follows]->(b:User)-[:Follows]->(c:User), (a)-[:Follows]->(c)RETURN a.username AS user1, b.username AS user2, c.username AS user3LIMIT 3;The above query is the same as having written the following two match clauses one after the other.
MATCH (a:User)-[:Follows]->(b:User)-[:Follows]->(c:User)MATCH (a)-[:Follows]->(c)RETURN a.username, b.username, c.username LIMIT 3;Instead of repeating the match clause, you can comma-separate the clauses as shown above. The following result is returned.
┌────────────────┬───────────────┬───────────────┐│ user1 │ user2 │ user3 ││ STRING │ STRING │ STRING │├────────────────┼───────────────┼───────────────┤│ silentninja637 │ stormninja678 │ darkdog878 ││ silentninja637 │ darkdog878 │ stormninja678 ││ silentninja637 │ stormcat597 │ darkdog878 │└────────────────┴───────────────┴───────────────┘Match variable-length relationships
One of the most powerful features of Cypher is the ability to match variable-length relationships.
This is done using the Kleene star operator *. The following query aims to find all users that
are one or two hops (i.e., paths with 1 or 2 edges) away from a particular user.
MATCH (a:User)-[:Follows*1..2]->(b:User)RETURN a.username, b.usernameLIMIT 5;┌─────────────┬─────────────┐│ a.username │ b.username ││ STRING │ STRING │├─────────────┼─────────────┤│ epicwolf202 │ epicking81 ││ epicwolf202 │ darkdog878 ││ epicwolf202 │ coolking201 ││ epicwolf202 │ stormfox762 ││ epicwolf202 │ coolwolf752 │└─────────────┴─────────────┘The full Kleene star syntax has the form *<low>..<high>. In the above example, the lower bound was 1 and the upper bound was 2,
and so we match any paths with one or two edges. Alternatively, we can do *..<high> in which case the lower bound defaults to 1.
And, if we just want to match paths with an exact number of edges, we can simply do *<len> instead of specifying the same bound twice.
For example, if we want to find all users that are exactly two hops away, we can use the following query.
MATCH (a:User)-[:Follows*2]->(b:User)RETURN a.username, b.usernameLIMIT 5;┌────────────────┬───────────────┐│ a.username │ b.username ││ STRING │ STRING │├────────────────┼───────────────┤│ epicwolf202 │ epicking81 ││ epicwolf202 │ darkdog878 ││ epicwolf202 │ coolking201 ││ silentninja637 │ coolking201 ││ silentninja637 │ smartdragon25 │└────────────────┴───────────────┘The above result is telling us that the user in column b is followed by a user that the user in column a follows.
We can verify if this is true by manually writing the following query without the * operator.
MATCH (a:User)-[:Follows]->(x:User)-[:Follows]->(b:User)WHERE a.username = 'epicwolf202'RETURN a.username, b.username;┌─────────────┬─────────────┐│ a.username │ b.username ││ STRING │ STRING │├─────────────┼─────────────┤│ epicwolf202 │ coolking201 ││ epicwolf202 │ darkdog878 ││ epicwolf202 │ epicking81 │└─────────────┴─────────────┘Indeed, the same result is returned.
DISTINCT
The DISTINCT clause is used to return unique tuples in the result. The following query returns all
users who have liked a post. No duplicates are returned.
MATCH (u:User)-[:LIKES]->(p:Post)RETURN DISTINCT u.usernameLIMIT 3;┌────────────────┐│ u.username ││ STRING │├────────────────┤│ stormcat597 ││ silentninja637 ││ epicwolf202 │└────────────────┘OPTIONAL MATCH
The OPTIONAL MATCH clause is used to define a pattern to find in the database. The difference from a regular
MATCH is that if the system cannot match a pattern defined by OPTIONAL MATCH, it will set the values in
the variables defined only in the OPTIONAL MATCH, to NULL.
Depending on what the end goal is, returning nulls may or may not be acceptable, so use a conventional
MATCH if no nulls are desired.
MATCH (u1:User)OPTIONAL MATCH (u2)-[:Follows]->(u1:User)RETURN u1.username, u2.usernameLIMIT 3;┌─────────────┬───────────────┐│ u1.username │ u2.username ││ STRING │ STRING │├─────────────┼───────────────┤│ epicwolf202 │ smartdragon25 ││ epicwolf202 │ fastqueen400 ││ epicwolf202 │ mysticwolf198 │└─────────────┴───────────────┘WHERE
The WHERE clause allows you to specify predicates/constraints on a part of your query. The query
below shows how to filter the results to only include users whose account was created before a
particular date.
MATCH (a:User)WHERE a.account_creation_date < DATE('2023-02-01')RETURN a.username, a.account_creation_dateLIMIT 3;The date format in the WHERE predicate is specified in the format YYYY-MM-DD, as a string, and
transformed into a date object that can be compared with the account_creation_date property of the
User node.
┌────────────────┬─────────────────────────┐│ a.username │ a.account_creation_date ││ STRING │ DATE │├────────────────┼─────────────────────────┤│ epicwolf202 │ 2022-09-09 ││ silentninja637 │ 2023-01-27 ││ fastgirl798 │ 2021-06-11 │└────────────────┴─────────────────────────┘WHERE EXISTS subquery
You can specify a subquery in a WHERE clause with the EXISTS keyword. The following query returns all users
who have at least one post.
MATCH (u:User)WHERE EXISTS { MATCH (u)-[:POSTED]->()}RETURN u.usernameLIMIT 3;┌────────────────┐│ u.username ││ STRING │├────────────────┤│ epicwolf202 ││ silentninja637 ││ stormcat597 │└────────────────┘Grouping and Aggregation
Cypher does not have an explicit GROUP BY clause. Instead, you can simply apply an aggregation function
in the RETURN clause and group by the specified property. The following query returns the total number of
posts.
MATCH (p:Post)RETURN COUNT(p) AS num_posts;┌───────────┐│ num_posts ││ INT64 │├───────────┤│ 50 │└───────────┘The following example shows how to group by the userID property and return the number of posts for each user.
MATCH (u:User)-[:POSTED]->(p:Post)RETURN u.userID, COUNT(p) AS num_postsLIMIT 3;┌──────────┬───────────┐│ u.userID │ num_posts ││ INT64 │ INT64 │├──────────┼───────────┤│ 14 │ 3 ││ 8 │ 2 ││ 15 │ 4 │└──────────┴───────────┘ORDER BY
The ORDER BY clause is used to sort the results of a query. The following query returns all users
sorted in descending order of the number of posts they have.
MATCH (u:User)-[:POSTED]->(p:Post)RETURN u.username, COUNT(p) AS num_postsORDER BY num_posts DESCLIMIT 3;┌───────────────┬───────────┐│ u.username │ num_posts ││ STRING │ INT64 │├───────────────┼───────────┤│ stormqueen831 │ 5 ││ cooldragon866 │ 5 ││ mysticcat651 │ 4 │└───────────────┴───────────┘WITH
The WITH clause is used to chain the results of one query to another. The example below shows how to
find the top 3 users with the most posts and then find the users who follow them.
MATCH (u:User)-[:POSTED]->(p:Post)WITH u, COUNT(p) AS num_postsORDER BY num_posts DESCLIMIT 3WITH uMATCH (u2:User)-[:Follows]->(u)RETURN u.username AS most_prolific_user, u2.username AS followerLIMIT 8;Note the use of the AS keyword to rename the columns in the result.
┌────────────────────┬───────────────┐│ most_prolific_user │ follower ││ STRING │ STRING │├────────────────────┼───────────────┤│ stormqueen831 │ brightking765 ││ stormqueen831 │ coolking201 ││ stormqueen831 │ mysticwolf198 ││ stormqueen831 │ fastqueen400 ││ cooldragon866 │ silentguy245 ││ mysticcat651 │ coolking201 ││ mysticcat651 │ darkdog878 ││ mysticcat651 │ cooldragon866 │└────────────────────┴───────────────┘UNWIND
The UNWIND clause is used to unnest (i.e., explode) a list and return the elements as separate rows.
Consider that you have a list of user IDs and you want to find their account creation dates.
UNWIND ["epicwolf202", "cooldragon866", "stormcat597"] AS userMATCH (u:User {username: user})RETURN u.username, u.account_creation_date;┌───────────────┬─────────────────────────┐│ u.username │ u.account_creation_date ││ STRING │ DATE │├───────────────┼─────────────────────────┤│ epicwolf202 │ 2022-09-09 ││ cooldragon866 │ 2020-10-30 ││ stormcat597 │ 2023-02-25 │└───────────────┴─────────────────────────┘CASE
Cypher supports the CASE expression to handle conditional logic. The following query returns
the number of posts for each user, but only if the post was created after a certain date.
MATCH (u:User)-[:POSTED]->(p:Post)RETURN u.username, COUNT(CASE WHEN p.creation_date > DATE('2023-04-01') THEN p ELSE NULL END) AS num_postsORDER BY num_posts DESC LIMIT 3;┌────────────────┬───────────┐│ u.username │ num_posts ││ STRING │ INT64 │├────────────────┼───────────┤│ cooldragon866 │ 2 ││ silentninja637 │ 2 ││ epiccat105 │ 1 │└────────────────┴───────────┘This was a basic introduction to the kinds of queries you can ask in Cypher. There are many other clauses in Cypher that were not covered in this tutorial. Check out the Cypher manual for more information!