Scan data from various sources
Cypher supports the LOAD FROM clause to scan data from various sources. Scanning is the act of
reading data from a source, but not inserting it into the database. Inserting data into the database
involves “copying”; see the Import data section for details.
Scanning data using LOAD FROM is useful to inspect a subset of your data, understand its structure,
and perform transformations like rearranging columns.
General usage
Say you have a user.csv file that looks like this:
    name,age
    Adam,30
    Karissa,40
    Zhang,50
    Noura,25

You can scan the file and count the number of rows:

    LOAD FROM "user.csv" (header = true)
    RETURN COUNT(*);

This counts the number of rows in the file.

    ┌──────────────┐
    │ COUNT_STAR() │
    │ INT64        │
    ├──────────────┤
    │ 4            │
    └──────────────┘

You can also apply filter predicates via the WHERE clause, like this:

    LOAD FROM "user.csv" (header = true)
    WHERE age > 25
    RETURN COUNT(*);

The above query counts only the rows where the age column is greater than 25.

    ┌──────────────┐
    │ COUNT_STAR() │
    │ INT64        │
    ├──────────────┤
    │ 3            │
    └──────────────┘

Note that when scanning from CSV files, Ladybug will attempt to auto-cast the data to the correct type
when possible. For example, the age column is cast to an INT64 type.
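
Because age is inferred as INT64 rather than kept as a string, it can be used directly in numeric expressions. The query below is a minimal sketch, assuming the same user.csv file; the alias age_in_five_years is a made-up name used only for illustration.

```cypher
// Sketch: relies on the age column being auto-cast to INT64.
// age_in_five_years is a hypothetical alias, not part of the file.
LOAD FROM "user.csv" (header = true)
WHERE age > 25
RETURN name, age + 5 AS age_in_five_years;
```

If age had been scanned as a STRING, the comparison and the addition would need an explicit cast first.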
You can reorder the columns by simply returning them in the order you want. The LIMIT keyword
can be used to limit the number of rows returned. The example below returns the first two rows,
with the age and name columns in the order specified.

    LOAD FROM "user.csv" (header = true)
    RETURN age, name
    LIMIT 2;

    ┌───────┬─────────┐
    │ age   │ name    │
    │ INT64 │ STRING  │
    ├───────┼─────────┤
    │ 30    │ Adam    │
    │ 40    │ Karissa │
    └───────┴─────────┘

More features
LOAD FROM is a general-purpose clause in Cypher for scanning data from various data sources,
including files and in-memory DataFrames. See the documentation below for details
on how to use the LOAD FROM clause.