DuckDB extension
The duckdb extension allows you to attach DuckDB databases to Ladybug.
Usage
Before getting started, see Install an extension and Load an extension.
Example DuckDB database
First, create a sample DuckDB database of university students in Python.
import duckdb
conn = duckdb.connect('university.db')
conn.execute("CREATE TABLE IF NOT EXISTS person (name VARCHAR, age INTEGER);")
conn.execute("INSERT INTO person VALUES ('Alice', 30);")
conn.execute("INSERT INTO person VALUES ('Bob', 27);")
conn.execute("INSERT INTO person VALUES ('Carol', 19);")
conn.execute("INSERT INTO person VALUES ('Dan', 25);")

Attach a DuckDB database
To attach a DuckDB database, use the ATTACH statement:
ATTACH [DB_PATH] [AS alias] (dbtype duckdb, skip_unsupported_table = OPTION, schema = SCHEMA_NAME)

- DB_PATH: Path to the DuckDB database instance (can be a relative, absolute, HTTPS, or S3 path).
- alias: Database alias to use in Ladybug. If not provided, the database name from DuckDB is used. When attaching multiple databases, we recommend using aliases.
- skip_unsupported_table: A boolean flag that controls whether Ladybug skips tables with unsupported data types (True) or exits with an error (False). Defaults to False.
- schema: The name of the schema in DuckDB to attach. Ladybug attaches to the main schema of DuckDB by default.
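
The syntax above can be assembled programmatically. The following is a minimal Python sketch, not part of the extension: `build_attach_statement` is a hypothetical helper, and the way the option values are written (a lowercase boolean literal, a single-quoted schema name) is an assumption rather than something stated on this page.

```python
def build_attach_statement(db_path, alias=None, skip_unsupported_table=None, schema=None):
    """Return an ATTACH statement for a DuckDB database as a string.

    Hypothetical helper: only the overall statement layout is taken from
    the ATTACH syntax described above.
    """
    if "'" in db_path:
        # Quote escaping is out of scope for this sketch.
        raise ValueError("paths containing single quotes are not handled")

    options = ["dbtype duckdb"]
    if skip_unsupported_table is not None:
        # Assumption: booleans are written as lowercase literals.
        flag = "true" if skip_unsupported_table else "false"
        options.append(f"skip_unsupported_table = {flag}")
    if schema is not None:
        # Assumption: the schema name is passed as a quoted string.
        options.append(f"schema = '{schema}'")

    alias_part = f" AS {alias}" if alias else ""
    option_list = ", ".join(options)
    return f"ATTACH '{db_path}'{alias_part} ({option_list});"


statement = build_attach_statement("university.db", alias="uw")
print(statement)  # ATTACH 'university.db' AS uw (dbtype duckdb);
```

Options left as `None` are omitted from the statement, so Ladybug's defaults apply.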
For example, the university.db DuckDB database can be attached to Ladybug using the alias uw:
ATTACH 'university.db' AS uw (dbtype duckdb);

Attach a DuckDB database on S3
You can also attach a remote DuckDB database hosted on S3.
First, configure the S3 connection.
Then, attach the database using an S3 URL:
SET S3_URL_STYLE='VHOST';
ATTACH 's3://duckdb-blobs/databases/stations.duckdb' AS station (dbtype duckdb);
LOAD FROM station.stations RETURN count(*);

┌──────────────┐
│ COUNT_STAR() │
│ INT64        │
├──────────────┤
│ 578          │
└──────────────┘

DuckDB to Ladybug type mapping
The following table shows the mapping from DuckDB’s data types to Ladybug’s data types:
| Data type in DuckDB | Corresponding data type in Ladybug |
|---|---|
| BIGINT | INT64 |
| BIT | Unsupported |
| BLOB | BLOB |
| BOOLEAN | BOOL |
| DATE | DATE |
| DECIMAL(prec, scale) | DECIMAL(prec, scale) |
| DOUBLE | DOUBLE |
| FLOAT | FLOAT |
| HUGEINT | INT128 |
| INTEGER | INT32 |
| INTERVAL | INTERVAL |
| SMALLINT | INT16 |
| TIME | Unsupported |
| TIMESTAMP WITH TIME ZONE | Unsupported |
| TIMESTAMP | TIMESTAMP |
| TINYINT | INT8 |
| UBIGINT | UINT64 |
| UHUGEINT | Unsupported |
| UINTEGER | UINT32 |
| USMALLINT | UINT16 |
| UTINYINT | UINT8 |
| UUID | UUID |
| VARCHAR | STRING |
| ENUM | Unsupported |
| ARRAY | ARRAY |
| LIST | LIST |
| MAP | MAP |
| STRUCT | STRUCT |
| UNION | UNION |
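
To check a DuckDB table definition against this mapping before attaching, the table can be transcribed into a lookup. The following Python sketch is illustrative only: `map_type` and `unsupported_columns` are hypothetical helpers, the `born_at` column is invented for the example, and the lookup matches only the top-level type names listed above (it does not inspect the element types of nested values).

```python
# Transcription of the mapping table; None marks an unsupported type.
DUCKDB_TO_LADYBUG = {
    "BIGINT": "INT64", "BIT": None, "BLOB": "BLOB", "BOOLEAN": "BOOL",
    "DATE": "DATE", "DOUBLE": "DOUBLE", "FLOAT": "FLOAT",
    "HUGEINT": "INT128", "INTEGER": "INT32", "INTERVAL": "INTERVAL",
    "SMALLINT": "INT16", "TIME": None, "TIMESTAMP WITH TIME ZONE": None,
    "TIMESTAMP": "TIMESTAMP", "TINYINT": "INT8", "UBIGINT": "UINT64",
    "UHUGEINT": None, "UINTEGER": "UINT32", "USMALLINT": "UINT16",
    "UTINYINT": "UINT8", "UUID": "UUID", "VARCHAR": "STRING",
    "ENUM": None, "ARRAY": "ARRAY", "LIST": "LIST", "MAP": "MAP",
    "STRUCT": "STRUCT", "UNION": "UNION",
}


def map_type(duckdb_type):
    """Return the Ladybug type name for a DuckDB type, or None if unsupported."""
    normalized = duckdb_type.strip().upper()
    # Drop any parameter list, e.g. "DECIMAL(18, 3)" -> "DECIMAL".
    base = normalized.split("(", 1)[0].strip()
    if base == "DECIMAL":
        # DECIMAL keeps its precision and scale unchanged.
        return normalized
    # Raises KeyError for type names that are not in the table.
    return DUCKDB_TO_LADYBUG[base]


def unsupported_columns(columns):
    """Return the names of columns whose DuckDB type has no Ladybug equivalent."""
    return [name for name, dtype in columns.items() if map_type(dtype) is None]


# "born_at" is a made-up column used to show an unsupported type.
example = {"name": "VARCHAR", "age": "INTEGER", "born_at": "TIME"}
print(unsupported_columns(example))  # ['born_at']
```

A table that reports any unsupported column is the kind of table that the skip_unsupported_table option is concerned with.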
Scan a DuckDB table
You can use the LOAD FROM statement to scan the person table from the attached uw database.
LOAD FROM uw.person
RETURN *;

┌─────────┬───────┐
│ name    │ age   │
├─────────┼───────┤
│ Alice   │ 30    │
│ Bob     │ 27    │
│ Carol   │ 19    │
│ Dan     │ 25    │
└─────────┴───────┘

Use a default database name
You can set a default database name to avoid having to specify the full database name as a prefix in every query.
The above example can be simplified as:
ATTACH 'university.db' AS uw (dbtype duckdb);
USE uw;
LOAD FROM person
RETURN *;

Copy data from DuckDB tables
You can use the COPY FROM statement to import data from a DuckDB table into Ladybug.
First, create a Person table in Ladybug, with the same schema as the one defined in DuckDB:
CREATE NODE TABLE Person (name STRING PRIMARY KEY, age INT32);

When the schemas are the same, you can copy the data from the external DBMS table to the Ladybug table simply as:
COPY Person FROM uw.person;

If the schemas are not the same, e.g., Person contains only the name property while the external uw.person contains name and age, you can still use COPY FROM, but with a subquery:
COPY Person FROM (LOAD FROM uw.person RETURN name);

You can verify that the data was successfully imported:
MATCH (p:Person) RETURN p.*;

┌─────────┬───────┐
│ p.name  │ p.age │
├─────────┼───────┤
│ Alice   │ 30    │
│ Bob     │ 27    │
│ Carol   │ 19    │
│ Dan     │ 25    │
└─────────┴───────┘

Clear the schema cache
To avoid redundantly retrieving schema information from attached databases, Ladybug maintains a schema cache
that includes table names and their respective columns and types. If the schema of an attached database is
modified through another connection, for example by creating or deleting tables, the cached
schema data may become obsolete. In such cases, you can use the clear_attached_db_cache() function to refresh
the cached schema information.
CALL clear_attached_db_cache();

Detach a database
To detach a database, use DETACH [ALIAS]:
DETACH uw;
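
When the statements on this page are issued from a script, it can help to pair every ATTACH with its DETACH so the database is released even if a query fails in between. The following Python sketch shows one way to do that with a context manager. It is an illustration under assumptions: `attached` is a hypothetical helper, and `execute` stands for any callable that sends a query string to Ladybug (here a list's `append` method is used so the sketch runs without a database).

```python
from contextlib import contextmanager


@contextmanager
def attached(execute, db_path, alias):
    """Attach a DuckDB database, yield its alias, and always detach afterwards.

    `execute` is a placeholder for whatever function runs a query string.
    """
    execute(f"ATTACH '{db_path}' AS {alias} (dbtype duckdb);")
    try:
        yield alias
    finally:
        # Runs on normal exit and when the body raises.
        execute(f"DETACH {alias};")


# Record the statements instead of running them against a real database.
issued = []
with attached(issued.append, "university.db", "uw") as db:
    issued.append(f"LOAD FROM {db}.person RETURN *;")

print("\n".join(issued))
```

The recorded statements come out in the order ATTACH, LOAD FROM, DETACH, which matches the sequence used in the examples above.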