Connections & Concurrency
In this section, we will go over the basics of how applications connect to a Ladybug database and list some best practices for concurrent connection handling.
Ladybug supports both on-disk and in-memory modes of operation. When operating under on-disk mode, your data and the underlying database files are stored in a local database, whereas under in-memory mode, no data is persisted to disk.
Throughout this documentation, let’s suppose you open a Ladybug database file that’s on-disk, named
example.lbug.
Understand connections
Database and connection objects
Application processes must connect to a Ladybug database in two steps before they can start querying it:
Step 1. Create an instance of a Database object db and pass it the database filename (example.lbug in our example below), and a read-write mode, which can be either:
- READ_WRITE (default); or
- READ_ONLY
Step 2. Create a Connection object conn from the Database object db.
- A Connection object that was created using a READ_WRITE Database object can execute queries that perform both read operations (e.g., queries with MATCH WHERE RETURN statements) and write operations (e.g., queries with CREATE or COPY FROM statements).
- In contrast, a Connection object that was created using a READ_ONLY Database object can only execute queries that perform read operations.
Then, using conn, one can execute Cypher queries against the example.lbug database.
Here’s a simple example application in Python that demonstrates these two steps for creating a READ_WRITE
database and a connection. The same principles apply to other language APIs as well:
import lbug

# Open the database in `READ_WRITE` mode. The below code is equivalent to:
# db = lbug.Database("example.lbug", read_only=False)
db = lbug.Database("example.lbug")
conn = lbug.Connection(db)
conn.execute("CREATE (a:Person {name: 'Alice'});")

Restrictions of in-memory databases
When working with in-memory databases, there are a few restrictions to keep in mind.
- An in-memory database cannot be opened as READ_ONLY; only READ_WRITE processes are allowed
- When using the httpfs extension, we do not support remote file caching for in-memory databases
- Attaching an in-memory database is not allowed
Understand concurrency
Limitations of creating multiple Database objects
Ladybug is an embedded database, i.e., it is a library you embed inside an application process and run as part
of this application process, rather than as a separate process.
You can think of the Database object as the Ladybug database software.
Specifically, the Database object contains
different components of the Ladybug database software, such as its buffer manager, storage manager, transaction manager, etc.
Several of the components inside a Database object, such as the buffer manager,
cache parts of the data that are stored on disk. This limits the number of Database objects that can be created
pointing to the same database, either in the same process or across multiple processes.
The possible settings are:
- One READ_WRITE database object; OR
- Multiple READ_ONLY database objects.
The reason for this limitation is as follows. Suppose two Database objects, db1 and db2, point to the same database.
If a connection conn1 from db1 performs a write operation, say deleting a node record, then db1 can ensure
that any data cached in db1 is refreshed and accurate. However, db1 cannot notify other Database objects, such as db2,
about the change. So db2’s cache would no longer reflect the true state of the
data on disk. This can lead to problems if
connections from db2 run queries after db1’s modification. Therefore, Ladybug will
not allow multiple Database objects to be created unless they are all READ_ONLY.
The limitation of having either one READ_WRITE Database object or multiple READ_ONLY Database objects applies
even if these Database objects are created in the same application process. That said, there is often no good reason for
creating multiple Database instances within the same process (you should instead share the same Database object
in that process).
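The rule above can be restated as a small predicate. This is purely an illustrative sketch: `can_coexist` and the mode strings are hypothetical names, not part of the Ladybug API, and Ladybug enforces the rule itself when a Database object is created.

```python
READ_WRITE = "READ_WRITE"
READ_ONLY = "READ_ONLY"


def can_coexist(modes):
    """Return True if Database objects opened with these modes may
    point to the same database at the same time."""
    writers = sum(1 for mode in modes if mode == READ_WRITE)
    # Any number of readers is fine; a writer must be the only Database object.
    return writers == 0 or len(modes) == 1
```

For instance, `can_coexist([READ_ONLY, READ_ONLY])` holds, while `can_coexist([READ_WRITE, READ_ONLY])` does not, regardless of whether the two objects live in one process or two.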
However, there are common scenarios when you may want to launch
multiple application processes that connect to the same database. One such scenario
is when developing your workflow in Python using a Jupyter notebook
that connects to example.lbug. Say you want to also run the Ladybug CLI alongside your Jupyter notebook,
which also connects to the same example.lbug. When you launch Ladybug CLI and point it to
example.lbug, Ladybug CLI embeds Ladybug and tries to create a READ_WRITE Database object. So if your notebook process already
has created a Database object, this will fail with an error that looks like this:
RuntimeError: IO exception: Could not set lock on file : /path/to/database/example.lbug
If this happens, you would have to shut down your notebook process (or simply restart your Jupyter server), so that its Database object is destroyed, before the CLI can run.
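If your own application may hit this error, one option is to catch it and re-raise it with a hint about the likely cause. The sketch below assumes the error surfaces in Python as a `RuntimeError` whose message contains the text shown above; `open_database_or_explain` is a hypothetical helper, and `open_db` stands in for whatever callable opens the database (for example `lbug.Database`).

```python
def open_database_or_explain(open_db, path):
    """Call open_db(path); if it fails on the file lock, raise a clearer error."""
    try:
        return open_db(path)
    except RuntimeError as exc:
        # Substring taken from the error message shown in this section.
        if "Could not set lock on file" in str(exc):
            raise RuntimeError(
                f"{path} is already open in another process. "
                "Shut that process down (or restart the notebook server) and retry."
            ) from exc
        raise  # Unrelated errors are passed through unchanged.
```

The original exception is kept as the cause, so the underlying message is still visible in the traceback.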
Create multiple Connections from the same Database object
Note that the above limitation about creating multiple Database objects does not mean that you cannot create
multiple Connections from the same READ_WRITE Database object and issue concurrent queries. For example,
you can write a program that creates a single READ_WRITE Database object db that points to example.lbug.
Then, you can spawn multiple threads
T1, …, Tk, and each Ti obtains a connection from db and concurrently issues
read or write queries. This is safe. Every read and write statement in Ladybug is wrapped in a transaction
(either automatically or manually by you). Concurrent transactions that operate on the same database
example.lbug are safely executed by Ladybug’s transaction manager (i.e., the transaction manager inside db),
again as long as those transactions are issued by connections that were created from the same Database object.
See the documentation on transactions for the transactional guarantees that Ladybug provides.
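The threading pattern described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: `run_concurrent_queries` is a hypothetical helper, `db` is the single shared Database object, and `make_connection` stands in for the connection constructor (for example `lbug.Connection`).

```python
import threading


def run_concurrent_queries(db, make_connection, queries):
    """Run each query on its own thread, using a separate connection
    created from the one shared Database object."""
    results = [None] * len(queries)

    def worker(index, query):
        conn = make_connection(db)  # one connection per thread, same db
        results[index] = conn.execute(query)

    threads = [
        threading.Thread(target=worker, args=(i, q))
        for i, q in enumerate(queries)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With the Python API shown earlier on this page, a call might look like `run_concurrent_queries(lbug.Database("example.lbug"), lbug.Connection, queries)`. The important point is that every thread obtains its connection from the same `db`, rather than creating its own Database object.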
Example scenarios
Below, we provide some examples and best practices for common scenarios you are likely to run Ladybug under:
Scenario 1: One process that creates a READ_WRITE database
In this scenario, you have a single application process that embeds Ladybug and creates a READ_WRITE Database object
that opens the example.lbug database. Within this process, you can create multiple concurrent connections, each of which
can execute queries that can read and write to the database, which will be handled safely
by Ladybug’s transaction manager. Pictorially, this scenario looks as follows:
For simplicity, in the above image, queries
from conn1 and conn2 are executed sequentially, but they could be running concurrently as well.
Scenario 2: Multiple processes that create READ_ONLY databases
In this scenario, you have multiple application processes that embed
Ladybug and create READ_ONLY Database objects that open the same database example.lbug.
Each process can create multiple concurrent connections and issue queries.
However, each connection can only execute read-only queries (because the database is opened in READ_ONLY mode).
Since the connections and queries are read-only, none of the queries can change the actual database files on disk.
Therefore, even though the queries are coming
from connections from different Database objects, this is safe and allowed.
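A process taking part in this scenario might open the database as sketched below. The `read_only` keyword is the one shown in the Python example earlier on this page; `open_read_only` itself is a hypothetical helper name.

```python
def open_read_only(path="example.lbug"):
    """Open an existing on-disk database in READ_ONLY mode and
    return the (db, conn) pair."""
    # Imported inside the function so that defining the helper has no side effects.
    import lbug

    db = lbug.Database(path, read_only=True)
    conn = lbug.Connection(db)
    return db, conn
```

Several processes can each call this against the same file. Any write query issued through the returned connection is expected to be rejected, since the Database object is READ_ONLY.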
If you’re interested in running multiple processes that can read and write to the same database, see the next section.
Performing read-write operations from multiple processes
In certain production settings, you may need to have multiple processes that read and write to the same Ladybug database,
say again stored under example.lbug.
This is the case, for example, if you have an online application. Say you have a browser application and multiple users
use your application from different browsers and each user interaction leads to concurrent read-write queries
on the same database. To support such scenarios, a common design pattern is this:
- One API server process that embeds Ladybug and creates a single READ_WRITE Database object pointing to example.lbug. The API server is responsible for handling incoming requests from clients, say through HTTP or gRPC. The requests’ Cypher queries, which can read and write data to the database, are executed (possibly) concurrently.
- Multiple client processes that connect to the API server and send requests to the API server (again through some protocol, such as HTTP or gRPC). You are free to open any number of client processes that issue read or write queries to the API server.
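To make the shape of this pattern concrete, here is a minimal sketch of the server side using only Python's standard library. It is not the API server that Ladybug provides: `make_query_server` and `execute_query` are hypothetical names, and `execute_query` stands in for a function that runs a Cypher string against the process's single READ_WRITE Database object and returns JSON-serializable rows.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_query_server(execute_query, host="127.0.0.1", port=0):
    """Build an HTTP server that passes each POSTed query string to
    execute_query and replies with the result as JSON."""

    class QueryHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            query = self.rfile.read(length).decode("utf-8")
            try:
                payload = json.dumps({"rows": execute_query(query)})
                status = 200
            except Exception as exc:  # report any query failure to the client
                payload = json.dumps({"error": str(exc)})
                status = 400
            data = payload.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, *args):
            pass  # keep the sketch quiet; a real server would log requests

    # ThreadingHTTPServer handles each request on its own thread.
    return ThreadingHTTPServer((host, port), QueryHandler)


def serve_in_background(server):
    """Start the server on a daemon thread and return that thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Because requests are handled on separate threads, `execute_query` would typically create or reuse a connection from the one shared Database object for each call. Authentication, input validation, and result conversion are deliberately left out of this sketch.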
Note that in terms of processes that embed Ladybug, this design pattern follows
scenario 1 above as there is actually
one process that creates a READ_WRITE Database object. To enable you to get up and running with such an architecture,
we provide a REST-style Ladybug API server powered by Express.js.
Pictorially, this design pattern looks as follows:
Known Issue: Ladybug Explorer not recognizing the file lock permissions
Ladybug ensures that multiple Database objects, where one of them is a READ_WRITE instance, are not
created by setting some permission flags when opening the database file. This is a lightweight
locking mechanism. However, there is a known issue: Ladybug Explorer is not
able to see the flags set by other processes. The core problem is that Explorer runs as a Docker container
and the flags are not propagated between the host operating system and the Docker environment. We do not currently
have a fix for this. So if you have a process (or processes) that has
opened a database and you concurrently start Ladybug Explorer, you should manually ensure that
either: (i) both Explorer and your other process are in READ_ONLY mode; or (ii) you shut down your other
process first before opening Explorer in READ_WRITE mode.
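For intuition about how a lock taken on open can allow many readers or one writer, the sketch below uses POSIX advisory locks via Python's `fcntl` module. This illustrates the general technique only; it is not a description of Ladybug's actual implementation, `try_lock` is a hypothetical helper, and the code runs on Unix-like systems only.

```python
import fcntl


def try_lock(file_obj, read_only):
    """Try to take a shared lock (read-only) or an exclusive lock
    (read-write) on an open file without blocking.
    Returns True on success and False if a conflicting lock is held."""
    mode = fcntl.LOCK_SH if read_only else fcntl.LOCK_EX
    try:
        fcntl.flock(file_obj, mode | fcntl.LOCK_NB)
        return True
    except OSError:
        return False
```

Several openers can hold the shared lock at once, but the exclusive lock is refused while any other lock is held. Advisory locks like this only work when every participant can see them, which is the kind of visibility that is lost in the Docker situation described above.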
FAQs
In this section, we address some commonly asked questions related to concurrency and connections in Ladybug.
Can I embed Ladybug using both READ_ONLY and READ_WRITE processes in my application?
No, when embedding Ladybug in your application, you cannot have both READ_WRITE and READ_ONLY database processes
open at any given time (in a safe manner). Technical details for this limitation are described in the sections above.
In short, the reason for this limitation is that at any given time, a READ_WRITE process can make changes
to the disk layout, which may or may not be reflected in the buffer managers of other open READ_ONLY Database objects, and this
can lead to inconsistencies or data corruption. To avoid this issue, the best practice when embedding Ladybug in your
application is to use design patterns as per one of the scenarios shown pictorially in the sections above.
I’m seeing an error related to locks when running Ladybug in a Jupyter notebook. How can I resolve this?
Sometimes, when you are working in a Jupyter notebook and building your Ladybug graph while also trying to open other processes that connect to the same directory as the database file, you may come across this error:
RuntimeError: IO exception: Could not set lock on file : /path/to/database/example.lbug
The lock, as described in earlier sections on this page, is present to protect you from inadvertent
data corruption due to multiple Database instances trying to access the same database concurrently.
To resolve this, simply click the Restart server button in your Jupyter notebook (or close the Jupyter
notebook entirely). Restarting the Jupyter notebook server (or closing it) will release the lock on the
database file, allowing you to safely connect to the database via another connection,
for example, a CLI or Ladybug Explorer for graph visualization. In general, it’s recommended to
open the CLI or Ladybug Explorer only after you have finished all operations on the database and closed or
restarted the Jupyter notebook server.
Can I open an in-memory database as READ_ONLY?
No, you cannot open in-memory databases as READ_ONLY — only READ_WRITE processes are allowed
when operating under in-memory mode. When you create a Ladybug database, there are two things to keep
in mind:
- The database mode (on-disk or in-memory)
- Whether you will read and write to the database or only read from it
An in-memory database is stored in memory and not on disk. This means that the database is temporary
and the data will be lost when the process that created the database is terminated, so without a READ_WRITE process,
in-memory databases won’t have any data to operate on.
See the getting started section for more details on how to create and work with in-memory databases in your client API of choice.