PostgreSQL
When setting the dbms parameter in the biocypher_config.yaml to postgresql,
the BioCypher Knowledge Graph is written to a PostgreSQL database.
PostgreSQL is a relational database management system.
Install PostgreSQL
To get a PostgreSQL instance running quickly, you can use Docker. The following
command will start a PostgreSQL instance with the password postgres and the
port 5432 exposed to the host system.
The postgresql-client (also known as psql command line tool) can be
used to connect and interact with the running PostgreSQL database instance.
Installation instructions can be found here.
PostgreSQL settings
To overwrite the standard settings of PostgreSQL, add a postgresql section to
the biocypher_config.yaml file. The following settings are possible:
postgresql: ### PostgreSQL configuration ###
# PostgreSQL connection credentials
database_name: postgres
user: postgres
password: postgres
port: 5432
# PostgreSQL import batch writer settings
quote_character: '"'
delimiter: '\t'
# Import call prefixes to adjust the autogenerated shell script
import_call_bin_prefix: '' # path to "psql"
import_call_file_prefix: /path/to/files/
labels_order: Ascending # The default, from most specific to most general in the ontology.
node_labels_order: None # The default.
edge_labels_order: None
Offline mode
Running BioCypher
After running BioCypher with the offline parameter set to true and the
dbms set to postgres, the output folder contains:
-
entity-create_table.sql: The SQL scripts to create the tables for the nodes/edges. Entity is replaced by your nodes and edges and for each node and edge type an own SQL script is generated. -
entity-part000.csv: The CSV file containing the data for the entity. -
postgres-import-call.sh: The import script to create a database with the SQL scripts and insert the data from the CSV files.
Note
If the postgres-import-call.sh is missing, you can create it by running
bc.write_import_call().
Create the PostgreSQL database
A running PostgreSQL instance (e.g. in a Docker container created with the
command from above) is required to create the database and import the data.
By running the import script postgres-import-call.sh, the content should be
written to the PostgreSQL database.
Access the PostgreSQL database
To check the content of the database, you can use the psql command line
tool.
First connect to the running PostgreSQL database instance:
Then you can execute SQL queries. For example, you can list all tables in the database by running the following command in the terminal:
Note on labels order
PostgreSQL does not support managing the hierarchy of types of the vocabulary given by the input ontology. However, it is possible to keep all the type labels of all the ancestors in the types hierarchy, and list them in a column, as a single string.
By default, the PostgreSQL driver exports the type labels as a sorted list
(Ascending), which means the labels are ordered from most specific to most
general in the ontology. You can set labels_order to either Ascending,
Descending, or Alphabetical.
To get even simpler labels, you can set labels_order: Leaves, which
will remove all but the most specific type label (the "leaf" of the types
tree). Be warned that the resulting export will completely lose the ontological
information, hence making it impossible to query the graph on high-level types.
Advanced labels ordering
The labels_order option impacts both nodes and edges. To get an order that
is different for nodes and edges, you can use the node_labels_order and
edge_labels_order parameters.
By default, those are set to "None", and labels_order is used for both nodes
and edges. If you set either node_labels_order or edge_labels_order,
they will take precedence.