PostgreSQL

When setting the dbms parameter in the biocypher_config.yaml to postgresql, the BioCypher Knowledge Graph is written to a PostgreSQL database. PostgreSQL is a relational database management system.

Install PostgreSQL

To get a PostgreSQL instance running quickly, you can use Docker. The following command will start a PostgreSQL instance with the password postgres and the port 5432 exposed to the host system.

docker run --restart always \
           --publish=5432:5432 \
           --env POSTGRES_PASSWORD=postgres \
           -d postgres

The postgresql-client (also known as psql command line tool) can be used to connect and interact with the running PostgreSQL database instance. Installation instructions can be found here.

PostgreSQL settings

To overwrite the standard settings of PostgreSQL, add a postgresql section to the biocypher_config.yaml file. The following settings are possible:

biocypher_config.yaml

postgresql:  ### PostgreSQL configuration ###

  # PostgreSQL connection credentials
  database_name: postgres
  user: postgres
  password: postgres
  port: 5432

  # PostgreSQL import batch writer settings
  quote_character: '"'
  delimiter: '\t'

  # Import call prefixes to adjust the autogenerated shell script
  import_call_bin_prefix: ''  # path to "psql"
  import_call_file_prefix: /path/to/files/

  labels_order: Ascending  # The default, from most specific to most general in the ontology.
  node_labels_order: None  # The default.
  edge_labels_order: None

Offline mode

Running BioCypher

After running BioCypher with the offline parameter set to true and the dbms set to postgres, the output folder contains:

entity-create_table.sql: The SQL scripts to create the tables for the nodes/edges. Entity is replaced by your nodes and edges and for each node and edge type an own SQL script is generated.
entity-part000.csv: The CSV file containing the data for the entity.
postgres-import-call.sh: The import script to create a database with the SQL scripts and insert the data from the CSV files.

Note

If the postgres-import-call.sh is missing, you can create it by running bc.write_import_call().

Create the PostgreSQL database

A running PostgreSQL instance (e.g. in a Docker container created with the command from above) is required to create the database and import the data. By running the import script postgres-import-call.sh, the content should be written to the PostgreSQL database.

Access the PostgreSQL database

To check the content of the database, you can use the psql command line tool.

First connect to the running PostgreSQL database instance:

psql -h localhost -p 5432 -U postgres

Then you can execute SQL queries. For example, you can list all tables in the database by running the following command in the terminal:

SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public';

Note on labels order

PostgreSQL does not support managing the hierarchy of types of the vocabulary given by the input ontology. However, it is possible to keep all the type labels of all the ancestors in the types hierarchy, and list them in a column, as a single string.

By default, the PostgreSQL driver exports the type labels as a sorted list (Ascending), which means the labels are ordered from most specific to most general in the ontology. You can set labels_order to either Ascending, Descending, or Alphabetical.

To get even simpler labels, you can set labels_order: Leaves, which will remove all but the most specific type label (the "leaf" of the types tree). Be warned that the resulting export will completely lose the ontological information, hence making it impossible to query the graph on high-level types.

Advanced labels ordering

The labels_order option impacts both nodes and edges. To get an order that is different for nodes and edges, you can use the node_labels_order and edge_labels_order parameters.

By default, those are set to "None", and labels_order is used for both nodes and edges. If you set either node_labels_order or edge_labels_order, they will take precedence.