BioCypher Configuration Reference
BioCypher comes with a default set of configuration parameters. You can
override them by creating a `biocypher_config.yaml` file in the root
directory or the `config` directory of your project. You only need to
specify the parameters you wish to change from their defaults. If you want
to create global user settings, you can create a `biocypher_config.yaml` in
your default BioCypher user directory (as found using
`appdirs.user_config_dir('biocypher')`). For instance, on macOS, this would
be `~/Library/Caches/biocypher/biocypher_config.yaml`. Finally, you can also
point an instance of the `BioCypher` class to any YAML file using the
`biocypher_config_path` parameter.
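
The project-level lookup described above can be sketched as follows. This is a minimal illustration using only the standard library, not BioCypher's own loader: the helper name `find_project_config` is hypothetical, and checking the root directory before the `config` directory is an assumption, since the text does not say which of the two takes precedence.

```python
from pathlib import Path
from typing import Optional

CONFIG_FILENAME = "biocypher_config.yaml"


def find_project_config(project_root: Path) -> Optional[Path]:
    """Return the first project-level config file that exists, or None.

    Hypothetical helper: looks in the project root, then in its config/
    directory (this order is an assumption, not documented behavior).
    """
    candidates = [
        project_root / CONFIG_FILENAME,
        project_root / "config" / CONFIG_FILENAME,
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_project_config(Path.cwd())` from the project root returns the path of the override file if one exists, and `None` when only the defaults apply.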
Note
It is important to follow the rules of indentation in the YAML file.
BioCypher module configuration is found under the top-level keyword
`biocypher`, while the settings for DBMSs (e.g., Neo4j) are found under
their respective keywords (e.g., `neo4j`).
Quote characters
If possible, avoid using quote characters in your YAML files. If you
need to quote, for instance a tab delimiter (`\t`), use single quotes
(`'`), since double quotes (`"`) allow parsing of escape characters in
YAML, which can cause issues downstream. It is safe to use double quotes
to quote a single quote character (`"'"`).
Configuration files are read in the order default -> user level -> project level,
with later files overriding earlier ones.
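
The layering rule can be illustrated with plain dictionaries. This is a sketch under stated assumptions, not BioCypher's actual loading code: `merge_config` is a hypothetical helper, the recursive merge of nested keys is an assumption (the text only states that later files override earlier ones), and the user-level and project-level values are made up for the example.

```python
from copy import deepcopy


def merge_config(base: dict, override: dict) -> dict:
    """Return a copy of base with the values from override applied on top."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key (assumed behavior).
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Excerpt of the defaults, plus made-up user and project overrides.
default = {"biocypher": {"dbms": "neo4j", "offline": True, "debug": True}}
user_level = {"biocypher": {"debug": False}}
project_level = {"biocypher": {"dbms": "postgresql"}}

config = default
for layer in (user_level, project_level):  # later layers win
    config = merge_config(config, layer)

print(config["biocypher"])
# prints: {'dbms': 'postgresql', 'offline': True, 'debug': False}
```

Each layer lists only the keys it changes, which mirrors the advice to specify only the parameters that differ from the defaults.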
Configuration Structure
The configuration file is structured into several sections:
1. BioCypher Core Settings (`biocypher:`) - Core settings for BioCypher functionality
   - choose `dbms` to select one of either the available DBMSs (2.) or data models (3.)
2. Database Management Systems - Settings specific to each supported DBMS:
   - Neo4j (`neo4j:`)
   - PostgreSQL (`postgresql:`)
   - SQLite (`sqlite:`)
3. Data Models - Settings for different data models:
   - RDF (`rdf:`)
   - NetworkX (`networkx:`)
   - CSV (`csv:`)
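
The relationship between `dbms` and the other top-level sections can be sketched as a simple lookup. This is only an illustration of the structure listed above: `active_output_settings` is a hypothetical helper, and how BioCypher itself dispatches on `dbms` is not described here.

```python
def active_output_settings(config: dict) -> dict:
    """Return the settings section named by biocypher.dbms."""
    dbms = config["biocypher"]["dbms"]
    if dbms == "biocypher" or dbms not in config:
        raise KeyError(f"No settings section found for dbms '{dbms}'")
    return config[dbms]


# Small excerpt of a parsed configuration (values taken from the defaults).
config = {
    "biocypher": {"dbms": "csv"},
    "neo4j": {"delimiter": ";"},
    "csv": {"delimiter": ","},
}

print(active_output_settings(config))
# prints: {'delimiter': ','}
```

With `dbms: csv`, only the `csv:` section is consulted; the `neo4j:` section may stay in the file without taking effect in this sketch.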
Default Configuration
Below is the default configuration that comes with BioCypher. This represents all available options with their default values. Some options (like tail ontologies) are commented out in the default configuration as they are optional and specific to certain use cases.
biocypher:
  #---- REQUIRED PARAMETERS
  dbms: neo4j
  schema_config_path: config/schema_config.yaml
  offline: true
  strict_mode: false
  head_ontology:
    url: https://github.com/biolink/biolink-model/raw/v3.2.1/biolink-model.owl.ttl
    root_node: entity
    switch_label_and_id: true

  #---- OPTIONAL PARAMETERS
  log_to_disk: true
  debug: true
  log_directory: biocypher-log
  output_directory: biocypher-out
  cache_directory: .cache

  #---- OPTIONAL TAIL ONTOLOGIES
  # tail_ontologies:
  #   so:
  #     url: test/ontologies/so.owl
  #     head_join_node: sequence variant
  #     tail_join_node: sequence_variant
  #     switch_label_and_id: true
  #   mondo:
  #     url: test/ontologies/mondo.owl
  #     head_join_node: disease
  #     tail_join_node: disease
  #     switch_label_and_id: true

#-------------------------------------------------------------------
#-----------------       OUTPUT Configuration      -----------------
#-------------------------------------------------------------------

#---- NEO4J database management system
neo4j:
  database_name: neo4j
  wipe: true
  uri: neo4j://localhost:7687
  user: neo4j
  password: neo4j
  delimiter: ";"
  array_delimiter: "|"
  quote_character: "'"
  multi_db: true
  skip_duplicate_nodes: false
  skip_bad_relationships: false
  # import_call_bin_prefix: bin/
  # import_call_file_prefix: path/to/files/

#---- PostgreSQL database management system
postgresql:
  database_name: postgres
  host: localhost # host
  port: 5432 # port
  user: postgres
  password: postgres # password
  quote_character: '"'
  delimiter: '\t'
  # import_call_bin_prefix: '' # path to "psql"
  # import_call_file_prefix: '/path/to/files'

#---- SQLite database management system
sqlite:
  ### SQLite configuration ###
  # SQLite connection credentials
  database_name: sqlite.db # DB name
  # SQLite import batch writer settings
  quote_character: '"'
  delimiter: '\t'
  # import_call_bin_prefix: '' # path to "sqlite3"
  # import_call_file_prefix: '/path/to/files'

#---- RDF (Resource Description Framework) data model
rdf:
  ### RDF configuration ###
  rdf_format: turtle

#---- NetworkX graph data model
networkx:
  ### NetworkX configuration ###
  some_config: some_value # placeholder for technical reasons TODO

#---- CSV (Comma-Separated Values) text file format
csv:
  ### CSV/Pandas configuration ###
  delimiter: ","
Configuration Parameters Reference
BioCypher Core Parameters
Parameter | Description | Type | Default |
---|---|---|---|
`dbms` | Specifies which database management system to use | string | "neo4j" |
`schema_config_path` | Path to the schema configuration file | string | "config/schema_config.yaml" |
`offline` | Whether to run in offline mode (no running DBMS or in-memory object) | boolean | true |
`strict_mode` | Whether to enforce strict schema validation | boolean | false |
`head_ontology.url` | URL or file path to the main ontology file | string | Biolink model URL |
`head_ontology.root_node` | The root node of the ontology to use | string | "entity" |
`head_ontology.switch_label_and_id` | Whether to switch label and ID in the ontology | boolean | true |
`log_to_disk` | Whether to save logs to disk | boolean | true |
`debug` | Whether to enable debug logging | boolean | true |
`log_directory` | Directory for log files | string | "biocypher-log" |
`output_directory` | Directory for output files | string | "biocypher-out" |
`cache_directory` | Directory for cache files | string | ".cache" |
`tail_ontologies` | Additional ontologies to use (optional) | object | - |
Neo4j Configuration
Parameter | Description | Type | Default |
---|---|---|---|
`database_name` | Name of the Neo4j database | string | "neo4j" |
`wipe` | Whether to wipe the database before import | boolean | true |
`uri` | Connection URI for Neo4j | string | "neo4j://localhost:7687" |
`user` | Username for Neo4j authentication | string | "neo4j" |
`password` | Password for Neo4j authentication | string | "neo4j" |
`delimiter` | Field delimiter for CSV import files | string | ";" |
`array_delimiter` | Delimiter for array values | string | "\|" |
`quote_character` | Character used for quoting string values | string | "'" |
`multi_db` | Whether to use multi-database support | boolean | true |
`skip_duplicate_nodes` | Whether to skip duplicate nodes during import | boolean | false |
`skip_bad_relationships` | Whether to skip relationships with missing endpoints | boolean | false |
`import_call_bin_prefix` | Prefix for the import command binary (optional) | string | - |
`import_call_file_prefix` | Prefix for import files (optional) | string | - |
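
To make the roles of `delimiter`, `array_delimiter`, and `quote_character` concrete, the following sketch serializes one record with Python's standard `csv` module using the Neo4j defaults. It is an illustration only and does not reproduce BioCypher's own writer: `format_row` is a hypothetical helper, the choice to quote every string field is an assumption, and the sample record is illustrative.

```python
import csv
import io

# Delimiter-related defaults from the Neo4j table above.
NEO4J_SETTINGS = {"delimiter": ";", "array_delimiter": "|", "quote_character": "'"}


def format_row(values, settings=NEO4J_SETTINGS):
    """Serialize one record; lists are flattened with the array delimiter."""
    cells = []
    for value in values:
        if isinstance(value, (list, tuple)):
            value = settings["array_delimiter"].join(str(item) for item in value)
        cells.append(value)
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=settings["delimiter"],
        quotechar=settings["quote_character"],
        quoting=csv.QUOTE_NONNUMERIC,  # quote strings, leave numbers bare
        lineterminator="\n",
    )
    writer.writerow(cells)
    return buffer.getvalue().rstrip("\n")


print(format_row(["P04637", "protein", ["TP53", "p53"], 393]))
# prints: 'P04637';'protein';'TP53|p53';393
```

Fields are separated by `;`, the list is joined with `|` inside a single field, and string values are wrapped in `'`.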
PostgreSQL Configuration
Parameter | Description | Type | Default |
---|---|---|---|
`database_name` | Name of the PostgreSQL database | string | "postgres" |
`host` | Host address for PostgreSQL server | string | "localhost" |
`port` | Port for PostgreSQL server | integer | 5432 |
`user` | Username for PostgreSQL authentication | string | "postgres" |
`password` | Password for PostgreSQL authentication | string | "postgres" |
`quote_character` | Character used for quoting identifiers | string | "\"" |
`delimiter` | Field delimiter for import files | string | "\t" |
`import_call_bin_prefix` | Path to psql (optional) | string | - |
`import_call_file_prefix` | Prefix for import files (optional) | string | - |
SQLite Configuration
Parameter | Description | Type | Default |
---|---|---|---|
`database_name` | Name of the SQLite database file | string | "sqlite.db" |
`quote_character` | Character used for quoting identifiers | string | "\"" |
`delimiter` | Field delimiter for import files | string | "\t" |
`import_call_bin_prefix` | Path to sqlite3 (optional) | string | - |
`import_call_file_prefix` | Prefix for import files (optional) | string | - |
RDF Configuration
Parameter | Description | Type | Default |
---|---|---|---|
`rdf_format` | Format for RDF output | string | "turtle" |
NetworkX Configuration
Parameter | Description | Type | Default |
---|---|---|---|
`some_config` | Placeholder configuration | string | "some_value" |
CSV Configuration
Parameter | Description | Type | Default |
---|---|---|---|
`delimiter` | Field delimiter for CSV files | string | "," |