API Reference

The main BioCypher interface

Create a BioCypher instance by running:

from biocypher import BioCypher
bc = BioCypher()

Most settings should be configured in YAML files. See below for more information on the BioCypher class.

BioCypher([dbms, offline, strict_mode, ...])

Orchestration of BioCypher operations.

Database creation by file

Using the BioCypher instance, you can create a database from files written by the BioCypher.write_nodes() and BioCypher.write_edges() methods. Both accept collections of nodes and edges, either as tuples or as BioCypherNode and BioCypherEdge objects. For example:

# given lists of nodes and edges
bc.write_nodes(node_list)
bc.write_edges(edge_list)
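
As a minimal sketch of what node_list and edge_list might contain: to my understanding of the BioCypher tutorial, nodes are 3-tuples of (id, label, properties) and edges are 5-tuples of (id, source id, target id, label, properties). The identifiers, labels, and property values below are invented for illustration, and check_shapes is a hypothetical helper that is not part of BioCypher. The labels would need to match entries in your own schema configuration.

```python
# Hypothetical example data; IDs, labels, and properties are invented.
node_list = [
    ("P04637", "protein", {"name": "TP53"}),
    ("P38398", "protein", {"name": "BRCA1"}),
]
edge_list = [
    # The edge ID is given as None here, assuming it is optional.
    (None, "P04637", "P38398", "protein_protein_interaction", {"source": "example"}),
]


def check_shapes(nodes, edges):
    """Hypothetical helper: verify the assumed tuple lengths before writing."""
    return all(len(n) == 3 for n in nodes) and all(len(e) == 5 for e in edges)


shapes_ok = check_shapes(node_list, edge_list)
```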

Note

To facilitate the interaction with the various database management systems (DBMSs), BioCypher provides utility functions, such as writing a Neo4j admin import statement to be used for creating a Neo4j database (BioCypher.write_import_call()). The most commonly used utility functions are also available in the wrapper function BioCypher.summary(). See the BioCypher class for more information.

Details about the biocypher.write module responsible for these methods can be found below.

_write.get_writer(dbms, translator, ...)

Function to return the writer class based on the selection in the config file.

_batch_writer._BatchWriter(translator, ...)

graph._neo4j._Neo4jBatchWriter(*args, **kwargs)

Class for writing node and edge representations to disk using the format specified by Neo4j for the use of admin import.

graph._arangodb._ArangoDBBatchWriter(*args, ...)

Class for writing node and edge representations to disk using the format specified by ArangoDB for the use of "arangoimport".

relational._postgresql._PostgreSQLBatchWriter(...)

Class for writing node and edge representations to disk using the format specified by PostgreSQL for the use of "COPY FROM...".

relational._sqlite._SQLiteBatchWriter(*args, ...)

Class for writing node and edge representations to a SQLite database.

In-memory Pandas knowledge graph

BioCypher provides a wrapper around the pandas.DataFrame class to facilitate the creation of a knowledge graph in memory. This is useful for testing, small datasets, and for workflows that should remain purely in Python. Example usage:

from biocypher import BioCypher
bc = BioCypher()
# given lists of nodes and edges
bc.add(node_list)
bc.add(edge_list)
# show list of dataframes (one per node/edge type)
dfs = bc.to_df()
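
The idea of "one dataframe per node/edge type" can be illustrated without BioCypher or pandas. The sketch below is a conceptual stand-in, not the library's implementation: group_by_label is a hypothetical function, the (id, label, properties) tuple layout is an assumption, and the example nodes are invented.

```python
from collections import defaultdict


def group_by_label(nodes):
    """Group (id, label, properties) tuples into one list of rows per label."""
    tables = defaultdict(list)
    for node_id, label, props in nodes:
        # Each row combines the node ID with its properties.
        tables[label].append({"node_id": node_id, **props})
    return dict(tables)


nodes = [
    ("P04637", "protein", {"name": "TP53"}),
    ("P38398", "protein", {"name": "BRCA1"}),
    ("DOID:1612", "disease", {"name": "breast cancer"}),
]
tables = group_by_label(nodes)
```

Each list of row dictionaries could then be passed to pandas.DataFrame to obtain one table per type.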

Details about the biocypher._pandas module responsible for these methods can be found below.

Pandas(translator, deduplicator)

Database creation and manipulation by Driver

BioCypher also provides a driver for each of the supported DBMSs. The driver can create a database and write nodes and edges to it. It also supports finer-grained manipulation that is usually not needed when creating a database from scratch, as in the file-based workflow. This includes merging (creating entities only if they don’t exist) and deletion. For example:

from biocypher import BioCypher
bc = BioCypher()
# given lists of nodes and edges
bc.merge_nodes(node_set_1)
bc.merge_edges(edge_set_1)
bc.merge_nodes(node_set_2)
bc.merge_edges(edge_set_2)
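
The merge semantics described above (create an entity only if it does not exist yet) can be sketched with a plain dictionary standing in for the database. This is only an illustration of the concept: merge_nodes_sketch is a hypothetical function, the tuple layout is an assumption, and a real driver would delegate this to the DBMS.

```python
def merge_nodes_sketch(store, nodes):
    """Insert each (id, label, properties) node only if its ID is not yet stored.

    Returns the number of nodes that were newly created.
    """
    created = 0
    for node_id, label, props in nodes:
        if node_id not in store:
            store[node_id] = (label, props)
            created += 1
    return created


store = {}  # stand-in for the database
node_set_1 = [("a", "protein", {}), ("b", "protein", {})]
node_set_2 = [("b", "protein", {}), ("c", "protein", {})]

first = merge_nodes_sketch(store, node_set_1)   # both nodes are new
second = merge_nodes_sketch(store, node_set_2)  # "b" already exists, only "c" is created
```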

Details about the biocypher._connect module responsible for these methods can be found below.

get_driver(dbms, translator)

Function to return the driver class for the selected DBMS.

_Neo4jDriver(database_name, uri, user, ...)

Manages a BioCypher connection to a Neo4j database using the neo4j_utils.Driver class.

Download and cache functionality

BioCypher provides download and cache functionality for resources. Resources are defined via the Resource class, which has a name, a URL or list of URLs, and a lifetime (in days, set to 0 for infinite). The Downloader can handle single files, lists of files, compressed files, and directories (which need to be indicated using the is_dir parameter of the resource). It uses Pooch under the hood to handle the downloads. Example usage:

from biocypher import BioCypher, Resource
bc = BioCypher()

resource1 = Resource(
   name="file_list_resource",
   url_s=[
      "https://example.com/resource1.txt",
      "https://example.com/resource2.txt"
   ],
   lifetime=1
)
resource2 = Resource(
   name="zipped_resource",
   url_s="https://example.com/resource3.zip",
   lifetime=7
)
resource3 = Resource(
   name="directory_resource",
   url_s="https://example.com/resource4/",
   lifetime=7,
   is_dir=True,
)
resource_list = [resource1, resource2, resource3]
paths = bc.download(resource_list)

The files are stored in the cache directory, in subfolders named after the resources; Pooch may add further structure (e.g. when extracting archives). The download method returns the paths of all downloaded files. The Downloader class can also be used directly, without the BioCypher instance. You can set the cache directory in the configuration file; if it is not set, a temporary directory is used, taken from the name attribute of a tempfile.TemporaryDirectory. More details about the Resource and Downloader classes can be found below.
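
The lifetime rule (days, with 0 meaning infinite) and the temporary-directory fallback can be sketched with the standard library alone. This is an illustration under assumptions, not the Downloader's actual logic: is_expired is a hypothetical function, and the exact boundary handling in BioCypher may differ.

```python
import tempfile
from datetime import datetime, timedelta


def is_expired(downloaded_at, lifetime_days, now):
    """Return True if a cached file is older than its lifetime."""
    if lifetime_days == 0:  # 0 is treated as "never expires"
        return False
    return now - downloaded_at > timedelta(days=lifetime_days)


now = datetime(2024, 1, 10)
fresh = is_expired(datetime(2024, 1, 9, 12), lifetime_days=1, now=now)  # 12 hours old
stale = is_expired(datetime(2024, 1, 1), lifetime_days=7, now=now)      # 9 days old
forever = is_expired(datetime(2000, 1, 1), lifetime_days=0, now=now)    # infinite lifetime

# Fallback cache location: `name` is an attribute of TemporaryDirectory.
tmp = tempfile.TemporaryDirectory()
cache_dir = tmp.name
tmp.cleanup()  # remove the directory again in this sketch
```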

Resource(name, url_s[, lifetime, is_dir])

Downloader([cache_dir])

Ontology ingestion, parsing, and manipulation

Ontology(head_ontology[, ontology_mapping, ...])

A class that represents the ontological "backbone" of a BioCypher knowledge graph.

OntologyAdapter(ontology_file, root_label[, ...])

Class that represents an ontology to be used in the BioCypher framework.

Mapping of data inputs to KG ontology

OntologyMapping([config_file])

Class to store the ontology mapping and extensions.

Base classes for node and edge representations in BioCypher

BioCypherNode(node_id, node_label, ...)

Handoff class to represent biomedical entities as Neo4j nodes.

BioCypherEdge(source_id, target_id, ...)

Handoff class to represent biomedical relationships in Neo4j.

BioCypherRelAsNode(node, source_edge, ...)

Class to represent relationships as nodes (with in- and outgoing edges) as a triplet of a BioCypherNode and two BioCypherEdges.
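
The triplet structure can be sketched with plain tuples. This is a conceptual illustration only: relationship_as_node is a hypothetical function, the tuple layouts are assumptions, and the edge labels and directions below are placeholders rather than the ones BioCypher uses.

```python
def relationship_as_node(rel_id, rel_label, source_id, target_id, properties=None):
    """Turn one relationship into a node plus two connecting edges."""
    node = (rel_id, rel_label, properties or {})
    # Placeholder labels; one edge leads into the new node, one leads out of it.
    source_edge = (None, source_id, rel_id, "IS_SOURCE_OF", {})
    target_edge = (None, rel_id, target_id, "HAS_TARGET", {})
    return node, source_edge, target_edge


node, source_edge, target_edge = relationship_as_node(
    "int1", "interaction", "P04637", "P38398", {"method": "example"}
)
```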

Translation functionality for implemented types of representation

Translator(ontology[, strict_mode])

Class responsible for executing the translation process that is configured in the schema_config.yaml file.

Logging

get_logger([name])

Access the module logger; create a new one if it does not exist yet.
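
The "access or create" behaviour matches how the standard library's logging.getLogger works: repeated calls with the same name return the same logger object. The sketch below shows that pattern; get_logger_sketch and its default name are assumptions, and BioCypher's own function may configure handlers and log files in addition.

```python
import logging


def get_logger_sketch(name="biocypher"):
    """Return the logger registered under `name`, creating it on first use."""
    return logging.getLogger(name)


first_logger = get_logger_sketch()
second_logger = get_logger_sketch()  # same object as first_logger
```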

Miscellaneous utility functions

to_list(value)

Ensures that value is a list.

ensure_iterable(value)

Returns iterables unchanged, except strings; wraps strings and other simple types in a tuple.
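
A minimal sketch of the behaviour described for to_list and ensure_iterable, written from the one-line descriptions above. The functions carry a _sketch suffix because they are hypothetical reimplementations; edge cases such as bytes or dictionaries may be handled differently in BioCypher.

```python
from collections.abc import Iterable


def ensure_iterable_sketch(value):
    """Return non-string iterables unchanged; wrap anything else in a tuple."""
    if isinstance(value, Iterable) and not isinstance(value, str):
        return value
    return (value,)


def to_list_sketch(value):
    """Return `value` as a list, wrapping single values and strings."""
    if isinstance(value, list):
        return value
    return list(ensure_iterable_sketch(value))


wrapped_string = ensure_iterable_sketch("abc")
wrapped_number = ensure_iterable_sketch(5)
unchanged_list = ensure_iterable_sketch([1, 2])
listed_tuple = to_list_sketch((1, 2))
listed_string = to_list_sketch("abc")
```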

create_tree_visualisation(inheritance_graph)

Creates a visualisation of the inheritance tree using treelib.

from_pascal(s[, sep])

pascalcase_to_sentencecase(s)

Convert PascalCase to sentence case.

snakecase_to_sentencecase(s)

Convert snake_case to sentence case.

sentencecase_to_snakecase(s)

Convert sentence case to snake_case.

sentencecase_to_pascalcase(s[, sep])

Convert sentence case to PascalCase.
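
The case conversions can be sketched as follows. These are simplified, hypothetical reimplementations (hence the _sketch suffix), not BioCypher's code; in particular, this version does not treat acronyms such as "DNA" specially, which the library may do.

```python
import re


def pascalcase_to_sentencecase_sketch(s):
    """Insert a space before each inner capital letter, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", " ", s).lower()


def snakecase_to_sentencecase_sketch(s):
    """Replace underscores with spaces."""
    return s.replace("_", " ")


def sentencecase_to_snakecase_sketch(s):
    """Replace spaces with underscores."""
    return s.replace(" ", "_")


def sentencecase_to_pascalcase_sketch(s, sep=" "):
    """Capitalise the first letter of each word and join the words."""
    return "".join(word[:1].upper() + word[1:] for word in s.split(sep))


from_pascal = pascalcase_to_sentencecase_sketch("ProteinInteraction")
from_snake = snakecase_to_sentencecase_sketch("protein_interaction")
to_snake = sentencecase_to_snakecase_sketch("protein interaction")
to_pascal = sentencecase_to_pascalcase_sketch("protein interaction")
```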