Submodule documentation

_core.py: The main BioCypher interface

BioCypher core module. Interfaces with the user and distributes tasks to submodules.

class biocypher._core.BioCypher(dbms: Optional[str] = None, offline: Optional[bool] = None, strict_mode: Optional[bool] = None, biocypher_config_path: Optional[str] = None, schema_config_path: Optional[str] = None, head_ontology: Optional[dict] = None, tail_ontologies: Optional[dict] = None, output_directory: Optional[str] = None, db_name: Optional[str] = None)

Orchestration of BioCypher operations. Instantiate this class to interact with BioCypher.

Parameters:
  • dbms (str) – The database management system to use. For supported systems see SUPPORTED_DBMS.

  • offline (bool) – Whether to run in offline mode. If True, no connection to the database will be made.

  • strict_mode (bool) – Whether to run in strict mode. If True, the translator will raise an error if a node or edge does not provide source, version, and licence information.

  • biocypher_config_path (str) – Path to the BioCypher config file.

  • schema_config_path (str) – Path to the user schema config file.

  • head_ontology (dict) – The head ontology defined by URL (‘url’) and root node (‘root_node’).

  • tail_ontologies (dict) – The tail ontologies defined by URL and join nodes for both head and tail ontology.

  • output_directory (str) – Path to the output directory. If not provided, the default value ‘biocypher-out’ will be used.
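
As a rough illustration of what strict mode asks for, the sketch below checks a property dict for the three provenance fields named above. The helper missing_provenance and the exact key spellings (source, version, licence) are assumptions made for this example; they are not taken from the BioCypher code.

```python
# Hypothetical helper, not part of BioCypher: reports which provenance
# fields a property dict lacks. The key names are assumed spellings.
REQUIRED_PROVENANCE = ("source", "version", "licence")


def missing_provenance(properties: dict) -> list:
    """Return the required provenance keys absent from `properties`."""
    return [key for key in REQUIRED_PROVENANCE if key not in properties]


# Placeholder property dicts for illustration only.
complete = {"name": "example", "source": "example-db", "version": "1.0", "licence": "MIT"}
incomplete = {"name": "example", "source": "example-db"}

print(missing_provenance(complete))    # []
print(missing_provenance(incomplete))  # ['version', 'licence']
```

A strict translator would presumably raise an error whenever such a list is non-empty; how BioCypher reports the problem is not specified here.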

add(entities)

Function to add entities to the in-memory database. Accepts an iterable of tuples (if given, translates to BioCypherNode or BioCypherEdge objects) or an iterable of BioCypherNode or BioCypherEdge objects.

log_duplicates() None

Get the set of duplicate nodes and edges encountered and print them to the logger.

log_missing_input_labels() Optional[Dict[str, List[str]]]

Get the set of input labels encountered without an entry in the schema_config.yaml and print them to the logger.

Returns:

A dictionary of Biolink types encountered without an entry in the schema_config.yaml file.

Return type:

Optional[Dict[str, List[str]]]

merge_edges(edges) bool

Merge edges into database. Either takes an iterable of tuples (if given, translates to BioCypherEdge objects) or an iterable of BioCypherEdge objects.

Parameters:

edges (iterable) – An iterable of edges to merge into the database.

Returns:

True if successful.

Return type:

bool

merge_nodes(nodes) bool

Merge nodes into database. Either takes an iterable of tuples (if given, translates to BioCypherNode objects) or an iterable of BioCypherNode objects.

Parameters:

nodes (iterable) – An iterable of nodes to merge into the database.

Returns:

True if successful.

Return type:

bool

reverse_translate_query(query: str) str

Reverse translate a query from its BioCypher equivalent.

Parameters:

query (str) – The BioCypher query to reverse translate.

Returns:

The original query.

Return type:

str

reverse_translate_term(term: str) str

Reverse translate a term from its BioCypher equivalent.

Parameters:

term (str) – The BioCypher term to reverse translate.

Returns:

The original term.

Return type:

str

show_ontology_structure(**kwargs) None

Show the ontology structure using treelib or write to GRAPHML file.

Parameters:
  • to_disk (str) – If specified, the ontology structure will be saved to disk as a GRAPHML file, to be opened in your favourite graph visualisation tool.

  • full (bool) – If True, the full ontology structure will be shown, including all nodes and edges. If False, only the nodes and edges that are relevant to the extended schema will be shown.

summary() None

Wrapper for showing ontology structure and logging duplicates and missing input types.

to_df() List[DataFrame]

Convert entities to a pandas DataFrame for each entity type and return a list.

Returns:

A list of pandas DataFrames, one per entity type.

Return type:

List[pd.DataFrame]

translate_query(query: str) str

Translate a query to its BioCypher equivalent.

Parameters:

query (str) – The query to translate.

Returns:

The BioCypher equivalent of the query.

Return type:

str

translate_term(term: str) str

Translate a term to its BioCypher equivalent.

Parameters:

term (str) – The term to translate.

Returns:

The BioCypher equivalent of the term.

Return type:

str

write_edges(edges, batch_size: int = 1000000) bool

Write edges to database. Either takes an iterable of tuples (if given, translates to BioCypherEdge objects) or an iterable of BioCypherEdge objects.

Parameters:

edges (iterable) – An iterable of edges to write to the database.

Returns:

True if successful.

Return type:

bool

write_import_call() None

Write a shell script to import the database depending on the chosen DBMS.

write_nodes(nodes, batch_size: int = 1000000) bool

Write nodes to database. Either takes an iterable of tuples (if given, translates to BioCypherNode objects) or an iterable of BioCypherNode objects.

Parameters:

nodes (iterable) – An iterable of nodes to write to the database.

Returns:

True if successful.

Return type:

bool
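
The write methods above can be chained into a small offline export routine. The sketch below is one possible arrangement: the wrapper export_offline is hypothetical, and the call order and the decision to skip the import script after a failed write are choices made for this example, not behaviour prescribed by BioCypher. The instance is passed in, so the block itself does not assume any particular configuration.

```python
def export_offline(bc, nodes, edges) -> bool:
    """Write nodes and edges, then the import script, via documented methods.

    `bc` is expected to be a BioCypher instance created elsewhere, for
    example with BioCypher(offline=True). `nodes` and `edges` are iterables
    of tuples or of BioCypherNode / BioCypherEdge objects.
    """
    nodes_ok = bc.write_nodes(nodes)
    edges_ok = bc.write_edges(edges)
    if nodes_ok and edges_ok:
        # Only generate the import script when both writes succeeded.
        bc.write_import_call()
    # Show the ontology structure and log duplicates and missing input types.
    bc.summary()
    return bool(nodes_ok and edges_ok)
```

With BioCypher installed and configured, a call might look like export_offline(BioCypher(offline=True), nodes, edges).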

_write.py: Write the Graph to various formats for batch import

BioCypher ‘offline’ module. Handles the writing of node and edge representations suitable for import into a DBMS.

biocypher._write.get_writer(dbms: str, translator: Translator, ontology: Ontology, deduplicator: Deduplicator, output_directory: str, strict_mode: bool)

Return an instance of the writer class selected in the config file.

Parameters:
  • dbms – the database management system; for options, see DBMS_TO_CLASS.

  • translator – the Translator object.

  • ontology – the Ontology object.

  • deduplicator – the Deduplicator object.

  • output_directory – the directory to write the output files to.

  • strict_mode – whether to use strict mode.

Returns:

an instance of the selected writer class.

Return type:

instance
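
The reference to DBMS_TO_CLASS suggests a lookup from DBMS name to writer class. A minimal sketch of that dispatch pattern follows; the two writer classes, the mapping contents, the DBMS names, and the function name pick_writer are placeholders invented for illustration and do not reflect BioCypher's actual writers.

```python
class CsvWriterSketch:
    """Placeholder writer; stands in for a real batch writer class."""

    def __init__(self, output_directory: str, strict_mode: bool = False):
        self.output_directory = output_directory
        self.strict_mode = strict_mode


class SqlWriterSketch(CsvWriterSketch):
    """Second placeholder writer, to give the lookup two entries."""


# Hypothetical mapping; the real DBMS_TO_CLASS contents are not shown here.
DBMS_TO_CLASS_SKETCH = {
    "csv-dbms": CsvWriterSketch,
    "sql-dbms": SqlWriterSketch,
}


def pick_writer(dbms: str, output_directory: str, strict_mode: bool = False):
    """Look up the writer class for `dbms` and return an instance of it."""
    try:
        writer_class = DBMS_TO_CLASS_SKETCH[dbms]
    except KeyError:
        raise ValueError(f"Unsupported DBMS: {dbms!r}") from None
    return writer_class(output_directory=output_directory, strict_mode=strict_mode)


writer = pick_writer("csv-dbms", output_directory="biocypher-out")
print(type(writer).__name__)  # CsvWriterSketch
```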

_connect.py: On-line functionality for interaction with a DBMS

BioCypher ‘online’ mode. Handles connection and manipulation of a running DBMS.

_mapping.py: Mapping of data inputs to KG ontology

BioCypher ‘mapping’ module. Handles the mapping of user-defined schema to the underlying ontology.

class biocypher._mapping.OntologyMapping(config_file: Optional[str] = None)

Class to store the ontology mapping and extensions.

_ontology.py: Ontology ingestion, parsing, and manipulation

BioCypher ‘ontology’ module. Contains classes and functions to handle parsing and representation of single ontologies as well as their hybridisation and other advanced operations.

class biocypher._ontology.Ontology(head_ontology: dict, ontology_mapping: OntologyMapping, tail_ontologies: Optional[dict] = None)

A class that represents the ontological “backbone” of a BioCypher knowledge graph. The ontology can be built from a single resource, or hybridised from a combination of resources, with one resource being the “head” ontology, while an arbitrary number of other resources can become “tail” ontologies at arbitrary fusion points inside the “head” ontology.

get_ancestors(node_label: str) list

Get the ancestors of a node in the ontology.

Parameters:

node_label (str) – The label of the node in the ontology.

Returns:

A list of the ancestors of the node.

Return type:

list

get_dict() dict

Returns a dictionary in the format of a BioCypher node, for compatibility with the Neo4j driver.

show_ontology_structure(to_disk: Optional[str] = None, full: bool = False)

Show the ontology structure using treelib or write to GRAPHML file.

Parameters:
  • to_disk (str) – If specified, the ontology structure will be saved to disk as a GRAPHML file, to be opened in your favourite graph visualisation tool.

  • full (bool) – If True, the full ontology structure will be shown, including all nodes and edges. If False, only the nodes and edges that are relevant to the extended schema will be shown.

class biocypher._ontology.OntologyAdapter(ontology_file: str, root_label: str, head_join_node: Optional[str] = None, merge_nodes: Optional[bool] = True, reverse_labels: bool = True, remove_prefixes: bool = True)

Class that represents an ontology to be used in the BioCypher framework. Can read from a variety of formats, including OWL, OBO, and RDF/XML. The ontology is represented by a networkx.DiGraph object; an RDFlib graph is also kept. By default, the DiGraph reverses the label and identifier of the nodes, such that the node name in the graph is the human-readable label. The edges are oriented from child to parent. Following the Biolink example, labels are formatted in lower sentence case. In some cases, this means that underscores are replaced with spaces.

get_ancestors(node_label)

Get the ancestors of a node in the ontology.
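
Because edges point from child to parent, finding ancestors amounts to following edges upward from the node. The sketch below shows the idea on a toy ontology held in a plain dict. It assumes a single parent per node, whereas a real ontology graph may have several, and it excludes the starting node from the result; whether the actual method includes it is not stated here. The labels are made up for the example.

```python
# Toy ontology stored as child -> parent edges, mirroring the orientation
# described for OntologyAdapter. Single-parent simplification.
CHILD_TO_PARENT = {
    "protein": "polypeptide",
    "polypeptide": "biological entity",
    "biological entity": "named thing",
    "named thing": "entity",
}


def ancestors(node_label: str) -> list:
    """Follow child -> parent edges up to the root; nearest ancestor first."""
    result = []
    current = node_label
    while current in CHILD_TO_PARENT:
        current = CHILD_TO_PARENT[current]
        result.append(current)
    return result


print(ancestors("protein"))
# ['polypeptide', 'biological entity', 'named thing', 'entity']
```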

get_head_join_node()

Get the head join node of the ontology.

get_nx_graph()

Get the networkx graph representing the ontology.

get_rdf_graph()

Get the RDFlib graph representing the ontology.

get_root_label()

Get the label of the root node in the ontology.

_create.py: Base classes for node and edge representations in BioCypher

BioCypher ‘create’ module. Handles the creation of BioCypher node and edge dataclasses.

class biocypher._create.BioCypherEdge(source_id: str, target_id: str, relationship_label: str, relationship_id: ~typing.Optional[str] = None, properties: dict = <factory>)

Handoff class to represent biomedical relationships in Neo4j.

Has source and target ids, label, and property dict. The ids and the label (in the Neo4j sense of a label, i.e., the entity descriptor after the colon, such as “:TARGETS”) are non-optional and called source_id, target_id, and relationship_label to avoid confusion with properties called “label”, which usually denote the human-readable form. Relationship labels are written in UPPERCASE and as verbs, as per Neo4j consensus.

Parameters:
  • source_id (string) – consensus “best” id for biological entity

  • target_id (string) – consensus “best” id for biological entity

  • relationship_label (string) – type of interaction, UPPERCASE

  • properties (dict) – collection of all other properties of the respective edge

get_dict() dict

Return dict of ids, label, and properties.

Returns:

source_id, target_id and relationship_label as top-level key-value pairs, properties as second-level dict.

Return type:

dict

get_id() Optional[str]

Returns the relationship identifier, or None if none was set.

Returns:

relationship_id

Return type:

Optional[str]

get_label() str

Returns relationship label.

Returns:

relationship_label

Return type:

str

get_properties() dict

Returns all other relationship properties apart from primary ids and label as key-value pairs.

Returns:

properties

Return type:

dict

get_source_id() str

Returns primary node identifier of relationship source.

Returns:

source_id

Return type:

str

get_target_id() str

Returns primary node identifier of relationship target.

Returns:

target_id

Return type:

str

get_type() str

Returns relationship label.

Returns:

relationship_label

Return type:

str

class biocypher._create.BioCypherNode(node_id: str, node_label: str, preferred_id: str = 'id', properties: dict = <factory>)

Handoff class to represent biomedical entities as Neo4j nodes.

Has id, label, and property dict. The id and the label (in the Neo4j sense of a label, i.e., the entity descriptor after the colon, such as “:Protein”) are non-optional and called node_id and node_label to avoid confusion with “label” properties. Node labels are written in PascalCase and as nouns, as per Neo4j consensus.

Parameters:
  • node_id (string) – consensus “best” id for biological entity

  • node_label (string) – primary type of entity, capitalised

  • properties (dict) – collection of all other properties to be passed to Neo4j for the respective node

get_dict() dict

Return dict of id, labels, and properties.

Returns:

node_id and node_label as top-level key-value pairs, properties as second-level dict.

Return type:

dict
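
The two-level layout described for get_dict can be pictured with a small stand-in dataclass. NodeSketch below is a simplified illustration rather than the BioCypher class itself, and the dictionary key names are assumed to match the attribute names.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSketch:
    """Simplified stand-in for the node class described above."""

    node_id: str
    node_label: str
    preferred_id: str = "id"
    properties: dict = field(default_factory=dict)

    def get_dict(self) -> dict:
        # id and label at the top level, all other properties nested one level down
        return {
            "node_id": self.node_id,
            "node_label": self.node_label,
            "properties": self.properties,
        }


node = NodeSketch("P12345", "Protein", properties={"name": "example protein"})
print(node.get_dict())
# {'node_id': 'P12345', 'node_label': 'Protein', 'properties': {'name': 'example protein'}}
```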

get_id() str

Returns primary node identifier.

Returns:

node_id

Return type:

str

get_label() str

Returns primary node label.

Returns:

node_label

Return type:

str

get_preferred_id() str

Returns preferred id.

Returns:

preferred_id

Return type:

str

get_properties() dict

Returns all other node properties apart from primary id and label as key-value pairs.

Returns:

properties

Return type:

dict

get_type() str

Returns primary node label.

Returns:

node_label

Return type:

str

class biocypher._create.BioCypherRelAsNode(node: BioCypherNode, source_edge: BioCypherEdge, target_edge: BioCypherEdge)

Class to represent relationships as nodes (with in- and outgoing edges) as a triplet of a BioCypherNode and two BioCypherEdges. Main usage in type checking (instances where the receiving function needs to check whether it receives a relationship as a single edge or as a triplet).

Parameters:
  • node (BioCypherNode) – node representing the relationship

  • source_edge (BioCypherEdge) – edge representing the source of the relationship

  • target_edge (BioCypherEdge) – edge representing the target of the relationship
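
A sketch of the triplet idea follows, together with the kind of type check a receiving function might perform. All class and function names are stand-ins, and the edge labels and edge directions are illustrative assumptions rather than values confirmed by this reference.

```python
from typing import NamedTuple


class RelEdge(NamedTuple):
    source_id: str
    target_id: str
    relationship_label: str


class RelNode(NamedTuple):
    node_id: str
    node_label: str


class RelAsNodeSketch(NamedTuple):
    node: RelNode
    source_edge: RelEdge
    target_edge: RelEdge


def reify(source_id: str, target_id: str, rel_id: str, rel_label: str) -> RelAsNodeSketch:
    """Turn one source -> target relationship into a node plus two edges."""
    node = RelNode(rel_id, rel_label)
    # Edge labels and directions below are assumptions made for this sketch.
    source_edge = RelEdge(source_id, rel_id, "IS_SOURCE_OF")
    target_edge = RelEdge(target_id, rel_id, "IS_TARGET_OF")
    return RelAsNodeSketch(node, source_edge, target_edge)


def describe(item) -> str:
    """Distinguish a relationship given as a triplet from a single edge."""
    return "triplet" if isinstance(item, RelAsNodeSketch) else "single edge"


triplet = reify("P1", "P2", "int1", "Interaction")
print(describe(triplet))                        # triplet
print(describe(RelEdge("P1", "P2", "BINDS")))   # single edge
```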

_translate.py: Translation functionality for implemented types of representation

BioCypher ‘translation’ module. Responsible for translating between the raw input data and the BioCypherNode and BioCypherEdge objects.

class biocypher._translate.Translator(ontology_mapping: OntologyMapping, strict_mode: bool = False)

Class responsible for executing the translation process that is configured in the schema_config.yaml file. Creates a mapping dictionary from that file and, given nodes and edges, translates them into BioCypherNodes and BioCypherEdges. During this process, it can also filter the properties of the entities if the schema_config.yaml file specifies a property whitelist or blacklist.

Provides utility functions for translating between input and output labels and cypher queries.

Returns a dictionary of types that were not represented in the schema_config.

static name_sentence_to_pascal(name: str) str

Converts a name in sentence case to pascal case.
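
A minimal sketch of such a conversion, assuming words are separated by whitespace and that only the first letter of each word changes; the function name sentence_to_pascal is a stand-in and the library's own implementation may differ in detail.

```python
def sentence_to_pascal(name: str) -> str:
    """Capitalise the first letter of each whitespace-separated word and join them."""
    return "".join(word[:1].upper() + word[1:] for word in name.split())


print(sentence_to_pascal("protein"))                      # Protein
print(sentence_to_pascal("gene to disease association"))  # GeneToDiseaseAssociation
```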

reverse_translate(query)

Reverse translate a cypher query. Only translates labels as of now.

reverse_translate_term(term)

Reverse translate a single term.

translate(query)

Translate a cypher query. Only translates labels as of now.

translate_edges(id_src_tar_type_prop_tuples: Iterable) Generator[Union[BioCypherEdge, BioCypherRelAsNode], None, None]

Translates input edge representation to a representation that conforms to the schema of the given BioCypher graph. For now requires explicit statement of edge type on pass.

Parameters:

id_src_tar_type_prop_tuples (list of tuples) – collection of tuples representing source and target of an interaction via their unique ids as well as the type of interaction in the original database notation, which is translated to BioCypher notation using the leaves. Can optionally possess its own ID.

translate_nodes(id_type_prop_tuples: Iterable) Generator[BioCypherNode, None, None]

Translates input node representation to a representation that conforms to the schema of the given BioCypher graph. For now requires explicit statement of node type on pass.

Parameters:

id_type_prop_tuples (list of tuples) – collection of tuples representing individual nodes by their unique id and a type that is translated from the original database notation to the corresponding BioCypher notation.
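
The parameter names of translate_nodes and translate_edges hint at the expected tuple layouts: (id, type, properties) for nodes and (id, source, target, type, properties) for edges, with the edge id being optional. The generators below build such tuples from plain dict records. The field order is inferred from the parameter names and should be treated as an assumption; the record keys and sample values are invented for the example.

```python
def node_tuples(records):
    """Yield (id, type, properties) tuples from simple dict records."""
    for record in records:
        yield (record["id"], record["type"], record.get("properties", {}))


def edge_tuples(records):
    """Yield (id, source, target, type, properties) tuples; id may be None."""
    for record in records:
        yield (
            record.get("id"),  # optional relationship id
            record["source"],
            record["target"],
            record["type"],
            record.get("properties", {}),
        )


raw_nodes = [{"id": "P1", "type": "protein", "properties": {"name": "example"}}]
raw_edges = [{"source": "P1", "target": "P2", "type": "interacts_with"}]

print(list(node_tuples(raw_nodes)))
# [('P1', 'protein', {'name': 'example'})]
print(list(edge_tuples(raw_edges)))
# [(None, 'P1', 'P2', 'interacts_with', {})]
```

Generators suit this purpose because the translation methods themselves accept iterables and return generators, so large inputs need not be held in memory at once.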

translate_term(term)

Translate a single term.
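
Term and query translation can be pictured as dictionary lookups in both directions. The sketch below uses a made-up mapping and stand-in function names (swap_term, swap_labels). Returning unknown terms unchanged and matching labels with a regular expression are choices made for this example, not a description of the Translator's internals.

```python
import re

# Hypothetical mapping from input labels to BioCypher-style labels.
MAPPINGS = {"protein": "Protein", "gene": "Gene"}
REVERSE_MAPPINGS = {value: key for key, value in MAPPINGS.items()}


def swap_term(term: str, mapping: dict) -> str:
    """Look up one term; unknown terms are returned unchanged."""
    return mapping.get(term, term)


def swap_labels(query: str, mapping: dict) -> str:
    """Replace only the labels, i.e. the words that directly follow a colon."""
    return re.sub(r":(\w+)", lambda m: ":" + swap_term(m.group(1), mapping), query)


query = "MATCH (n:protein)-[:BINDS]->(g:gene) RETURN n"
translated = swap_labels(query, MAPPINGS)
print(translated)
# MATCH (n:Protein)-[:BINDS]->(g:Gene) RETURN n
print(swap_labels(translated, REVERSE_MAPPINGS) == query)  # True
```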

_logger.py: Logging

Configuration of the module logger.

biocypher._logger.get_logger(name: str = 'biocypher') Logger

Access the module logger, creating a new one if it does not exist yet.

Provides the central logger instance to the main module. It is called only from the main submodule, biocypher.driver. Child modules use the standard Python logging facility (logging.getLogger(__name__)) and automatically inherit the handlers of the central logger.

The file handler creates a log file named after the current date and time. Levels to output to file and console can be set here.

Parameters:

name – Name of the logger instance.

Returns:

An instance of the Python logging.Logger.
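
The handler inheritance mentioned above is standard behaviour of Python's logging module: records from a logger with a dotted name propagate to the handlers of its parent. The sketch below demonstrates this with a made-up package name, demo_pkg, and an in-memory handler; none of these names come from BioCypher.

```python
import logging

# "demo_pkg" is a made-up package name used only for this illustration.
central = logging.getLogger("demo_pkg")
central.setLevel(logging.INFO)

records = []


class ListHandler(logging.Handler):
    """Collects messages in a list instead of writing them out."""

    def emit(self, record):
        records.append(f"{record.name}: {record.getMessage()}")


central.addHandler(ListHandler())

# A child module would obtain its logger via logging.getLogger(__name__);
# the dotted name makes it a child of "demo_pkg".
child = logging.getLogger("demo_pkg.submodule")
child.info("written by the child, handled by the parent")

print(records)
# ['demo_pkg.submodule: written by the child, handled by the parent']
```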

biocypher._logger.log()

Browse the log file.

biocypher._logger.logfile() str

Path to the log file.

_misc.py: Miscellaneous utility functions

Handy functions for use in various places.

biocypher._misc.ensure_iterable(value: Any) Iterable

Returns iterables unchanged, except strings; strings and simple types are wrapped into a tuple.

biocypher._misc.to_list(value: Any) list

Ensures that value is a list.
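
The behaviour of both helpers can be approximated as follows. These are stand-ins with a _sketch suffix, and the set of types treated as list-like (list, tuple, set, frozenset) is an assumption; the library may recognise a different set.

```python
from typing import Any, Iterable


def ensure_iterable_sketch(value: Any) -> Iterable:
    """Pass list-like containers through; wrap strings and scalars in a tuple."""
    if isinstance(value, (list, tuple, set, frozenset)):
        return value
    return (value,)


def to_list_sketch(value: Any) -> list:
    """Always return a list, wrapping single values."""
    return list(ensure_iterable_sketch(value))


print(ensure_iterable_sketch("abc"))   # ('abc',)
print(ensure_iterable_sketch([1, 2]))  # [1, 2]
print(to_list_sketch(5))               # [5]
print(to_list_sketch(("a", "b")))      # ['a', 'b']
```

Treating strings as single values avoids the common pitfall of iterating over a string character by character when a caller passes one label instead of a list of labels.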