Submodule documentation
_core.py
: The main BioCypher interface
BioCypher core module. Interfaces with the user and distributes tasks to submodules.
- class biocypher._core.BioCypher(dbms: Optional[str] = None, offline: Optional[bool] = None, strict_mode: Optional[bool] = None, biocypher_config_path: Optional[str] = None, schema_config_path: Optional[str] = None, head_ontology: Optional[dict] = None, tail_ontologies: Optional[dict] = None, output_directory: Optional[str] = None, db_name: Optional[str] = None)
Orchestration of BioCypher operations. Instantiate this class to interact with BioCypher.
- Parameters:
dbms (str) – The database management system to use. For supported systems see SUPPORTED_DBMS.
offline (bool) – Whether to run in offline mode. If True, no connection to the database will be made.
strict_mode (bool) – Whether to run in strict mode. If True, the translator will raise an error if a node or edge does not provide source, version, and licence information.
biocypher_config_path (str) – Path to the BioCypher config file.
schema_config_path (str) – Path to the user schema config file.
head_ontology (dict) – The head ontology defined by URL (‘url’) and root node (‘root_node’).
tail_ontologies (dict) – The tail ontologies defined by URL and join nodes for both head and tail ontology.
output_directory (str) – Path to the output directory. If not provided, the default value ‘biocypher-out’ will be used.
- add(entities)
Function to add entities to the in-memory database. Accepts an iterable of tuples (if given, translates to BioCypherNode or BioCypherEdge objects) or an iterable of BioCypherNode or BioCypherEdge objects.
- log_duplicates() None
Get the set of duplicate nodes and edges encountered and print them to the logger.
- log_missing_input_labels() Optional[Dict[str, List[str]]]
Get the set of input labels encountered without an entry in the schema_config.yaml and print them to the logger.
- Returns:
A dictionary of Biolink types encountered without an entry in the schema_config.yaml file.
- Return type:
Optional[Dict[str, List[str]]]
- merge_edges(edges) bool
Merge edges into database. Either takes an iterable of tuples (if given, translates to BioCypherEdge objects) or an iterable of BioCypherEdge objects.
- Parameters:
edges (iterable) – An iterable of edges to merge into the database.
- Returns:
True if successful.
- Return type:
bool
- merge_nodes(nodes) bool
Merge nodes into database. Either takes an iterable of tuples (if given, translates to BioCypherNode objects) or an iterable of BioCypherNode objects.
- Parameters:
nodes (iterable) – An iterable of nodes to merge into the database.
- Returns:
True if successful.
- Return type:
bool
- reverse_translate_query(query: str) str
Reverse translate a query from its BioCypher equivalent.
- Parameters:
query (str) – The BioCypher query to reverse translate.
- Returns:
The original query.
- Return type:
str
- reverse_translate_term(term: str) str
Reverse translate a term from its BioCypher equivalent.
- Parameters:
term (str) – The BioCypher term to reverse translate.
- Returns:
The original term.
- Return type:
str
- show_ontology_structure(**kwargs) None
Show the ontology structure using treelib or write to GRAPHML file.
- Parameters:
to_disk (str) – If specified, the ontology structure will be saved to disk as a GRAPHML file, to be opened in your favourite graph visualisation tool.
full (bool) – If True, the full ontology structure will be shown, including all nodes and edges. If False, only the nodes and edges that are relevant to the extended schema will be shown.
- summary() None
Wrapper for showing ontology structure and logging duplicates and missing input types.
- to_df() List[DataFrame]
Convert entities to a pandas DataFrame for each entity type and return a list.
- Parameters:
entities (iterable) – An iterable of entities to convert to a DataFrame.
- Returns:
A list of pandas DataFrames, one per entity type.
- Return type:
List[pd.DataFrame]
- translate_query(query: str) str
Translate a query to its BioCypher equivalent.
- Parameters:
query (str) – The query to translate.
- Returns:
The BioCypher equivalent of the query.
- Return type:
str
- translate_term(term: str) str
Translate a term to its BioCypher equivalent.
- Parameters:
term (str) – The term to translate.
- Returns:
The BioCypher equivalent of the term.
- Return type:
str
- write_edges(edges, batch_size: int = 1000000) bool
Write edges to database. Either takes an iterable of tuples (if given, translates to BioCypherEdge objects) or an iterable of BioCypherEdge objects.
- Parameters:
edges (iterable) – An iterable of edges to write to the database.
- Returns:
True if successful.
- Return type:
bool
- write_import_call() None
Write a shell script to import the database depending on the chosen DBMS.
- write_nodes(nodes, batch_size: int = 1000000) bool
Write nodes to database. Either takes an iterable of tuples (if given, translates to BioCypherNode objects) or an iterable of BioCypherNode objects.
- Parameters:
nodes (iterable) – An iterable of nodes to write to the database.
- Returns:
True if successful.
- Return type:
bool
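How write_nodes and write_edges use batch_size internally is not described here. As a rough illustration of what batching an input iterable means, the following minimal sketch splits a lazily generated stream of tuples into chunks. The helper batched, the generator example_node_tuples, and the tuple contents are hypothetical and not part of BioCypher.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable, batch_size: int) -> Iterator[List]:
    """Yield lists of at most batch_size items without loading everything at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def example_node_tuples(count: int):
    """Hypothetical input: (id, type, properties) tuples produced lazily."""
    for i in range(count):
        yield (f"example:{i}", "protein", {"name": f"protein {i}"})


# Five tuples in batches of two: the last batch is shorter.
batch_sizes = [len(batch) for batch in batched(example_node_tuples(5), batch_size=2)]
print(batch_sizes)
```

Because the input is consumed through a generator, only one batch is held in memory at a time, which is the usual motivation for a large default such as 1000000.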
_write.py
: Write the Graph to various formats for batch import
BioCypher ‘offline’ module. Handles the writing of node and edge representations suitable for import into a DBMS.
- biocypher._write.get_writer(dbms: str, translator: Translator, ontology: Ontology, deduplicator: Deduplicator, output_directory: str, strict_mode: bool)
Function to return the writer class based on the selection in the config file.
- Parameters:
dbms – the database management system; for options, see DBMS_TO_CLASS.
translator – the Translator object.
ontology – the Ontology object.
deduplicator – the Deduplicator object.
output_directory – the directory to write the output files to.
strict_mode – whether to use strict mode.
- Returns:
an instance of the selected writer class.
- Return type:
instance
_connect.py
: On-line functionality for interaction with a DBMS
BioCypher ‘online’ mode. Handles connection and manipulation of a running DBMS.
_mapping.py
: Mapping of data inputs to KG ontology
BioCypher ‘mapping’ module. Handles the mapping of user-defined schema to the underlying ontology.
- class biocypher._mapping.OntologyMapping(config_file: Optional[str] = None)
Class to store the ontology mapping and extensions.
_ontology.py
: Ontology ingestion, parsing, and manipulation
BioCypher ‘ontology’ module. Contains classes and functions to handle parsing and representation of single ontologies as well as their hybridisation and other advanced operations.
- class biocypher._ontology.Ontology(head_ontology: dict, ontology_mapping: OntologyMapping, tail_ontologies: Optional[dict] = None)
A class that represents the ontological “backbone” of a BioCypher knowledge graph. The ontology can be built from a single resource, or hybridised from a combination of resources, with one resource being the “head” ontology, while an arbitrary number of other resources can become “tail” ontologies at arbitrary fusion points inside the “head” ontology.
- get_ancestors(node_label: str) list
Get the ancestors of a node in the ontology.
- Parameters:
node_label (str) – The label of the node in the ontology.
- Returns:
A list of the ancestors of the node.
- Return type:
list
- get_dict() dict
Returns a dictionary in the format of a BioCypher node, for compatibility with the Neo4j driver.
- show_ontology_structure(to_disk: Optional[str] = None, full: bool = False)
Show the ontology structure using treelib or write to GRAPHML file.
- Parameters:
to_disk (str) – If specified, the ontology structure will be saved to disk as a GRAPHML file, to be opened in your favourite graph visualisation tool.
full (bool) – If True, the full ontology structure will be shown, including all nodes and edges. If False, only the nodes and edges that are relevant to the extended schema will be shown.
- class biocypher._ontology.OntologyAdapter(ontology_file: str, root_label: str, head_join_node: Optional[str] = None, merge_nodes: Optional[bool] = True, reverse_labels: bool = True, remove_prefixes: bool = True)
Class that represents an ontology to be used in the BioCypher framework. Can read from a variety of formats, including OWL, OBO, and RDF/XML. The ontology is represented by a networkx.DiGraph object; an RDFlib graph is also kept. By default, the DiGraph reverses the label and identifier of the nodes, such that the node name in the graph is the human-readable label. The edges are oriented from child to parent. Following the Biolink example, labels are formatted in lower sentence case. In some cases, this means that underscores are replaced with spaces.
- get_ancestors(node_label)
Get the ancestors of a node in the ontology.
- get_head_join_node()
Get the head join node of the ontology.
- get_nx_graph()
Get the networkx graph representing the ontology.
- get_rdf_graph()
Get the RDFlib graph representing the ontology.
- get_root_label()
Get the label of the root node in the ontology.
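To make the child-to-parent orientation and the lower sentence case labels concrete, here is a minimal sketch that uses a plain dictionary instead of a networkx.DiGraph. The hierarchy is a toy example, and the helpers to_sentence_label and ancestors are hypothetical. Whether the library's get_ancestors includes the queried node itself is not stated here; this sketch excludes it.

```python
from collections import deque
from typing import Dict, List

# Toy child -> parent edges (an assumption, not the real Biolink tree).
PARENTS: Dict[str, List[str]] = {
    "protein": ["polypeptide"],
    "polypeptide": ["biological entity"],
    "gene": ["biological entity"],
    "biological entity": ["entity"],
}


def to_sentence_label(identifier: str) -> str:
    """Format an identifier such as 'Biological_Entity' in lower sentence case."""
    return identifier.replace("_", " ").lower()


def ancestors(label: str, parents: Dict[str, List[str]]) -> List[str]:
    """Follow child -> parent edges breadth-first and collect every ancestor once."""
    found: List[str] = []
    queue = deque([label])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in found:
                found.append(parent)
                queue.append(parent)
    return found


print(ancestors(to_sentence_label("Protein"), PARENTS))
```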
_create.py
: Base classes for node and edge representations in BioCypher
BioCypher ‘create’ module. Handles the creation of BioCypher node and edge dataclasses.
- class biocypher._create.BioCypherEdge(source_id: str, target_id: str, relationship_label: str, relationship_id: ~typing.Optional[str] = None, properties: dict = <factory>)
Handoff class to represent biomedical relationships in Neo4j.
Has source and target ids, label, property dict; ids and label (in the Neo4j sense of a label, i.e., the entity descriptor after the colon, such as “:TARGETS”) are non-optional and called source_id, target_id, and relationship_label to avoid confusion with properties called “label”, which usually denotes the human-readable form. Relationship labels are written in UPPERCASE and as verbs, as per Neo4j consensus.
- Parameters:
source_id (string) – consensus “best” id for biological entity
target_id (string) – consensus “best” id for biological entity
relationship_label (string) – type of interaction, UPPERCASE
properties (dict) – collection of all other properties of the respective edge
- get_dict() dict
Return dict of ids, label, and properties.
- Returns:
source_id, target_id and relationship_label as top-level key-value pairs, properties as second-level dict.
- Return type:
dict
- get_id() Optional[str]
Returns the relationship identifier or None.
- Returns:
relationship_id
- Return type:
Optional[str]
- get_label() str
Returns relationship label.
- Returns:
relationship_label
- Return type:
str
- get_properties() dict
Returns all other relationship properties apart from primary ids and label as key-value pairs.
- Returns:
properties
- Return type:
dict
- get_source_id() str
Returns primary node identifier of relationship source.
- Returns:
source_id
- Return type:
str
- get_target_id() str
Returns primary node identifier of relationship target.
- Returns:
target_id
- Return type:
str
- get_type() str
Returns relationship label.
- Returns:
relationship_label
- Return type:
str
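The shape returned by get_dict can be pictured with a small stand-in dataclass. This is a sketch under assumptions: EdgeSketch is a hypothetical class that mirrors the documented constructor signature, and the dictionary key names are assumed to match the field names. It is not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EdgeSketch:
    """Hypothetical stand-in mirroring the documented BioCypherEdge fields."""

    source_id: str
    target_id: str
    relationship_label: str
    relationship_id: Optional[str] = None
    # default_factory gives every instance its own dict (the "<factory>" default).
    properties: dict = field(default_factory=dict)

    def get_dict(self) -> dict:
        """Ids and label at the top level, properties nested one level down."""
        return {
            "source_id": self.source_id,
            "target_id": self.target_id,
            "relationship_label": self.relationship_label,
            "properties": self.properties,
        }


edge = EdgeSketch("gene:1", "protein:1", "ENCODES", properties={"source": "example"})
print(edge.get_dict())
```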
- class biocypher._create.BioCypherNode(node_id: str, node_label: str, preferred_id: str = 'id', properties: dict = <factory>)
Handoff class to represent biomedical entities as Neo4j nodes.
Has id, label, property dict; id and label (in the Neo4j sense of a label, i.e., the entity descriptor after the colon, such as “:Protein”) are non-optional and called node_id and node_label to avoid confusion with “label” properties. Node labels are written in PascalCase and as nouns, as per Neo4j consensus.
- Parameters:
node_id (string) – consensus “best” id for biological entity
node_label (string) – primary type of entity, capitalised
properties (dict) – collection of all other properties to be passed to Neo4j for the respective node
- get_dict() dict
Return dict of id, labels, and properties.
- Returns:
node_id and node_label as top-level key-value pairs, properties as second-level dict.
- Return type:
dict
- get_id() str
Returns primary node identifier.
- Returns:
node_id
- Return type:
str
- get_label() str
Returns primary node label.
- Returns:
node_label
- Return type:
str
- get_preferred_id() str
Returns preferred id.
- Returns:
preferred_id
- Return type:
str
- get_properties() dict
Returns all other node properties apart from primary id and label as key-value pairs.
- Returns:
properties
- Return type:
dict
- get_type() str
Returns primary node label.
- Returns:
node_label
- Return type:
str
- class biocypher._create.BioCypherRelAsNode(node: BioCypherNode, source_edge: BioCypherEdge, target_edge: BioCypherEdge)
Class to represent relationships as nodes (with in- and outgoing edges) as a triplet of a BioCypherNode and two BioCypherEdges. Main usage in type checking (instances where the receiving function needs to check whether it receives a relationship as a single edge or as a triplet).
- Parameters:
node (BioCypherNode) – node representing the relationship
source_edge (BioCypherEdge) – edge representing the source of the relationship
target_edge (BioCypherEdge) – edge representing the target of the relationship
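The type-checking use case can be sketched as follows: a receiving function inspects whether it got a single edge or a node-plus-two-edges triplet. All class names ending in Sketch, the function count_elements, and the labels and edge directions in the example are hypothetical illustrations rather than BioCypher's own code.

```python
from dataclasses import dataclass, field
from typing import Tuple, Union


@dataclass
class NodeSketch:
    node_id: str
    node_label: str
    preferred_id: str = "id"
    properties: dict = field(default_factory=dict)


@dataclass
class EdgeSketch:
    source_id: str
    target_id: str
    relationship_label: str
    properties: dict = field(default_factory=dict)


@dataclass
class RelAsNodeSketch:
    """A relationship represented as a node with one edge per endpoint."""

    node: NodeSketch
    source_edge: EdgeSketch
    target_edge: EdgeSketch


def count_elements(item: Union[EdgeSketch, RelAsNodeSketch]) -> Tuple[int, int]:
    """Return (number of nodes, number of edges) the item expands to."""
    if isinstance(item, RelAsNodeSketch):
        return (1, 2)
    if isinstance(item, EdgeSketch):
        return (0, 1)
    raise TypeError(f"unsupported type: {type(item).__name__}")


plain = EdgeSketch("gene:1", "protein:1", "ENCODES")
triplet = RelAsNodeSketch(
    node=NodeSketch("interaction:1", "Interaction"),
    source_edge=EdgeSketch("gene:1", "interaction:1", "IS_SOURCE_OF"),
    target_edge=EdgeSketch("protein:1", "interaction:1", "IS_TARGET_OF"),
)
print(count_elements(plain), count_elements(triplet))
```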
_translate.py
: Translation functionality for implemented types of representation
BioCypher ‘translation’ module. Responsible for translating between the raw input data and the BioCypherNode and BioCypherEdge objects.
- class biocypher._translate.Translator(ontology_mapping: OntologyMapping, strict_mode: bool = False)
Class responsible for executing the translation process that is configured in the schema_config.yaml file. Creates a mapping dictionary from that file, and, given nodes and edges, translates them into BioCypherNodes and BioCypherEdges. During this process, it can also filter the properties of the entities if the schema_config.yaml file specifies a property whitelist or blacklist.
Provides utility functions for translating between input and output labels and cypher queries.
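Property filtering and the strict mode check can be illustrated with two small helpers. This is a minimal sketch under assumptions: the function names, the precedence of whitelist over blacklist, the raised exception type, and the property keys "source", "version", and "licence" are all hypothetical and only mirror the behaviour described in prose.

```python
from typing import Iterable, Optional

# Assumed property keys for the strict mode requirement described in the text.
REQUIRED_IN_STRICT_MODE = ("source", "version", "licence")


def filter_properties(
    properties: dict,
    whitelist: Optional[Iterable[str]] = None,
    blacklist: Optional[Iterable[str]] = None,
) -> dict:
    """Keep only whitelisted keys, or drop blacklisted keys; otherwise copy as-is."""
    if whitelist is not None:
        allowed = set(whitelist)
        return {k: v for k, v in properties.items() if k in allowed}
    if blacklist is not None:
        blocked = set(blacklist)
        return {k: v for k, v in properties.items() if k not in blocked}
    return dict(properties)


def check_strict_mode(properties: dict) -> None:
    """Raise if any provenance property required in strict mode is missing."""
    missing = [key for key in REQUIRED_IN_STRICT_MODE if key not in properties]
    if missing:
        raise ValueError(f"missing required properties: {', '.join(missing)}")


props = {"name": "TP53", "score": 0.9, "source": "example", "version": "1", "licence": "CC0"}
check_strict_mode(props)  # passes: all three provenance keys are present
print(filter_properties(props, blacklist=["score"]))
```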
- get_missing_biolink_types() dict
Returns a dictionary of types that were not represented in the schema_config.
- static name_sentence_to_pascal(name: str) str
Converts a name in sentence case to pascal case.
- reverse_translate(query)
Reverse translate a cypher query. Only translates labels as of now.
- reverse_translate_term(term)
Reverse translate a single term.
- translate(query)
Translate a cypher query. Only translates labels as of now.
- translate_edges(id_src_tar_type_prop_tuples: Iterable) Generator[Union[BioCypherEdge, BioCypherRelAsNode], None, None]
Translates input edge representation to a representation that conforms to the schema of the given BioCypher graph. For now requires explicit statement of edge type on pass.
- Parameters:
id_src_tar_type_prop_tuples (list of tuples) – collection of tuples representing source and target of an interaction via their unique ids as well as the type of interaction in the original database notation, which is translated to BioCypher notation using the leaves. Can optionally possess its own ID.
- translate_nodes(id_type_prop_tuples: Iterable) Generator[BioCypherNode, None, None]
Translates input node representation to a representation that conforms to the schema of the given BioCypher graph. For now requires explicit statement of node type on pass.
- Parameters:
id_type_prop_tuples (list of tuples) – collection of tuples representing individual nodes by their unique id and a type that is translated from the original database notation to the corresponding BioCypher notation.
- translate_term(term)
Translate a single term.
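The label-oriented utilities above can be pictured with a naive sketch: a sentence case to PascalCase conversion, and a label substitution driven by a mapping dictionary and its inverse. The function names, the example mapping, and the regular expression approach are assumptions for illustration; the library's actual translation logic may differ.

```python
import re
from typing import Dict


def sentence_to_pascal(name: str) -> str:
    """Uppercase the first letter of each word and join the words."""
    return "".join(word[:1].upper() + word[1:] for word in name.split())


# Hypothetical mapping from input labels to BioCypher labels, and its inverse.
INPUT_TO_BIOCYPHER: Dict[str, str] = {"hgnc_gene": "Gene", "uniprot_protein": "Protein"}
BIOCYPHER_TO_INPUT: Dict[str, str] = {v: k for k, v in INPUT_TO_BIOCYPHER.items()}


def translate_labels(query: str, mapping: Dict[str, str]) -> str:
    """Replace every ':label' found in the mapping; leave unknown labels untouched."""

    def swap(match: "re.Match") -> str:
        label = match.group(1)
        return ":" + mapping.get(label, label)

    return re.sub(r":(\w+)", swap, query)


query = "MATCH (g:hgnc_gene)-[:encodes]->(p:uniprot_protein) RETURN g, p"
translated = translate_labels(query, INPUT_TO_BIOCYPHER)
print(translated)
print(translate_labels(translated, BIOCYPHER_TO_INPUT))
```

Only labels are touched, in line with the note that query translation currently covers labels only; a label absent from the mapping, such as :encodes here, passes through unchanged.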
_logger.py
: Logging
Configuration of the module logger.
- biocypher._logger.get_logger(name: str = 'biocypher') Logger
Access the module logger, creating a new one if it does not exist yet.
Method providing the central logger instance to the main module. It is called only from the main submodule, biocypher.driver. In child modules, the standard Python logging facility is called (using logging.getLogger(__name__)), automatically inheriting the handlers from the central logger.
The file handler creates a log file named after the current date and time. Levels to output to file and console can be set here.
- Parameters:
name – Name of the logger instance.
- Returns:
An instance of the Python logging.Logger.
- biocypher._logger.log()
Browse the log file.
- biocypher._logger.logfile() str
Path to the log file.
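The handler inheritance mentioned above follows from the dotted logger hierarchy in Python's standard logging module: records from a child logger propagate to the handlers of its parent. The sketch below shows this with an in-memory handler. ListHandler is a hypothetical helper, and "biocypher._write" merely stands in for the value of __name__ inside a child module.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted messages in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(f"{record.name}: {record.getMessage()}")


# Central logger: the only place where a handler and a level are configured.
central = logging.getLogger("biocypher")
central.setLevel(logging.INFO)
handler = ListHandler()
central.addHandler(handler)

# Child module: equivalent to logging.getLogger(__name__) inside biocypher._write.
child = logging.getLogger("biocypher._write")
child.info("writing nodes")

print(handler.messages)
```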
_misc.py
: Miscellaneous utility functions
Handy functions for use in various places.
- biocypher._misc.ensure_iterable(value: Any) Iterable
Returns iterables unchanged, except strings; wraps strings and simple types into a tuple.
- biocypher._misc.to_list(value: Any) list
Ensures that value is a list.
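The behaviour described for these two helpers can be approximated as follows. This is a sketch of the described behaviour only, written with hypothetical names (ensure_iterable_sketch, to_list_sketch); the library may decide differently which types count as iterable.

```python
from collections.abc import Iterable
from typing import Any


def ensure_iterable_sketch(value: Any) -> Iterable:
    """Return iterables as-is; wrap strings and non-iterables in a one-element tuple."""
    if isinstance(value, str) or not isinstance(value, Iterable):
        return (value,)
    return value


def to_list_sketch(value: Any) -> list:
    """Return value as a list, wrapping single items instead of splitting strings."""
    if isinstance(value, list):
        return value
    return list(ensure_iterable_sketch(value))


print(ensure_iterable_sketch("protein"))
print(to_list_sketch(("gene", "protein")))
```

Treating strings as single values avoids the common pitfall of iterating over a label character by character.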