Web Ontology Language (OWL)
The Web Ontology Language (OWL) is a (family of) knowledge representation language(s) for authoring ontologies. BioCypher can use taxonomies written in OWL as an input, and it can also output a knowledge graph in an OWL file.
OWL is one of the most common knowledge representation languages. It is built on the Resource Description Framework (RDF) and is partly compatible with the RDF Schema data model. It can be serialized in several formats (the most well-known being XML and Turtle). The Protégé software is the de facto standard graphical user interface to design OWL ontologies.
In BioCypher, selecting the owl output format will call the _OWLWriter class and generate a self-contained OWL file. The file is said to be "self-contained" because it holds both the vocabulary (i.e. a part of the hierarchy of classes from the input ontology) and the instances (i.e. "nodes", for BioCypher).
Edge Model
The behavior of edge creation in the RDF output relies mainly on the edge_model parameter, which can take two values: "ObjectProperty" or "Association".
Note on vocabulary
To understand the following rationale, note that OWL uses a different vocabulary than BioCypher (which is more aligned with labelled property graphs); a rough translation is:
BioCypher | OWL |
---|---|
node | individual |
edge | object property |
label | class |
property | annotation / data property |
ID | label / IRI |
Note
There is a particular danger of ambiguity in the term "label"; in labelled property graphs, this is the type or class of entity, while in OWL it refers to the identifier of a single entity.
In a nutshell
When using edge_model: ObjectProperty, the resulting ontology follows the spirit of the OWL modelling approach more closely, but the ID and the properties attached to the edges are lost.
Example:
graph LR
A["My_source<br/><i>my_prop: this</i>"] -->|toward| B["My_target"]
When using edge_model: Association, the edges are created as OWL individuals, with attached annotations and an IRI. However, each edge then becomes a new individual placed between the pair of individuals it connects, and is attached to them by two object properties. This is very similar to the "reification" that BioCypher does for relationships that are set to represented_as: node in the schema configuration.
Example:
graph LR
A["My_source<br/><i>my_prop: this</i>"] -->|edge_source| B["My_edge<br/><i>my_edge_prop: that</i>"] -->|edge_target| C["My_target"]
ObjectProperty
This edge model translates BioCypher's edges into OWL's "object properties" (if they are available under the selected root term). Object properties are the natural way to model edges in OWL, but they do not support annotations and are therefore incompatible with BioCypher's "properties" on edges.
Example
For instance, the following BioCypher tuples (two nodes and one edge):
# Nodes:
# ID label properties
("My_source", "thisNodeType", {"my_prop":"this"}),
("My_target", "thatNodeType", {}),
# Edge:
# ID source ID target ID properties label
("My_edge", "My_source", "My_target", {"my_edge_prop":"that"}, "toward")
are translated into the following OWL statements (Turtle syntax):
# Declaration of types:
:toward a owl:ObjectProperty ;
    rdfs:domain :thisNodeType ;
    rdfs:range :thatNodeType ;
    rdfs:subPropertyOf owl:topObjectProperty .
# Actual data:
:My_source a :thisNodeType, owl:NamedIndividual ;
    biocypher:my_prop "this" ;
    :toward :My_target .
:My_target a :thatNodeType, owl:NamedIndividual .
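The translation above can be sketched in plain Python. This is a minimal illustration and not BioCypher's _OWLWriter: the function name `object_property_triples` and the representation of triples as tuples of strings are assumptions made for this example. It shows why the edge ID and the edge properties have no place in the output under this edge model.

```python
# Hypothetical sketch: not BioCypher's actual writer code.
def object_property_triples(nodes, edges):
    """Turn BioCypher-style tuples into (subject, predicate, object) triples,
    modelling each edge as a single object property assertion."""
    triples = []
    for node_id, label, props in nodes:
        triples.append((node_id, "rdf:type", label))
        for key, value in props.items():
            triples.append((node_id, f"biocypher:{key}", value))
    for _edge_id, source_id, target_id, _props, label in edges:
        # One triple per edge: nowhere to attach the edge ID or its properties.
        triples.append((source_id, label, target_id))
    return triples


nodes = [
    ("My_source", "thisNodeType", {"my_prop": "this"}),
    ("My_target", "thatNodeType", {}),
]
edges = [
    ("My_edge", "My_source", "My_target", {"my_edge_prop": "that"}, "toward"),
]

triples = object_property_triples(nodes, edges)
for triple in triples:
    print(triple)
```

Neither "My_edge" nor "my_edge_prop" appears in the printed triples, which is the information loss described above.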
Root node and ObjectProperty
As most OWL files do not model a common term on top of both owl:topObjectProperty and owl:Thing, you may need to ensure that the input OWL contains a "meta-root", that is, a common ancestor honoring both:
- owl:Thing rdfs:subClassOf <meta-root>
- owl:topObjectProperty rdfs:subPropertyOf <meta-root>
It is this meta-root that you should select as a root_node in your BioCypher configuration.
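For example, a classical OWL taxonomy is often structured as two separate trees, one rooted at owl:Thing and one rooted at owl:topObjectProperty, with no common ancestor.
To allow BioCypher to "see" both the owl:Thing and owl:topObjectProperty subtrees, you need to add your own root node: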
- my_meta_root
  ├ owl:Thing
  │ ├ Entity
  │ ├ My_class
  │ └ etc.
  └ owl:topObjectProperty
    ├ My_link_type
    └ etc.
Then set root_node: my_meta_root in BioCypher's configuration.
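Before running BioCypher, it can help to check that a candidate root really sits above both subtrees. The following is a minimal sketch under stated assumptions: the parent links are given as a plain dictionary (as they could be collected from rdfs:subClassOf and rdfs:subPropertyOf statements), and the helper names `ancestors` and `is_meta_root` are hypothetical, not part of BioCypher.

```python
# Hypothetical helpers: not part of BioCypher's API.
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_meta_root(candidate, parents):
    """True if `candidate` is above both owl:Thing and owl:topObjectProperty."""
    return all(
        candidate in ancestors(top, parents)
        for top in ("owl:Thing", "owl:topObjectProperty")
    )


# Parent links matching the tree shown above.
parents = {
    "Entity": ["owl:Thing"],
    "My_class": ["owl:Thing"],
    "My_link_type": ["owl:topObjectProperty"],
    "owl:Thing": ["my_meta_root"],
    "owl:topObjectProperty": ["my_meta_root"],
}

print(is_meta_root("my_meta_root", parents))  # True
print(is_meta_root("owl:Thing", parents))     # False
```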
Association
This edge model (the default) translates BioCypher's edges into OWL's class instances. Those edge instances are inserted in between the instances coming from BioCypher's nodes. This makes it possible to retain edge properties, but adds OWL instances to model relationships, which does not follow the classical OWL model.
In this approach, all OWL instances are linked with two generic object properties: "edge_source" (linking the source instance to the association instance) and "edge_target" (linking the association instance to the target instance). Both inherit from "edge" and are in the biocypher namespace.
Example
For instance, the following BioCypher tuples (two nodes and one edge):
# Nodes:
# ID label properties
("My_source", "thisNodeType", {"my_prop":"this"}),
("My_target", "thatNodeType", {}),
# Edges:
# ID source ID target ID properties label
("My_edge", "My_source", "My_target", {"my_edge_prop":"that"}, "toward")
are translated into the following OWL statements (Turtle syntax):
# Declaration of BioCypher's generic edge types:
biocypher:edge a owl:ObjectProperty ;
    rdfs:subPropertyOf owl:topObjectProperty .
biocypher:edge_source a biocypher:edge .
biocypher:edge_target a biocypher:edge .
# The edge type becomes an OWL class:
:toward a owl:Class .
# Actual data:
:My_source a :thisNodeType, owl:NamedIndividual ;
    biocypher:my_prop "this" .
:My_target a :thatNodeType, owl:NamedIndividual .
# An edge is an OWL individual, with properties:
:My_edge a :toward, owl:NamedIndividual ;
    biocypher:edge_source :My_source ;
    biocypher:edge_target :My_target ;
    biocypher:my_edge_prop "that" .
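The same translation can be sketched in plain Python. Again, this is only an illustration and not BioCypher's _OWLWriter: the function name `association_triples` and the tuple-of-strings triples are assumptions, and the direction of the edge_source and edge_target statements simply mirrors the listing above.

```python
# Hypothetical sketch: not BioCypher's actual writer code.
def association_triples(edges):
    """Model each BioCypher edge as an individual carrying its own statements."""
    triples = []
    for edge_id, source_id, target_id, props, label in edges:
        # The edge type is used as a class, and the edge is one of its instances.
        triples.append((edge_id, "rdf:type", label))
        # Two generic object properties attach the edge to its end points.
        triples.append((edge_id, "biocypher:edge_source", source_id))
        triples.append((edge_id, "biocypher:edge_target", target_id))
        # Edge properties survive because the edge is now an individual.
        for key, value in props.items():
            triples.append((edge_id, f"biocypher:{key}", value))
    return triples


edges = [
    ("My_edge", "My_source", "My_target", {"my_edge_prop": "that"}, "toward"),
]

triples = association_triples(edges)
for triple in triples:
    print(triple)
```

Compared with the ObjectProperty model, one edge now yields several statements, but both its ID and its properties are kept.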
Root node and Association
If you use this edge model, you may select one of the subclasses of owl:Thing as a root_node, and not select any part of the object property tree.
For instance, if you have a taxonomy with a common root node, the "Association" edge model only requires that you select a subclass of owl:Thing, and you do not need to select the meta root:
- my_meta_root
  ├ owl:Thing               <= it is only necessary to use `root_node: Thing`
  │ ├ Entity
  │ ├ My_class
  │ └ etc.
  └ owl:topObjectProperty   <= This subtree will not be used.
    ├ My_link_type
    └ etc.
Taxonomy Management
The _OWLWriter class keeps the vocabulary underneath the selected root node and exports it alongside the instances in the resulting OWL file. It discards all terms that are not in the tree below the selected root node.
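The pruning step amounts to a tree traversal from the root node. The sketch below is an illustration only: the function name `subtree` is hypothetical, the taxonomy is a hand-written dictionary of child links, and keeping the root term itself in the result is an assumption of this example.

```python
from collections import deque


# Hypothetical sketch: not BioCypher's actual implementation.
def subtree(root, children):
    """Collect `root` and every term beneath it, breadth-first."""
    kept = {root}
    queue = deque([root])
    while queue:
        for child in children.get(queue.popleft(), ()):
            if child not in kept:
                kept.add(child)
                queue.append(child)
    return kept


children = {
    "my_meta_root": ["owl:Thing", "owl:topObjectProperty"],
    "owl:Thing": ["Entity", "My_class"],
    "owl:topObjectProperty": ["My_link_type"],
}

kept = subtree("owl:Thing", children)
print(sorted(kept))  # ['Entity', 'My_class', 'owl:Thing']
```

With owl:Thing as the root, "My_link_type" and "owl:topObjectProperty" fall outside the kept set, so they would be discarded.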
The configuration parameter rdf_namespaces can be used to specify which namespaces exist in the input ontology (or the data). If the data contain IDs with a given prefix, they will be converted into valid Internationalized Resource Identifiers (IRIs) to allow referencing. If no namespace is specified, BioCypher will search for them in the input ontology.
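As a rough illustration of the prefix expansion, the sketch below turns a prefixed ID into an IRI. It rests on assumptions that the text above does not confirm: that IDs take the form prefix:local with a colon separator, that prefixes are matched case-sensitively, and that IDs with an unknown prefix are left untouched. The function name `to_iri` is hypothetical.

```python
# Hypothetical sketch: the ID format and the fallback behavior are assumptions.
def to_iri(identifier, namespaces):
    """Expand `prefix:local` into a full IRI when the prefix is declared."""
    prefix, separator, local = identifier.partition(":")
    if separator and prefix in namespaces:
        return namespaces[prefix] + local
    return identifier  # Unknown prefix or no prefix: returned unchanged.


namespaces = {
    "so": "http://purl.obolibrary.org/obo/SO_",
    "efo": "http://www.ebi.ac.uk/efo/EFO_",
}

print(to_iri("so:0000704", namespaces))
# http://purl.obolibrary.org/obo/SO_0000704
```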
Settings
Important parameters are:
- root_node, which must be a meta-root on top of both owl:Thing and owl:topObjectProperty when using the ObjectProperty edge model; with the Association edge model, a subclass of owl:Thing is enough.
- edge_model, which heavily impacts the output ontology, most notably the graph structure, and thus the queries that can be made on it (see above).
- file_stem, which is the name of the output file (without the extension or the path) that will be written in the output directory.
- file_format, which is the output serialization format. Note that if set to "turtle", the output file extension will be ".ttl".
For the ObjectProperty edge model
# biocypher_config.yaml
biocypher:
  strict_mode: true
  schema_config_path: config/schema_config.yaml
  dbms: owl # <- Use the OWL output writer.
  head_ontology:
    url: file:///home/superb/owl_file.ttl
    root_node: BioCypherRoot # <- The "meta-root" class.
owl:
  file_format: turtle
  # Can be either: xml, n3, turtle or ttl, nt, pretty-xml, trix, trig, nquads, json-ld
  edge_model: ObjectProperty
  # Can also be: Association (the default)
  file_stem: my_ontology # "biocypher" by default, do not put an extension
  # Optional:
  rdf_namespaces:
    so: http://purl.obolibrary.org/obo/SO_
    efo: http://www.ebi.ac.uk/efo/EFO_
For the Association edge model
# biocypher_config.yaml
biocypher:
  strict_mode: true
  schema_config_path: config/schema_config.yaml
  dbms: owl # <- Use the OWL output writer.
  head_ontology:
    url: file:///home/superb/owl_file.ttl
    root_node: Entity # <- NOT the meta-root!
owl:
  file_format: turtle
  # Can be either: xml, n3, turtle or ttl, nt, pretty-xml, trix, trig, nquads, json-ld
  edge_model: Association
  file_stem: my_ontology # "biocypher" by default, do not put an extension
  # Optional:
  rdf_namespaces:
    so: http://purl.obolibrary.org/obo/SO_
    efo: http://www.ebi.ac.uk/efo/EFO_
Possible Issues
BioCypher cannot read every OWL ontology, nor every term hosted in one. Most notably, it only reads (a part of) the taxonomy to build up its input. Some logical predicates may also be incompatible with the selected edge model (especially "Association").
Note that Protégé may get in the way in a few respects:
- It displays owl:Entity as if it inherits from owl:Thing, but that is not necessarily actually implemented by a predicate. You may have to add it manually.
- It displays all owl:ObjectProperty as if they inherit from owl:topObjectProperty, but you may also have to add the predicate manually.
- It provides no easy way to add a meta-root on top of both classes, and a manually added one will appear as a subclass of both owl:Thing and owl:topObjectProperty.
Double-checking the ontology file source code itself should help you ensure compatibility with BioCypher's constraints.
Also, note that BioCypher requires that classes (and object properties) have an RDFS label, and will use it (and not the IRI) to find the necessary types.
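A quick way to spot terms that would be missed because of the label requirement is to list the declared classes and object properties that carry no rdfs:label. The sketch below works on a hand-written list of triples given as tuples of strings, and the function name `unlabelled_terms` is hypothetical. With a real ontology file, the triples would first have to be read with an RDF library.

```python
# Hypothetical sketch: works on plain tuples, not on a parsed ontology file.
def unlabelled_terms(triples):
    """Return declared classes and object properties that lack an rdfs:label."""
    declared = {
        subject
        for subject, predicate, obj in triples
        if predicate == "rdf:type" and obj in ("owl:Class", "owl:ObjectProperty")
    }
    labelled = {
        subject for subject, predicate, _ in triples if predicate == "rdfs:label"
    }
    return declared - labelled


triples = [
    (":My_class", "rdf:type", "owl:Class"),
    (":My_class", "rdfs:label", "my class"),
    (":toward", "rdf:type", "owl:ObjectProperty"),
]

missing = unlabelled_terms(triples)
print(missing)  # {':toward'}
```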