{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example Notebook: BioCypher and Pandas"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "**Tip:** You can __[run the tutorial interactively in Google Colab](https://colab.research.google.com/github/biocypher/biocypher/blob/main/docs/notebooks/pandas_tutorial.ipynb)__.\n",
    "</div>\n",
    "\n",
    "## Introduction\n",
    "\n",
    "The main purpose of BioCypher is to facilitate the pre-processing of biomedical data, and thus save development time in the maintenance of curated knowledge graphs, while allowing simple and efficient creation of task-specific lightweight knowledge graphs in a user-friendly and biology-centric fashion.\n",
    "\n",
    "We are going to use a toy example to familiarise the user with the basic functionality of BioCypher. One central task of BioCypher is the harmonisation of dissimilar datasets describing the same entities. Thus, in this example, the input data - which in the real-world use case could come from any type of interface - are represented by simulated data containing some examples of differently formatted biomedical entities such as proteins and their interactions.\n",
    "\n",
    "There are two other versions of this tutorial, which only differ in the output format. The first uses a CSV output format to write files suitable for Neo4j admin import, and the second creates an in-memory collection of Pandas dataframes. You can find the former in the tutorial directory of the BioCypher repository. This tutorial simply takes the latter, in-memory approach to a Jupyter notebook."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While BioCypher was designed as a graph-focused framework, due to commonalities in bioinformatics workflows, BioCypher also supports Pandas DataFrames. This allows integration with methods that use tabular data, such as machine learning and statistical analysis, for instance in the [scVerse framework](https://scverse.org)."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To run this tutorial interactively, you will first need to install perform some setup steps specific to running on Google Colab. You can collapse this section and run the setup steps with one click, as they are not required for the explanation of BioCyper's functionality. You can of course also run the steps one by one, if you want to see what is happening. The real tutorial starts with [section 1, \"Adding data\"](https://biocypher.org/notebooks/pandas_tutorial.html#Section-1:-Adding-data) (do not follow this link on colab, as you will be taken back to the website; please scroll down instead)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install biocypher"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tutorial files\n",
    "\n",
    "In the `biocypher` root directory, you will find a `tutorial` directory with\n",
    "the files for this tutorial. The `data_generator.py` file contains the\n",
    "simulated data generation code, and the other files, specifically the `.yaml` files, are named according to the\n",
    "tutorial step they are used in.\n",
    "\n",
    "Let's download these:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import yaml\n",
    "import requests\n",
    "import subprocess\n",
    "\n",
    "schema_path = \"https://raw.githubusercontent.com/biocypher/biocypher/main/tutorial/\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!wget -O data_generator.py \"https://github.com/biocypher/biocypher/raw/main/tutorial/data_generator.py\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "owner = \"biocypher\"\n",
    "repo = \"biocypher\"\n",
    "path = \"tutorial\"  # The path within the repository (optional, leave empty for the root directory)\n",
    "github_url = \"https://api.github.com/repos/{owner}/{repo}/contents/{path}\"\n",
    "\n",
    "api_url = github_url.format(owner=owner, repo=repo, path=path)\n",
    "response = requests.get(api_url)\n",
    "\n",
    "# Get list of yaml files from the repo\n",
    "files = response.json()\n",
    "yamls = []\n",
    "for file in files:\n",
    "    if file[\"type\"] == \"file\":\n",
    "        if file[\"name\"].endswith(\".yaml\"):\n",
    "            yamls.append(file[\"name\"])\n",
    "           \n",
    "# wget all yaml files \n",
    "for yaml in yamls:\n",
    "    url_path = schema_path + yaml\n",
    "    subprocess.run([\"wget\", url_path])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's also define functions with which we can visualize those"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# helper function to print yaml files\n",
    "import yaml\n",
    "def print_yaml(file_path):\n",
    "    with open(file_path, 'r') as file:\n",
    "        yaml_data = yaml.safe_load(file)\n",
    "\n",
    "    print(\"--------------\")\n",
    "    print(yaml.dump(yaml_data, sort_keys=False, indent=4))\n",
    "    print(\"--------------\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configuration\n",
    "BioCypher is configured using a YAML file; it comes with a default (which you\n",
    "can see in the\n",
    "[Configuration](https://biocypher.org/installation.html#configuration) section).\n",
    "You can use it, for instance, to select an output format, the output directory,\n",
    "separators, logging level, and other options. For this tutorial, we will use a\n",
    "dedicated configuration file for each of the steps. The configuration files are\n",
    "located in the `tutorial` directory, and are called using the\n",
    "`biocypher_config_path` argument at instantiation of the BioCypher interface.\n",
    "For more information, see also the [Quickstart\n",
    "Configuration](https://biocypher.org/quickstart.html#the-biocypher-configuration-yaml-file)\n",
    "section."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 1: Adding data\n",
    "\n",
    "### Input data stream (\"adapter\")\n",
    "The basic operation of adding data to the knowledge graph requires two\n",
    "components: an input stream of data (which we call adapter) and a configuration\n",
    "for the resulting desired output (the schema configuration). The former will be\n",
    "simulated by calling the `Protein` class of our data generator 10 times."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a list of proteins to be imported\n",
    "from data_generator import Protein\n",
    "n_proteins = 3\n",
    "proteins = [Protein() for _ in range(n_proteins)]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Each protein in our simulated data has a UniProt ID, a label\n",
    "(\"uniprot_protein\"), and a dictionary of properties describing it. This is -\n",
    "purely by coincidence - very close to the input BioCypher expects (for nodes):\n",
    "- a unique identifier\n",
    "- an input label (to allow mapping to the ontology, see the second step below)\n",
    "- a dictionary of further properties (which can be empty)\n",
    "\n",
    "These should be presented to BioCypher in the form of a tuple. To achieve this\n",
    "representation, we can use a generator function that iterates through our\n",
    "simulated input data and, for each entity, forms the corresponding tuple. The\n",
    "use of a generator allows for efficient streaming of larger datasets where\n",
    "required."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def node_generator(proteins):\n",
    "    for protein in proteins:\n",
    "        yield (\n",
    "            protein.get_id(),\n",
    "            protein.get_label(),\n",
    "            protein.get_properties(),\n",
    "        )\n",
    "entities = node_generator(proteins)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The concept of an adapter can become arbitrarily complex and involve\n",
    "programmatic access to databases, API requests, asynchronous queries, context\n",
    "managers, and other complicating factors. However, it always boils down to\n",
    "providing the BioCypher driver with a collection of tuples, one for each entity\n",
    "in the input data. For more info, see the section on\n",
    "[Adapters](../adapters.md).\n",
    "\n",
    "As descibed above, *nodes* possess:\n",
    "\n",
    "- a mandatory ID,\n",
    "- a mandatory label, and\n",
    "- a property dictionary,\n",
    "\n",
    "while *edges* possess:\n",
    "\n",
    "- an (optional) ID,\n",
    "- two mandatory IDs for source and target,\n",
    "- a mandatory label, and\n",
    "- a property dictionary.\n",
    "\n",
    "How these entities are mapped to the ontological hierarchy underlying a\n",
    "BioCypher graph is determined by their mandatory labels, which connect the input\n",
    "data stream to the schema configuration. This we will see in the following\n",
    "section."
   ]
  },
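  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The node tuples above have an edge counterpart. The following is a minimal sketch, not part of the tutorial files: the `Interaction` class and the `interacts_with` label are hypothetical stand-ins for a real input source. The generator yields one five-element tuple per edge, in the order listed above (optional ID, source ID, target ID, label, properties)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dataclasses import dataclass, field\n",
    "from typing import Optional\n",
    "\n",
    "\n",
    "@dataclass\n",
    "class Interaction:\n",
    "    # Hypothetical stand-in for an interaction record from an input source\n",
    "    source_id: str\n",
    "    target_id: str\n",
    "    label: str = 'interacts_with'  # assumed label, for illustration only\n",
    "    edge_id: Optional[str] = None  # edge IDs are optional\n",
    "    properties: dict = field(default_factory=dict)\n",
    "\n",
    "\n",
    "def edge_generator(interactions):\n",
    "    # Yield one tuple per edge: (ID, source ID, target ID, label, properties)\n",
    "    for interaction in interactions:\n",
    "        yield (\n",
    "            interaction.edge_id,\n",
    "            interaction.source_id,\n",
    "            interaction.target_id,\n",
    "            interaction.label,\n",
    "            interaction.properties,\n",
    "        )\n",
    "\n",
    "\n",
    "example_edges = list(edge_generator([\n",
    "    Interaction('F7V4U2', 'K2Y8U3'),\n",
    "    Interaction('K2Y8U3', 'L1V6V9', edge_id='int1', properties={'score': 0.9}),\n",
    "]))\n",
    "example_edges"
   ]
  },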
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Schema configuration\n",
    "How each BioCypher graph is structured is determined by the schema configuration\n",
    "YAML file that is given to the BioCypher interface. This also serves to ground\n",
    "the entities of the graph in the biomedical realm by using an ontological\n",
    "hierarchy. In this tutorial, we refer to the Biolink model as the general\n",
    "backbone of our ontological hierarchy. The basic premise of the schema\n",
    "configuration YAML file is that each component of the desired knowledge graph\n",
    "output should be configured here; if (and only if) an entity is represented in\n",
    "the schema configuration *and* is present in the input data stream, will it be\n",
    "part of our knowledge graph.\n",
    "\n",
    "In our case, since we only import proteins, we only require few lines of\n",
    "configuration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------\n",
      "protein:\n",
      "    represented_as: node\n",
      "    preferred_id: uniprot\n",
      "    input_label: uniprot_protein\n",
      "\n",
      "--------------\n"
     ]
    }
   ],
   "source": [
    "print_yaml('01_schema_config.yaml')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first line (`protein`) identifies our entity and connects to the ontological\n",
    "backbone; here we define the first class to be represented in the graph. In the\n",
    "configuration YAML, we represent entities — similar to the internal\n",
    "representation of Biolink — in lower sentence case (e.g., \"small molecule\").\n",
    "Conversely, for class names, in file names, and property graph labels, we use\n",
    "PascalCase instead (e.g., \"SmallMolecule\") to avoid issues with handling spaces.\n",
    "The transformation is done by BioCypher internally. BioCypher does not strictly\n",
    "enforce the entities allowed in this class definition; in fact, we provide\n",
    "[several methods of extending the existing ontological backbone *ad hoc* by\n",
    "providing custom inheritance or hybridising\n",
    "ontologies](https://biocypher.org/tutorial-ontology.html#model-extensions).\n",
    "However, every entity should at some point be connected to the underlying\n",
    "ontology, otherwise the multiple hierarchical labels will not be populated.\n",
    "Following this first line are three indented values of the protein class.\n",
    "\n",
    "The second line (`represented_as`) tells BioCypher in which way each entity\n",
    "should be represented in the graph; the only options are `node` and `edge`.\n",
    "Representation as an edge is only possible when source and target IDs are\n",
    "provided in the input data stream. Conversely, relationships can be represented\n",
    "as both `node` or `edge`, depending on the desired output. When a relationship\n",
    "should be represented as a node, i.e., \"reified\", BioCypher takes care to create\n",
    "a set of two edges and a node in place of the relationship. This is useful when\n",
    "we want to connect the relationship to other entities in the graph, for example\n",
    "literature references.\n",
    "\n",
    "The third line (`preferred_id`) informs the uniqueness of represented entities\n",
    "by selecting an ontological namespace around which the definition of uniqueness\n",
    "should revolve. In our example, if a protein has its own uniprot ID, it is\n",
    "understood to be a unique entity. When there are multiple protein isoforms\n",
    "carrying the same uniprot ID, they are understood to be aggregated to result in\n",
    "only one unique entity in the graph. Decisions around uniqueness of graph\n",
    "constituents sometimes require some consideration in task-specific\n",
    "applications. Selection of a namespace also has effects in identifier mapping;\n",
    "in our case, for protein nodes that do not carry a uniprot ID, identifier\n",
    "mapping will attempt to find a uniprot ID given the other identifiers of that\n",
    "node. To account for the broadest possible range of identifier systems while\n",
    "also dealing with parsing of namespace prefixes and validation, we refer to the\n",
    "[Bioregistry](https://bioregistry.io) project namespaces, which should be\n",
    "preferred values for this field.\n",
    "\n",
    "Finally, the fourth line (`input_label`) connects the input data stream to the\n",
    "configuration; here we indicate which label to expect in the input tuple for\n",
    "each class in the graph. In our case, we expect \"uniprot_protein\" as the label\n",
    "for each protein in the input data stream; all other input entities that do not\n",
    "carry this label are ignored as long as they are not in the schema\n",
    "configuration."
   ]
  },
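  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate the naming convention described above, here is a minimal sketch of a conversion from lower sentence case to PascalCase. The function `to_pascal_case` is a hypothetical helper written for this notebook; it is not BioCypher's internal implementation, which may treat edge cases (such as acronyms) differently."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def to_pascal_case(name):\n",
    "    # Capitalise each whitespace-separated word and join the words without spaces\n",
    "    return ''.join(word.capitalize() for word in name.split())\n",
    "\n",
    "\n",
    "print(to_pascal_case('small molecule'))\n",
    "print(to_pascal_case('protein'))"
   ]
  },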
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating the graph (using the BioCypher interface)\n",
    "All that remains to be done now is to instantiate the BioCypher interface (as the\n",
    "main means of communicating with BioCypher) and call the function to create the\n",
    "graph. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO -- Loading ontologies...\n",
      "INFO -- Instantiating OntologyAdapter class for https://github.com/biolink/biolink-model/raw/v3.2.1/biolink-model.owl.ttl.\n"
     ]
    }
   ],
   "source": [
    "from biocypher import BioCypher\n",
    "bc = BioCypher(\n",
    "    biocypher_config_path='01_biocypher_config.yaml',\n",
    "    schema_config_path='01_schema_config.yaml',\n",
    ")\n",
    "# Add the entities that we generated above to the graph\n",
    "bc.add(entities)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'protein':   protein                                           sequence  \\\n",
       " 0  F7V4U2  RMFDDRFPVELRICTGSLVIINLGEFAEQHDKQDGSKPSHQPMFAT...   \n",
       " 1  K2Y8U3  HWPPSGVSCGVFPECWYRWRDEQWACFGPHIKYNKDNTWSWAQWMH...   \n",
       " 2  L1V6V9  QAEPKYKLAQENCRVQIKLPKIVGTCRPHWMTKTYHVLHTCVLWKS...   \n",
       " \n",
       "            description taxon      id preferred_id  \n",
       " 0  i f c m m q e o o s  9606  F7V4U2      uniprot  \n",
       " 1  e y p g j t j y r x  9606  K2Y8U3      uniprot  \n",
       " 2  a i b t l j e g n j  9606  L1V6V9      uniprot  }"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Print the graph as a dictionary of pandas DataFrame(s) per node label\n",
    "bc.to_df()[\"protein\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 2: Merging data\n",
    "\n",
    "### Plain merge\n",
    "\n",
    "Using the workflow described above with minor changes, we can merge data from\n",
    "different input streams. If we do not want to introduce additional ontological\n",
    "subcategories, we can simply add the new input stream to the existing one and\n",
    "add the new label to the schema configuration (the new label being\n",
    "`entrez_protein`). In this case, we would add the following to the schema\n",
    "configuration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from data_generator import Protein, EntrezProtein"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------\n",
      "protein:\n",
      "    represented_as: node\n",
      "    preferred_id: uniprot\n",
      "    input_label:\n",
      "    - uniprot_protein\n",
      "    - entrez_protein\n",
      "\n",
      "--------------\n"
     ]
    }
   ],
   "source": [
    "print_yaml('02_schema_config.yaml')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO -- Loading ontologies...\n",
      "INFO -- Instantiating OntologyAdapter class for https://github.com/biolink/biolink-model/raw/v3.2.1/biolink-model.owl.ttl.\n"
     ]
    }
   ],
   "source": [
    "# Create a list of proteins to be imported\n",
    "proteins = [\n",
    "    p for sublist in zip(\n",
    "        [Protein() for _ in range(n_proteins)],\n",
    "        [EntrezProtein() for _ in range(n_proteins)],\n",
    "    ) for p in sublist\n",
    "]\n",
    "# Create a new BioCypher instance\n",
    "bc = BioCypher(\n",
    "    biocypher_config_path='02_biocypher_config.yaml',\n",
    "    schema_config_path='02_schema_config.yaml',\n",
    ")\n",
    "# Run the import\n",
    "bc.add(node_generator(proteins))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'protein':   protein                                           sequence  \\\n",
       " 0  K2W3K5  TVKISILFNPLPNQDMNTTTCQAESNYKAIYLYPWCSMDDVWNVEA...   \n",
       " 1  186009  FHYHGGMGPFMTYQNFLHWEQMQPMKLFNEPMQFHDWYGTHVNWPG...   \n",
       " 2  S6E6D1  CSVQIQIGMSQDSPDSSEGNMDCPPRNIGGYEIVCNVQGKRCYSTD...   \n",
       " 3  926766  HKEAELLVKGQIQTPKCLRHNHFYAKLTIVIELNYMVDRYGKDMAR...   \n",
       " 4  Z1F6R2  FMVWKDCLCIRMRHMAVPVPQYHCEYFEVILERWEVPCFSVLNRCK...   \n",
       " 5  362641  PISDEQEMGSEFCGHCNTGVYQVEMHFFECEDLNPKVQPKWIFTVT...   \n",
       " \n",
       "            description taxon      id preferred_id  \n",
       " 0  e e v h x f t f j l  9606  K2W3K5      uniprot  \n",
       " 1  b c q m l d a u u g  9606  186009      uniprot  \n",
       " 2  i z t s l x v g j l  9606  S6E6D1      uniprot  \n",
       " 3  t n a j d l j a t a  9606  926766      uniprot  \n",
       " 4  h d m k q n r e h r  9606  Z1F6R2      uniprot  \n",
       " 5  l m x k h m v g p y  9606  362641      uniprot  }"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bc.to_df()[\"protein\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This again creates a single DataFrame, now for both protein types, but now including\n",
    "both input streams (you should note both uniprot & entrez style IDs in the id column). However, we are generating our `entrez`\n",
    "proteins as having entrez IDs, which could result in problems in querying.\n",
    "Additionally, a strict import mode including regex pattern matching of\n",
    "identifiers will fail at this point due to the difference in pattern of UniProt\n",
    "vs. Entrez IDs. This issue could be resolved by mapping the Entrez IDs to\n",
    "UniProt IDs, but we will instead use the opportunity to demonstrate how to\n",
    "merge data from different sources into the same ontological class using *ad\n",
    "hoc* subclasses."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### *Ad hoc* subclassing"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the previous section, we saw how to merge data from different sources into\n",
    "the same ontological class. However, we did not resolve the issue of the\n",
    "`entrez` proteins living in a different namespace than the `uniprot` proteins,\n",
    "which could result in problems in querying. In proteins, it would probably be\n",
    "more appropriate to solve this problem using identifier mapping, but in other\n",
    "categories, e.g., pathways, this may not be possible because of a lack of\n",
    "one-to-one mapping between different data sources. Thus, if we so desire, we\n",
    "can merge datasets into the same ontological class by creating *ad hoc*\n",
    "subclasses implicitly through BioCypher, by providing multiple preferred\n",
    "identifiers. In our case, we update our schema configuration as follows:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------\n",
      "protein:\n",
      "    represented_as: node\n",
      "    preferred_id:\n",
      "    - uniprot\n",
      "    - entrez\n",
      "    input_label:\n",
      "    - uniprot_protein\n",
      "    - entrez_protein\n",
      "\n",
      "--------------\n"
     ]
    }
   ],
   "source": [
    "print_yaml('03_schema_config.yaml')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will \"implicitly\" create two subclasses of the `protein` class, which will\n",
    "inherit the entire hierarchy of the `protein` class. The two subclasses will be\n",
    "named using a combination of their preferred namespace and the name of the\n",
    "parent class, separated by a dot, i.e., `uniprot.protein` and `entrez.protein`.\n",
    "In this manner, they can be identified as proteins regardless of their sources\n",
    "by any queries for the generic `protein` class, while still carrying\n",
    "information about their namespace and avoiding identifier conflicts.\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "The only change affected upon the code from the previous section is the\n",
    "referral to the updated schema configuration file.\n",
    "</div>\n",
    "\n",
    "<div class=\"alert alert-success\">\n",
    "In the output, we now generate two separate files for the `protein` class, one\n",
    "for each subclass (with names in PascalCase).\n",
    "</div>\n",
    "\n",
    "Let's create a DataFrame with the same nodes as above, but with a different schema configuration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO -- Loading ontologies...\n",
      "INFO -- Instantiating OntologyAdapter class for https://github.com/biolink/biolink-model/raw/v3.2.1/biolink-model.owl.ttl.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'uniprot.protein':   uniprot.protein                                           sequence  \\\n",
       " 0          K2W3K5  TVKISILFNPLPNQDMNTTTCQAESNYKAIYLYPWCSMDDVWNVEA...   \n",
       " 1          S6E6D1  CSVQIQIGMSQDSPDSSEGNMDCPPRNIGGYEIVCNVQGKRCYSTD...   \n",
       " 2          Z1F6R2  FMVWKDCLCIRMRHMAVPVPQYHCEYFEVILERWEVPCFSVLNRCK...   \n",
       " \n",
       "            description taxon      id preferred_id  \n",
       " 0  e e v h x f t f j l  9606  K2W3K5      uniprot  \n",
       " 1  i z t s l x v g j l  9606  S6E6D1      uniprot  \n",
       " 2  h d m k q n r e h r  9606  Z1F6R2      uniprot  ,\n",
       " 'entrez.protein':   entrez.protein                                           sequence  \\\n",
       " 0         186009  FHYHGGMGPFMTYQNFLHWEQMQPMKLFNEPMQFHDWYGTHVNWPG...   \n",
       " 1         926766  HKEAELLVKGQIQTPKCLRHNHFYAKLTIVIELNYMVDRYGKDMAR...   \n",
       " 2         362641  PISDEQEMGSEFCGHCNTGVYQVEMHFFECEDLNPKVQPKWIFTVT...   \n",
       " \n",
       "            description taxon      id preferred_id  \n",
       " 0  b c q m l d a u u g  9606  186009       entrez  \n",
       " 1  t n a j d l j a t a  9606  926766       entrez  \n",
       " 2  l m x k h m v g p y  9606  362641       entrez  }"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bc = BioCypher(\n",
    "    biocypher_config_path='03_biocypher_config.yaml',\n",
    "    schema_config_path='03_schema_config.yaml',\n",
    ")\n",
    "bc.add(node_generator(proteins))\n",
    "for name, df in bc.to_df().items():\n",
    "    print(name)\n",
    "    display(df)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we see two separate DataFrames, one for each subclass of the `protein` class."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 3: Handling properties\n",
    "While ID and label are mandatory components of our knowledge graph, properties\n",
    "are optional and can include different types of information on the entities. In\n",
    "source data, properties are represented in arbitrary ways, and designations\n",
    "rarely overlap even for the most trivial of cases (spelling differences,\n",
    "formatting, etc). Additionally, some data sources contain a large wealth of\n",
    "information about entities, most of which may not be needed for the given task.\n",
    "Thus, it is often desirable to filter out properties that are not needed to\n",
    "save time, disk space, and memory.\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "\n",
    "Maintaining consistent properties per entity type is particularly important\n",
    "when using the admin import feature of Neo4j, which requires consistency\n",
    "between the header and data files. Properties that are introduced into only\n",
    "some of the rows will lead to column misalignment and import failure. In\n",
    "\"online mode\", this is not an issue.\n",
    "\n",
    "</div>\n",
    "\n",
    "We will take a look at how to handle property selection in BioCypher in a\n",
    "way that is flexible and easy to maintain."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Designated properties\n",
    "\n",
    "The simplest and most straightforward way to ensure that properties are\n",
    "consistent for each entity type is to designate them explicitly in the schema\n",
    "configuration. This is done by adding a `properties` key to the entity type\n",
    "configuration. The value of this key is another dictionary, where in the\n",
    "standard case the keys are the names of the properties that the entity type\n",
    "should possess, and the values give the type of the property. Possible values\n",
    "are:\n",
    "\n",
    "- `str` (or `string`),\n",
    "\n",
    "- `int` (or `integer`, `long`),\n",
    "\n",
    "- `float` (or `double`, `dbl`),\n",
    "\n",
    "- `bool` (or `boolean`),\n",
    "\n",
    "- arrays of any of these types (indicated by square brackets, e.g. `string[]`).\n",
    "\n",
    "In the case of properties that are not present in (some of) the source data,\n",
    "BioCypher will add them to the output with a default value of `None`.\n",
    "Additional properties in the input that are not represented in these designated\n",
    "property names will be ignored. Let's imagine that some, but not all, of our\n",
    "protein nodes have a `mass` value. If we want to include the mass value on all\n",
    "proteins, we can add the following to our schema configuration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------\n",
      "protein:\n",
      "    represented_as: node\n",
      "    preferred_id:\n",
      "    - uniprot\n",
      "    - entrez\n",
      "    input_label:\n",
      "    - uniprot_protein\n",
      "    - entrez_protein\n",
      "    properties:\n",
      "        sequence: str\n",
      "        description: str\n",
      "        taxon: str\n",
      "        mass: int\n",
      "\n",
      "--------------\n"
     ]
    }
   ],
   "source": [
    "print_yaml('04_schema_config.yaml')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will add the `mass` property to all proteins (in addition to the three we\n",
    "had before); if not encountered, the column will be empty. Implicit subclasses\n",
    "will automatically inherit the property configuration; in this case, both\n",
    "`uniprot.protein` and `entrez.protein` will have the `mass` property, even\n",
    "though the `entrez` proteins do not have a `mass` value in the input data.\n",
    "\n",
    "<div class=\"alert alert-info\">\n",
    "If we wanted to ignore the mass value for all properties, we could simply\n",
    "remove the `mass` key from the `properties` dictionary.\n",
    "</div>"
   ]
  },
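  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The behaviour of designated properties can be sketched in plain Python. The helper `apply_designated_properties` below is hypothetical and only mimics the behaviour described above (designated keys are kept, missing ones are filled with `None`, undesignated ones are dropped); it is not BioCypher's implementation, and the property values are made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def apply_designated_properties(properties, designated):\n",
    "    # Keep only designated keys; keys missing from the input default to None\n",
    "    return {key: properties.get(key) for key in designated}\n",
    "\n",
    "\n",
    "designated = ['sequence', 'description', 'taxon', 'mass']\n",
    "\n",
    "# Made-up inputs: one with a mass and an undesignated key, one without a mass\n",
    "with_mass = {'sequence': 'MKT', 'description': 'a b c', 'taxon': '9606', 'mass': 9364, 'extra': 1}\n",
    "without_mass = {'sequence': 'MKV', 'description': 'd e f', 'taxon': '9606'}\n",
    "\n",
    "kept = apply_designated_properties(with_mass, designated)\n",
    "filled = apply_designated_properties(without_mass, designated)\n",
    "print(kept)\n",
    "print(filled)"
   ]
  },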
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "from data_generator import EntrezProtein, RandomPropertyProtein"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO -- Loading ontologies...\n",
      "INFO -- Instantiating OntologyAdapter class for https://github.com/biolink/biolink-model/raw/v3.2.1/biolink-model.owl.ttl.\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'uniprot.protein':   uniprot.protein                                           sequence  \\\n",
       " 0          S1Z9L5  RHLRGDVMQEDHHTSSERMVYNVLPQDYKVVSCEYWNTQVTALWVI...   \n",
       " 1          W9J5F1  IPFSQSAWAQQRIGPKGTKAHGVTQPAPMDIKNLCNLTDLTLILDF...   \n",
       " 2          T1J3U0  WFGCCHKQYVSHVIDRQDPQSPSDNPSLVSQLQFFMWGIQIQNGEI...   \n",
       " \n",
       "            description taxon  mass      id preferred_id  \n",
       " 0  u x e o k m a i o s  3899  None  S1Z9L5      uniprot  \n",
       " 1  i x k c r b p d d p  8873  None  W9J5F1      uniprot  \n",
       " 2  m a w r r u x c w o  1966  9364  T1J3U0      uniprot  ,\n",
       " 'entrez.protein':   entrez.protein                                           sequence  \\\n",
       " 0         405878  RMTDGFEWQLDFHAFIWCNQAAWQLPLEVHISQGNGGWRMGLYGNM...   \n",
       " 1         154167  CGMNYDNGYFSVAYQSYDLWYHQQLKTRGVKPAEKDSDKDLGIDVI...   \n",
       " 2         234189  GQWQECIQGFTPQQMCVDCCAETKLANKSYYHSWMTWRLSGLCFNM...   \n",
       " \n",
       "            description taxon  mass      id preferred_id  \n",
       " 0  y c s v s n e c h o  9606  None  405878       entrez  \n",
       " 1  i k n c e n r n c d  9606  None  154167       entrez  \n",
       " 2  o v w y g h y e v y  9606  None  234189       entrez  }"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a list of proteins to be imported (now with properties)\n",
    "proteins = [\n",
    "    p for sublist in zip(\n",
    "        [RandomPropertyProtein() for _ in range(n_proteins)],\n",
    "        [EntrezProtein() for _ in range(n_proteins)],\n",
    "    ) for p in sublist\n",
    "]\n",
    "# New instance, populated, and to DataFrame\n",
    "bc = BioCypher(\n",
    "    biocypher_config_path='04_biocypher_config.yaml',\n",
    "    schema_config_path='04_schema_config.yaml',\n",
    ")\n",
    "bc.add(node_generator(proteins))\n",
    "for name, df in bc.to_df().items():\n",
    "    print(name)\n",
    "    display(df)"
   ]
  },
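  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `mass` is now part of the schema, it shows up as a column even where no\n",
    "value was supplied. As a minimal sketch in plain pandas (the frame below is a\n",
    "hand-written stand-in that copies a few values from the output above, not the\n",
    "result of `bc.to_df()`), empty entries could be counted or filtered like this:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hand-written stand-in for the 'uniprot.protein' DataFrame shown above\n",
    "df = pd.DataFrame({\n",
    "    'uniprot.protein': ['S1Z9L5', 'W9J5F1', 'T1J3U0'],\n",
    "    'taxon': ['3899', '8873', '1966'],\n",
    "    'mass': [None, None, 9364],\n",
    "})\n",
    "\n",
    "# Count proteins without a mass value\n",
    "n_missing = int(df['mass'].isna().sum())\n",
    "\n",
    "# Keep only proteins that do have a mass value\n",
    "with_mass = df.dropna(subset=['mass'])\n",
    "\n",
    "print(f'{n_missing} of {len(df)} proteins have no mass')\n",
    "```\n",
    "\n",
    "The same calls should carry over to the DataFrames produced in this tutorial, assuming they behave like ordinary pandas objects."
   ]
  },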
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Inheriting properties\n",
    "Sometimes, explicit designation of properties requires a lot of maintenance\n",
    "work, particularly for classes with many properties. In these cases, it may be\n",
    "more convenient to inherit properties from a parent class. This is done by\n",
    "adding a `properties` key to a suitable parent class configuration, and then\n",
    "defining inheritance via the `is_a` key in the child class configuration and\n",
    "setting the `inherit_properties` key to `true`.\n",
    "\n",
    "Let's say we have an additional `protein isoform` class, which can reasonably\n",
    "inherit from `protein` and should carry the same properties as the parent. We\n",
    "can add the following to our schema configuration:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "from data_generator import RandomPropertyProteinIsoform"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------\n",
      "protein:\n",
      "    represented_as: node\n",
      "    preferred_id:\n",
      "    - uniprot\n",
      "    - entrez\n",
      "    input_label:\n",
      "    - uniprot_protein\n",
      "    - entrez_protein\n",
      "    properties:\n",
      "        sequence: str\n",
      "        description: str\n",
      "        taxon: str\n",
      "        mass: int\n",
      "protein isoform:\n",
      "    is_a: protein\n",
      "    inherit_properties: true\n",
      "    represented_as: node\n",
      "    preferred_id: uniprot\n",
      "    input_label: uniprot_isoform\n",
      "\n",
      "--------------\n"
     ]
    }
   ],
   "source": [
    "print_yaml('05_schema_config.yaml')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This allows maintenance of property lists for many classes at once. If the child\n",
    "class has properties already, they will be kept (if they are not present in the\n",
    "parent class) or replaced by the parent class properties (if they are present).\n",
    "\n",
    "Again, apart from adding the protein isoforms to the input stream, the code\n",
    "for this example is identical to the previous one except for the reference to\n",
    "the updated schema configuration.\n",
    "\n",
    "We now create three separate DataFrames, all of which are children of the\n",
    "`protein` class; two implicit children (`uniprot.protein` and `entrez.protein`)\n",
    "and one explicit child (`protein isoform`).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO -- Loading ontologies...\n",
      "INFO -- Instantiating OntologyAdapter class for https://github.com/biolink/biolink-model/raw/v3.2.1/biolink-model.owl.ttl.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "uniprot.protein\n",
      "  uniprot.protein                                           sequence  \\\n",
      "0          A9L6G4  SWIVVGQPDSHNKRLVNYHWMRCEHPLRCWRPIYVVRVSFQSQCEQ...   \n",
      "1          E4N2H2  PGVMILDNMQHKCSKELSTRQIITNHWICNSAPISWSSGMDRSCLD...   \n",
      "2          V4F1T1  DQCHNLCPGSSFQCPENAFGNDWIDHMPQETGLMQYDDPQSGMWFT...   \n",
      "\n",
      "           description taxon  mass      id preferred_id  \n",
      "0  m o k j a f w v w r  4220  None  A9L6G4      uniprot  \n",
      "1  n v i r s f m f d w  6339  6481  E4N2H2      uniprot  \n",
      "2  w e v v a b o b b u  9176  6510  V4F1T1      uniprot  \n",
      "protein isoform\n",
      "  protein isoform                                           sequence  \\\n",
      "0          F0N9A4  QDVVLVEGCGDEGWIHMPEKRPGQAYKWCERFRPIPDFTNSIKIAY...   \n",
      "1          B1W6O2  SQKHFRRWWTNDCFGQELMSIYYNVKFWDNLIEMTGGPASRVCLGQ...   \n",
      "2          G6V5R9  ASAITPFSYEKPHTVTLDATEVFPKMQDAQAIEREIHFSKSTLVYG...   \n",
      "\n",
      "           description taxon  mass      id preferred_id  \n",
      "0  r f e a v a a g w r  8061  None  F0N9A4      uniprot  \n",
      "1  a c a v v k v k c w  6786  None  B1W6O2      uniprot  \n",
      "2  c k g d a l f r t v  6868  1323  G6V5R9      uniprot  \n",
      "entrez.protein\n",
      "  entrez.protein                                           sequence  \\\n",
      "0          52329  DYRSMAPTFILMKIYPACDAITKRRWSVATVKDGEFIWWSAVKIFP...   \n",
      "1         581107  LLVFNMGQLAVAGYGNTMVSAMMCFCCDVKARMGMSWLPKITTMQW...   \n",
      "2         270569  MVCSHHELAVAFQTMCPIQGDAATAKANAHRTTDKQNWMVVKWFRT...   \n",
      "\n",
      "           description taxon  mass      id preferred_id  \n",
      "0  q k r b h g t q x x  9606  None   52329       entrez  \n",
      "1  h f g z j r b g m w  9606  None  581107       entrez  \n",
      "2  s b p v f u t y g v  9606  None  270569       entrez  \n"
     ]
    }
   ],
   "source": [
    "# create a list of proteins to be imported\n",
    "proteins = [\n",
    "    p for sublist in zip(\n",
    "        [RandomPropertyProtein() for _ in range(n_proteins)],\n",
    "        [RandomPropertyProteinIsoform() for _ in range(n_proteins)],\n",
    "        [EntrezProtein() for _ in range(n_proteins)],\n",
    "    ) for p in sublist\n",
    "]\n",
    "\n",
    "# Create BioCypher driver\n",
    "bc = BioCypher(\n",
    "    biocypher_config_path='05_biocypher_config.yaml',\n",
    "    schema_config_path='05_schema_config.yaml',\n",
    ")\n",
    "# Run the import\n",
    "bc.add(node_generator(proteins))\n",
    "\n",
    "for name, df in bc.to_df().items():\n",
    "    print(name)\n",
    "    display(df)"
   ]
  },
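  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The merge rule for inherited properties described earlier in this section\n",
    "(child-only properties are kept, shared ones take the parent's definition) can\n",
    "be illustrated with plain dictionaries. This is only a sketch of the stated\n",
    "behaviour, not BioCypher's actual implementation; the function name\n",
    "`merge_inherited_properties`, the child property `isoform_index`, and the\n",
    "`float` type for `mass` are made up for illustration.\n",
    "\n",
    "```python\n",
    "def merge_inherited_properties(parent, child):\n",
    "    # Later keys win in a dict merge, so parent definitions\n",
    "    # override child definitions of the same name\n",
    "    return {**child, **parent}\n",
    "\n",
    "# Parent properties as listed for `protein` in the schema above\n",
    "parent_props = {'sequence': 'str', 'description': 'str', 'taxon': 'str', 'mass': 'int'}\n",
    "# Hypothetical child configuration (not part of the tutorial schema)\n",
    "child_props = {'mass': 'float', 'isoform_index': 'int'}\n",
    "\n",
    "merged = merge_inherited_properties(parent_props, child_props)\n",
    "print(merged)\n",
    "```\n",
    "\n",
    "Here `mass` ends up with the parent's type, while `isoform_index` survives because the parent does not define it."
   ]
  },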
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 4: Handling relationships\n",
    "\n",
    "Naturally, we do not only want nodes in our knowledge graph, but also edges. In\n",
    "BioCypher, the configuration of relationships is very similar to that of nodes,\n",
    "with some key differences. First the similarities: the top-level class\n",
    "configuration of edges is the same; class names refer to ontological classes or\n",
    "are an extension thereof. Similarly, the `is_a` key is used to define\n",
    "inheritance, and the `inherit_properties` key is used to inherit properties from\n",
    "a parent class. Relationships also possess a `preferred_id` key, an\n",
    "`input_label` key, and a `properties` key, which work in the same way as for\n",
    "nodes.\n",
    "\n",
    "Relationships also have a `represented_as` key, which in this case can be\n",
    "either `node` or `edge`. The `node` option is used to \"reify\" the relationship\n",
    "in order to be able to connect it to other nodes in the graph. In addition to\n",
    "the configuration of nodes, relationships also have fields for the `source` and\n",
    "`target` node types, which refer to the ontological classes of the respective\n",
    "nodes, and are currently optional.\n",
    "\n",
    "To add protein-protein interactions to our graph, we can modify the schema configuration above to the following:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------\n",
      "protein:\n",
      "    represented_as: node\n",
      "    preferred_id:\n",
      "    - uniprot\n",
      "    - entrez\n",
      "    input_label:\n",
      "    - uniprot_protein\n",
      "    - entrez_protein\n",
      "    properties:\n",
      "        sequence: str\n",
      "        description: str\n",
      "        taxon: str\n",
      "        mass: int\n",
      "protein isoform:\n",
      "    is_a: protein\n",
      "    inherit_properties: true\n",
      "    represented_as: node\n",
      "    preferred_id: uniprot\n",
      "    input_label: uniprot_isoform\n",
      "protein protein interaction:\n",
      "    is_a: pairwise molecular interaction\n",
      "    represented_as: edge\n",
      "    preferred_id: intact\n",
      "    input_label: interacts_with\n",
      "    properties:\n",
      "        method: str\n",
      "        source: str\n",
      "\n",
      "--------------\n"
     ]
    }
   ],
   "source": [
    "print_yaml('06_schema_config_pandas.yaml')"
   ]
  },
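  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The schema above keeps the interaction as an edge. To illustrate what the\n",
    "`node` option of `represented_as` means conceptually, here is a small sketch of\n",
    "reification with plain tuples and dictionaries. It does not use BioCypher; the\n",
    "function `reify`, the connecting edge labels `IS_SOURCE_OF` and `IS_TARGET_OF`,\n",
    "and the example identifiers are assumptions chosen for illustration.\n",
    "\n",
    "```python\n",
    "def reify(edge_id, source_id, target_id, label, properties):\n",
    "    # The relationship becomes a node that carries the properties ...\n",
    "    node = (edge_id, label, properties)\n",
    "    # ... and two plain edges connect it to the original endpoints\n",
    "    edges = [\n",
    "        (source_id, edge_id, 'IS_SOURCE_OF'),\n",
    "        (edge_id, target_id, 'IS_TARGET_OF'),\n",
    "    ]\n",
    "    return node, edges\n",
    "\n",
    "# Example values only; any interaction tuple would do\n",
    "node, edges = reify(\n",
    "    'intact703256', 'A9L6G4', 'V4F1T1',\n",
    "    'protein protein interaction', {'source': 'signor'},\n",
    ")\n",
    "print(node)\n",
    "print(edges)\n",
    "```\n",
    "\n",
    "Because the interaction now has its own identifier as a node, further nodes (for instance a publication) could be attached to it, which a plain edge does not allow."
   ]
  },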
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have added `protein protein interaction` as an edge, we have to simulate some interactions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "from data_generator import InteractionGenerator\n",
    "\n",
    "# Simulate edges for proteins we defined above\n",
    "ppi = InteractionGenerator(\n",
    "    interactors=[p.get_id() for p in proteins],\n",
    "    interaction_probability=0.05,\n",
    ").generate_interactions()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'A9L6G4 interacts_with V4F1T1'"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# naturally interactions/edges contain information about the interacting source and target nodes\n",
    "# let's look at the first one in the list\n",
    "interaction = ppi[0]\n",
    "f\"{interaction.get_source_id()} {interaction.label} {interaction.get_target_id()}\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'source': 'signor', 'method': 'u z c x m d c u g s'}"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# similarly to nodes, it also has a dictionary of properties\n",
    "interaction.get_properties()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As with nodes, we add first createa a new BioCypher instance, and then populate it with nodes as well as edges:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "bc = BioCypher(\n",
    "    biocypher_config_path='06_biocypher_config.yaml',\n",
    "    schema_config_path='06_schema_config_pandas.yaml',\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO -- Loading ontologies...\n",
      "INFO -- Instantiating OntologyAdapter class for https://github.com/biolink/biolink-model/raw/v3.2.1/biolink-model.owl.ttl.\n"
     ]
    }
   ],
   "source": [
    "# Extract id, source, target, label, and property dictionary\n",
    "def edge_generator(ppi):\n",
    "    for interaction in ppi:\n",
    "        yield (\n",
    "            interaction.get_id(),\n",
    "            interaction.get_source_id(),\n",
    "            interaction.get_target_id(),\n",
    "            interaction.get_label(),\n",
    "            interaction.get_properties(),\n",
    "        )\n",
    "\n",
    "bc.add(node_generator(proteins))\n",
    "bc.add(edge_generator(ppi))\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at the interaction DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>protein protein interaction</th>\n",
       "      <th>_from</th>\n",
       "      <th>_to</th>\n",
       "      <th>source</th>\n",
       "      <th>method</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>intact703256</td>\n",
       "      <td>A9L6G4</td>\n",
       "      <td>V4F1T1</td>\n",
       "      <td>signor</td>\n",
       "      <td>u z c x m d c u g s</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>None</td>\n",
       "      <td>E4N2H2</td>\n",
       "      <td>F0N9A4</td>\n",
       "      <td>intact</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  protein protein interaction   _from     _to  source               method\n",
       "0                intact703256  A9L6G4  V4F1T1  signor  u z c x m d c u g s\n",
       "1                        None  E4N2H2  F0N9A4  intact                 None"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bc.to_df()[\"protein protein interaction\"]"
   ]
  },
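  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With nodes and edges both available as DataFrames, they can be combined with\n",
    "ordinary pandas operations. The following is a minimal sketch that attaches the\n",
    "taxon of the source protein to each interaction. The two frames are\n",
    "hand-written stand-ins with values copied from the outputs above, and the\n",
    "`from_` column prefix is an arbitrary choice.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Stand-in for the interaction DataFrame\n",
    "edges = pd.DataFrame({\n",
    "    '_from': ['A9L6G4', 'E4N2H2'],\n",
    "    '_to': ['V4F1T1', 'F0N9A4'],\n",
    "    'source': ['signor', 'intact'],\n",
    "})\n",
    "\n",
    "# Stand-in for a node DataFrame\n",
    "nodes = pd.DataFrame({\n",
    "    'id': ['A9L6G4', 'E4N2H2', 'V4F1T1'],\n",
    "    'taxon': ['4220', '6339', '9176'],\n",
    "})\n",
    "\n",
    "# Left join keeps every interaction, even if the source node is unknown\n",
    "annotated = edges.merge(\n",
    "    nodes.add_prefix('from_'),\n",
    "    left_on='_from',\n",
    "    right_on='from_id',\n",
    "    how='left',\n",
    ")\n",
    "print(annotated[['_from', '_to', 'from_taxon']])\n",
    "```\n",
    "\n",
    "In the tutorial itself the nodes are spread over several DataFrames, so they would first have to be concatenated (for example with `pd.concat`) before a join like this."
   ]
  },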
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, it is worth noting that BioCypher relies on ontologies, which are machine readable representations of domains of knowledge that we use to ground the contents of our knowledge graphs. While details about ontologies are out of scope for this tutorial, and are described in detail in the [BioCypher documentation](https://biocypher.org/tutorial-ontology.html), we can still have a glimpse at the ontology that we used implicitly in this tutorial:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Showing ontology structure based on https://github.com/biolink/biolink-model/raw/v3.2.1/biolink-model.owl.ttl\n",
      "entity\n",
      "├── association\n",
      "│   └── gene to gene association\n",
      "│       └── pairwise gene to gene interaction\n",
      "│           └── pairwise molecular interaction\n",
      "│               └── protein protein interaction\n",
      "└── named thing\n",
      "    └── biological entity\n",
      "        └── polypeptide\n",
      "            └── protein\n",
      "                ├── entrez.protein\n",
      "                ├── protein isoform\n",
      "                └── uniprot.protein\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<treelib.tree.Tree at 0x7f7327b3a880>"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bc.show_ontology_structure()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "biocypher-Ca5VQ1YT-py3.9",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "2ff371c403bc11abbc3c8e1391dba3b01886d66becfc523eea8e3dc677d5b98e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}