biocypher.output.write.relational._postgresql._PostgreSQLBatchWriter

class biocypher.output.write.relational._postgresql._PostgreSQLBatchWriter(*args, **kwargs)

Class for writing node and edge representations to disk in the format PostgreSQL expects for use with “COPY FROM…”. Each batch writer instance has a fixed representation that needs to be passed at instantiation via the schema argument. The instance also expects an ontology adapter via ontology_adapter to be able to convert and extend the hierarchy.

This class inherits from the abstract class “_BatchWriter” and implements the PostgreSQL-specific methods:

  • _write_node_headers

  • _write_edge_headers

  • _construct_import_call

  • _write_array_string
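As a rough illustration of the “COPY FROM…” target format mentioned above, the following sketch builds a PostgreSQL COPY statement for a delimited file. The helper names, the table name, and the file path are hypothetical and are not part of BioCypher; the actual import call produced by this class may look different.

```python
def sql_literal(value: str) -> str:
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_statement(table: str, csv_path: str,
                         delimiter: str = ",", quote: str = '"',
                         header: bool = False) -> str:
    """Return a COPY ... FROM statement for a CSV file (hypothetical helper)."""
    options = [
        "FORMAT csv",
        f"DELIMITER {sql_literal(delimiter)}",
        f"QUOTE {sql_literal(quote)}",
    ]
    if header:
        options.append("HEADER")
    return f"COPY {table} FROM {sql_literal(csv_path)} WITH ({', '.join(options)});"


# Hypothetical table and file names, for illustration only.
statement = build_copy_statement("protein", "/tmp/Protein-part000.csv", delimiter=";")
print(statement)
```

Note that a server-side COPY reads the file on the database server, whereas psql's \copy reads it on the client; which one applies depends on how the generated import call is run.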

__init__(*args, **kwargs)

Abstract parent class for writing node and edge representations to disk using the format specified by each database type. The database-specific functions are implemented by the respective child classes. This abstract class contains all methods expected by a batch writer instance, some of which need to be overridden by the child classes.

Each batch writer instance has a fixed representation that needs to be passed at instantiation via the schema argument. The instance also expects an ontology adapter via ontology_adapter to be able to convert and extend the hierarchy.

Requires the following methods to be overridden by database-specific writer classes:

  • _write_node_headers

  • _write_edge_headers

  • _construct_import_call

  • _write_array_string

  • _get_import_script_name
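The division of labour described above (shared logic in an abstract parent, database-specific methods in child classes) can be sketched with Python's abc module. This is a minimal stand-in and not BioCypher's implementation: the class names, the describe_import method, and the script and file names are hypothetical, and only two of the five required methods are shown.

```python
from abc import ABC, abstractmethod


class SketchBatchWriter(ABC):
    """Hypothetical stand-in for an abstract batch writer."""

    def __init__(self, db_name: str):
        self.db_name = db_name

    @abstractmethod
    def _construct_import_call(self) -> str:
        """Return the database-specific import command."""

    @abstractmethod
    def _get_import_script_name(self) -> str:
        """Return the file name of the import script."""

    def describe_import(self) -> str:
        # Shared logic that relies on the database-specific hooks.
        return f"{self._get_import_script_name()}: {self._construct_import_call()}"


class SketchPostgresWriter(SketchBatchWriter):
    """Hypothetical child class that fills in the PostgreSQL-specific parts."""

    def _construct_import_call(self) -> str:
        # The SQL file name is an assumption made for this sketch.
        return f"psql -d {self.db_name} -f create_tables.sql"

    def _get_import_script_name(self) -> str:
        return "sketch-import-call.sh"


writer = SketchPostgresWriter(db_name="example_db")
print(writer.describe_import())
```

Instantiating SketchBatchWriter directly raises a TypeError, because its abstract methods have no implementation; a child class becomes usable only once every abstract method is overridden.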

Parameters:
  • translator – Instance of Translator to enable translation of nodes and manipulation of properties.

  • deduplicator – Instance of Deduplicator to enable deduplication of nodes and edges.

  • delimiter – The delimiter to use for the CSV files.

  • array_delimiter – The delimiter to use for array properties.

  • quote – The quote character to use for the CSV files.

  • output_directory – Path for exporting CSV files.

  • db_name – Name of the database that will be used in the generated commands.

  • import_call_bin_prefix – Path prefix for the admin import call binary.

  • import_call_file_prefix – Path prefix for the data files (headers and parts) in the import call.

  • wipe – Whether to force import (removing existing DB content). (Specific to Neo4j.)

  • strict_mode – Whether to enforce source, version, and license properties.

  • skip_bad_relationships – Whether to skip relationships that do not have a valid start and end node. (Specific to Neo4j.)

  • skip_duplicate_nodes – Whether to skip duplicate nodes. (Specific to Neo4j.)

  • db_user – The database user.

  • db_password – The database password.

  • db_host – The database host. Defaults to localhost.

  • db_port – The database port.

  • rdf_format – The format of RDF.

  • rdf_namespaces – The namespaces for RDF.
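To make the roles of delimiter, array_delimiter, and quote more concrete, here is a minimal sketch that writes rows with Python's csv module and renders list values as brace-wrapped strings. The function names and sample values are hypothetical, and the brace-wrapped form is an assumption based on PostgreSQL's array literal syntax rather than a statement of what _write_array_string produces.

```python
import csv
import io


def to_array_literal(values, array_delimiter=","):
    """Join list items and wrap them in braces (assumed array format)."""
    return "{" + array_delimiter.join(values) + "}"


def write_rows(rows, delimiter=";", array_delimiter=",", quote='"'):
    """Serialise rows to delimited text; list cells become array literals."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quotechar=quote,
        quoting=csv.QUOTE_MINIMAL,  # quote only cells that need it
        lineterminator="\n",
    )
    for row in rows:
        writer.writerow(
            [to_array_literal(cell, array_delimiter) if isinstance(cell, list) else cell
             for cell in row]
        )
    return buffer.getvalue()


# Hypothetical sample data: an ID, an array property, and a plain string.
text = write_rows([
    ["P12345", ["kinase", "enzyme"], "protein"],
    ["Q67890", [], "semi;colon"],
])
print(text)
```

The second row shows why the quote character matters: a value containing the field delimiter is wrapped in quotes so that it is still read back as a single field. The sketch does not escape array elements that themselves contain commas or braces.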

Methods

__init__(*args, **kwargs)

Abstract parent class for writing node and edge representations to disk using the format specified by each database type.

get_import_call()

Return the import call, which details the output folder, the individual node and edge header and data files, the delimiters, and the database name.

write_edges(edges[, batch_size])

Wrapper for writing edges and their headers.

write_import_call()

Write the import call, which details the output folder, the individual node and edge header and data files, the delimiters, and the database name, to the export folder as a text file.

write_nodes(nodes[, batch_size, force])

Wrapper for writing nodes and their headers.

Attributes

DATA_TYPE_LOOKUP

import_call_file_prefix

Property for output directory path.