biocypher.output.write._batch_writer._BatchWriter
- class biocypher.output.write._batch_writer._BatchWriter(translator: Translator, deduplicator: Deduplicator, delimiter: str, array_delimiter: str = ',', quote: str = '"', output_directory: Optional[str] = None, db_name: str = 'neo4j', import_call_bin_prefix: Optional[str] = None, import_call_file_prefix: Optional[str] = None, wipe: bool = True, strict_mode: bool = False, skip_bad_relationships: bool = False, skip_duplicate_nodes: bool = False, db_user: Optional[str] = None, db_password: Optional[str] = None, db_host: Optional[str] = None, db_port: Optional[str] = None, rdf_format: Optional[str] = None, rdf_namespaces: dict = {})
Abstract batch writer class
- __init__(translator: Translator, deduplicator: Deduplicator, delimiter: str, array_delimiter: str = ',', quote: str = '"', output_directory: Optional[str] = None, db_name: str = 'neo4j', import_call_bin_prefix: Optional[str] = None, import_call_file_prefix: Optional[str] = None, wipe: bool = True, strict_mode: bool = False, skip_bad_relationships: bool = False, skip_duplicate_nodes: bool = False, db_user: Optional[str] = None, db_password: Optional[str] = None, db_host: Optional[str] = None, db_port: Optional[str] = None, rdf_format: Optional[str] = None, rdf_namespaces: dict = {})
Abstract parent class for writing node and edge representations to disk using the format specified by each database type. The database-specific functions are implemented by the respective child classes. This abstract class contains all methods expected by a batch writer instance, some of which need to be overwritten by the child classes.
Each batch writer instance has a fixed representation that needs to be passed at instantiation via the schema argument. The instance also expects an ontology adapter via ontology_adapter to be able to convert and extend the hierarchy.
Requires the following methods to be overwritten by database-specific writer classes:
_write_node_headers
_write_edge_headers
_construct_import_call
_write_array_string
_get_import_script_name
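The override contract listed above can be sketched with a stand-in base class. This is a minimal, self-contained illustration that does not import biocypher: SketchBatchWriter and TsvSketchWriter are hypothetical names, and the method signatures, return values, and file name are assumptions rather than the library's actual ones.

```python
from abc import ABC, abstractmethod


class SketchBatchWriter(ABC):
    """Hypothetical stand-in for _BatchWriter; NOT the biocypher class."""

    def __init__(self, delimiter: str, array_delimiter: str = ",", quote: str = '"'):
        self.delim = delimiter
        self.adelim = array_delimiter
        self.quote = quote

    # The five hooks named in the documentation. Signatures are assumed.
    @abstractmethod
    def _write_node_headers(self) -> bool: ...

    @abstractmethod
    def _write_edge_headers(self) -> bool: ...

    @abstractmethod
    def _construct_import_call(self) -> str: ...

    @abstractmethod
    def _write_array_string(self, string_list: list) -> str: ...

    @abstractmethod
    def _get_import_script_name(self) -> str: ...


class TsvSketchWriter(SketchBatchWriter):
    """Hypothetical child class that fills in every required hook."""

    def _write_node_headers(self) -> bool:
        return True  # a real writer would write node header files here

    def _write_edge_headers(self) -> bool:
        return True  # a real writer would write edge header files here

    def _construct_import_call(self) -> str:
        return "echo 'import command goes here'"  # placeholder, not a real command

    def _write_array_string(self, string_list: list) -> str:
        # Join array items with the array delimiter and wrap them in quotes.
        joined = self.adelim.join(string_list)
        return f"{self.quote}{joined}{self.quote}"

    def _get_import_script_name(self) -> str:
        return "sketch-import-call.sh"  # made-up file name


writer = TsvSketchWriter(delimiter="\t", array_delimiter="|")
array_cell = writer._write_array_string(["a", "b"])
```

Because every hook is declared abstract, Python refuses to instantiate the base class or any child class that leaves a hook unimplemented, which mirrors the requirement that database-specific writers overwrite these methods.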
- Parameters:
translator – Instance of Translator to enable translation of nodes and manipulation of properties.
deduplicator – Instance of Deduplicator to enable deduplication of nodes and edges.
delimiter – The delimiter to use for the CSV files.
array_delimiter – The delimiter to use for array properties.
quote – The quote character to use for the CSV files.
output_directory – Path for exporting CSV files.
db_name – Name of the database that will be used in the generated commands.
import_call_bin_prefix – Path prefix for the admin import call binary.
import_call_file_prefix – Path prefix for the data files (headers and parts) in the import call.
wipe – Whether to force import (removing existing DB content). (Specific to Neo4j.)
strict_mode – Whether to enforce source, version, and license properties.
skip_bad_relationships – Whether to skip relationships that do not have a valid start and end node. (Specific to Neo4j.)
skip_duplicate_nodes – Whether to skip duplicate nodes. (Specific to Neo4j.)
db_user – The database user.
db_password – The database password.
db_host – The database host. Defaults to localhost.
db_port – The database port.
rdf_format – The format of RDF.
rdf_namespaces – The namespaces for RDF.
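To make the roles of delimiter, array_delimiter, and quote concrete, the following sketch formats one record as a delimited line. The function format_row and the sample values are hypothetical and only illustrate how the three settings could interact; this is not how biocypher itself serializes rows, and it does not escape quote characters that occur inside values.

```python
def format_row(values, delimiter=";", array_delimiter="|", quote="'"):
    """Join one record into a delimited line (illustrative only)."""
    cells = []
    for value in values:
        if isinstance(value, (list, tuple)):
            # Array property: join the items with array_delimiter, then quote.
            joined = array_delimiter.join(str(item) for item in value)
            cells.append(f"{quote}{joined}{quote}")
        elif isinstance(value, str):
            # String property: wrap in the quote character.
            cells.append(f"{quote}{value}{quote}")
        else:
            # Numbers and other scalars are written unquoted.
            cells.append(str(value))
    return delimiter.join(cells)


row = format_row(["p1", ["TP53", "P53"], 393])
print(row)  # 'p1';'TP53|P53';393
```

Choosing an array_delimiter that differs from delimiter keeps array items from being mistaken for separate columns.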
Methods
__init__(translator, deduplicator, delimiter) – Abstract parent class for writing node and edge representations to disk using the format specified by each database type.
get_import_call() – Return the import call detailing the folder and the individual node and edge header and data files, as well as delimiters and database name.
write_edges(edges[, batch_size]) – Wrapper for writing edges and their headers.
write_import_call() – Write the import call detailing the folder and the individual node and edge header and data files, as well as delimiters and database name, to the export folder as txt.
write_nodes(nodes[, batch_size, force]) – Wrapper for writing nodes and their headers.
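The batch_size argument of write_nodes and write_edges suggests that the input is processed in chunks. One plausible reading of that idea, sketched with the standard library only, is shown here; iter_batches is a hypothetical helper and is not taken from biocypher's implementation.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


batches = list(iter_batches(range(5), batch_size=2))
print(batches)  # [[0, 1], [2, 3], [4]]
```

Working on an iterator rather than a list means the full set of nodes or edges never has to be held in memory at once, which is the usual motivation for batch-wise writing.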
Attributes
import_call_file_prefix – Property for output directory path.