Biotope Add

Draft stage

Biotope is in draft stage. Functionality may be missing or incomplete.
The API is subject to change.

biotope add is the structural entry point for tracking data in a biotope project.

  • biotope add <file> creates one JSON-LD metadata file for that file.
  • biotope add <dir> recurses by default and creates one JSON-LD metadata file for the whole tree rooted at <dir>.
  • Files that croissant-baker can parse get a structured description (record sets).
  • Files that no handler covers are still tracked as cr:FileObject pointers.

Command Signature

biotope add [OPTIONS] [PATHS]...

Options

  • --force, -f: force add even if a file is already tracked
  • --name: dataset name override
  • --description: dataset description override
  • --license: dataset license
  • --creator: dataset creator name
  • --creator-email: dataset creator email
  • --url: dataset URL
  • --citation: dataset citation text
  • --version: dataset version
  • --keyword: dataset keyword, repeatable
  • --access-restrictions: dataset access restrictions
  • --legal-obligations: dataset legal obligations
  • --collaboration-partner: dataset collaboration partner
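  • --rai KEY=VALUE: Croissant RAI field, repeatable
  • --status [raw|processed]: override the pipeline state; defaults to processed if baker produced a complete record set, raw otherwise
  • --derived-from: record this dataset as derived from another, repeatable; accepts a data path, manifest path, or dataset name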
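
Repeatable KEY=VALUE options such as --rai have to be turned into a mapping before they can be merged into the metadata. The source listing further down calls a helper named parse_key_value_pairs for this and expects it to raise ValueError on malformed input, but its body is not shown on this page. The following is only a minimal sketch of how such a helper might behave. The function name parse_key_value_pairs_sketch, the splitting rule (first "=" wins), and the example keys are assumptions, not biotope's actual implementation.

```python
def parse_key_value_pairs_sketch(pairs, option_name):
    """Turn ("a=1", "b=2") into {"a": "1", "b": "2"}.

    Hypothetical stand-in for biotope's parse_key_value_pairs helper.
    Raises ValueError when an item has no "=" or an empty key.
    """
    fields = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"{option_name} expects KEY=VALUE, got {pair!r}")
        fields[key] = value.strip()
    return fields


# Illustrative keys only; the keys biotope accepts are not documented here.
example = parse_key_value_pairs_sketch(
    ("dataCollection=Manual curation", "dataBiases=a=b"), "--rai"
)
```

Under this sketch, a later occurrence of the same key would overwrite an earlier one; whether biotope does the same is not stated in the source.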

Examples

Add a single file

biotope add data/experiment.csv --license CC-BY-4.0

Add a directory

biotope add data/opentargets \
  --license CC-BY-4.0 \
  --creator "Open Targets" \
  --description "Open Targets release"

Force re-add a tracked file

biotope add data/experiment.csv --force

What It Does

  1. Validates that you are inside a biotope project and a Git repository.
  2. Creates or refreshes metadata in .biotope/datasets/.
  3. Uses croissant-baker for parseable files.
  4. Appends cr:FileObject pointers for unparseable files in directory adds.
  5. Stages .biotope/ changes in Git when at least one path was added.

When you add a directory, biotope also writes <dir>/.biotope.yaml so the dataset can be refined collaboratively with biotope annotate apply.
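
The first step above (checking for a biotope project and a Git repository) can be sketched as a walk up the directory tree. This is an illustration only: the names find_root_sketch and is_git_repo_sketch are hypothetical, and the detection rules (a .biotope/ directory marks the project root, a .git entry marks a repository) are assumptions. The real find_biotope_root and is_git_repo helpers may work differently, for example by invoking git.

```python
from __future__ import annotations

from pathlib import Path


def find_root_sketch(start: Path) -> Path | None:
    """Return the nearest directory at or above `start` that holds .biotope/.

    Hypothetical stand-in for biotope's find_biotope_root helper.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / ".biotope").is_dir():
            return candidate
    return None


def is_git_repo_sketch(root: Path) -> bool:
    """Assume a repository is marked by a .git entry directly under root."""
    return (root / ".git").exists()


def validate_sketch(start: Path) -> Path:
    """Mirror the two guard checks and return the project root."""
    root = find_root_sketch(start)
    if root is None:
        raise RuntimeError("Not in a biotope project. Run 'biotope init' first.")
    if not is_git_repo_sketch(root):
        raise RuntimeError("Not in a Git repository. Run 'git init' first.")
    return root
```

Calling validate_sketch(Path.cwd()) from any subdirectory of a project would return the project root under these assumptions, which is why biotope add can be run from nested data folders.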

Follow-on workflow

biotope add data/opentargets --license CC-BY-4.0 --creator "Open Targets"
biotope annotate apply data/opentargets
biotope status
biotope commit -m "Track Open Targets dataset"

Output shape

Single-file adds always emit a cr:FileObject. When croissant-baker can infer structure, the same JSON-LD may also contain recordSet entries.

Directory adds emit one aggregate JSON-LD:

  • cr:FileSet and recordSet entries for handled formats
  • cr:FileObject entries for uncovered files
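
To make the aggregate shape more concrete, here is a hand-written Python dict that mimics what a directory add might produce, together with a small summary function. Only the recordSet key is confirmed by the source listing below, which counts metadata_dict.get("recordSet", []). Placing cr:FileSet and cr:FileObject entries under a distribution key follows the general Croissant vocabulary and is an assumption about biotope's output. All names, ids, and paths are invented for illustration.

```python
# Illustrative metadata only; not real biotope output.
aggregate = {
    "@type": "sc:Dataset",
    "name": "opentargets",
    "distribution": [
        # Handled format: one file set covering many files.
        {"@type": "cr:FileSet", "@id": "parquet-files", "includes": "**/*.parquet"},
        # Uncovered file: tracked as a plain pointer.
        {"@type": "cr:FileObject", "@id": "README.txt", "contentUrl": "README.txt"},
    ],
    "recordSet": [
        {"@type": "cr:RecordSet", "@id": "targets", "field": []},
    ],
}


def summarize(metadata: dict) -> dict:
    """Count distribution entries by @type and the number of record sets."""
    counts: dict = {}
    for entry in metadata.get("distribution", []):
        counts[entry["@type"]] = counts.get(entry["@type"], 0) + 1
    counts["recordSet"] = len(metadata.get("recordSet", []))
    return counts


summary = summarize(aggregate)
```

A summary like this is roughly what the "source file(s), record set(s)" line printed by biotope add reports, although the command derives its file count from the baking step rather than from the JSON-LD.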

Add command implementation for tracking data files and metadata.

add(paths, force, name, description, license_value, creator, creator_email, url, citation, version, keywords, access_restrictions, legal_obligations, collaboration_partner, rai_pairs, status_override, derived_from)

Add data files or rooted directories to a biotope project.

Source code in biotope/biotope/commands/add.py
# Imports used by this excerpt. Project helpers such as find_biotope_root,
# is_git_repo, and stage_git_changes come from elsewhere in the biotope package.
from pathlib import Path
from typing import Any

import click

@click.command()
@click.argument("paths", nargs=-1, type=click.Path(exists=True, path_type=Path))
@click.option("--force", "-f", is_flag=True, help="Force add even if file already tracked")
@click.option("--name", help="Dataset name override")
@click.option("--description", help="Dataset description override")
@click.option("--license", "license_value", help="Dataset license")
@click.option("--creator", help="Dataset creator name")
@click.option("--creator-email", help="Dataset creator email")
@click.option("--url", help="Dataset URL")
@click.option("--citation", help="Dataset citation text")
@click.option("--version", help="Dataset version")
@click.option("--keyword", "keywords", multiple=True, help="Dataset keyword (repeatable)")
@click.option("--access-restrictions", help="Dataset access restrictions")
@click.option("--legal-obligations", help="Dataset legal obligations")
@click.option("--collaboration-partner", help="Dataset collaboration partner")
@click.option("--rai", "rai_pairs", multiple=True, help="Croissant RAI field as KEY=VALUE")
@click.option(
    "--status",
    "status_override",
    type=click.Choice(["raw", "processed"]),
    default=None,
    help="Override pipeline state. Default: 'processed' if baker produced a complete record set, 'raw' otherwise.",
)
@click.option(
    "--derived-from",
    "derived_from",
    multiple=True,
    help="Record this dataset as derived from another (repeatable). Pass a "
    "dataset reference — data path, manifest path, or dataset name.",
)
def add(
    paths: tuple[Path, ...],
    force: bool,
    name: str | None,
    description: str | None,
    license_value: str | None,
    creator: str | None,
    creator_email: str | None,
    url: str | None,
    citation: str | None,
    version: str | None,
    keywords: tuple[str, ...],
    access_restrictions: str | None,
    legal_obligations: str | None,
    collaboration_partner: str | None,
    rai_pairs: tuple[str, ...],
    status_override: str | None,
    derived_from: tuple[str, ...],
) -> None:
    """Add data files or rooted directories to a biotope project."""
    if not paths:
        ctx = click.get_current_context()
        click.echo(ctx.get_help())
        raise click.Abort

    if name and len(paths) != 1:
        raise click.BadParameter("--name can only be used when adding one path.")

    biotope_root = find_biotope_root()
    if not biotope_root:
        click.echo("❌ Not in a biotope project. Run 'biotope init' first.")
        raise click.Abort

    if not is_git_repo(biotope_root):
        click.echo("❌ Not in a Git repository. Initialize Git first with 'git init'.")
        raise click.Abort

    try:
        rai_fields = parse_key_value_pairs(rai_pairs, "--rai")
    except ValueError as exc:
        raise click.BadParameter(str(exc)) from exc

    try:
        resolved_provenance = [_resolve_dataset_ref(ref, biotope_root) for ref in derived_from]
    except ValueError as exc:
        raise click.BadParameter(str(exc)) from exc

    overrides = {
        "name": name,
        "description": description,
        "license": license_value,
        "creator": creator,
        "creator_email": creator_email,
        "url": url,
        "citation": citation,
        "version": version,
        "keywords": list(keywords),
        "access_restrictions": access_restrictions,
        "legal_obligations": legal_obligations,
        "collaboration_partner": collaboration_partner,
        "rai_fields": rai_fields,
        "status_override": status_override,
        "derived_from": resolved_provenance,
    }

    datasets_dir = biotope_root / ".biotope" / "datasets"
    datasets_dir.mkdir(parents=True, exist_ok=True)

    added_entries: list[Path] = []
    skipped_entries: list[Path] = []
    baked_dirs: list[tuple[Path, dict[str, Any]]] = []

    for path in paths:
        if path.is_file():
            result = _add_file(path, biotope_root, datasets_dir, force, overrides)
            if result:
                added_entries.append(path)
            else:
                skipped_entries.append(path)
            continue

        baked = _bake_directory(path, biotope_root, overrides)
        if baked is None:
            skipped_entries.append(path)
            continue

        metadata_dict, n_source_files = baked
        added_entries.append(path)
        baked_dirs.append((path.resolve(), metadata_dict))
        n_record_sets = len(metadata_dict.get("recordSet", []))
        target = resolve_target(path, biotope_root)
        click.echo(
            f"  ✨ Generated {target.metadata_path.relative_to(biotope_root)} "
            f"({n_source_files} source file(s), {n_record_sets} record set(s))"
        )

    if added_entries:
        stage_git_changes(biotope_root)

    for source_dir, metadata_dict in baked_dirs:
        _generate_biotope_scaffold_from_baked(source_dir, metadata_dict, biotope_root)

    if added_entries:
        click.echo(f"\n✅ Added {len(added_entries)} entr(y/ies) to biotope project:")
        for entry in added_entries:
            click.echo(f"  + {entry}")

    if skipped_entries:
        click.echo(f"\n⚠️  Skipped {len(skipped_entries)} entr(y/ies):")
        for entry in skipped_entries:
            click.echo(f"  - {entry}")

    if added_entries:
        click.echo("\n💡 Next steps:")
        if baked_dirs:
            for source_dir, _metadata_dict in baked_dirs:
                click.echo(f"  • Review {source_dir / SCAFFOLD_FILENAME}")
                click.echo(f"    Then: biotope annotate apply {source_dir}")
            click.echo("  • Map data into the knowledge graph: biotope map")
            click.echo('  • Finally: biotope commit -m "message"')
        else:
            click.echo("  1. Run 'biotope status' to see staged files")
            click.echo("  2. Run 'biotope annotate edit --staged' to refine metadata")
            click.echo("  3. Run 'biotope commit -m \"message\"' to save changes")