Biotope Init

Draft stage

Biotope is in draft stage. Functionality may be missing or incomplete.
The API is subject to change.

Overview

The biotope init command initializes a new biotope project with interactive configuration. It sets up the necessary directory structure and configuration files for metadata management.

Features

Interactive Configuration

The init process guides you through several configuration options:

  1. Project Name: Set a name for your biotope project
  2. Git Integration: Choose whether to initialize Git version control
  3. Knowledge Graph: Optionally install a knowledge graph for enhanced data management
  4. Output Format: Select output format (only shown if knowledge graph is enabled)
  5. Project Metadata: Collect project-level metadata for annotation pre-filling

Project-Level Metadata Collection

During initialization, you can optionally provide project-level metadata, which is then used to pre-fill annotation fields:

  • Description: Brief description of the project and its purpose
  • URL: Project homepage, repository, or documentation URL
  • Creator: Name and contact information of the project maintainer
  • License: Data usage license (e.g., MIT, CC-BY)
  • Citation: How to cite the project or dataset

This metadata is stored in .biotope/config/biotope.yaml and automatically loaded when using biotope annotate edit.
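
As a rough illustration of pre-filling, the sketch below copies project-level values into an annotation only where the annotation has no value yet. The helper name prefill_annotation and the precedence rule (existing annotation values win) are assumptions made for this example; this page does not describe how biotope annotate edit actually merges the two.

```python
from copy import deepcopy


def prefill_annotation(project_metadata: dict, annotation: dict) -> dict:
    """Return a copy of `annotation` with gaps filled from `project_metadata`.

    Hypothetical helper: values already present in the annotation are kept.
    """
    merged = deepcopy(annotation)
    for key, value in project_metadata.items():
        # Treat missing keys, None, and empty strings as "not yet filled in".
        if merged.get(key) in (None, ""):
            merged[key] = deepcopy(value)
    return merged


project_metadata = {
    "description": "Project description",
    "license": "MIT",
    "creator": {"name": "John Doe", "email": "john@example.com"},
}
annotation = {"name": "sample-dataset", "license": "CC-BY"}

result = prefill_annotation(project_metadata, annotation)
print(result["license"])          # the annotation's own value is kept
print(result["creator"]["name"])  # filled in from the project metadata
```

Returning a new dictionary instead of mutating the input keeps the original annotation available for comparison.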

Conditional Output Format Selection

The output format prompt appears only if you choose to install a knowledge graph, because the setting has no effect without one.

Usage

biotope init [OPTIONS]

Options

  • --dir, -d: Directory to initialize biotope project in (default: current directory)

Example

# Initialize in current directory
biotope init

# Initialize in specific directory
biotope init --dir /path/to/project

Configuration File Structure

The initialization creates a .biotope/config/biotope.yaml file with the following structure:

version: "1.0"
croissant_schema_version: "1.0"
default_metadata_template: "scientific"
data_storage:
  type: "local"
  path: "data"
checksum_algorithm: "sha256"
auto_stage: true
commit_message_template: "Update metadata: {description}"

# Project information (consolidated from internal metadata)
project_info:
  name: "my-project"
  created_at: "2024-01-01T00:00:00Z"
  biotope_version: "0.1.0"
  last_modified: "2024-01-01T00:00:00Z"
  builds: []
  knowledge_sources: []

# Project-level metadata for annotation pre-fill
project_metadata:
  description: "Project description"
  url: "https://example.com/project"
  creator:
    name: "John Doe"
    email: "john@example.com"
  license: "MIT"
  citation: "Doe, J. (2024). Project Title. Journal Name."

# Validation configuration
annotation_validation:
  enabled: true
  minimum_required_fields:
    - "name"
    - "description"
    - "creator"
    - "dateCreated"
    - "distribution"
  field_validation:
    name:
      type: "string"
      min_length: 1
    description:
      type: "string"
      min_length: 10
    creator:
      type: "object"
      required_keys: ["name"]
    dateCreated:
      type: "string"
      format: "date"
    distribution:
      type: "array"
      min_length: 1
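
To make the validation rules concrete, here is a minimal sketch of how the annotation_validation section could be applied to a metadata record. It is not biotope's validator: the function validate, the PY_TYPES mapping, and the reading of format: "date" as an ISO 8601 calendar date are all assumptions for illustration.

```python
from datetime import date

# The annotation_validation rules shown above, written as a Python dict.
RULES = {
    "minimum_required_fields": ["name", "description", "creator", "dateCreated", "distribution"],
    "field_validation": {
        "name": {"type": "string", "min_length": 1},
        "description": {"type": "string", "min_length": 10},
        "creator": {"type": "object", "required_keys": ["name"]},
        "dateCreated": {"type": "string", "format": "date"},
        "distribution": {"type": "array", "min_length": 1},
    },
}

# Assumed mapping from the config's type names to Python types.
PY_TYPES = {"string": str, "object": dict, "array": list}


def validate(metadata: dict, rules: dict = RULES) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field in rules["minimum_required_fields"]:
        if field not in metadata:
            errors.append(f"missing required field: {field}")
    for field, spec in rules["field_validation"].items():
        if field not in metadata:
            continue  # already reported above if the field is required
        value = metadata[field]
        if not isinstance(value, PY_TYPES[spec["type"]]):
            errors.append(f"{field}: expected {spec['type']}")
            continue
        if "min_length" in spec and len(value) < spec["min_length"]:
            errors.append(f"{field}: shorter than {spec['min_length']}")
        for key in spec.get("required_keys", []):
            if key not in value:
                errors.append(f"{field}: missing key {key}")
        if spec.get("format") == "date":
            try:
                date.fromisoformat(value)  # assumption: ISO 8601 calendar dates
            except ValueError:
                errors.append(f"{field}: not an ISO date")
    return errors


for problem in validate({"name": "x", "description": "short"}):
    print(problem)
```

Collecting every problem in a list, instead of raising on the first one, lets a user fix all issues in a single pass.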

Directory Structure

The init command creates the following directory structure:

project-root/
├── .biotope/
│   ├── config/
│   │   └── biotope.yaml          # Consolidated configuration (Git-like)
│   ├── datasets/                 # Croissant ML metadata files
│   ├── workflows/                # Bioinformatics workflow definitions
│   └── logs/                     # Command execution logs
├── config/
│   └── biotope.yaml              # User-facing configuration
├── data/
│   ├── raw/
│   └── processed/
├── schemas/
└── outputs/

Note: The configuration follows a Git-like approach where .biotope/config/biotope.yaml contains all biotope-specific configuration, similar to how Git uses .git/config for its configuration.
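
If you want to check a directory against the layout above (for example in a test), a small sketch along these lines may help. The LAYOUT list is transcribed from the tree above; the helper missing_dirs is hypothetical and not part of biotope.

```python
import tempfile
from pathlib import Path

# Directories from the tree above, relative to the project root.
LAYOUT = [
    ".biotope/config",
    ".biotope/datasets",
    ".biotope/workflows",
    ".biotope/logs",
    "config",
    "data/raw",
    "data/processed",
    "schemas",
    "outputs",
]


def missing_dirs(root: Path) -> list[str]:
    """Return the layout entries that do not exist as directories under `root`."""
    return [rel for rel in LAYOUT if not (root / rel).is_dir()]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = missing_dirs(root)  # nothing has been created yet
    for rel in LAYOUT:
        (root / rel).mkdir(parents=True, exist_ok=True)
    after = missing_dirs(root)   # every directory now exists

print(len(before), len(after))
```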

Managing Project Metadata

After initialization, you can manage project metadata using the biotope config command:

# Set project metadata
biotope config set-project-metadata

# Show current project metadata
biotope config show-project-metadata

biotope init — scaffold a new biotope project.

By default, init only scaffolds: it creates the directory layout, adds an AGENTS.md for agents to read, writes an empty project.yaml, and runs git init. It asks no content questions. The agent (or the user, via biotope map) fills in the competence questions afterwards.

Use --interactive to open $EDITOR on the freshly-written project.yaml so the user can fill purpose: before exiting init.
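
One practical detail: $EDITOR often holds a command with arguments, such as "code --wait". The init source shown on this page passes the value as a single program name and falls back to vi when it is unset. The sketch below shows one way to turn $EDITOR into an argument list. The helper editor_argv is hypothetical and not part of biotope.

```python
import os
import shlex
from collections.abc import Mapping


def editor_argv(path: str, environ: Mapping[str, str] = os.environ) -> list[str]:
    """Build the command line for opening `path` in the user's editor.

    Hypothetical helper: splits $EDITOR with shlex so that a value such as
    "code --wait" becomes separate arguments; falls back to "vi".
    """
    editor = environ.get("EDITOR") or "vi"
    parts = shlex.split(editor) or ["vi"]  # guard against whitespace-only values
    return [*parts, path]


print(editor_argv("project.yaml", {"EDITOR": "code --wait"}))
print(editor_argv("project.yaml", {}))
```

The resulting list can be passed directly to subprocess.run. Catching FileNotFoundError around that call covers the case where the editor is not installed.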

init(name, dir, purpose, no_prompt, no_git, visible, interactive)

Scaffold a new biotope project.

Default invocation: biotope init my-project. Creates my-project/ with .biotope/, data/, mappings/, an AGENTS.md for agents to read, and an empty project.yaml. Runs git init unless --no-git is set.

Source code in biotope/biotope/commands/init.py
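import os
import shutil
import subprocess
import sys
from pathlib import Path

import click
import yaml

# PURPOSE_PROMPT, DEFAULT_BIOTOPE_CONFIG, TEMPLATES, Project, resolve_project_path,
# _emit_pyproject, and console are defined or imported elsewhere in this module
# and are not shown in this listing.

@click.command()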
@click.argument("name", required=False)
@click.option(
    "--dir",
    "-d",
    type=click.Path(file_okay=False, path_type=Path),
    default=".",
    help="Parent directory to initialise the project in. The project goes in NAME/ within this dir.",
)
@click.option(
    "--purpose",
    "-p",
    type=str,
    default=None,
    help=(
        "Seed the project's purpose directly. "
        "If omitted and stdin is a TTY, init will prompt; pass --no-prompt to skip."
    ),
)
@click.option(
    "--no-prompt",
    is_flag=True,
    default=False,
    help="Never prompt interactively. Combined with no --purpose, leaves it empty.",
)
@click.option(
    "--no-git",
    is_flag=True,
    default=False,
    help="Skip running `git init`. The .biotope/ directory is still created.",
)
@click.option(
    "--visible",
    is_flag=True,
    default=False,
    help="Write project.yaml at the project root instead of inside .biotope/.",
)
@click.option(
    "--interactive",
    is_flag=True,
    default=False,
    help="Open $EDITOR on project.yaml so you can fill in purpose before exiting init.",
)
def init(
    name: str | None,
    dir: Path,  # noqa: A002
    purpose: str | None,
    no_prompt: bool,
    no_git: bool,
    visible: bool,
    interactive: bool,
) -> None:
    """Scaffold a new biotope project.

    Default invocation: ``biotope init my-project``. Creates ``my-project/`` with
    ``.biotope/``, ``data/``, ``mappings/``, an ``AGENTS.md`` for agents to read,
    and an empty ``project.yaml``. Runs ``git init`` unless ``--no-git`` is set.
    """
    if name is None:
        name = click.prompt("Project name", type=str)

    if purpose is None:
        if no_prompt or not sys.stdin.isatty():
            purpose = ""
        else:
            click.echo(PURPOSE_PROMPT)
            purpose = click.prompt("purpose", default="", show_default=False)

    project_dir = (dir / name).resolve() if name != "." else dir.resolve()
    if name == ".":
        name = project_dir.name

    if (project_dir / ".biotope").exists():
        click.echo(f"❌ {project_dir} already contains a .biotope/ directory.")
        raise click.Abort

    project_dir.mkdir(parents=True, exist_ok=True)
    (project_dir / ".biotope" / "datasets").mkdir(parents=True, exist_ok=True)
    (project_dir / ".biotope" / "workflows").mkdir(parents=True, exist_ok=True)
    (project_dir / "data").mkdir(parents=True, exist_ok=True)
    (project_dir / "mappings").mkdir(exist_ok=True)

    config_path = project_dir / ".biotope" / "config.yaml"
    config_path.write_text(yaml.safe_dump(DEFAULT_BIOTOPE_CONFIG, sort_keys=False))

    project = Project(name=name, purpose=purpose)
    project_yaml_path = resolve_project_path(project_dir, visible=visible)
    project_yaml_path.parent.mkdir(parents=True, exist_ok=True)
    project.dump(project_yaml_path)

    agents_md_dest = project_dir / "AGENTS.md"
    agents_md_src = TEMPLATES / "AGENTS.md"
    shutil.copy(agents_md_src, agents_md_dest)

    gitignore = project_dir / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text("data/\n__pycache__/\n*.pyc\n.venv/\n")

    # A starter pyproject so the project is self-contained: each biotope
    # project owns its dependency set and can be reproduced with
    # `uv sync` / `pip install -e .` without relying on whichever env
    # happened to run `biotope init`. Skip if the user already has one
    # (e.g. they're initialising inside an existing Python project).
    pyproject_path = project_dir / "pyproject.toml"
    pyproject_was_written = False
    if not pyproject_path.exists():
        pyproject_path.write_text(_emit_pyproject(name, purpose))
        pyproject_was_written = True

    if not no_git:
        try:
            subprocess.run(["git", "init", "-q"], cwd=project_dir, check=True)
        except (subprocess.CalledProcessError, FileNotFoundError) as exc:
            click.echo(f"⚠️  git init failed: {exc}")
        else:
            scaffold_paths = [".gitignore", "AGENTS.md", ".biotope/"]
            if pyproject_was_written:
                scaffold_paths.append("pyproject.toml")
            if visible:
                scaffold_paths.append(project_yaml_path.relative_to(project_dir).as_posix())
            try:
                subprocess.run(["git", "add", *scaffold_paths], cwd=project_dir, check=True)
            except (subprocess.CalledProcessError, FileNotFoundError) as exc:
                click.echo(f"⚠️  Could not stage scaffold files: {exc}")
            else:
                # Commit the scaffold so `biotope status` starts clean — otherwise
                # init artefacts sit in the index and confuse new users into thinking
                # they themselves staged config.yaml, AGENTS.md, etc. Falls back to
                # leaving the scaffold staged if git identity isn't configured.
                try:
                    subprocess.run(
                        ["git", "commit", "-q", "-m", "chore: initialize biotope project"],
                        cwd=project_dir,
                        check=True,
                        capture_output=True,
                    )
                except (subprocess.CalledProcessError, FileNotFoundError) as exc:
                    detail = ""
                    if isinstance(exc, subprocess.CalledProcessError) and exc.stderr:
                        detail = f": {exc.stderr.decode(errors='replace').strip().splitlines()[-1]}"
                    click.echo(
                        f"⚠️  Could not create initial commit{detail}. "
                        "Scaffold files remain staged; commit them manually."
                    )

    if interactive:
        editor = os.environ.get("EDITOR", "vi")
        try:
            subprocess.run([editor, str(project_yaml_path)], check=True)
            Project.load(project_yaml_path)  # validate after edit
        except subprocess.CalledProcessError:
            click.echo(f"⚠️  Editor exited non-zero; {project_yaml_path} may be empty.")

    console.print(f"✅ Initialised biotope project at [cyan]{project_dir}[/cyan]")
    console.print(f"   project.yaml: [dim]{project_yaml_path.relative_to(project_dir)}[/dim]")
    if purpose:
        console.print(f"   purpose: [dim]{purpose}[/dim]")
    if pyproject_was_written:
        console.print(
            "   Next: [bold]cd "
            f"{project_dir.name}[/bold] and install deps:\n"
            "         [bold]uv sync[/bold]   (or: pip install -e .)\n"
            "         then [bold]biotope add <data>[/bold] (or [bold]biotope get <url>[/bold])"
            " to bring data in."
        )
    else:
        console.print(
            "   Next: bring data in with [bold]biotope add <data>[/bold] "
            "(or [bold]biotope get <url>[/bold]), then [bold]biotope map[/bold].",
        )