# seekdb-branch

- **Name:** seekdb-branch
- **Description:** Data branching workflow for SeekDB — fork tables or entire databases instantly, compare differences with DIFF, and merge diverged branches. Use this skill whenever a user works with SeekDB and needs safe experimentation, data versioning, sandbox environments, schema validation, CI/CD test isolation, A/B testing, knowledge base updates, or any situation where data should be duplicated before modification. Also use when the user needs to compare two tables, merge changes from a branch back, or coordinate parallel work on the same dataset — even if they don't explicitly say "fork" or "branch".

## Prerequisites

- SeekDB V1.2.0+ (fork database, diff, merge)
- SeekDB V1.1.0+ (fork table only)
- MySQL client or any MySQL-compatible driver
- DIFF / MERGE require tables with a primary key and identical column definitions

> **SeekDB D0 note:** `FORK TABLE`, `DIFF TABLE`, and `MERGE TABLE` are fully supported. `FORK DATABASE` (SQL) is **not available** — database management is handled by the system. To branch at the database level, use the **Fork Instance API** (`POST /api/v1/instances/{instance_id}/fork`) described in the main SeekDB D0 skill.

---

## Command Reference

### FORK TABLE / FORK DATABASE

```sql
FORK TABLE [db.]source_table TO [db.]dest_table;   -- SeekDB V1.1.0+
FORK DATABASE source_db TO dest_db;                -- SeekDB V1.2.0+, not available in anon mode
```

Instant, copy-on-write clone. The destination gets identical schema and all data. After fork, writes to source and destination are fully independent. Only diverging writes consume additional storage. The destination must not already exist.

### DIFF TABLE

```sql
DIFF TABLE current_table AGAINST incoming_table;
```

Read-only comparison by primary key. Outputs only rows that differ:

| Scenario | Output |
|----------|--------|
| Same PK, different values (conflict) | Two rows — one from each table |
| PK only in current | One row from current |
| PK only in incoming | One row from incoming |
| Identical rows | No output |

Results include `__table` (source table name) and `__flag` columns, followed by all table columns. Sorted by PK, current row first for conflicts.

### MERGE TABLE

```sql
MERGE TABLE incoming_table INTO current_table
  STRATEGY { FAIL | THEIRS | OURS };
```

Merges incoming data into current (current is modified). Runs in a single transaction — rolls back on failure. **MERGE never deletes rows.**

| Row situation | Action |
|---------------|--------|
| Only in incoming | INSERT into current |
| Only in current | Keep unchanged |
| Same PK, same values | No change |
| Same PK, different values (conflict) | Depends on strategy |

| Strategy | On conflict | Use when |
|----------|-------------|----------|
| `FAIL` (default) | Error + rollback | Strict — require zero conflicts |
| `THEIRS` | Overwrite with incoming | Branch version is authoritative |
| `OURS` | Keep current | Main is authoritative, only add new rows from branch |

### Schema Must Match

DIFF and MERGE require both tables to have **identical column definitions** (name, type, order). If schemas have diverged:

1. **FORK both sides first** — never ALTER the original tables directly. Align schemas on the forks, then DIFF/MERGE on the forks.
2. **Replay the missing DDL on each fork** to make column definitions identical.
3. **Use a schema diff tool** like [Atlas](https://atlasgo.io/) for complex divergences:
   ```bash
   atlas schema diff --from "mysql://seekdb/prod" --to "mysql://seekdb/branch"
   ```

---

## Choosing the Right Workflow

**Default to the simplest workflow.** Most tasks only need FORK + RENAME. Escalate to DIFF or MERGE only when the situation requires it.

```
FORK ──► work on copy ──┬─ Want to keep as a long-lived branch?
                        │   → KEEP both tables as-is
                        │
                        ├─ Only the copy changed?
                        │   → RENAME to swap (or DROP to discard)
                        │   (even if user says "merge back", prefer RENAME
                        │    when main is unchanged — it's simpler and atomic)
                        │
                        ├─ Want to review what changed?
                        │   → DIFF, then decide
                        │
                        └─ Main AND copy BOTH changed?
                            → DIFF to review, MERGE to reconcile
```

**When the user says "merge" but main table is unchanged:** prefer RENAME over MERGE. RENAME is simpler, atomic, and preserves deletions. Only use MERGE when the main table has also received independent writes after the fork and you need to reconcile both sides.

| Situation | Workflow | Operations |
|-----------|----------|------------|
| One-way experiment (sandbox, DDL test, CI) | **Simple** | FORK → work → RENAME or DROP |
| Want to inspect changes before promoting | **With review** | FORK → work → DIFF → RENAME or DROP |
| Main table also received writes after fork | **Reconcile** | FORK → both change → DIFF → MERGE |
| Multiple branches merging into main | **Multi-merge** | FORK N → DIFF each → MERGE sequentially |

**When to use FORK TABLE vs FORK DATABASE:**
- **FORK TABLE** — single table experiment, quick test, per-table versioning *(available in anon mode)*
- **FORK DATABASE** — clone entire database (all tables), for consistent full-database snapshot, CI/CD full environment, feature branches *(not available in anon mode — use Fork Instance API instead)*

---

## Rules & Best Practices

### Mandatory interaction rules

#### After working on a fork, always ask the user how to proceed.

Never silently RENAME or DROP. Present the options:
- **KEEP** — keep both tables as-is, no swap, no discard (useful when the fork serves as a long-lived branch)
- **RENAME** — promote the fork to production (atomic swap, keeps old table as backup)
- **DROP** — discard the fork, no changes to production
- **DIFF** — review changes before deciding

#### Presenting DIFF Results (mandatory)

**After running DIFF, ALWAYS summarize the raw output into a human-readable change report before proceeding.** The raw `__table` / `__flag` output is hard to read. Categorize every differing row into exactly three groups:

1. **Conflict rows** (same PK, different values) — show which columns changed, old → new
2. **Rows only in current** (deleted from branch) — list PKs and key columns
3. **Rows only in incoming** (new in branch) — list PKs and key columns

Then ask the user how to proceed (RENAME, MERGE, or DROP). Never silently move to the next step.

#### Pre-MERGE Checklist (mandatory)

**Before executing MERGE, you MUST complete ALL of the following steps. Never skip any.**

1. **Ask the user if they want to review DIFF** — before merging, ask whether they'd like to see a DIFF comparison first. If yes, run `DIFF TABLE current AGAINST incoming` and present the human-readable summary (see "Presenting DIFF Results" above). If the user declines, skip to step 3.

2. **Warn about deletions that won't propagate** (only when DIFF was run) — if DIFF shows rows only in the current table (meaning the branch deleted them), explicitly warn:
   > "N rows were deleted in the branch but MERGE will NOT remove them from the main table: [id list]. To sync deletions, run DELETE manually after MERGE."

3. **Ask the user to choose a strategy** — present the three options with a brief explanation of what each will do *for this specific merge*:
   - `FAIL` — N conflicting rows exist; MERGE will error and rollback (use to verify zero conflicts)
   - `THEIRS` — overwrite main table's N conflicting rows with the branch's values
   - `OURS` — keep main table's N conflicting rows, only INSERT new rows from the branch

   **Never silently choose a strategy.** Even if the user says "merge", you must confirm which strategy to use.

4. **Execute MERGE** — only after the user explicitly confirms the strategy.

### NEVER modify production tables directly

**This is the most important rule.** Any destructive or schema-changing operation — `ALTER TABLE`, `DROP COLUMN`, `DELETE`, `TRUNCATE`, bulk `UPDATE` — must be performed on a FORK, never on the original table. This applies to:

- The user's primary operation (e.g., adding a column, bulk update)
- **Any intermediate step** you perform to achieve the user's goal (e.g., stripping a column to work around a DIFF limitation)
- Workarounds, fixes, cleanup — every operation that could lose data or change schema

If you find yourself about to run a destructive statement on a production table, **stop and FORK first**.

### General guidelines

- **Fork is instant** — don't hesitate; fork for any experiment, however small
- **Default to FORK + RENAME** — only reach for DIFF/MERGE when both sides changed or multiple branches need merging
- **Use RENAME TABLE for atomic swap** — not DROP + CREATE; RENAME is a single atomic operation
- **Use FORK DATABASE** when you need a full-database copy — clones all tables in the database at once *(anon mode: use Fork Instance API instead)*
- **Drop forks when done** — diverging writes accumulate storage cost over time
- **DIFF before MERGE** — always review differences before merging when both sides changed
- **Name branches clearly** — `_sandbox`, `_experiment`, `_v2`, `_feature_x`

---

## Use Cases

### 1. Safe Experimentation (AI sandbox, DDL validation, index tuning)

Fork, experiment freely, swap in or discard. This is the most common workflow.

```sql
FORK TABLE user_profiles TO user_profiles_sandbox;

-- AI agent / developer experiments safely on the fork
ALTER TABLE user_profiles_sandbox ADD COLUMN embedding VECTOR(1024);
INSERT INTO user_profiles_sandbox (id, name, embedding) VALUES (1, 'test', '[0.1, ...]');

-- Swap atomically if happy
RENAME TABLE user_profiles TO user_profiles_backup,
             user_profiles_sandbox TO user_profiles;

-- Or discard the experiment
DROP TABLE user_profiles_sandbox;
```

Works the same for DDL validation, index strategy testing, or any one-way experiment.

### 2. Data Versioning (knowledge base, training data, config)

Fork before bulk updates, validate, swap. Roll back by swapping again.

```sql
FORK TABLE knowledge_base TO knowledge_base_v2;

INSERT INTO knowledge_base_v2 (doc, embedding) VALUES ('New FAQ', '[0.1, 0.2, ...]');
DELETE FROM knowledge_base_v2 WHERE outdated = 1;

-- Promote
RENAME TABLE knowledge_base TO knowledge_base_v1,
             knowledge_base_v2 TO knowledge_base;

-- Roll back if needed
RENAME TABLE knowledge_base TO knowledge_base_v2_bad,
             knowledge_base_v1 TO knowledge_base;
```

For per-customer variants, fork a shared base and add private content:

```sql
FORK TABLE public_knowledge TO customer_a_knowledge;
FORK TABLE public_knowledge TO customer_b_knowledge;
INSERT INTO customer_a_knowledge (doc, embedding) VALUES ('Customer A FAQ', '[...]');
```

### 3. Full-Database Feature Branch

When a feature needs a full-database copy, fork the entire database.

> **SeekDB D0:** `FORK DATABASE` is not available. Use the Fork Instance API (`POST /api/v1/instances/{instance_id}/fork`) to create an independent instance-level branch, then connect to it and continue.

```sql
FORK DATABASE app_db TO app_db_feature_search;  -- not available in anon mode

ALTER TABLE app_db_feature_search.documents ADD COLUMN embedding VECTOR(768);
CREATE VECTOR INDEX idx_emb ON app_db_feature_search.documents(embedding)
  WITH (distance=cosine, type=hnsw, lib=vsag);
INSERT INTO app_db_feature_search.search_config VALUES ('model', 'bge-m3');

-- Validate, then swap
RENAME DATABASE app_db TO app_db_backup,
                app_db_feature_search TO app_db;
```

### 4. AI Agent Debug Loop — using DIFF

Use DIFF to compare actual output against expected, feeding precise row-level errors back to the agent.

> **SeekDB D0:** Replace `FORK DATABASE test TO test_run` with Fork Instance API to get an isolated copy, then run the rest of the steps inside the fork instance.

```sql
-- Prepare baseline
CREATE TABLE test.result_template (id INT PRIMARY KEY, final_amount DECIMAL(10,2));
CREATE TABLE test.orders_input (id INT PRIMARY KEY, amount DECIMAL(10,2), status VARCHAR(20));
INSERT INTO test.orders_input VALUES (1, 100, 'pending'), (2, 200, 'pending');

FORK DATABASE test TO test_run;  -- not available in anon mode; use Fork Instance API

-- Actual result: agent's business logic
FORK TABLE test_run.result_template TO test_run.actual_result;
INSERT INTO test_run.actual_result
  SELECT id, CASE WHEN amount >= 150 THEN amount * 0.9 ELSE amount END
  FROM test_run.orders_input;

-- Expected result: known correct values
FORK TABLE test_run.result_template TO test_run.expected_result;
INSERT INTO test_run.expected_result VALUES (1, 100), (2, 180);

-- Compare: no output = pass; output rows = exact failures
DIFF TABLE test_run.actual_result AGAINST test_run.expected_result;
```

### 5. Reconciling Diverged Branches — using MERGE

When the main table received writes after you forked, RENAME would lose those writes. Use MERGE instead.

```sql
FORK TABLE prod.orders TO prod.orders_experiment;

-- Production receives new data after fork
INSERT INTO prod.orders VALUES (1001, 500, 'new_order');

-- Experiment modifies existing rows
UPDATE prod.orders_experiment SET status = 'reviewed' WHERE id = 42;
INSERT INTO prod.orders_experiment VALUES (1002, 300, 'experiment_order');

-- RENAME would lose id=1001. Review diff, then merge:
DIFF TABLE prod.orders AGAINST prod.orders_experiment;
-- (follow Pre-MERGE Checklist: present DIFF summary, warn about
--  deletions, ask user to choose strategy, then execute)
MERGE TABLE prod.orders_experiment INTO prod.orders STRATEGY ...;

DROP TABLE prod.orders_experiment;
```

### 6. Multi-Team Parallel Development — using DIFF + MERGE

Multiple teams fork from the same base, work independently, merge back sequentially.

> **SeekDB D0:** Replace each `FORK DATABASE` with Fork Instance API calls to create independent instance branches per team. Each team connects to their own instance. Table-level DIFF and MERGE still apply within each instance.

```sql
-- Each team forks (not available in anon mode; use Fork Instance API per team)
FORK DATABASE prod TO prod_risk_review;
FORK DATABASE prod TO prod_promo_campaign;

-- Risk team works
UPDATE prod_risk_review.orders SET risk_flag = 1 WHERE order_id = 10002;

-- Promo team works
UPDATE prod_promo_campaign.orders SET amount = amount * 0.9
  WHERE order_id IN (10001, 10002);

-- Merge one at a time (follow Pre-MERGE Checklist for each)
MERGE TABLE prod_risk_review.orders INTO prod.orders STRATEGY ...;

-- Review second merge for conflicts against updated main
DIFF TABLE prod.orders AGAINST prod_promo_campaign.orders;
MERGE TABLE prod_promo_campaign.orders INTO prod.orders STRATEGY ...;

DROP DATABASE prod_risk_review;
DROP DATABASE prod_promo_campaign;
```

### 7. Conflict Resolution with MERGE FAIL

When both sides diverge and you need to preserve data from both, use MERGE FAIL as a safety gate, resolve conflicts on a fork, then MERGE the resolved branch.

```
MERGE FAIL (detect conflicts, zero side effects)
  → DIFF (categorize conflicts)
  → FORK branch → resolve conflicts on the fork:
      • Value conflicts: combine values from both sides
      • PK collisions: DELETE + re-INSERT to get new IDs
  → MERGE resolved_fork INTO main STRATEGY THEIRS
      (conflict rows → resolved values, new rows → INSERT, main-only rows → keep)
  → Clean up forks
```

**Why MERGE after resolving:** MERGE handles UPDATE (resolved conflicts) + INSERT (new rows) atomically in one command.

| Conflict type | How to identify | Resolution |
|---------------|----------------|------------|
| Value conflict | Same PK, both sides changed | Combine desired values on fork |
| PK collision | Same PK, different rows entirely | DELETE + re-INSERT on fork for new ID |
| Single-side change | Same PK, only branch changed | No action needed |

---