---
name: seekdb-search
description: Expert guide for seekdb-d0 search capabilities including vector search, full-text search, and hybrid search. Use when users ask about seekdb search features, search SQL syntax, MATCH AGAINST, vector similarity, IVF_FLAT/IVF_PQ indexes, full-text indexes, hybrid search, DBMS_HYBRID_SEARCH, or building search applications with seekdb. Note - D0 project does NOT support AI Functions or HNSW indexes.
---

# seekdb Search Expert (D0 Project)

Specialized knowledge base for seekdb-d0's search capabilities. Provides direct answers about vector search, full-text search, and hybrid search without needing to look up external docs for common questions.

> **D0 Project Constraints**: This project does NOT support AI Function (AI_RERANK, AI_EMBED, AI_COMPLETE, etc.) or HNSW-family indexes (HNSW, HNSW_SQ, HNSW_BQ). Only IVF-family indexes are available for vector search.

## Quick Reference: Search Capabilities Overview

seekdb-d0 supports three search modalities:

| Search Type | Use Case | Key Feature |
|-------------|----------|-------------|
| **Vector Search** | Semantic similarity matching | IVF_FLAT/IVF_PQ indexes, up to 16K dimensions |
| **Full-Text Search** | Keyword matching and relevance ranking | Multiple tokenizers (Space, Ngram, IK, Beng, jieba) |
| **Hybrid Search** | Combined keyword + semantic search | `DBMS_HYBRID_SEARCH.SEARCH` with boost tuning |

---

## 1. Vector Search

### Core Concepts
- Supports dense vectors up to **16,000 dimensions** and sparse vectors
- Distance metrics: **L2 (Euclidean)**, **inner product**, **cosine**, Manhattan
- Index types: **IVF_FLAT**, **IVF_PQ** (max indexed dimension: 4096)
- Supports incremental updates and deletions without affecting recall
- Recommended workflow: write all data first, then create index for best performance

### Create Table with Vector Index

```sql
CREATE TABLE items (
    id INT PRIMARY KEY,
    doc VARCHAR(200),
    embedding VECTOR(3),
    VECTOR INDEX idx1(embedding) WITH (distance=L2, type=ivf_flat)
);
```

Key parameters for `VECTOR INDEX ... WITH (...)`:
- `distance`: `l2` | `inner_product` | `cosine` (required)
- `type`: `ivf_flat` | `ivf_pq` (required)
- `lib`: `ob` (default, only supported library for IVF)
- `nlist`: number of cluster centers, default 128, range [1, 65536]
- `sample_per_nlist`: samples per cluster center, default 256 (required for post-creation index)
- IVF_PQ-specific: `m` (quantized dimension, required), `nbits` (quantization bits, default 8, range [1, 24])

### Exact Search (Full Scan)

```sql
SELECT id, doc FROM items
ORDER BY l2_distance(embedding, '[0.9, 1.0, 0.9]')
LIMIT 5;
```

### Approximate Search (Using Vector Index)

```sql
SELECT id, doc FROM items
ORDER BY l2_distance(embedding, '[0.9, 1.0, 0.9]')
APPROXIMATE
LIMIT 5;
```

The `APPROXIMATE` (or `APPROX`) keyword triggers vector index scan instead of full table scan.

### Distance Functions
- `l2_distance(v1, v2)` — Euclidean distance
- `inner_product(v1, v2)` — Inner product
- `cosine_distance(v1, v2)` — Cosine distance
- `negative_inner_product(v1, v2)` — Negative inner product

### Vector Data Type
- Declare: `VECTOR(dim)` where dim is the number of dimensions
- Insert: `'[1.0, 2.0, 3.0]'` (JSON array string format)
- NULL first comparison mode by default — add `NOT NULL` condition in queries

### Comparison: Exact vs Approximate Search

| Aspect | Exact Search | Approximate Search |
|--------|-------------|-------------------|
| Operator | `TABLE FULL SCAN` + `TOP-N SORT` | `VECTOR INDEX SCAN` |
| Accuracy | 100% | Approximately accurate |
| Performance | Degrades with data size | Stable |
| Best for | Small data, high accuracy needs | Large datasets, real-time queries |

---

## 2. Full-Text Search

### Supported Column Types
Full-text indexes can be created on `CHAR`, `VARCHAR`, and `TEXT` columns.

### Create Full-Text Index

Three methods:

```sql
-- Method 1: During table creation
CREATE TABLE articles (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255),
    content TEXT,
    FULLTEXT INDEX ft1(content) WITH PARSER SPACE
);

-- Method 2: CREATE FULLTEXT INDEX
CREATE FULLTEXT INDEX ft_idx ON articles(content)
    WITH PARSER IK
    PARSER_PROPERTIES=(ik_mode='smart');

-- Method 3: ALTER TABLE
ALTER TABLE articles ADD FULLTEXT INDEX ft_idx(content)
    WITH PARSER NGRAM
    PARSER_PROPERTIES=(ngram_token_size=2);
```

### Tokenizers

| Tokenizer | Best For | Key Parameter |
|-----------|----------|---------------|
| **SPACE** (default) | English text, space-separated | `min_token_size` [1,16], `max_token_size` [10,84] |
| **BENG** | Basic English (treats `_` as separator) | `min_token_size` [1,16], `max_token_size` [10,84] |
| **NGRAM** | Fuzzy matching, short texts, Chinese | `ngram_token_size` [1,10] (default 2) |
| **NGRAM2** | Multiple fixed-length tokens | `min_ngram_size` [1,16], `max_ngram_size` [1,16] |
| **IK** | Chinese text segmentation | `ik_mode`: `smart` (default) or `max_word` |
| **jieba** | Chinese/multilingual (experimental) | `jieba_mode`: precise, full, search |

### Query with MATCH ... AGAINST

```sql
-- Find matching documents sorted by relevance
SELECT id, title, MATCH(content) AGAINST('seekdb database') AS score
FROM articles
WHERE MATCH(content) AGAINST('seekdb database');
```

### Tokenizer Selection Guide
- English title search → **Space** or **Beng**
- Chinese product descriptions → **IK** (smart mode)
- Fuzzy match logs/error codes → **Ngram**
- Technical paper keywords → **jieba**

### Test Tokenization

```sql
SELECT TOKENIZE('your text here', 'space');
SELECT TOKENIZE('your text', 'IK', '[{"additional_args":[{"ik_mode": "smart"}]}]');
```

### Limitations
- Only **LOCAL** full-text indexes supported
- Cannot use `UNIQUE` keyword
- Multi-column indexes require same character set
- Columnstore full-text indexes not supported

---

## 3. Hybrid Search

Combines vector search (semantic) + full-text search (keyword) via `DBMS_HYBRID_SEARCH` package.

> **Important**: Hybrid search is only supported on **heap tables** (tables without primary key clustering).

### API

| Method | Description |
|--------|------------|
| `DBMS_HYBRID_SEARCH.SEARCH(table, params)` | Returns JSON results sorted by relevance |
| `DBMS_HYBRID_SEARCH.GET_SQL(table, params)` | Returns the actual SQL being executed |

### Pure Vector Search

```sql
SET @parm = '{
    "knn": {
        "field": "vector",
        "k": 3,
        "query_vector": [1,2,3]
    }
}';
SELECT JSON_PRETTY(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
```

### Pure Full-Text Search

```sql
SET @parm = '{
    "query": {
        "query_string": {
            "fields": ["query", "content"],
            "query": "hello oceanbase"
        }
    }
}';
SELECT JSON_PRETTY(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
```

### Combined Hybrid Search

```sql
SET @parm = '{
    "query": {
        "query_string": {
            "fields": ["query", "content"],
            "query": "hello oceanbase"
        }
    },
    "knn": {
        "field": "vector",
        "k": 5,
        "query_vector": [1,2,3]
    }
}';
SELECT JSON_PRETTY(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
```

### Boost Parameter Tuning

Control weights of keyword vs semantic search:

```sql
SET @parm = '{
    "query": {
        "query_string": {
            "fields": ["query", "content"],
            "query": "hello oceanbase",
            "boost": 2.0
        }
    },
    "knn": {
        "field": "vector",
        "k": 5,
        "query_vector": [1,2,3],
        "boost": 1.0
    }
}';
```

The final `_score` = `_keyword_score` + `_semantic_score` (weighted by boost).

### Table Setup for Hybrid Search

```sql
CREATE TABLE doc_table(
    c1 INT,
    vector VECTOR(3),
    query VARCHAR(255),
    content VARCHAR(255),
    VECTOR INDEX idx1(vector) WITH (distance=l2, type=ivf_flat),
    FULLTEXT INDEX idx2(query),
    FULLTEXT INDEX idx3(content)
);
```

---

## Need More Details?

For questions beyond this quick reference, install the full `seekdb` documentation skill which covers ~1000 doc entries with semantic search.

Install guide: https://www.oceanbase.ai/docs/agent-skills