Parser Plugins

Parser plugins let you transform non-JSON files into structured, queryable JSON when they are stored in AeorDB. AeorDB provides two parser mechanisms:

  1. Native parsers (built-in) – 8 parsers for common formats (text, HTML/XML, PDF, images, audio, video, MS Office, ODF) that run automatically with zero deployment. See Plugin Endpoints – Native Parsers for the full format list.
  2. WASM parser plugins – custom parsers compiled to WebAssembly, deployed per-table for proprietary or specialized formats.

Native parsers are tried first. If no native parser handles the content type, the engine falls through to any configured WASM parser.

How WASM Parsers Work

When a file is written to a table that has a WASM parser configured, AeorDB automatically routes the raw bytes through the parser’s WASM module. The parser receives the file data plus metadata and returns a JSON value. That JSON is then indexed by AeorDB’s query engine, making the original non-JSON file fully searchable.

Writing a Parser: Step by Step

1. Create a Rust Crate

cargo new my-parser --lib
cd my-parser

Edit Cargo.toml:

[package]
name = "my-parser"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["cdylib"]

[dependencies]
aeordb-plugin-sdk = { path = "../aeordb-plugin-sdk" }
serde_json = "1"

The crate-type = ["cdylib"] setting is required – it tells the compiler to produce a dynamic library, which on the wasm32-unknown-unknown target is emitted as a standalone .wasm module.

2. Implement the Parse Function

Use the aeordb_parser! macro to generate the WASM export boilerplate. Your job is to write a function that takes a ParserInput and returns Result<serde_json::Value, String>.

use aeordb_plugin_sdk::aeordb_parser;
use aeordb_plugin_sdk::parser::*;

// Generate the WASM export boilerplate around `parse`.
aeordb_parser!(parse);

fn parse(input: ParserInput) -> Result<serde_json::Value, String> {
    // Reject input that is not valid UTF-8 with a readable error.
    let text = std::str::from_utf8(&input.data)
        .map_err(|e| e.to_string())?;
    // The returned JSON is what AeorDB indexes for this file.
    Ok(serde_json::json!({
        "text": text,
        "metadata": {
            "line_count": text.lines().count(),
            "word_count": text.split_whitespace().count(),
        }
    }))
}

The aeordb_parser! macro generates:

  • A global allocator for the WASM target
  • A handle(ptr, len) -> i64 export that deserializes the parser envelope, calls your function, and returns the serialized response as a packed pointer+length

You never interact with the raw WASM ABI directly.
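To make the "packed pointer+length" return value concrete, here is a minimal sketch of how two 32-bit values can share one i64. The layout shown (pointer in the high 32 bits, length in the low 32 bits) is an assumption for illustration only; the macro's actual encoding is not specified here and may differ. The function names pack and unpack are hypothetical.

```rust
/// Pack a 32-bit pointer and a 32-bit length into one i64.
/// ASSUMED layout: pointer in the high 32 bits, length in the low 32 bits.
fn pack(ptr: u32, len: u32) -> i64 {
    (((ptr as u64) << 32) | (len as u64)) as i64
}

/// Recover (ptr, len) from a value produced by `pack`.
fn unpack(packed: i64) -> (u32, u32) {
    let bits = packed as u64;
    // High half is the pointer; the `as u32` cast keeps only the low half.
    ((bits >> 32) as u32, bits as u32)
}
```

Because wasm32 pointers and lengths both fit in 32 bits, a single 64-bit return value is enough to hand a buffer back to the host without a second export.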

3. Build for WASM

# Install the target once if it is not already present
rustup target add wasm32-unknown-unknown
cargo build --target wasm32-unknown-unknown --release

The compiled module lands at:

target/wasm32-unknown-unknown/release/my_parser.wasm

4. Deploy the Parser

Upload the WASM binary to a table’s plugin deployment endpoint:

curl -X PUT \
  http://localhost:6830/mydb/myschema/mytable/_deploy \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/wasm" \
  --data-binary @target/wasm32-unknown-unknown/release/my_parser.wasm

5. Configure Content-Type Routing

Create or update /.config/parsers.json to route specific content types to your parser:

{
  "parsers": {
    "text/plain": "my-parser",
    "text/csv": "csv-parser",
    "application/pdf": "pdf-parser"
  }
}

When a file with a matching Content-Type is stored, AeorDB automatically invokes the corresponding parser.
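The selection order described on this page (native parsers first, then the routing table) can be sketched as a small lookup. This is an illustration only, not AeorDB's implementation: ParserChoice, choose_parser, the native type list, and the content type "application/x-acme-log" are all hypothetical names chosen for the example.

```rust
use std::collections::HashMap;

/// Which mechanism would handle a content type (hypothetical type).
#[derive(Debug, PartialEq)]
enum ParserChoice<'a> {
    /// A built-in parser handles the content type.
    Native,
    /// The named WASM parser from the routing table handles it.
    Wasm(&'a str),
    /// Nothing matched; the file is stored without parsing.
    Unparsed,
}

/// Native parsers win; otherwise fall through to the routing table.
fn choose_parser<'a>(
    content_type: &str,
    native_types: &[&str],
    routes: &'a HashMap<String, String>,
) -> ParserChoice<'a> {
    if native_types.contains(&content_type) {
        return ParserChoice::Native;
    }
    match routes.get(content_type) {
        Some(name) => ParserChoice::Wasm(name.as_str()),
        None => ParserChoice::Unparsed,
    }
}
```

The sketch matches content types exactly. Whether AeorDB normalizes parameters such as "; charset=utf-8" before the lookup is not covered here.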

6. Configure Indexing

Add index definitions for the parser's output fields to indexes.json so the parsed output is indexed. Each "field" value is a path into the JSON your parser returns:

{
  "indexes": [
    {
      "field": "text",
      "type": "fulltext"
    },
    {
      "field": "metadata.word_count",
      "type": "numeric"
    }
  ]
}
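The field value "metadata.word_count" addresses a nested key in the parser output. A minimal sketch of that kind of dotted-path lookup follows. It assumes that dots separate object keys, and it uses a tiny stand-in Json enum so the example has no dependencies; a real parser would use serde_json::Value, and AeorDB's own path resolution may differ.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
enum Json {
    Number(f64),
    Text(String),
    Object(BTreeMap<String, Json>),
}

/// Resolve a dotted path such as "metadata.word_count" against `root`.
/// Returns None if any segment is missing or a non-object is traversed.
fn get_path<'a>(root: &'a Json, path: &str) -> Option<&'a Json> {
    path.split('.').try_fold(root, |node, key| match node {
        Json::Object(map) => map.get(key),
        _ => None,
    })
}
```

Under this reading, an index on "text" sees the top-level string, while an index on "metadata.word_count" sees the number nested inside the metadata object.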

The ParserInput Struct

Your parse function receives a ParserInput with two fields:

| Field | Type | Description |
|-------|------|-------------|
| data | Vec<u8> | Raw file bytes (already base64-decoded from the wire envelope) |
| meta | FileMeta | Metadata about the file being parsed |

FileMeta Fields

| Field | Type | Description |
|-------|------|-------------|
| filename | String | File name only (e.g., "report.pdf") |
| path | String | Full storage path (e.g., "/docs/reports/report.pdf") |
| content_type | String | MIME type (e.g., "text/plain") |
| size | u64 | Raw file size in bytes |
| hash | String | Hex-encoded content hash (may be empty) |
| hash_algorithm | String | Hash algorithm used (e.g., "blake3_256") |
| created_at | i64 | Creation timestamp (ms since epoch, default 0) |
| updated_at | i64 | Last update timestamp (ms since epoch, default 0) |
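As a rough picture of these types, the sketch below mirrors the fields listed in the two tables as plain Rust structs and adds two small helpers. The struct definitions are reconstructed from the tables only, so the SDK's real definitions (derives, visibility, serde attributes) may differ. The helper functions extension and content_hash are hypothetical and not part of the SDK.

```rust
/// Hypothetical mirror of the SDK's FileMeta, built from the field table.
#[derive(Debug, Clone, Default)]
struct FileMeta {
    filename: String,
    path: String,
    content_type: String,
    size: u64,
    hash: String,
    hash_algorithm: String,
    created_at: i64,
    updated_at: i64,
}

/// Hypothetical mirror of the SDK's ParserInput.
#[derive(Debug, Clone, Default)]
struct ParserInput {
    data: Vec<u8>,
    meta: FileMeta,
}

/// Lower-cased file extension, or None for names like "README" or ".bashrc".
fn extension(meta: &FileMeta) -> Option<String> {
    let (stem, ext) = meta.filename.rsplit_once('.')?;
    if stem.is_empty() || ext.is_empty() {
        None
    } else {
        Some(ext.to_ascii_lowercase())
    }
}

/// (algorithm, hex digest) when a hash is present; the hash may be empty.
fn content_hash(meta: &FileMeta) -> Option<(&str, &str)> {
    if meta.hash.is_empty() {
        None
    } else {
        Some((meta.hash_algorithm.as_str(), meta.hash.as_str()))
    }
}
```

Treating the empty hash and the default-0 timestamps as "absent" keeps a parser from emitting misleading values into the index.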

Real-World Example: Plaintext Parser

The built-in plaintext parser (aeordb-parsers/plaintext) demonstrates a production parser:

use aeordb_plugin_sdk::aeordb_parser;
use aeordb_plugin_sdk::parser::*;

// Generate the WASM export boilerplate around `parse`.
aeordb_parser!(parse);

fn parse(input: ParserInput) -> Result<serde_json::Value, String> {
    // Binary or otherwise non-UTF-8 input is rejected here.
    let text = std::str::from_utf8(&input.data)
        .map_err(|e| format!("not valid UTF-8: {}", e))?;

    // Basic text statistics
    let line_count = text.lines().count();
    let word_count = text.split_whitespace().count();
    let char_count = text.chars().count();
    let byte_count = input.data.len();

    // Extract first line as a "title" (common convention for text files)
    let title = text.lines().next().unwrap_or("").trim().to_string();

    // Detect if it looks like source code (a rough heuristic, not a classifier)
    let has_braces = text.contains('{') && text.contains('}');
    let has_imports = text.contains("import ")
        || text.contains("use ")
        || text.contains("#include");
    let looks_like_code = has_braces || has_imports;

    Ok(serde_json::json!({
        "text": text,
        "metadata": {
            "filename": input.meta.filename,
            "content_type": input.meta.content_type,
            "size": byte_count,
            "line_count": line_count,
            "word_count": word_count,
            "char_count": char_count,
        },
        "title": title,
        "looks_like_code": looks_like_code,
    }))
}

This parser:

  • Validates UTF-8 encoding (returns an error for input that is not valid UTF-8, such as binary data)
  • Extracts text statistics (lines, words, characters)
  • Pulls the first line as a title
  • Heuristically detects source code

Error Handling

Return Err(String) from your parse function to signal a failure. AeorDB will store the error in the parser response and the file will not be indexed. The original file is still stored – only parsing/indexing is skipped.

fn parse(input: ParserInput) -> Result<serde_json::Value, String> {
    // Fail early: the file is still stored, but it will not be indexed.
    if input.data.is_empty() {
        return Err("empty file".to_string());
    }
    // ...
}
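A common way to keep error paths tidy is to collect the up-front checks into one helper that returns Result<_, String>, so the parse function can apply it with the ? operator. The following is a minimal sketch of that pattern using only the standard library. The validate helper and the MAX_BYTES limit are hypothetical; AeorDB does not necessarily impose or require any such limit.

```rust
/// Hypothetical size cap chosen for this example (1 MiB).
const MAX_BYTES: usize = 1024 * 1024;

/// Run cheap checks before parsing and return the input as text.
/// Each failure becomes a descriptive Err(String).
fn validate(data: &[u8]) -> Result<&str, String> {
    if data.is_empty() {
        return Err("empty file".to_string());
    }
    if data.len() > MAX_BYTES {
        return Err(format!(
            "file too large: {} bytes (limit {})",
            data.len(),
            MAX_BYTES
        ));
    }
    // Convert the UTF-8 error into the String error type used by parsers.
    std::str::from_utf8(data).map_err(|e| format!("not valid UTF-8: {}", e))
}
```

Inside a parser this would be used as let text = validate(&input.data)?; so that every failure surfaces as a single readable message.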

See Also

  • Query Plugins – plugins that query the database and return custom responses
  • SDK Reference – complete type reference for the plugin SDK