Parser Plugins
Parser plugins let you transform non-JSON files into structured, queryable JSON when they are stored in AeorDB. AeorDB provides two parser mechanisms:
- Native parsers (built-in) – 8 parsers for common formats (text, HTML/XML, PDF, images, audio, video, MS Office, ODF) that run automatically with zero deployment. See Plugin Endpoints – Native Parsers for the full format list.
- WASM parser plugins – custom parsers compiled to WebAssembly, deployed per-table for proprietary or specialized formats.
Native parsers are tried first. If no native parser handles the content type, the engine falls through to any configured WASM parser.
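The selection order described above can be modeled in a few lines of plain Rust. This is a minimal sketch, not AeorDB's code. Handler, select_parser, and the lookup tables are hypothetical names. The sketch also assumes that a native parser "handles" a content type simply by being registered for it.

```rust
use std::collections::HashMap;

/// Which mechanism ends up parsing a file (hypothetical model, not an AeorDB type).
#[derive(Debug, PartialEq)]
enum Handler {
    Native(String),
    Wasm(String),
    Unparsed,
}

/// Native parsers are consulted first. Only when none claims the content
/// type does selection fall through to the configured WASM parsers.
fn select_parser(
    content_type: &str,
    native: &HashMap<&str, &str>,
    wasm: &HashMap<&str, &str>,
) -> Handler {
    if let Some(name) = native.get(content_type) {
        return Handler::Native(name.to_string());
    }
    match wasm.get(content_type) {
        Some(name) => Handler::Wasm(name.to_string()),
        None => Handler::Unparsed,
    }
}

fn main() {
    // Hypothetical tables: native parsers by content type, and a
    // parsers.json-style map of WASM parser names.
    let native = HashMap::from([("text/plain", "text"), ("application/pdf", "pdf")]);
    let wasm = HashMap::from([
        ("application/pdf", "pdf-parser"),
        ("application/x-acme", "acme-parser"),
    ]);
    for ct in ["application/pdf", "application/x-acme", "application/x-unknown"] {
        println!("{ct} -> {:?}", select_parser(ct, &native, &wasm));
    }
}
```

Under this reading, the engine would never reach a WASM mapping for a content type that a native parser already claims.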
How WASM Parsers Work
When a file is written to a table that has a WASM parser configured, AeorDB automatically routes the raw bytes through the parser’s WASM module. The parser receives the file data plus metadata and returns a JSON value. AeorDB’s query engine then indexes that JSON, so the original non-JSON file becomes searchable through its parsed fields.
Writing a Parser: Step by Step
1. Create a Rust Crate
```shell
cargo new my-parser --lib
cd my-parser
```
Edit Cargo.toml:
```toml
[package]
name = "my-parser"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["cdylib"]

[dependencies]
aeordb-plugin-sdk = { path = "../aeordb-plugin-sdk" }
serde_json = "1"
```
The crate-type = ["cdylib"] line is required. It tells the compiler to produce a C-compatible dynamic library, which for the wasm32 target is a standalone .wasm module.
2. Implement the Parse Function
Use the aeordb_parser! macro to generate the WASM export boilerplate. Your job is to write a function that takes a ParserInput and returns Result<serde_json::Value, String>.
```rust
use aeordb_plugin_sdk::aeordb_parser;
use aeordb_plugin_sdk::parser::*;

// Generate the WASM export boilerplate that calls `parse`.
aeordb_parser!(parse);

fn parse(input: ParserInput) -> Result<serde_json::Value, String> {
    // Reject input that is not valid UTF-8.
    let text = std::str::from_utf8(&input.data)
        .map_err(|e| e.to_string())?;

    Ok(serde_json::json!({
        "text": text,
        "metadata": {
            "line_count": text.lines().count(),
            "word_count": text.split_whitespace().count(),
        }
    }))
}
```
The aeordb_parser! macro generates:
- A global allocator for the WASM target
- A handle(ptr, len) -> i64 export that deserializes the parser envelope, calls your function, and returns the serialized response as a packed pointer and length
You never interact with the raw WASM ABI directly.
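When you debug a module, it can help to see what "a packed pointer+length in an i64" looks like. The sketch below assumes the pointer occupies the high 32 bits and the length occupies the low 32 bits. This layout is an assumption; the page does not specify the bit order the macro uses. pack and unpack are hypothetical helpers, not SDK functions. Both values fit in 32 bits because wasm32 addresses and lengths are 32-bit.

```rust
/// Assumed layout (not confirmed by the SDK docs): pointer in the high
/// 32 bits, length in the low 32 bits.
fn pack(ptr: u32, len: u32) -> i64 {
    (((ptr as u64) << 32) | len as u64) as i64
}

/// Inverse of `pack` under the same assumed layout.
fn unpack(packed: i64) -> (u32, u32) {
    let bits = packed as u64;
    ((bits >> 32) as u32, (bits & 0xFFFF_FFFF) as u32)
}

fn main() {
    let packed = pack(0x0001_0000, 42);
    let (ptr, len) = unpack(packed);
    println!("packed={packed:#x} ptr={ptr:#x} len={len}");
}
```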
3. Build for WASM
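```shell
# One-time setup: install the WASM compilation target
rustup target add wasm32-unknown-unknown

cargo build --target wasm32-unknown-unknown --release
```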
Cargo converts the hyphen in the crate name to an underscore, so the compiled module lands at:
```text
target/wasm32-unknown-unknown/release/my_parser.wasm
```
4. Deploy the Parser
Upload the WASM binary to a table’s plugin deployment endpoint:
```shell
curl -X PUT \
  http://localhost:6830/mydb/myschema/mytable/_deploy \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/wasm" \
  --data-binary @target/wasm32-unknown-unknown/release/my_parser.wasm
```
5. Configure Content-Type Routing
Create or update /.config/parsers.json to route specific content types to your parser:
```json
{
  "parsers": {
    "text/plain": "my-parser",
    "text/csv": "csv-parser",
    "application/pdf": "pdf-parser"
  }
}
```
When a file with a matching Content-Type is stored, AeorDB automatically invokes the corresponding parser.
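Real Content-Type headers often carry parameters, such as text/plain; charset=utf-8. The page does not say how AeorDB matches those against parsers.json keys. The sketch below assumes a common convention: compare only the bare MIME type, trimmed and lowercased, and ignore parameters. essence and parser_for are hypothetical helpers.

```rust
use std::collections::HashMap;

/// Reduce a Content-Type value to its bare MIME type: drop parameters
/// such as "; charset=utf-8", trim whitespace, and lowercase.
/// (Assumed matching rule; AeorDB's actual rule isn't documented here.)
fn essence(content_type: &str) -> String {
    content_type
        .split(';')
        .next()
        .unwrap_or("")
        .trim()
        .to_ascii_lowercase()
}

/// Look up the parser name for a Content-Type in a parsers.json-style map.
fn parser_for<'a>(parsers: &'a HashMap<String, String>, content_type: &str) -> Option<&'a str> {
    parsers.get(&essence(content_type)).map(String::as_str)
}

fn main() {
    let parsers: HashMap<String, String> = [
        ("text/plain", "my-parser"),
        ("text/csv", "csv-parser"),
        ("application/pdf", "pdf-parser"),
    ]
    .into_iter()
    .map(|(k, v)| (k.to_string(), v.to_string()))
    .collect();

    for ct in ["text/plain; charset=utf-8", "Text/CSV", "image/png"] {
        println!("{ct:?} -> {:?}", parser_for(&parsers, ct));
    }
}
```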
6. Configure Indexing
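Add entries to indexes.json for the fields of the parsed output that you want indexed. Each field is a path into the JSON your parser returns (nested keys use dot notation, as in metadata.word_count):
```json
{
  "indexes": [
    {
      "field": "text",
      "type": "fulltext"
    },
    {
      "field": "metadata.word_count",
      "type": "numeric"
    }
  ]
}
```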
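A field path like metadata.word_count maps naturally onto the nested JSON returned by the parsers on this page. This sketch shows that reading: each dot-separated segment steps one object level deeper. It is an illustration, not AeorDB's resolver. Json is a minimal std-only stand-in for serde_json::Value, and resolve is a hypothetical helper.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for serde_json::Value so the sketch needs only std.
#[derive(Debug, PartialEq)]
enum Json {
    Num(f64),
    Str(String),
    Obj(BTreeMap<String, Json>),
}

/// Resolve a dotted index path against the parser's output by walking
/// one object level per segment. Returns None if a segment is missing
/// or if the path tries to descend into a non-object value.
fn resolve<'a>(root: &'a Json, path: &str) -> Option<&'a Json> {
    path.split('.').try_fold(root, |node, key| match node {
        Json::Obj(map) => map.get(key),
        _ => None,
    })
}

fn main() {
    let mut metadata = BTreeMap::new();
    metadata.insert("word_count".to_string(), Json::Num(3.0));
    let mut root = BTreeMap::new();
    root.insert("text".to_string(), Json::Str("one two three".to_string()));
    root.insert("metadata".to_string(), Json::Obj(metadata));
    let doc = Json::Obj(root);

    println!("{:?}", resolve(&doc, "metadata.word_count")); // Some(Num(3.0))
    println!("{:?}", resolve(&doc, "metadata.missing")); // None
}
```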
The ParserInput Struct
Your parse function receives a ParserInput with two fields:
| Field | Type | Description |
|---|---|---|
| data | Vec<u8> | Raw file bytes (already base64-decoded from the wire envelope) |
| meta | FileMeta | Metadata about the file being parsed |
FileMeta Fields
| Field | Type | Description |
|---|---|---|
| filename | String | File name only (e.g., "report.pdf") |
| path | String | Full storage path (e.g., "/docs/reports/report.pdf") |
| content_type | String | MIME type (e.g., "text/plain") |
| size | u64 | Raw file size in bytes |
| hash | String | Hex-encoded content hash (may be empty) |
| hash_algorithm | String | Hash algorithm used (e.g., "blake3_256") |
| created_at | i64 | Creation timestamp (ms since epoch, default 0) |
| updated_at | i64 | Last update timestamp (ms since epoch, default 0) |
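To exercise parsing logic natively with cargo test, without building for WASM, you can mirror these two structs locally. The sketch below copies the field names and types from the tables above. Treat it as a hypothetical stand-in: the real SDK types may differ in derives and constructors. sample_input and ms_to_time are illustrative helpers. ms_to_time treats non-positive timestamps, including the default 0, as "not set".

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Local mirror of FileMeta's documented fields (hypothetical stand-in).
#[allow(dead_code)]
#[derive(Debug, Clone, Default)]
struct FileMeta {
    filename: String,
    path: String,
    content_type: String,
    size: u64,
    hash: String,
    hash_algorithm: String,
    created_at: i64,
    updated_at: i64,
}

/// Local mirror of ParserInput (hypothetical stand-in).
#[derive(Debug, Clone, Default)]
struct ParserInput {
    data: Vec<u8>,
    meta: FileMeta,
}

/// Build a test input. The filename is the last path segment.
fn sample_input(path: &str, content_type: &str, data: &[u8]) -> ParserInput {
    let filename = path.rsplit('/').next().unwrap_or(path).to_string();
    ParserInput {
        data: data.to_vec(),
        meta: FileMeta {
            filename,
            path: path.to_string(),
            content_type: content_type.to_string(),
            size: data.len() as u64,
            ..FileMeta::default()
        },
    }
}

/// Convert an epoch-milliseconds field to SystemTime. This sketch
/// treats non-positive values (including the default 0) as unset.
fn ms_to_time(ms: i64) -> Option<SystemTime> {
    if ms <= 0 {
        return None;
    }
    Some(UNIX_EPOCH + Duration::from_millis(ms as u64))
}

fn main() {
    let input = sample_input("/docs/reports/report.txt", "text/plain", b"hello\nworld");
    println!("{} ({} bytes, {})", input.meta.filename, input.meta.size, input.meta.content_type);
    println!("created_at set? {}", ms_to_time(input.meta.created_at).is_some());
}
```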
Real-World Example: Plaintext Parser
The plaintext parser that ships with AeorDB (aeordb-parsers/plaintext) is a complete production example:
```rust
use aeordb_plugin_sdk::aeordb_parser;
use aeordb_plugin_sdk::parser::*;

aeordb_parser!(parse);

fn parse(input: ParserInput) -> Result<serde_json::Value, String> {
    let text = std::str::from_utf8(&input.data)
        .map_err(|e| format!("not valid UTF-8: {}", e))?;

    let line_count = text.lines().count();
    let word_count = text.split_whitespace().count();
    let char_count = text.chars().count();
    let byte_count = input.data.len();

    // Extract first line as a "title" (common convention for text files)
    let title = text.lines().next().unwrap_or("").trim().to_string();

    // Detect if it looks like source code
    let has_braces = text.contains('{') && text.contains('}');
    let has_imports = text.contains("import ")
        || text.contains("use ")
        || text.contains("#include");
    let looks_like_code = has_braces || has_imports;

    Ok(serde_json::json!({
        "text": text,
        "metadata": {
            "filename": input.meta.filename,
            "content_type": input.meta.content_type,
            "size": byte_count,
            "line_count": line_count,
            "word_count": word_count,
            "char_count": char_count,
        },
        "title": title,
        "looks_like_code": looks_like_code,
    }))
}
```
This parser:
- Validates UTF-8 encoding (returns an error for non-UTF-8 input, such as most binary data)
- Extracts text statistics (lines, words, characters)
- Pulls the first line as a title
- Heuristically detects source code
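The statistics logic does not depend on WASM, so you can pull it into a plain function and unit-test it natively. This sketch reproduces the plaintext parser's computations and returns a std-only struct instead of serde_json::Value. TextStats and text_stats are illustrative names, not part of the SDK. The sample input includes "é" to show that char_count counts Unicode scalar values while byte_count counts UTF-8 bytes.

```rust
/// The plaintext parser's computed fields, as a plain struct.
#[derive(Debug, PartialEq)]
struct TextStats {
    line_count: usize,
    word_count: usize,
    char_count: usize,
    byte_count: usize,
    title: String,
    looks_like_code: bool,
}

/// Same logic as the parser above, minus the SDK and JSON layer.
fn text_stats(data: &[u8]) -> Result<TextStats, String> {
    let text = std::str::from_utf8(data).map_err(|e| format!("not valid UTF-8: {}", e))?;
    let has_braces = text.contains('{') && text.contains('}');
    let has_imports =
        text.contains("import ") || text.contains("use ") || text.contains("#include");
    Ok(TextStats {
        line_count: text.lines().count(),
        word_count: text.split_whitespace().count(),
        char_count: text.chars().count(),
        byte_count: data.len(),
        title: text.lines().next().unwrap_or("").trim().to_string(),
        looks_like_code: has_braces || has_imports,
    })
}

fn main() {
    let stats = text_stats("Café menu\nsoup and bread\n".as_bytes()).unwrap();
    println!("{stats:?}");
    println!("binary rejected: {}", text_stats(&[0xff, 0xfe]).is_err());
}
```

The code check is a plain substring match, so ordinary prose that contains "use " is also flagged as code. That trade-off is acceptable for a cheap heuristic, but it is worth knowing before you index on looks_like_code.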
Error Handling
Return Err(String) from your parse function to signal a failure. AeorDB stores the error in the parser response and does not index the file. The original file is still stored; only indexing of parsed output is skipped.
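```rust
use aeordb_plugin_sdk::aeordb_parser;
use aeordb_plugin_sdk::parser::*;

aeordb_parser!(parse);

fn parse(input: ParserInput) -> Result<serde_json::Value, String> {
    // Signal failure: AeorDB records the error and skips indexing.
    if input.data.is_empty() {
        return Err("empty file".to_string());
    }
    // ... real parsing goes here; placeholder output below ...
    Ok(serde_json::json!({ "size": input.data.len() }))
}
```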
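The storage and indexing rules in this section can be summarized as a small model. This is a hypothetical sketch, not AeorDB's internals. WriteOutcome and write_file are illustrative names, and the parser output is a plain String to keep the example std-only. The file is always stored, and parsed output is indexed only when the parser returns Ok.

```rust
/// Result of writing one file under the documented rules (hypothetical model).
#[derive(Debug, PartialEq)]
struct WriteOutcome {
    stored: bool,
    indexed: bool,
    parser_error: Option<String>,
}

/// Store the file unconditionally. Index only if parsing succeeds.
fn write_file<F>(data: &[u8], parse: F) -> WriteOutcome
where
    F: Fn(&[u8]) -> Result<String, String>,
{
    match parse(data) {
        Ok(_json) => WriteOutcome { stored: true, indexed: true, parser_error: None },
        Err(e) => WriteOutcome { stored: true, indexed: false, parser_error: Some(e) },
    }
}

/// A toy parser that rejects empty input, like the snippet above.
fn reject_empty(data: &[u8]) -> Result<String, String> {
    if data.is_empty() {
        return Err("empty file".to_string());
    }
    Ok(format!("{{\"size\":{}}}", data.len()))
}

fn main() {
    println!("{:?}", write_file(b"", reject_empty));
    println!("{:?}", write_file(b"abc", reject_empty));
}
```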
See Also
- Query Plugins – plugins that query the database and return custom responses
- SDK Reference – complete type reference for the plugin SDK