Architecture

AeorDB is a single-file database built on an append-only write-ahead log (WAL). The database file contains all data, indexes, and metadata in one place. Understanding the architecture helps you reason about performance, recovery, and versioning behavior.

High-Level Overview

                         aeordb start
                             |
                     +-------+-------+
                     |  HTTP Server  |
                     |  (axum)       |
                     +-------+-------+
                             |
              +--------------+--------------+
              |              |              |
        +-----+----+  +-----+----+  +------+------+
        | Query    |  | Plugin   |  | Version     |
        | Engine   |  | Manager  |  | Manager     |
        +-----+----+  +-----+----+  +------+------+
              |              |              |
              +--------------+--------------+
                             |
                    +--------+--------+
                    | Storage Engine  |
                    | (StorageEngine) |
                    +--------+--------+
                             |
              +--------------+--------------+
              |              |              |
        +-----+----+  +-----+----+  +------+------+
        | Append   |  | KV Store |  | NVT         |
        | Writer   |  | (.kv)    |  | (in-memory) |
        +----------+  +----------+  +-------------+
              |              |
              +--------------+
                    |
            [  mydb.aeordb  ]    <-- single file on disk
            [ mydb.aeordb.kv ]   <-- KV index file

The Database File (`.aeordb`)

The .aeordb file is an append-only WAL. Every write appends a new entry to the end of the file. Entries are never modified in place (except during garbage collection).

File Layout

[File Header - 256 bytes]
  Magic: "AEOR"
  Hash algorithm, timestamps, KV/NVT pointers, HEAD hash, entry count

[Entry 1] [Entry 2] [Entry 3] ... [Entry N]
  Chunks, FileRecords, DirectoryIndexes, Snapshots, DeletionRecords, Voids

The 256-byte file header contains pointers to the KV block, NVT, and the current HEAD hash. Every entry carries its own header with magic bytes, type tag, hash algorithm, compression flag, key, and value.

Entry Types

Type	Purpose
Chunk	Raw file data (256KB blocks)
FileRecord	File metadata + ordered list of chunk hashes
DirectoryIndex	Directory contents (child entries with hashes)
Snapshot	Named point-in-time version reference
DeletionRecord	Marks a file as deleted (for version history completeness)
Void	Free space marker (reclaimable by future writes)
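
As a rough illustration of how a file maps onto Chunk entries, the sketch below splits a byte buffer into fixed-size blocks. It assumes "256KB" means 256 KiB (262,144 bytes); the names `CHUNK_SIZE`, `chunk_count`, and `split_into_chunks` are hypothetical and not taken from the AeorDB source.

```rust
/// Chunk size, assuming "256KB" means 256 KiB (262,144 bytes).
pub const CHUNK_SIZE: usize = 256 * 1024;

/// Number of Chunk entries needed for a file of `len` bytes.
pub fn chunk_count(len: usize) -> usize {
    // Ceiling division: a partial trailing block still needs its own chunk.
    (len + CHUNK_SIZE - 1) / CHUNK_SIZE
}

/// Split file contents into fixed-size chunks; the last one may be shorter.
pub fn split_into_chunks(data: &[u8]) -> Vec<&[u8]> {
    data.chunks(CHUNK_SIZE).collect()
}
```

A FileRecord would then store the hashes of these chunks in order, so the file can be reassembled by reading each chunk back in sequence.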

The KV Index File (`.aeordb.kv`)

The KV store is a sorted array of (hash, offset) pairs stored in a separate file. It maps content hashes to byte offsets in the main .aeordb file, providing expected O(1) lookups when combined with the NVT.

Each entry is hash_length + 8 bytes (40 bytes for BLAKE3-256). The entries are sorted by hash, and the NVT tells you which bucket to look in, so lookups are a single seek + small scan.
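
The entry size and the in-bucket scan can be sketched as follows. This is a minimal in-memory model, not the on-disk format: the `KvEntry` struct, its field names, and `find_in_bucket` are assumptions for illustration, and the bucket slice is assumed to be the range the NVT pointed at.

```rust
/// Hash length for a 256-bit hash such as BLAKE3-256.
pub const HASH_LEN: usize = 32;

/// Size of one KV entry: the hash followed by an 8-byte file offset.
pub const fn kv_entry_size(hash_len: usize) -> usize {
    hash_len + 8
}

/// One (hash, offset) pair; field names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KvEntry {
    pub hash: [u8; HASH_LEN],
    pub offset: u64,
}

/// Scan one bucket's slice of the sorted array for an exact hash match.
pub fn find_in_bucket(bucket: &[KvEntry], hash: &[u8; HASH_LEN]) -> Option<u64> {
    bucket.iter().find(|e| &e.hash == hash).map(|e| e.offset)
}
```

Because each bucket holds only a handful of entries on average, a linear scan is enough here; a binary search over the bucket would also work since the entries are sorted.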

KV Resize

When the KV store needs to grow, the engine enters a brief resize mode:

1. A temporary buffer KV store is created
2. New writes go to the buffer (no blocking)
3. The primary KV store is expanded
4. Buffer contents are merged into the primary
5. Buffer is discarded

Writes never block during resize.
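
The resize sequence can be modelled in memory roughly as below. This is a sketch under assumptions: the `KvStore` type and its methods are hypothetical, a `BTreeMap` stands in for the sorted on-disk array, and the choice to consult the buffer before the primary on lookup is a guess at how recent writes stay visible during a resize.

```rust
use std::collections::BTreeMap;

type Hash = [u8; 32];

/// Hypothetical in-memory model of the KV store during a resize.
#[derive(Default)]
pub struct KvStore {
    primary: BTreeMap<Hash, u64>,
    /// Present only while a resize is in progress.
    buffer: Option<BTreeMap<Hash, u64>>,
}

impl KvStore {
    /// Create the temporary buffer, entering resize mode.
    pub fn begin_resize(&mut self) {
        self.buffer = Some(BTreeMap::new());
    }

    /// Writes go to the buffer while resizing, otherwise to the primary.
    pub fn insert(&mut self, hash: Hash, offset: u64) {
        match self.buffer.as_mut() {
            Some(buf) => {
                buf.insert(hash, offset);
            }
            None => {
                self.primary.insert(hash, offset);
            }
        }
    }

    /// Lookups check the buffer first (an assumption), then the primary.
    pub fn get(&self, hash: &Hash) -> Option<u64> {
        self.buffer
            .as_ref()
            .and_then(|buf| buf.get(hash))
            .or_else(|| self.primary.get(hash))
            .copied()
    }

    /// Merge the buffer into the primary and discard it.
    /// Expanding the primary on disk is out of scope for this sketch.
    pub fn finish_resize(&mut self) {
        if let Some(buf) = self.buffer.take() {
            self.primary.extend(buf);
        }
    }
}
```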

NVT (Normalized Vector Table)

The NVT is an in-memory structure that provides fast hash-to-bucket lookups for the KV store.

How It Works

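1. Normalize the hash to a scalar: first_8_bytes_as_u64 / u64::MAX produces a value in [0.0, 1.0]
2. Map the scalar to a bucket: bucket_index = min(floor(scalar * num_buckets), num_buckets - 1); the clamp covers a scalar of exactly 1.0, which would otherwise index one past the last bucket
3. The bucket points to a range in the KV store – scan that range for the exact hash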

BLAKE3 hashes are uniformly distributed, so buckets stay balanced without manual tuning. The NVT starts at 1,024 buckets and doubles when the average scan length exceeds a threshold.
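
The hash-to-bucket mapping can be sketched in a few lines. Two details here are assumptions rather than documented behavior: the first 8 bytes are read as big-endian, and the result is clamped so that a scalar of exactly 1.0 still lands in the last bucket. The function name `bucket_index` is illustrative.

```rust
/// Map a hash to a bucket: normalize the first 8 bytes to [0.0, 1.0],
/// then scale by the bucket count. `num_buckets` must be non-zero.
pub fn bucket_index(hash: &[u8; 32], num_buckets: usize) -> usize {
    assert!(num_buckets > 0, "the NVT needs at least one bucket");
    let mut first8 = [0u8; 8];
    first8.copy_from_slice(&hash[..8]);
    // Big-endian is an assumption; the text does not state the byte order.
    let scalar = u64::from_be_bytes(first8) as f64 / u64::MAX as f64;
    let index = (scalar * num_buckets as f64).floor() as usize;
    // A scalar of exactly 1.0 would index one past the end, so clamp.
    index.min(num_buckets - 1)
}
```

With 1,024 buckets, an all-zero hash maps to bucket 0 and a hash whose first 8 bytes are all 0xFF maps to bucket 1,023.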

Scaling

Entries	NVT Buckets	NVT Memory	Avg Scan
10,000	1,024	16 KB	~10
1,000,000	65,536	1 MB	~15
100,000,000	1,048,576	16 MB	~95
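
The table's figures follow from simple arithmetic, reproduced below as a sanity check. The 16 bytes per bucket is inferred from the table itself (16 KB / 1,024 buckets) and is not a documented constant; the function names are illustrative.

```rust
/// Average entries per bucket, which is the expected scan length.
pub fn avg_scan(entries: u64, buckets: u64) -> f64 {
    entries as f64 / buckets as f64
}

/// NVT memory in bytes, assuming 16 bytes per bucket
/// (inferred from the table: 16 KB for 1,024 buckets).
pub fn nvt_memory_bytes(buckets: u64) -> u64 {
    buckets * 16
}
```

For example, 1,000,000 entries over 65,536 buckets gives about 15.3 entries per bucket, and 65,536 buckets at 16 bytes each occupy exactly 1 MiB.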

Hot File WAL (Crash Recovery)

The --hot-dir flag specifies a directory for write-ahead hot files. During a write:

1. The entry is written to a hot file first (fsync’d)
2. The entry is then written to the main .aeordb file
3. On success, the hot file entry is cleared

If the process crashes after step 1 but before step 3, the hot file still contains the entry and is replayed on the next startup to recover writes that may not have reached the main file. If --hot-dir is not specified, the hot directory defaults to the same directory as the database file.
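
The write path and startup replay might look roughly like this, using only the standard library. It is a simplified sketch: the function names are hypothetical, the hot file holds a single raw entry with no framing, and replay does not check whether the entry already reached the main file before the crash, which a real implementation would need to handle.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Write};
use std::path::Path;

/// Append bytes to the main file and flush them to disk.
fn append(main: &Path, entry: &[u8]) -> io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(main)?;
    f.write_all(entry)?;
    f.sync_all()
}

/// Durable write: hot file first, then the main file, then clear the hot file.
pub fn durable_append(hot: &Path, main: &Path, entry: &[u8]) -> io::Result<()> {
    // Persist the entry to the hot file and fsync it.
    let mut hot_file = File::create(hot)?;
    hot_file.write_all(entry)?;
    hot_file.sync_all()?;

    // Append the entry to the main database file.
    append(main, entry)?;

    // Clear the hot file now that the main file holds the entry.
    hot_file.set_len(0)?;
    hot_file.sync_all()
}

/// On startup: if the hot file is non-empty, replay it into the main file.
/// Returns true when something was replayed.
pub fn replay_hot_file(hot: &Path, main: &Path) -> io::Result<bool> {
    let mut pending = Vec::new();
    match File::open(hot) {
        Ok(mut f) => {
            f.read_to_end(&mut pending)?;
        }
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(e) => return Err(e),
    }
    if pending.is_empty() {
        return Ok(false);
    }
    append(main, &pending)?;
    // Truncate the hot file so the entry is not replayed twice.
    File::create(hot)?.sync_all()?;
    Ok(true)
}
```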

Snapshot Double-Buffering

AeorDB uses ArcSwap for lock-free concurrent reads. The in-memory directory state is wrapped in an Arc that readers clone cheaply. When a write completes:

1. The writer builds a new directory state
2. The new state is swapped in atomically via ArcSwap::store
3. Readers holding the old Arc continue using it until they finish
4. The old state is dropped when the last reader releases it

This means:

Readers never block writers
Writers never block readers
Every read sees a consistent point-in-time snapshot
No read locks, no write locks on the read path
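
The snapshot semantics can be demonstrated with the standard library alone. Note that this sketch is a stand-in, not the mechanism described above: it guards the pointer swap with an `RwLock`, whereas ArcSwap performs the same swap without a lock. With the `arc_swap` crate, `load_full()` and `store()` would play the roles of `snapshot` and `publish`. The `DirState` and `Published` types are hypothetical.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical directory state; the real structure is richer.
#[derive(Debug, Clone, PartialEq)]
pub struct DirState {
    pub root_hash: String,
}

/// Std-only stand-in for ArcSwap: the lock guards only the pointer swap.
pub struct Published {
    current: RwLock<Arc<DirState>>,
}

impl Published {
    pub fn new(initial: DirState) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Reader: clone the Arc (a reference-count bump) and keep using it.
    pub fn snapshot(&self) -> Arc<DirState> {
        Arc::clone(&self.current.read().unwrap())
    }

    /// Writer: publish a fully built new state in one pointer swap.
    pub fn publish(&self, next: DirState) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

A reader that took a snapshot before `publish` keeps seeing the old state, and that state is freed once the reader drops its `Arc`.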

B-Tree Directories

Small directories (under 256 entries) are stored as flat lists of child entries. When a directory exceeds 256 entries, the engine automatically converts it to a B-tree structure. This keeps directory lookups O(log n) even for directories with millions of files.

B-tree nodes are themselves stored as content-addressed entries, so they participate in versioning and structural sharing just like any other data.

Directory Propagation

When a file changes, the engine propagates the update up the directory tree:

Write /users/alice.json
  -> update /users/ directory (new child hash for alice.json)
    -> update / root directory (new child hash for users/)
      -> update HEAD (new root hash)

Each directory gets a new content hash because its contents changed. This is how the Merkle tree works – a change at any leaf creates new hashes all the way to the root. The root hash (HEAD) uniquely identifies the complete state of the database.
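
The propagation can be illustrated with a toy two-level tree. This sketch uses the standard library's `DefaultHasher` as a stand-in for BLAKE3 (it produces 64-bit, non-cryptographic hashes), and the functions `content_hash`, `dir_hash`, and `head_for` are hypothetical names for illustration only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash (std's DefaultHasher, NOT BLAKE3).
fn content_hash<T: Hash + ?Sized>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// A directory's hash covers its children's names and hashes.
pub fn dir_hash(children: &BTreeMap<String, u64>) -> u64 {
    content_hash(children)
}

/// Recompute HEAD for a two-level tree: / -> users/ -> files.
pub fn head_for(users_files: &BTreeMap<String, Vec<u8>>) -> u64 {
    // Leaf level: hash each file's contents.
    let users: BTreeMap<String, u64> = users_files
        .iter()
        .map(|(name, data)| (name.clone(), content_hash(data)))
        .collect();
    // /users/ gets a new hash whenever any child hash changes.
    let mut root = BTreeMap::new();
    root.insert("users".to_string(), dir_hash(&users));
    // The root directory's hash is what HEAD points at.
    dir_hash(&root)
}
```

Changing the bytes of `alice.json` changes its leaf hash, which changes the hash of `/users/`, which in turn changes the root hash returned by `head_for`.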

Next Steps

Storage Engine – entry format, hashing, chunking, and dedup details
Versioning – how snapshots, forks, and diff/patch work
Indexing & Queries – how indexes are built and queried

