Pre-Hashed Upload Protocol

AeorDB provides a 4-phase upload protocol for efficient, deduplicated file transfers. Clients split files into chunks, hash them locally, and only upload chunks the server does not already have.

Protocol Overview

  1. Negotiate – GET /upload/config to learn the hash algorithm and chunk size.
  2. Dedup check – POST /upload/check with a list of chunk hashes to find which are already stored.
  3. Upload – PUT /upload/chunks/{hash} for each needed chunk.
  4. Commit – POST /upload/commit to atomically assemble chunks into files.

Endpoint Summary

| Method | Path | Description | Auth | Body Limit |
|--------|------|-------------|------|------------|
| GET | /upload/config | Negotiate hash algorithm and chunk size | No | n/a |
| POST | /upload/check | Check which chunks the server already has | Yes | 1 MB |
| PUT | /upload/chunks/{hash} | Upload a single chunk | Yes | 10 GB |
| POST | /upload/commit | Atomic multi-file commit from chunks | Yes | 1 MB |

Phase 1: GET /upload/config

Retrieve the server’s hash algorithm, chunk size, and hash prefix. This endpoint is public (no authentication required).

Response

Status: 200 OK

{
  "hash_algorithm": "blake3",
  "chunk_size": 262144,
  "chunk_hash_prefix": "chunk:"
}
| Field | Type | Description |
|-------|------|-------------|
| hash_algorithm | string | Hash algorithm used by the server (e.g., "blake3") |
| chunk_size | integer | Maximum chunk size in bytes (262,144 = 256 KB) |
| chunk_hash_prefix | string | Prefix prepended to chunk data before hashing |
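With chunk_size in hand, the client slices each file into fixed-size pieces before hashing. A minimal sketch (the helper name split_chunks is ours, not part of any official client):

```python
def split_chunks(data: bytes, chunk_size: int = 262144):
    """Yield successive chunks, each at most chunk_size bytes."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]
```

Only the final chunk may be shorter than chunk_size; every other chunk is exactly that size.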

How to Compute Chunk Hashes

The server computes chunk hashes as:

hash = blake3("chunk:" + chunk_bytes)

Clients must use the same formula. The prefix ("chunk:") is prepended to the raw bytes before hashing, not to the hex-encoded hash.
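In Python this formula might look as follows. Note the caveat: the server uses BLAKE3, but the standard library's hashlib has no blake3, so blake2b stands in here purely to keep the sketch stdlib-only; a real client would use the third-party blake3 package with the same call shape.

```python
import hashlib

CHUNK_HASH_PREFIX = b"chunk:"

def chunk_hash(chunk: bytes) -> str:
    # Prepend the prefix to the raw bytes, then hash -- never to the
    # hex-encoded digest.
    # NOTE: blake2b is a stand-in; the server actually uses BLAKE3
    # (swap in the third-party `blake3` package for real use).
    return hashlib.blake2b(CHUNK_HASH_PREFIX + chunk).hexdigest()
```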

Example

curl http://localhost:3000/upload/config

Phase 2: POST /upload/check

Send a list of chunk hashes to determine which ones the server already has (deduplication). Upload only the chunks returned in the needed list.

Request Body

{
  "hashes": [
    "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2",
    "f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5"
  ]
}
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| hashes | array of strings | Yes | Hex-encoded chunk hashes |
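The endpoint summary caps /upload/check bodies at 1 MB, so a client checking a very large file may need to split its hash list across several requests. A sketch of one way to batch under that limit (batch_hashes is a hypothetical helper, not part of any official client):

```python
import json

def batch_hashes(hashes, max_body_bytes=1_000_000):
    """Yield sublists whose JSON request body stays under max_body_bytes."""
    batch = []
    for h in hashes:
        candidate = batch + [h]
        body = json.dumps({"hashes": candidate}).encode()
        if len(body) > max_body_bytes and batch:
            yield batch
            batch = [h]
        else:
            batch = candidate
    if batch:
        # A single oversize hash is still emitted alone.
        yield batch
```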

Response

Status: 200 OK

{
  "have": [
    "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2"
  ],
  "needed": [
    "f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5d4c3b2a1f6e5"
  ]
}
| Field | Type | Description |
|-------|------|-------------|
| have | array | Hashes the server already has; skip these |
| needed | array | Hashes the server needs; upload these |

Example

curl -X POST http://localhost:3000/upload/check \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"hashes": ["a1b2c3...", "f6e5d4..."]}'

Error Responses

| Status | Condition |
|--------|-----------|
| 400 | Invalid hex hash in the list |

Phase 3: PUT /upload/chunks/{hash}

Upload a single chunk. The server verifies the hash matches the content before storing.

Request

  • URL parameter: {hash} – hex-encoded blake3 hash of "chunk:" + chunk_bytes
  • Headers:
    • Authorization: Bearer <token> (required)
  • Body: raw chunk bytes

Hash Verification

The server recomputes the hash from the uploaded bytes:

computed = blake3("chunk:" + body_bytes)

If the computed hash does not match the URL parameter, the upload is rejected.
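A client can run the same check locally before uploading, catching corruption or a mis-computed hash without burning a round trip. A sketch (verify_chunk is a hypothetical helper; as before, blake2b stands in for BLAKE3 to keep the example stdlib-only):

```python
import hashlib
import hmac

def verify_chunk(url_hash: str, body: bytes, prefix: bytes = b"chunk:") -> bool:
    # Recompute the hash the same way the server does and compare.
    # NOTE: blake2b is a stand-in; the server uses BLAKE3 (swap in the
    # third-party `blake3` package for real use).
    computed = hashlib.blake2b(prefix + body).hexdigest()
    # compare_digest avoids leaking the mismatch position via timing.
    return hmac.compare_digest(computed, url_hash)
```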

Response

Status: 201 Created (new chunk stored)

{
  "status": "created",
  "hash": "f6e5d4c3b2a1..."
}

Status: 200 OK (chunk already exists – dedup)

{
  "status": "exists",
  "hash": "f6e5d4c3b2a1..."
}

Compression

The server automatically applies Zstd compression to chunks when beneficial (based on size heuristics). This is transparent to the client.

Example

curl -X PUT http://localhost:3000/upload/chunks/f6e5d4c3b2a1... \
  -H "Authorization: Bearer $TOKEN" \
  --data-binary @chunk_001.bin

Error Responses

| Status | Condition |
|--------|-----------|
| 400 | Chunk exceeds maximum size (262,144 bytes) |
| 400 | Invalid hex hash in URL |
| 400 | Hash mismatch between URL and computed hash |
| 500 | Storage failure |

Phase 4: POST /upload/commit

Atomically commit multiple files from previously uploaded chunks. Each file specifies its path, content type, and the ordered list of chunk hashes that compose it.

Request Body

{
  "files": [
    {
      "path": "/data/report.pdf",
      "content_type": "application/pdf",
      "chunk_hashes": [
        "a1b2c3d4e5f6...",
        "f6e5d4c3b2a1..."
      ]
    },
    {
      "path": "/data/image.png",
      "content_type": "image/png",
      "chunk_hashes": [
        "1234abcd5678..."
      ]
    }
  ]
}
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| files | array | Yes | List of files to commit |
| files[].path | string | Yes | Destination path for the file |
| files[].content_type | string | No | MIME type |
| files[].chunk_hashes | array | Yes | Ordered list of hex-encoded chunk hashes |
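Building the commit body from a list of local files can be sketched as follows (commit_payload is a hypothetical helper; it omits content_type when none is given, since the field table marks it optional):

```python
import json

def commit_payload(files):
    """files: iterable of (path, content_type_or_None, ordered_chunk_hashes)."""
    out = []
    for path, ctype, hashes in files:
        entry = {"path": path, "chunk_hashes": list(hashes)}
        if ctype is not None:
            entry["content_type"] = ctype  # optional per the field table
        out.append(entry)
    return json.dumps({"files": out})
```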

Response

Status: 200 OK

The response contains a summary of the commit operation.

Example

curl -X POST http://localhost:3000/upload/commit \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [
      {
        "path": "/data/report.pdf",
        "content_type": "application/pdf",
        "chunk_hashes": ["a1b2c3d4...", "f6e5d4c3..."]
      }
    ]
  }'

Error Responses

| Status | Condition |
|--------|-----------|
| 400 | Invalid input (missing path, bad hash, etc.) |
| 500 | Commit task failure or panic |

Full Upload Workflow

Here is a complete workflow for uploading a file:

# 1. Get server configuration
CONFIG=$(curl -s http://localhost:3000/upload/config)
CHUNK_SIZE=$(echo "$CONFIG" | jq -r '.chunk_size')

# 2. Split the file into chunks, hash each one, and name it by its hash
# (requires GNU split and the b3sum CLI from the BLAKE3 project;
#  the "chunk:" prefix is prepended to the raw bytes before hashing)
split -b "$CHUNK_SIZE" -d report.pdf chunk_
for f in chunk_*; do
  h=$({ printf 'chunk:'; cat "$f"; } | b3sum | cut -d' ' -f1)
  mv "$f" "chunk_$h.bin"
done
# chunk_hashes=["hash1", "hash2", ...]

# 3. Check which chunks are needed
DEDUP=$(curl -s -X POST http://localhost:3000/upload/check \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"hashes": ["hash1", "hash2"]}')

# 4. Upload only the needed chunks
for hash in $(echo "$DEDUP" | jq -r '.needed[]'); do
  curl -X PUT "http://localhost:3000/upload/chunks/$hash" \
    -H "Authorization: Bearer $TOKEN" \
    --data-binary @"chunk_$hash.bin"
done

# 5. Commit the file
curl -X POST http://localhost:3000/upload/commit \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "files": [{
      "path": "/data/report.pdf",
      "content_type": "application/pdf",
      "chunk_hashes": ["hash1", "hash2"]
    }]
  }'