Document Classification: Engineering Paper
Status: Published - December 2025

The Engram Archive Format: Durable Knowledge Containers with Embedded Database Access

Production specification v1.0.0 addresses fundamental limitations in contemporary archive systems through table-of-contents placement, per-file compression, and Virtual File System integration

Full specification: Engram Format Specification
Plain HTML fallback for the engineering paper. The canonical markdown remains available at /papers/engram-format-announcement.md.
# The Engram Archive Format

## Executive summary

The Engram archive format (.eng) has reached production specification status after multi-year development at Blackfall Laboratories. It addresses limitations in contemporary archive systems, especially Electron's ASAR format and conventional ZIP archives, through three architectural choices: an end-placed table of contents for streaming creation, per-file compression that preserves true random access, and Virtual File System integration that allows direct SQLite queries without extraction. Performance measurements show 80 to 90 percent of native filesystem query speed while eliminating extraction overhead.

The specification is permanently open. The reference implementation, engram-rs, is published to crates.io under the MIT license, enabling integration across preservation workflows independent of Laboratory continuity.

## 1.0 Fundamental problems in contemporary archive formats

### 1.1 The ASAR limitation

Electron's ASAR format demonstrates that uncompressed archives with JSON manifests can achieve filesystem-equivalent performance. However, that approach sacrifices storage efficiency by distributing 100 MB of text and databases as 100 MB rather than the 40 to 60 MB possible through compression. The format provides no structured query capability, and accessing SQLite databases requires full extraction to temporary storage.

### 1.2 The ZIP compromise

ZIP archives employ whole-archive or per-file compression but suffer compression-induced access penalties. Extracting individual files from compressed archives often requires decompressing unrelated content. The format also lacks database integration, so embedded SQLite files remain opaque blobs that must be extracted before query execution.

### 1.3 The streaming creation problem

Formats that place file manifests at the beginning (ASAR, TAR with GNU extensions) require full file enumeration before writing data.
Writers must enumerate all input files during initialization, which is impossible for streaming inputs, or reserve estimated space for the directory, introducing padding waste or overflow complexity.

## 2.0 The Engram synthesis

### 2.1 End-placed central directory

Engram places the file manifest at the end of the archive rather than the header. This enables streaming creation without format compromises. Archive construction proceeds without foreknowledge of the complete file set: writers append entries sequentially and finalize structural metadata when complete. This matches natural workflows such as repository snapshots, build artifact collection, and incremental backups.

> **Read performance preservation**
> Reading the terminal block to locate the end record triggers operating system read-ahead caching. The central directory read, typically 1 to 5 MB even for large archives, completes in single-digit milliseconds on contemporary storage. Once loaded, the in-memory hash index provides O(1) file lookup regardless of directory placement.

### 2.2 Per-file compression maintaining random access

Per-file compression preserves true random access. Extracting file N requires: (1) hash lookup in the central directory, O(1), (2) seek to data offset, O(1), (3) read compressed bytes, and (4) decompress only the target file. Total operations remain constant regardless of archive size.
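
The write path from section 2.1 and the lookup steps from section 2.2 can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the Engram binary format and not the engram-rs API: the names `ToyWriter`, `open_index`, `extract`, and `demo`, the variable-length directory entries, and the 16-byte end record are all hypothetical. Payloads are stored uncompressed where a real writer would compress each file independently.

```rust
use std::collections::HashMap;
use std::io::{self, Cursor, Read, Seek, SeekFrom, Write};

// Toy layout, assumed for illustration only (NOT the Engram binary format):
// [payloads...][directory entries...][end record: dir_offset u64 LE, count u64 LE]
const END_RECORD_LEN: i64 = 16;

/// Streaming writer: payloads are appended as they arrive, the directory goes last.
struct ToyWriter<W: Write> {
    out: W,
    offset: u64,
    entries: Vec<(String, u64, u64)>, // (name, data offset, stored length)
}

impl<W: Write> ToyWriter<W> {
    fn new(out: W) -> Self {
        ToyWriter { out, offset: 0, entries: Vec::new() }
    }

    /// Append one file without knowing how many more will follow.
    fn add(&mut self, name: &str, data: &[u8]) -> io::Result<()> {
        // A real writer would compress `data` on its own here (per-file compression).
        self.out.write_all(data)?;
        self.entries.push((name.to_string(), self.offset, data.len() as u64));
        self.offset += data.len() as u64;
        Ok(())
    }

    /// Write the directory, then the fixed-size end record that points at it.
    fn finish(mut self) -> io::Result<W> {
        let dir_offset = self.offset;
        for (name, off, len) in &self.entries {
            self.out.write_all(&(name.len() as u16).to_le_bytes())?;
            self.out.write_all(name.as_bytes())?;
            self.out.write_all(&off.to_le_bytes())?;
            self.out.write_all(&len.to_le_bytes())?;
        }
        self.out.write_all(&dir_offset.to_le_bytes())?;
        self.out.write_all(&(self.entries.len() as u64).to_le_bytes())?;
        Ok(self.out)
    }
}

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut b = [0u8; 8];
    r.read_exact(&mut b)?;
    Ok(u64::from_le_bytes(b))
}

/// Open: seek to the end record, then load the directory into a hash index.
fn open_index<R: Read + Seek>(r: &mut R) -> io::Result<HashMap<String, (u64, u64)>> {
    r.seek(SeekFrom::End(-END_RECORD_LEN))?;
    let dir_offset = read_u64(r)?;
    let count = read_u64(r)?;
    r.seek(SeekFrom::Start(dir_offset))?;
    let mut index = HashMap::new();
    for _ in 0..count {
        let mut len_buf = [0u8; 2];
        r.read_exact(&mut len_buf)?;
        let mut name = vec![0u8; u16::from_le_bytes(len_buf) as usize];
        r.read_exact(&mut name)?;
        let name = String::from_utf8(name)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        let off = read_u64(r)?;
        let len = read_u64(r)?;
        index.insert(name, (off, len));
    }
    Ok(index)
}

/// Extract one file: hash lookup, seek to its offset, read only its bytes.
fn extract<R: Read + Seek>(
    r: &mut R,
    index: &HashMap<String, (u64, u64)>,
    name: &str,
) -> io::Result<Option<Vec<u8>>> {
    let Some(&(off, len)) = index.get(name) else { return Ok(None) };
    r.seek(SeekFrom::Start(off))?;
    let mut buf = vec![0u8; len as usize];
    r.read_exact(&mut buf)?;
    Ok(Some(buf)) // a real reader would decompress `buf` here
}

/// Round trip through an in-memory buffer: write two files, read back the second.
fn demo() -> io::Result<Option<Vec<u8>>> {
    let mut w = ToyWriter::new(Cursor::new(Vec::new()));
    w.add("a.txt", b"first")?;
    w.add("b.txt", b"second")?;
    let mut archive = w.finish()?;
    let index = open_index(&mut archive)?;
    extract(&mut archive, &index, "b.txt")
}
```

In this sketch the writer never needs the full file list up front, and reading `b.txt` touches only the end record, the directory, and that one payload. The cost of opening grows with the number of directory entries, while the cost of a single extraction depends on the target file alone.
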
Whole-archive compression destroys that random-access property; extracting the final file from a 10 GB archive requires decompressing 10 GB of intermediate data.

| Compression Method   | Decompression Speed      | Use Case                                                           |
| -------------------- | ------------------------ | ------------------------------------------------------------------ |
| LZ4 (Method 1)       | 2 to 4 GB/s              | Speed-critical content, frequently accessed configuration          |
| Zstandard (Method 2) | 400 to 800 MB/s          | Text, databases, balanced compression (40 to 50 percent reduction) |
| Frame-based          | Selective (64 KB frames) | Large files over 50 MB, partial decompression                      |

### 2.3 Virtual File System integration

SQLite databases embedded within archives accept standard SQL queries through a Virtual File System abstraction. Query execution runs at 80 to 90 percent of native filesystem performance without intermediate extraction. A 50 MB database with 2 million rows executes point lookups in 0.12 ms (cold cache) versus 0.08 ms on native filesystems.
Complex joins and aggregations show narrower deltas because computation dominates I/O.

| Query Type              | Native FS | Engram (Cold) | Engram (Warm) |
| ----------------------- | --------- | ------------- | ------------- |
| Point lookup (indexed)  | 0.08 ms   | 0.12 ms       | 0.09 ms       |
| Range scan (1K rows)    | 12 ms     | 18 ms         | 13 ms         |
| Complex join (5 tables) | 45 ms     | 68 ms         | 48 ms         |

## 3.0 Format architecture

### 3.1 Binary structure

```
[File Header: 64 bytes]
[File Data Region: LOCA headers + compressed payloads]
[Central Directory: 320 bytes per entry, fixed-width]
[End Record: 64 bytes, enables backward scan]
```

### 3.2 Operational characteristics

- **O(1) file lookup** via HashMap-indexed central directory
- **16.5 ms archive open time** for 10,000-file collections
- **Sub-millisecond extraction** for files under 10 MB
- **Memory consumption**: about 420 bytes per file for the index
- **Fixed 320-byte entries** enable array indexing and binary search

> **Frame-based compression**
> Large files over 50 MB can use frame-based compression, dividing content into 64 KB chunks compressed independently. When SQLite's VFS requests bytes 2,000,000 to 2,004,096, the system decompresses only frames 30 and 31 (128 KB total) rather than the entire multi-gigabyte database. For a 2 GB SQLite database, a typical query touching 10 pages (160 KB) decompresses three frames (192 KB compressed) and consumes 1 to 2 ms. Equivalent whole-file decompression would require 2 to 3 seconds.

## 4.0 Production applications

### 4.1 Immutable knowledge distribution

Institutions can distribute documentation, regulatory frameworks, and technical references as single queryable artifacts.
Recipients execute SQL against embedded databases without extraction overhead, turning preservation archives into operational knowledge systems rather than static file containers.

### 4.2 Long-term preservation

Archives created under this specification remain readable across technological transitions without migration-induced data loss. Fixed-width fields, explicit versioning, and reserved extension space ensure stability across multi-decade timescales. Cryptographic verification (Ed25519 signatures, SHA-256 hashes) maintains authenticity independent of vendor continuity.

### 4.3 Offline-first intelligence systems

The Societal Advisory Module (SAM) uses Engram archives as modular knowledge containers. Knowledge Engrams contain domain-specific databases (medical procedures, technical documentation, regulatory frameworks) queryable through VFS integration. Skill Engrams provide executable modules with opcode dictionaries. Hot-swap architecture allows runtime knowledge module replacement via atomic registry snapshots so systems extend capabilities without service interruption.

> **SAM architecture integration**
> Retrieval Augmented Generation achieves sub-millisecond knowledge retrieval through hybrid vector search (HNSW) and full-text search (SQLite FTS5) against Engram-backed databases. Systems query preserved knowledge at 80 to 90 percent of native filesystem performance while remaining fully offline.

## 5.0 Reference implementation: engram-rs v1.0.0

Blackfall Laboratories publishes the reference implementation to crates.io under the MIT license.
The engram-rs library provides full format compliance with production-grade performance characteristics.

### 5.1 Core components

- ArchiveWriter: streaming archive creation with compression selection
- ArchiveReader: random access extraction with zero-copy optimization
- VfsReader: SQLite VFS for embedded database queries
- Manifest: TOML-to-JSON conversion and signature management

### 5.2 Installation

```
[dependencies]
engram-rs = "1.0.0"
```

### 5.3 Production characteristics

- 23 comprehensive tests ensuring specification compliance
- Zero-copy extraction for uncompressed files
- SIMD-accelerated decompression (LZ4, Zstandard)
- Frame-based compression for files over 50 MB
- Clippy-clean implementation with warnings-as-errors enforcement

The engram-cli command-line tool provides operator-facing capabilities for archive manipulation, database queries, and cryptographic verification.

### 5.4 Example usage

```
# Create archive from directory
engram pack /path/to/data -o knowledge.eng --compression zstd

# Query embedded database
engram query knowledge.eng database.db "SELECT * FROM users WHERE active=1"

# Verify signatures and integrity
engram verify knowledge.eng --manifest
```

## 6.0 AI-mediated development and durable systems

The development of engram-rs demonstrates architectural patterns for supervised machine intelligence in systems engineering contexts.

### 6.1 Continuous AI collaboration

engram-rs development employed Claude Code as a primary implementation partner under operator supervision. Machine intelligence generated test coverage, iteratively refined implementation through automated linting (clippy), and produced documentation including integration examples.
All architectural decisions remained subject to human approval; the AI served as an accelerant, not an autonomous agent.

### 6.2 Significance for preservation systems

The combination of the Engram preservation format and supervised AI development yields systems with multi-decade operational characteristics:

- Knowledge bases preserved in format-independent containers resistant to software obsolescence
- AI assistance accelerates development without introducing runtime dependencies on cloud services
- All intelligence operates offline, ensuring continuity independent of network availability
- Cryptographic verification maintains knowledge integrity across institutional timescales

> **Offline-first intelligence architecture**
> The Societal Advisory Module demonstrates offline-first intelligence employing Engram archives as modular knowledge containers. Retrieval Augmented Generation achieves sub-millisecond retrieval through hybrid vector search (HNSW) and full-text search (SQLite FTS5) against Engram-backed databases.

## 7.0 Open specification commitment

The Engram format specification remains permanently open and implementation-independent. The v1.0.0 specification documents complete binary structure, compression algorithms, VFS integration patterns, and validation requirements without restriction.

Alternative implementations in any programming language may use the specification without license constraints.
The format serves institutional preservation needs that transcend individual vendor lifecycles; the specification guarantees format stability independent of Laboratory continuity.

### 7.1 Specification access

- Complete specification: https://github.com/blackfall-labs/engram-rs/blob/main/SPECIFICATION-FULL.md
- Reference implementation: https://github.com/blackfall-labs/engram-rs
- Command-line tool: https://github.com/blackfall-labs/engram-cli
- Crates.io publication: https://crates.io/crates/engram-rs
- API documentation: https://docs.rs/engram-rs

## 8.0 Next steps for integration

### 8.1 Evaluate applicability

The format suits immutable distribution, embedded database access, and long-term preservation. Avoid it for frequent incremental updates (use the Cartridge format) or maximum legacy compatibility requirements (use ZIP).

### 8.2 Examine the reference implementation

```
# Add the crate to an existing project
cargo add engram-rs

# Run the bundled examples (from a checkout of the engram-rs repository)
cargo run --example basic
cargo run --example vfs
```

### 8.3 Review the specification

The full specification documents binary structure, validation requirements, and integration patterns. Implementation questions receive responses within 48 hours (business days) via magnus@blackfall.dev.

## 9.0 Conclusion

The Engram archive format addresses fundamental limitations in contemporary archive systems through deliberate architectural synthesis. The production specification and reference implementation provide institutions with durable knowledge preservation capabilities independent of vendor continuity, network availability, or technological obsolescence.

Archives created under this specification remain queryable across multi-decade timescales without extraction overhead or format migration.
The open specification ensures preservation guarantees transcend individual implementation lifecycles.

## Related resources

- Engineering announcement page: /engineering/engram-format-announcement
- Engram format specification: /engineering/engram/specification
- Engineering documentation index: /engineering

## Document metadata

- Publication date: December 20, 2025
- Document authority: Blackfall Laboratories
- Format version: v1.0.0