Security — Corvail

Ghost Protocol

Zero retention architecture

Documents are never written to disk. Ever. The entire processing pipeline runs in volatile memory, and that memory is wiped the moment extraction completes. No files. No logs of document content. No copies.

In-memory processing only

Every document received by Corvail is loaded directly into BytesIO buffers — Python's in-memory file objects. These live exclusively in RAM. Nothing touches the filesystem at any point in the pipeline.

BytesIO buffers

Instead of writing attachments to a temp directory, we stream them directly into io.BytesIO() objects. When the function exits, the buffer is garbage collected. There's no path on disk to recover, because no path was ever created.

Explicit memory wipe

After extraction, the buffer contents are overwritten with null bytes before the object is released. Zeroing means the document content does not linger in the buffer's memory while it waits for garbage collection, which narrows the window in which a memory dump or swap-out could capture it.

Zero disk writes

We've audited every code path in the pipeline. No open() calls. No tempfile usage. No framework middleware that buffers to disk. Attachments move from network → memory → extraction → wipe. Disk is never in the path.
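One way such an audit could be backed by an automated check is Python's audit-hook mechanism, which reports every file-open attempt made by the interpreter. The sketch below is illustrative only: `run_without_disk` and `process_in_memory` are hypothetical names, not Corvail's actual code, and the guard detects file opens rather than every conceivable form of disk access.

```python
import io
import sys

# Hypothetical guard: records any file-open attempt made while it is armed.
_opened_paths = []
_armed = False

def _audit(event, args):
    # CPython raises the "open" audit event for open() and os.open().
    if _armed and event == "open":
        _opened_paths.append(args[0])

sys.addaudithook(_audit)  # note: audit hooks cannot be removed once installed

def run_without_disk(func, *args):
    """Run func and raise RuntimeError if it tried to open any file."""
    global _armed
    _opened_paths.clear()
    _armed = True
    try:
        result = func(*args)
    finally:
        _armed = False
    if _opened_paths:
        raise RuntimeError(f"unexpected file access: {_opened_paths}")
    return result

def process_in_memory(data: bytes) -> int:
    # Stand-in for the real pipeline: touches only an in-memory buffer.
    return len(io.BytesIO(data).getvalue())
```

Wrapping a pipeline entry point in `run_without_disk` inside a test suite would make an accidental `open()` or `tempfile` call fail loudly instead of relying on code review alone.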

Data lifecycle

Where your data lives — and when it's destroyed

This is the exact lifecycle of every document that passes through Corvail. Each stage shows the memory state.

Step 01 — Email received

SendGrid inbound parse

Vendor sends email with document attachment. SendGrid's inbound parse webhook fires a POST request to Corvail's API endpoint. The attachment is transmitted as base64-encoded multipart form data over HTTPS.

// Memory state: attachment in transit over TLS
// No storage on SendGrid servers after webhook fires

Step 02 — BytesIO buffer created

Loaded into memory

The decoded attachment bytes are immediately written into an io.BytesIO() object in RAM. The buffer lives only within the request handler's scope. No file path is created. The data exists exclusively as a heap allocation.

import io

buffer = io.BytesIO(attachment_bytes)
# Disk: 0 bytes written
# RAM: document bytes allocated
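As a minimal sketch of this step, assuming the attachment arrives as a base64 string (the helper name `load_attachment` is hypothetical), the decode and the buffer creation can happen without any intermediate file:

```python
import base64
import io

def load_attachment(encoded: str) -> io.BytesIO:
    """Decode a base64 attachment straight into an in-memory buffer."""
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    attachment_bytes = base64.b64decode(encoded, validate=True)
    return io.BytesIO(attachment_bytes)
```

The returned buffer starts at position 0, so it can be handed to any consumer that expects a readable file object.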

Step 03 — AI extraction

Gemini processes buffer contents

The buffer's bytes are passed directly to Google Gemini's API as inline image data. Gemini returns structured JSON — field names and values. The original document bytes are not returned and are not stored by Google after the API call completes (per their data processing terms).

result = gemini.extract(buffer.getvalue())
# buffer still in RAM, result is structured dict

Step 04 — Math validation

Cross-check all arithmetic

Extracted numbers are validated: line items must sum to subtotals, taxes must compute correctly, totals must match. This step uses only the extracted values — the buffer is no longer needed. The original document content plays no role past this point.

validated = math_check(result)
# buffer.getvalue() no longer referenced
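A check of this kind might look like the sketch below. The field names (`line_items`, `amount`, `subtotal`, `tax`, `total`) and the returned `valid` and `errors` keys are assumptions made for illustration; the real extraction schema is not shown here. `Decimal` is used so that currency values compare exactly rather than with floating-point rounding.

```python
from decimal import Decimal

def math_check(result: dict) -> dict:
    """Verify that line items, subtotal, tax, and total agree."""
    # Convert through str() so floats such as 10.5 become exact decimals.
    items = [Decimal(str(item["amount"])) for item in result["line_items"]]
    subtotal = Decimal(str(result["subtotal"]))
    tax = Decimal(str(result["tax"]))
    total = Decimal(str(result["total"]))

    errors = []
    if sum(items, Decimal("0")) != subtotal:
        errors.append("line items do not sum to subtotal")
    if subtotal + tax != total:
        errors.append("subtotal plus tax does not equal total")
    # Return the original fields plus the validation outcome.
    return {**result, "valid": not errors, "errors": errors}
```

Note that the function reads only extracted values; the document buffer is not an input.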

Step 05 — Memory wipe

Buffer overwritten and released

The BytesIO buffer is explicitly overwritten with null bytes along its full length before being dereferenced. Python's garbage collector then reclaims the memory. The document content no longer exists anywhere in the system.

buffer.seek(0)  # write() starts at the current position, so rewind first
buffer.write(b'\x00' * len(buffer.getvalue()))
del buffer
# Document bytes: DESTROYED
# Disk: 0 bytes written (throughout entire pipeline)
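An alternative way to express the overwrite, shown here as a sketch rather than as Corvail's actual implementation, is to zero the buffer through `getbuffer()`. This writes in place, does not depend on the current stream position, and avoids the extra copy that `getvalue()` creates. The helper name `zero_buffer` is hypothetical.

```python
import io

def zero_buffer(buffer: io.BytesIO) -> None:
    """Overwrite every byte of a BytesIO buffer with zeros, in place."""
    view = buffer.getbuffer()       # writable view of the buffer, no copy
    try:
        view[:] = bytes(len(view))  # bytes(n) is n zero bytes
    finally:
        view.release()              # a live view would block close()

buffer = io.BytesIO(b"invoice bytes")
buffer.read()        # position is now at the end of the buffer
zero_buffer(buffer)  # still zeroes the whole buffer
buffer.close()
```

Zeroing clears only the buffer's own storage. Any separate `bytes` objects holding the same content, such as the value returned by `getvalue()`, are distinct allocations and are freed only when their last reference is dropped.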

Step 06 — ERP update + alert

Structured data delivered

Only the extracted, validated field values (vendor name, amounts, line items, etc.) are sent to your ERP and Slack. No document bytes. No original content. Just the structured data you need — and nothing else.

// Delivered: { vendor, invoice_no, total, line_items }
// Document: does not exist
// Retained: 0 bytes of original content
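One simple way to enforce "structured data only" at the delivery boundary is an allowlist, sketched below. The field names come from the example payload above; the helper name `build_payload` and the constant `ALLOWED_FIELDS` are hypothetical.

```python
# Only these keys may leave the pipeline (taken from the example payload).
ALLOWED_FIELDS = ("vendor", "invoice_no", "total", "line_items")

def build_payload(validated: dict) -> dict:
    """Copy allowlisted fields and drop everything else."""
    return {key: validated[key] for key in ALLOWED_FIELDS if key in validated}
```

With an allowlist, a field that is added upstream later (raw text, for instance) is excluded by default until someone deliberately permits it.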

Infrastructure

How we host and protect the system

Railway hosting

Deployed on Railway's secure cloud infrastructure with automatic deployments, isolated containers, and no shared tenancy. Each request runs in a fresh process context.

HTTPS everywhere

All API endpoints are served exclusively over TLS 1.2+. Unencrypted HTTP is rejected at the infrastructure level. Certificate management is handled automatically.

Environment secrets

All API keys, database credentials, and service tokens are stored as environment variables in Railway's encrypted secret store. They are never hardcoded, never logged, and never exposed in responses.
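In application code, reading a secret from the environment can be as small as the sketch below. The helper name `require_env` is hypothetical; the point is that a missing secret fails fast at startup, and the error message names the variable without ever including its value.

```python
import os

def require_env(name: str) -> str:
    """Return a required secret from the environment or fail fast."""
    value = os.environ.get(name)
    if not value:
        # Report only the variable name, never the secret itself.
        raise RuntimeError(f"missing required environment variable: {name}")
    return value
```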

API security

Protecting the API layer

Timing-safe key comparison

API key verification uses hmac.compare_digest() — a constant-time comparison that prevents timing attacks. An attacker cannot determine how many characters of a key are correct by measuring response time.
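A minimal sketch of that comparison follows; the function name `api_key_is_valid` is hypothetical. Both keys are encoded to bytes first because `hmac.compare_digest()` accepts `str` arguments only when they are pure ASCII.

```python
import hmac

def api_key_is_valid(presented: str, expected: str) -> bool:
    """Compare two API keys without leaking where they first differ."""
    return hmac.compare_digest(
        presented.encode("utf-8"),
        expected.encode("utf-8"),
    )
```

A plain `presented == expected` can return as soon as the first differing character is found, which is the timing signal this call is designed to remove.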

Rate limiting

All endpoints are rate-limited by IP and API key. Requests that exceed limits are rejected with a 429 response. This prevents brute-force key guessing and abuse of the processing pipeline.
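The limiting algorithm is not specified here, so the following is only one possible approach: a sliding-window counter kept in memory, keyed by IP address or API key. The class name, parameters, and injectable clock are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock                # injectable so tests can fake time
        self._hits = defaultdict(deque)   # key -> timestamps of recent requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False                  # caller would respond with HTTP 429
        hits.append(now)
        return True
```

A deployment with more than one process would need shared state (a central store, for example) instead of a per-process dictionary.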

Request validation

All incoming webhook requests are validated against a schema before processing begins. Malformed requests, missing required fields, and oversized payloads are rejected before any document bytes are touched.
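The checks described above could be sketched as follows. The size limit and the list of required fields are hypothetical values chosen for illustration, and `validate_request` is not the name of any real Corvail function.

```python
MAX_PAYLOAD_BYTES = 10 * 1024 * 1024         # hypothetical 10 MiB cap
REQUIRED_FIELDS = ("from", "to", "subject")  # assumed required form fields

def validate_request(form: dict, content_length: int) -> list:
    """Return a list of validation errors; an empty list means accept."""
    errors = []
    if content_length > MAX_PAYLOAD_BYTES:
        errors.append("payload too large")
    for field in REQUIRED_FIELDS:
        value = form.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"missing or empty field: {field}")
    return errors
```

Because the function inspects only the declared length and the text fields, a rejected request never has its attachment decoded.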

SendGrid signature verification

Inbound parse webhooks include a SendGrid signature header. Corvail verifies this signature on every request to ensure the payload originated from SendGrid and has not been tampered with in transit.

Explicit commitments

What we never do

These are not just policies; they are architectural guarantees. The system cannot do these things because it was built to make them impossible.

We never write your documents to disk. No temp files, no upload directories, no cached copies. The filesystem is never touched during document processing.

We never store document content in a database. Audit logs contain metadata (timestamp, document type, vendor, result status) — never the document bytes or raw extracted text.

We never send your document content to third parties outside the processing pipeline. SendGrid delivers the inbound email, and Gemini receives the document for extraction but does not retain it. No other third-party services see your document data.

We never log raw document content. Application logs contain request metadata — timestamps, status codes, processing times — not document bytes or extracted values that could be sensitive.

We never sell or share your data. The only people who see the extracted data from your documents are you (via ERP update and Slack alert) and us (briefly, in transit). Full stop.

Security is the architecture, not an afterthought.