We designed zero retention into the system before writing the first line of code. Here's exactly what that means.
Documents are never written to disk. The entire processing pipeline runs in volatile memory, and that memory is wiped as soon as extraction completes. No files. No document content in logs. No copies.
Every document received by Corvail is loaded directly into BytesIO buffers — Python's in-memory file objects. These live exclusively in RAM. Nothing touches the filesystem at any point in the pipeline.
Instead of writing attachments to a temp directory, we stream them directly into io.BytesIO() objects. When the function exits, the buffer is garbage collected. There's no path on disk to recover, because no path was ever created.
After extraction, the buffer contents are overwritten with null bytes before the object is released. If that memory is inspected or swapped to disk after the wipe but before garbage collection, it holds only zeros, not the document content.
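As an illustration of the overwrite step, here is a minimal sketch in Python. The function name zero_buffer is hypothetical and this is not Corvail's production code. It relies only on the standard io.BytesIO.getbuffer() method, which exposes the buffer's storage as a writable view. Note the limit of the technique: it zeroes the buffer's own storage, so any other bytes object that holds the same content would have to be handled separately.

```python
import io


def zero_buffer(buf: io.BytesIO) -> int:
    """Overwrite the buffer's contents with null bytes in place.

    Returns the number of bytes that were zeroed.
    """
    # getbuffer() returns a writable memoryview over the buffer's storage.
    # The with-block releases the view, which is required before close().
    with buf.getbuffer() as view:
        size = len(view)
        view[:] = bytes(size)  # bytes(n) is n null bytes
    return size


buf = io.BytesIO(b"invoice-bytes")
wiped = zero_buffer(buf)   # contents are now all b"\x00"
buf.close()                # release the buffer once it has been zeroed
```

The view must be released before close() is called, because io.BytesIO refuses to close or resize while a view on it is still held.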
We've audited every code path in the pipeline. No open() calls. No tempfile usage. No framework middleware that buffers to disk. Attachments move from network → memory → extraction → wipe. Disk is never in the path.
This is the exact lifecycle of every document that passes through Corvail. Each stage shows the memory state.
Vendor sends email with document attachment. SendGrid's inbound parse webhook fires a POST request to Corvail's API endpoint. The attachment is transmitted as base64-encoded multipart form data over HTTPS.
The decoded attachment bytes are immediately written into an io.BytesIO() object in RAM. The buffer lives only within the request handler's scope. No file path is created. The data exists exclusively as a heap allocation.
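The decode step could look roughly like the sketch below. The function name load_attachment and the shape of its input (a single base64 string) are assumptions made for illustration; the real webhook payload may be structured differently.

```python
import base64
import io


def load_attachment(encoded: str) -> io.BytesIO:
    """Decode a base64 attachment straight into an in-memory buffer.

    No file path is involved: the decoded bytes only ever live in RAM.
    """
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    raw = base64.b64decode(encoded, validate=True)
    return io.BytesIO(raw)


# Usage with a stand-in payload (not a real document):
sample = base64.b64encode(b"%PDF-1.7 sample").decode("ascii")
attachment = load_attachment(sample)
```

Malformed input raises binascii.Error, a subclass of ValueError, so a handler can reject the request before any extraction work starts.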
The buffer's bytes are passed directly to Google Gemini's API as inline image data. Gemini returns structured JSON — field names and values. The original document bytes are not returned and are not stored by Google after the API call completes (per their data processing terms).
Extracted numbers are validated: line items must sum to subtotals, taxes must compute correctly, totals must match. This step uses only the extracted values — the buffer is no longer needed. The original document content plays no role past this point.
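A minimal sketch of these arithmetic checks follows. The field names (line_items, subtotal, tax_rate, tax, total) and the rounding rule are assumptions for illustration, not Corvail's actual schema. Decimal is used instead of float so that currency values compare exactly.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def validate_invoice(fields: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the numbers agree."""
    errors = []
    line_sum = sum((Decimal(a) for a in fields["line_items"]), Decimal("0"))
    subtotal = Decimal(fields["subtotal"])
    tax = Decimal(fields["tax"])
    total = Decimal(fields["total"])
    # Assumed rule: tax is subtotal * rate, rounded half-up to the cent.
    expected_tax = (subtotal * Decimal(fields["tax_rate"])).quantize(
        CENT, rounding=ROUND_HALF_UP
    )

    if line_sum != subtotal:
        errors.append("line items do not sum to subtotal")
    if expected_tax != tax:
        errors.append("tax does not match subtotal times tax rate")
    if subtotal + tax != total:
        errors.append("total does not equal subtotal plus tax")
    return errors


extracted = {
    "line_items": ["100.00", "50.50"],
    "subtotal": "150.50",
    "tax_rate": "0.10",
    "tax": "15.05",
    "total": "165.55",
}
problems = validate_invoice(extracted)  # empty list: all three checks pass
```

The function takes only extracted values as input, which mirrors the point above: the document buffer is not needed at this stage.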
The BytesIO buffer is explicitly overwritten with null bytes (b'\x00' repeated across the buffer's full length) before being dereferenced. Python's garbage collector then reclaims the memory. The document content no longer exists anywhere in the system.
Only the extracted, validated field values (vendor name, amounts, line items, etc.) are sent to your ERP and Slack. No document bytes. No original content. Just the structured data you need — and nothing else.
Deployed on Railway's secure cloud infrastructure with automatic deployments, isolated containers, and no shared tenancy. Each request runs in a fresh process context.
All API endpoints are served exclusively over TLS 1.2+. Unencrypted HTTP is rejected at the infrastructure level. Certificate management is handled automatically.
All API keys, database credentials, and service tokens are stored as environment variables in Railway's encrypted secret store. They are never hardcoded, never logged, and never exposed in responses.
API key verification uses hmac.compare_digest() — a constant-time comparison that prevents timing attacks. An attacker cannot determine how many characters of a key are correct by measuring response time.
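For reference, a constant-time check with the standard library looks like the sketch below. The function name and the example keys are hypothetical; hmac.compare_digest() is the real standard-library call.

```python
import hmac


def api_key_is_valid(presented: str, expected: str) -> bool:
    """Compare two API keys without leaking where they first differ."""
    # Encoding to bytes lets compare_digest handle any string content;
    # with str arguments it only accepts ASCII.
    return hmac.compare_digest(
        presented.encode("utf-8"), expected.encode("utf-8")
    )


is_valid = api_key_is_valid("key-from-request", "key-from-config")
```

An ordinary == comparison may return as soon as it finds the first mismatching character, and that early exit is the timing signal this check avoids.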
All endpoints are rate-limited by IP and API key. Requests that exceed limits are rejected with a 429 response. This prevents brute-force key guessing and abuse of the processing pipeline.
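One common way to implement this kind of limit is a sliding window per client key. The sketch below is an in-memory illustration under that assumption; the class name, the limits, and the choice of algorithm are not taken from Corvail's implementation, which may differ (for example, by using a shared store across processes).

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per key within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def status_for(limiter: SlidingWindowLimiter, client_ip: str, api_key: str) -> int:
    """Return 429 if either the IP or the API key is over its limit, else 200."""
    allowed = limiter.allow(f"ip:{client_ip}") and limiter.allow(f"key:{api_key}")
    return 200 if allowed else 429


limiter = SlidingWindowLimiter(max_requests=100, window_seconds=60.0)
status = status_for(limiter, "203.0.113.7", "example-key")
```

Keeping separate counters for the IP and the key means an attacker cannot get around the limit by rotating only one of the two.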
All incoming webhook requests are validated against a schema before processing begins. Malformed requests, missing required fields, and oversized payloads are rejected before any document bytes are touched.
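A simplified sketch of this kind of pre-check is shown below. The required field names and the size limit are placeholders chosen for illustration, not Corvail's actual schema. The size check measures the still-encoded attachment string, so nothing is decoded before the request is accepted.

```python
# Hypothetical schema: field name -> expected type.
REQUIRED_FIELDS = {"from": str, "subject": str, "attachment": str}
# Hypothetical limit on the base64 string length.
DEFAULT_MAX_ATTACHMENT_CHARS = 14_000_000


def validate_webhook_payload(
    payload: dict, max_attachment_chars: int = DEFAULT_MAX_ATTACHMENT_CHARS
) -> list[str]:
    """Return a list of schema errors; an empty list means the payload is acceptable."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(f"wrong type for field: {field}")

    attachment = payload.get("attachment")
    if isinstance(attachment, str) and len(attachment) > max_attachment_chars:
        errors.append("attachment too large")
    return errors


payload = {"from": "vendor@example.com", "subject": "Invoice", "attachment": "QUJD"}
errors = validate_webhook_payload(payload)  # empty list for this payload
```

A handler would return an error response whenever the list is non-empty and only then move on to decoding the attachment.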
Inbound parse webhooks include a SendGrid signature header. Corvail verifies this signature on every request to ensure the payload originated from SendGrid and has not been tampered with in transit.
These are not just policies; they are architectural guarantees. The system was built so that it cannot do these things.
We never write your documents to disk. No temp files, no upload directories, no cached copies. The filesystem is never touched during document processing.
We never store document content in a database. Audit logs contain metadata (timestamp, document type, vendor, result status) — never the document bytes or raw extracted text.
We never send your document content to third parties beyond the extraction step. Gemini receives the document for extraction but does not retain it. No other third-party service sees your document content.
We never log raw document content. Application logs contain request metadata — timestamps, status codes, processing times — not document bytes or extracted values that could be sensitive.
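One way to enforce metadata-only logging is an allowlist applied before anything reaches the logger, sketched below. The field names, the logger name, and the JSON log format are assumptions for illustration.

```python
import json
import logging

# Hypothetical allowlist: only these keys may appear in a log line.
ALLOWED_LOG_FIELDS = ("timestamp", "status_code", "processing_ms")

logger = logging.getLogger("corvail.requests")  # hypothetical logger name


def metadata_for_log(event: dict) -> dict:
    """Keep only allowlisted metadata; everything else is dropped."""
    return {key: event[key] for key in ALLOWED_LOG_FIELDS if key in event}


def log_request(event: dict) -> None:
    """Log a request using only its allowlisted metadata."""
    logger.info("request processed %s", json.dumps(metadata_for_log(event)))


log_request(
    {
        "timestamp": "2024-01-01T00:00:00Z",
        "status_code": 200,
        "processing_ms": 412,
        "extracted_values": {"total": "165.55"},  # never reaches the log line
    }
)
```

An allowlist fails safe: a field added to the event later stays out of the logs until someone adds it to the list on purpose.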
We never sell or share your data. The only people who see the extracted data from your documents are you (via ERP update and Slack alert) and us (briefly, in transit). Full stop.
If you need more detail about our security architecture for compliance purposes, we're happy to go deeper. Reach out directly.
Get in touch →