Cortina DUC: Running a 200B LLaMA Model Offline Inside a Briefcase

Most AI companies want you in the cloud. Your data on their servers. Your queries routed through their APIs. Your intelligence — rented, not owned.

Cortina DUC is built on a different premise: the most powerful AI in your building should fit in a briefcase.

The Problem We're Solving

In India, there are three types of users who cannot use cloud AI and know it:

Lawyers handling privileged client communications subject to Bar Council confidentiality norms.
Hospitals managing patient records covered by the DPDP Act 2023 and by earlier health data protection rules.
HNIs and family offices whose financial and investment data is a liability if it leaves their network.

For these users, "just use ChatGPT" is not an answer. It's a compliance violation waiting to happen.

India's Digital Personal Data Protection Act 2023 creates meaningful obligations for how sensitive data can be processed and where. Sovereign AI is not a luxury feature — for certain sectors, it's a legal requirement.

What's Inside the Briefcase

Cortina DUC (Distributed Unified Compute) is a self-contained AI inference unit. Current hardware spec:

Compute: 4× NVIDIA A30 GPUs (96GB total VRAM) in a custom NVLink configuration
CPU: AMD EPYC 7543 (32-core, 64-thread)
RAM: 512GB DDR4 ECC registered
Storage: 8TB NVMe RAID-1 (model weights + document corpus)
Network: Air-gap capable — runs fully offline, or connects to client's private LAN only
Form factor: 42cm × 34cm × 22cm, 18kg. Fits in a standard hard-shell carry-on.
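
As a rough check on the spec above, the 96GB figure is four A30 cards at 24GB each. The sketch below estimates how large a quantised weight file would be for the parameter counts this article mentions (200B and 405B) and compares it with the available memory. The 4.85 bits-per-weight figure for Q4_K_M is an approximation, the function names are illustrative, and the article does not say how weights are split between VRAM and system RAM, so this is arithmetic under stated assumptions rather than a description of the actual deployment.

```python
# Back-of-envelope memory budget. All figures are approximations, not vendor data.

GPU_COUNT = 4
VRAM_PER_GPU_GB = 24             # NVIDIA A30: 24 GB per card
SYSTEM_RAM_GB = 512              # from the spec list above
Q4_K_M_BITS_PER_WEIGHT = 4.85    # assumption: approximate average for Q4_K_M quantisation


def weights_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    # (params_billions * 1e9 weights) * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billions * bits_per_weight / 8


def placement(params_billions: float, bits_per_weight: float) -> str:
    """Say whether the weights fit in VRAM alone, need system RAM too, or fit in neither."""
    size = weights_size_gb(params_billions, bits_per_weight)
    total_vram = GPU_COUNT * VRAM_PER_GPU_GB
    if size <= total_vram:
        return "fits in VRAM"
    if size <= total_vram + SYSTEM_RAM_GB:
        return "needs VRAM plus system RAM"
    return "does not fit"


if __name__ == "__main__":
    for params in (200, 405):
        size = weights_size_gb(params, Q4_K_M_BITS_PER_WEIGHT)
        verdict = placement(params, Q4_K_M_BITS_PER_WEIGHT)
        print(f"{params}B at Q4_K_M: ~{size:.0f} GB -> {verdict}")
```

Under these assumptions a 405B model at Q4_K_M comes to roughly 245GB of weights, which is more than the 96GB of VRAM but well within VRAM plus the 512GB of system RAM. The estimate ignores the KV cache and activations, which add to the total.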

200B-parameter model · 96GB total VRAM · zero cloud dependencies · 18kg weight

The Model Stack

We run a quantised version of LLaMA 3.1 405B at Q4_K_M precision, which stores each weight in roughly 4 to 5 bits. We rate the result at 200B+ effective parameters, with minimal quality degradation. This is loaded alongside:

A custom-fine-tuned Naira Legal layer for Indian law, case references, and regulatory text.
A Naira Medical adapter fine-tuned on ICMR guidelines, drug interaction databases, and ICD-11 codes.
A RAG pipeline that ingests the client's own document corpus on first setup — case files, patient notes, financial records — and keeps them searchable via semantic vector indices.
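
To make the retrieval half of that RAG pipeline concrete, here is a minimal sketch of a semantic vector index using only the Python standard library. The `embed` function is a toy stand-in (a hashed bag-of-words vector), and `VectorIndex`, the document IDs, and the sample texts are all hypothetical; a production system would presumably use a neural embedding model and a persistent index, neither of which the article specifies.

```python
import math
import re
import zlib

DIM = 1024  # size of the toy embedding vector (arbitrary choice)


def embed(text: str) -> list[float]:
    """Toy embedding: hashed bag-of-words, L2-normalised. Not a neural encoder."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class VectorIndex:
    """In-memory index: stores (doc_id, vector) pairs and ranks by cosine similarity."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, text: str) -> None:
        self._entries.append((doc_id, embed(text)))

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(doc_id, sum(a * b for a, b in zip(q, vec)))
                  for doc_id, vec in self._entries]
        return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


if __name__ == "__main__":
    index = VectorIndex()
    index.add("case-001", "Lease agreement dispute over unpaid rent and eviction notice")
    index.add("note-017", "Patient presents with elevated blood pressure and headache")
    index.add("fin-204", "Quarterly portfolio statement with equity and bond holdings")
    for doc_id, score in index.search("eviction notice for unpaid rent", top_k=2):
        print(doc_id, round(score, 3))
```

In a full pipeline the top-ranked passages would be placed into the model's prompt alongside the user's question; that generation step is not shown here.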

// Inference latency benchmarks (avg over 1,000 queries)
Legal summary (3,000 token doc):     1.4s
Medical report analysis (5-page PDF): 2.1s
Financial document extraction:        0.9s
Multi-doc RAG query (50-doc corpus):  3.3s
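
For readers who want to reproduce this kind of measurement on their own hardware, the following is a minimal sketch of a latency harness that averages over a batch of queries. It is not the harness used for the figures above; `benchmark`, `fake_model`, and the sample queries are hypothetical, and `fake_model` simply sleeps in place of a real inference call.

```python
import math
import statistics
import time
from typing import Callable


def benchmark(run_query: Callable[[str], object], queries: list[str]) -> dict[str, float]:
    """Time each query end to end and summarise the latencies in seconds."""
    if not queries:
        raise ValueError("need at least one query")
    latencies = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    # Nearest-rank 95th percentile: the smallest value with at least 95% of samples at or below it.
    p95_index = math.ceil(0.95 * len(latencies)) - 1
    return {
        "mean": statistics.fmean(latencies),
        "p95": latencies[p95_index],
        "max": latencies[-1],
    }


def fake_model(query: str) -> str:
    """Placeholder for a real inference call; sleeps for about a millisecond."""
    time.sleep(0.001)
    return "stub answer"


if __name__ == "__main__":
    stats = benchmark(fake_model, ["summarise this contract"] * 50)
    for name, seconds in stats.items():
        print(f"{name}: {seconds * 1000:.2f} ms")
```

Reporting a tail percentile next to the mean is a design choice worth keeping: an average over 1,000 queries can hide a small number of very slow responses.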

Deployment: What It Looks Like On Site

A Cortina DUC deployment takes 4 hours on-site. We arrive with the unit, a setup technician, and a USB drive containing the client's custom model configuration. Steps:

Physical placement in client's server room or secure office.
LAN integration — the unit gets a static IP visible only to authorised devices.
Document corpus ingestion — client uploads their files via a local web UI.
User provisioning — each authorised user gets credentials, accessible from any browser on the local network.
Air-gap verification — we physically disconnect internet and confirm full functionality.
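
The air-gap verification step can be complemented by a software probe run from the unit itself. The sketch below is one possible check, not the procedure used on site: it tries to open outbound TCP connections and treats the unit as air-gapped only if every attempt fails. The probe targets (two public DNS resolvers) and the function names are assumptions chosen for illustration.

```python
import socket
from typing import Callable, Iterable

# Hypothetical probe targets: public DNS resolvers, which accept TCP on port 53.
PROBE_TARGETS = [("1.1.1.1", 53), ("8.8.8.8", 53)]


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, and unreachable networks
        return False


def is_air_gapped(
    targets: Iterable[tuple[str, int]] = PROBE_TARGETS,
    probe: Callable[[str, int], bool] = can_connect,
) -> bool:
    """The unit counts as air-gapped only if every outbound probe fails."""
    return not any(probe(host, port) for host, port in targets)
```

Calling `is_air_gapped()` with no arguments runs the real network probes. The `probe` parameter exists so the logic can be tested without touching the network. A failed probe is evidence, not proof, of isolation, which is why physically disconnecting the uplink remains the stronger check.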

After setup, the unit requires zero cloud connectivity. Updates and model upgrades are delivered via encrypted USB.
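
Offline updates raise the question of how the unit knows a USB bundle is genuine. The article does not describe the update format, so the following is only a sketch of one common approach: authenticating the bundle with an HMAC-SHA256 tag computed under a shared key before installing it. It covers integrity and authenticity only; decryption is not shown, and `bundle_tag`, `verify_bundle`, and the key handling are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def bundle_tag(path: Path, key: bytes) -> str:
    """HMAC-SHA256 of the file contents, read in chunks so large files are not loaded whole."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # 1 MiB at a time
            mac.update(chunk)
    return mac.hexdigest()


def verify_bundle(path: Path, key: bytes, expected_tag: str) -> bool:
    """Compare the computed tag with the shipped one using a constant-time comparison."""
    return hmac.compare_digest(bundle_tag(path, key), expected_tag)
```

A unit using this scheme would refuse to install any bundle for which `verify_bundle` returns `False`. An asymmetric signature would avoid storing a shared secret on the device, at the cost of a dependency outside the standard library.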

Who Has One

We are not disclosing client names. What we can say: we have active deployments in two law firms (one in Mumbai, one in Delhi NCR), one diagnostic chain with 7 clinics in Tier 2 cities, and one family office managing assets above ₹500Cr.

The waitlist is currently 6 months. Enterprise enquiries go through the proposal form.