Privacy with AI

How Protostar AI Secures Patient and Customer Data

A Protostar AI white paper. Author: Nick Soro, Founder, Protostar AI, LLC. June 2026. Version 1.0.

Abstract

Organizations in healthcare, finance, and other regulated fields face a hard problem. The most capable AI models are operated by third parties, and using them usually means sending confidential data to systems the organization does not control. That data might be patient records, financial details, or proprietary information. Both common responses are unsatisfying. You either accept the exposure, or you retreat to weaker tools and fall behind.

This paper explains how Protostar AI solves that problem. Our platform is built so that sensitive data is protected before it ever reaches a model. The most sensitive data never leaves our customers’ control at all. The work that genuinely benefits from frontier models is anonymized, split apart, and governed by contract. Privacy is not a setting we add at the end. It is the architecture.

The dilemma: capability versus privacy

The industry has quietly taught everyone to accept a trade-off. More capable AI means less privacy, and more privacy means less capability. That trade-off is real only because, in the conventional design, privacy gets bolted on after the decision to call a model. Once a prompt containing a patient name or an account number leaves for an external service, the chance to protect it has already been lost.

We reject that premise. When privacy controls run in front of the model call, and when the most sensitive work stays on infrastructure we never expose to the internet, capability and privacy stop competing.

Design principles

Five principles guide every decision in the platform.

First, privacy comes before the model call. Sensitive data is found and removed before any external system can see it, not redacted after the fact.

Second, we minimize exposure. The most sensitive data, such as protected health information and regulated financial records, is processed only on self-hosted models that have no path to the public internet.

Third, we rely on layers, not on any single control. Anonymization, network isolation, contractual guarantees, fragmentation, and auditing each reinforce the others.

Fourth, we insist on clean provenance. We build only on permissively licensed, auditable components. Our self-hosted models are open and permissively licensed, are improved only on data that is clearly licensed for training, and are never trained on the outputs of other AI providers. Customers can attest both to what their data touches and to what the model learned from.

Fifth, we record everything. Every request, every routing decision, and every transformation is written to a tamper-evident log. Trust in a privacy system has to be verifiable, not assumed.
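One common way to make a log tamper-evident is a hash chain, in which each entry stores the hash of the entry before it. The sketch below illustrates that general technique only. It is not a description of the platform's actual log format, and the field names, the genesis value, and the function names are all assumptions made for the example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # hypothetical starting value for the chain


def _digest(prev_hash, event):
    """Hash the previous entry's hash together with this event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Recompute every link; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"step": "classify", "data_class": "regulated"})
append_entry(audit_log, {"step": "route", "tier": "self_hosted"})
chain_ok_before = verify_chain(audit_log)

# Simulate tampering on a copy of the log: alter a recorded routing decision.
tampered = [dict(e, event=dict(e["event"])) for e in audit_log]
tampered[1]["event"]["tier"] = "frontier"
chain_ok_after = verify_chain(tampered)
```

A chain like this detects edits to existing entries. Detecting removal of the most recent entries would additionally require storing the latest hash somewhere the log writer cannot change, which this sketch leaves out.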

Architecture overview

At the center of the platform sits a single Secure AI Gateway. It is the one entrance through which every request passes, and the one place where privacy is enforced. The gateway authenticates the caller, classifies the data, decides what is allowed, applies the right protection, routes the request, and records the outcome.

Every request is sorted into a data class that determines how it may be handled. Regulated and highly sensitive data, such as PHI under HIPAA or regulated trading records, is handled only on the self-hosted tier that has no outbound connectivity. Proprietary and confidential data, such as internal documents and research, is handled on the self-hosted tier, or on managed cloud models only when policy permits. Low-sensitivity data, such as general or public information, becomes eligible to consult frontier models, but only after it has been de-identified. Classification is deliberately conservative. When the system is unsure, it treats data as more sensitive rather than less.
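The class-to-tier rules above can be sketched as a small policy function. This is a minimal illustration, not the gateway's real interface: the class names, tier names, and function names are hypothetical, and the sketch assumes that every class may always run on the self-hosted tier.

```python
from enum import IntEnum


class DataClass(IntEnum):
    """Illustrative data classes; a higher value means more sensitive."""
    LOW = 1
    PROPRIETARY = 2
    REGULATED = 3


# Hypothetical tier names for the example.
SELF_HOSTED = "self_hosted"
MANAGED_CLOUD = "managed_cloud"
FRONTIER = "frontier"


def resolve_class(candidates):
    """Pick the most sensitive candidate label; with no label, assume the worst."""
    if not candidates:
        return DataClass.REGULATED
    return max(candidates)


def allowed_tiers(data_class, managed_cloud_permitted=False, deidentified=False):
    """Return the tiers a request of this class may use."""
    tiers = {SELF_HOSTED}  # assumption: the self-hosted tier is always allowed
    if data_class == DataClass.PROPRIETARY and managed_cloud_permitted:
        tiers.add(MANAGED_CLOUD)
    if data_class == DataClass.LOW and deidentified:
        tiers.add(FRONTIER)
    return tiers


# An ambiguous request is treated as the more sensitive of its candidate classes.
uncertain = resolve_class([DataClass.LOW, DataClass.PROPRIETARY])
regulated_tiers = allowed_tiers(DataClass.REGULATED, managed_cloud_permitted=True, deidentified=True)
```

Note that the flags cannot widen the options for regulated data: whatever the caller passes, that class resolves to the self-hosted tier only.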

System architecture

The diagram below shows how data moves through the platform and where privacy is enforced.

Privacy is enforced at three points. Classification decides where data is allowed to go. Anonymization removes sensitive values before anything leaves, and those values live only in an encrypted vault that never crosses the boundary. The most sensitive class, labeled C1 and covering regulated and highly sensitive data, is processed entirely on self-hosted models with no route to the internet. Everything that does reach an outside model is de-identified first, and for compartmentalized work it is split so that no single provider receives a reconstructable whole. The coherent answer is reassembled only after it returns inside your boundary.

How a request is secured

A request moves through six steps.

It begins with classification. The gateway determines the data class and the policy that applies.

Next comes anonymization. A self-hosted model, running entirely inside our customer’s secure boundary with no external connectivity, finds the sensitive elements. Those include names, identifiers, dates, account numbers, and the full set of HIPAA Safe Harbor identifiers. The model replaces each one with a placeholder, and the real values stay in an encrypted vault that never leaves the boundary. The anonymizer itself is never outsourced. Sending data to an outside model in order to de-identify it would defeat the entire purpose.
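The placeholder-and-vault idea can be sketched in a few lines. In the platform, a self-hosted model does the detection. The sketch below substitutes two regular expressions as a stand-in detector, uses an in-memory dictionary in place of the encrypted vault, and assumes a "[LABEL_n]" placeholder format. All of these are simplifications chosen for the example.

```python
import re

# Stand-in detectors. A real deployment would use the self-hosted model here.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def anonymize(text):
    """Replace detected values with placeholders; return (safe_text, vault)."""
    vault = {}     # placeholder -> real value (unencrypted in this sketch)
    seen = {}      # real value -> placeholder, so repeats map consistently
    counters = {}  # per-label running count

    for label, pattern in PATTERNS.items():
        def substitute(match, label=label):
            value = match.group(0)
            if value not in seen:
                counters[label] = counters.get(label, 0) + 1
                placeholder = f"[{label}_{counters[label]}]"
                seen[value] = placeholder
                vault[placeholder] = value
            return seen[value]

        text = pattern.sub(substitute, text)
    return text, vault


safe_text, vault = anonymize(
    "SSN 123-45-6789 admitted on 2024-03-15; SSN 123-45-6789 confirmed."
)
```

Only `safe_text` would ever be eligible to leave the boundary. The `vault` mapping stays inside it and is used later to restore the real values.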

The third step is routing. Regulated and sensitive data is processed on the self-hosted tier and goes no further. Other classes may, where policy allows, be routed to managed or frontier models, but only in de-identified form.

The fourth step is compartmentalization. For work that can be divided, the gateway splits a request into independent fragments and distributes them across multiple providers. No single external system ever receives a piece large enough to reconstruct the whole. This is the principle of compartmentalization applied to AI.
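As a rough illustration of the distribution step, the sketch below assigns already-separated fragments to providers in round-robin order. It assumes the request has been split into independent fragments beforehand, and the provider names and the `distribute` function are hypothetical. How the gateway actually decides fragment boundaries is not covered here.

```python
def distribute(fragments, providers):
    """Assign fragments to providers round-robin so none receives them all."""
    if len(providers) < 2:
        raise ValueError("compartmentalization needs at least two providers")
    if len(fragments) < 2:
        raise ValueError("work that cannot be divided should not be sent out in pieces")

    assignments = {provider: [] for provider in providers}
    for index, fragment in enumerate(fragments):
        provider = providers[index % len(providers)]
        # Keep the original index so results can be put back in order later.
        assignments[provider].append((index, fragment))
    return assignments


fragments = ["task 0", "task 1", "task 2", "task 3", "task 4"]
providers = ["provider_a", "provider_b", "provider_c"]
assignments = distribute(fragments, providers)
```

With at least two fragments and two providers, round-robin assignment guarantees that no provider holds the full set. Whether a given subset is still too revealing depends on the content, which is why the fragments are de-identified before this step.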

The fifth step is internal recombination. Fragment results return to the gateway, and they are reassembled into a coherent answer only inside the customer’s boundary, by the self-hosted model. The outside world never sees the assembled result.

The final step is inspection and return. A last check screens the output, the placeholders are restored to their real values inside the boundary, and the complete interaction is written to the audit trail.
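The last two steps, recombination and placeholder restoration, can be sketched as follows. In the platform, the self-hosted model reassembles fragment results into a coherent answer. This example simply joins them in their original order. The "[LABEL_n]" placeholder format, the sample values, and the function names are assumptions for illustration, and the output screening and audit write are omitted.

```python
import re

# Assumed placeholder format: an upper-case label, an underscore, and a number.
PLACEHOLDER = re.compile(r"\[[A-Z]+_\d+\]")


def reassemble(fragment_results):
    """Join fragment results in their original order (keys are fragment indexes)."""
    return " ".join(fragment_results[i] for i in sorted(fragment_results))


def restore(text, vault):
    """Swap each known placeholder for its real value in a single pass."""
    return PLACEHOLDER.sub(lambda m: vault.get(m.group(0), m.group(0)), text)


# Results may arrive out of order from different providers.
fragment_results = {
    1: "Follow-up is due on [DATE_1].",
    0: "Patient [NAME_1] was reviewed.",
}
vault = {"[NAME_1]": "Jane Doe", "[DATE_1]": "2024-03-15"}

answer = restore(reassemble(fragment_results), vault)
```

Both functions run inside the boundary, so the restored text never exists anywhere an external provider could observe it. A placeholder that is missing from the vault is left untouched rather than guessed.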

The secure core: data that never leaves

For the most sensitive workloads, the safest control is also the simplest. The data does not leave. Protostar AI runs open, self-hosted models on infrastructure our customers control, in a network segment with no outbound route to the internet. Protected health information and regulated financial data are processed there and nowhere else. There is no external service to trust, because no external service is ever involved.

Using frontier models safely

Where the leading commercial models add real value, we let customers use them safely through three controls that reinforce one another.

Anonymization ensures the model receives placeholders rather than real identities. Contractual guarantees restrict external calls to providers bound by zero-data-retention agreements and, for regulated data, Business Associate Agreements. Those calls are routed through enterprise cloud platforms so that prompts are not retained or used for training. Compartmentalization spreads even the de-identified fragments across providers, so that no single one holds enough to understand or rebuild the work.

These controls are additive. Each one narrows what an outside provider could see, and together they reduce exposure substantially. In keeping with our commitment to honest claims, we still treat the most sensitive class as never leaving the secure core at all. We also use frontier models only at inference time, never as a training source for our own models, so nothing about their use becomes embedded in what we ship.

Compliance and governance

Protostar AI is designed and documented to a layered set of standards, with one unified set of controls mapped to every standard it satisfies. ISO/IEC 42001 governs AI management and is our primary standard. ISO/IEC 27001 and SOC 2 cover information security management and independent attestation. HIPAA shapes our Security, Privacy, and Breach Notification safeguards, with Business Associate Agreements and de-identification aligned to HHS guidance. IEC 62304 and ISO 14971 apply where AI is used inside regulated medical software workflows. MiFID II RTS 6 applies to regulated trading data. Implementing each control once and mapping it across these regimes keeps the program rigorous without making it redundant.

Deployment and data residency

Protostar AI is designed to run where your data is governed. The secure core runs inside your own cloud account or on dedicated hardware, in a network segment with no outbound route. United States workloads stay in United States regions today, and the design is built to add a European region when regulated European data comes into scope, such as data under MiFID II. Customers decide how much ever leaves the boundary. Some keep everything on the self-hosted core. Others allow de-identified, contractually protected use of frontier models for specific tasks. That policy is set per data class, and it is enforced by the gateway rather than left to convention.

Cybersecurity controls

Protecting customer data also means protecting our own systems. The platform runs under a zero-trust, least-privilege posture. Networking is default-deny. The model tier has no public access, and the secure core has no egress. Access requires multi-factor authentication and short-lived credentials. Encryption keys are customer-managed. The software supply chain is secured, monitoring is centralized with tamper-evident logging, vulnerabilities are managed continuously and tested by independent penetration testers, and incident response is documented and rehearsed.

Threat model and assurances

We design against the failure modes that matter for confidential data. If a frontier provider retained or leaked a prompt, it would hold only de-identified text, and under compartmentalization only a fragment of it. If the anonymizer missed a field, conservative classification would send uncertain data to the self-hosted core instead of outward. If an internal account were compromised, least-privilege access, short-lived credentials, and tamper-evident logs would both limit and reveal the damage. If the network were probed, the secure core has no path to the internet to begin with. No single control carries the whole burden, which is the entire point of defense in depth.

Honest boundaries

A privacy claim is only as good as its precision, so we state ours plainly.

The most sensitive class never leaves the secure core. That is the strongest guarantee we make, and it is structural rather than procedural.

For data that does consult external models, anonymization and compartmentalization reduce re-identification risk but do not erase it. We therefore reinforce them with contractual no-retention terms and with conservative routing that fails safe toward the secure tier.

We are a privacy and security layer around the best available models. We are not a model maker, and we hold ourselves to claims an auditor can verify. This honesty is itself a security feature. A system that overstates its guarantees invites the misuse that breaks them.

Conclusion

Privacy and capability were never truly opposed. They only seemed that way because privacy was treated as an afterthought. By enforcing protection before the model call, by keeping the most sensitive data on infrastructure that never reaches the internet, and by anonymizing, fragmenting, and governing everything else, Protostar AI lets organizations use the best AI in the world on the very data they are most obligated to protect.

That is what we mean by Privacy with AI.

Protostar AI, LLC. For more information, contact hello@protostarai.com.