Last week, Utimaco hosted a live webinar, "Securing GenAI at Runtime: How GP HSMs and KMS Protect Your RAG Pipeline." If you missed it, this post captures the key points and why they matter for your architecture right now.
Retrieval-Augmented Generation unlocks real-time enterprise intelligence, but it also creates a critical new attack surface that traditional security architectures were never designed to handle. Here is what every CISO needs to know.
Generative AI has moved out of the lab and into the enterprise data stack. Across industries, organizations are deploying Retrieval-Augmented Generation (RAG) to make their AI systems smarter, faster, and more context-aware by pulling live data from internal knowledge bases, CRMs, legal repositories, and financial systems to generate accurate, real-time responses.
But here is the security reality that most vendors are not talking about: the moment your AI starts retrieving, decrypting, and processing sensitive enterprise data at inference time, your existing perimeter controls become largely irrelevant.
The Numbers That Should Concern You
| 73% | of enterprises deploying GenAI report inadequate visibility into runtime data access |
| 3x | more sensitive data is exposed during AI inference vs. traditional application queries |
| 0 | dedicated industry standards today specifically address cryptographic control in RAG pipelines |
What RAG Actually Does to Your Data
Traditional AI models are static. They run on pre-trained knowledge and never touch your proprietary data at inference time. RAG fundamentally breaks that model.
In a RAG architecture, every user query triggers a live data retrieval cycle. The system searches a vector database or document store, pulls the most relevant chunks of enterprise data, decrypts them, injects them into the model's context window alongside the user's prompt, and generates a response, all within milliseconds.
[Figure: The RAG pipeline, and where risk lives]
That decryption step, and the in-memory exposure window that follows, is where the real risk lives. Data is no longer just stored or in transit. It is actively processed, unencrypted, in an execution environment that may span multiple cloud services, microservices, and third-party model APIs.
The core security gap: Most organizations protect data at rest (storage encryption) and in transit (TLS). RAG introduces a third state, data in use during AI inference, that is systematically under-protected in current enterprise security architectures.
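As a rough illustration of that third state, the retrieval cycle described above can be written down as an ordered list of stages, each flagged by whether enterprise data is in plaintext. The `Stage` type, the stage names, and the choice of which stages hold plaintext are assumptions made for this sketch, not a description of any particular product or deployment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    plaintext: bool  # True when enterprise data is unencrypted at this stage


# Stage order follows the retrieval cycle described above. Which stages hold
# plaintext is an assumption for illustration; it depends on your deployment.
RAG_STAGES = [
    Stage("search vector database or document store", plaintext=False),
    Stage("pull relevant chunks", plaintext=False),
    Stage("decrypt chunks", plaintext=True),
    Stage("inject into context window", plaintext=True),
    Stage("generate response", plaintext=True),
]


def exposure_window(stages):
    """Return the names of the stages where data is in use, unencrypted."""
    return [stage.name for stage in stages if stage.plaintext]


if __name__ == "__main__":
    for name in exposure_window(RAG_STAGES):
        print("plaintext during:", name)
```

Everything the function returns sits outside the reach of at-rest and in-transit encryption, which is the gap this post is about.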
The Threat Vectors That Keep Security Teams Up at Night
| Uncontrolled Decryption | Who decrypts enterprise data for the model? If the process is not hardware-enforced, the answer is effectively anyone with valid service credentials — with no tamper-evident record of access. |
| Prompt Injection | Malicious content embedded in retrieved documents can hijack model behavior and exfiltrate sensitive data directly from the context window, bypassing output filters entirely. |
| Data Provenance Gaps | Which documents/data influenced a specific AI response? Without cryptographic traceability, you cannot audit, demonstrate regulatory compliance, or perform targeted remediation after an incident. |
| Key Sprawl | Encryption keys distributed across containers, services, and cloud providers with no unified lifecycle control create a sprawling, unmanageable attack surface. |
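To make the data provenance row more concrete, here is a minimal sketch of one way to record which retrieved chunks fed a given response: hash each chunk with SHA-256 and authenticate the record with an HMAC. The function names, the record layout, and the in-process signing key are all hypothetical; in an HSM-backed design the signing key would presumably stay inside the hardware boundary rather than in application memory.

```python
import hashlib
import hmac
import json


def provenance_record(response_id: str, chunks: list, signing_key: bytes) -> dict:
    """Build an authenticated record of the chunks used for one response."""
    # Sorting the digests makes the record independent of retrieval order.
    digests = sorted(hashlib.sha256(c.encode("utf-8")).hexdigest() for c in chunks)
    body = json.dumps(
        {"response_id": response_id, "chunk_sha256": digests}, sort_keys=True
    )
    tag = hmac.new(signing_key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"body": body, "hmac_sha256": tag}


def verify_record(record: dict, signing_key: bytes) -> bool:
    """Check that a record has not been altered since it was created."""
    expected = hmac.new(
        signing_key, record["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, record["hmac_sha256"])


if __name__ == "__main__":
    key = b"demo-only-key"  # assumption: a real key would never live in code
    rec = provenance_record("resp-1", ["contract clause 4", "pricing sheet"], key)
    print(verify_record(rec, key))
```

With records like this, an auditor can later ask whether a specific document (by digest) contributed to a specific response, without the record itself containing the sensitive text.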
The Utimaco Approach: Cryptographic Control at Inference Time
At Utimaco, we believe that securing GenAI requires shifting security controls from the perimeter to the point of data use. This means embedding cryptographic controls directly into the RAG pipeline.
In our live webinar, we walked through a reference architecture built on two foundational components:
1. General Purpose HSM (GP HSM): The Hardware Root of Trust
Utimaco's GP HSMs establish a hardware root of trust for all cryptographic operations in the RAG pipeline. Keys that protect enterprise data are generated, stored, and used exclusively within the tamper-resistant HSM boundary.
Every decryption operation required for AI inference is executed inside a protected environment, with full auditability.
2. Key Management System (KMS): Policy-Controlled Access at Scale
A centralized KMS governs the full lifecycle of every cryptographic key used across your RAG infrastructure, from generation and distribution to rotation and revocation. This means that access to enterprise data by the AI pipeline is always policy-controlled, time-bound, and revocable.
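The three properties named here (policy-controlled, time-bound, revocable) can be sketched as a small policy check. `KeyPolicy` and `is_use_allowed` are hypothetical names invented for this example and do not reflect any real KMS API; an actual KMS would evaluate richer policies and enforce them server-side.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class KeyPolicy:
    key_id: str
    allowed_services: frozenset  # policy-controlled: who may use the key
    not_after: datetime          # time-bound: when the grant expires
    revoked: bool = False        # revocable: can be switched off at any time


def is_use_allowed(policy: KeyPolicy, service: str, now: datetime) -> bool:
    """Return True only if the key is live, unexpired, and granted to service."""
    if policy.revoked:
        return False
    if now >= policy.not_after:
        return False
    return service in policy.allowed_services


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    policy = KeyPolicy(
        key_id="kb-key",
        allowed_services=frozenset({"rag-retriever"}),
        not_after=now + timedelta(minutes=5),
    )
    print(is_use_allowed(policy, "rag-retriever", now))
    policy.revoked = True
    print(is_use_allowed(policy, "rag-retriever", now))
```

Checking revocation first means a revoked key is refused regardless of what the rest of the policy says.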
In practice:
When your RAG system retrieves data, the decryption request goes to the HSM. The HSM validates the request against KMS policy and executes decryption within a trusted execution boundary, and every operation is logged for compliance. The data exposure window is minimized and fully audited.
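The flow can be sketched end to end with a mock. `MockHsm` below is a hypothetical stand-in written for this post, not Utimaco's API, and the cipher is a toy SHA-256 keystream XOR used only so the example runs with the standard library. It is not secure and must not be used to protect real data.

```python
import hashlib


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    # Toy stand-in cipher (SHA-256 counter keystream). NOT secure; a real HSM
    # would use a vetted algorithm such as AES-GCM inside the hardware.
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)


class MockHsm:
    """Hypothetical HSM: keys stay inside, every decrypt request is logged."""

    def __init__(self, keys: dict, policy: dict):
        self._keys = keys        # key_id -> key bytes, never returned to callers
        self._policy = policy    # key_id -> set of service names allowed to decrypt
        self.audit_log = []

    def encrypt(self, key_id: str, plaintext: bytes) -> bytes:
        return _keystream_xor(self._keys[key_id], plaintext)

    def decrypt(self, service: str, key_id: str, ciphertext: bytes) -> bytes:
        allowed = service in self._policy.get(key_id, set())
        # Log the request whether or not it is allowed.
        self.audit_log.append({"service": service, "key_id": key_id, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{service} may not use {key_id}")
        return _keystream_xor(self._keys[key_id], ciphertext)


if __name__ == "__main__":
    hsm = MockHsm({"kb-key": b"demo-only"}, {"kb-key": {"rag-retriever"}})
    blob = hsm.encrypt("kb-key", b"Q3 revenue forecast")
    print(hsm.decrypt("rag-retriever", "kb-key", blob))
    print(hsm.audit_log)
```

Note that denied requests are logged too, so the audit trail covers attempts as well as successful decryptions.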
From Monitoring to Verifiable Trust
Most GenAI security conversations focus on monitoring and observability: logging model inputs and outputs, detecting anomalous queries, flagging sensitive data in responses. These are valuable. But they are detective controls, not preventive ones.
Utimaco's approach pushes the security boundary upstream. By the time a suspicious output is logged, sensitive data has already been processed and exposed. Hardware-enforced cryptographic control prevents that exposure from happening in the first place.
- Data remains encrypted until the HSM authorizes and executes decryption
- Key usage policies control which AI services can access which data and when
- Every cryptographic operation produces a tamper-evident audit log
- Revocation is immediate: a compromised key or service can be revoked without affecting the rest
This is the shift from observing your AI systems to being able to prove how they handle sensitive data to regulators, auditors, and your own security team.
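One common way to make a log tamper-evident, shown here purely as an illustration, is to chain entries by hash so that altering any earlier entry invalidates every later one. This is a generic technique and an assumption on our part for the sketch; it is not a description of how any specific HSM implements its audit log, and the function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    payload = json.dumps({"prev_hash": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> None:
    """Append an event, binding it to the hash of the previous entry."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    log.append(
        {
            "prev_hash": prev_hash,
            "event": event,
            "entry_hash": _entry_hash(prev_hash, event),
        }
    )


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if _entry_hash(entry["prev_hash"], entry["event"]) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"op": "decrypt", "key_id": "kb-key", "service": "rag-retriever"})
    append_entry(log, {"op": "decrypt", "key_id": "hr-key", "service": "rag-retriever"})
    print(verify_chain(log))
    log[0]["event"]["key_id"] = "finance-key"  # simulate tampering
    print(verify_chain(log))
```

A hash chain on its own only detects edits if the latest hash is anchored somewhere the attacker cannot rewrite, for example signed with a key held in hardware.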
Watch the Full Webinar On-Demand here.
Further Reading & Resources:
E-Book: AI, PQC & GP HSM: Securing the AI ecosystem - Utimaco
Solution brief: AI & PQC: Securing Retrieval-Augmented Generation (RAG) Architectures with Cryptographic Trust Infrastructure - Utimaco