# Injection & PII Auditor
The Injection Auditor (lucid-llm-judge-auditor) is a comprehensive security node that detects prompt injection attacks, jailbreak attempts, and Personally Identifiable Information (PII) such as SSNs, emails, and credit card numbers, using LLM-driven guardrails via NeMo.
## Use Case
- Prompt Injection Defense: Block OWASP LLM Top 10 #1 (Prompt Injection) attacks, including jailbreaks and instruction override attempts.
- Regulatory Compliance: Enforce GDPR, CCPA, and HIPAA compliance by ensuring PII never reaches the model.
- Data Leakage Prevention: Automatically detect and block sensitive identifiers in prompts.
## Implementation
This auditor hooks into the Request phase to observe inputs and produce claims. A Cedar policy at the Gateway decides whether to block.
```python
import re

from lucid_auditor_sdk import ClaimsAuditor, claims, serve, Phase
from lucid_schemas import Claim


class LLMJudgeAuditor(ClaimsAuditor):
    """Detects prompt injection and PII in user requests."""

    # PII patterns
    SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
    EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

    # Injection patterns (matched as lowercase substrings)
    INJECTION_PATTERNS = [
        "ignore all previous instructions",
        "disregard the above",
        "system prompt:",
        "you are now",
    ]

    @claims(phase=Phase.REQUEST)
    def measure_injection(self, request: dict) -> list[Claim]:
        prompt = request.get("prompt", "").lower()
        matches = [p for p in self.INJECTION_PATTERNS if p in prompt]
        detected = len(matches) > 0
        return [
            Claim(name="injection_risk", type="score_normalized",
                  value=0.9 if detected else 0.0,
                  confidence=0.95 if detected else 1.0),
        ]

    @claims(phase=Phase.REQUEST)
    def measure_pii(self, request: dict) -> list[Claim]:
        prompt = request.get("prompt", "")
        entities = []
        if self.SSN_PATTERN.search(prompt):
            entities.append("US_SSN")
        if self.EMAIL_PATTERN.search(prompt):
            entities.append("EMAIL_ADDRESS")
        return [
            Claim(name="pii_types", type="string_list", value=entities),
            Claim(name="pii_count", type="count", value=len(entities)),
        ]


serve(LLMJudgeAuditor())
```
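To try the detection logic without the Lucid SDK, the patterns above can be exercised in a standalone sketch. The `audit` helper below is hypothetical (it is not part of the SDK) and represents claims as plain dicts rather than `Claim` objects:

```python
import re

# Same patterns as the auditor above, reproduced so this snippet is
# self-contained and runnable without lucid_auditor_sdk.
SSN_PATTERN = re.compile(r'\b\d{3}-\d{2}-\d{4}\b')
EMAIL_PATTERN = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
INJECTION_PATTERNS = [
    "ignore all previous instructions",
    "disregard the above",
    "system prompt:",
    "you are now",
]


def audit(prompt: str) -> dict:
    """Return the claims the auditor would emit for one prompt (as a dict)."""
    lowered = prompt.lower()
    detected = any(p in lowered for p in INJECTION_PATTERNS)
    entities = []
    if SSN_PATTERN.search(prompt):
        entities.append("US_SSN")
    if EMAIL_PATTERN.search(prompt):
        entities.append("EMAIL_ADDRESS")
    return {
        "injection_risk": 0.9 if detected else 0.0,
        "pii_types": entities,
        "pii_count": len(entities),
    }
```

For example, `audit("Ignore all previous instructions")` yields an `injection_risk` of 0.9, while `audit("My SSN is 123-45-6789")` yields `pii_types` containing `"US_SSN"`.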
## Cedar Policy
Claims are evaluated by the Gateway's Cedar policy. Example:
```cedar
// Block prompt injection with high confidence
@annotation("id", "guardrails-injection-deny")
@annotation("decision", "deny")
forbid (principal, action, resource)
when { context.claims.injection_risk > 0.7 };

// Block high-sensitivity PII (SSN)
@annotation("id", "pii-ssn-deny")
@annotation("decision", "deny")
forbid (principal, action, resource)
when { context.claims.pii_types.contains("US_SSN") };

// Warn on low-sensitivity PII (email) but allow
@annotation("id", "pii-email-warn")
@annotation("decision", "warn")
forbid (principal, action, resource)
when { context.claims.pii_types.contains("EMAIL_ADDRESS") };
```
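To make the decision flow concrete, here is a minimal Python approximation of how a gateway might combine these three rules; this is an illustrative sketch, not the Gateway's actual Cedar evaluation engine, and the assumptions (deny outranks warn, default allow) are mine:

```python
def evaluate(claims: dict) -> str:
    """Approximate the three Cedar rules above against a claims dict.

    Assumptions (not from the Cedar spec): a matched "deny" rule wins
    over "warn", and the default decision with no matches is ALLOW.
    """
    deny = (
        claims.get("injection_risk", 0.0) > 0.7          # guardrails-injection-deny
        or "US_SSN" in claims.get("pii_types", [])       # pii-ssn-deny
    )
    warn = "EMAIL_ADDRESS" in claims.get("pii_types", [])  # pii-email-warn
    if deny:
        return "DENY"
    return "WARN" if warn else "ALLOW"
```

For example, `evaluate({"injection_risk": 0.9, "pii_types": []})` returns `"DENY"`, while an email-only prompt returns `"WARN"`.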
## Behavior
- Injection Detection: If a user types "Ignore all previous instructions", the auditor produces `injection_risk = 0.9`. The Cedar policy evaluates to `DENY`, and the model is never invoked.
- PII Blocking: If a user types "My SSN is 123-45-6789", the auditor produces `pii_types = ["US_SSN"]`. The Cedar policy evaluates to `DENY`.
- PII Warning: If a user includes an email address, the auditor produces `pii_types = ["EMAIL_ADDRESS"]`. The Cedar policy evaluates to `WARN` but allows the request to proceed.