HANK_SAFE_HARBOR
HIPAA Safe Harbor de-identification: detects the 18 Safe Harbor identifier
categories in a clinical note or PDF and replaces each with a consistent,
realistic fake (not a [REDACTED] hole), so the document stays readable.
- Text & PDF. Handles both modalities, reading the native text layer and running an OCR pass for scans.
- Runs fully local. Zero PHI egress, no external LLM, nothing logged.
- Consistent fakes. The same person gets the same surrogate document-wide.
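Evaluated on synthetic clinical text only so far, with recall as the priority
(a missed identifier counts as worse than an unnecessary replacement). The
engine is Apache-2.0 licensed.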
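
The "consistent fakes" behavior can be illustrated with a minimal Python sketch. This is not the engine's actual code: `SurrogateMap`, `replace_names`, `FAKE_NAMES`, and the keyed-hash selection are all hypothetical names and choices made for illustration, and the list of detected names stands in for whatever the real detector produces.

```python
import hashlib
import hmac
import re

# Hypothetical surrogate pool; a real engine would need a far larger one.
FAKE_NAMES = ["Alex Morgan", "Jordan Lee", "Casey Nguyen", "Riley Patel", "Sam Okafor"]


class SurrogateMap:
    """Gives each distinct original value one stable fake within a document."""

    def __init__(self, secret: bytes, pool: list[str]):
        self._secret = secret
        self._pool = pool
        self._assigned: dict[str, str] = {}

    def surrogate_for(self, original: str) -> str:
        # Normalise case and whitespace so "JOHN  SMITH" and "John Smith" match.
        key = " ".join(original.lower().split())
        if key not in self._assigned:
            # A keyed hash picks the starting slot in the pool.
            digest = hmac.new(self._secret, key.encode("utf-8"), hashlib.sha256).digest()
            start = int.from_bytes(digest[:8], "big") % len(self._pool)
            used = set(self._assigned.values())
            # Probe forward for an unused fake so two people do not share one.
            # If the pool is exhausted, the last probed fake is reused.
            for offset in range(len(self._pool)):
                candidate = self._pool[(start + offset) % len(self._pool)]
                if candidate not in used:
                    break
            self._assigned[key] = candidate
        return self._assigned[key]


def replace_names(text: str, detected: list[str], mapping: SurrogateMap) -> str:
    """Replace every detected name in one pass, longest names first."""
    if not detected:
        return text
    ordered = sorted(set(detected), key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(n) for n in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: mapping.surrogate_for(m.group(0)), text)


if __name__ == "__main__":
    note = "John Smith seen today. JOHN SMITH reports pain. Mary Jones accompanied."
    mapping = SurrogateMap(b"per-document-secret", FAKE_NAMES)
    print(replace_names(note, ["John Smith", "Mary Jones"], mapping))
```

Replacing in a single regex pass keeps an already-inserted fake from being matched again by a later pattern, and the per-document dictionary is what keeps the same person mapped to the same surrogate throughout.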