Known limitations¶
pdf-defang strips active content. It does not solve every PDF threat.
This page is an honest account of what we don't cover.
Out of scope¶
Visual phishing¶
A PDF can display a convincing fake login page using just text and images, with no active content. We don't analyse visible content.
Mitigation: Email gateway content analysis, user awareness training.
Type 3 fonts with PostScript¶
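Type 3 fonts define each glyph as a stream of PostScript-style drawing operators (in a PDF, these are content streams stored in the font's CharProcs dictionary). They were historically a CVE source (2010-2015). We don't strip them because doing so would break the rendering of legitimate PDFs that use them.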
Mitigation: Re-image the PDF using a tool like Dangerzone for high-risk inputs.
Parser vulnerabilities in pikepdf / qpdf¶
We parse PDFs with pikepdf, which wraps the qpdf C++ library. A bug in either one could be triggered by a malformed PDF before our sanitization runs.
Mitigation: Keep pikepdf updated. Sandbox PDF processing.
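One way to approach the sandboxing half of this mitigation is to do the parsing in a child process with hard resource limits, so that a parser crash or runaway loop cannot take down the caller. The sketch below is a minimal illustration, assuming a POSIX system (it uses the standard-library `resource` module); `run_limited` and its defaults are hypothetical names chosen for this example, not part of pdf-defang. It caps CPU time and address space only. It does not restrict filesystem or network access, which needs a container or a similar OS-level sandbox.

```python
import resource
import subprocess
import sys


def run_limited(cmd, timeout_s=30, max_mem_bytes=1024 ** 3):
    """Run cmd in a child process with CPU-time and memory caps (POSIX only).

    Hypothetical helper for illustration; tune the limits for your workload.
    """

    def apply_limits():
        # Runs in the child between fork() and exec().
        limits = (
            (resource.RLIMIT_CPU, timeout_s),     # seconds of CPU time
            (resource.RLIMIT_AS, max_mem_bytes),  # bytes of address space
        )
        for kind, value in limits:
            _, hard = resource.getrlimit(kind)
            if hard != resource.RLIM_INFINITY:
                value = min(value, hard)  # never try to raise an existing cap
            # Set soft and hard together so the child cannot raise them again.
            resource.setrlimit(kind, (value, value))

    return subprocess.run(
        cmd,
        preexec_fn=apply_limits,   # not safe if the parent is multi-threaded
        stdin=subprocess.DEVNULL,
        capture_output=True,
        timeout=timeout_s,         # wall-clock limit, raises TimeoutExpired
        check=False,               # caller inspects returncode itself
    )


if __name__ == "__main__":
    # Harmless stand-in for a worker script that would do the actual sanitizing.
    result = run_limited([sys.executable, "-c", "print('ok')"])
    print(result.returncode, result.stdout)
```

In practice the command would be a small worker script that opens the untrusted file and writes the sanitized output. A non-zero return code or a `subprocess.TimeoutExpired` exception should be treated as a rejected input.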
Social engineering¶
A PDF doesn't need active content to be dangerous - it can ask the user
to click a button that submits credentials to an attacker URL. In
level="strict" we strip SubmitForm actions; in level="balanced"
they are intentionally preserved (for legitimate form submission), so
balanced offers no protection against this vector. Either way, a button
that's actually a plain link to an external phishing page still works.
Mitigation: User awareness. Layer with URL reputation services.
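As a rough illustration of the URL reputation layer, the sketch below checks a link target against a local denylist of domains. It assumes the link URLs have already been extracted from the PDF by some other step, and it uses a hard-coded set as a stand-in for a real reputation service; `DENYLIST` and `is_denied` are hypothetical names for this example.

```python
from urllib.parse import urlsplit

# Hypothetical denylist; a real deployment would query a reputation service.
DENYLIST = frozenset({"phish.example"})


def is_denied(url, denylist=DENYLIST):
    """Return True if the URL's host, or any parent domain of it, is denylisted."""
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:
        # Unparseable URL: fail closed and treat it as suspicious.
        return True
    labels = host.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c".
    return any(".".join(labels[i:]) in denylist for i in range(len(labels)))
```

Matching on parent domains means `login.phish.example` is caught by an entry for `phish.example`, while an unrelated host such as `notphish.example` is not.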
Steganography / hidden data¶
A PDF can carry hidden information in images (steganography), metadata, or whitespace. We don't scan for hidden data.
Mitigation: Use a dedicated steganalysis tool if this matters for your threat model.
Defense in depth¶
pdf-defang is one layer. For high-risk workflows (executive email,
legal document intake, government file exchange), combine with:
- AV / EDR scanning - signatures catch known malware families
- Sandboxed processing - run pikepdf/our library in a container or subprocess with limited filesystem and network access
- Email gateway filtering - content-aware filters at the perimeter
- User awareness - phishing resistance training
- Re-imaging for highest-risk inputs - Dangerzone renders each page to an image and reassembles the images into a new PDF, defeating most active-content threats at the cost of throughput
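The layering above can be expressed as a small routing policy: every input gets the baseline layers, and only the highest-risk sources pay the throughput cost of re-imaging. This is a sketch under stated assumptions; the source labels and layer names are hypothetical placeholders for whatever your pipeline actually calls these steps.

```python
# Hypothetical layer names; each would map to a real step in your pipeline.
BASELINE_LAYERS = ["av_scan", "sandboxed_sanitize"]

# Hypothetical labels for the high-risk workflows listed above.
HIGH_RISK_SOURCES = frozenset(
    {"executive-email", "legal-intake", "government-exchange"}
)


def layers_for(source: str) -> list[str]:
    """Return the ordered processing layers for a document from `source`."""
    layers = list(BASELINE_LAYERS)
    if source in HIGH_RISK_SOURCES:
        # Re-imaging is slow, so reserve it for the highest-risk inputs.
        layers.append("reimage")
    return layers
```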