What gets removed
Complete reference of every dangerous element pdf-defang strips, and
how each level treats it.
Levels at a glance
`sanitize()` accepts a `level` keyword:

| Level | Form interactivity | Embedded files | Default |
|---|---|---|---|
| `"strict"` | stripped | stripped | yes |
| `"balanced"` | preserved | preserved | no |
Both levels strip pure attack vectors such as open actions, `/Launch` and document JavaScript. `balanced` differs only in what it leaves in place, so that legitimate forms and PDF portfolios keep working.
Document-level

| Element | Path | What it does | Risk | strict | balanced |
|---|---|---|---|---|---|
| Document JavaScript | `/Catalog → /Names → /JavaScript` | Runs JS automatically when the PDF opens (in vulnerable viewers) | 🔴 High | removed | removed |
| Embedded files | `/Catalog → /Names → /EmbeddedFiles` | Hides files (often malware) inside the PDF container | 🟡 Medium | removed | kept |
| Open action | `/Catalog → /OpenAction` | Auto-executes an action when the PDF opens | 🔴 High | removed | removed |
| Document `/AA` | `/Catalog → /AA` | Auto-executes on document events (close, save, print) | 🔴 High | removed | removed |
| XFA forms | `/Catalog → /AcroForm → /XFA` | Legacy XML-based forms - long history of CVEs | 🔴 High | removed | removed |
| Calculation order | `/Catalog → /AcroForm → /CO` | Form field calculation order (can trigger JS chains) | 🟡 Medium | removed | kept |
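The document-level rules can be expressed as a single function over the catalog. This is a minimal sketch, not pdf-defang's implementation. It assumes PDF dictionaries are modeled as plain Python dicts keyed by name strings (such as `"/OpenAction"`), with indirect references already resolved. `strip_catalog` is a hypothetical name.

```python
from typing import Any

# Hypothetical model: a PDF dictionary with indirect references already resolved.
PdfDict = dict[str, Any]


def strip_catalog(catalog: PdfDict, level: str = "strict") -> PdfDict:
    """Return a copy of the catalog with document-level active content removed."""
    if level not in ("strict", "balanced"):
        raise ValueError(f"unknown level: {level!r}")
    out = dict(catalog)  # shallow copy; nested dicts are copied before editing

    # Auto-run hooks are removed at both levels.
    out.pop("/OpenAction", None)
    out.pop("/AA", None)

    names = out.get("/Names")
    if isinstance(names, dict):
        names = dict(names)
        names.pop("/JavaScript", None)  # document JavaScript: both levels
        if level == "strict":
            names.pop("/EmbeddedFiles", None)  # balanced keeps portfolios
        out["/Names"] = names

    acroform = out.get("/AcroForm")
    if isinstance(acroform, dict):
        acroform = dict(acroform)
        acroform.pop("/XFA", None)  # XFA forms: both levels
        if level == "strict":
            acroform.pop("/CO", None)  # balanced keeps calculation order
        out["/AcroForm"] = acroform
    return out
```

Each nested dictionary is copied before it is edited, so the caller's object stays unchanged. That makes it easy to diff input against output.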
Page-level

| Element | Path | What it does | Risk | strict | balanced |
|---|---|---|---|---|---|
| Page `/AA` | Each page's `/AA` | Auto-executes on page-level events (open, close) | 🟡 Medium | removed | removed |
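Pages are not stored in a flat list. The catalog's `/Pages` entry is the root of a tree, and each intermediate `/Pages` node lists its children in `/Kids`. Removing page `/AA` entries therefore means walking that tree. Below is a minimal sketch on the same assumed dict model, with resolved references; `strip_page_aa` is a hypothetical name. The visited set guards against cyclic `/Kids` arrays, which a hostile file can contain.

```python
from typing import Any

PdfDict = dict[str, Any]  # hypothetical model with resolved references


def strip_page_aa(root: PdfDict) -> int:
    """Remove /AA from every page under a /Pages node, in place.

    Returns the number of /AA entries removed.
    """
    removed = 0
    seen: set[int] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # malformed or hostile trees can contain cycles
        seen.add(id(node))
        if node.get("/Type") == "/Pages":
            # Intermediate node: descend into its children.
            stack.extend(node.get("/Kids", []))
        elif node.pop("/AA", None) is not None:
            # Leaf /Page node that carried page-level auto-actions.
            removed += 1
    return removed
```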
Annotation-level
Action types

| Action `/S` value | What it does | Risk | strict | balanced |
|---|---|---|---|---|
| `/JavaScript` | Runs JS when the user clicks the annotation | 🔴 High | removed | kept (form calculations) |
| `/Launch` | Opens an external program (e.g., `calc.exe`) | 🔴 High | removed | removed |
| `/ImportData` | Reads external data into form fields | 🔴 High | removed | removed |
| `/SubmitForm` | Sends form values to a URL | 🟡 Medium | removed | kept |
| `/ResetForm` | Clears the form | 🟡 Medium | removed | kept |
| `/Rendition` | Plays media - legacy attack surface | 🟡 Medium | removed | removed |
| `/GoToR` | Opens another PDF (`file://` or `http://` - phishing vector) | 🔴 High | removed | removed |
| `/GoToE` | Opens an embedded file | 🟡 Medium | removed | removed |
| `/Movie` | Deprecated movie playback - old reader exploits | 🟢 Low (deprecated) | removed | removed |
| `/Sound` | Deprecated sound playback - same old reader exploits as `/Movie` | 🟢 Low (deprecated) | removed | removed |
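The per-level decision for annotation actions reduces to two sets of `/S` values. Below is a minimal sketch on the same assumed dict model; `action_allowed` is a hypothetical helper. Letting unlisted types such as `/GoTo` through is an assumption, because this page does not say how other action types are handled.

```python
from typing import Any

# /S values stripped at both levels.
ALWAYS_STRIPPED = frozenset({
    "/Launch", "/ImportData", "/Rendition", "/GoToR", "/GoToE", "/Movie", "/Sound",
})
# /S values stripped by "strict" only; "balanced" keeps them so forms keep working.
FORM_ACTIONS = frozenset({"/JavaScript", "/SubmitForm", "/ResetForm"})


def action_allowed(action: dict[str, Any], level: str = "strict") -> bool:
    """Return True if a single action dictionary survives at the given level."""
    if level not in ("strict", "balanced"):
        raise ValueError(f"unknown level: {level!r}")
    subtype = action.get("/S")
    if subtype in ALWAYS_STRIPPED:
        return False
    if subtype in FORM_ACTIONS:
        return level == "balanced"
    # Assumption: anything else (e.g. /GoTo, /Named) passes this check;
    # /URI actions still go through the separate scheme filter.
    return True
```

In the PDF format, an action can chain to further actions through its `/Next` entry. A complete filter would therefore apply this check to every action in the chain; the sketch checks only one dictionary.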
URI scheme filtering (both levels)
`/URI` actions are not removed, because they are legitimate hyperlinks. Their URL value is checked, though: if the scheme is dangerous, the action is stripped. This filtering is identical in `strict` and `balanced`.
| Scheme | Action |
|---|---|
| `http://`, `https://`, `mailto:`, `tel:`, `ftp://`, `sftp://`, `news:`, `nntp:`, `irc://`, `magnet:` | ✅ Kept |
| `javascript:` | ❌ Removed |
| `file://` | ❌ Removed (local file access) |
| `data:` | ❌ Removed (data URIs can carry HTML) |
| `vbscript:` | ❌ Removed |
| `\\server\share` (Windows UNC paths) | ❌ Removed |
| `//server/share` (alternate UNC form) | ❌ Removed |
| Any other unknown scheme | ❌ Removed (whitelist approach) |
| Relative URIs (no scheme) | ✅ Kept (usually in-document) |
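The scheme check is an allowlist. Extract the scheme, keep the URI only if that scheme is on the list, and treat a URI with no scheme as relative. Below is a minimal sketch; `is_uri_allowed` is a hypothetical name. The scheme pattern follows RFC 3986: a letter, then letters, digits, `+`, `-` or `.`. Discarding whitespace and control characters first is an assumption, meant to stop inputs like `" javascript:..."` from slipping past the check.

```python
import re

ALLOWED_SCHEMES = frozenset({
    "http", "https", "mailto", "tel", "ftp", "sftp", "news", "nntp", "irc", "magnet",
})
# RFC 3986 scheme: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":".
_SCHEME_RE = re.compile(r"([A-Za-z][A-Za-z0-9+.\-]*):")


def is_uri_allowed(uri: str) -> bool:
    """Return True if a /URI action's target passes the scheme allowlist."""
    # Assumption: drop ASCII whitespace/control chars that a viewer might ignore.
    cleaned = "".join(ch for ch in uri if ch > " " and ch != "\x7f")
    if cleaned.startswith(("\\\\", "//")):
        return False  # UNC path (\\server\share) or its //server/share form
    match = _SCHEME_RE.match(cleaned)
    if match is None:
        return True  # no scheme: relative URI
    # Schemes are case-insensitive, so "JavaScript:" is caught too.
    return match.group(1).lower() in ALLOWED_SCHEMES
```

Because only listed schemes pass, a Windows drive path such as `C:\report.pdf` parses as the unknown scheme `c` and is removed as well.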
Annotation-level extras

| Element | What it does | strict | balanced |
|---|---|---|---|
| `/AA` on annotation | Auto-actions when hovering / focusing / calculating | removed | kept (form triggers) |
| `/JS` on annotation | Direct JavaScript attached to annotation | removed | kept (form calculations) |
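For annotation extras, the only difference between levels is whether `/AA` and `/JS` survive. Below is a minimal sketch on the same assumed dict model (resolved references, name-string keys); `strip_annotation_extras` is a hypothetical name.

```python
from typing import Any


def strip_annotation_extras(annot: dict[str, Any], level: str = "strict") -> dict[str, Any]:
    """Return a copy of an annotation with /AA and /JS handled per level."""
    if level not in ("strict", "balanced"):
        raise ValueError(f"unknown level: {level!r}")
    out = dict(annot)  # leave the caller's annotation untouched
    if level == "strict":
        out.pop("/AA", None)  # hover / focus / calculate triggers
        out.pop("/JS", None)  # JavaScript attached directly to the annotation
    return out
```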
What is preserved (both levels)
pdf-defang is non-destructive to visible content:
- All text, images, and layout
- Standard form fields (filled values stay intact)
- Bookmarks, table of contents, page labels
- Document metadata (Author, Title, Subject, Keywords)
- Standard link annotations to `mailto:` and `http(s)://` URLs
- Document structure, page count, page order
- Encryption (when the password is provided)
A PDF passing through pdf-defang looks identical to a human reader,
just without the executable surface.
When to pick which level
- `strict`: the default. Use when accepting PDFs from anyone you don't know: public upload forms, email attachments, customer file shares. Form interactivity is sacrificed for safety, but every viewer still displays the document content normally.
- `balanced`: opt in only when forms must actually work end-to-end (calculator buttons, "Submit" buttons, format/validate triggers) and you've vetted the source. Tax returns, expense reports, and similar internal documents are typical examples.