pdf-defang
Strip JavaScript, OpenAction, Launch actions and other active content from PDFs. Lightweight Python library on top of pikepdf. MIT licensed.
Why?
PDFs can carry executable content: JavaScript that runs when the file opens, auto-actions that fire on every page navigation, "Launch" actions that try to open other programs, embedded files that drop malware. If you process user-uploaded PDFs in your app, you should strip this content before serving them back.
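One rough way to see whether a file carries any of this content is to scan its raw bytes for the relevant PDF name tokens. The sketch below is a standard-library triage heuristic, not a description of how this library works: the function name `count_active_markers` and the token list are assumptions made for illustration. A byte scan also misses names hidden inside compressed object streams or written with `#xx` escapes, so a clean result proves nothing.

```python
import re

# PDF name tokens commonly associated with active content.
# Illustrative list only; not taken from pdf-defang's source.
SUSPICIOUS_NAMES = (
    b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile",
)

def count_active_markers(data: bytes) -> dict:
    """Count occurrences of suspicious name tokens in raw PDF bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # A PDF name ends at whitespace or a delimiter, so /JS must not match /JSFoo.
        pattern = re.escape(name) + rb"(?=[\s/<>\[\](){}%]|$)"
        hits = len(re.findall(pattern, data))
        if hits:
            counts[name.decode("ascii")] = hits
    return counts

sample = b"1 0 obj << /Type /Catalog /OpenAction << /S /JavaScript /JS (app.alert(1)) >> >> endobj"
print(count_active_markers(sample))
```

A scan like this is useful for logging or for deciding which uploads deserve a closer look; it does not replace removing the content.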
The Python ecosystem has parsers (pikepdf, pypdf, PyMuPDF) and a
heavy container-based tool (Dangerzone), but
no clean drop-in library that says "give me this PDF without active
content."
This is that library.
Install
Requires Python 3.9+ and pikepdf 8+.
Three ways to use it
Two levels: strict (default) vs balanced
sanitize("untrusted.pdf") # level="strict" - safest
sanitize("internal_form.pdf", level="balanced") # keep form interactivity
Both levels strip the same attack vectors (JavaScript at document level,
/Launch, /GoToR, dangerous URI schemes, etc.). balanced keeps
form-related actions (/SubmitForm, /ResetForm, form JS, calculate
triggers) and embedded files alive when you trust the source. See
Protections for the full matrix.
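The relationship between the two levels can be pictured as two sets, where `balanced` removes a strict subset of what `strict` removes. The sketch below only restates the lists in this README as data; the names `STRIPPED_ALWAYS`, `STRIPPED_STRICT_ONLY` and `stripped_at` are hypothetical and are not part of the library's API.

```python
# Illustrative summary of this README's lists; not pdf-defang's internal data.
STRIPPED_ALWAYS = frozenset({
    "document JavaScript", "OpenAction", "document /AA", "page /AA",
    "Launch", "GoToR", "GoToE", "ImportData", "Rendition", "Movie", "Sound",
    "XFA", "dangerous URI schemes",
})

# Removed only at level="strict"; kept at level="balanced".
STRIPPED_STRICT_ONLY = frozenset({
    "annotation JavaScript", "SubmitForm", "ResetForm",
    "annotation /AA", "annotation /JS", "AcroForm /CO", "embedded files",
})

def stripped_at(level: str) -> frozenset:
    """Return the categories removed at the given level (hypothetical helper)."""
    if level == "strict":
        return STRIPPED_ALWAYS | STRIPPED_STRICT_ONLY
    if level == "balanced":
        return STRIPPED_ALWAYS
    raise ValueError(f"unknown level: {level!r}")

# Everything balanced removes, strict removes too.
print(stripped_at("balanced") < stripped_at("strict"))
```

Reading the levels this way makes the trade-off explicit: choosing `balanced` never removes anything that `strict` would keep, it only retains the form-related and embedded-file items.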
What gets removed
A complete reference is on the Protections page. The short list (everything below is stripped in both levels unless noted):
- Document-level JavaScript
- OpenAction and document /AA auto-execute actions
- Launch, GoToR, GoToE, ImportData, Rendition, Movie, Sound annotation actions
- Page-level /AA
- XFA forms (legacy attack surface)
- Dangerous URI schemes (javascript:, file:, data:, UNC paths)
- Strict-only: annotation JavaScript / SubmitForm / ResetForm, annotation /AA and /JS, AcroForm /CO, embedded files
What stays:
- All visible text, images, layout
- Standard form fields and their values
- Safe hyperlinks (http, https, mailto, tel, ftp)
- Bookmarks, table of contents, metadata, encryption
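The dangerous and safe URI scheme lists above suggest an allowlist check. Here is a minimal sketch of that idea using only the standard library; `is_safe_uri` is a hypothetical helper, and treating every unlisted scheme (including relative links with no scheme) as unsafe is an assumption, not confirmed library behavior.

```python
from urllib.parse import urlsplit

# Schemes this README lists as safe hyperlinks.
SAFE_SCHEMES = frozenset({"http", "https", "mailto", "tel", "ftp"})

def is_safe_uri(uri: str) -> bool:
    """Allowlist check: True only for the schemes listed as safe (hypothetical helper)."""
    uri = uri.strip()
    # UNC paths (\\server\share or //server/share) carry no scheme at all.
    if uri.startswith(("\\\\", "//")):
        return False
    try:
        scheme = urlsplit(uri).scheme.lower()
    except ValueError:
        # Malformed URIs are rejected rather than guessed at.
        return False
    return scheme in SAFE_SCHEMES

for candidate in ("https://example.com", "javascript:alert(1)", "file:///etc/passwd"):
    print(candidate, is_safe_uri(candidate))
```

An allowlist is the more conservative design: a scheme nobody thought to block is rejected by default instead of slipping through a blocklist.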
Built by
kovetz.co.il - Hebrew/English PDF tools. Contact: contact@kovetz.co.il.