API Reference¶

Full reference for pdf_defang public symbols. All are importable from the top level::

from pdf_defang import (
    sanitize, scan,
    sanitize_async, scan_async,
    sanitize_bytes, scan_bytes,
    SanitizeReport, ScanReport, Level,
)

Sanitization levels¶

sanitize(), sanitize_async() and sanitize_bytes() all accept a level keyword. Two values are supported:

`level`	Form interactivity	Embedded files	Use when
`"strict"` (default)	stripped	stripped	Accepting PDFs from untrusted users. The safest default.
`"balanced"`	preserved	preserved	You trust the source and need legitimate form/portfolio behaviour to survive.

Both levels strip the same set of pure attack vectors: document JavaScript, /OpenAction, document /AA, XFA forms, page /AA, and dangerous URI schemes (javascript:, file:, data:, UNC paths, etc.), plus the annotation action types /Launch, /ImportData, /GoToR, /GoToE, /Movie, /Sound, /Rendition.

balanced additionally keeps: /SubmitForm and /ResetForm annotation actions, /JavaScript annotation actions (for form calculations), annotation /AA (calculate / format / validate / blur / focus triggers), annotation /JS keys, the AcroForm /CO calculation order, and /EmbeddedFiles (used by PDF portfolios).

Passing any other string raises ValueError. See Protections for the full removal table.

Sync API¶

`sanitize(pdf_path, *, return_report=False, password=None, level="strict")`¶

Strip active content from a PDF in place.

Arguments:

pdf_path (str or os.PathLike): Path to the PDF file. Modified in place.
return_report (bool): If True, return a SanitizeReport. If False (default), return a simple boolean.
password (str, optional): Password for encrypted PDFs.
level ("strict" or "balanced"): Sanitization aggressiveness. Default "strict". See Sanitization levels.

Returns:

bool when return_report=False. True if modified, False on failure.
SanitizeReport when return_report=True.

Raises: ValueError if level is not "strict" or "balanced".

Encryption preservation: If the input PDF is encrypted and you supply the correct password, the output PDF stays encrypted (same password, modern AES). If you don't supply a password and the PDF is encrypted, sanitization fails cleanly without modifying the file.

`scan(pdf_path, *, password=None)`¶

Inspect a PDF and return what dangerous content it contains. Does not modify the file.

Arguments:

pdf_path: Path to the PDF.
password: Password for encrypted PDFs.

Returns: ScanReport. Always returns - errors are reflected in report.error, not raised.

Async API¶

`sanitize_async(...)`, `scan_async(...)`¶

Same arguments and return types as the sync versions, but they await under the hood (offloaded to a thread pool, non-blocking in asyncio).

Use these in FastAPI, aiohttp, Starlette, or any other asyncio-based framework.

Bytes API¶

`sanitize_bytes(data, *, return_report=False, password=None, level="strict")`¶

Sanitize a PDF given as bytes. Returns cleaned bytes.

Arguments:

data (bytes): The PDF file content.
return_report (bool): If True, returns (bytes, SanitizeReport).
password: For encrypted PDFs.
level ("strict" or "balanced"): Same semantics as sanitize(). Default "strict".

Returns:

bytes (cleaned PDF) when return_report=False.
tuple[bytes, SanitizeReport] when return_report=True.
On failure: returns the original input bytes unchanged. Check report.error to detect this.

Raises: ValueError if level is not "strict" or "balanced".

`scan_bytes(data, *, password=None)`¶

Inspect a PDF given as bytes. Returns ScanReport.

Data classes¶

`SanitizeReport`¶

Returned by sanitize(..., return_report=True) and sanitize_bytes(...).

Fields:

Field	Type	Meaning
`modified`	`bool`	Whether the file was successfully processed
`level`	`"strict"` or `"balanced"`	Level that was applied
`javascript_in_names`	`int`	JS entries removed from `/Names → /JavaScript`
`embedded_files`	`int`	Embedded files removed
`open_action_removed`	`bool`	Whether `/OpenAction` was present and removed
`document_aa_removed`	`bool`	Whether document-level `/AA` was removed
`xfa_form_removed`	`bool`	Whether XFA form was removed
`calculation_order_removed`	`bool`	Whether `/CO` was removed
`pages_with_aa`	`int`	Pages where `/AA` was removed
`annotations_with_actions`	`int`	Dangerous annotation actions removed
`annotation_action_types`	`list[str]`	Types found (e.g., `['Launch', 'JavaScript']`)
`annotations_with_js`	`int`	Annotations with `/JS` removed
`dangerous_uris_removed`	`int`	URI actions with dangerous schemes removed
`dangerous_uri_schemes_removed`	`list[str]`	Schemes found (e.g., `['javascript', 'file']`)
`file_size_before`	`int`	Bytes before
`file_size_after`	`int`	Bytes after
`error`	`str` or `None`	Error message if sanitization failed

Helper: report.as_dict() returns a JSON-serialisable plain dict.

`ScanReport`¶

Returned by scan() and scan_bytes().

Fields:

Field	Type	Meaning
`has_javascript`	`bool`	Document JavaScript detected
`has_open_action`	`bool`	`/OpenAction` detected
`has_document_aa`	`bool`	Document `/AA` detected
`has_xfa_form`	`bool`	XFA form detected
`has_embedded_files`	`bool`	Embedded files detected
`javascript_in_names`	`int`	Count of JS entries
`embedded_files_count`	`int`	Count of embedded files
`pages_with_aa`	`int`	Pages with `/AA`
`annotations_with_actions`	`int`	Dangerous annotation count
`annotation_action_types`	`list[str]`	Types found
`annotations_with_js`	`int`	Annotations with /JS
`dangerous_uris`	`int`	URI actions with dangerous schemes
`dangerous_uri_schemes`	`list[str]`	Schemes found
`page_count`	`int`	Total pages in PDF
`is_encrypted`	`bool`	True if PDF is encrypted and no password given
`risk_level`	`Literal["none", "low", "medium", "high"]`	Bucketed risk
`file_size`	`int`	Total bytes
`error`	`str` or `None`	Error if scanning failed

Helper: report.as_dict() returns a plain dict.

Risk level rules¶

high: Document JavaScript, OpenAction, document /AA, XFA form, Launch/ImportData/GoToR annotation action, or dangerous URI (javascript:, file:, UNC, etc.).
medium: Annotation actions other than the high-risk ones, embedded files, page-level /AA.
low: Annotation with /JS but no execution trigger.
none: Nothing dangerous detected.

API Reference¶

Sanitization levels¶

Sync API¶

sanitize(pdf_path, *, return_report=False, password=None, level="strict")¶

scan(pdf_path, *, password=None)¶

Async API¶

sanitize_async(...), scan_async(...)¶

Bytes API¶

sanitize_bytes(data, *, return_report=False, password=None, level="strict")¶

scan_bytes(data, *, password=None)¶

Data classes¶

SanitizeReport¶

ScanReport¶

Risk level rules¶

`sanitize(pdf_path, *, return_report=False, password=None, level="strict")`¶

`scan(pdf_path, *, password=None)`¶

`sanitize_async(...)`, `scan_async(...)`¶

`sanitize_bytes(data, *, return_report=False, password=None, level="strict")`¶

`scan_bytes(data, *, password=None)`¶

`SanitizeReport`¶

`ScanReport`¶