
Use Cases

Working examples for common scenarios. Each links to the full runnable script in examples/.

Flask: sanitize uploaded PDFs

from flask import Flask, send_file, request
from pdf_defang import sanitize_bytes
import io

app = Flask(__name__)

@app.route("/sanitize", methods=["POST"])
def clean():
    raw = request.files["file"].read()
    cleaned, report = sanitize_bytes(raw, return_report=True)
    if report.error:
        return {"error": report.error}, 400
    return send_file(io.BytesIO(cleaned), mimetype="application/pdf")

Full script: examples/flask_upload.py
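Before handing bytes to `sanitize_bytes`, it can be worth rejecting uploads that are obviously not PDFs so the sanitizer never parses them. A minimal sketch, assuming a hypothetical `looks_like_pdf` helper and an illustrative 20 MB cap (neither is part of pdf_defang). Flask's own `MAX_CONTENT_LENGTH` config setting can enforce a size limit even earlier, before the body is read.

```python
# Hypothetical upload cap; tune for your workload.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


def looks_like_pdf(raw: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Cheap pre-check: non-empty, under the size cap, with a %PDF- header near the start."""
    if not raw or len(raw) > max_bytes:
        return False
    # Many readers tolerate a little junk before the header, so search the
    # first 1024 bytes instead of requiring the header at offset 0.
    return b"%PDF-" in raw[:1024]
```

In `clean()`, a check such as `if not looks_like_pdf(raw): return {"error": "not a PDF"}, 400` would go right after reading the upload.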

FastAPI: async non-blocking

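from fastapi import FastAPI, UploadFile
from fastapi.concurrency import run_in_threadpool
from fastapi.responses import JSONResponse, Response
from pdf_defang import sanitize_bytes

app = FastAPI()

@app.post("/sanitize")
async def clean(file: UploadFile):
    raw = await file.read()
    # sanitize_bytes is synchronous; run it in a worker thread so it does
    # not block the event loop while other requests are waiting.
    cleaned, report = await run_in_threadpool(sanitize_bytes, raw, return_report=True)
    if report.error:
        return JSONResponse({"error": report.error}, status_code=400)
    return Response(content=cleaned, media_type="application/pdf")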

Full script: examples/fastapi_async.py
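`sanitize_bytes` is ordinary synchronous code, so calling it directly inside an `async def` handler stalls every other request on that worker until it returns. The usual remedy is to push the call onto a thread. A stdlib-only sketch of that pattern using `asyncio.to_thread` (Python 3.9+), with a hypothetical `fake_sanitize` standing in for the real call:

```python
import asyncio
import time


def fake_sanitize(raw: bytes) -> bytes:
    """Hypothetical stand-in for a blocking sanitizer call."""
    time.sleep(0.2)  # simulate work that would otherwise block the event loop
    return raw.replace(b"/JavaScript", b"")


async def handle(raw: bytes) -> bytes:
    # Run the blocking function in the default thread pool so the loop stays free.
    return await asyncio.to_thread(fake_sanitize, raw)


async def main() -> tuple[list[bytes], float]:
    start = time.perf_counter()
    results = await asyncio.gather(*(handle(b"<< /JavaScript >>") for _ in range(3)))
    return results, time.perf_counter() - start
```

Running `asyncio.run(main())` finishes the three 0.2 s jobs in roughly 0.2 s rather than 0.6 s. For CPU-bound pure-Python work the GIL still limits parallelism, so a process pool may scale better, but either way the event loop stays responsive.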

Batch processing with audit log

from pathlib import Path
import json
from pdf_defang import sanitize

with open("audit.jsonl", "w") as log:
    for pdf in Path("incoming/").rglob("*.pdf"):
        report = sanitize(pdf, return_report=True)
        log.write(json.dumps({"file": str(pdf), **report.as_dict()}) + "\n")

Full script: examples/batch_processor.py
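If `sanitize` raises on a malformed file, the loop above stops and the audit log is left incomplete. A sketch that records failures per file and keeps going. `run_batch` is a hypothetical name, `sanitize_fn` is injected so the logic can be tested without pdf_defang, and the broad `except` reflects that the exact exception types the real `sanitize` raises are an assumption here.

```python
import json
from pathlib import Path
from typing import Callable


def run_batch(root: Path, log_path: Path, sanitize_fn: Callable[[Path], dict]) -> int:
    """Sanitize every PDF under root, writing one JSON line per file. Returns the failure count."""
    failures = 0
    with log_path.open("w", encoding="utf-8") as log:
        for pdf in sorted(root.rglob("*.pdf")):  # sorted for a reproducible log order
            try:
                entry = {"file": str(pdf), "ok": True, **sanitize_fn(pdf)}
            except Exception as exc:  # assumption: the library's exception types are unknown
                failures += 1
                entry = {"file": str(pdf), "ok": False, "exception": repr(exc)}
            log.write(json.dumps(entry) + "\n")
    return failures
```

With the real library, `sanitize_fn` would be something like `lambda p: sanitize(p, return_report=True).as_dict()`.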

Read-only forensic scan

from pdf_defang import scan

report = scan("suspicious.pdf")
if report.risk_level == "high":
    print("DO NOT OPEN:", report.annotation_action_types)

Full script: examples/audit_only.py
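Because `scan` is read-only, it can be run across a whole directory to triage files before anyone opens them. A sketch that groups paths by each report's `risk_level`. `triage` is a hypothetical helper, `scan_fn` is injected (pass `pdf_defang.scan` in practice), and the keys are whatever levels the library reports; only `"high"` is assumed from the example above.

```python
from collections import defaultdict
from pathlib import Path
from typing import Any, Callable, Iterable


def triage(paths: Iterable[Path], scan_fn: Callable[[Path], Any]) -> dict[str, list[Path]]:
    """Group files by the risk_level attribute of their scan report."""
    buckets: dict[str, list[Path]] = defaultdict(list)
    for path in paths:
        report = scan_fn(path)
        buckets[report.risk_level].append(path)
    return dict(buckets)
```

For example, `triage(Path("inbox").glob("*.pdf"), scan).get("high", [])` would list the files to hold back.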

S3 / cloud streaming (no disk)

import boto3
from pdf_defang import sanitize_bytes

s3 = boto3.client("s3")

def process(bucket: str, key: str):
    raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    cleaned, report = sanitize_bytes(raw, return_report=True)
    if report.error:
        s3.copy_object(Bucket=bucket, Key=f"quarantine/{key}", ...)
    else:
        s3.put_object(Bucket=bucket, Key=f"clean/{key}", Body=cleaned)

Full script: examples/s3_streaming.py
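If `process` is wired to an S3 event notification on the same bucket, the objects it writes under `clean/` and `quarantine/` can fire the trigger again. A sketch of a pure routing helper that skips those prefixes. `destination_key` is a hypothetical name; the prefixes mirror the example above.

```python
from typing import Optional

# Prefixes this pipeline writes to (mirrors the example above).
OUTPUT_PREFIXES = ("clean/", "quarantine/")


def destination_key(key: str, failed: bool) -> Optional[str]:
    """Return the output key for a processed object, or None if key is already output."""
    if key.startswith(OUTPUT_PREFIXES):
        return None  # written by this pipeline; skipping it breaks the trigger loop
    prefix = "quarantine/" if failed else "clean/"
    return f"{prefix}{key}"
```

`process` could return early when `destination_key(key, False) is None`. Scoping the event notification with a prefix filter on the incoming location achieves the same thing at the S3 level.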

Internal form that needs to keep working

For trusted-source PDFs where forms must still calculate and submit (e.g. expense reports, tax forms, signed contracts), opt into balanced mode:

from pdf_defang import sanitize

# Strips /Launch, /GoToR, document JS and dangerous URIs, but keeps
# /SubmitForm, /ResetForm, form JavaScript, calculate triggers and
# embedded files.
sanitize("expense_form.pdf", level="balanced")

Use strict (the default) for anything coming from outside your organisation. Use balanced only when you trust the producer and a non-functional form is worse than a residual JS execution surface.
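One way to keep that rule in code rather than in reviewers' heads is to choose the level from the document's provenance and fall back to strict. A sketch under stated assumptions: the allow-list, the `choose_level` name and the idea of a sender domain are illustrative; only the `"strict"` and `"balanced"` values come from the library usage above.

```python
from typing import Optional

# Hypothetical allow-list of internal producers whose forms must keep working.
TRUSTED_PRODUCERS = frozenset({"finance.example.com", "hr.example.com"})


def choose_level(sender_domain: Optional[str], trusted: frozenset = TRUSTED_PRODUCERS) -> str:
    """Return "balanced" only for explicitly trusted producers, otherwise "strict"."""
    if sender_domain and sender_domain.lower().rstrip(".") in trusted:
        return "balanced"
    return "strict"  # unknown or external source: fail closed
```

Usage would be `sanitize(path, level=choose_level(sender_domain))`, so an unknown or missing provenance never silently downgrades to balanced.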

Compliance / audit pipeline

For HIPAA/GDPR/regulatory workflows where every modification needs an audit record:

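import json
import logging
from pdf_defang import sanitize

audit_log = logging.getLogger("compliance.pdf_audit")

def compliance_clean(path: str, user_id: str) -> dict:
    report = sanitize(path, return_report=True)
    modifications = report.as_dict()  # build once; reuse for the log and the return value
    # Serialise as JSON so the audit store receives parseable data, not a Python repr.
    audit_log.info(
        "user=%s file=%s modifications=%s",
        user_id, path, json.dumps(modifications),
    )
    return modifications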

The SanitizeReport.as_dict() output is JSON-serialisable, so it flows into any structured logging or audit store (Splunk, Elastic, Datadog, CloudWatch, etc.).
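For stores that index individual fields, a formatter that emits each record as one JSON object tends to work better than embedding data inside a text message. A stdlib-only sketch; `JsonAuditFormatter` and the `audit` key passed through `extra=` are conventions of this sketch, not something pdf_defang defines.

```python
import json
import logging


class JsonAuditFormatter(logging.Formatter):
    """Render each record as a single JSON object, including any `audit` payload."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        audit = getattr(record, "audit", None)  # set via extra={"audit": {...}}
        if audit is not None:
            payload["audit"] = audit
        return json.dumps(payload, default=str)  # default=str guards non-JSON values


handler = logging.StreamHandler()
handler.setFormatter(JsonAuditFormatter())
audit_log = logging.getLogger("compliance.pdf_audit")
audit_log.addHandler(handler)
audit_log.setLevel(logging.INFO)
```

A call would then look like `audit_log.info("pdf sanitized", extra={"audit": {"user": user_id, "file": path, **report.as_dict()}})`.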