PDF Metadata Viewer
Inspect title/author/creator metadata in a PDF and strip it before sharing
Drop a PDF to inspect its metadata.
Max file size: 50MB
How to Use PDF Metadata Viewer Online
Inspect a PDF's metadata (title, author, creator, dates) and optionally strip everything before re-saving.
- Drop a PDF. The tool extracts all standard metadata fields using pdf-lib.
- Review what's there. Common surprise: the Author is your OS username and the Creator is your authoring app, both of which you may have wanted to remove before sharing.
- Click 'Strip all metadata' to clear every text field and reset both dates to the epoch. The PDF content is unchanged; only the metadata is cleared.
- Download the stripped PDF. Open it in a PDF viewer and check File → Properties: every text field should be blank and both dates should show the 1970 epoch.
About PDF Metadata Viewer
Every PDF carries a small block of metadata that almost nobody notices: Title, Author, Subject, Keywords, Creator (the app that wrote the document), Producer (the export tool), Creation Date, and Modification Date. PDF viewers tuck these behind a Properties menu most users never open. Authoring apps populate them automatically: Microsoft Word fills in your name as Author and your Word version as Creator without asking. When you save a Word document as PDF and email it, your name goes with it, embedded in the file.
For business documents this is usually fine. For documents you're sharing more widely (a whistleblower's report, an academic paper submission, a contract whose drafter you'd rather not reveal to the recipient), the metadata is a privacy hole. The Vox 2015 piece on metadata-leaked authorship of a TPP analysis is the canonical example: a "leaked" document was attributed to its true source within hours because the Word metadata named the user account, which named the law firm, which named the individual.
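This tool does two things. **Read**: shows you exactly what metadata is in the PDF, in a clean table, so you know what's there before you share. **Strip**: clears every text field, resets both dates to the epoch, and re-saves the PDF. The visible content (text, images, layout) is preserved; only the document information dictionary, where these fields live, is changed. Because pdf-lib re-serializes the file on save, the output is not byte-identical to the input, and an existing digital signature will no longer validate against it.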
The metadata fields, in practice:
- **Title**: what the document is named in the PDF's own header (often different from the filename). Sometimes set manually, sometimes auto-derived from the first heading.
- **Author**: usually the OS account name of the person who created the file. Microsoft Word, LibreOffice, and Pages all default to this.
- **Subject**: a one-liner description. Almost never populated by default.
- **Keywords**: comma-separated tags. Almost never populated unless someone deliberately set them in the authoring app.
- **Creator**: the app that authored the source document. `Microsoft® Word for Microsoft 365`, `LaTeX with hyperref`, `LibreOffice 7.5`, etc.
- **Producer**: the tool that exported the source document to PDF. Often a separate library from the Creator. `Acrobat Distiller`, `Microsoft® Word for Microsoft 365`, `xdvipdfmx`, `Skia/PDF`.
- **Creation Date**: when the PDF was first generated.
- **Modification Date**: when the PDF was last edited (in tools that support PDF editing).
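As a rough illustration of the Read step, the eight fields can be collected into table rows as below. This is a minimal sketch, not the tool's actual source: `MetadataSource`, `MetadataRow`, `display`, and `readMetadata` are hypothetical names. The interface simply mirrors the getter names on pdf-lib's `PDFDocument`, so a loaded document can be passed in directly.

```typescript
// Hypothetical structural type: mirrors the metadata getters on pdf-lib's
// PDFDocument, so any object with these methods can be passed in.
interface MetadataSource {
  getTitle(): string | undefined;
  getAuthor(): string | undefined;
  getSubject(): string | undefined;
  getKeywords(): string | undefined;
  getCreator(): string | undefined;
  getProducer(): string | undefined;
  getCreationDate(): Date | undefined;
  getModificationDate(): Date | undefined;
}

type MetadataRow = { field: string; value: string };

const NOT_SET = "not set";

// Turn a raw field value into the text shown in the table.
function display(value: string | Date | undefined): string {
  if (value === undefined) return NOT_SET;
  if (value instanceof Date) {
    return isNaN(value.getTime()) ? NOT_SET : value.toISOString();
  }
  return value.trim() === "" ? NOT_SET : value;
}

// Collect the eight standard fields in a fixed display order.
function readMetadata(doc: MetadataSource): MetadataRow[] {
  return [
    { field: "Title", value: display(doc.getTitle()) },
    { field: "Author", value: display(doc.getAuthor()) },
    { field: "Subject", value: display(doc.getSubject()) },
    { field: "Keywords", value: display(doc.getKeywords()) },
    { field: "Creator", value: display(doc.getCreator()) },
    { field: "Producer", value: display(doc.getProducer()) },
    { field: "Creation Date", value: display(doc.getCreationDate()) },
    { field: "Modification Date", value: display(doc.getModificationDate()) },
  ];
}
```

With pdf-lib itself, the document would come from `PDFDocument.load(bytes, { updateMetadata: false })`. By default `load` refreshes the Producer and Modification Date entries, so the `updateMetadata: false` option matters if the table should show what was originally in the file.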
The strip operation sets the six text fields to empty and the two dates to the epoch. The result is a PDF whose standard metadata reveals nothing about who made it. A recipient who opens Properties sees blank rows; a forensic analyst who runs `exiftool` sees the same for these fields, though an XMP packet, if the file has one, is still reported (see the limitations below). The PDF version itself is still visible (1.4, 1.7, 2.0), and the embedded font names may still hint at the authoring environment, but the immediate human-readable identifiers are gone.
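The strip operation described above could look roughly like this. It is a sketch under assumptions, not the tool's code: `MetadataTarget` and `stripMetadata` are hypothetical names, and the interface only mirrors the setter names on pdf-lib's `PDFDocument`.

```typescript
// Hypothetical structural type: mirrors the metadata setters on pdf-lib's
// PDFDocument, so a loaded document can be passed in directly.
interface MetadataTarget {
  setTitle(title: string): void;
  setAuthor(author: string): void;
  setSubject(subject: string): void;
  setKeywords(keywords: string[]): void;
  setCreator(creator: string): void;
  setProducer(producer: string): void;
  setCreationDate(date: Date): void;
  setModificationDate(date: Date): void;
}

// Clear the six text fields and reset both dates to the Unix epoch.
function stripMetadata(doc: MetadataTarget): void {
  const epoch = new Date(0); // 1970-01-01T00:00:00Z
  doc.setTitle("");
  doc.setAuthor("");
  doc.setSubject("");
  doc.setKeywords([]); // an empty list yields an empty Keywords entry
  doc.setCreator("");
  doc.setProducer("");
  doc.setCreationDate(epoch);
  doc.setModificationDate(epoch);
}
```

With pdf-lib, calling `stripMetadata(doc)` followed by `await doc.save()` would produce the bytes for the download. Taking a structural interface rather than the concrete class keeps the function easy to exercise with a stub object.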
**Limitations worth knowing.** Stripping the standard metadata doesn't anonymize the PDF completely. Possible residual signals:
- **Embedded fonts**: PDFs usually embed subsets of the fonts they use. The font names may indicate the authoring OS (system fonts like `Calibri` are Windows; `Helvetica Neue` is macOS) or the authoring app (`CMUNRM` is LaTeX's Computer Modern Roman).
- **PDF version**: PDF 1.4 was the standard for early-2000s authoring tools; PDF 1.7 is the current Office default; PDF 2.0 comes from newer apps. The version itself is a coarse fingerprint.
- **Layout fingerprints**: subtle pixel-positioning differences can identify the rendering engine (Skia vs Acrobat vs Quartz) and sometimes the OS.
- **XMP metadata packets**: some PDFs embed a verbose XMP metadata block that duplicates the standard fields. This tool only modifies the standard fields; the XMP block (if present) is unchanged. For complete metadata removal, you'd need a tool that rewrites both. (`exiftool -all=` is the command-line option, with the caveat that ExifTool edits PDFs by incremental update, so the old values stay recoverable until the file is fully rewritten, for example with `qpdf --linearize`.)
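Two of the residual signals above, the PDF version and a leftover XMP packet, can be spotted from the raw bytes without a PDF parser. The following is a heuristic sketch only; `indexOfAscii`, `ResidualSignals`, and `scanResidualSignals` are hypothetical names. It assumes the XMP packet is stored uncompressed, which is common but not guaranteed, so a negative result does not prove that no XMP is present.

```typescript
// Find the first occurrence of an ASCII marker in a byte array, or -1.
function indexOfAscii(bytes: Uint8Array, marker: string): number {
  outer: for (let i = 0; i + marker.length <= bytes.length; i++) {
    for (let j = 0; j < marker.length; j++) {
      if (bytes[i + j] !== marker.charCodeAt(j)) continue outer;
    }
    return i;
  }
  return -1;
}

interface ResidualSignals {
  pdfVersion: string | null; // taken from the "%PDF-x.y" file header
  hasXmpPacket: boolean; // true if an "<?xpacket begin" marker was found
}

function scanResidualSignals(bytes: Uint8Array): ResidualSignals {
  let pdfVersion: string | null = null;
  // The header normally sits at the very start; search the first 1 KiB only.
  const header = indexOfAscii(bytes.subarray(0, 1024), "%PDF-");
  if (header !== -1) {
    const start = header + "%PDF-".length;
    const text = Array.from(bytes.subarray(start, start + 3), (c) =>
      String.fromCharCode(c)
    ).join("");
    if (/^\d\.\d$/.test(text)) pdfVersion = text;
  }
  // XMP packets are wrapped in <?xpacket begin=...?> ... <?xpacket end=...?>.
  const hasXmpPacket = indexOfAscii(bytes, "<?xpacket begin") !== -1;
  return { pdfVersion, hasXmpPacket };
}
```

Running this on the stripped output is a cheap way to decide whether a second pass with a tool that also rewrites XMP is worthwhile. The header version is only a first approximation, since a document catalog can declare a later version than the header does.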
For most purposes, stripping the standard fields is enough: the recipient who opens Properties sees nothing, and casual scrapers see nothing. For high-stakes anonymity (whistleblowing, legal document drops), the residual signals listed above are worth dealing with as well.
**Encrypted PDFs.** Password-protected PDFs can't be modified without decryption first, and pdf-lib does not decrypt PDFs. If your PDF is encrypted, remove the protection externally (for example in Adobe Acrobat via File → Properties → Security, or with `qpdf --decrypt`; the free Adobe Reader cannot change security settings) and then run the unencrypted version through this tool.
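A pre-flight check for encryption can be approximated from the raw bytes before the file reaches the parser. The sketch below is a heuristic, not how this tool or pdf-lib detects encryption, and `looksEncrypted` and `preflightMessage` are hypothetical names. It relies on the fact that an encrypted PDF references its encryption dictionary through an `/Encrypt` entry, and it can give a false positive if that text happens to appear elsewhere in an uncompressed part of the file.

```typescript
// Heuristic: report true if the ASCII token "/Encrypt" appears anywhere
// in the file. Encrypted PDFs carry it in the trailer (or in the
// cross-reference stream dictionary), which is stored as plain text.
function looksEncrypted(bytes: Uint8Array): boolean {
  const marker = "/Encrypt";
  outer: for (let i = 0; i + marker.length <= bytes.length; i++) {
    for (let j = 0; j < marker.length; j++) {
      if (bytes[i + j] !== marker.charCodeAt(j)) continue outer;
    }
    return true;
  }
  return false;
}

// Return a user-facing message for encrypted input, or null to proceed.
function preflightMessage(bytes: Uint8Array): string | null {
  return looksEncrypted(bytes)
    ? "This PDF appears to be encrypted. Remove the password protection first, then try again."
    : null;
}
```

A check like this only improves the error message; the parser's own encryption error remains the authoritative signal.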
**Privacy.** This is a privacy tool, so the no-network promise is the whole point. pdf-lib runs as JavaScript in your tab. The PDF is read into memory via `File.arrayBuffer()`, parsed by pdf-lib, displayed in the table, optionally rewritten with cleared metadata, and downloaded locally via blob URL. The file never leaves your browser. Verify in DevTools — the network panel stays empty during operation.
**Edge cases handled:** PDFs with no metadata set (all fields show "not set"); PDFs with very long titles or keywords (table cells scroll); PDFs with malformed dates (parsed as epoch, displayed as 1970); PDFs that pdf-lib can't parse (clear error with the parser's complaint); blob URL lifecycle properly managed on unmount.
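The malformed-date fallback can be sketched as follows. This is an illustration of the behaviour described above, not pdf-lib's own parser; `PDF_DATE` and `parsePdfDate` are hypothetical names. It assumes the standard PDF date layout `D:YYYYMMDDHHmmSS` followed by an optional `Z` or a `+HH'mm'` / `-HH'mm'` offset, where everything after the year may be omitted.

```typescript
// PDF date string: D:YYYY[MM[DD[HH[mm[SS]]]]] plus optional Z or +HH'mm' / -HH'mm'.
const PDF_DATE =
  /^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$/;

// Parse a PDF date string; anything malformed falls back to the epoch.
function parsePdfDate(raw: string): Date {
  const m = PDF_DATE.exec(raw.trim());
  if (m === null) return new Date(0);

  const num = (s: string | undefined, fallback: number): number =>
    s === undefined ? fallback : parseInt(s, 10);

  const month = num(m[2], 1);
  const day = num(m[3], 1);
  const hour = num(m[4], 0);
  const minute = num(m[5], 0);
  const second = num(m[6], 0);
  // Reject out-of-range parts instead of letting Date.UTC roll them over.
  if (month < 1 || month > 12 || day < 1 || day > 31) return new Date(0);
  if (hour > 23 || minute > 59 || second > 59) return new Date(0);

  // The fields are local time at the given offset; subtract it to get UTC.
  const sign = m[8] === "-" ? -1 : m[8] === "+" ? 1 : 0;
  const offsetMinutes = sign * (num(m[9], 0) * 60 + num(m[10], 0));
  const asIfUtc = Date.UTC(num(m[1], 1970), month - 1, day, hour, minute, second);
  return new Date(asIfUtc - offsetMinutes * 60000);
}
```

For example, `parsePdfDate("D:20240102030405+02'00'")` yields 2024-01-02T01:04:05Z, while an unparseable value such as `"yesterday"` yields the epoch and is displayed as 1970.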
Frequently Asked Questions
What's actually stored in PDF metadata?
Eight standard fields: Title, Author, Subject, Keywords, Creator (the app that authored the document, like Microsoft Word or LaTeX), Producer (the tool that exported it to PDF, often a different app from the Creator), Creation Date, Modification Date. Plus a few internal fields (PDF version, encryption flags) that aren't user-visible. Apps can also embed extension metadata in XMP packets — this tool reads the standard fields; XMP is much larger and often duplicates the same data.
Why does my PDF have my name in it?
Because Microsoft Word and most other authoring apps automatically set the Author field to your OS username or your Office account name. The Creator field shows the application (`Microsoft® Word for Microsoft 365`). When you 'Save as PDF' from Word, both fields get embedded. Most people never see them because PDF viewers hide metadata behind a 'Properties' menu, but anyone who knows where to look can read it — including the recipient of a PDF you 'anonymized' before sending.
What does 'Strip all metadata' do?
It rewrites every standard metadata field: title, author, subject, keywords, creator, and producer are set to empty strings; creation and modification dates are set to the epoch (1970-01-01). The PDF content (text, images, layout) is unchanged. The output is a regular PDF whose standard fields no longer carry your name, your employer, or the app you used. An embedded XMP packet, if the file has one, is not touched.
Does stripping protect me from forensic analysis?
Mostly, but not completely. Stripping the standard metadata fields handles the obvious case: a recipient opening Properties in their PDF viewer sees nothing. Advanced forensic tools can still extract residual signals: an XMP packet that this tool leaves untouched, fonts embedded in the PDF (which may point to your OS or authoring app), the PDF version (which narrows down the authoring app), and exact coordinates of layout elements (which can fingerprint the rendering engine). For truly anonymous PDFs, generate them on a clean VM with generic fonts, or convert to a flattened image-based PDF that removes structured content entirely.
Why are creation and modification dates set to 1970?
Because pdf-lib's document API offers setters for the date fields but no call to *remove* them. The epoch (1970-01-01 00:00:00 UTC) is a widely recognized placeholder for 'no date'. A recipient seeing 1970 as the creation date will recognize that the dates have been stripped, which is fine: the goal is privacy, not deception. If you wanted to deceive, you'd set a plausible-looking date instead, which this tool doesn't offer because that's a different intent.
Does this work on encrypted PDFs?
No. pdf-lib can't decrypt password-protected PDFs. If the file is encrypted, the tool fails with a clear error. Remove the password protection first with a desktop tool (for example Adobe Acrobat's File → Properties → Security tab, or `qpdf --decrypt`), then run the unencrypted result through here.
Is the PDF uploaded to a server?
No. pdf-lib runs in your browser as a JS library. The PDF is read into memory, parsed in-tab, displayed, optionally rewritten, and downloaded locally. No file ever crosses the network. Verify in DevTools — the network panel is empty during operation.