You send a proposal to a client. Thirty pages, carefully worded, formatted. What you see is the content you wrote.

What the file contains is more than that. The author's full name. The software used to create it. When the first draft was started and when the last edit was saved. The name of the person who last modified it. If images are embedded, possibly the GPS coordinates of where they were taken. If the document went through rounds of revision, possibly the text that was deleted along the way.

None of this is visible when you open the file. All of it is readable by anyone who knows where to look.

Document properties: the basics

Every PDF carries a set of document properties. Open any PDF in Preview, press ⌘I, and look at what's listed.

Author. The name of the person (or the username of the account) that created the original document. If a paralegal drafted the contract in Word and exported it to PDF, their name is in the file. Not the partner's name. Not the firm's name. The person who pressed "Save As."

Producer and Creator. Creator names the application the document was written in. Producer names the software that converted it to PDF. "Microsoft Word 2021" or "Adobe Acrobat Pro DC" or "LaTeX with hyperref." Together they reveal your toolchain, your software versions, and sometimes your operating system.

Dates. Creation date and modification date, both with full timestamps, typically down to the second and often with a UTC offset. If you created a draft on Tuesday and revised it Thursday morning at 9:47 a.m., that sequence is recorded. For sensitive negotiations, timestamps can reveal how quickly (or slowly) your side responded.
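
To see how precise these timestamps are, here is a minimal Python sketch that decodes the PDF date format (D:YYYYMMDDHHmmSS followed by an optional offset). The function name parse_pdf_date is made up for illustration, and treating a missing offset as UTC is an assumption of the sketch, not something the format guarantees.

```python
import re
from datetime import datetime, timedelta, timezone

# PDF date strings look like D:YYYYMMDDHHmmSS plus an optional offset
# such as +01'00', -05'00' or Z. Every field after the year is optional.
_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(?:(\d{2})'?(\d{2})?'?)?)?"
)


def parse_pdf_date(raw: str) -> datetime:
    """Convert a PDF date string into a timezone-aware datetime."""
    m = _PDF_DATE.match(raw.strip())
    if m is None:
        raise ValueError(f"not a PDF date string: {raw!r}")
    # Missing month/day default to 1, missing time fields default to 0.
    year, month, day, hour, minute, second = (
        int(part) if part else default
        for part, default in zip(m.groups()[:6], (0, 1, 1, 0, 0, 0))
    )
    sign, off_h, off_m = m.group(7), m.group(8), m.group(9)
    if sign in (None, "Z", "z"):
        # Assumption of this sketch: no offset is treated as UTC.
        tz = timezone.utc
    else:
        delta = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(delta if sign == "+" else -delta)
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)


print(parse_pdf_date("D:20240314094700-05'00'").isoformat())
```

The offset matters as much as the time: it hints at where the author was working when the file was saved.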

These fields exist because PDF is a container format designed to carry information about the document, not just the document itself. Most of this information was never meant for the recipient.
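
As a rough illustration of how exposed these fields are, the Python sketch below pulls them out of the raw file bytes with nothing but a regular expression. It is deliberately naive: the name read_info_fields is hypothetical, and it assumes the properties are stored as plain, unencrypted literal strings outside compressed object streams, which is common but not universal.

```python
import re

# Keys from the document information dictionary that this sketch looks for.
INFO_KEYS = ("Author", "Creator", "Producer", "CreationDate", "ModDate")


def read_info_fields(pdf_bytes: bytes) -> dict[str, str]:
    """Return literal-string values of common property keys found in raw PDF bytes."""
    found = {}
    for key in INFO_KEYS:
        # Matches e.g. /Author (Jane Doe); backslash-escaped characters are allowed.
        pattern = rb"/" + key.encode() + rb"\s*\(((?:\\.|[^\\()])*)\)"
        matches = re.findall(pattern, pdf_bytes)
        if matches:
            # Keep the last occurrence: appended updates sit later in the file.
            found[key] = matches[-1].decode("latin-1")
    return found


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        for name, value in read_info_fields(fh.read()).items():
            print(f"{name}: {value}")
```

Values stored as UTF-16 or hex strings, or inside compressed objects, will be missed or look garbled here. A dedicated tool handles those cases; the point is only that no special access is needed to read the rest.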

XMP metadata: the deeper layer

Beyond the basic properties, PDFs can carry XMP (Extensible Metadata Platform) blocks. This is a structured XML payload embedded in the file, often invisible to standard viewers.

XMP can include:

  • The document title, description, and keywords
  • Copyright and licensing information
  • The full editing history across applications (if the PDF was touched by multiple Adobe tools)
  • Custom fields added by enterprise document management systems

A document that passed through a corporate DMS might carry records of every user who modified it. Not because someone intended to share that information, but because the system logged it in the file's XMP history, and no one removed it before distribution.
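
A hedged sketch of what reading that payload can look like, in Python with only the standard library. It assumes the XMP packet is stored uncompressed (often the case, so that other tools can find it) and uses the standard layout for dc:creator, xmp:CreatorTool, and xmpMM:History. The function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Standard XMP namespace URIs.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "xmpMM": "http://ns.adobe.com/xap/1.0/mm/",
    "stEvt": "http://ns.adobe.com/xap/1.0/sType/ResourceEvent#",
}


def extract_xmp(pdf_bytes: bytes):
    """Return the first uncompressed XMP packet as an XML element, or None."""
    m = re.search(rb"<x:xmpmeta.*?</x:xmpmeta>", pdf_bytes, re.DOTALL)
    return ET.fromstring(m.group(0)) if m else None


def _event_field(li, name):
    """History fields may be XML attributes or child elements; check both."""
    qname = f"{{{NS['stEvt']}}}{name}"
    if qname in li.attrib:
        return li.attrib[qname]
    child = li.find(f"stEvt:{name}", NS)
    return child.text if child is not None else None


def summarize_xmp(root) -> dict:
    """Collect creators, the authoring tool, and the recorded edit history."""
    creators = [li.text for li in root.findall(".//dc:creator/rdf:Seq/rdf:li", NS)]
    tool = root.find(".//xmp:CreatorTool", NS)
    history = [
        {name: _event_field(li, name) for name in ("action", "when", "softwareAgent")}
        for li in root.findall(".//xmpMM:History/rdf:Seq/rdf:li", NS)
    ]
    return {
        "creators": creators,
        "creator_tool": tool.text if tool is not None else None,
        "history": history,
    }
```

Custom fields from a document management system live in their own namespaces, so they would need their own lookups. This sketch only covers the common ones.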

Embedded images: location, device, time

When you paste a photo or a scanned signature into a document, the image may carry its own metadata. Depending on how the PDF was created, EXIF data from the original JPEG or TIFF can survive embedding. That data can include:

  • GPS coordinates of where the photo was taken
  • Camera make and model (or phone model)
  • The date and time of capture
  • Software used for post-processing

A consulting firm includes a site photo in a report. The photo's EXIF data reveals the exact address of the site visit, the date it happened, and the phone model used. The text of the report might not mention any of this. The image does.
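
For a sense of where that data sits, here is a small Python sketch that walks the header segments of a JPEG and reports whether it carries an EXIF block, and whether that block points to GPS data. It checks for presence only and does not decode coordinates. The function names are illustrative, and the input is assumed to be the raw bytes of a JPEG, such as the photo before it was pasted in.

```python
import struct


def find_exif_segment(jpeg: bytes):
    """Return the EXIF payload (a TIFF structure) from a JPEG, or None."""
    if jpeg[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            break  # lost sync with the segment structure
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: image data follows, headers are over
            break
        # The 2-byte length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2 : pos + 4])
        payload = jpeg[pos + 4 : pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]
        pos += 2 + length
    return None


def has_gps_ifd(exif: bytes) -> bool:
    """Check the first image directory for the GPS pointer tag (0x8825)."""
    order = "<" if exif[:2] == b"II" else ">"  # II = little-endian, MM = big-endian
    (ifd0,) = struct.unpack(order + "I", exif[4:8])
    (count,) = struct.unpack(order + "H", exif[ifd0 : ifd0 + 2])
    for i in range(count):
        entry = ifd0 + 2 + 12 * i  # each directory entry is 12 bytes
        (tag,) = struct.unpack(order + "H", exif[entry : entry + 2])
        if tag == 0x8825:
            return True
    return False
```

If has_gps_ifd returns True for a photo you are about to embed, the location is in the image file, whether or not the report mentions it.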

Incremental saves: the text you deleted

PDF files support incremental saving. Instead of rewriting the entire file, the application appends changes. This is efficient for large documents but has a side effect: earlier versions of the content may still exist in the file.

A contract that went through negotiation might contain the original terms underneath the final ones. Deleted paragraphs, revised figures, earlier pricing. Standard PDF viewers won't display them. A hex editor or forensic tool will.

Not every PDF contains recoverable deleted content. But enough do that it's worth knowing the mechanism exists, especially for documents that went through multiple rounds of revision before export.
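
A quick way to check for this, sketched in Python: each save ends with an %%EOF marker, so a file that contains more than one may hold earlier revisions. The function name is made up for illustration. Treat the count as a hint rather than proof, since linearized ("fast web view") PDFs normally contain two markers without any revision history.

```python
def split_revisions(pdf_bytes: bytes) -> list[bytes]:
    """Return one byte prefix per %%EOF marker, oldest first.

    Each prefix is the file as it looked after that save, so everything
    except the last entry is an earlier state of the document.
    """
    marker = b"%%EOF"
    revisions, start = [], 0
    while (idx := pdf_bytes.find(marker, start)) != -1:
        end = idx + len(marker)
        revisions.append(pdf_bytes[:end])
        start = end
    return revisions


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        revs = split_revisions(fh.read())
    print(f"{len(revs)} end-of-file marker(s) found")
    for i, rev in enumerate(revs, 1):
        print(f"  revision {i}: first {len(rev)} bytes of the file")
```

Writing one of the earlier prefixes to its own file and opening it in a viewer is often enough to see the document as it stood before the later edits.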

Annotations and comments

PDF supports annotations: highlights, sticky notes, strikethrough marks, free-text comments. In collaborative workflows, these accumulate during review. When the final version is exported, the annotations may still be embedded in the file, even if they're not visible in the recipient's viewer.
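
One rough way to check for leftovers is to count annotation objects in the raw file. The Python sketch below looks for the four kinds mentioned above by their PDF subtype names (sticky notes are stored as "Text"). It is a heuristic under a stated assumption: objects packed into compressed object streams will not show up in a raw byte scan, so a count of zero does not prove the file is clean. The function name is hypothetical.

```python
import re
from collections import Counter

# PDF subtype names for highlights, sticky notes, strikethroughs, and free text.
_MARKUP = re.compile(rb"/Subtype\s*/(Highlight|Text|StrikeOut|FreeText)\b")


def count_markup_annotations(pdf_bytes: bytes) -> dict[str, int]:
    """Count review-style annotation objects visible in the raw PDF bytes."""
    names = (m.group(1).decode("ascii") for m in _MARKUP.finditer(pdf_bytes))
    return dict(Counter(names))


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as fh:
        print(count_markup_annotations(fh.read()) or "no uncompressed markup found")
```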

Overlays are not redactions

A court filing in 2019 made headlines when redacted portions of a legal document were found to be recoverable because the "redaction" was a black overlay drawn on top of the text, not actual removal. The text underneath was selectable.

This is an extreme case, but the principle applies broadly. Black rectangles drawn over text in a PDF don't remove the text. They cover it. The difference matters.
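
To make the difference concrete, here is a simplified Python sketch. A page's content is a list of drawing instructions: show this text, then fill this rectangle. The rectangle is just one more instruction, and the text instruction before it is still there. The sketch decompresses each stream and collects the strings passed to the text-showing operator Tj. It assumes the simplest case (Flate compression, plain literal strings). Real files often use font-specific encodings, so recovered text may need more work to read. The function names are illustrative.

```python
import re
import zlib


def flate_streams(pdf_bytes: bytes):
    """Yield the decompressed body of every stream that inflates cleanly."""
    for m in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        try:
            # decompressobj tolerates the line break left before "endstream".
            body = zlib.decompressobj().decompress(m.group(1))
        except zlib.error:
            continue  # not Flate-compressed; out of scope for this sketch
        yield body


def shown_strings(pdf_bytes: bytes) -> list[str]:
    """Collect literal strings passed to the Tj (show text) operator."""
    found = []
    for body in flate_streams(pdf_bytes):
        for raw in re.findall(rb"\(((?:\\.|[^\\()])*)\)\s*Tj", body):
            found.append(raw.decode("latin-1"))
    return found


# A toy page: draw a line of text, then paint a black box over it.
content = (
    b"BT /F1 12 Tf 72 700 Td (Settlement amount: 250,000) Tj ET\n"
    b"0 0 0 rg 70 695 200 16 re f\n"
)
toy_pdf = (
    b"%PDF-1.7\n4 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(content)
    + b"\nendstream\nendobj\n%%EOF\n"
)
print(shown_strings(toy_pdf))
```

The black box is drawn after the text, so a viewer shows only the box. The text instruction is untouched.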

Why this matters for professionals

These aren't theoretical risks. They're the kind of information that leaks routinely in professional work:

Legal. A contract sent to opposing counsel contains the author field showing which associate drafted it. Modification timestamps reveal the response timeline. "Redacted" sections remain selectable under black overlays.

Consulting. A deliverable includes embedded photos from a client site. The GPS data identifies the location. The creation date reveals when work actually began, regardless of what the project timeline says.

HR. Resumes converted to PDF carry the author field from whoever did the conversion. A candidate's "anonymous" application isn't anonymous if the HR coordinator's name is in the document properties.

Finance. A financial model exported to PDF can carry its modification history. The timestamps show when figures were last changed, and sometimes how many revisions the file went through.

AI workflows. When you upload a PDF to an AI tool for analysis, some document parsers extract metadata alongside the text. The model might process not just the content you intended to share but also the authoring context you didn't.

How to check your own files

On macOS, open a PDF in Preview and press ⌘I to view document properties. For more detail, install exiftool via Homebrew:

brew install exiftool

Then run it against any PDF:

exiftool your-document.pdf

The output lists every metadata field in the file. For most professional documents, the result is longer than expected.

For embedded images, the same tool works on the image file itself, either the original you pasted in or a copy extracted from the PDF:

exiftool image-inside-the-document.jpg

Don't share your original. Share a clean one.

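Removing metadata is a good start. Tools like exiftool -all:all= file.pdf can clear document-level properties, with one caveat: exiftool's own documentation notes that its PDF edits are reversible. The change is appended as an incremental update, so the old values stay in the file until another tool rewrites it (for example, qpdf --linearize in.pdf out.pdf). On macOS, using Print → Save as PDF produces a new file that may carry less metadata than the original.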

But stripping metadata is a game of whack-a-mole. Document properties, XMP blocks, image EXIF, incremental save layers, annotation remnants. Miss one, and the information is still there. And metadata is only half the problem. The text itself may contain names, addresses, financial details, and identifiers that shouldn't be shared either.

The more reliable approach is to never send the original at all.

Your working document is an internal artifact. It accumulated metadata, revision history, and embedded context over its lifetime. That's fine for internal use. What you share externally should be a different object: a new document containing only the content you reviewed and approved.

That's the approach RedMatiq takes. When you export, it doesn't modify your original file or strip fields from a copy. It generates a new PDF from the redacted text. A clean rendering of the content you chose to include. No metadata inheritance, no hidden layers, no structural artifacts from the original. The output isn't a replica of your internal file. It's a new document, built from scratch, containing exactly what you intended to share and nothing else.

What you share should be a decision, not a copy

The content of a document is deliberate. Every word was chosen, reviewed, and approved for the recipient. The metadata is different. It's what the software remembered without asking. Author names, timestamps, GPS coordinates, revision layers, software versions.

The document you share should contain what you decided to share. Not what the software accumulated on your behalf.
