Full PDF Format Support

The official PDF file format specification (published by Adobe) is large and complex. PDF files can be rich, dynamic documents, and getting at all of their interesting and useful parts (their content, text, metadata, and so on) is a daunting task.

Further, Adobe's specification only defines how PDF documents are supposed to be constructed. Experience shows that our applications must often process PDF documents from multiple sources, and those sources can (and do) generate PDF files that sometimes bend and often break the "official" PDF specification.

This is just one of the many reasons why we aggressively support and maintain PDFTextStream. Doing so allows us to guarantee maximum compatibility with all PDF document formats, regardless of their source or of the degree to which they violate the rules of good PDF file format etiquette.

Our aim is to ensure that PDFTextStream can extract the full text of any PDF document that you can copy-and-paste text from using Adobe Acrobat, and that it does so accurately in an automated, high-performance environment.

PDF Format Support Details

The range of PDF file format features (and quirks!) that PDFTextStream supports is broad and deep. Below is a list of the major points of the PDF file format that PDFTextStream supports. If you need a particular detail that is not listed here, please feel free to contact us to confirm that PDFTextStream supports it.

(For details on which parts of the PDFTextStream API correspond to these points of support for the PDF file format, see our features page.)

From PDFTextStream, you can expect:

  • Compatibility with all versions of the PDF document specification. This includes:

    • v1.0 (Acrobat 1)
    • v1.1 (Acrobat 2)
    • v1.2 (Acrobat 3)
    • v1.3 (Acrobat 4)
    • v1.4 (Acrobat 5)
    • v1.5 (Acrobat 6)
    • v1.6 (Acrobat 7)
    • v1.7 (Acrobat 8)
  • Support for decryption of 40- and 128-bit encrypted PDF documents (using the Standard security handler)

  • Excellent embedded and standard font and character encoding support (critical for enabling proper layout and spacing of text elements):

    • Type 0
    • Type 1
    • Type 1C
    • TrueType
    • Identity-H and Identity-V encodings
    • CMap encodings (including Chinese, Japanese, and Korean character sets, both horizontal and vertical writing modes)
  • Support for extracting and updating AcroForm (interactive form) data

  • Support for extracting text from "searchable image" PDF documents (common in files sourced from an OCR process)

  • Support for all varieties of rotated text (page-level as well as text-level rotations)

  • Support for extracting all PDF bookmark (document outline) data

  • Support for extracting document annotations (including web links)

  • Support for both types of document-wide metadata (classic key/value attributes as well as Adobe XMP XML-format metadata)

  • And much more...
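
The specification versions listed above correspond to the version number that a PDF file declares in its header, a line of the form %PDF-1.4 near the start of the file. As a rough illustration of what "which version is this file?" means in practice, here is a minimal Java sketch (Java 11 or later) that reads that header and maps it to the Acrobat releases from the list. It does not use the PDFTextStream API: the class name PdfVersionSniffer and its methods are hypothetical, and the sketch assumes the header sits within the first 1024 bytes, which is the tolerance Acrobat viewers apply. It also ignores the optional Version entry in the document catalog, which can override the header in PDF 1.4 and later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of PDFTextStream.
final class PdfVersionSniffer {

    // Version-to-release mapping copied from the compatibility list above.
    private static final Map<String, String> ACROBAT_BY_VERSION = new LinkedHashMap<>();
    static {
        ACROBAT_BY_VERSION.put("1.0", "Acrobat 1");
        ACROBAT_BY_VERSION.put("1.1", "Acrobat 2");
        ACROBAT_BY_VERSION.put("1.2", "Acrobat 3");
        ACROBAT_BY_VERSION.put("1.3", "Acrobat 4");
        ACROBAT_BY_VERSION.put("1.4", "Acrobat 5");
        ACROBAT_BY_VERSION.put("1.5", "Acrobat 6");
        ACROBAT_BY_VERSION.put("1.6", "Acrobat 7");
        ACROBAT_BY_VERSION.put("1.7", "Acrobat 8");
    }

    // Matches the header marker followed by a major.minor version number.
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");

    // Assumption: only the first 1024 bytes are searched for the header.
    private static final int SEARCH_LIMIT = 1024;

    /** Returns the version declared in the file header, if one is found. */
    static Optional<String> headerVersion(byte[] leadingBytes) {
        int len = Math.min(leadingBytes.length, SEARCH_LIMIT);
        // ISO-8859-1 maps every byte to one char, so binary data cannot break decoding.
        String head = new String(leadingBytes, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Maps a PDF version to the Acrobat release from the list, or "unknown". */
    static String acrobatRelease(String version) {
        return ACROBAT_BY_VERSION.getOrDefault(version, "unknown");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfVersionSniffer <file.pdf>");
            return;
        }
        byte[] head;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            head = in.readNBytes(SEARCH_LIMIT);
        }
        Optional<String> version = headerVersion(head);
        System.out.println(version
                .map(v -> "PDF " + v + " (" + acrobatRelease(v) + ")")
                .orElse("no PDF header found in the first " + SEARCH_LIMIT + " bytes"));
    }
}
```

Searching the leading bytes rather than insisting on offset zero reflects the point made earlier on this page: real-world files do not always follow the specification to the letter, and some carry stray bytes before the header.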
