PDFTextStream was built from the ground up to meet the most stringent requirements for extracting text and metadata from PDF documents. Its API is comprehensive and includes the following features:
Extensive support for the PDF file format specification and all known variants
Full Unicode-capable text extraction facilities, including support for extracting Chinese, Japanese, and Korean text, in both horizontal and vertical writing modes
Full support for updating interactive AcroForms (including text, checkbox, radio button, and choice fields)
Comprehensive PDF document metadata access
It's easy to find out why you should use PDFTextStream. Other parts of our site are dedicated to PDFTextStream's comprehensive PDF file format support and its unbeatable performance.
Page-level object model via com.snowtide.pdf.Page, providing page-specific text extraction and page metrics (height, width, rotation angle, etc.)
AcroForm (interactive form) data extraction
PDF bookmark (document outline) access
PDF annotation access, including Link (web URL) annotations
EncryptionInfo API: provides access to PDF document encryption parameters
Text-piping API for super-fast text extraction, with hooks for customizing how extracted PDF text is formatted (such as when the visual layout of each page needs to be maintained)
Built-in selective regional text extraction, ideal for extracting data from fixed-format forms
Optional in-memory operation
Built-in PDF merge utility
PDF to HTML exporter
PDFTextStream subclasses java.io.Reader, which provides a simple, familiar interface and straightforward integration with existing components that expect a java.io.Reader instance.
Flexible logging toolkit hooks
Built-in support for logging to standard out, Log4J, and java.util.logging toolkits
Ability to plug in custom logging implementations
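To illustrate the java.io.Reader integration point mentioned above, here is a minimal sketch of a component that accepts any java.io.Reader. It uses only the Java standard library: the `ReaderIntegrationSketch` class and its `countWords` method are hypothetical names invented for this example, and a `StringReader` stands in for a PDFTextStream instance, so how a PDFTextStream is actually constructed is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReaderIntegrationSketch {

    // A hypothetical "existing component": it knows nothing about PDFs and
    // only requires a java.io.Reader as its text source.
    static int countWords(Reader source) {
        int words = 0;
        // try-with-resources closes the wrapped reader when done
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    // split on runs of whitespace to count words on this line
                    words += trimmed.split("\\s+").length;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // Assumption: StringReader is a stand-in for a PDFTextStream instance,
        // which could be passed here because it is also a java.io.Reader.
        Reader standIn = new StringReader("Invoice 42\nTotal due: 100.00 USD\n");
        System.out.println(countWords(standIn));
    }
}
```

Because the component depends only on the Reader abstraction, swapping the stand-in for a real extraction stream would not require changes to `countWords` itself.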