We are very proud to introduce you to PDFTextStream v2.0 (read the press release). Having worked on this release for nearly 18 months, we are confident that you will find it as compelling today as PDFTextStream v1.0 was two years ago.
Each of the major changes introduced in PDFTextStream v2.0 is the direct result of listening to our customers, who constantly push us to deliver more. These major changes include:
Starting with v2.1, PDFTextStream can update persistent form field values in PDF documents that contain interactive AcroForms. This is in addition to its existing form field value extraction capability. Document workflows continue to migrate in astounding numbers to 100% paperless solutions, particularly those built on PDF documents with AcroForms. Given this trend, it's good to know that PDFTextStream now delivers robust form-filling and PDF document writing capabilities, so your enterprise or products can deliver customized PDF forms for customers, business partners, government compliance, and data archival requirements.
One of the most frequently asked-for "features" was that PDFTextStream should be available for platforms other than Java. And, while PDFTextStream's origins are rooted firmly in the Java world, we have found ways to bring PDFTextStream to two more very popular, very powerful platforms, .NET and Python.
PDFTextStream.NET and PDFTextStream.Python don't ask you to compromise: you get all of the performance, functionality, robustness, and features provided by PDFTextStream for Java, but on the development platform of your choice. Rarely is life this kind.
Today's marketplace is global, and surviving means being able to work well with others, regardless of their location or language. So, it is fitting that PDFTextStream v2.0 now supports extracting Chinese, Japanese, and Korean (CJK) text from PDF documents (as well as other lesser-known double-byte character sets).
PDFTextStream v2.0's CJK text extraction capabilities aren't half-baked or bolted on; they've been built into the library at the lowest levels. This enables PDFTextStream v2.0 to:
PDFTextStream's performance has always led the pack, but PDFTextStream v2.0 raises the bar yet again. PDFTextStream v2.0 is up to twice as fast as PDFTextStream v1.4, so it blows away the competition more than ever before.
This translates into increased efficiencies and reduced costs for your IT department, improved productivity for your staff and users, and a better experience for your customers.
Finding the mission-critical data trapped in the sea of unstructured content in your enterprise and integrating it into your existing systems is one of the most valuable contributions an IT team can make. PDFTextStream v2.0 introduces some functionality that can make tackling PDF-bound data extraction and conversion goals easier:
Extracting tabular data from PDF documents is one of the most common unstructured content tasks. PDFTextStream v2.0 makes this job a lot easier by:
Some kinds of tabular content (or other data formats) cannot be recognized as tables. In such circumstances, extracting the text of the source PDF files while retaining each page's visual layout can simplify the conversion of unstructured content into structured data elements. PDFTextStream v2.0 makes this possible via the new VisualOutputTarget class, which keeps the text extracted from each PDF file formatted as it appears when viewed in a PDF reader such as Adobe Acrobat.
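To give a feel for what layout-preserving extraction means, here is a minimal Python sketch of the general idea: positioned text fragments are placed on a character grid so that columns stay aligned in the plain-text output. This is an illustration only, not PDFTextStream's implementation or API; the `render_layout` function, the `(x, y, text)` fragment format, and the `char_width` and `line_height` defaults are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment format: (x, y, text), with x and y in points and the
# origin at the top-left corner of the page. Not a PDFTextStream data type.
Fragment = Tuple[float, float, str]


def render_layout(fragments: List[Fragment],
                  char_width: float = 6.0,
                  line_height: float = 12.0) -> str:
    """Place positioned text fragments on a character grid.

    Each fragment is mapped to a (row, column) cell by dividing its
    coordinates by an assumed line height and character width.
    """
    rows: Dict[int, List[Tuple[int, str]]] = {}
    for x, y, text in fragments:
        row = int(round(y / line_height))
        col = int(round(x / char_width))
        rows.setdefault(row, []).append((col, text))

    if not rows:
        return ""

    lines = []
    # Emit every row between the first and last, so vertical gaps survive
    # as blank lines.
    for row in range(min(rows), max(rows) + 1):
        line = ""
        for col, text in sorted(rows.get(row, [])):
            if line and col <= len(line):
                # The fragment would collide with earlier text: keep both,
                # separated by a single space.
                line += " "
            else:
                # Pad with spaces up to the fragment's column.
                line = line.ljust(col)
            line += text
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    page = [
        (0, 0, "Item"), (120, 0, "Qty"), (180, 0, "Price"),
        (0, 12, "Widget"), (120, 12, "2"), (180, 12, "9.99"),
    ]
    print(render_layout(page))
```

Because every value in a given column starts at the same character offset, downstream code can split each line at fixed positions to recover fields, even when the content was never recognized as a table.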
There are a ton of additional features and beneficial tweaks included in PDFTextStream v2.0. Some of these include:
Please refer to the change log for a full list of the changes present in PDFTextStream v2.0.