Below is a collection of examples and short technical hints to help you use PDFTextStream effectively. All of the example code presented here is in Java; however, Python and .NET users can access the same functionality using the same classes and functions, as outlined in the documentation for PDFTextStream.NET and PDFTextStream for Python.
Getting Started with PDFTextStream
Getting started with PDFTextStream is very easy, and your application
can begin seeing benefits on day one. Here's a simple example you can
use to see results within minutes.
Read more...
Making Lucene Play Nice with PDFs
Jakarta Lucene is the most popular document indexing and search library
available for Java. PDFTextStream happens to provide a simple yet
powerful solution to one of the most common questions Lucene users
have: how can PDF documents be added to a Lucene index?
Read more...
Using PDFTextStream in Multiple-CPU Environments
PDFTextStream delivers unbeatable throughput and performance, so some
care should be taken to ensure that your team is taking full advantage
of that performance. This is a particular concern for teams that deploy
PDFTextStream into multiple-CPU environments; this techtip provides a
starting point for addressing that concern.
Read more...
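As a rough illustration of the multiple-CPU concern described above, the sketch below spreads extraction work across a fixed thread pool sized to the number of available processors. It is not taken from the linked tech tip: the class name ParallelExtraction, the method extractAll, and the extractor function are all hypothetical, and the extractor stands in for whatever code opens a PDF, pulls its text, and closes it. The sketch assumes each task works on its own document and shares no state with other tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelExtraction {

    /**
     * Applies the given extractor to every path on a fixed-size thread pool
     * and returns the extracted text in the same order as the input paths.
     * The extractor is a placeholder (an assumption of this sketch) for the
     * code that actually opens one PDF, extracts its text, and closes it.
     */
    public static List<String> extractAll(List<String> paths,
                                          Function<String, String> extractor)
            throws Exception {
        // One worker per available CPU, with a floor of one.
        int workers = Math.max(1, Runtime.getRuntime().availableProcessors());
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Submit one independent task per document.
            List<Future<String>> pending = new ArrayList<>();
            for (String path : paths) {
                Callable<String> task = () -> extractor.apply(path);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order; get() blocks until done
            // and rethrows any failure from the task.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In use, the extractor argument would wrap your own per-document extraction routine; the pool size is a starting point to tune against measured throughput rather than a recommendation.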
Using PDF Bookmarks as Extraction Delimiters
It is not uncommon to want to access only a particular section of a PDF
document's text content. Such requirements often arise from the need
to provide more focussed input to downstream text processing, including
indexing, summarization, and knowledge management. Thankfully, many PDF
documents (especially those generated using templates from word
processors like Microsoft Word) include bookmarks that mark the bounds
of the sections of interest. This tutorial will show you how to use
PDFTextStream with that bookmark information to yield highly focussed
text extracts.
Read more...