
Tutorials

Below is a collection of examples and short technical hints to help you use PDFTextStream effectively. All of the example code presented here is in Java; however, Python and .NET users can access the same functionality through the same classes and functions, as outlined in the documentation for PDFTextStream.NET and PDFTextStream for Python.

Getting Started with PDFTextStream
Getting started with PDFTextStream is easy, and your application can begin benefiting from it on day one. This tutorial walks through a simple example you can use to see results within minutes.

Making Lucene Play Nice with PDFs
Jakarta Lucene is the most popular document indexing and search library available for Java. PDFTextStream happens to provide a simple yet powerful solution to one of the most common questions Lucene users have: how can PDF documents be added to a Lucene index?

Using PDFTextStream in Multiple-CPU Environments
PDFTextStream delivers high throughput and performance, so some care should be taken to ensure that your application takes full advantage of it. This is a particular concern for teams that deploy PDFTextStream in multiple-CPU environments; this tech tip provides a starting point for addressing that concern.
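The general idea can be sketched with a fixed-size thread pool that keeps one extraction task running per available CPU. This is a minimal sketch, not the tutorial's own code: the `extractor` function is a hypothetical stand-in for whatever per-document PDFTextStream call your application makes, and the sketch assumes each task works on its own document and shares no mutable state with other tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelExtraction {

    /**
     * Applies a per-document extraction function to every input using a
     * fixed-size pool with one worker thread per available CPU.
     * Results are returned in the same order as the inputs.
     *
     * The extractor is a hypothetical placeholder for your real
     * PDFTextStream extraction call; each task is assumed to open and
     * close its own document.
     */
    public static <T> List<String> extractAll(List<T> inputs, Function<T, String> extractor)
            throws Exception {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Submit one task per document; the pool runs up to `workers` at once.
            List<Future<String>> futures = new ArrayList<>();
            for (T input : inputs) {
                futures.add(pool.submit(() -> extractor.apply(input)));
            }
            // Collect results in submission order, waiting as needed.
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> files = List.of("a.pdf", "b.pdf", "c.pdf");
        // Placeholder extractor for demonstration only; no PDF is actually read.
        List<String> texts = extractAll(files, name -> "text of " + name);
        System.out.println(texts);
    }
}
```

Sizing the pool with `Runtime.availableProcessors()` is a common starting point for CPU-bound work; if your extraction tasks also spend time waiting on disk or network I/O, a somewhat larger pool may keep the CPUs busier.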

Using PDF Bookmarks as Extraction Delimiters
It is not uncommon to want to access only a particular section of a PDF document's text content. Such requirements usually arise from the need to provide more focused input to downstream text processing, including indexing, summarization, and knowledge management. Thankfully, many PDF documents (especially those generated from word processors like Microsoft Word) include bookmarks that mark where the sections of interest begin and end. This tutorial shows you how to use PDFTextStream to take advantage of that bookmark information and produce highly focused text extracts.
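One way to turn bookmarks into section bounds is to treat each bookmark's target page as the start of its section and the page before the next bookmark as its end. The sketch below assumes hypothetical `Bookmark` and `PageRange` types and a list of already-extracted per-page text; it does not use PDFTextStream's real bookmark API, and it works at page granularity, so a section that begins partway down a page will pick up that whole page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BookmarkSections {

    /** Hypothetical bookmark: a title plus the zero-based page index it points to. */
    public record Bookmark(String title, int pageIndex) {}

    /** Inclusive range of zero-based page indexes covered by one section. */
    public record PageRange(int first, int last) {}

    /**
     * Finds the pages belonging to the bookmark with the given title:
     * from its own page up to the page before the next bookmark
     * (or to the last page if it is the final bookmark).
     */
    public static PageRange sectionRange(List<Bookmark> bookmarks, String title, int pageCount) {
        List<Bookmark> sorted = new ArrayList<>(bookmarks);
        sorted.sort(Comparator.comparingInt(Bookmark::pageIndex));
        for (int i = 0; i < sorted.size(); i++) {
            Bookmark current = sorted.get(i);
            if (current.title().equals(title)) {
                int last = (i + 1 < sorted.size())
                        // If the next section starts on the same page, keep that page.
                        ? Math.max(current.pageIndex(), sorted.get(i + 1).pageIndex() - 1)
                        : pageCount - 1;
                return new PageRange(current.pageIndex(), last);
            }
        }
        throw new IllegalArgumentException("No bookmark titled: " + title);
    }

    /** Joins the text of every page in the named bookmark's section. */
    public static String extractSection(List<String> pageTexts, List<Bookmark> bookmarks, String title) {
        PageRange range = sectionRange(bookmarks, title, pageTexts.size());
        return String.join("\n", pageTexts.subList(range.first(), range.last() + 1));
    }

    public static void main(String[] args) {
        // Stand-in page text; in practice this would come from PDFTextStream.
        List<String> pages = List.of("Intro text", "Methods p1", "Methods p2", "Results");
        List<Bookmark> marks = List.of(
                new Bookmark("Introduction", 0),
                new Bookmark("Methods", 1),
                new Bookmark("Results", 3));
        System.out.println(extractSection(pages, marks, "Methods"));
    }
}
```

Sorting by page index keeps the range calculation correct even if the bookmarks are listed out of order; nested (child) bookmarks would need a tree walk rather than this flat list.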
