
PDFTextStream.Python

Python is architecturally quite different from Java in many ways, so using a Java library like PDFTextStream from within a Python application requires some bridge-building. Many “compatibility layers” have disappointed in the past, especially with regard to performance. The integration approach embodied in PDFTextStream.Python instead lets Python developers use PDFTextStream.Python as if it were a native Python module: it is comprehensive and high-fidelity, and it comes without significant developer effort, performance compromises, or API complications.

Requirements and Architecture

Using PDFTextStream.Python requires Python 2.3 or higher and the JPype Python module. JPype is an open source module that allows a Python process to transparently utilize Java components by embedding a Java virtual machine (JVM) instance within the host Python process.

(JPype is licensed under the liberal Apache License v2.0, making it possible to redistribute it with commercial products. Therefore, licensing PDFTextStream.Python for inclusion in your own product on an OEM basis is perfectly straightforward and poses no threat to your product's license.)
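Before starting the JVM, it can help to confirm that the prerequisites are actually in place. The sketch below is a hypothetical helper (check_prerequisites is not part of PDFTextStream or JPype). It assumes Python 3, since it relies on importlib.util.find_spec, whereas the requirement stated above is Python 2.3 or higher, and it only checks that the jpype module can be imported and that the PDFTextStream jar exists at the path you give it.

```python
import importlib.util
import os


def check_prerequisites(jar_path):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: it only checks that the jpype module is
    importable and that the given PDFTextStream jar file exists.
    """
    problems = []
    # find_spec returns None when a top-level module cannot be found
    if importlib.util.find_spec("jpype") is None:
        problems.append("JPype is not installed (import jpype failed)")
    # The jar must exist before it can be put on the JVM classpath
    if not os.path.isfile(jar_path):
        problems.append("PDFTextStream jar not found: %s" % jar_path)
    return problems
```

For example, check_prerequisites('/path/to/PDFTextStream.jar') returns an empty list when JPype is installed and the jar is present at that path.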

Figure 1: PDFTextStream.Python integration architecture.

Figure 1 depicts the architecture involved when using PDFTextStream.Python. Conceptually, it is very straightforward:

  • Your Python application code asks JPype to start a JVM, which JPype does through the Java Native Interface (JNI)
  • Your application code then imports and uses classes and functions available in the JVM, including PDFTextStream as well as core Java library components
  • Throughout, JPype and JNI transparently convert classes, method calls, and return values into native Java invocations and back
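The first step, asking JPype to start a JVM, mostly comes down to the options passed to startJVM. As a minimal sketch, the hypothetical helper below (build_jvm_options is not part of JPype or PDFTextStream) assembles the same '-Xmx512m' and '-Djava.class.path=...' options used in the usage example on this page, joining several jars with the platform's path separator.

```python
import os


def build_jvm_options(jar_paths, max_heap="512m"):
    """Build option strings to pass to JPype's startJVM().

    Hypothetical helper: produces a maximum-heap option and a
    classpath option, joining multiple jars with os.pathsep.
    """
    if isinstance(jar_paths, str):
        jar_paths = [jar_paths]  # accept a single path as well as a list
    classpath = os.pathsep.join(jar_paths)
    return ["-Xmx" + max_heap, "-Djava.class.path=" + classpath]
```

The result can be unpacked into the startJVM call, e.g. startJVM(getDefaultJVMPath(), *build_jvm_options('/path/to/PDFTextStream.jar')). Keeping the options in one place makes it easier to add further jars to the classpath later.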

Installation

Please refer to the PDFTextStream Developer's Guide for the proper installation procedure for PDFTextStream.Python.

Typical Usage

Only two or three additional lines of code are needed to initialize the JVM through JPype. After that, working with PDFTextStream.Python is as straightforward as working with PDFTextStream for Java or PDFTextStream.NET:

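```python
from jpype import java, startJVM, JPackage, getDefaultJVMPath

# Start an embedded JVM with PDFTextStream on its classpath (once per process)
startJVM(getDefaultJVMPath(), '-Xmx512m',
         '-Djava.class.path=/path/to/PDFTextStream.jar')

pdftsPackage = JPackage('com.snowtide.pdf')

def extractPDFText(pdfFilePath):
    # Accumulate the extracted text in a Java StringBuffer
    sb = java.lang.StringBuffer(1024)
    tgt = pdftsPackage.OutputTarget(sb)
    pdfts = pdftsPackage.PDFTextStream(java.io.File(pdfFilePath))
    try:
        # Pipe all of the document's text into the output target
        pdfts.pipe(tgt)
    finally:
        # Release the file handle even if extraction fails
        pdfts.close()
    return sb.toString()
```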
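A common next step is running extraction over many files. The sketch below is a hypothetical helper (extract_directory is not part of PDFTextStream); it assumes an extraction function such as extractPDFText from the example above, which takes a file path and returns the document's text, and it is passed in as a parameter so the helper itself does not depend on JPype.

```python
import os


def extract_directory(directory, extract):
    """Apply `extract` to every .pdf file in `directory`.

    Hypothetical helper: returns a dict mapping each PDF file name to
    its extracted text, or to None if extraction raised an exception.
    """
    results = {}
    for name in sorted(os.listdir(directory)):
        if not name.lower().endswith(".pdf"):
            continue  # skip files that are not PDFs
        path = os.path.join(directory, name)
        try:
            results[name] = extract(path)
        except Exception:
            # Record the failure and keep processing the remaining files
            results[name] = None
    return results
```

Usage would look like texts = extract_directory('/path/to/pdfs', extractPDFText), called after the JVM has been started.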

A more comprehensive example, with step-by-step explanations, can be found in the PDFTextStream.Python section of the PDFTextStream Developer's Guide.
