com.qoppa.ocr
Class TessJNI

java.lang.Object
  extended by com.qoppa.ocr.TessJNI

public class TessJNI
extends Object

This class provides a native interface to the Tesseract OCR engine.

Author:
Qoppa Software

Constructor Summary
TessJNI()
          Creates a new TessJNI instance.
 
Method Summary
 String performOCR(String language, BufferedImage image)
          Performs OCR on an image and returns an hOCR result string.
 String performOCR(String language, PDFPage pdfPage, int dpi)
          Performs OCR on a PDF page and returns an hOCR result string.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TessJNI

public TessJNI()
Creates a new TessJNI instance.

Method Detail

performOCR

public String performOCR(String language,
                         BufferedImage image)
                  throws OCRException
Performs OCR on an image and returns an hOCR result string. This method calls the Tesseract OCR engine to recognize the characters in the image. The results are returned in hOCR format, a standard format for OCR results that includes the recognized text together with its location and size.

Parameters:
language - The language to use when performing the OCR.
image - The image to process.
Returns:
The OCR results, in hOCR format.
Throws:
OCRException - if an error occurs while performing the OCR.
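
Example (illustrative sketch, not part of the library): because the returned string is hOCR, a caller typically needs to pull the recognized words and their bounding boxes out of it. The class below shows one way to do that with a regular expression. HocrWordExample, Word and parseWords are hypothetical names introduced only for this example, and the sample markup is hand-written in the style Tesseract commonly emits (ocrx_word spans whose title attribute starts with "bbox x0 y0 x1 y1"). It assumes the class attribute precedes the title attribute and does not decode HTML entities, so a real HTML parser is the safer choice for production use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HocrWordExample {

    /** One recognized word with its bounding box in image pixels (hypothetical helper type). */
    static final class Word {
        final String text;
        final int x0, y0, x1, y1;

        Word(String text, int x0, int y0, int x1, int y1) {
            this.text = text;
            this.x0 = x0;
            this.y0 = y0;
            this.x1 = x1;
            this.y1 = y1;
        }
    }

    // Matches <span class='ocrx_word' ... title='bbox x0 y0 x1 y1; ...'>text</span>.
    // Assumption: the class attribute appears before the title attribute.
    private static final Pattern WORD = Pattern.compile(
            "<span[^>]*class=['\"]ocrx_word['\"][^>]*title=['\"]bbox (\\d+) (\\d+) (\\d+) (\\d+)"
                    + "[^'\"]*['\"][^>]*>(.*?)</span>",
            Pattern.DOTALL);

    /** Extracts every ocrx_word element from an hOCR string. */
    static List<Word> parseWords(String hocr) {
        List<Word> words = new ArrayList<>();
        Matcher m = WORD.matcher(hocr);
        while (m.find()) {
            // Drop inline markup such as <strong> or <em> inside the word.
            String text = m.group(5).replaceAll("<[^>]+>", "").trim();
            words.add(new Word(text,
                    Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4))));
        }
        return words;
    }

    public static void main(String[] args) {
        // Hand-written sample; in real use this string would come from performOCR.
        String sample =
                "<span class='ocr_line' title='bbox 36 92 180 116'>"
              + "<span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span> "
              + "<span class='ocrx_word' id='word_1_2' title='bbox 110 92 180 116; x_wconf 93'>world</span>"
              + "</span>";
        for (Word w : parseWords(sample)) {
            System.out.println(w.text + " @ (" + w.x0 + "," + w.y0 + ")-(" + w.x1 + "," + w.y1 + ")");
        }
    }
}
```

The bounding box values are pixel coordinates in the image that was passed to the OCR engine, with the origin at the top-left corner.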

performOCR

public String performOCR(String language,
                         PDFPage pdfPage,
                         int dpi)
                  throws PDFException,
                         OCRException
Performs OCR on a PDF page and returns an hOCR result string. This method renders the PDF page to an image and then calls the Tesseract OCR engine to recognize the characters in that image. The results are returned in hOCR format, a standard format for OCR results that includes the recognized text together with its location and size.

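Parameters:
language - The language to use when performing the OCR.
pdfPage - The PDF page to process.
dpi - The resolution, in dots per inch, at which the PDF page is rendered to an image before OCR.
Returns:
The OCR results, in hOCR format.
Throws:
PDFException - if an error occurs while processing the PDF page.
OCRException - if an error occurs while performing the OCR.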
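
Example (illustrative sketch, not part of the library): since the page is rendered to an image before recognition, the dpi value determines the size of that image, and the hOCR coordinates are expressed in its pixels. The helper below converts between PDF points (1/72 inch) and pixels. DpiScaleExample and its methods are hypothetical names, and the sketch assumes the page is rendered at a uniform scale of dpi / 72 with no crop offset or rotation, which this page does not confirm.

```java
class DpiScaleExample {

    // PDF default user space uses 72 units (points) per inch.
    static final double POINTS_PER_INCH = 72.0;

    /** Size in pixels of a length given in PDF points when rendered at the given dpi. */
    static int pointsToPixels(double points, int dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    /** Length in PDF points of a pixel distance measured in an image rendered at the given dpi. */
    static double pixelsToPoints(int pixels, int dpi) {
        return pixels * POINTS_PER_INCH / dpi;
    }

    public static void main(String[] args) {
        int dpi = 300;
        // A US Letter page is 612 x 792 points.
        System.out.println(pointsToPixels(612, dpi) + " x " + pointsToPixels(792, dpi)); // 2550 x 3300
        // An hOCR x coordinate of 300 px at 300 dpi is one inch from the left edge.
        System.out.println(pixelsToPoints(300, dpi)); // 72.0
    }
}
```

Higher dpi values give the OCR engine more detail at the cost of a larger image. When mapping hOCR boxes back onto the page, note that hOCR measures y downward from the top of the image, whereas PDF default user space measures y upward from the bottom of the page.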