com.qoppa.pdfText
Class PDFText

java.lang.Object
  extended by com.qoppa.pdfText.PDFText

public class PDFText
extends Object

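PDFText extracts text from a PDF document. It returns the text as a String, for the whole document or for a single page, and the words as a Vector of Strings. It also returns the DocumentInfo for the document, the page count, and position information for the words and lines of text on a page.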

Author:
Qoppa Software

Constructor Summary
PDFText(InputStream inStream, IPassword password)
          Creates a PDFText object from a PDF InputStream.
PDFText(String fileName, IPassword password)
          Loads a PDFText object from a file.
PDFText(URL url, IPassword password)
          Loads a PDFText object from a URL.
 
Method Summary
 DocumentInfo getDocumentInfo()
          Returns a DocumentInfo object containing the information section of a PDF document (author, title, etc.)
static DocumentInfo getDocumentInfo(InputStream inStream, IPassword password)
          Returns a DocumentInfo object containing the information section of a PDF document (author, title, etc.)
 String getFileName()
          Returns the name of the PDF document.
 Vector getLinesWithPositions(int pageIndex)
          Returns position information for all the lines of text in the specified page of the PDF document.
 int getPageCount()
          Returns the number of pages in the PDF document.
 String getText()
          Returns the text in the PDF document as a String.
 String getText(int pageIndex)
          Returns the text contained in the specified page of the PDF document as a String.
static String getVersion()
          Returns the version string for jPDFText.
 Vector getWords()
          Returns all words in the PDF document as a Vector of Strings.
 Vector getWords(int pageIndex)
          Returns all words contained in the specified page of the PDF document as a Vector of Strings.
 Vector getWordsWithPositions(int pageIndex)
          Returns position information for all the words in the specified page of the PDF document.
static boolean setAppletKey(String key, Applet applet)
          Unlocks the production version of the library.
static boolean setKey(String key)
          Unlocks the production version of the library.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PDFText

public PDFText(InputStream inStream,
               IPassword password)
        throws PDFException
Creates a PDFText object from a PDF InputStream.

Parameters:
inStream - InputStream to read the PDF document from.
password - An object that provides the passwords needed to open the document. Pass null for documents that have no password. For documents known to have passwords, pass an instance of the PDFPassword class, which can hold a single password or a list of passwords.

PDFText

public PDFText(String fileName,
               IPassword password)
        throws PDFException
Loads a PDFText object from a file.

Parameters:
fileName - Name of the PDF file.
password - An object that provides the passwords needed to open the document. Pass null for documents that have no password. For documents known to have passwords, pass an instance of the PDFPassword class, which can hold a single password or a list of passwords.

PDFText

public PDFText(URL url,
               IPassword password)
        throws PDFException
Loads a PDFText object from a URL.

Parameters:
url - URL pointing to the location of the PDF file.
password - An object that provides the passwords needed to open the document. Pass null for documents that have no password. For documents known to have passwords, pass an instance of the PDFPassword class, which can hold a single password or a list of passwords.

Method Detail

getDocumentInfo

public DocumentInfo getDocumentInfo()
Returns a DocumentInfo object containing the information section of a PDF document (author, title, etc.)


getDocumentInfo

public static DocumentInfo getDocumentInfo(InputStream inStream,
                                           IPassword password)
                                    throws PDFException
Returns a DocumentInfo object containing the information section of a PDF document (author, title, etc.)

Parameters:
inStream - InputStream to read the PDF document from.
password - An object that provides the passwords needed to open the document. Pass null for documents that have no password. For documents known to have passwords, pass an instance of the PDFPassword class, which can hold a single password or a list of passwords.
Throws:
PDFException

getFileName

public String getFileName()
Returns the name of the PDF document.


getPageCount

public int getPageCount()
Returns the number of pages in the PDF document.


getText

public String getText()
               throws PDFException
Returns the text in the PDF document as a String. Pages are separated by a return character.

Returns:
Text contained in the PDF document as a String.
Throws:
PDFException

getText

public String getText(int pageIndex)
               throws PDFException
Returns the text contained in the specified page of the PDF document as a String.

Parameters:
pageIndex - The 0-based page index; pageIndex = 0 is the first page of the document.
Returns:
Text contained in the specified page as a String.
Throws:
PDFException
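
Because pageIndex is 0-based, a loop over all pages runs from 0 to getPageCount() - 1. The following is a minimal sketch of that loop, not the library's own code. So that it compiles without the jPDFText jar, it uses a hypothetical stand-in interface, PageSource, that mirrors only getPageCount() and getText(int). With the real library the same loop would call those methods on a PDFText instance and would also need to handle PDFException, which the stand-in omits.

```java
import java.util.ArrayList;
import java.util.List;

final class PageTextSketch {

    /**
     * Hypothetical stand-in for the two PDFText methods used below.
     * Assumption: the real PDFText.getText(int) declares PDFException,
     * which this simplified interface leaves out.
     */
    public interface PageSource {
        int getPageCount();
        String getText(int pageIndex);
    }

    /** Collects the text of every page, visiting pages by 0-based index. */
    static List<String> textByPage(PageSource doc) {
        List<String> pages = new ArrayList<>();
        int pageCount = doc.getPageCount();
        for (int pageIndex = 0; pageIndex < pageCount; pageIndex++) {
            pages.add(doc.getText(pageIndex));
        }
        return pages;
    }
}
```

Collecting the pages separately keeps the page boundaries explicit, which is harder to recover from the single String returned by getText().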

setAppletKey

public static boolean setAppletKey(String key,
                                   Applet applet)
Unlocks the production version of the library.

Parameters:
key - Production key.

setKey

public static boolean setKey(String key)
Unlocks the production version of the library.

Parameters:
key - Production key.

getVersion

public static String getVersion()
Returns the version string for jPDFText. In the demo version, the returned string ends with 'Demo Version'.
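
A host application could use that marker to detect the demo version at startup. The helper below is an illustrative sketch only: the class and method names are hypothetical, and it relies solely on the statement above that the demo version string has 'Demo Version' at the end. With the real library the argument would be the result of PDFText.getVersion().

```java
final class VersionSketch {

    /** Marker that, per the documentation, ends the demo version string. */
    static final String DEMO_SUFFIX = "Demo Version";

    /** Returns true when the given version string ends with the demo marker. */
    static boolean isDemoVersion(String version) {
        // trim() guards against trailing whitespace; null is treated as "not demo".
        return version != null && version.trim().endsWith(DEMO_SUFFIX);
    }
}
```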


getWords

public Vector getWords()
                throws PDFException
Returns all words in the PDF document as a Vector of Strings.

Returns:
Words contained in the PDF document as a Vector of Strings.
Throws:
PDFException

getWords

public Vector getWords(int pageIndex)
                throws PDFException
Returns all words contained in the specified page of the PDF document as a Vector of Strings.

Parameters:
pageIndex - The 0-based page index; pageIndex = 0 is the first page of the document.
Returns:
Words contained in the specified page as a Vector of Strings.
Throws:
PDFException
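
Both getWords methods return a raw Vector, so callers working with generics usually copy the elements into a typed collection first. The sketch below shows one way to do that and then count word occurrences. It is an illustration under stated assumptions, not part of the library: the class and method names are hypothetical, and the Vector is expected to come from getWords() or getWords(int) on a PDFText instance.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Vector;

final class WordVectorSketch {

    /** Copies a raw Vector into a typed list, skipping any non-String element. */
    static List<String> toStringList(Vector<?> rawWords) {
        List<String> words = new ArrayList<>();
        for (Object element : rawWords) {
            if (element instanceof String) {
                words.add((String) element);
            }
        }
        return words;
    }

    /** Counts occurrences case-insensitively, keeping first-seen order. */
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.merge(word.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }
}
```

The instanceof check is defensive: the documentation says the Vector holds Strings, but a raw Vector gives no compile-time guarantee.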

getLinesWithPositions

public Vector getLinesWithPositions(int pageIndex)
                             throws PDFException
Returns position information for all the lines of text in the specified page of the PDF document.

Parameters:
pageIndex - The 0-based page index; pageIndex = 0 is the first page of the document.
Returns:
A Vector of TextPosition objects.
Throws:
PDFException

getWordsWithPositions

public Vector getWordsWithPositions(int pageIndex)
                             throws PDFException
Returns position information for all the words in the specified page of the PDF document.

Parameters:
pageIndex - The 0-based page index; pageIndex = 0 is the first page of the document.
Returns:
A Vector of TextPosition objects.
Throws:
PDFException
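
All of the page-level methods above take a 0-based pageIndex, while page numbers shown to users normally start at 1. This page does not say how the library treats an out-of-range index, so a caller may prefer to validate the value before the call. The helper below is a sketch of that conversion. Its names are hypothetical, and pageCount is assumed to come from getPageCount().

```java
final class PageIndexSketch {

    /**
     * Converts a 1-based page number to a 0-based pageIndex.
     * Rejects values outside 1..pageCount instead of passing them on.
     */
    static int toPageIndex(int pageNumber, int pageCount) {
        if (pageNumber < 1 || pageNumber > pageCount) {
            throw new IllegalArgumentException(
                "Page " + pageNumber + " is outside 1.." + pageCount);
        }
        return pageNumber - 1;
    }
}
```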