PDF to Text Node

The PDF to Text node extracts all text contained within a PDF document. PAS inserts the raw text into the message payload as a .txt file. The text is exported with no formatting other than return characters, which mark the move to the next line.

Note: Text can only be extracted from documents whose text is stored as font characters. If the PDF contains scanned images of documents, or text created using path content, you will first need to use the OCR Node before the text can be extracted.

General Settings

Pages

All Pages: all pages in the document

Page Range: set a custom range of pages using values separated by commas. For example, to convert only pages 2 to 4, 6 to 12, and 20 of a 30-page document, enter “2-4, 6-12, 20”. Documents using Page Labels require the exact page label to be entered (e.g. iv, v).

First Page: the first page of the document

Last Page: the last page of the document
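
The Page Range syntax described above can be illustrated with a small sketch. This is not PAS code: the function name parse_page_range and its validation rules are assumptions made for illustration, and it handles numeric page numbers only, not Page Labels such as iv or v.

```python
def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a range string such as "2-4, 6-12, 20" into 1-based page numbers.

    Hypothetical helper for illustration; PAS may validate input differently.
    """
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # A span such as "6-12" covers both end points.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            # A single page such as "20".
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.extend(range(start, end + 1))
    return pages


# The example from the Page Range setting, for a 30-page document.
print(parse_page_range("2-4, 6-12, 20", 30))
# prints [2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 20]
```

Under these assumptions, First Page would correspond to the single page 1 and Last Page to the single page equal to the document's page count.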

Subset: the subset of the currently set page range to convert to text. Choose from the options below:

All pages in range: all the pages in the set range
Even pages only: only the even pages in the set range (e.g. 2, 4, 6)
Odd pages only: only the odd pages in the set range (e.g. 1, 3, 5)
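
As a rough sketch of how the Subset options narrow a page range, the snippet below filters a list of page numbers. The function apply_subset and the option names "all", "even", and "odd" are hypothetical, and it assumes that even and odd refer to the page number itself rather than to the position within the range, which matches the examples above but is not confirmed for PAS.

```python
def apply_subset(pages: list[int], subset: str = "all") -> list[int]:
    """Filter 1-based page numbers by a Subset option.

    Hypothetical helper; assumes even/odd is decided by the page number.
    """
    if subset == "all":
        return list(pages)
    if subset == "even":
        return [p for p in pages if p % 2 == 0]
    if subset == "odd":
        return [p for p in pages if p % 2 == 1]
    raise ValueError(f"unknown subset: {subset!r}")


# Pages selected by the range "2-4, 6-12, 20".
selected = [2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 20]
print(apply_subset(selected, "even"))  # prints [2, 4, 6, 8, 10, 12, 20]
print(apply_subset(selected, "odd"))   # prints [3, 7, 9, 11]
```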

Qoppa Software's PDF Automation Server for Windows, Linux, Unix, and macOS

Automate PDF Document Workflows through RESTful Web Services & Folder Watching