April 29, 2013

Extracting & Analyzing PDF Form Data

Filed under: Uncategorized — frameworker @ 9:28 pm

PDF Form Export is an OS X app that extracts field names and values from PDF forms and puts them into a text file.

To use the app, drop a PDF form onto it, or choose “Open…” from within the app. A PDZ file is created in the directory containing the form. The PDZ file contains one line per field, with the field name and value separated by a tab, like this:

  • NAME  \t  Some Guy
  • ADDRESS  \t  1234 Somewhere St.
  • etc.
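
As a rough illustration of how a script might read such a file, here is a minimal Python sketch. It assumes each line holds exactly one field name, a single tab, and then the value; the function name parse_pdz is hypothetical, and details such as text encoding or repeated field names are not covered by the description above.

```python
def parse_pdz(text):
    """Parse 'name<TAB>value' lines into a dict (assumed PDZ layout)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only, so a value may itself contain tabs.
        name, _sep, value = line.partition("\t")
        fields[name.strip()] = value.strip()
    return fields


# Sample data mirroring the example lines above.
sample = "NAME\tSome Guy\nADDRESS\t1234 Somewhere St.\n"
print(parse_pdz(sample))
```

A dict keeps only the last value if a field name repeats; a list of (name, value) pairs would be the safer choice if a form can contain duplicate names.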

You can then paste the exported data into a spreadsheet or manipulate it with a script. My Benford’s Analysis program, which I wrote as a component of a tax-return-audit-risk-identification suite, is an example of doing this.
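
To give a sense of what a Benford-style script over the exported values could look like, here is a small Python sketch. It is not the Benford’s Analysis program mentioned above, just an illustration: it compares the observed share of each leading digit with the share Benford’s law predicts, log10(1 + 1/d). The function names and the sample values are made up for the example, and stripping commas and dollar signs is an assumption about how amounts might be written on a form.

```python
import math
from collections import Counter
from decimal import Decimal, InvalidOperation


def leading_digit(value):
    """Return the first significant digit of a numeric string, else None."""
    # Assumption: amounts may carry thousands separators or a dollar sign.
    cleaned = value.replace(",", "").replace("$", "").strip()
    try:
        digits = Decimal(cleaned).as_tuple().digits
    except InvalidOperation:
        return None  # not a number (e.g. a name or street address)
    # Zero, NaN and infinity have no nonzero digit and yield None.
    return next((d for d in digits if d != 0), None)


def benford_table(values):
    """Return (digit, observed share, expected share) for digits 1 to 9."""
    counts = Counter(d for d in map(leading_digit, values) if d is not None)
    total = sum(counts.values())
    table = []
    for d in range(1, 10):
        observed = counts[d] / total if total else 0.0
        expected = math.log10(1 + 1 / d)  # Benford's law
        table.append((d, observed, expected))
    return table


# Made-up field values; non-numeric ones are ignored.
values = ["1,250.00", "$310", "1234 Somewhere St.", "0.0042", "19"]
for digit, observed, expected in benford_table(values):
    print(f"{digit}  observed={observed:.3f}  expected={expected:.3f}")
```

With only a handful of values the comparison means very little; a real analysis would need many numeric fields and some statistical test of the difference.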

When businesses provide forms for customers to fill out and return, this utility simplifies data extraction, avoiding tedious and error-prone recopying.

The program is written in Objective-C with Cocoa and PDF Kit. PDF Kit makes it straightforward to iterate through a PDF’s form fields, extracting their names and values.


