frameworker

June 29, 2009

Inside PDQForms

Filed under: Uncategorized — frameworker @ 9:31 pm

Overview

PDQForms is tax preparation software for State and Federal forms. It integrates spreadsheet logic into PDF forms, using PDFKit to combine form field information with labels and expressions. The result is a spreadsheet-like application layer over the form.

Creating a PDQ Form

The idea was to automate everything that could be.

The first step is to extract information for all the form’s fields, or “Annotations” in PDF lingo.

We need the type of Annotation: TEXT or CHECKBOX, and its PAGE, RECT and MAXLEN values.

This is accomplished with a PDFKit-based program called PDQ Annotation Editor, which outputs a text file.

Next we must define expressions that contain the logic for each of the form’s fields. This is just like programming a spreadsheet.

Adding this information to the Annotation meta-data completes our task.

We now have a text file containing the information PDQForms will need to automate the form.
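Here is a plain C sketch of one record that holds the per-field data listed above (type, PAGE, RECT, MAXLEN) plus the field's expression, written out as a single line of text. The post does not describe the real file format. The struct, the field name member, the tab-separated column order, and the example field names (TOTAL, LINE7, LINE8) are all hypothetical.

```c
#include <stdio.h>

/* Hypothetical record for one form field ("Annotation"). */
typedef enum { ANNOT_TEXT, ANNOT_CHECKBOX } AnnotationType;

typedef struct {
    const char *name;        /* field name, e.g. "TOTAL" (assumed) */
    AnnotationType type;     /* TEXT or CHECKBOX */
    int page;                /* PAGE: zero-based page index (assumed) */
    double x, y, w, h;       /* RECT: origin and size in PDF points (assumed) */
    int maxLen;              /* MAXLEN: 0 when unlimited (assumed) */
    const char *expression;  /* e.g. "=LINE7+LINE8", "" for input fields */
} Annotation;

/* Writes one annotation as a tab-separated line into buf.
   The column order is illustrative, not the real PDQ file format.
   Returns the number of characters snprintf would have written. */
int formatAnnotation(const Annotation *a, char *buf, size_t size)
{
    return snprintf(buf, size, "%s\t%s\t%d\t%g %g %g %g\t%d\t%s\n",
                    a->name,
                    a->type == ANNOT_TEXT ? "TEXT" : "CHECKBOX",
                    a->page, a->x, a->y, a->w, a->h,
                    a->maxLen, a->expression);
}
```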

We then “bake” this back into the original PDF file.

And the form can now be opened with PDQForms!


June 23, 2009

TABLE LOOKUPS IN PDQFORMS

Filed under: Uncategorized — frameworker @ 5:26 am

Since some tax forms use tables as well as formulae, I had to implement a tax table lookup algorithm. Things were complicated by the fact that these tables were not always available in sorted format.

I first tried using the unsorted tax tables. This required doing a comparison of each table entry until the proper tax bracket was found. But tax lookup tends to occur whenever you change any numeric field on the form, since this usually affects taxable income. So, doing a comparison for each entry, while easy, was way too slow! Suddenly I was seeing a totally unacceptable delay.

The solution was to pre-sort the tables, shifting the performance burden completely out of the user’s workflow, and then to use an efficient recursive binary search algorithm to do the table lookup in PDQForms.

PREPARING A TABLE

Copy the table from the tax handbook into a text file and clean out any patches of non-table data. Fortunately, the layout of the non-table data makes this straightforward, and each line of table data holds a whole number of tuples.

Finally, filter out commas and prepend TUPLESIZE to the table.

The table we’re creating will have the same name as the form it applies to, but with a suffix that’s specified in the call to createTupleTable. Here it’s “tax”.


    NSString * theTableContents = [self createTupleTable: theTableData named: @"tax"];
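The post does not show createTupleTable itself. The plain C sketch below illustrates only the two clean-up steps described above: stripping commas from the copied table text and prepending TUPLESIZE. The function name prepareTupleTable is hypothetical. Placing the tuple size on its own first line is also an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Removes every comma from src (so "50,000" becomes "50000") and writes
   the result to dst, preceded by a header line holding the tuple size.
   The header-line layout is assumed for illustration.
   Returns 0 on success, -1 if dst is too small. */
int prepareTupleTable(const char *src, int tupleSize, char *dst, size_t size)
{
    int n = snprintf(dst, size, "%d\n", tupleSize);
    if (n < 0 || (size_t)n >= size)
        return -1;

    size_t out = (size_t)n;
    for (const char *p = src; *p != '\0'; p++) {
        if (*p == ',')
            continue;              /* drop thousands separators */
        if (out + 1 >= size)
            return -1;             /* keep room for the terminator */
        dst[out++] = *p;
    }
    dst[out] = '\0';
    return 0;
}
```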

PRE-SORTING THE TABLES

1. Put the elements (NSStrings) of each tuple into an array (NSArray)

2. Put these tuples into an array so they can be sorted.

3. Sort the array of tuples, from smallest to largest, using “sortedArrayUsingFunction:context:” and passing it the comparison function “sortByValue”:


    // The comparison function must be declared before it is used.
    // It orders tuples by their first element (the bottom of the bracket).
    NSInteger sortByValue(id tuple1, id tuple2, void *context)
    {
        double value1 = [[tuple1 objectAtIndex: 0] doubleValue];
        double value2 = [[tuple2 objectAtIndex: 0] doubleValue];

        if      (value1 > value2) return NSOrderedDescending;
        else if (value1 < value2) return NSOrderedAscending;
        else                      return NSOrderedSame;
    }

    NSArray *sortedTuples = [tuples sortedArrayUsingFunction: sortByValue context: NULL];


4. Convert the sorted tuples back into a single NSString.

5. Finally, “bake” the table data into the pdq file using PDFKit (Mac OS X 10.5).
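Steps 1 through 4 can be sketched in plain C, using the standard qsort function in place of NSArray's sortedArrayUsingFunction:context:. This is an illustrative port, not the PDQForms code. Tuples here are arrays of doubles rather than NSArrays of NSStrings. MAX_TUPLE, Tuple and sortTupleTable are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TUPLE 8   /* illustrative upper bound on tuple width */

typedef struct {
    double items[MAX_TUPLE];
} Tuple;

/* Same ordering as sortByValue: ascending by each tuple's first item. */
static int compareTuples(const void *a, const void *b)
{
    double v1 = ((const Tuple *)a)->items[0];
    double v2 = ((const Tuple *)b)->items[0];
    return (v1 > v2) - (v1 < v2);
}

/* Groups count values into tuples of width tupleSize (steps 1-2), sorts
   them (step 3) and writes them back out as one space-separated string
   (step 4). Returns 0 on success, -1 on bad input or a short buffer. */
int sortTupleTable(const double *values, size_t count, int tupleSize,
                   char *out, size_t outSize)
{
    if (tupleSize <= 0 || tupleSize > MAX_TUPLE || outSize == 0 ||
        count == 0 || count % (size_t)tupleSize != 0)
        return -1;

    size_t tupleCount = count / (size_t)tupleSize;
    Tuple *tuples = malloc(tupleCount * sizeof *tuples);
    if (tuples == NULL)
        return -1;

    for (size_t t = 0; t < tupleCount; t++)
        memcpy(tuples[t].items, values + t * (size_t)tupleSize,
               (size_t)tupleSize * sizeof(double));

    qsort(tuples, tupleCount, sizeof *tuples, compareTuples);

    size_t used = 0;
    int status = 0;
    out[0] = '\0';
    for (size_t t = 0; t < tupleCount && status == 0; t++) {
        for (int i = 0; i < tupleSize; i++) {
            int n = snprintf(out + used, outSize - used, "%s%g",
                             used > 0 ? " " : "", tuples[t].items[i]);
            if (n < 0 || (size_t)n >= outSize - used) {
                status = -1;       /* output would not fit */
                break;
            }
            used += (size_t)n;
        }
    }
    free(tuples);
    return status;
}
```

qsort is not stable. This should not matter as long as no two brackets start at the same income.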

DOING TABLE LOOKUPS IN PDQFORMS

This stack trace depicts the tax calculation mechanism:

[PDQAbstractWidget evaluateExpression: theTableLookup]
Table lookups are processed as a special case by evaluateExpression,
e.g. =TAXTABLE("TAXTABLE","6",INCOME,FILING_STATUS).
evaluateExpression calls doTableLookup.

    [PDQAbstractWidget doTableLookup: theTableLookup]
    doTableLookup is analogous to evaluateFunction, but for table lookups.
    doTableLookup packages the call's parameters and calls "execute."
    Note that quoted parameters are passed in as literal strings.

        [NSString+PDQFunctionAdditions execute: parameterArray]
        execute dispatches the call to taxTableLookup.

            [NSString+PDQFunctionAdditions taxtablelookup: parameterArray]
            taxTableLookup unpacks the parameters, finds the taxBracket
            and uses it to determine the tax.

                [NSArray+PDQTableAdditions taxBracket]
                taxBracket is the recursive binary search algorithm,
                initially called from taxTableLookup,
                that does the "heavy lifting."
                taxBracket calls the helper routine taxInTuple.

                    [NSArray+PDQTableAdditions taxInTuple]
                    taxInTuple tests whether taxable income falls
                    within a bracket, to its left, or to its right.
                    If income isn't in the current bracket,
                    taxBracket uses taxInTuple's return value
                    to choose which half of the table to search next.
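The rule that quoted parameters are passed as literal strings is the key parsing step for a call like =TAXTABLE("TAXTABLE","6",INCOME,FILING_STATUS). A minimal plain C sketch of that split follows. It is not the PDQForms parser. parseLookupCall, LookupArg and the size limits are hypothetical, and the sketch assumes quoted arguments contain no embedded quotes. Unquoted arguments such as INCOME are returned as text that still has to be evaluated.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ARGS 8
#define MAX_TOKEN 64

/* One argument of a table-lookup call. Quoted arguments are literals;
   unquoted ones (e.g. INCOME) must still be evaluated as expressions. */
typedef struct {
    char text[MAX_TOKEN];
    bool isLiteral;
} LookupArg;

/* Splits "=NAME(arg,arg,...)" into its name and arguments.
   Returns the argument count, or -1 on malformed input. */
int parseLookupCall(const char *expr, char *name, size_t nameSize,
                    LookupArg args[], int maxArgs)
{
    if (*expr == '=')
        expr++;                                 /* skip the leading "=" */
    const char *open = strchr(expr, '(');
    const char *close = strrchr(expr, ')');
    if (open == NULL || close == NULL || close < open)
        return -1;

    size_t nameLen = (size_t)(open - expr);
    if (nameLen == 0 || nameLen >= nameSize)
        return -1;
    memcpy(name, expr, nameLen);
    name[nameLen] = '\0';

    int count = 0;
    const char *p = open + 1;
    while (p < close) {
        if (count == maxArgs)
            return -1;
        LookupArg *arg = &args[count++];
        size_t len = 0;
        arg->isLiteral = (*p == '"');
        if (arg->isLiteral) {
            p++;                                /* opening quote */
            while (p < close && *p != '"') {
                if (len + 1 >= MAX_TOKEN) return -1;
                arg->text[len++] = *p++;
            }
            if (p == close)
                return -1;                      /* unterminated quote */
            p++;                                /* closing quote */
        } else {
            while (p < close && *p != ',') {
                if (len + 1 >= MAX_TOKEN) return -1;
                arg->text[len++] = *p++;
            }
        }
        arg->text[len] = '\0';
        if (p < close && *p == ',')
            p++;                                /* argument separator */
        else if (p < close)
            return -1;                          /* junk after a quoted arg */
    }
    return count;
}
```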

taxBracket and taxInTuple are shown below.



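    - (int) taxBracket: (double) income
            tupleWidth: (int) tupleSize
          startingWith: (int) firstTuple
         andEndingWith: (int) lastTuple
    {
        // An empty range means income lies outside every bracket.
        // Without this check, an out-of-range income either recurses
        // forever or indexes past the end of the array.
        if (firstTuple > lastTuple)
        {
            return -1;
        }

        int tupleIndex = -1;

        int middleTuple = firstTuple + (lastTuple - firstTuple) / 2;

        int comparison = [self taxInTuple: income
                                  atIndex: middleTuple
                               tupleWidth: tupleSize];

        if (comparison == 0)
        {
            // income falls within the middle bracket
            tupleIndex = middleTuple;
        }
        else if (comparison == 1)
        {
            // income lies above the middle bracket: search the upper half
            tupleIndex = [self taxBracket: income
                               tupleWidth: tupleSize
                             startingWith: middleTuple + 1
                            andEndingWith: lastTuple];
        }
        else if (comparison == -1)
        {
            // income lies below the middle bracket: search the lower half
            tupleIndex = [self taxBracket: income
                               tupleWidth: tupleSize
                             startingWith: firstTuple
                            andEndingWith: middleTuple - 1];
        }

        return tupleIndex;
    }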

    // Tuples are laid out end to end as one long array.
    // The first two items of a tuple are its income bracket;
    // the trailing items are the tax for each filing status
    // in that tuple's income bracket.
    - (int) taxInTuple: (double) income
               atIndex: (int) tupleIndex
            tupleWidth: (int) tupleSize
    {
        int leftIndex  = tupleIndex * tupleSize;
        int rightIndex = leftIndex + 1;

        NSString * leftItem  = [self objectAtIndex: leftIndex];
        NSString * rightItem = [self objectAtIndex: rightIndex];

        double leftValue  = [leftItem doubleValue];
        double rightValue = [rightItem doubleValue];

        // Test whether adjacent intervals overlap:
        // IF YES, use < for the right value.
        // IF NO, use <= for the right value.

        // CA brackets don't overlap: they're [x,y] [y+1,z].
        // So when income is exactly "y" we want to match the ONLY
        // bracket containing "y", not the higher of two brackets!
        // One even, one odd means the intervals shouldn't "overlap."
        if ((int)leftValue % 2 != (int)rightValue % 2)
        {
            if ((income >= leftValue) && (income <= rightValue))
            {
                return 0;
            }
        }
        else
        {
            // IRS brackets are [x,y] [y,z], so when income is exactly "y"
            // we want to match the higher of the two brackets!
            if ((income >= leftValue) && (income < rightValue))
            {
                return 0;
            }
        }

        if (income < leftValue)
        {
            return -1;
        }
        else
        {
            return 1;
        }
    }
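The post does not list taxTableLookup. As a self-contained illustration of the whole lookup, here is a plain C sketch. It ports taxBracket and taxInTuple, keeping the even/odd overlap test, and then reads the tax for one filing status. Several choices are illustrative assumptions:

- the table is a flat array of doubles rather than an NSArray of strings;
- filing status selects a 0-based column after the two bracket bounds;
- -1.0 signals an income outside the table.

The recursion stops when the search range is empty.

```c
/* Table layout, as in taxInTuple: tuples stored end to end, each one
   [bracket low, bracket high, tax for status 0, tax for status 1, ...]. */

/* 0 if income is inside the tuple's bracket, -1 if below it, 1 if above.
   Same even/odd rule as taxInTuple: differing parity means closed,
   non-overlapping [x,y] brackets; equal parity means [x,y) brackets. */
static int taxInTuple(const double *table, int tupleIndex, int tupleSize,
                      double income)
{
    double left = table[tupleIndex * tupleSize];
    double right = table[tupleIndex * tupleSize + 1];
    int inside;

    if ((int)left % 2 != (int)right % 2)
        inside = income >= left && income <= right;
    else
        inside = income >= left && income < right;

    if (inside)
        return 0;
    return income < left ? -1 : 1;
}

/* Recursive binary search over tuples first..last; -1 when not found. */
static int taxBracket(const double *table, int tupleSize, double income,
                      int first, int last)
{
    if (first > last)
        return -1;                    /* empty range: not in the table */
    int middle = first + (last - first) / 2;
    int cmp = taxInTuple(table, middle, tupleSize, income);
    if (cmp == 0)
        return middle;
    if (cmp > 0)
        return taxBracket(table, tupleSize, income, middle + 1, last);
    return taxBracket(table, tupleSize, income, first, middle - 1);
}

/* Finds the bracket, then returns the tax in the column for filingStatus
   (0-based, an assumption). Returns -1.0 if income is outside the table
   or the status has no column. */
double taxTableLookup(const double *table, int tupleCount, int tupleSize,
                      double income, int filingStatus)
{
    if (filingStatus < 0 || 2 + filingStatus >= tupleSize)
        return -1.0;
    int bracket = taxBracket(table, tupleSize, income, 0, tupleCount - 1);
    if (bracket < 0)
        return -1.0;
    return table[bracket * tupleSize + 2 + filingStatus];
}
```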

HOW I TEST FORMS

Filed under: Uncategorized — frameworker @ 4:58 am

OVERVIEW

The PDQForms Debug target contains a Debug menu with commands that are enabled if a PDQDocument is open.

  Read test file…
  Save test file…

The “Save test file…” command journals edited forms into test files. The “Read test file…” command causes the test file to be read back into the current form, and verifies that values of the calculated fields are correct. This allows for rapid regression testing of forms after making changes to the code base.

I TEST FILE

Test files have the same names as the forms they “exercise,” but with the suffix “test”.

Test files are kept in the same folder as their corresponding “pdq” file.

The test file consists of FIELD_NAME, STRING_VALUE pairs, one pair per line.

II TEST COMMAND

Saving the test file writes out the form’s FIELD_NAME, STRING_VALUE data.

Reading the file back into the current form populates each FIELD_NAME with its STRING_VALUE and causes that field’s dependents to update.

But if a STRING_VALUE read from the test file begins with an “=”, then it is an expected result and it will be compared to the calculated value of the form’s field, not stuffed into the form.
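Reading a test file comes down to two things: splitting each line into its name and value, and noticing a leading “=”. Below is a minimal plain C sketch. The post does not specify the separator, so splitting at the first comma (and skipping spaces after it) is an assumption. So is removing the “=” before comparison. TestEntry and parseTestLine are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

/* One parsed line of a test file. */
typedef struct {
    char field[64];
    char value[128];
    bool isExpected;   /* value began with "=": compare, don't stuff */
} TestEntry;

/* Parses "FIELD_NAME,STRING_VALUE", splitting at the first comma
   (an assumed separator). Returns false if there is no comma or a
   part is too long. */
bool parseTestLine(const char *line, TestEntry *entry)
{
    const char *comma = strchr(line, ',');
    if (comma == NULL)
        return false;

    size_t fieldLen = (size_t)(comma - line);
    if (fieldLen == 0 || fieldLen >= sizeof entry->field)
        return false;
    memcpy(entry->field, line, fieldLen);
    entry->field[fieldLen] = '\0';

    const char *value = comma + 1;
    while (*value == ' ')
        value++;
    size_t valueLen = strcspn(value, "\r\n");  /* drop the line ending */
    entry->isExpected = (value[0] == '=');
    if (entry->isExpected) {
        value++;                               /* keep only the expected text */
        valueLen--;
    }
    if (valueLen >= sizeof entry->value)
        return false;
    memcpy(entry->value, value, valueLen);
    entry->value[valueLen] = '\0';
    return true;
}
```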

III TEST RESULTS

The test command logs discrepancies in calculated values:

“Unexpected value for calculated widget: ‘WidgetID’ shown: ‘itsValue’ expected: ‘itsExpectedValue’”

If there are no discrepancies, the test command logs the message:

“All calculated widgets have expected values :-)”
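The comparison and logging step might look like this in plain C. The structs are hypothetical stand-ins for PDQForms widgets and for parsed “=” lines. Values are compared as exact strings, which is an assumption. The two messages follow the wording quoted above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a calculated widget and for one "=" line
   from the test file (with the "=" already removed). */
typedef struct {
    const char *widgetID;
    const char *shownValue;
} CalculatedField;

typedef struct {
    const char *widgetID;
    const char *expectedValue;
} Expectation;

/* Compares each expectation with the matching field's shown value and
   logs every mismatch, or the success message when there are none.
   Expectations naming an unknown field are skipped.
   Returns the number of discrepancies. */
int verifyExpectedValues(const CalculatedField *fields, int fieldCount,
                         const Expectation *expected, int expectedCount)
{
    int discrepancies = 0;
    for (int e = 0; e < expectedCount; e++) {
        for (int f = 0; f < fieldCount; f++) {
            if (strcmp(fields[f].widgetID, expected[e].widgetID) != 0)
                continue;
            if (strcmp(fields[f].shownValue, expected[e].expectedValue) != 0) {
                printf("Unexpected value for calculated widget: '%s' "
                       "shown: '%s' expected: '%s'\n",
                       fields[f].widgetID, fields[f].shownValue,
                       expected[e].expectedValue);
                discrepancies++;
            }
            break;     /* widget IDs are assumed unique */
        }
    }
    if (discrepancies == 0)
        printf("All calculated widgets have expected values :-)\n");
    return discrepancies;
}
```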

IV SUMMARY

Since the accuracy of calculations in forms is paramount, this simple but powerful approach solves an important problem.

EXPRESSIONS IN PDQFORMS

Filed under: Uncategorized — frameworker @ 4:50 am

I realize this document is pretty dry, but I needed to document this implementation to facilitate discussion with others in the quest to improve it. Without going into why I took the approach I did, I will say that I did become very proficient in using strings with Objective-C 🙂

I OVERVIEW OF EXPRESSION SYNTAX

Expressions may be FORMULAE (including BOOLEAN FORMULAE), FUNCTIONS, TABLE LOOKUPS or CONDITIONAL EXPRESSIONS.

Expressions are written in infix notation, just as you’d expect. They consist of OPERATORS, OPERANDS, SEPARATORS and FUNCTION NAMES.

A. OPERATORS include BOOLEAN OPERATORS

== <= or == or => != or && || !

and ARITHMETIC OPERATORS

+ - * % (modulus) ^ (exponentiation)

B. OPERANDS may be cell references or numbers. Numbers are evaluated as double-precision floating point.

C. SEPARATORS include ( ) , ;

The “=” character is prepended to all expressions, just like in VisiCalc.

D. All other tokens, those which are not OPERATORS, OPERANDS or SEPARATORS, are FUNCTION NAMES.

FUNCTION NAMES that end with “TABLE” are a special case. They perform Table Lookups for forms that use Tax Tables.
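The “ends with TABLE” rule is a simple suffix test. Here is a plain C sketch, assuming names are matched case-sensitively. isTableLookupName is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* True if a function name ends with "TABLE", i.e. it should go to the
   table-lookup path instead of the ordinary function dispatcher. */
bool isTableLookupName(const char *name)
{
    static const char suffix[] = "TABLE";
    size_t nameLen = strlen(name);
    size_t suffixLen = sizeof suffix - 1;
    return nameLen >= suffixLen &&
           strcmp(name + nameLen - suffixLen, suffix) == 0;
}
```

Under this rule, TAXTABLE is routed to the table-lookup path. Any name without the suffix goes to ordinary function dispatch.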

II FORMULAE

Operands, within formulae, may be cell references or numbers. Numbers are evaluated as double-precision floating point.

III BOOLEAN FORMULAE

Boolean Formulae, expressions that resolve to YES (1) or NO (0), are used in the scriptIf component of Conditional Expressions. They aren’t used elsewhere at this time, but they could be. Any positive number could be interpreted as YES, but we currently require “1”.

IV FUNCTIONS

Functions are snippets of code that get dispatched interpretively.

Function arguments may be formulae or can, themselves, be functions.

Quotes are used to transmit function (and table lookup) arguments as literals.
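The dispatch mechanism itself is not described in this post. One common way to dispatch functions by name is a table of name and function-pointer pairs, sketched here in plain C. SUM and MAX are hypothetical example functions, not documented PDQForms functions. In Objective-C a similar effect is often achieved with NSSelectorFromString, but the post does not say which approach PDQForms uses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature: evaluated numeric arguments in, one double out. */
typedef double (*PDQFunction)(const double *args, int argCount);

static double fnSum(const double *args, int argCount)
{
    double total = 0.0;
    for (int i = 0; i < argCount; i++)
        total += args[i];
    return total;
}

static double fnMax(const double *args, int argCount)
{
    if (argCount == 0)
        return 0.0;
    double best = args[0];
    for (int i = 1; i < argCount; i++)
        if (args[i] > best)
            best = args[i];
    return best;
}

/* Name-to-code table: adding a function means adding one row. */
static const struct {
    const char *name;
    PDQFunction fn;
} functionTable[] = {
    { "SUM", fnSum },
    { "MAX", fnMax },
};

/* Looks up name and calls it; *found reports whether the name exists. */
double dispatchFunction(const char *name, const double *args, int argCount,
                        int *found)
{
    size_t n = sizeof functionTable / sizeof functionTable[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(functionTable[i].name, name) == 0) {
            *found = 1;
            return functionTable[i].fn(args, argCount);
        }
    }
    *found = 0;
    return 0.0;
}
```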

*Describe how a function gets added to PDQForms and the function dispatch mechanism.*
*Show the recursive function parsing routine.*

V TABLE LOOKUPS

Table Lookups are used to find income tax, for example, for Federal filers whose taxable income is less than $100,000.

See the blog post on TABLE LOOKUPS IN PDQFORMS for a more detailed description of how they’re implemented.

VI CONDITIONAL EXPRESSIONS

Conditional Expressions are of the form =IF(scriptIf;scriptThen;scriptElse)

(“IF” is a reserved word)

ScriptIf, a Boolean Formula, resolves to a value of zero or one. If scriptIf resolves to 1, scriptThen is executed; otherwise scriptElse is executed. ScriptThen and scriptElse can be Table Lookups, Functions or Formulae.

Conditional Expressions may not be “nested.”
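The three parts of a conditional are separated by semicolons, and nesting is disallowed. That makes pulling them apart straightforward. Here is a plain C sketch under those rules. splitConditional is a hypothetical name, and the sketch assumes semicolons never appear inside quoted arguments. Choosing which branch to evaluate is left to the caller.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if "IF(" occurs as a whole word, not as the tail of a longer
   name such as a hypothetical function TAXIF. */
static bool containsIfCall(const char *s)
{
    for (const char *p = strstr(s, "IF("); p != NULL; p = strstr(p + 1, "IF(")) {
        if (p == s || !(isalnum((unsigned char)p[-1]) || p[-1] == '_'))
            return true;
    }
    return false;
}

/* Copies n characters of src into dst and terminates it. */
static bool copyPart(char *dst, size_t dstSize, const char *src, size_t n)
{
    if (n >= dstSize)
        return false;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return true;
}

/* Splits "=IF(scriptIf;scriptThen;scriptElse)" into its three scripts.
   Commas inside function or table-lookup arguments are left alone.
   A nested IF is rejected. Returns false on malformed input. */
bool splitConditional(const char *expr, char *ifPart, char *thenPart,
                      char *elsePart, size_t partSize)
{
    if (strncmp(expr, "=IF(", 4) != 0)
        return false;
    const char *body = expr + 4;
    if (containsIfCall(body))
        return false;                          /* nested conditional */

    const char *close = strrchr(body, ')');
    const char *semi1 = strchr(body, ';');
    const char *semi2 = semi1 != NULL ? strchr(semi1 + 1, ';') : NULL;
    if (close == NULL || close[1] != '\0' || semi2 == NULL || semi2 > close ||
        strchr(semi2 + 1, ';') != NULL)
        return false;                          /* not exactly three parts */

    return copyPart(ifPart, partSize, body, (size_t)(semi1 - body)) &&
           copyPart(thenPart, partSize, semi1 + 1, (size_t)(semi2 - semi1 - 1)) &&
           copyPart(elsePart, partSize, semi2 + 1, (size_t)(close - semi2 - 1));
}
```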

VII LIMITATIONS

Expressions may not embed a function call inside a formula. For now, if an expression needs an embedded function, it can reference an invisible widget that calls the function.

Blog at WordPress.com.