August 1, 2021

Fly-through – How To Build a FastTax Form

Filed under: Uncategorized — frameworker @ 2:18 am

These are the steps in the tool/procedure pipeline that take you from a fillable PDF file to a smart form that works with FastTax. Note that if the initial tax form is “flat,” you can turn it into a fillable form in just a minute or two.

While f4868 is a simple form that doesn’t touch all the “edges” of the form creation process, it still exposes many of the details involved, enough to give you a good picture of the process. Here we go!

First, create a folder “f4868” and download f4868.pdf into it.

Duplicate f4868.pdf and add “original” to its name: f4868 original.pdf. There are so many files flying around when building the smart forms that you don’t want to lose track of the original file.

Flatten f4868.pdf by printing it to PDF – making sure it’s printing at 100% – and label it f4868 flat.pdf. This will later be used as the image component of the smart form.

The next thing you need to do is to shoehorn the label names you want to use into the PDF. Here’s how.

You start by making a PDZ file and using it as a scaffold for your new label names.

Use PDFFormExport to make the “raw” PDZ file. PDFFormExport reads a PDF, extracts its labels and the values of its widgets, and writes them as pairs into a file with the suffix PDZ. Just launch PDFFormExport, open the form with it, then quit the app. This creates a PDZ file in the same folder as the form.

Next, edit your new labels into the PDZ file. Each form has default label names but they’re usually gibberish. You want field names that are clear, year-to-year consistent, and concise so that when they are used as algebraic variables in formulae, the operations being performed are clear.

Open the PDZ file in a spreadsheet. The default label names are in the first column. The value of each widget – or the word ‘none’ if there isn’t a value – is in the second column. Clear the second column, whose values you don’t need, and then edit the new label names into it.

Since the default label names are gibberish, it helps to stay oriented by opening the form in PDFAnnotationEditor while you’re editing the spreadsheet. Then you can tab through the form to see which field corresponds to the row you’re editing, and type each new label into the second column as you go. Also check periodically to make sure that the field names stay in sync with the form fields! If you do get out of sync, though, it’s not a big deal: you can move part of column 2 up or down, in whichever direction you’re off.

After you’ve finished editing the label names into the second column of the spreadsheet, create a new blank text document and paste the spreadsheet values into it. Then save the text document as the PDZ file. You will have to either rename the raw PDZ file to f4868 raw.pdz or delete it before you can do this, though.
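Before saving, it can help to sanity-check the relabeled pairs. The sketch below is hypothetical, not part of the FastTax toolchain: it assumes a PDZ file is plain text with one tab-separated “default label, value” pair per line (what you get when you paste two spreadsheet columns into a text document), and it uses Python identifier rules as a stand-in for whatever the formula syntax actually requires.

```python
# Hypothetical sketch, not FastTax code: assumes a PDZ file is plain text
# with one "default_label<TAB>value" pair per line.

def relabel_pdz(raw_pdz_text: str, new_labels: list) -> str:
    """Pair each default label with a new label, checking for common mistakes."""
    defaults = [line.split("\t", 1)[0]
                for line in raw_pdz_text.splitlines() if line.strip()]
    # Out-of-sync check: every default label needs exactly one new label.
    if len(defaults) != len(new_labels):
        raise ValueError(f"{len(defaults)} fields but {len(new_labels)} new labels")
    seen = set()
    for label in new_labels:
        # Labels double as variables in formulae; Python identifier rules
        # stand in here for whatever the formula syntax really requires.
        if not label.isidentifier():
            raise ValueError(f"not usable as a variable name: {label!r}")
        if label in seen:
            raise ValueError(f"duplicate label: {label!r}")
        seen.add(label)
    # One "default<TAB>new" pair per line, ending with a newline.
    return "".join(f"{d}\t{n}\n" for d, n in zip(defaults, new_labels))
```

A row count mismatch is exactly the “out of sync” situation described above, so it fails loudly before you hand the file to PDFFormImport.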

Then duplicate the new PDZ file and name it f4868 w labels.pdz. If you were to run PDFFormExport in that folder again, it would clobber the new PDZ file – the one you worked so hard to create – and you don’t want to lose that work!

Next use PDFFormImport to shoehorn the new labels from the PDZ file into the form. To do this, simply open the document you’re “relabeling” with PDFFormImport – in this case f4868.pdf – and then save it, overwriting the existing PDF. Be sure to say YES in PDFFormImport’s confirmation dialog so you do overwrite the gibberish PDF label names with the meaningful ones!

Quit PDFFormImport and check the PDF file by using PDFAnnotationEditor to tab through the form’s fields, making sure that labels ended up where you expected them to be and also that you entered their names correctly. When you’re satisfied that the form field names are correct, duplicate that PDF and name it f4868 w labels.pdf, so you have a backup copy, just in case.

If you catch any errors, you can edit them in the PDF with PDFAnnotationEditor. Be sure to save the new file over itself and to replace f4868 w labels.pdf with the edited version. Also update the PDZ file and its “w labels” backup if you changed any labels in the PDF.

Building the Fixup File

The FIX file contains important meta-information about the smart form. Building it is the most complicated step in creating a TAX form. See the chapter “Fix File Format” in “Inside the Fixup File” for detailed information on this process, including which keywords to add to each row to create the FIX file.


👉🏿 NO SPACES ARE ALLOWED IN THE FIX FILE except in certain directives like “MAXLEN n.” And there must be a single carriage return (blank line) at the end of the FIX file!

👉🏿 When using a spreadsheet to build the FIX file, be sure to prefix ‘=’ signs with a ‘/’ character to keep the spreadsheet from treating those cells as formulas and replacing their values with calculations. Then remove the ‘/’s after you copy the spreadsheet into the FIX file.
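The two tips above are mechanical enough to automate. This is a hypothetical sketch: it assumes the spreadsheet is exported as text with one row per line, that “/=” never appears legitimately in a FIX file, and that MAXLEN is the only directive allowed to contain a space (the real list of such directives may be longer).

```python
# Hypothetical helpers for the FIX-file tips; not part of the FastTax tools.
SPACE_OK_DIRECTIVES = ("MAXLEN",)  # assumption: extend with any other directives

def clean_fix_text(exported: str) -> str:
    # Undo the '/' escape that kept the spreadsheet from evaluating '=' cells.
    lines = [line.replace("/=", "=") for line in exported.splitlines()]
    # Exactly one newline at the end of the file, no trailing blank lines.
    return "\n".join(lines).rstrip("\n") + "\n"

def check_fix_text(fix: str) -> list:
    problems = []
    if not fix.endswith("\n") or fix.endswith("\n\n"):
        problems.append("file must end with exactly one newline")
    for n, line in enumerate(fix.splitlines(), start=1):
        # Spaces are only tolerated on lines that use an allowed directive.
        if " " in line and not any(d in line for d in SPACE_OK_DIRECTIVES):
            problems.append(f"line {n}: space outside an allowed directive")
    return problems
```

Running `check_fix_text(clean_fix_text(text))` on the pasted spreadsheet text catches stray spaces and a missing or doubled final newline before the file is used.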

Building the PDX file

The PDX file contains additional information about the form, complementary to that of the FIX file. The two of them will be combined to make a smart form. But first we have to create the PDX file. To do that, just open the PDF that now has your labels with PDFAnnotationExtractor to produce f4868.pdx. The PDX file is written automatically. Then quit the program.


BuildPDVFile is then used to create the PDV file by merging the PDX and FIX files together using the label column as a zipper. To use BuildPDVFile just open the form f4868.pdf with it and then quit. It automatically writes out the PDV file. Simple.
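To make the “zipper” idea concrete, here is a minimal sketch of merging two files on their label column. The actual PDX, FIX, and PDV formats aren’t described in this post, so the sketch assumes tab-separated rows with the label first, and treats a label that appears in only one file as an error.

```python
# Hypothetical sketch of a label-keyed "zipper" merge, not BuildPDVFile itself.

def parse_rows(text: str) -> dict:
    rows = {}
    for line in text.splitlines():
        if line.strip():
            label, *rest = line.split("\t")
            rows[label] = rest
    return rows

def zip_by_label(pdx_text: str, fix_text: str) -> list:
    pdx, fix = parse_rows(pdx_text), parse_rows(fix_text)
    unmatched = set(pdx) ^ set(fix)  # labels present in only one file
    if unmatched:
        raise ValueError(f"labels not in both files: {sorted(unmatched)}")
    # Keep the PDX row order; append each label's FIX columns to its PDX columns.
    return [[label, *pdx[label], *fix[label]] for label in pdx]
```

Because rows are matched by label rather than by position, the two files don’t need to list the fields in the same order.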


Finally, run EmbedPDVFile. When launched, it embeds the PDV file inside the PDF file. Then save the open document, giving it the “.tax” extension. You have now produced a version of f4868 that works with FastTax!

These are the files that should be in your f4868 folder, in the order in which they were created:

f4868 original.pdf
f4868 flat.pdf
f4868 raw.pdz
f4868 w labels.pdf

These are the steps you performed in the form creation pipeline:

PDFFormExport extracts annotation labels and values from the PDF and writes them as a PDZ file.
Edit labels in column 2 of the PDZ file.
PDFFormImport imports the new labels from the PDZ file into the PDF using the old label names in the PDZ file as a zipper.
Use PDFAnnotationEditor to QA the newly labelled PDF, editing out any errors you happen to catch.
Build the FIX file. (See “Inside the Fixup File.”)
PDFAnnotationExtractor creates the PDX file.
BuildPDVFile merges the FIX file and the PDX file into a PDV file.
EmbedPDVFile embeds the PDV file within the PDF file, which is then saved with the TAX suffix.


July 7, 2021

Preflight Validation Of Tax Return Data

Filed under: Uncategorized — frameworker @ 8:47 am

IRS eFiling is completely intolerant of error; in order to avoid failing validation against IRS XSD files, it’s imperative that transmitted XML data be both correct and complete. FastTax ensures both.

FastTax ensures correctness through on-the-fly validation that text field data is XSD compliant for eFiling. If the data fails validation, that field is drawn in red so the user knows to correct it.

FastTax ensures completeness by providing a validation command that the user can give to check that all required fields have been entered and see a summary of omissions. But we’re deferring further discussion of this until we’ve finished our discussion on correctness.

The hook into the validation mechanism is based on the Delegation pattern. Each text field (PDQTextField) has an associated widget (PDQWidget) which is its delegate. When the user tabs or clicks out of the text field currently being edited, it loses focus and its delegate receives a controlTextDidEndEditing method call. controlTextDidEndEditing is able to tell if the widget’s text changed because another delegate method, controlTextDidChange, was tracking those changes.
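The change-tracking half of this hook can be sketched outside Cocoa. The Python below is only an analogue of the Objective-C delegate: `control_text_did_change` and `control_text_did_end_editing` are hypothetical stand-ins for the `controlTextDidChange:` and `controlTextDidEndEditing:` delegate methods, and `on_commit` stands in for whatever validation runs next.

```python
from typing import Callable

class WidgetDelegate:
    """Python analogue of a PDQWidget acting as its text field's delegate."""

    def __init__(self, widget_id: str, on_commit: Callable[[str, str], None]):
        self.widget_id = widget_id
        self.on_commit = on_commit  # called with (widget_id, text) after a real edit
        self.text = ""
        self.changed = False

    def control_text_did_change(self, new_text: str) -> None:
        # Called on every keystroke: just remember that something changed.
        self.text = new_text
        self.changed = True

    def control_text_did_end_editing(self) -> None:
        # Called when the field loses focus; act only if the text changed.
        if self.changed:
            self.changed = False
            self.on_commit(self.widget_id, self.text)
```

Tabbing through a field without typing anything never triggers `on_commit`, which is the point of tracking changes separately.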

We need to take a step back here to see the preparation for this mechanism. When the form was created, several files were baked into it as Free Text PDF Annotations. When the form was opened in FastTax, these annotations were parsed and added to a dictionary of key data structures that the form needs. The two that interest us for validation are the lookupDict and the xsdDict.

The lookupDict lets us use the widgetID to find the XSD type for that widget’s text string. If a widget doesn’t require validation, its ID isn’t included in the lookupDict, so looking up its XSD type returns nil and nothing happens. For example, currency fields are already validated by NSFormatter objects attached to their cells, so their IDs are not included in the lookupDict.

But if the lookupDict does return an xsdType for the widgetID, we use it to extract the RegEx from the xsdDict to validate that widget’s data.

So within the delegate’s controlTextDidEndEditing “completion point,” we test whether the widget’s text changed. If so, we use the widgetID as a key into the lookupDict to obtain the xsdType of the delegate’s text field. We then use that xsdType as a key into the xsdDict to obtain the RegEx needed to validate the delegate’s text field, and test the text for a “match” against it.
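The lookup chain is easy to see in a few lines. This Python sketch uses the standard `re` module rather than the Objective-C categories; the widget IDs, type names, and patterns are illustrative examples, not copied from the IRS schema or from FastTax’s embedded dictionaries. Returning `True` for an unlisted widget mirrors the “nil, so nothing happens” case.

```python
import re

# Illustrative contents; the real dictionaries come from the form's annotations.
lookup_dict = {"w17": "SSNType", "w18": "ZIPCodeType"}
xsd_dict = {
    "SSNType": r"[0-9]{9}",
    "ZIPCodeType": r"[0-9]{5}([0-9]{4})?",
}

def is_valid(widget_id: str, text: str) -> bool:
    xsd_type = lookup_dict.get(widget_id)
    if xsd_type is None:
        return True  # not listed: no validation needed (e.g. currency fields)
    pattern = xsd_dict[xsd_type]
    # XSD patterns match the whole value, so use fullmatch rather than search.
    return re.fullmatch(pattern, text) is not None
```

A field whose `is_valid` result is `False` is the one that would be drawn in red.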

The RegEx code itself is simple, thanks to the excellent Objective-C-RegEx-Categories by BendyTree, which encapsulate NSRegularExpression.


…to be continued.

March 31, 2021


Filed under: Uncategorized — frameworker @ 4:25 am


My Partial Plate app could allow cases to be solved that previously would not have been. It uses a fault-tolerant “approximate string matching” algorithm to avoid false negatives. It geo-sorts matches to float the most likely – geographically closer – candidates to the top of the list. It has a slider to improve visualization of the results list by gradually increasing selectivity. This provides the best of both worlds, geo-sorting and ranked visualization.

I was in a hit-and-run collision that hasn’t been solved by law enforcement because they don’t have the tools they need. So I created a new tool, Partial Plate, that improves on the traditional approach to finding hit-and-run vehicles. I believe that it would likely enable my hit-and-run to be solved. 

In many hit-and-run cases, nearly complete information is available, and both Partial Plate and the DMV tool give each search result a score that reflects how good a match it is. But these cases are often unsolvable with the DMV tool, either because of false negatives (the tool is fault-intolerant) or because it misses a vehicle when results are ordered only by score (whereas Partial Plate also sorts by geographical location). When your slice of the database has thousands of entries, piling candidates together by score alone can cause “data blinding”: the best match may be hidden in plain sight, indistinguishable within its cohort.

Partial Plate solves both of these problems with the following innovations, which go beyond the DMV’s approach.

  • It uses a fault-tolerant matching algorithm. If there’s no exact match for the license plate characters you specify, Partial Plate will find approximate matches, whereas the DMV will not – one mismatched character and the plate/vehicle is thrown out.
  • It geo-sorts matches and does so without revealing location data! This floats likely candidates to the top of the list, since the farther away from the collision a vehicle is registered the less likely it is to be the offender. Look at it this way: while a truck from Escondido may have been involved in a Bay Area collision, most likely the culprit is much closer to home.
  • It has a slider to improve visualization of the results table. Initially, the whole slice is displayed, in geo-sorted order (with different scores interspersed). The slider lets you gradually increase selectivity by raising the minimum score required for inclusion in the results. This provides the best of both worlds, geo-sorting and ranked visualization. Stopping the slider at an intermediate setting will still intersperse some items of lesser score among the higher ones, but if you move the slider all the way, you’ll see only the best matches – still geo-sorted, but packed together.
  • If the person who saw the plate spots it in the results list, they may have an “aha moment” and think, “Oh yeah, that was the one!” This could be thought of as “Human AI.”

HOW IT WORKS

Partial Plate is a Python tool to find hit-and-run vehicles based on a partial license plate and whatever vehicle information you can provide (make, model, style, range of years, and whether gas, electric, or diesel).

When the program is launched, it displays a dashboard that allows you to enter these parameters:


The PARTIAL_PLATE field has seven positions. You enter the letters and numbers you have for the plate in the places you think they should occupy. If you’re unsure about a position, you can leave it blank. Note that personalized plates can have one of the symbols ❤️, ✋, ★, and ➕ in place of a letter or number; to specify these symbols, enter the characters you get when you shift the numbers 1 through 4 – that is, !, @, #, and $, respectively.
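Here is a minimal sketch of turning the PARTIAL_PLATE entry into seven positions. It assumes blanks are typed as spaces, that missing trailing positions are also blanks, and that `None` marks an unknown position; the function name is hypothetical, not taken from Partial Plate’s source.

```python
from typing import List, Optional

PLATE_LEN = 7
# Shifted 1-4 stand for the plate symbols: heart, hand, star, plus.
SYMBOL_KEYS = {"!": "\u2764", "@": "\u270b", "#": "\u2605", "$": "\u2795"}

def parse_partial_plate(entry: str) -> List[Optional[str]]:
    """Turn dashboard input into 7 positions; None marks an unknown position."""
    if len(entry) > PLATE_LEN:
        raise ValueError("a plate has at most seven positions")
    positions: List[Optional[str]] = []
    for ch in entry.ljust(PLATE_LEN):  # pad missing trailing positions
        if ch == " ":
            positions.append(None)  # blank: neither helps nor hurts the score
        elif ch in SYMBOL_KEYS:
            positions.append(SYMBOL_KEYS[ch])
        elif ch.isascii() and ch.isalnum():
            positions.append(ch.upper())
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return positions
```

For the example entry “5ABC 25” this yields `["5", "A", "B", "C", None, "2", "5"]`.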

ZIP is the ZIP Code in which the collision occurred. TYPE is car, truck, motorcycle, etc. You know what MAKE, MODEL, and STYLE are. YEAR is the year of the hit-and-run vehicle, or the first year in a range. If there’s a range of years, TO is the last year, otherwise it’s blank. POWER is gas, electric or diesel.

For example:


5ABC 25 94303 CAR  BMW  328i  SEDAN  2000  2015  GAS     


When you’ve entered all the information you have for the hit-and-run vehicle, you click the Search button to perform the search, and its results are displayed in a table. The following sections describe the specific steps that Search performs.


The lookup process first scans a local database separated by type and sorted by make, model, style, and year, extracting the slice of the database that corresponds to the vehicle you’ve described in the dashboard. 

The more specific your description, the smaller the slice will be and the greater the likelihood that the tool will succeed in helping you find the hit-and-run vehicle.

[OMITTING IMPLEMENTATION DETAIL – need “internal” design spec.]
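The actual lookup isn’t specified here, but one reason to keep each per-type table sorted by make, model, style, and year is that a slice can then be found with binary search instead of a full scan. As a rough illustration only, assuming a hypothetical `Vehicle` row shape:

```python
import bisect
from typing import NamedTuple, Optional

class Vehicle(NamedTuple):  # hypothetical row shape for one TYPE's table
    make: str
    model: str
    style: str
    year: int
    power: str
    plate: str
    zip_code: str

def extract_slice(table: list, make: str, model: str, style: str,
                  first_year: int, last_year: Optional[int] = None,
                  power: Optional[str] = None) -> list:
    """table holds one vehicle TYPE and is sorted by (make, model, style, year)."""
    last_year = first_year if last_year is None else last_year
    # Tuples compare element by element, so a 4-field prefix bounds the slice.
    lo = bisect.bisect_left(table, (make, model, style, first_year))
    hi = bisect.bisect_left(table, (make, model, style, last_year + 1))
    return [v for v in table[lo:hi] if power is None or v.power == power]
```

The binary search narrows the table to the described make, model, style, and years; the optional POWER filter is then applied to that small slice.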


Then the items in the slice are geo-sorted – arranged in ascending order by their distance from the collision site. This floats likely candidates to the top of the list, since the farther from the collision a vehicle is registered, the less likely it is to have been involved. The DMV tool does not do this. Geo-sorting can make it easier to find the hit-and-run vehicle by shifting inherently unlikely candidates out of focus.

For details on how this works, see “A Method for Geo-sorting Without Revealing Location Data” below.


Next, the plate of each vehicle in the list is scored based on how closely it matches the entered plate. This is done using a fault-tolerant “approximate string matching” algorithm. 

Characters that don’t match lower an item’s score but don’t necessarily disqualify it. This gives you more leeway in your guesses and thus a much greater likelihood of catching what you’re looking for.

These are the specifics: A correct guess for a character position gives a +1, an incorrect guess a –1. If you leave a position blank because you don’t want to try to guess it, it’s a 0 – neither helping nor hurting the plate’s score. But each blank increases the potential number of matches by at least a factor of 10. So if there are many blanks it will be extremely difficult to find the “needle in the haystack” without a highly specific vehicle (make-model-style-range-of-years) description.

There is one adjustment to the score: if bad guesses decreased the final score of a potential match but it had a run of consecutive character matches exceeding that score, then the run count is used as the score.
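The scoring rules above fit in one loop. This sketch follows the stated +1/–1/0 rules and the run-length adjustment; two details are assumptions, since the text doesn’t settle them: a blank position breaks a run of consecutive matches, and a stored plate shorter than the guess is padded with spaces.

```python
from typing import Optional, Sequence

def score_plate(guess: Sequence[Optional[str]], plate: str) -> int:
    """Score a stored plate against the guessed positions (None = blank)."""
    plate = plate.upper().ljust(len(guess))
    score = run = longest_run = mismatches = 0
    for want, have in zip(guess, plate):
        if want is None:
            run = 0  # assumption: a blank breaks a run of matches
            continue
        if want == have:
            score += 1
            run += 1
            longest_run = max(longest_run, run)
        else:
            score -= 1
            mismatches += 1
            run = 0
    # Adjustment: if bad guesses pulled the score below the longest run,
    # use the run count instead.
    if mismatches and longest_run > score:
        score = longest_run
    return score
```

With the guess from the example above (“5ABC 25”), “5ABC125” scores 6, “5ABD125” scores 4, and “9ABC999” would score 0 on the raw rules but is lifted to 3 by its run of “ABC”.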


Finally, the results table is displayed, with scoring visualized in the first column by a color-coded swatch: green for great matches, yellow for possible ones, and red for unlikely candidates.

There is a slider, discussed previously, which allows you to increase selectivity in the results display (filter out less likely matches) by raising the score required for inclusion in the results.
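The slider works because filtering doesn’t re-sort: dropping low scores from a geo-sorted list leaves the survivors in geo-sorted order. In this sketch, rows are hypothetical `(plate, score)` pairs, and the green/yellow thresholds are assumptions, since the text doesn’t say where the color bands begin.

```python
# Hypothetical thresholds for the color-coded swatch (maximum score is 7).
GREEN_MIN, YELLOW_MIN = 6, 4

def swatch(score: int) -> str:
    if score >= GREEN_MIN:
        return "green"
    if score >= YELLOW_MIN:
        return "yellow"
    return "red"

def visible_rows(geo_sorted_rows: list, min_score: int) -> list:
    """Slider filter: drop rows below min_score but keep geo-sorted order."""
    return [row for row in geo_sorted_rows if row[1] >= min_score]
```

Moving the slider all the way up leaves only the best matches, still nearest-first but packed together.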




After Partial Plate finds the slice corresponding to your search parameters, it finds and loads the “distance table” for the collision ZIP Code. The distance table is a list of [zip-code, distance-from-collision] pairs. Because the pairs are sorted by ZIP Code, you can do an efficient binary search to find the distance from the collision for any ZIP Code.

Partial Plate then iterates through the slice, finding the distance from the collision for each entry by doing a lookup in the distance table. That distance is substituted for the entry’s ZIP Code. 

Then the slice is sorted from the entry nearest the collision to the one farthest away. No ZIP Code data has been revealed. And the distance data, an internal parameter, is not included in the results table, so that is also hidden.
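Here is a minimal sketch of the geo-sort described above, using Python’s `bisect` for the binary search over the ZIP-sorted distance table. The row shape `(plate, zip_code)` is hypothetical, and sending ZIP Codes missing from the table to the bottom of the list is an assumption.

```python
import bisect

def geo_sort(slice_rows: list, distance_table: list) -> list:
    """Sort vehicles nearest-first without exposing ZIP Codes or distances.

    slice_rows: (plate, zip_code) pairs.
    distance_table: (zip_code, distance) pairs sorted by ZIP Code.
    """
    zips = [z for z, _ in distance_table]

    def distance(zip_code: str) -> float:
        i = bisect.bisect_left(zips, zip_code)  # binary search
        if i < len(zips) and zips[i] == zip_code:
            return distance_table[i][1]
        return float("inf")  # assumption: unknown ZIP Codes sort last

    # Substitute the distance for the ZIP Code, then sort nearest-first.
    with_distance = [(distance(z), plate) for plate, z in slice_rows]
    with_distance.sort(key=lambda pair: pair[0])
    # Drop the distances as well, so no location data reaches the results.
    return [plate for _, plate in with_distance]
```

Because five-digit ZIP strings sort the same way lexicographically and numerically, the table can be searched as strings.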


August 6, 2018


Filed under: Uncategorized — frameworker @ 10:09 pm

FormKit lets you easily create PDF forms through rapid and precise layout of editable fields.

Build the form described in this fly-through and you’ll understand the basics. The rest of this User’s Guide will fill in the details of FormKit’s operation, introducing additional functions and showing you alternative ways of doing some of the things covered in the fly-through.

Constructing a fillable form with FormKit is easy. You start with a “flat” PDF image of your form; it looks like a form but can’t be filled in. Using FormKit, you lay out fields over the PDF image. When you save it, the fields are “baked into” the PDF and it becomes fillable. Then any user can open it in Preview, fill it in, and save it.

You love chinchillas so much that you’ve decided to start “Chinchillas-R-Us,” a business boarding them. One thing you’ll need will be a registration form to check in new clients.

Legibility is important, so paper forms are out. You also want to make it easy for clients to download, complete, and print the form at home if they wish. That makes it easier on everyone because it lets you focus on the pet rather than the paperwork. You decide that a PDF fill-in form will fit your needs perfectly.

NOTE: In several places in this fly-through you’ll see a “PRO-TIP” heading indicating material that doesn’t strictly belong in the fly-through. It’s included because the context is perfect for introducing the additional useful information. You don’t have to be a pro to benefit from these tips, but they will make you feel like one. 🙂


You use your favorite word processing or graphics app – such as Pages – to create the form. Then you save it as a PDF file. This gives you a “flat” form that could be printed out and filled in by hand. But it’s also the first step toward making the fill-in form you need.

Taking a peek ahead, this is what your completed form will look like when it’s filled in:

[Image: the filled-in Chinchillas-R-Us form]

Launch FormKit. Its menu bar and tool palette will appear. Open Chinchillas-R-Us.pdf. It will be the background image over which you lay out form fields.

It’s easier to lay out fields when the form is blown up. So you expand the page to be quite large by zooming in on the document several times using ⌘=.

Notice that FormKit’s page resizing maintains a full page view as you zoom in.


Let’s look at the Preferences before starting to lay out the form. Choose FormKit > Preferences to open them.

[Image: FormKit Preferences]

Use the Image Opacity slider to dim the background image. This will decrease eyestrain while you’re sketching fields onto the form, making it easier to focus on the fields as you lay them out. The fields you sketch will not be dimmed by the slider. And it doesn’t affect the final form; the image there will not be dimmed at all.

Since your background image doesn’t contain borders for check boxes, click Field Shading for check boxes. That will provide a way for the user to know where to click when filling in the form. This won’t affect the form’s appearance during construction, but check boxes will be shaded in the output PDF.

Note that text fields don’t need shading because they’re more clearly demarcated and more easily discoverable; you can see the text pointer when you mouse over one.

You can play with the Stroke preferences as you’re laying out the form to find the settings that make it easiest for you to work with fields.

The Fonts preferences can wait for now, but you’ll need to set them before you save the final version of your form. They’ll be registered in the form and will be used by Preview when the form is filled in.


Like a typical drawing app, FormKit has a palette of tools. The arrow tool is used for selecting, resizing, and moving fields. The other tools correspond to the type of field that is sketched when they’re selected.

[Image: the tool palette, with labels]


Select the text field tool on the tool palette. When you move the mouse pointer over your document window, it changes from an arrow to a plus sign (+). Sketch out a wide, thin rectangle for the “Pet’s Name” field. Notice that it has selection handles that you can use to resize it, and also that an Annotations Inspector panel has appeared. That happens whenever a single field is selected. We don’t need to do anything with the inspector right now, but it will be useful later.


Before you continue sketching text fields, it’s useful to understand how positioning and resizing work.

You want to adjust the text field you just sketched to be a certain size and in a particular location.

Go back to the tool palette and select the arrow tool.

You can adjust a selected field’s position by dragging the field. You can fine-tune its position a pixel at a time using the arrow keys.

You can resize a selected field by dragging one of its handles. Dragging the corner handles allows both horizontal and vertical resizing. Dragging the mid-line handles only allows you to resize perpendicularly to the line the handle is on.

You can fine-tune the height and width of a selected field a pixel at a time using the arrow keys. Option-Up raises it, Option-Down lowers it, Option-Left shrinks it, and Option-Right expands it.

Fiddle with the text field you just sketched to position and resize it the way you want.


There are two approaches to laying out text fields: you can copy and paste existing fields to create new ones, or you can continue to sketch fields one at a time with the mouse; it’s your choice. Let’s look now at the copy and paste method.

Be sure the arrow tool is selected for this.

The recipe for building a form by copying and pasting is to iteratively copy and then paste one of the completed fields. The pasted item will appear just below the original, slightly offset to the right. Drag it to its destination and lengthen or shorten it. Repeat this, moving down the form rapidly. The fields you’ve laid out won’t be positioned perfectly, but you’ll fine-tune them in the next phase.

Do this with the “Pet’s Name” field, dragging the pasted item into the “Weight” field. Drag the middle handle on its right end and shorten it to be the width you want.

Next, copy and paste the “Weight” field, dragging it down to the “How many times a day” field and adjusting its length.

Then copy and paste the “Pet’s Name” field again, this time dragging the pasted item down to the “Own supply” field. Make it wider by dragging to the right.

[Image: the partially laid-out Chinchillas-R-Us form]

Continue with this until you’ve laid out all the text fields.


If you accidentally resize a field’s height when you’re stretching or shrinking it, you can deal with that by selecting it and another field – one with the correct height – using an area selector like the one the Finder uses. Then Option-click on the good field to specify it as the anchor (it turns green) and choose the Align & Size > Make Same Height command.

The anchor is the reference item. The command will make the other items in the selection conform to its attributes.

Note that the Align & Size commands are not enabled unless there’s a compound selection with a designated (green) anchor.

[Image: the Align & Size menu]

The exception to this is Make Square, which needs no anchor and is enabled for any selection with a non-square item. We’ll use Make Square when laying out check boxes.
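The anchor rule (the anchor stays put and the other selected items conform to it) can be sketched with plain rectangles. This is not FormKit’s code: the `Field` class is hypothetical, and the choices that Make Same Height keeps each field’s bottom edge fixed and that Right Edges moves fields rather than resizing them are assumptions. Make Square is left out because the text doesn’t say which side it adjusts.

```python
from dataclasses import dataclass

@dataclass
class Field:  # hypothetical stand-in for a laid-out form field
    x: float  # left edge
    y: float  # bottom edge (origin at lower left, as in PDF)
    width: float
    height: float

def make_same_height(anchor: Field, selection: list) -> None:
    for f in selection:
        if f is not anchor:
            f.height = anchor.height  # assumption: bottom edge stays fixed

def align_right_edges(anchor: Field, selection: list) -> None:
    right = anchor.x + anchor.width
    for f in selection:
        if f is not anchor:
            f.x = right - f.width  # assumption: move the field, don't resize it
```

The anchor is never modified, which is why a command needs one designated (green) item before it can run.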


Now we want to add in the check boxes. We’ll use the same Copy and Paste recipe for this as we did with text fields.

Sketch the first one, adjust its size, and make it square with Align & Size > Make Square. Our check boxes all come in pairs, so copy and paste the check box. Move it into a position complementing the first one and align their bottom edges.

Then copy and paste these check boxes and move the new pair into place – adjusting their horizontal spread as needed – repeating until they’re all done. Then save the document again.


Make sure you’ve finished all of the initial layout — including fields that aren’t demarcated in the background by lines or boxes — copying and pasting fields and moving them to their needed positions.

Once all the fields are in place, you’ll want to fine-tune their positions and sizes using the arrow keys and the Align & Size commands.

For example, it’s a good idea to have the fields float a consistent distance above their underlines, so that the form has a smooth appearance when it’s been filled in. To do this, use the arrow keys like little tugboats to nudge the fields into their moorings.

Another example of fine tuning is to align the right edges of all the fields that extend to the form’s right margin to give it a crisp look. To do this, just select them, Option-click on the reference item, and choose Align & Size > Right Edges.

Now save the work you’ve done with the File > Save command (Command-S).


When the form is being filled in, you want the user to be able to tab in a logical order from field to field. In FormKit, it’s a breeze to set the fields’ tab order. Here’s how.

Click in the form to make its window active and to clear any selection.

Next, Command-click on the fields in the order in which you want to tab through them. When you’re done, click once on the form to clear the selection. That seals the tab order.


If making a mistake while setting the tab order meant that you had to start the process all over again, it would be a tense and unpleasant task, especially for long forms. But it can be performed incrementally, if need be.

When your setting of the tab order “goes off the road,” here’s what to do: Find the last item that’s in the correct tab order, select it, and Command-click on succeeding fields in the tab order you want. When you’ve finished, click once to clear the selection, resealing the tab order. It’s that simple.


Now that you can tab between fields in the order you want, it’s time to tab through them to inspect their settings and change them if necessary.

As noted earlier, an Annotations Inspector panel appears whenever you have a single field selected. As you tab through the fields, the inspector changes to show the settings for the currently selected item. Notice that the fields already have default names; these were set as you set the tab order.

When a text field is selected, Field Name, Max Length, and Alignment settings are shown.

[Image: the Annotations Inspector for a text field]

When a check box is selected, Field Name and “On” Value settings are shown. “On” is the default “On” Value setting (a setting that’s only useful with radio buttons).

[Image: the Annotations Inspector for a check box]

If you edit a field’s settings as you tab through the form, you’ll need to return focus to the form by clicking back on the selected field in order to continue tabbing.

PRO-TIP: When you copy and paste fields, their settings are preserved. If you want a group of items to have the same settings, be sure that the first one has those settings so that the copied ones will too.


When all the field settings are correct, make sure that the Fonts preferences are set to your liking; then save the document one last time, and you’re done. You now have the Chinchillas-R-Us form for your website.

[Image: the completed Chinchillas-R-Us form]

This fly-through is dedicated to John Spragens, the late cousin of my good friend Alan Spragens, who introduced me to this genre of writing. John was a creatively gifted and generous person. I’m lucky to have known him.

July 27, 2018

PDFKit, The Lost Samples

Filed under: Uncategorized — frameworker @ 5:06 am

PDFKit, you are a strong friend,
doing heavy lifting,
to make my task more manageable.


Through the vestiges of time, the “Lost PDFKit Samples” have been rediscovered and brought up to date!

When I began using PDFKit, in the days of Tiger and Leopard, there were a number of sample apps that helped me learn the framework: PDFKitViewer, PDFKitLinker2, Link Snoop, PDFCalendar, PDFViewSubclasser, and PDFAnnotationEditor. These samples showed interesting aspects of PDFKit. Beyond PDFKit, they highlighted significant aspects of Cocoa. And they’re incredibly well written—readable and well factored.

But through the accumulation of deprecated APIs over nine builds and, especially, the pervasive breakage in the macOS 10.13 High Sierra rewrite, they broke and faded into obscurity.

However, with the port to iOS—“PDFKit reloaded” (sorry, Keanu)—PDFKit has come into its own. And macOS will be a direct beneficiary of that, giving me confidence that the framework, which had been allowed to stagnate, will now receive true support. So I felt it would’ve been a shame to let these samples fade away! As an exercise, I’ve made them usable once more by updating their code from Tiger and Leopard to build and run in High Sierra.

They all work now—but some polish is still needed. See the Read Me files in their project folders for details.

PDFKitViewer 2.0 illustrates the display of a PDF document.

PDFKitViewer demonstrates some simple and some less than simple uses of PDFKit:

• Basic use of PDFView with single page or side-by-side page display.
• Display of PDFOutline as data for NSOutlineView displayed using NSTableView in an NSDrawer.
• Using PDFDocument to search. The UI has a search tool in the drawer above a Page/Section results list.

Unfortunately, though, PDFKitViewer does not support annotations.

PDFKitLinker2 2.0 presents many features of (Tiger) PDFKit.

This application enables the user to annotate a PDF document by embedding links or editing existing ones. The destination of a link may be another page in the document or an external URL.

PDFAnnotationEditor is Apple’s flagship PDFKit sample program, but it was written for Leopard; PDFAnnotationLink was the one interactive annotation that was properly implemented for Tiger. That’s why this sample was written: at the time, links were the only game in town 🙂

PDFKitLinker also shows how to create and display a PDFOutline of the opened document. If the opened PDFDocument has an outline, PDFKitLinker will use the PDFOutline as data in an NSOutlineView and display it using NSTableView in an NSDrawer. This is a very useful recipe—and worth the price of admission, all by itself 🙂

Familiarity with NSTableView would be very helpful in understanding the code for this sample, but lacking that, it would still be possible to use PDFKitLinker as a “Rosetta Stone” to figure out table views.

PDF Calendar 2.0 uses PDFKit to show you how to generate your own PDF content.

The sample uses a PDFPage subclass to generate a PDFPage from an NSImage. The PDFPage subclass’s draw and bounds methods are overridden. The bounds method returns the image bounds and the draw method displays the image. This PDFPage object can be added to a PDFDocument.

The app generates a calendar from images the user supplies. The resulting PDF document can be saved, printed, etc. The sample also incorporates PDFView and PDFThumbnailView.

Link Snoop 2.0

Link Snoop is a sample application using PDFKit in Mac OS X 10.4 Tiger. When a user opens a PDF with Link Snoop, it scans it for Link annotations that have a URL associated with them and displays the PDF with these annotations highlighted.

The contents of the URL link annotations are also displayed, in a panel consisting of an NSTableView in an NSSplitView.

While lacking NSDrawer’s elegance, this architecture shows the desired workaround for replacing drawers with split views in the other Lost PDFKit Samples that still use NSDrawer.

PDFViewSubclasser 2.0

PDFViewSubclasser shows you how to subclass PDFView to overlay content—by adding subviews—relative to the PDF content.

It uses the new 10.5 PDFView method drawPagePost.

The default implementation of this method draws the text highlighting (if any) for the page. This method does not apply scaling or rotating to the current context to map to page space; instead, the context is in view-space coordinates (in which the origin is at the lower-left corner of the current PDF view).
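Because drawPagePost: hands you a context in view space rather than page space, anything you want to place relative to the PDF content has to be mapped from page coordinates first. Here is a minimal plain-C sketch of that mapping (plain C also compiles as Objective-C), assuming an unrotated page drawn at a uniform scale factor whose lower-left corner sits at a known point in the view. The Point type and page_to_view function are hypothetical; in real code, PDFView's convertPoint:fromPage: performs this conversion for you.

```c
/* Hypothetical plain-C stand-in for NSPoint. */
typedef struct { double x, y; } Point;

/*
 * Map a point from page space to view space.
 * Assumptions: the page is not rotated, it is drawn at a uniform scale,
 * and its lower-left corner is at pageOriginInView. Both coordinate
 * systems have their origin at the lower left, so no y-flip is needed.
 */
Point page_to_view(Point pagePoint, Point pageOriginInView, double scale)
{
    Point viewPoint;
    viewPoint.x = pageOriginInView.x + pagePoint.x * scale;
    viewPoint.y = pageOriginInView.y + pagePoint.y * scale;
    return viewPoint;
}
```

For example, with the page drawn at 2x and its corner at (5, 5) in the view, the page point (10, 20) lands at (25, 45) in view space.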

The Go To Marker button at the bottom of the window uses PDFDestination to scroll the marker into the center of the window. But, unless you resize the window to be smaller so that scrollbars appear, there will not be any place to scroll the marker and the command will appear to do nothing.
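The reason the button seems to do nothing shows up in the arithmetic of centering a point in a scroll view: the scroll offset must be clamped to the document's scrollable range, and when the whole document fits in the window, that range is empty. The sketch below is a simplified plain-C model along a single vertical axis, with a hypothetical function name; it is not PDFKit's actual implementation.

```c
/*
 * Offset of the visible area's bottom edge from the document's bottom edge
 * that would center markerY in a visible area of height visibleHeight,
 * clamped to the legal range [0, documentHeight - visibleHeight].
 */
double centered_scroll_offset(double markerY, double visibleHeight,
                              double documentHeight)
{
    double maxOffset = documentHeight - visibleHeight;
    if (maxOffset <= 0.0)
        return 0.0;                 /* document fits: nothing to scroll */

    double offset = markerY - visibleHeight / 2.0;
    if (offset < 0.0)
        offset = 0.0;               /* can't scroll past the bottom */
    if (offset > maxOffset)
        offset = maxOffset;         /* can't scroll past the top */
    return offset;
}
```

For example, centering a marker at y = 500 in a 200-point window over a 1000-point document gives an offset of 400. In an 800-point window showing a 600-point document, the result is always 0, which is why the command appears to do nothing.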

PDFAnnotation Editor 2.1

This sample demonstrates how to use PDFKit to inspect, edit, and create annotations in a PDF document.

This is the flagship sample of PDFKit. It was updated by Apple to use the new APIs in macOS 10.13 High Sierra.

NOTE: This is not one of the Lost PDFKit Samples, but I’m including it in this collection because it’s been significantly improved. Version 2.1 can now tab between annotations in edit mode so you can see the responder chain.

The projects are available here:


December 4, 2017

Zooming Breaks Focus-ring Architecture

Filed under: Uncategorized — frameworker @ 2:36 am

Programmatically zooming my app’s window breaks Cocoa’s focus-ring architecture. This happens in 10.13.1, but I believe it was a problem in previous systems as well.

The main view of my app can be zoomed by the user. It contains NSTextView subviews. The NSTextView focus-rings are well-drawn when the main view is not zoomed. But when it is zoomed, focus rings are not drawn properly. They are drawn the same size, no matter what the zoom factor. And they drift downward as zoom is increased. (see attached screen shots). Note that the text fields themselves are zoomed properly and all mouse clicks are interpreted correctly in zoomed views. Just the focus-rings are wrong.

Is there any way to get zoomed focus rings to work using the Cocoa focus ring APIs? I would like to adopt them, but I can’t since they don’t work for me. N.B. This is bug report #35817176.

This code describes my zoom architecture:

// Here, "self" is an NSImageView subclass containing NSTextViews as subviews.
- (void) doZoom:(CGFloat)theScaleFactor {
    CGFloat width  = [self bounds].size.width;
    CGFloat height = [self bounds].size.height;
    NSRect zoomedRect = NSMakeRect(0, 0, width*theScaleFactor, height*theScaleFactor*pageCount);

    NSRect boundsRect = NSMakeRect(0, 0, width, height*pageCount);

    zoomedRect = [self roundedRect:zoomedRect]; // Make view corners have integral values.

    [self setFrame:zoomedRect];

    [self setBounds:boundsRect];

    [self setNeedsDisplay:YES];

    [self scrollToTop]; // Will lose relative scroll position before zoom.
}

// Here, "self" is an NSTextField nested in the NSImageView subclass.
// This is the part of the initialization method that's pertinent to the focus-ring issue.

- (void) initTextField {
    [self setFocusRingType:NSFocusRingTypeExterior];
    [self setDrawsBackground:NO];

    [[self cell] setRefusesFirstResponder:NO]; // Accept first responder.
    [[self cell] setShowsFirstResponder:YES];  // Show first responder.
}

// If I use Apple's focus-ring APIs, the focus ring is drawn as if the text view has not been zoomed,
// and it is shifted downward from its correct location.

// These two methods comprise the Cocoa focus-ring adoption protocol.

- (void)drawFocusRingMask {
    // Set the focus ring mask to the zoomed bounds.
    NSRectFill([self focusRingMaskBounds]);
}

- (NSRect)focusRingMaskBounds {
    return [self bounds];
}
// But if, instead of using the Cocoa APIs,
// I call this method from the drawRect method of the NSTextView (subclass),
// the focus ring scales to fit the zoomed text field.

- (void) drawFocusRing {
    if ([self focus]) {
        [NSGraphicsContext saveGraphicsState];
        [[NSColor keyboardFocusIndicatorColor] set];
        [[NSBezierPath bezierPathWithRect:[self bounds]] fill];
        [NSGraphicsContext restoreGraphicsState];
    }
}
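One way to picture the symptoms: doZoom leaves the view's bounds at the unzoomed size while growing its frame, so everything drawn in bounds coordinates is magnified by the frame/bounds ratio. A ring computed without that factor keeps its original size, and because its origin is not scaled either, the gap between the ring and the zoomed field grows with the zoom factor. The plain-C sketch below models only that geometry, assuming the zoom is a pure scale with no rounding (the author's roundedRect step is not modeled). The Rect type and function names are hypothetical, and this is an illustration, not a diagnosis of the bug.

```c
/* Hypothetical plain-C stand-in for NSRect. */
typedef struct { double x, y, width, height; } Rect;

/* The frame doZoom assigns: unzoomed size times the scale, pageCount pages tall. */
Rect zoomed_frame(double width, double height, double scale, int pageCount)
{
    Rect frame = { 0.0, 0.0, width * scale, height * scale * pageCount };
    return frame;
}

/*
 * Map a rect from the view's bounds coordinates to its frame coordinates.
 * Since bounds keep the unzoomed size, the factor is frame size / bounds size.
 */
Rect bounds_to_frame(Rect r, Rect bounds, Rect frame)
{
    double sx = frame.width  / bounds.width;
    double sy = frame.height / bounds.height;
    Rect out = { r.x * sx, r.y * sy, r.width * sx, r.height * sy };
    return out;
}
```

With a 600 x 800 page, two pages, and a 1.5x zoom, a field at (100, 200, 50, 20) in bounds space appears at (150, 300, 75, 30). A ring left at (100, 200, 50, 20) would be both too small and 100 points away vertically, and the gap widens as the zoom increases.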




April 29, 2013

Extracting & Analyzing PDF Form Data

Filed under: Uncategorized — frameworker @ 9:28 pm

PDF Form Export is an OS X app that extracts field names and values from PDF forms and puts them into a text file.

To use the app, just drop a PDF Form on it – or “Open…” the form from it. A PDZ file is created in the directory containing the form. The PDZ file contains one line per field – the field names and values are tab-separated – like this:

  • NAME  \t  Some Guy
  • ADDRESS  \t  1234 Somewhere St.
  • etc.
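Since each PDZ line is just a field name and a value separated by a tab, a script or small program can split it with ordinary string functions. Here is a minimal C sketch, assuming exactly one tab per line and no tabs inside values; split_pdz_line is a hypothetical helper, not part of PDF Form Export.

```c
#include <string.h>

/*
 * Split one PDZ line of the form "NAME<tab>VALUE" in place.
 * On success, *name and *value point into line and 1 is returned.
 * A trailing "\n" or "\r\n" is stripped from the value.
 * Returns 0 (and leaves line untouched) if there is no tab.
 */
int split_pdz_line(char *line, char **name, char **value)
{
    char *tab = strchr(line, '\t');
    if (tab == NULL)
        return 0;

    *tab = '\0';            /* terminate the field name */
    *name = line;
    *value = tab + 1;

    /* Cut the value at the first line-ending character. */
    (*value)[strcspn(*value, "\r\n")] = '\0';
    return 1;
}
```

Reading a PDZ file line by line with fgets and passing each buffer to split_pdz_line yields the name/value pairs ready for analysis.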

You can then paste the exported data into a spreadsheet or manipulate it with a script. My Benford’s Analysis program, which I wrote as a component of a tax-return-audit-risk-identification suite, is an example of doing this.
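Benford's law predicts that the leading significant digit d of many naturally occurring amounts appears with proportion log10(1 + 1/d), so about 30.1% of values start with 1 and only about 4.6% start with 9. The Benford program itself isn't shown here. The following is a hypothetical C sketch of the first steps such a program might take on PDZ values: pulling the leading nonzero digit out of each value string, tallying the digits, and comparing the result against the expected proportions. The function names and the hard-coded table are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/* Expected Benford proportions log10(1 + 1/d) for d = 1..9, rounded to 3 places. */
static const double kBenfordExpected[10] = {
    0.0, 0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046
};

/*
 * Leading significant digit of a value such as "$1,234.56" or "0.05":
 * the first digit 1-9, skipping signs, currency symbols, separators and zeros.
 * Returns 0 when there is no nonzero digit (e.g. "none" or "0.00").
 */
int leading_digit(const char *text)
{
    for (; *text != '\0'; text++) {
        if (isdigit((unsigned char)*text) && *text != '0')
            return *text - '0';
    }
    return 0;
}

/* Tally leading digits of n values into counts[1..9]; returns how many were counted. */
int tally_leading_digits(const char *values[], size_t n, int counts[10])
{
    int total = 0;
    for (int d = 0; d < 10; d++)
        counts[d] = 0;
    for (size_t i = 0; i < n; i++) {
        int d = leading_digit(values[i]);
        if (d != 0) {
            counts[d]++;
            total++;
        }
    }
    return total;
}

/* Observed minus expected proportion for digit d (1..9); 0 if nothing was counted. */
double benford_deviation(const int counts[10], int total, int d)
{
    if (total <= 0)
        return 0.0;
    return (double)counts[d] / total - kBenfordExpected[d];
}
```

Large positive deviations for digits that should be rare are the kind of signal an audit-risk tool would flag for a closer look.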

When businesses provide forms to customers to be filled out and returned, this utility simplifies data extraction, avoiding tedious and error-prone recopying.

The program is written in Objective-C with Cocoa and PDF Kit. PDF Kit makes it straightforward to iterate through a PDF’s form fields, extracting their names and values.

October 14, 2011

Thank you, Steve

Filed under: Uncategorized — frameworker @ 6:30 pm

Steve changed my life completely in a good way. First, in the 1980’s, Macintosh became a vehicle that infused my – languishing – career with new purpose. And then again, when I lost my bearings in the last decade, OS X was a nurturing place to come home to. I fervently hope that Steve felt the deep gratitude of the developer community for having begotten this fertile ground for our achievement. Namasté, Steve.

April 20, 2010

Cocoa to Cappuccino – Spatially Formatting Text Fields

Filed under: Uncategorized — frameworker @ 6:18 am


I’m using PDF images as the background of electronic forms. The purpose of doing this is to make the electronic form feel just like the familiar paper one, which makes it user-friendly. The electronic forms can also perform calculations automatically and accurately.


One of the cases that inevitably has to be handled is when a string has to be entered, with its characters equally spaced in a sequence of contiguous boxes, like this:


Cocoa has this nifty NSString method, drawAtPoint:withAttributes:, that lets you do this when the field is not active. (But when it is active you can enter the string – unformatted – in the text field, which exactly covers the boxes in the background, so it all feels quite natural.)

- (void)drawRect:(NSRect)rect {
    if ((![self focus]) && ([self format] == kWideFormat)) {
        // drawWideString calculates the bounds for each character
        // and calls a routine to draw it using drawAtPoint:withAttributes:.
        [self drawWideString:[self stringValue]];
    } else {
        [super drawRect:rect];
    }
}
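The per-character layout behind drawWideString comes down to simple arithmetic: divide the field into maxLen equal boxes and center each glyph in its box. This plain-C sketch assumes that division and a measured glyph width (in Cocoa, for example, from -[NSString sizeWithAttributes:]). wide_char_x is a hypothetical helper, since drawWideString itself isn't shown.

```c
/*
 * X coordinate at which to draw character `index` of a WIDE field so that it
 * is centered in its box. The field starts at fieldX, spans fieldWidth, and is
 * divided into maxLen equal boxes. glyphWidth is the measured width of the
 * character being drawn.
 */
double wide_char_x(double fieldX, double fieldWidth, int maxLen,
                   int index, double glyphWidth)
{
    double cellWidth = fieldWidth / maxLen;          /* width of one box */
    double cellLeft  = fieldX + cellWidth * index;   /* left edge of this box */
    return cellLeft + (cellWidth - glyphWidth) / 2.0;
}
```

For a 100-point field at x = 10 with 10 boxes, a 6-point-wide fourth character (index 3) is drawn at x = 42.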


Unfortunately, there is no -[CPString drawAtPoint] method in Cappuccino, so this approach can’t be used.

I must confess that this “deficiency” left me somewhat confused about how to proceed with implementing the WIDE string behavior in Cappuccino.

I wondered if there was a way to do it using Canvas or CSS, or if Cappuccino text support for this kind of thing might be “just around the corner.”

And the sledgehammer approach of creating a sequence of single-character text fields to display the inactive text, to be swapped out with a regular text field while editing, seemed inelegant.

But a recent conversation with @saikatc at #shdh37 convinced me that the “buffered” approach was, in fact, a reasonable way to do this. And as it is with so many things in life, once the approach was determined, it started happening.


The one CPTextField subclass I use throughout the forms project overrides becomeFirstResponder / resignFirstResponder to bracket the active field with controlTextDidBeginEditing / controlTextDidEndEditing calls, something that neither Cocoa nor Cappuccino does, but which is critical to knowing when to swap the static text array in and out with the regular text field. Having this mechanism obviates the need for a special CPTextField subclass for these WIDE fields!

The Begin/End Editing messages are passed to the Text Field’s delegate, a widget subclass, which sends setFocusedDisplayFormat/setUnfocusedDisplayFormat messages to the object being activated/inactivated.

It’s important to note that the approach used here takes advantage of the fact that the text field covers the char-array. If this were not the case, it would be necessary to buffer the char-array’s stringValues so that they wouldn’t be displayed while the text field was active. That would be confusing.

So to avail ourselves of this pattern, we create a WIDE Widget subclass which switches the display of the static array on and off.

Here’s the code for that class.

PDQWideWidget descends from the concrete PDQTextWidget class and adds an array for the equally spaced characters.

// PDQWideWidget.j

@import "PDQTextWidget.j"

@implementation PDQWideWidget : PDQTextWidget
{
	CPMutableArray chars @accessors; // The array of equally spaced characters
}

Init calls super, then creates the array.

- (id)init
{
	self = [super init];

	if (self)
	{
		// Do any initialization here!
		chars = [];
	}

	return self;
}

makeView is overridden to create the array of one-char text fields. The order in which the constituent fields are created is crucial. The real CPTextField is created last, so it will be on top and receive mouse events. And the widget is put into “unfocused” mode.

- (void)makeView:(CPView)itsSuperview
{
	[self buildChars:itsSuperview];

	[super makeView:itsSuperview];

	[self setUnfocusedDisplayFormat];
}

buildChars builds the array of one-char text fields, using the maxLen parameter to determine how many to create; together they cover the widgetRect.

- (void)buildChars:(CPView)itsSuperview
{
	var frameRect = [self widgetRect];

	var width   = frameRect.size.width;
	var height  = frameRect.size.height;
	var x       = frameRect.origin.x;
	var y       = frameRect.origin.y;

	var cellWidth = width / [self maxLen];

	for (var index = 0; index < [self maxLen]; index++)
	{
		var left = x + cellWidth * index;
		var charFrame = CGRectMake(left, y, cellWidth, height); // x, y, w, h

		var newCharField = [[CPTextField alloc] initWithFrame:charFrame];

		[self initCharField:newCharField];
		[newCharField setStringValue:@""];
		[newCharField setDelegate:self];
		[itsSuperview addSubview:newCharField];
		[chars addObject:newCharField];
	}
}

Each charField must be initialized so that, among other things, it neither accepts events nor becomes a responder.

- (void)initCharField:(CPTextField)aCharField
{
	[aCharField setBordered:NO];
	[aCharField setBezeled:NO];
	[aCharField setEditable:NO];
	[aCharField setEnabled:NO];
	[aCharField setSelectable:NO];
	[aCharField setAlignment:CPCenterTextAlignment];
	[aCharField setBackgroundColor:[CPColor clearColor]];
	[aCharField setDrawsBackground:YES];
	[aCharField setVerticalAlignment:CPCenterVerticalTextAlignment];
	[aCharField setFont:[PDQAbstractWidget getFont]];
}

When the text field is focused, compose and set its stringValue from the char-array. The TextField will be displayed over the char-array, masking it.

- (void)setFocusedDisplayFormat
{
	// Build the stringValue from the char-array
	var itsStringValue = @"";

	for (var index = 0; index < [chars count]; index++)
	{
		var aCharField = [chars objectAtIndex:index];
		var aChar = [aCharField stringValue];
		itsStringValue += aChar;
	}

	[[self attachedControl] setStringValue:itsStringValue];
}

When the text field loses focus, or is first created, unpack stringValue into the char-array, then clear stringValue. The equally spaced chars will be displayed, but the empty TextField covering them will not.

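- (void)setUnfocusedDisplayFormat
{
	// Set the chars from stringValue and then clear it.
	// Boxes past the end of the string are blanked, so a shorter new value
	// leaves no stale characters, and extra characters beyond the last box are ignored.
	var itsStringValue = [[self attachedControl] stringValue];
	var count = [itsStringValue length];

	for (var index = 0; index < [chars count]; index++)
	{
		var aCharField = [chars objectAtIndex:index];
		var aChar = (index < count) ? [itsStringValue characterAtIndex:index] : @"";
		[aCharField setStringValue:aChar];
	}

	[[self attachedControl] setStringValue:@""];
}

@end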
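The focus swap is a round trip between one string and the array of boxes. Below is a hypothetical plain-C model of it, with a fixed-size char array standing in for the array of one-character CPTextFields: pack_chars plays the role of setFocusedDisplayFormat and unpack_chars plays the role of setUnfocusedDisplayFormat. The model blanks every box past the end of the string and drops characters beyond the last box, so a shorter value never leaves stale characters behind. MAX_CELLS and all names here are assumptions.

```c
#include <string.h>

#define MAX_CELLS 32   /* hypothetical upper bound on maxLen */

/* One character per box; '\0' marks an empty box. */
typedef struct {
    int  count;                 /* number of boxes (maxLen), <= MAX_CELLS */
    char cells[MAX_CELLS];
} CharArray;

/* Focused: concatenate the non-empty boxes into out (capacity count + 1). */
void pack_chars(const CharArray *a, char *out)
{
    int n = 0;
    for (int i = 0; i < a->count; i++)
        if (a->cells[i] != '\0')
            out[n++] = a->cells[i];
    out[n] = '\0';
}

/* Unfocused: spread s across the boxes, blanking boxes past its end. */
void unpack_chars(CharArray *a, const char *s)
{
    size_t len = strlen(s);
    for (int i = 0; i < a->count; i++)
        a->cells[i] = ((size_t)i < len) ? s[i] : '\0';
}
```

Unpacking "12345" and then "98" into five boxes packs back to "98", not "98345"; unpacking "ABCDEFG" packs back to "ABCDE".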



March 20, 2010


Filed under: Uncategorized — frameworker @ 7:52 pm

I’d been testing PDQForms and hadn’t seen any performance problems, but then I saw a noticeable recalculation delay when certain fields were changed in a particular form.

After a moment of doubt whether my approach was simply wrong, I sucked it up and asked myself “What would Mike Ash do?”

So I jumped into the debugger, and after tracing the flow of execution – aided by liberal logging of intermediate results – I realized that I was seeing a cascading dependency problem.

I was adding a notifier for each cell reference in a formula, so when it had more than one reference to the same cell, I was creating duplicate notifiers. And if that cell was referenced more than once in another cell’s formula, there would be duplicated recalculations. This is what I was seeing*. And the problem could become arbitrarily worse than this, since there could be an indefinite coupling of such formulae. Ouch!

* Formula A, of cell a, has n references to cell b, whose formula B contains m references to cell c. So when cell c changes, formula B would be recalculated m times and formula A would be recalculated m * n times.

The solution to this cascading dependency problem was to allow only one notifier for any cell reference in a formula, even if the cell was referenced more than once in that formula.
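The footnote's arithmetic generalizes to a chain of any length. Write x_0 for the cell that changes, x_1, ..., x_k for the cells above it, n_j for the number of times x_j's formula references x_{j-1}, and R_j for how many times x_j is recalculated. With one notifier per reference, each recalculation of x_{j-1} triggers n_j recalculations of x_j:

```latex
R_0 = 1, \qquad R_j = n_j \, R_{j-1}
\quad\Longrightarrow\quad
R_k = \prod_{j=1}^{k} n_j
```

With c = x_0, b = x_1 (n_1 = m), and a = x_2 (n_2 = n), this gives R_B = m and R_A = mn, matching the footnote; for example, m = 3 and n = 2 give 6 recalculations of A. Coalescing makes every effective n_j equal to 1, so in a chain like this each cell is recalculated once per change.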


When a PDQ document is opened, its form widgets are “internalized.” One step in this process adds, for each widget that has a formula, observers to the cells that formula references. This stack trace depicts the widget internalization pattern.

-[PDQDocument windowControllerDidLoadNib:]
Code added here is executed when the windowController has loaded the document’s window.

  -[PDQDocument internalizeWidgets]
  This contains the widgets’ internalization logic.

    -[PDQDocument observeReferencedCells]
    This document method calls observeReferencedCells for each widget.

      -[PDQAbstractWidget observeReferencedCells]
      Adds notifications to observe each cell referenced by this widget.

        -[NSString coalesceObservers]
        Constructs the observer list and removes duplicates, before adding notifications.


Here is the add/coalesceObservers code associated with the widget internalization pattern.

observeReferencedCells creates an array of the referenced cell IDs. Then it finds the object referenced by each ID and adds an observer to it.

- (void) observeReferencedCells {
    if ([self hasExpression]) {
        NSMutableArray * observers = [[self expression] coalesceObservers];

        // Fast enumeration simply does nothing for an empty array.
        for (NSString * theToken in observers) {
            // findWidgetWithID iterates the document's widgets (a global variable).
            PDQAbstractWidget * referencedCell = [self findWidgetWithID:theToken];

            // Observing a nil object would match this notification from every cell.
            if (referencedCell != nil)
                [self addObserverToReferencedCell:referencedCell];
        }
    }
}

coalesceObservers constructs the observer list, avoiding duplicates, by copying one instance of each cellRef token into coalescedObservers before notifications are added.

- (NSMutableArray *) coalesceObservers {
    NSMutableArray * theTokens = [self createTokensForExpression];
    NSMutableArray * coalescedObservers = [NSMutableArray array];

    NSUInteger index = 0;
    while (index < [theTokens count]) {
        NSString * theToken = [theTokens objectAtIndex:index];

        if ([theToken tokenType] == eCellRefToken) {
            [coalescedObservers addObject:theToken];
            // Remove all occurrences of theToken from theTokens;
            // the next token slides down into position index.
            [theTokens removeObject:theToken];
        } else {
            index++; // Not a cell reference: skip it (otherwise the loop never ends).
        }
    }

    return coalescedObservers;
}
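The coalescing step is an order-preserving de-duplication of the cell-reference tokens. Here is a minimal plain-C sketch of the same idea over an array of token strings. coalesce_cell_refs and looks_like_cell_ref are hypothetical, and the leading-letter rule merely stands in for the tokenType == eCellRefToken test.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token test: treat identifiers (leading letter) as cell references. */
int looks_like_cell_ref(const char *token)
{
    return isalpha((unsigned char)token[0]);
}

/*
 * Copy one instance of each cell-reference token from tokens[0..n-1] into
 * out (capacity n), keeping first-seen order. Returns the number written.
 * Quadratic in n, which is fine for formula-sized inputs.
 */
size_t coalesce_cell_refs(const char *tokens[], size_t n,
                          int (*is_cell_ref)(const char *), const char *out[])
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_cell_ref(tokens[i]))
            continue;                       /* operators, numbers, etc. */

        size_t j = 0;
        while (j < count && strcmp(out[j], tokens[i]) != 0)
            j++;
        if (j == count)                     /* first time we've seen it */
            out[count++] = tokens[i];
    }
    return count;
}
```

For the tokens of "line7 + line8 * line7 - 2", only line7 and line8 come back, so each referenced cell gets exactly one observer.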

addObserverToReferencedCell registers the dependent cell with NSNotificationCenter, so that when theReferencedCell changes value and posts a PDQReferencedCellChanged notification, the dependent cell's handleReferencedCellChanged: method is called.

NSNotificationCenter then delivers the PDQReferencedCellChanged notification to the observer; the notification's object is the cell that changed.

- (void) addObserverToReferencedCell:(PDQAbstractWidget *)theReferencedCell {
    NSNotificationCenter * nc = [NSNotificationCenter defaultCenter];

    [nc addObserver:self
           selector:@selector(handleReferencedCellChanged:)
               name:@"PDQReferencedCellChanged"
             object:theReferencedCell];
}

handleReferencedCellChanged is the “action procedure” being set by addObserverToReferencedCell.

// Update the cell's value since a cell it depends on has changed.
- (void) handleReferencedCellChanged:(NSNotification *)notification {
    [self recalculate];
    // Now say "changed" to tell the cells that depend on me to update also.
    NSNotificationCenter * nc = [NSNotificationCenter defaultCenter];
    [nc postNotificationName:@"PDQReferencedCellChanged" object:self];
}
