Awesome Automation for a Paperless Office
I'm a toolsmith. One thing I pride myself on is finding better ways to accomplish tasks. This usually saves not only time but resources (read: money). (That's why I named my company Awesome Automation, and that's the thrust of this post: I created an Awesome Automation tool.)
About four years ago, I made a conscious effort to reduce my business (and personal) clutter. As an independent contractor, I wear many hats: secretary, business development, accountant, and janitor. The problem is that none of these are revenue generating.
What brings money in the door is when I actually produce code for a client. So the more time that I can focus on that, and less time on the other aspects, the better my bottom line becomes. However, I cannot ignore those other aspects of being my own boss.
One aspect that I could change to reduce the amount of time spent on non-revenue generating tasks was to automate how I save, search for, and organize all the paper that enters my office. In 2008, I had to buy a lateral file cabinet as my paper collection grew. Taxes. Invoices. Receipts.
When the Check Clearing for the 21st Century Act (Check 21) took effect, it prompted me to look at how much of a time-suck pushing paper around my office had become.
Since the IRS was now allowing electronic records, I no longer needed to keep every paper receipt—I could keep an electronic copy and be in compliance with the law. I wanted to make the process of creating electronic records as easy as possible, which would involve creating an automated workflow.
Converting to a Paperless Office
On my journey to tame my paper tiger, I started with some comparative research on scanners. I settled on the Fujitsu ScanSnap iX500. It wasn't cheap; I can tell you that I paid $428.42 from Tiger Direct on June 5, 2014.
I pulled out that info in under a minute by searching for “Fujitsu” in my Scans folder.
All the paper that enters my office goes into one of three bins: recycle, scan/shred, or scan/keep. Every so often, probably weekly, I turn on the scanner and process the “scan/*” bins.
The ScanSnap connects to my MacBook Pro over the Wi-Fi network. There's an included app, ScanSnap Manager, that automatically loads when it sees the scanner on the network.
The ScanSnap Manager has the ability to have many profiles. I have one that will scan paper in full color, saved as JPG, in a folder of my choosing. I do this often with the items in the scan/keep bin, as well as with items like colorful artwork and ticket stubs.
I've named the default profile, which I use on the scan/shred bin, B&W (Black & White). This is for my paperwork. A two-page utility bill comes in at about 150 KB. This profile is set to “Scan to Folder” (/Users/dmayo/Scans) with the file name formatted as “YYYY-MM-DD-HH-MM-SS.pdf”. The image quality is set for “normal” (B&W, 300 dpi, duplex, automatic image rotation, deskew, and blank page removal). This profile saves a searchable PDF with a compression rate of 4.
Great. Now I can shred those documents rather than file them away. (By the way, I sold my lateral file cabinet in 2015 after scanning nearly everything. All the paper that I keep fits in two banker boxes.) This process is why I was able to recall the cost of the scanner with ease.
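Because the B&W profile names every file with a timestamp, the scan time can be recovered from the file name alone. Here is a minimal Python sketch of that idea, assuming the “YYYY-MM-DD-HH-MM-SS.pdf” pattern described above; the function name and the sample file name are hypothetical, not anything ScanSnap Manager provides.

```python
from datetime import datetime
from pathlib import Path

# strptime pattern mirroring the "YYYY-MM-DD-HH-MM-SS" naming scheme
SCAN_NAME_FORMAT = "%Y-%m-%d-%H-%M-%S"


def scan_timestamp(path):
    """Return the datetime encoded in a scan's file name, or None if it does not fit."""
    p = Path(path)
    if p.suffix.lower() != ".pdf":
        return None
    try:
        return datetime.strptime(p.stem, SCAN_NAME_FORMAT)
    except ValueError:
        # The name does not follow the timestamp pattern (e.g. a renamed file)
        return None


# Hypothetical file name, used only for illustration
print(scan_timestamp("/Users/dmayo/Scans/2014-06-05-10-30-00.pdf"))
```

Sorting by file name already sorts by scan time with this scheme, so a helper like this is only needed when filtering by date range.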
This process of scanning paper documents has never been enough, though. Formats change. Storage media become obsolete. (Have you seen my collection of Zip drive disks? Or my JMP Stats Program on floppy disks? Those are sitting in a small storage box alongside a few 500 GB hard disk drives.)
So, if I'm going to do this right, I'll want my PDFs to be in an archival format. According to Wikipedia:
PDF/A is an ISO-standardized version of the Portable Document Format (PDF) specialized for use in the archiving and long-term preservation of electronic documents. PDF/A differs from PDF by prohibiting features ill-suited to long-term archiving, such as font linking (as opposed to font embedding) and encryption.
Unfortunately, PDF/A is not an option in my ScanSnap Manager software, so I needed to find a way to convert standard PDFs into the PDF/A-2b archival format.
My Electronic Paper Workflow
My scanning workflow continued with the opening of each scanned PDF into Adobe Acrobat Professional X and converting it to the archive-friendly PDF/A-2b format. Yes, this is a very old version of Acrobat, but a number of features changed in later versions which make automation more difficult. So I'm happy with this version, and you can still find copies on eBay.
Still, this workflow wasn't ideal. Too much manual touching of the files. Not very “awesome” nor very “automated.” I needed to change that.
Acrobat has a workflow widget for automating tasks called the Action Wizard. I created one that would convert a scanned PDF to the archival format, check for any errors, rename the file, and save it in a dedicated folder (/Scans/Converted-2b). All I needed to do was open a Fujitsu scanned PDF in Acrobat and click on a menu item.
This cut my time invested per document down a bit. But not enough.
I wanted to automate this task as much as possible; I’d rather not even open an app and click a menu item. To make this awesome, I needed some new tools.
My Thought Process on Automating This Task
To make this workflow move faster, I decided to try the macOS automation tool, Hazel. I know that there are enough tools built into macOS that would do what Hazel does. However, Hazel is more streamlined, and it is easy to keep track of processes that enable my new paperless workflow.
(For Windows users, two popular alternatives to Hazel are File Juggler and AutoIt. These programs offer rule-based monitoring and can run scripts written in a BASIC-like language, as well as any command-line command or program.)
Hazel works by watching for changes in folders. Some events that trigger Hazel can include a new file being added, a modified date changing, or even a change in a file’s metadata. You determine what Hazel actually watches for by setting up rules.
A common rule included in the initial install watches your Downloads folder. If any file becomes older than four weeks, Hazel adds a color tag to that file. This is visible in Finder and can help you keep your Downloads folder clean. Hazel accomplishes this in two clicks. AppleScript and a cron job could accomplish the same thing, but not that quickly (and for a $32 license, I find it very much worth it).
What I want my automated paperless workflow to accomplish:
- Create an optical character recognition (OCR) PDF of a scanned document
- Open newly scanned PDF file in Acrobat
- Initialize the Action Wizard: Convert-2b-file
- Save the converted file in ‘/Scans/Converted-2b’ folder
- Move original PDF to ‘/Scans/original’ folder
Simple enough, right? If it were that simple, everyone would be a toolsmith!
The first task is accomplished with the Fujitsu scanner and its included app.
The second task is where I will first use Hazel. I’ve created a Rule titled “Convert PDF 2b” and associated it with the “/Scans” folder, which is where PDFs are saved when a document is physically scanned.
Since there is an associated rule, Hazel is watching this folder for any changes. I can move a file into this folder, ScanSnap can create a new PDF, and I can alter a document within this folder (which would change that file’s “modified” timestamp). Whatever change happens, Hazel will notice and compare that change to all Rules associated with this folder.
The trigger used in this Rule, as illustrated below, looks for any new PDF file in this folder.
If this Rule is triggered, Hazel will do the following on the selected file that matched this Rule:
- Run the embedded AppleScript (see below)
- Move the matched file to the “/Scans/original” folder
Seems simple enough. The AppleScript will take the matched file and send it to Acrobat to be processed via the Action Wizard. The Acrobat process creates a new PDF and saves it in the “/Scans/Converted-2b” folder. Hazel will then move this matched file to the backup folder (“/Scans/original”).
When I was first learning to code AppleScript, I found it to be very conversational, but like all coding languages, its syntax is strict. In this case, I needed the official app name, “Adobe Acrobat Pro,” as well as what is displayed in the menu bar, just “Acrobat.” We will also need to know what various window names are. So, here’s the embedded AppleScript for this Hazel Rule:
The first “tell / end tell” block tells macOS to “activate” or open the app Adobe Acrobat Pro and load the file that Hazel is processing, “theFile”.
The second “tell / end tell” block tells the operating system, macOS, via its System Events, to virtually click on menu items within the Acrobat program. Basically, it is walking through the Graphical User Interface (GUI) that you and I use. AppleScript takes its instructions for navigating the GUI backwards. Here's the full line of code:
You can see the GUI in this screenshot:
Which reads: On the top menu bar, click on the “File” menu item, then in that dropdown, click on the menu item “Action Wizard,” which will open a child menu, and then click the menu item “Convert-2b-file” (the name of my Action Wizard).
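The inside-out ordering can be illustrated with a small Python helper that turns a menu path, read the way a person clicks it, into the kind of nested “of” reference that System Events GUI scripting generally uses. This is a generic sketch, not a reproduction of the line used in my Hazel rule: the function name is hypothetical, and the “menu 1” and “menu bar 1” elements are assumptions about a typical app's menu hierarchy.

```python
def gui_reference(menu_path):
    """Build an innermost-first reference string from an outermost-first menu path."""
    if len(menu_path) < 2:
        raise ValueError("need a top-level menu name and at least one menu item")
    top, *items = menu_path
    # Walk the hierarchy the way a person does: menu bar, then menu, then item
    outer_to_inner = ["menu bar 1", f'menu bar item "{top}"']
    for item in items:
        outer_to_inner += ["menu 1", f'menu item "{item}"']
    # AppleScript names the innermost element first, so reverse the order
    return " of ".join(reversed(outer_to_inner))


print(gui_reference(["File", "Action Wizard", "Convert-2b-file"]))
```

The helper does not escape quotation marks inside menu names, so it is only suitable for simple names like the ones here.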
When the AppleScript “clicks” on the menu item “Convert-2b-file,” the Acrobat Action Wizard will load and start processing the file (theFile) that Hazel gave to Acrobat into the archival format, and save this new archival PDF into the “/Scans/Converted-2b” folder.
Hazel has saved me from touching each newly scanned PDF file. I no longer need to open each PDF in Acrobat, navigate the menu structure, and activate the Action Wizard. Excellent.
Initial Testing and Problem Solving
I let Hazel loose. But first, because of my many years of writing utilities, I duplicated my “Scans” directory. Just in case.
Whew. Glad I did.
I had a number of scanned PDFs sitting in my “Scans” folder awaiting conversion to the archive format, and Hazel was happy to start processing them. All. At. Once.
Suddenly I had multiple Acrobat windows open, each running the Action Wizard on a different file. That seemed OK.
However, I ran into a problem.
Each run of the Action Wizard ends with a dialog box whose “Close” button must be clicked. Until it is clicked, no other actions can take place within Acrobat (even closing the application). Fine; we’ve already seen that AppleScript can click on things.
But AppleScript is not so good at waiting around for that dialog box to appear after the Action Wizard completes its process.
To complicate any solution, each file being processed by the Action Wizard has a different number of pages and a different number of OCR'd (Optical Character Recognition) characters on each page. Therefore, each file would take a different amount of time to process, and thus the Close dialog box would not appear after a set amount of time.
I could write some AppleScript code that would keep polling, or looking for that dialog box to appear, then tell the “System Events” to click it.
Could I figure out how to get AppleScript to accomplish waiting around for that closing dialog box? I'm sure. But I'm not sure it would be worth the effort and time.
Sure, I'd be learning more and expanding my toolbox, but there's a better way (for me).
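For reference, the polling approach I set aside boils down to a loop like the one below. It is sketched in Python rather than AppleScript, and every name in it is hypothetical; the condition passed in would be whatever check reports that the dialog box has appeared.

```python
import time


def wait_until(condition, timeout=60.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Call condition() repeatedly until it returns True or the timeout passes.

    Returns True if the condition was met, False if the wait timed out.
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Illustration: a condition that becomes true on its third check
checks = []
print(wait_until(lambda: checks.append(1) or len(checks) >= 3, interval=0))
```

The timeout matters: without it, a conversion that fails before the dialog appears would leave the script waiting forever.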
Coming to a Solution
I need to know when Acrobat’s Action Wizard Close dialog box appears. What triggers that “Close” dialog box to appear? The completion of the Action Wizard, which in its final step, recall, saves a copy of the original PDF, in archival format, in the “/Scans/Converted-2b” folder. Watching folders—exactly what Hazel does so well.
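Using the output folder as a completion signal can be sketched in a few lines of Python. This is not how Hazel is implemented (I don't know its internals); it only shows the idea of comparing a folder's contents against what has been seen before. The function name and the `seen` set are hypothetical.

```python
from pathlib import Path


def new_pdfs(folder, seen):
    """Return names of PDFs in folder that are not yet in seen, and remember them."""
    current = {
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    }
    fresh = sorted(current - seen)
    seen |= current  # update the caller's set so each file is reported once
    return fresh
```

Each call reports only the PDFs that appeared since the previous call, so a new name in “/Scans/Converted-2b” would mean the Action Wizard has just finished a file.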
To solve the problem of closing the Action Wizard dialog box, let’s set up a new Hazel Rule within the “/Scans/Converted-2b” folder to watch for new PDFs. When one appears, we know that the Action Wizard has completed and its Close dialog box is waiting to be closed.
So this rule looks in the converted folder for files that meet these conditions:
- Kind is PDF
- No color label
- Date is today
If these conditions are met, then Hazel will run an embedded AppleScript and set the color label to green (to signal that this file is done). Here's the AppleScript:
The “System Events” process, aka the operating system, tells the process “Acrobat” (which is the main process of Adobe Acrobat Pro) to find an Acrobat window titled “Convert-2b-file” (the name of my Action Wizard) and click the “Close” button.
Then it tells the app, Adobe Acrobat Pro, to quit/close.
After the AppleScript runs, since this folder is the final storage place of archived PDFs, we need to differentiate between newly created/saved PDFs and ones that are older. (Hazel cannot remember what files she has already acted upon.) So, in addition to running an AppleScript to close the Action Wizard dialog box, we will add a color tag to the file’s metadata, resulting in Hazel only looking for new PDFs without a color tag to trigger this Rule.
This is a nice way to set triggers for Hazel without actually changing anything about the file itself (like internal metadata, or a filename, or anything else that might conflict with the PDF/A-2 archive standard).
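The conditions and the color tag together act as a "handle each file once" guard. Here is a minimal Python model of that logic, with the tag held in memory instead of in Finder metadata. The `ScanFile` class, the `dated` field (standing in for whichever date Hazel compares against "today"), and the `action` callback are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ScanFile:
    name: str
    dated: date                  # stand-in for the date the rule checks
    color: Optional[str] = None  # stand-in for the Finder color tag


def matches_close_rule(f, today):
    """Kind is PDF, no color label, date is today."""
    return f.name.lower().endswith(".pdf") and f.color is None and f.dated == today


def run_close_rule(files, today, action):
    """Run action on each matching file, then tag it green so it never matches again."""
    handled = []
    for f in files:
        if matches_close_rule(f, today):
            action(f)          # in the real rule: the embedded AppleScript
            f.color = "green"  # the tag is the only record that the file is done
            handled.append(f.name)
    return handled
```

Running the rule a second time over the same files handles nothing, because every file it touched now carries a tag.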
Second Testing and Problem Solving
So I closed out those multiple Acrobat windows, restored my “Scans” folder, and moved all but one PDF out. I then let Hazel loose again.
She saw the PDF which matched my conditions, ran the AppleScript, and died. Ugh. I checked the Hazel logs, and it turns out that she is too quick. She was moving the original file before the AppleScript was done processing that file. Acrobat was complaining that there was no original file to work on.
I need to break that first Hazel process up into two steps: First just convert the PDF, then move the original.
So I altered the first Rule to convert the PDF via AppleScript calling Acrobat’s Action Wizard, then tag the file with a color. I like purple. This way, we can tell which PDFs in the “/Scans” folder have been converted. We’ll move them to the “/Scans/original” folder later.
I could just delete the original file, as we now have an archival PDF, but I’ll keep the original as a backup just in case. I’ll probably delete all the originals sometime in the future when I see and use, with success, the converted files.
The third Hazel rule is in the “Scans” folder and looks for PDF files with a purple tag. When it finds such a file, it moves it to its final destination, “Scans/original,” and changes the color tag from purple to gray.
Enabling the Automated Process
I was able to successfully process all the backlog of the Fujitsu-scanned PDFs by dragging them one at a time into the “Scans” folder. I was reluctant to move more than one file at a time due to my first test of this workflow.
I rationalize this because moving forward, I'll scan a document, and Acrobat and Hazel should finish their processes quicker than I can get a second document ready and inserted into the scanner.
I did just that. I fed in my first document. ScanSnap started scanning and OCR-ing the file. When it was done, it presented a dialog box where I could confirm the save location and look at a preview if needed.
Here's the real world for your utility scripts: I found something out. When the paper document is scanned, ScanSnap immediately saves the file before it begins the OCR process. If I change a parameter in its preview dialog, it would rename it, or move it, or do whatever I changed. That's fine.
However, it created the file in the ‘/Scans’ folder. Hazel noticed. She started processing it. Probably OK. But if I changed something in the preview dialog box, would it screw up what Acrobat is in the process of converting?
So, I went back into my first Hazel rule, the one that hands files to Acrobat, and added a condition. A file will match if it is a PDF, has no color tag, and now, the created date has to be more than two minutes old. That will give the ScanSnap program enough time to complete before Hazel gives the file to Acrobat.
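Expressed as a predicate, the amended conditions look roughly like this. It is a Python sketch of the rule's logic only; the function and parameter names are hypothetical, and the color tag is passed in as a plain value.

```python
from datetime import datetime, timedelta

# Quiet period that lets ScanSnap finish writing the file
SETTLE_TIME = timedelta(minutes=2)


def ready_for_conversion(name, color, created, now):
    """True if the file is an untagged PDF created more than two minutes ago."""
    is_pdf = name.lower().endswith(".pdf")
    untagged = color is None
    settled = (now - created) > SETTLE_TIME
    return is_pdf and untagged and settled
```

A fixed delay is a blunt instrument: it assumes ScanSnap never needs more than two minutes, which has held for my documents but is not guaranteed for very long scans.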
My final Hazel Rule will move all the purple-tagged PDFs in the “/Scans” folder if the date was “yesterday.” So at midnight, or when I boot the computer in the morning, Hazel moves all the original, already converted PDFs to the “/Scans/original” folder.
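The clean-up step can be modeled the same way. In this Python sketch, a dictionary stands in for the Finder tags and for whichever date Hazel compares against "yesterday"; the function name and the dictionary layout are assumptions, and the recoloring to gray follows the third rule described earlier.

```python
import shutil
from datetime import date, timedelta
from pathlib import Path


def archive_originals(scans_dir, tags, today):
    """Move purple-tagged PDFs dated yesterday into scans_dir/original.

    tags maps file name -> (color, file_date). Returns the names that were moved.
    """
    scans = Path(scans_dir)
    target = scans / "original"
    target.mkdir(exist_ok=True)
    yesterday = today - timedelta(days=1)
    moved = []
    for name, (color, file_date) in sorted(tags.items()):
        source = scans / name
        if (color == "purple" and file_date == yesterday
                and source.suffix.lower() == ".pdf" and source.is_file()):
            shutil.move(str(source), str(target / name))
            tags[name] = ("gray", file_date)  # mark the original as archived
            moved.append(name)
    return moved
```

Deferring the move by a day is what keeps this step from racing Acrobat: by then, any conversion that was going to read the original has long since finished.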
I’ve been running this automated process for a couple of weeks now. It’s running as expected as I scan in new documents. I’m not touching any of the electronic files. I scan paper, then immediately shred it. This process has released me from a chore and kept my office (and mind) cleaner, allowing me to create more revenue.
This productivity tool joins the many others I rely on: rules for inbound email, to-do lists, keyboard shortcuts, apps like IFTTT and Trello, and even getting good sleep.
I’ve probably spent 12 hours creating this process and debugging it, as well as writing this post. I will recoup this initial cost. It may save me 20 minutes per week, which doesn’t sound like a huge productivity boost, but will compound over time.
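The break-even point is simple arithmetic, shown here as a short Python calculation using the rough figures above (12 hours invested, 20 minutes saved per week). Both numbers are estimates, so the result is a ballpark.

```python
def break_even_weeks(hours_invested, minutes_saved_per_week):
    """Weeks until the time saved equals the time invested."""
    return hours_invested * 60 / minutes_saved_per_week


# 12 hours = 720 minutes; 720 / 20 = 36 weeks
print(break_even_weeks(12, 20))  # prints 36.0
```

At that rate the investment pays for itself in about 36 weeks, and everything after that is time gained.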
If this extra 20 minutes allows me to get 1% more done compared to everyone else, then maybe I can land that competitive gig, or spend some more time with my kids. Not a bad investment of time.
Someone told me long ago: get the important stuff done, and don’t waste time on stupid stuff. Keeping records for the tax man may not be stupid, but it’s definitely not the important stuff.