You’ve Got Mail: Reading Addresses With OCR [Hackaday]

Last time I delivered on this column, I told you about the USPS’ attempts to fully automate a post office. Of course, that’s a bit of a misnomer, since it took 1,500 employees to actually operate the place on a daily basis. Although Project Turnkey in Rhode Island and Project Gateway in California were proving grounds for all kinds of mail sorting and processing equipment, the act of actually reading addresses and routing mail to its final destination still required human intervention and hand coding.

Today, the post office processes hundreds of millions of mail pieces each day using a variety of equipment. One important piece is the OCR address reader, which manages to make sense of all kinds of chicken scratch.

All Eyes On OCR

Image via Smithsonian Postal Museum

In their ever-increasing efforts to remove the human from the mail sorting operation, the USPS looked with a loving eye toward Optical Character Recognition, or OCR.

The post office was an early adopter of OCR, beginning their R&D in the 1950s. During this time, the Farrington Manufacturing Company began developing their Automatic Address Reader under contract with the USPS.

Within a few rounds of prototypes, this machine could recognize and register addresses almost anywhere on the face of the envelope, whether they were typed, handwritten, or imprinted, tightly-spaced or not, and whether the lines were flush or staggered. After confirming the addresses, the machine would sort the mail into various slots for local, long distance, and international destinations.

Although there were two ways for a machine to recognize characters, optical and magnetic, the optical way eventually won out. The optical operation employed photo-electric cells to sense the mail piece and then read the address. The magnetic method scanned for ink containing iron oxides. They both had their merits; although OCR had issues with lack of contrast and sometimes over-marking of addresses, it was ultimately the more practical choice.

As you will see in the video below, OCR machines could read 42,000 addresses per hour by 1970 in an operation called Line Find. The machine performs three steps for each piece of mail. First, it finds either the last line (city and state) or the second-to-last line (street address), depending on whether the letter is local or outgoing. Second, it measures the height of the characters. Finally, it reads the line.

How does it do this? A CRT shoots a beam of light through an “expanding optical system” at the face of the envelope. The beam produces a raster, which scans from right to left until it finds the address block. Then it finds the leftmost character and stops. All of this happens in five thousandths of a second.

Then the raster changes to a finer scan and takes a look at the first letter in the line to determine its size. Based on this, the raster wastes no energy on blank space, adjusting to the height of the rest of the line. The optical system uses the characteristics of letters such as horizontal lines on the left and various curves and lines to the right to determine the letter. There’s a lot more to it than that, but I won’t spoil this short but informative video for you.
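For the curious, here is a rough software analogy of the first two Line Find steps described above, plus the hunt for the leftmost character. It is only a sketch: the real machine did this with a CRT raster and analog electronics, not code. The function names, the 0/1 pixel grid, and the rule that outgoing mail uses the last line while local mail uses the second-to-last are all assumptions made for illustration.

```python
def find_text_lines(image):
    """Return (top, bottom) row ranges that contain ink.

    `image` is a list of rows, each a list of 0 (blank) or 1 (ink).
    This grid format is a hypothetical stand-in for the scanned envelope.
    """
    lines, start = [], None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a new band of ink begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # the band ended on the previous row
            start = None
    if start is not None:                  # ink ran to the bottom edge
        lines.append((start, len(image) - 1))
    return lines


def line_find(image, outgoing):
    """Pick the address line to read and measure it.

    Assumption: outgoing mail is routed by the last line (city and state),
    local mail by the second-to-last line (street address).
    """
    lines = find_text_lines(image)
    if len(lines) < 2:
        return None                        # not enough lines to choose from
    top, bottom = lines[-1] if outgoing else lines[-2]
    height = bottom - top + 1              # character height, in rows
    leftmost = min(
        x
        for y in range(top, bottom + 1)
        for x, pixel in enumerate(image[y])
        if pixel
    )
    return {"top": top, "bottom": bottom, "height": height, "leftmost": leftmost}
```

Calling `line_find(image, outgoing=True)` returns the row span, height, and leftmost inked column of the bottom line, which is where a reader would start a finer scan. Actually recognizing the characters is the hard part and is left out here.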

The Curse of Cursive

As you might imagine, the wild variations in people’s handwriting caused problems for OCR machines. But by analyzing the length and location of strokes, some handwriting could be deciphered. Today, OCR can read nearly everything: about 99% of addresses, even those written in tight or looping cursive. These days, if an address can’t be read by OCR, a picture gets sent to the Remote Encoding Center (REC) in Salt Lake City, UT for decoding by human eyes.

Check out this special keyboard they use at the REC.

Indeed, the REC’s operations are so vital that they have three ISPs coming in on three fiber lines at different points for redundancy. There used to be dozens of RECs across the United States, but OCR has gotten so good that they only need the one center these days.

Even so, the REC handles 1.2 million mail pieces per day, requiring a minimum of 7,150 keystrokes per hour from each operator. That means each operator processes one piece of mail every four seconds on average. So as you can see, the movement of mail requires human handling to this day.
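To see how those figures hang together, here is a quick back-of-the-envelope calculation. It assumes the four-second pace applies to a single operator (spread across the whole facility, 1.2 million pieces a day works out to roughly 14 pieces every second), and the derived numbers are my own arithmetic, not official USPS statistics.

```python
# Figures quoted in the article.
SECONDS_PER_PIECE = 4          # assumed to be per operator
KEYSTROKES_PER_HOUR = 7150     # minimum per operator
PIECES_PER_DAY = 1_200_000     # whole facility

# Pieces one operator gets through in an hour at that pace.
pieces_per_hour = 3600 / SECONDS_PER_PIECE

# Keystrokes that leaves for each piece of mail.
keystrokes_per_piece = KEYSTROKES_PER_HOUR / pieces_per_hour

# Total operator-hours needed to clear a day's volume.
operator_hours_per_day = PIECES_PER_DAY / pieces_per_hour

print(f"{pieces_per_hour:.0f} pieces per operator-hour")
print(f"{keystrokes_per_piece:.1f} keystrokes per piece")
print(f"{operator_hours_per_day:.0f} operator-hours per day")
```

In other words, each piece gets about eight keystrokes, and a day's volume adds up to something over 1,300 operator-hours of keying under these assumptions.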

In the video below, Tom Scott takes a trip to the REC and learns how to read and encode mail so that it can move forward and be delivered. It’s an interesting process that requires a special keyboard that puts the numbers on the home row, with a host of modifiers and other keys taking their place along the top.

First, unless it’s missing entirely, the C portion of the address (the ZIP code) is deciphered and coded, then the outward portion of the address (city and state), and then the inward portion (the street address). The REC has every known good address in America sitting on their servers, and once they get a match, the plant that has the mail piece is notified immediately where to send it, and the piece moves forward. All of this for the low price of 66 cents per ounce. Amazing, isn’t it?
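The keying order described above (ZIP code, then city and state, then street) can be pictured as a lookup that narrows a list of candidates one field at a time. The sketch below is purely illustrative: the three-entry address list, the field names, and the stop-at-the-first-unique-match rule are all made up for the example, and the REC's real software is certainly more involved.

```python
# Hypothetical stand-in for the database of known good addresses.
# The street addresses are invented for this example.
KNOWN_ADDRESSES = [
    {"zip": "84101", "city_state": "SALT LAKE CITY UT", "street": "123 EXAMPLE ST"},
    {"zip": "84101", "city_state": "SALT LAKE CITY UT", "street": "456 SAMPLE AVE"},
    {"zip": "02903", "city_state": "PROVIDENCE RI", "street": "123 EXAMPLE ST"},
]


def narrow(candidates, field, keyed_value):
    """Keep only candidates matching what the operator keyed for one field.

    A value of None means the field was missing or unreadable, so skip it.
    """
    if keyed_value is None:
        return candidates
    return [addr for addr in candidates if addr[field] == keyed_value]


def encode(zip_code, city_state, street):
    """Apply the fields in keying order and stop at the first unique match."""
    candidates = KNOWN_ADDRESSES
    keyed = (("zip", zip_code), ("city_state", city_state), ("street", street))
    for field, value in keyed:
        candidates = narrow(candidates, field, value)
        if len(candidates) == 1:
            return candidates[0]   # matched: the plant can route the piece
    return None                    # still ambiguous, or no match at all
```

With this toy data, `encode("02903", None, None)` matches on the ZIP code alone, while `encode("84101", None, "456 SAMPLE AVE")` needs the street line to settle on a single address.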

But Wait, There’s More

Stay tuned for more about the USPS’ advancements, including ZIP codes, vending machines, and something called v-mail. We’ll also take a look at ways the USPS has attempted to improve productivity and service as well as the customer experience. And no, I haven’t forgotten about that bit of trivia that I promised.