High on the Hog Blog
Purveyor of Idle Observation

Please note: I'm no longer updating this particular blog, but keep it around for archival purposes. Visit me at the current blog at

Investigating Optical Character Recognition (OCR)

By: Luke Gilman

Part of my transition to law school mode has been trying to think of all the things it might be nice to know how to do before crunch time. I sat in on Johnny Rex Buckles’ Taxation of Non-Profit Organizations last week, and one thing I noticed was that whenever Buckles brought up an obscure or just overlooked aspect of the material, everyone was sent flipping through the book to find what he was referring to. Every time it happened my left hand subconsciously twitched into the CTRL + F position, which any good geek will tell you means “find,” as in: return me all instances of the phrase I’m about to type that occur in this here document. It’s an extremely useful feature. I sure wish my brain had it. All the students had laptops open and were taking notes, but the books were lying there outside the computer, pathetically undigital and uselessly unsearchable.

Why oh why didn’t they have a PDF version of their textbook? But then how would they get one? Book publishers are notoriously terrified of digitization, and the makers of big expensive law books are not likely to be an exception. If a digital copy is available, it will likely be some heinous DRM-laden executable. So assuming I get no help from the book publisher, how do I have my book and search it too?

Optical Character Recognition software has been around for a while, so my plan is to finally put it through its paces and see if it can handle a real-world problem in a cost-effective manner. More on this in days to come…
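For the geeks in the audience, the "find" half of the problem is the easy part once the text exists. Here is a minimal sketch, assuming the OCR step has already produced one plain-text string per page. The `pages` list, its sample sentences, and the `find_phrase` function are all made-up names and data for illustration, not output from any real OCR tool or textbook.

```python
def find_phrase(pages, phrase):
    """Return (page_number, offset) for every case-insensitive match.

    `pages` is assumed to be a list of strings, one per scanned page,
    as some OCR step might produce. Page numbers start at 1.
    """
    needle = phrase.lower()
    hits = []
    if not needle:
        return hits  # an empty search phrase matches nothing useful
    for page_number, text in enumerate(pages, start=1):
        haystack = text.lower()
        start = haystack.find(needle)
        while start != -1:
            hits.append((page_number, start))
            # Resume one character later so overlapping matches are found too.
            start = haystack.find(needle, start + 1)
    return hits


# Hypothetical sample pages standing in for OCR output.
pages = [
    "A private foundation is a kind of exempt organization.",
    "Exempt status can be lost. An exempt organization must file annually.",
]

for page_number, offset in find_phrase(pages, "exempt organization"):
    print(f"page {page_number}, character {offset}")
```

Lowercasing both sides is the crude version of what CTRL + F does by default; real OCR output would also need some tolerance for misread characters, which this sketch does not attempt.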

Category: law school, technology