Friday, August 12, 2011

Document mark-up and correction with voice recognition

I spend a lot of my time commenting on document drafts. Traditionally, I do this with a red pen, and I hand the marked-up copy back to the author.

The traditional approach works well, for several reasons:
  • Marking up with a pen gives great flexibility to draw pictures and to relate chunks of text with freehand arrows.
  • It is easy and natural to flip among pages and to amend previous comments.
  • There is no need to be connected to a computer, so it does not contribute to my hand and eye strain.
The traditional approach also has some disadvantages:
  • When my collaborator doesn't work in my building, I have to send the comments by postal mail, or else scan them in color and email the scan. The scanned version is invariably much harder to read than the hardcopy.
  • Giving comments to multiple people on a collaborative project requires photocopies/scans, or else sharing a single hardcopy.
  • My handwriting is sometimes hard to read.
I still frequently use the traditional approach, but I also sometimes give back comments electronically, using voice recognition.

I load a PDF onto my computer, point to some text of interest, and speak my comments about that text. My comments are transcribed into text annotations in the PDF, which I can email back to the author.

Because I use a tablet computer and a stylus, I can do all this while reclining on my couch, which saves me the eye and hand strain of sitting at the computer and typing. For long comments, this approach is considerably faster than typing, even accounting for correcting occasional speech recognition mistakes. For shorter comments, it's about the same speed, but the greater comfort and the convenience of the electronic form make it well worth doing. It has improved an activity that I spend many hours on each week.

If you haven't used voice recognition recently, you owe it to yourself to give it another try. I was really impressed with the accuracy, which is much better than it was even a few years ago.

There are three key components to my setup:

1. A tablet computer with a stylus

I use a ThinkPad X61s, an older model that has since been replaced by newer ones.

You want a real computer with a decent CPU, not a “slate computer” or “tablet” such as the iPad and its rivals. The reason is that voice recognition software is extremely CPU-intensive, and your computer will be going all-out to provide you accurate speech recognition.

My setup would work with any laptop/notebook computer, not just a tablet computer, but I love being able to get away from my desk and change my posture.

2. Dragon NaturallySpeaking

Dragon's product is so dominant (and so good!) that there isn't much competition in this product space. I tried to find a usable speech recognition program that would work under Linux, but there doesn't seem to be one.

Microsoft Windows 7 comes with built-in speech recognition that is pretty good. However, it is not quite as good as Dragon NaturallySpeaking. More importantly for me, the Windows 7 built-in speech recognition cannot type into every text box in every application; in particular, it cannot type into pop-ups in Acrobat Professional.

Thanks to this new competition, NaturallySpeaking retails for only $200 ($100 for the “home” version, which I have not tried), though you can find it even cheaper and there is also a 50% academic discount. It's well worth the price.

Interestingly, NaturallySpeaking works better the faster you talk. That is, it works best when you speak in full sentences, without pauses in the middle of a sentence. The reason is that it uses context to determine what word you meant to say. When marking up with a pen, I typically write one part of a sentence at a time and think about the best way to convey the rest of my thought. This still works with voice recognition, but you may end up doing a bit more correction than you would if you spoke without pausing.
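To give a feel for why context helps, here is a toy sketch in Python. It is only an illustration of the general idea of choosing among similar-sounding words by looking at their neighbors. It is not how NaturallySpeaking works internally, and the word-pair counts and the names `PAIR_COUNTS`, `score`, and `pick` are all made up for this example.

```python
# Toy illustration only: the counts below are invented, not real data,
# and this is not Dragon's actual algorithm.
PAIR_COUNTS = {
    ("we", "right"): 3,
    ("we", "write"): 3,
    ("right", "wrongs"): 6,
    ("write", "code"): 7,
}


def score(prev_word, candidate, next_word=None):
    """Score a candidate word by how often it appears next to its neighbors.

    Adding 1 to each count keeps unseen pairs from zeroing out the score.
    """
    s = PAIR_COUNTS.get((prev_word, candidate), 0) + 1
    if next_word is not None:
        s *= PAIR_COUNTS.get((candidate, next_word), 0) + 1
    return s


def pick(prev_word, candidates, next_word=None):
    """Return the highest-scoring candidate (the first one wins a tie)."""
    return max(candidates, key=lambda w: score(prev_word, w, next_word))


sound_alikes = ["right", "write"]

# Spoken without a pause: the following word settles the choice.
print(pick("we", sound_alikes, "code"))
print(pick("we", sound_alikes, "wrongs"))

# Spoken with a pause after the ambiguous word: both candidates score
# the same, so the choice is effectively a guess.
print(score("we", "right"), score("we", "write"))
```

In this sketch, the word after the ambiguous one is what breaks the tie, which matches my experience that pausing mid-sentence leads to more corrections.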

3. Adobe Acrobat Professional

When you select text, Acrobat lets you apply annotations/comments/markup to it, which highlights the text and associates a pop-up note with it. You can choose among different types of annotations: a comment, replacement text, crossing out, etc. Here are examples of how it looks:

After you have added your annotations, you can save the file and send it to your colleague. Your colleague can then click on each highlighted snippet of text to see your comments. This is a bit of a pain, because they have to be in front of a computer, have to click on each one individually, and can easily forget which ones they have already read. Some people I work with like this format, but most do not. Therefore, I always send my comments in two formats: the annotated PDF, and what Acrobat calls the “comments summary”. This is a format, suitable for printing, in which the original document is displayed along with a list of comments.

Acrobat 8 and 9 have a menu item “Comments > Summarize comments”, which offers 4 ways to summarize comments, including the one in the linked images (which is my favorite). Acrobat X has less functionality, and it is harder to find: there is only one way to create the comments summary, and it appears as the “Summarize comments” button in the “File > Print” dialog box. Adobe has documentation about printing a comment summary.

I tried using free or cheaper products, such as those from Foxit, but their text annotation features are much more limited. They didn't have as many types of annotations, were less versatile, were visually uglier, and, most importantly, didn't seem to have functionality analogous to Acrobat's “summarize comments”.

3 comments:

Nobody said...

I've been tempted by voice recognition. My options right now are to run NaturallySpeaking inside a Windows VM, or wait until Dragon releases a Mac product with feature/price parity. I don't see either of these happening soon.. :(

Unknown said...

Cool post! Regarding CPU usage for voice recognition, the voice recognition on Android phones is surprisingly accurate, and it works by sending the audio over the network and performing the recognition on a server. I haven't tried dictating paragraphs of text this way (though 2 or 3 sentences works fine), and it won't work in an offline setting, but this kind of functionality isn't completely out of the question for an iPad-like device.

Michael Ernst said...

@rakingleaves: I've been trying voice recognition on an Android phone for three months now -- I wanted to give it a fair trial before responding. My take is that it works well if there isn't much ambient noise, you aren't using any specialized vocabulary, and you are connected to the network. For me, it's not yet a competitor for local voice recognition with training for your own voice and your vocabulary.