REVIEW: Acrobat Capture 3.0x: Adobe Fires Back, Nearly Misses

Thursday, August 24, 2000
by Duff Johnson

From the start in 1994, Acrobat Capture 1.01 was a visionary product. In those earliest days of the commercial Internet, the decision to develop Acrobat Capture offered the promise of unifying paper and electronic source materials in a single, compact, standardized, multi-platform environment. This lofty aim helped propel and fulfill the PDF concept, making possible the modern reality: PDF as the globally accepted "full-fidelity" electronic document format.

The only document-imaging product in the Adobe lineup, Capture is aimed at the "production" volume imaging market. Unique among OCR software developers, Adobe recognized that fully automated processes and text-accuracy-only correction sub-systems would leave conversions to PDF/Normal files -- i.e., files made from original electronic sources -- hopelessly out of reach.

The original model for Capture was a faux client-server arrangement: a central processing core for automated functions and a workstation application for value-added (text accuracy, page layout and image enhancement) work. Capture 3.0x expands on this premise with beefed-up automated processing and a true client-server arrangement for the value-added labor.

Gone is the simple "flowchart" dialog. Capture 3.0x sports a busy new interface (screengrab 1) with a Windows Explorer-esque workflow builder mated to a raft of processing information. There is a new Zone tool, simple and reasonably functional, and a superb new QuickFix text correction tool. Capture Reviewer got a major upgrade as well, although the changes are subtle compared to the overall application.

The Engine

One impressive new feature in Capture 3.0x is true load-balanced distributed processing for the OCR and other automated engines. Add a processor to the Clusters, and watch throughput go up as other processors come on-line.
It's stable too, with far fewer crashes than the 2.01 software. Exception handling has been beefed up, but is still weak compared to full-fledged volume imaging systems such as Input Accel and Ascent Capture. Capture includes a scanner-control front-end, but although it is aimed at the volume scanning market, the software is still best used to process existing scanned, enhanced, rotated and quality-controlled images.

A moving target throughout the development process, OCR in the Capture 3.0 engine promised a substantial improvement over the aged engine in Capture 2.01 -- although we have yet to see this promise fulfilled in the beta 3.0x code. Tuning issues with the 3.01 engine aside, the new Capture should be substantially more accurate overall, particularly on clean documents and small, clear type.

The OCR issue we encounter more than any other, however, is a frequent refusal to capture individual text blocks, particularly those located near images. This problem resulted in a very serious flaw in the 3.0 release: large text areas going without OCR at all. Adding a template zone stage to the workflow helps, but does not fully correct the problem. We chose not to use the 3.0 code for Searchable Image production.

An additional concern is the font recognition in the new engine, which we evaluate (so far) as worse than Capture 2.01 in terms of font size and spacing.

New Tools: The Zone Tool

Long-awaited, the new Zone tool (screengrab 2) does what every Zone tool does, and not much more, which is a pity in a PDF-directed product like Capture. After all, we're not just talking OCR here! We would like to see the ability to control the output of each image zone, as is offered with PageGenie, not to mention a powerful table-recognition tool, as in Scansoft's TextBridge software. Additionally, the engine tends to crash on pages with very many zones. As always, your mileage may vary!
It frequently seems necessary to include a full-page "text zone" step in each workflow to force OCR on bitmap regions that appear as images to the software. Missing this step can result in a hard-to-detect failure to OCR in Searchable Image pages, another problem Adobe says it will attempt to correct in the release version of Capture 3.01. We still get mixed results on selected pages, and I suspect that this type of failure to OCR will remain a significant annoyance, as it is with most OCR software.

New Tools: QuickFix

This new tool has a unique interface that brings rudimentary sorting to the process of OCR error correction. The concept allows for some pretty sophisticated text-correction routines. QuickFix (screengrab 3) allows you to sort suspects, so that correction can focus on terms to be added to a dictionary, or on alphanumeric words of low confidence. It's primarily useful for Searchable Image text correction.

Improved Tools: Capture Reviewer 3.0

Already a reasonably mature tool in 2.01, Reviewer 3.01 (screengrab 4) is the last word (so far) in layout and appearance management for PDF/Formatted Text and Graphics conversions. A central irritation is that screen paints are slower, which hampers productivity. Valuable new features include:
The Overall Experience

I saved this for last, because the reader should not be distracted from the many fantastic improvements to this ground-breaking third-generation software. HOWEVER! Out of the box, we found Capture 3.0 virtually unusable. From botched OCR (a problem mostly cured with a forced Zone operation) to basic image quality in skew-adjusted images, we could not put Capture 3.0 in production at all. Our extensive testing of the forthcoming 3.01 code promises much better things to come.

A major advantage of Capture 3.0x is in file management. The workflow design will automatically move your files from one step to the next. This saves a lot of the administrative overhead for high-volume production compared with Capture 2.01.

One sour note: As of the last beta version, Capture 3.0x is unusable for color output. Images are highly compressed and, in PDF/Formatted Text & Graphics output, there is gray shading surrounding text and images. We certainly hope Adobe addresses this issue, because we won't be Capturing in color unless they do!

Conclusion

Are there alternatives to Capture for converting paper to PDF? In a word, yes. PageGenie is capable of fine Searchable Image PDF files, and can optionally use PDFWriter to access the Adobe Libraries, if required for the project. Most other alternatives offer far better OCR than Capture 2.01, and most are faster and more stable as well.

The key advantages of Adobe's new Capture product lie in its scalability, stability and the degree to which it will reward the sophisticated user. Just as important is the fact that Adobe's Capture remains the only product with a post-OCR text/layout/images editor (Capture Reviewer) that makes true layout reproduction possible.

Competitors

Since the mid-1997 release of Acrobat Capture 2.0 (and the must-have 2.01 maintenance update), options for paper-to-PDF conversion have multiplied like mushrooms.
Some other currently available applications include:

FineReader, by ABBYY Software House
PageGenie, by Paravision Imaging, Inc.
Prime OCR, by Prime Recognition, Inc.
TextBridge, by Scansoft, Inc.
TypeReader, by ExperVision, Inc.

This list is not intended to be comprehensive.

Several full-scale imaging software solutions that include PDF creation are:
Originally posted on planetpdf.com