PluggedIN: Intelligent Document Processing

Intelligent Document Processing: The Present and Future of Automated Information Management Q3 2021

Character Recognition Technology


A major driver of the recent wave of digital transformation has been advancements in character recognition technology. While the capability to scan documents into computer-rendered images has been around for a long time, an image on its own provides minimal value. Something must interpret the image in order to extract usable information, and until relatively recently, that work had to be done by humans. Data entry operations were, and in some cases still are, a vital bridge between paper documentation and information management systems. Today, advanced optical character recognition (OCR) and intelligent character recognition (ICR) technologies enable the contents of scanned paper documents, or of fully digital documents, to be extracted automatically, generally with little to no human intervention. Whereas OCR technology can convert images with known fonts into machine-encoded text, ICR takes that a step further by incorporating machine learning feedback loops for self-improvement over time. This enables ICR systems to process difficult content, such as novel fonts, images of poor quality, and handwriting, more accurately.
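To make the idea of a feedback loop concrete, here is a deliberately tiny sketch. It is not how any particular ICR product works; production systems rely on far more capable models. The `FeedbackRecognizer` class, the 3x3 bitmaps, and the nearest-match rule are all assumptions chosen for illustration. The point is only that each human correction becomes a new labelled example, so later guesses on similar glyphs improve.

```python
class FeedbackRecognizer:
    """Toy nearest-match character recognizer that learns from corrections.

    Hypothetical illustration only: glyphs are flat tuples of 0/1 pixels.
    """

    def __init__(self):
        self.examples = []  # (bitmap, label) pairs learned so far

    def learn(self, bitmap, label):
        # Store a labelled example; corrections arrive through this same path.
        self.examples.append((tuple(bitmap), label))

    def recognize(self, bitmap):
        # Return the label of the stored example with the fewest differing pixels.
        if not self.examples:
            return None
        bitmap = tuple(bitmap)

        def distance(example):
            stored, _label = example
            return sum(a != b for a, b in zip(stored, bitmap))

        return min(self.examples, key=distance)[1]


recognizer = FeedbackRecognizer()
recognizer.learn((1, 1, 1, 1, 0, 1, 1, 1, 1), "O")  # known glyph
recognizer.learn((0, 1, 0, 0, 1, 0, 0, 1, 0), "I")  # known glyph

# A glyph the recognizer has never seen: an "L".
novel_l = (1, 0, 0, 1, 0, 0, 1, 1, 1)
first_guess = recognizer.recognize(novel_l)  # closest known glyph is "O"

# Feedback step: a reviewer supplies the correct label.
recognizer.learn(novel_l, "L")

# A slightly degraded "L" is now matched against the corrected example.
noisy_l = (1, 0, 0, 1, 0, 0, 1, 1, 0)
second_guess = recognizer.recognize(noisy_l)
```

In this sketch the first guess is wrong because nothing resembling an "L" has been seen yet; after one correction, the degraded "L" is recognized correctly.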

Global Optical Character Recognition Market

Unstructured Data

By now, it’s common knowledge that effective use of data can yield incredible results for businesses of all types. Yet, despite the recent enthusiasm for mining and utilizing data, a significant portion of existing data remains out of reach because it lacks a known structure. Unstructured data is exactly what the name implies: data that lacks a predefined form by which software can identify and process it. A standard document type, such as an application form, can be created with predetermined fields that enable simple data extraction by basic OCR technology trained on a template. The software “knows” it is pulling a name from the ‘name’ field and an address from the ‘address’ field. However, an email, a web page, or a report has no such set structure. In order for machines to help us extract information from these unstructured media, smarter tools are needed. According to one study, 95% of companies cite the need to manage unstructured data as a problem for their business. [7] Most estimates suggest that 80-90% of all data is unstructured. [8] OCR and ICR offer the first step on the path to harnessing the power of unstructured data, and intelligent document processing (IDP) tools are pushing even further.
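The contrast between a templated form and free-form text can be sketched in a few lines. This is a minimal illustration, assuming the OCR step has already produced plain text; the template, field labels, and sample texts are hypothetical and not taken from any specific product.

```python
import re

# Hypothetical template: each field maps to a pattern for a labelled line on the form.
APPLICATION_FORM_TEMPLATE = {
    "name": re.compile(r"^Name:\s*(.+)$", re.MULTILINE),
    "address": re.compile(r"^Address:\s*(.+)$", re.MULTILINE),
}


def extract_fields(text, template):
    """Return {field: value or None} for every field the template defines."""
    result = {}
    for field, pattern in template.items():
        match = pattern.search(text)
        result[field] = match.group(1).strip() if match else None
    return result


# Structured input: the template knows where each value lives.
form_text = "Name: Jane Doe\nAddress: 12 Main Street, Springfield\n"

# Unstructured input: the same facts, but with no labelled fields to anchor on.
email_text = (
    "Hi team,\n"
    "Jane Doe moved to 12 Main Street last week. Please update her file.\n"
)

form_fields = extract_fields(form_text, APPLICATION_FORM_TEMPLATE)
email_fields = extract_fields(email_text, APPLICATION_FORM_TEMPLATE)
```

The form yields both values, while the email yields nothing even though it contains the same name and address. That gap is what the smarter tools described here are meant to close.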

Figure: Share of data by type. Structured: 10%. Unstructured: 80-90%.

exelatech.com