OCR Conversion and Processing Services
OCR (Optical Character Recognition) is the process of converting scanned paper documents into fully editable digital files, such as Microsoft Word documents, Excel spreadsheets, XML, CSV and searchable PDF. The process begins by scanning the hard-copy material (e.g. books, newspapers, documents, magazines, journals, directories and invoices) to create digital images such as TIFF, PDF or JPEG. OCR is then applied to convert these images into an editable text format of your choice.
OCR scanning and conversion
Our specialist OCR services include scanning documents of various types and sizes and converting them to the digital file format you require. The accuracy of OCR depends on the quality of the source documents. For documents in fairly good condition with clear, legible text, recognition accuracy can be as high as 99.99%. However, if your documents are old and faint or contain marks and scratches, the accuracy and quality of the recognised text will be reduced accordingly.
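To put a figure such as 99.99% in context, character-level accuracy is commonly measured by comparing OCR output against a manually verified transcript. The sketch below is illustrative only and is not necessarily how the figures quoted here are measured. It assumes accuracy is defined as 1 minus the edit distance divided by the length of the verified text; the function names are hypothetical.

```python
def edit_distance(reference: str, recognised: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions separating the two strings."""
    previous = list(range(len(recognised) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognised, start=1):
            cost = 0 if ref_char == rec_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the OCR output
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, recognised: str) -> float:
    """Assumed definition: 1 - (edit distance / length of verified text)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, recognised)
    return max(0.0, 1.0 - errors / len(reference))


# Two zeros misread as the letter O in a 12-character string.
print(round(character_accuracy("Invoice 1001", "Invoice 1OO1"), 4))
```

Under this definition, 99.99% accuracy corresponds to an average of one character error in every 10,000 characters.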
OCR clean up options
For documents and files that have been marked or scratched through everyday use, we can apply further OCR processing so that you benefit as much as possible from OCR conversion:
- OCR clean up
- OCR proof reading
- OCR formatting (layout, tables, images, fonts, pagination etc.)
OCR conversion to Microsoft Excel
Our OCR to Excel conversion service can be applied to structured (tables), semi-structured (text, tables, images etc.) or unstructured (loosely formatted) documents.
We can also process the data further and convert it to other formats, such as CSV, XML and text-searchable PDF, or prepare it for import into SharePoint.
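As a rough illustration of this further processing, the sketch below writes a few recognised table rows out as CSV and XML using only the Python standard library. The sample rows, field names, tag names and function names are all hypothetical, and the code does not describe the tooling actually used in our service.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical rows, as they might look after OCR and clean-up of a catalogue page.
rows = [
    {"code": "A100", "description": "Desk lamp", "price": "24.99"},
    {"code": "B205", "description": "Office chair, black", "price": "89.50"},
]


def rows_to_csv(records: list[dict[str, str]]) -> str:
    """Serialise records to CSV text, with a header row taken from the first record."""
    if not records:
        raise ValueError("no records to export")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def rows_to_xml(records: list[dict[str, str]],
                root_tag: str = "catalogue",
                item_tag: str = "item") -> str:
    """Serialise records to XML, one element per record and one child per field."""
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for field, value in record.items():
            ET.SubElement(item, field).text = value
    return ET.tostring(root, encoding="unicode")


print(rows_to_csv(rows))
print(rows_to_xml(rows))
```

Note that the CSV writer quotes the description containing a comma, so the column structure recovered from the page survives the export.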
We apply OCR to:
- Books, newspapers, magazines and manuscripts
- Books and documents for conversion to Microsoft Word
- Catalogues for conversion to Microsoft Excel
- Documents for conversion to XML, HTML, CSV and SPSS
OCR conversion process
The first step in the OCR conversion process is to assess the quality of the original documents and to determine their layout and formatting.
Once we have assessed the documents, we configure the OCR processing rules, carry out OCR tests and create samples for your approval. We offer three levels of OCR conversion, depending on your requirements:
OCR level 1
Suitable for plain, simply formatted documents. The output can be converted to any required file format, such as Microsoft Word, XML or plain text.
OCR level 2
This level is for more complex layouts that contain data in tables or flow charts, and/or differing fonts and images. If you need to keep the original layout, formatting, fonts and page numbering, we recommend level 2.
OCR level 3
The most in-depth OCR conversion level, which includes manual proofreading and correction of any errors that occur during the OCR process. Level 3 ensures that specific areas are double-checked, corrected and cleansed as required and, importantly, delivers OCR accuracy of up to 99.99%.
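The three levels can be read as a simple decision rule. The sketch below is only one illustrative way to express it, assuming that the need for manual proofreading points to level 3 and that any layout complexity points to level 2. The class, field and function names are hypothetical, and the actual recommendation is made after assessing your sample documents.

```python
from dataclasses import dataclass


@dataclass
class DocumentRequirements:
    # Hypothetical field names, chosen to mirror the level descriptions above.
    has_tables_or_flow_charts: bool = False
    has_mixed_fonts_or_images: bool = False
    keep_original_layout: bool = False
    needs_manual_proofreading: bool = False


def recommend_ocr_level(req: DocumentRequirements) -> int:
    """Return 1, 2 or 3 following the level descriptions (assumed rule)."""
    if req.needs_manual_proofreading:
        return 3  # manual proofreading and correction
    if (req.has_tables_or_flow_charts
            or req.has_mixed_fonts_or_images
            or req.keep_original_layout):
        return 2  # complex layout, or original formatting must be kept
    return 1      # plain, simply formatted documents


print(recommend_ocr_level(DocumentRequirements(keep_original_layout=True)))
```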
Advantages of OCR documents
Pearl Scan Group's OCR services offer many advantages, for example:
Time and cost savings
If you have a book in hard copy that you need to edit or update, for example, re-typing it would normally take a significant amount of time.
Additionally, if you have directories or similar material, we can scan these and convert them to Microsoft Excel, CSV or XML format to provide you with complete address or contact lists. We can also import the data into your CRM system or Outlook, or supply it in any other format you require.
Fully text searchable documents
OCR produces machine-readable, text-searchable files. Standard image-only formats, such as image PDF, TIFF and JPEG, contain only a picture of the page, so their text cannot be searched.
OCR recognition of different languages
With our OCR conversion service, we can process multilingual documents with ease. We are able to process all major world languages, including English, French, German, Portuguese, Italian, Spanish, Urdu, Arabic and Russian, subject to sample testing.