Best practices for Source Document Processing (OCR)

You can use Source Document Processing (SDP) to identify, name, and extract tax document pages and data from source documents. With the information that is extracted, the UltraTax CS Source Data Entry utility populates fields for corresponding 1040 clients in UltraTax CS.

Prior knowledge of or familiarity with UltraTax CS and the source data utility is not required. You can increase the efficiency of your firm's return preparation by having administrative or clerical staff handle scanning, document transfer, data retrieval, and exports to UltraTax CS.

To enable source document processing in the application, complete one of the following tasks.

  • Enable your firm's credit card for PRP/ELF/Web Organizer/OCR.
  • Purchase an unlimited Source Document Processing license. For pricing details, please contact your CS Sales representative.

Tips for improving OCR results

The following tips can help you get the best Optical Character Recognition (OCR) results when you prepare individual 1040 income tax returns.

  • We do not recommend that you scan printed PDFs, as this method reduces the quality of the output.
  • Scan documents at 400 DPI in black & white mode (also known as binary, text, or monochrome) to get the best quality and the smallest file size. DPI (dots per inch) is the document's resolution. If you must scan in color or grayscale, set the resolution to no greater than 300 DPI; otherwise, file sizes may exceed 1 megabyte per page.
  • In the Scan Pages dialog, click the Options button. Make sure that the Reduce image size on disk option is unchecked.
  • Use original documents whenever possible, because they provide the most clarity when scanned. The OCR process cannot read poor-quality text, so scanning photocopies, faxes, and reprints is not recommended.
  • Clean scanning equipment at least once a year, or more often depending on the environment in which you work. Cleaning helps eliminate the vertical black lines in scanned output that are typically caused by dirt and dust collecting on the scanner's glass reader.
  • Avoid marking up the document prior to scanning. Handwriting, highlighting, and other similar mark-ups can interfere with the OCR process.
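The link between resolution, color mode, and file size in the tips above can be made concrete with a rough calculation. The following Python sketch estimates the uncompressed size of one scanned letter-size page. The function name and the bit depths (1 bit per pixel for black & white, 8 for grayscale, 24 for color) are assumptions made for illustration and are not part of UltraTax CS or any scanner software. Scanners usually compress their output, so real files will differ from these figures; the sketch only shows how size scales with DPI and color depth.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return the uncompressed raster size, in bytes, of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    # Round up to a whole number of bytes.
    return (total_bits + 7) // 8


# Assumed settings per color mode: (DPI, bits per pixel).
MODES = {
    "black & white": (400, 1),
    "grayscale": (300, 8),
    "color": (300, 24),
}

if __name__ == "__main__":
    for mode, (dpi, bits) in MODES.items():
        size_mb = raw_page_bytes(8.5, 11, dpi, bits) / 1_000_000
        print(f"{mode}: about {size_mb:.1f} MB uncompressed per page")
```

Under these assumptions, a black & white page at 400 DPI carries far less raw data than a color or grayscale page at 300 DPI, even though its resolution is higher.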

Notes

  • Many firms also find it helpful to use outside scanning software, such as Kofax Virtual ReScan (VRS), to de-skew and clean up images.
  • Options to adjust the settings for the scanner may vary based on the scanner software that you have installed.
  • Variations in scanner types and levels of color and grayscale may cause slight differences in the file size.
  • Processing times for every 100 pages can vary based on bandwidth and network traffic.

The following table provides a sample comparison of file sizes when a document is scanned in different color modes and DPI settings. The figures are based on a test with a 10-page, 8.5" x 11" document.

| Color mode | Black & White | Color | Grayscale |
|---|---|---|---|
| DPI setting | 400 (based on the scanner type) | 300 | 300 |
| File size | 10.97 MB | 247.5 MB | 82.82 MB |
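To put the sample figures above in per-page terms, the short Python sketch below divides each total by the 10 pages in the test document and compares each mode with the black & white baseline. The variable and function names are hypothetical and used only for illustration; the numbers are the ones reported in the table.

```python
# Total file sizes (MB) from the 10-page sample test in the table.
SAMPLE_TOTALS_MB = {
    "Black & White": 10.97,
    "Color": 247.5,
    "Grayscale": 82.82,
}
PAGES = 10


def per_page_mb(total_mb, pages=PAGES):
    """Average file size per page, in MB."""
    return total_mb / pages


def size_ratio(mode, baseline="Black & White"):
    """How many times larger a mode's file is than the baseline's."""
    return SAMPLE_TOTALS_MB[mode] / SAMPLE_TOTALS_MB[baseline]


if __name__ == "__main__":
    for mode, total in SAMPLE_TOTALS_MB.items():
        print(f"{mode}: {per_page_mb(total):.2f} MB per page, "
              f"{size_ratio(mode):.1f}x the black & white size")
```

In this sample, the color scan is more than 20 times the size of the black & white scan, and the grayscale scan is more than 7 times the size.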

The following example illustrates a single enlarged character from the same file, scanned in Black and White, Color, and Grayscale. Note that the clarity of the color and grayscale examples is degraded, which would make it difficult for the OCR process to identify and extract that information from source documents.

[Images: the same enlarged character scanned in Black and White, Color, and Grayscale]
