Docparser offers several OCR modes to cover different use cases and document formats. You can change the default mode for your parser from the dropdown on the Settings page, under the Preprocessing tab.
The options available are as follows:
- Automatic - Our legacy option, and the only setting until the additional modes were released on September 24th. It does not include rotation correction.
- Default - An updated default option that includes automatic page segmentation along with rotation correction.
- Default with Skew Correction - Same as the default option, but with skew correction added.
- Sparse Text - Finds as much text as possible in no particular order.
- Sparse Text with Skew Correction - Same as sparse text, but with skew correction added.
Processing speed is also a consideration: Automatic and Sparse Text run the fastest because they skip additional steps such as rotation correction and skew detection.
As for which option is best for you: if your document has already had OCR run on it and is machine-readable (you can confirm this by opening the file in your web browser and trying to click and drag to select text), OCR should be off.
If your documents require OCR but are simple, you should use Automatic or Sparse Text.
If your documents require OCR and have orientation problems, you should use Default.
If your documents require OCR and aren't returning good data with the Automatic or Default options, try changing to Sparse Text and re-uploading your document(s).
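The guidance above can be summarized as a small decision helper. This is only an illustrative sketch in Python: `OcrMode` and `suggest_ocr_mode` are hypothetical names and are not part of any Docparser API, and the mode itself is still changed in the Preprocessing tab. Where the guidance allows either Automatic or Sparse Text for simple documents, the sketch assumes Automatic.

```python
from enum import Enum
from typing import Optional


class OcrMode(Enum):
    """Hypothetical labels mirroring the options in the dropdown."""
    AUTOMATIC = "Automatic"
    DEFAULT = "Default"
    DEFAULT_SKEW = "Default with Skew Correction"
    SPARSE_TEXT = "Sparse Text"
    SPARSE_TEXT_SKEW = "Sparse Text with Skew Correction"


def suggest_ocr_mode(
    machine_readable: bool,
    orientation_problems: bool = False,
    poor_results: bool = False,
) -> Optional[OcrMode]:
    """Return a suggested mode, or None when OCR should be off."""
    # Text can already be selected in the file, so OCR is not needed.
    if machine_readable:
        return None
    # Automatic or Default did not return good data: try Sparse Text.
    if poor_results:
        return OcrMode.SPARSE_TEXT
    # Rotated pages need the rotation correction included in Default.
    if orientation_problems:
        return OcrMode.DEFAULT
    # Simple documents: Automatic is one of the two fastest options
    # (assumption: chosen here over Sparse Text, which is equally valid).
    return OcrMode.AUTOMATIC
```

For example, `suggest_ocr_mode(machine_readable=False, orientation_problems=True)` returns `OcrMode.DEFAULT`, matching the recommendation for documents with orientation problems.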