What to do when a PDF document is converted to garbled characters and symbols? – Docparser

Sometimes, imported PDF documents display unreadable or garbled text when viewed in the parsing rule editor or processed with existing parsing rules. When you see random symbols or gibberish text (as shown in the example below), the issue is typically caused by a corrupted or improperly formatted PDF file.

Example of garbled or unreadable text in PDF due to corrupted font mapping

In most cases, the problem occurs because the PDF is missing the font character mapping information needed to translate the glyphs drawn on the page back into readable text. This can happen if:

The document was generated incorrectly.
The text was intentionally obfuscated to prevent copying or extracting content.
The file was processed with low-accuracy Optical Character Recognition (OCR) before being uploaded to Docparser.

How to Fix Garbled Text Using OCR

To resolve this issue:

Inside the parser, navigate to Parser Settings.
Open the Preprocessing tab.
Set the Perform OCR option to "Yes - Always perform OCR".

When this setting is enabled, Docparser converts your document into an image and then runs Optical Character Recognition (OCR) to recreate the text layer. This generates a clean text version of your PDF with proper character encoding. Once enabled, all newly uploaded documents will automatically be processed through OCR, and the text should appear correctly in your parsing rules.
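If you would like to see the "render to image, then run OCR" idea outside of Docparser, the sketch below shows roughly what such a pipeline can look like. It is not Docparser's internal implementation. It assumes the open-source pdf2image and pytesseract packages (plus the Poppler and Tesseract programs they depend on) are installed, and the function names ocr_pages and ocr_pdf are made up for this illustration.

```python
from typing import Callable, Iterable, List


def ocr_pages(images: Iterable, recognize: Callable[[object], str]) -> List[str]:
    """Run `recognize` on each page image and return one cleaned string per page."""
    return [recognize(image).strip() for image in images]


def ocr_pdf(path: str, dpi: int = 300, lang: str = "eng") -> List[str]:
    """Render every PDF page to an image, then OCR it (illustrative only)."""
    # Assumption: third-party packages; imported here so the module
    # still loads on machines where they are not installed.
    from pdf2image import convert_from_path  # requires Poppler
    import pytesseract  # requires the Tesseract binary

    # Step 1: throw away the broken text layer by rasterizing each page.
    images = convert_from_path(path, dpi=dpi)

    # Step 2: recognize the text from the pixels to get a fresh text layer.
    return ocr_pages(
        images,
        lambda img: pytesseract.image_to_string(img, lang=lang),
    )
```

Calling ocr_pdf("invoice.pdf") (a hypothetical file name) would return a list with one text string per page. Because the text is read from the rendered pixels, a broken font mapping in the original file no longer matters.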

Note:

If you open your original PDF in your web browser and try copying and pasting the text, you’ll likely encounter the same gibberish characters. If the text pastes normally, please contact our support team and share your file so we can investigate further.
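If you have many files to check, the manual copy-and-paste test can be approximated with a small script. The sketch below is only a rough heuristic and not something Docparser provides: it assumes you have already extracted the text with a tool of your choice, and the names text_stats and looks_garbled as well as the threshold values are illustrative assumptions. It flags text that contains many control, private-use, or replacement characters, or very few letters and digits.

```python
import unicodedata

# Unicode categories that rarely appear in real document text:
# Cc = control, Co = private use, Cn = unassigned, Cs = surrogate.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}


def text_stats(text):
    """Return (suspicious_ratio, alphanumeric_ratio) over non-whitespace characters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        # Nothing to judge, so report "no suspicious characters".
        return 0.0, 1.0
    suspicious = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    alnum = sum(1 for c in chars if c.isalnum())
    return suspicious / len(chars), alnum / len(chars)


def looks_garbled(text, max_suspicious=0.1, min_alnum=0.5):
    """Heuristic guess: True if the text is probably unreadable gibberish."""
    suspicious_ratio, alnum_ratio = text_stats(text)
    return suspicious_ratio > max_suspicious or alnum_ratio < min_alnum
```

A check like this cannot catch every case. For example, text whose letters were swapped for other letters still looks "normal" to it, so treat a positive result as a hint to enable OCR rather than as proof.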
