How to parse documents with multiple records? – Docparser

Sometimes a single PDF document holds multiple data records spread over several pages, for example a file containing multiple invoices with one invoice per page.

While Docparser provides certain filters for 'repeating data', most of its tools were built to handle simple documents containing a single data set to parse. This is why we recommend splitting the document into individual documents, each containing one data set to be parsed.

Automatically Split Files - Docparser offers several options for splitting a document automatically when it is imported into the app. You can find these options in the settings of your document parser under "Settings > Preprocessing". You can split documents based on page numbers (e.g. each page, every second page) or split by keyword (e.g. split when a specific keyword is present on a page).
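
To make the two splitting modes concrete, here is a minimal Python sketch of the idea. It is not Docparser's implementation: the function names `split_every_n` and `split_on_keyword` are hypothetical, pages are represented as plain strings of extracted text, and it assumes that a page containing the keyword starts a new document.

```python
from typing import List


def split_every_n(pages: List[str], n: int) -> List[List[str]]:
    """Group pages into documents of n pages each (the last may be shorter)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [pages[i:i + n] for i in range(0, len(pages), n)]


def split_on_keyword(pages: List[str], keyword: str) -> List[List[str]]:
    """Start a new document whenever the keyword appears on a page.

    Assumption: the page containing the keyword is the first page
    of the new document.
    """
    documents: List[List[str]] = []
    for page in pages:
        # Open a new document on a keyword hit, or for the very first page.
        if keyword in page or not documents:
            documents.append([])
        documents[-1].append(page)
    return documents


pages = ["Invoice 1", "details", "Invoice 2", "Invoice 3", "details"]
by_count = split_every_n(pages, 2)
by_keyword = split_on_keyword(pages, "Invoice")
```

In this sketch, splitting by keyword keeps each invoice together with its detail pages, while splitting by page count only works when every record has the same number of pages.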

Automatically Drop Unnecessary Pages - In addition to splitting your document automatically, Docparser can also drop certain pages from it. This is helpful if, for example, your document has a cover sheet that you would like to remove. You can remove pages based on page ranges (e.g. "1; 5-10; 8-end;") or by keyword (e.g. remove a page when a specific keyword is found on it).
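
As an illustration of how a page-range expression such as "1; 5-10; 8-end;" can be read, the following Python sketch expands it into a set of page numbers and drops those pages. The helpers `parse_page_ranges` and `drop_pages` are hypothetical, and the parsing rules (1-based page numbers, `end` meaning the last page, overlapping ranges merged) are assumptions rather than Docparser's documented behavior.

```python
from typing import List, Set


def parse_page_ranges(spec: str, page_count: int) -> Set[int]:
    """Expand a spec like "1; 5-10; 8-end;" into a set of 1-based page numbers."""
    pages: Set[int] = set()
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue  # ignore the empty piece after a trailing semicolon
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        if not sep:
            end = start  # single page, e.g. "1"
        elif end_text.strip().lower() == "end":
            end = page_count  # open-ended range, e.g. "8-end"
        else:
            end = int(end_text)
        # Clamp to the pages that actually exist in the document.
        pages.update(range(max(start, 1), min(end, page_count) + 1))
    return pages


def drop_pages(pages: List[str], spec: str) -> List[str]:
    """Return the pages that are not matched by the range spec."""
    to_drop = parse_page_ranges(spec, len(pages))
    return [page for number, page in enumerate(pages, start=1) if number not in to_drop]


selected = parse_page_ranges("1; 5-10; 8-end;", page_count=12)
remaining = drop_pages(["cover sheet", "invoice A", "invoice B"], "1")
```

Under these assumptions, the overlapping ranges "5-10" and "8-end" simply merge, so the example spec selects page 1 and every page from 5 to the end of the document.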
