Docparser works best when each data set is represented by one individual document, for example one document containing one invoice.
Our automated splitting feature allows you to dissect one large document into several individual documents. Docparser comes with a variety of splitting options, which can be found in the 'Settings > Preprocessing' section of your document parser. Once activated, new documents are automatically split up into individual files according to the rules you provide.
The following document splitting methods are available:
Every [x] pages
Imported documents are split into individual documents with a fixed number of pages each, based on page numbers. This is the simplest splitting option and is a great fit when all individual documents have the same number of pages.
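To illustrate the idea, here is a minimal Python sketch of fixed-size splitting. It is not Docparser's implementation. The function name `split_every_n_pages` and the representation of a document as a list of page strings are assumptions made for this example.

```python
def split_every_n_pages(pages, n):
    """Group a list of pages into consecutive chunks of at most n pages."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Step through the page list n pages at a time; the last chunk
    # may be shorter if the page count is not a multiple of n.
    return [pages[i:i + n] for i in range(0, len(pages), n)]


# Hypothetical six-page import split into two-page documents.
pages = ["p1", "p2", "p3", "p4", "p5", "p6"]
chunks = split_every_n_pages(pages, 2)
```

In this sketch, a page count that is not a multiple of the chunk size leaves a shorter final document, which is one reason this option suits batches where every document has the same length.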
Look for keywords on first page
Imported documents are split into individual documents whenever certain keywords are found on a given page. When a keyword is present on a page, it marks the beginning of a new document. For example, the term 'Invoice ID' marks the beginning of a new invoice.
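The first-page rule can be sketched in Python as follows. This is an illustration of the logic only, not Docparser's code. The function name `split_on_first_page_keyword`, the plain substring match, and the handling of pages that appear before the first keyword (they form their own leading document) are all assumptions.

```python
def split_on_first_page_keyword(pages, keywords):
    """Start a new document at every page that contains one of the keywords."""
    documents = []
    for text in pages:
        starts_new = any(keyword in text for keyword in keywords)
        if starts_new or not documents:
            # A keyword page (or the very first page) opens a new document.
            documents.append([text])
        else:
            # Any other page continues the current document.
            documents[-1].append(text)
    return documents


# Hypothetical page texts for two invoices in one import.
pages = [
    "Invoice ID: 1001 ...",
    "line items, continued",
    "Invoice ID: 1002 ...",
    "line items, continued",
    "terms and conditions",
]
invoices = split_on_first_page_keyword(pages, ["Invoice ID"])
```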
Look for keywords on last page
Imported documents are split into individual documents whenever certain keywords are found on a given page. When a keyword is present on a page, it marks the end of the current document. For example, the term 'Invoice Total' usually marks the last page of an invoice.
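The last-page rule mirrors the first-page rule: the keyword page closes the current document instead of opening a new one. The Python sketch below illustrates this under stated assumptions and is not Docparser's implementation. The function name `split_on_last_page_keyword`, the substring match, and the choice to keep trailing pages without a closing keyword as a final document are hypothetical.

```python
def split_on_last_page_keyword(pages, keywords):
    """Close the current document at every page that contains one of the keywords."""
    documents = []
    current = []
    for text in pages:
        current.append(text)
        if any(keyword in text for keyword in keywords):
            # The keyword page is the last page of the current document.
            documents.append(current)
            current = []
    if current:
        # Assumption: leftover pages without a closing keyword are kept
        # as one final document rather than discarded.
        documents.append(current)
    return documents


# Hypothetical page texts for two invoices in one import.
pages = [
    "items page 1",
    "Invoice Total: 120.00",
    "items page 1",
    "items page 2",
    "Invoice Total: 980.50",
]
invoices = split_on_last_page_keyword(pages, ["Invoice Total"])
```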
When RegEx pattern changes
Imported documents are split up whenever the value matched by a specific pattern changes in your document. The pattern matching used for this option is based on Perl Compatible Regular Expressions (PCRE), and a common use case is to look for changes in ID numbers (Purchase Order ID, Invoice ID, ...).
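As a rough illustration of splitting on a changing match, the Python sketch below starts a new document whenever the text matched by a pattern differs from the previous match. It is not Docparser's implementation. It uses Python's `re` module, whose syntax is close to but not identical to PCRE, and the function name `split_on_pattern_change` and the treatment of pages without a match (they continue the current document) are assumptions.

```python
import re


def split_on_pattern_change(pages, pattern):
    """Start a new document whenever the matched value differs from the last one."""
    regex = re.compile(pattern)
    documents = []
    last_value = None
    for text in pages:
        match = regex.search(text)
        value = match.group(0) if match else None
        changed = value is not None and value != last_value
        if changed or not documents:
            # A new matched value (or the very first page) opens a document.
            documents.append([text])
        else:
            # Same value, or no match at all: continue the current document.
            documents[-1].append(text)
        if value is not None:
            last_value = value
    return documents


# Hypothetical purchase order pages identified by an ID such as "PO-1001".
pages = [
    "PO-1001 page 1",
    "PO-1001 page 2",
    "appendix without an ID",
    "PO-1002 page 1",
]
orders = split_on_pattern_change(pages, r"PO-\d+")
```

Comparing the whole matched text keeps the sketch simple. A pattern with surrounding context, such as a label followed by the number, still works because the matched text changes whenever the number does.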