How many document parsers should you create? – Docparser

After your first tests with Docparser, it's a good time to step back and think about how to set up and manage your document parsers.

At Docparser, we are convinced that there is no one-size-fits-all solution when it comes to document data extraction. This is why we designed Docparser to be as flexible as possible while still being easy to use.

In this article, we cover different ways of setting up Docparser. To choose the best approach for your use case, first take some time to answer the following questions:

How many different document layouts do you want to process?
Do you know the layout of a document before importing it to Docparser?
What does the data you want to extract from each layout look like?
Who imports documents to Docparser, and how?

If you are not sure what we mean by document layout, please refer to this article.

Below are a few scenarios we see quite often. If you are not sure which route to take, please don't hesitate to reach out to our support team. We'll be happy to chat about your parsing needs and find a fitting solution.

In each of the scenarios below, you can import your documents to Docparser in several ways. You can read more about the different import options here.

You have only a few document layouts

Having just a couple of document layouts means that setting up Docparser will be quite straightforward. All you need to do is create one document parser for each layout and import each document into the matching parser. Each document parser contains one set of parsing rules tailored to one specific document layout.

You have a document layout where the data positions vary from document to document

In this case, you will probably want to create one document parser for the layout that is capable of handling all of its variations. You have three basic approaches:

Create parsing rules that can capture data from variable positions
Create additional parsing rules that cover every data field in each layout variation
Use our advanced "Layout Models" feature. This is currently a beta feature, and you can request early access.
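To make the first approach more concrete, here is a minimal Python sketch of the underlying idea: locate a value by the label next to it instead of by a fixed position on the page. This is only a conceptual illustration, not how Docparser implements its parsing rules (those are configured in the app). The "Invoice Number" field, the regular expression, and the sample texts are hypothetical.

```python
import re
from typing import Optional

# Hypothetical rule (an assumption, not a Docparser rule): find the value that
# follows an "Invoice Number" / "Invoice No." label, wherever it sits on the page.
INVOICE_NUMBER = re.compile(
    r"Invoice\s+(?:Number|No\.?)\s*[:#]?\s+([A-Z0-9-]+)",
    re.IGNORECASE,
)


def extract_invoice_number(text: str) -> Optional[str]:
    """Return the first value that follows the label, or None if no label is found."""
    match = INVOICE_NUMBER.search(text)
    return match.group(1) if match else None


# The same rule handles two documents where the field sits in different positions.
top = "ACME Corp\nInvoice Number: INV-1001\nDate: 2024-01-05"
bottom = "Date: 2024-02-01\nTotal: 50.00\n\nInvoice No. INV-2002"
print(extract_invoice_number(top))     # INV-1001
print(extract_invoice_number(bottom))  # INV-2002
```

Anchoring on a label keeps a single rule working across variations, at the cost of depending on that label being present and spelled consistently.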

You have various layouts and you can identify the layout prior to importing the documents

If you are able to identify the document layout on your end, we recommend building one document parser for each layout. There is no limit on the number of document parsers you can build in Docparser, and many of our customers have hundreds of them.

Each document parser has one set of parsing rules that matches exactly one document layout. On your end, you identify the document type and import each document into the matching document parser (via our API, by email, or with our app).
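The identification step on your end can be as simple as a lookup table. The Python sketch below assumes, purely for illustration, that each sender domain produces exactly one layout; the domains, layout names, and parser IDs are hypothetical placeholders. It only decides which parser a document should go to and leaves the actual import (API, email, or app) out.

```python
from typing import Optional

# Hypothetical assumption: each known sender domain always uses one layout.
LAYOUT_BY_SENDER_DOMAIN = {
    "acme.example": "acme_invoice",
    "globex.example": "globex_invoice",
}

# Hypothetical placeholders: one document parser per layout.
PARSER_ID_BY_LAYOUT = {
    "acme_invoice": "PARSER_ID_ACME",
    "globex_invoice": "PARSER_ID_GLOBEX",
}


def identify_layout(sender: str) -> Optional[str]:
    """Guess the layout from the sender's email domain (case-insensitive)."""
    domain = sender.rsplit("@", 1)[-1].lower()
    return LAYOUT_BY_SENDER_DOMAIN.get(domain)


def choose_parser(sender: str) -> Optional[str]:
    """Return the parser ID for the sender's layout, or None if it is unknown."""
    layout = identify_layout(sender)
    return PARSER_ID_BY_LAYOUT.get(layout) if layout else None
```

For example, `choose_parser("billing@acme.example")` returns `"PARSER_ID_ACME"`. Returning `None` for unknown senders lets you set those documents aside for manual review instead of sending them to the wrong parser.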

You have various layouts but you can't identify the layout when importing the documents

If you receive documents from third parties, chances are high that you can't identify the document layout before importing the documents to Docparser. This means you can't simply create separate document parsers and send each document directly to the right one.

Luckily, we have an advanced feature (Layout Models) that lets you process multiple layouts with a single parser.

The idea is the following: you create one set of parsing rules that identifies the document layout. You then create additional sets of parsing rules inside the same document parser, one for each layout. Finally, you define document routing rules (e.g. "if this, then use this model") that send each document to the right layout model.
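The three steps can be pictured with the conceptual Python sketch below. It is not Docparser's implementation (Layout Models are set up in the app, not in code); the keywords, labels, model names, and rule order are hypothetical. One function plays the role of the identification rules, each layout model has its own extraction rules, and an ordered list of routing rules picks the first model whose condition matches.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


def field_after(label: str, text: str) -> Optional[str]:
    """Return the rest of the line that follows a label, or None."""
    match = re.search(re.escape(label) + r"\s*([^\n]+)", text)
    return match.group(1).strip() if match else None


# Step 1 (hypothetical): identification rules that detect layout hints.
def identify(text: str) -> Dict[str, bool]:
    return {
        "mentions_acme": "ACME Corp" in text,
        "mentions_globex": "Globex" in text,
    }


# Step 2 (hypothetical): one set of parsing rules per layout model.
LAYOUT_MODELS: Dict[str, Callable[[str], Dict[str, Optional[str]]]] = {
    "acme": lambda text: {"total": field_after("Total:", text)},
    "globex": lambda text: {"total": field_after("Amount due", text)},
}

# Step 3 (hypothetical): routing rules, "if this, then use this model".
ROUTING_RULES: List[Tuple[str, str]] = [
    ("mentions_acme", "acme"),
    ("mentions_globex", "globex"),
]


def parse(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Route a document to the first matching model and apply its rules."""
    facts = identify(text)
    for condition, model in ROUTING_RULES:
        if facts[condition]:
            return {"model": model, **LAYOUT_MODELS[model](text)}
    return None  # no routing rule matched
```

For example, `parse("ACME Corp\nTotal: 120.50")` returns `{"model": "acme", "total": "120.50"}`. Evaluating the routing rules in a fixed order keeps the outcome predictable when a document matches more than one condition.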

You have an unknown quantity of document layouts, probably thousands

To be honest, this sounds like we should talk and check whether Docparser is a good fit for your use case. Docparser was designed to extract data from recurring documents that follow a known set of layouts. That said, we do offer some extraction tools that don't rely on the visual representation of the data. So let's talk and see how we can help you.
