What exactly is a parsing rule?
A parsing rule is a set of instructions which tells our algorithms what kind of data you are looking for. For example, you can tell our algorithms that you want to extract text from a specific position in your document. Furthermore, you could then add more instructions to the parsing rule, such as formatting a date, cropping and modifying words, etc.
Parsing rules come in different forms, and we offer many templates to get you started. Typically, some of your parsing rules will be position-based, which means that we will extract your data from a fixed position in your document. Other parsing rules will extract data from a variable position by looking for keyword markers inside the document text or by applying pattern matching.
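To make the difference between these approaches concrete, here is a minimal Python sketch of the three ideas: fixed position, keyword marker and pattern matching. It is an illustration only, not Docparser's actual implementation; the sample text and all function names are hypothetical.

```python
import re

# Hypothetical sample text standing in for the text layer of a document.
DOCUMENT = """ACME Corp
Invoice Number: INV-1042
Invoice Date: 2023-03-14
Total: 118.00"""


def extract_fixed_position(text, line, start, end):
    """Fixed position: take the characters between two columns on one line (0-indexed)."""
    return text.splitlines()[line][start:end].strip()


def extract_after_keyword(text, keyword):
    """Keyword marker: return whatever follows the keyword on the first line containing it."""
    for row in text.splitlines():
        if keyword in row:
            return row.split(keyword, 1)[1].strip()
    return None


def extract_pattern(text, pattern):
    """Pattern matching: return the first substring that matches a regular expression."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


invoice_number = extract_fixed_position(DOCUMENT, line=1, start=16, end=24)
invoice_date = extract_after_keyword(DOCUMENT, "Invoice Date:")
any_iso_date = extract_pattern(DOCUMENT, r"\d{4}-\d{2}-\d{2}")
```

The fixed-position variant breaks as soon as the layout shifts, which is why the keyword and pattern variants are useful for documents whose data moves around.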
Does Docparser create parsing rules automatically?
Depending on the document type template you chose when creating your document parser, Docparser creates a few popular parsing rules for you automatically. For example, when you choose 'Invoice' as the document type, Docparser automatically creates parsing rules that extract the invoice date, the invoice number and the totals (Net, Tax, Carriage, Total).
For every other data field you want to extract, you need to create a new parsing rule manually.
Let's be completely honest: creating parsing rules is the trickiest part of setting up Docparser, and there is a learning curve. But don't worry, we are here to help if you get stuck and want a helping hand.
How many parsing rules do I need to create?
The answer to this is simple: you generally create one parsing rule for each data field that you want to extract. Let's say you want to extract the fields Invoice Date, Invoice Number and Invoice Totals. This means you have three fields to extract, and you will thus create three parsing rules. An exception to this rule is table data. Our table extraction tools allow you to extract an entire table with one single parsing rule.
How do I create my first parsing rule?
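The "one rule per field, one rule per table" idea can be sketched as a simple mapping from field names to rules. This is a hypothetical Python model, not Docparser's API; `field_rule`, `table_rule`, `RULES` and the sample text are all invented for illustration.

```python
import re

# Hypothetical sample text; the last three lines form a small table.
SAMPLE = """Invoice Number: INV-1042
Invoice Date: 2023-03-14
Total: 118.00
Item      Qty  Price
Widget    2    50.00
Gadget    1    18.00"""


def field_rule(label):
    """Build a rule that extracts the single value following 'label:'."""
    pattern = re.compile(re.escape(label) + r":\s*(\S+)")

    def rule(text):
        match = pattern.search(text)
        return match.group(1) if match else None

    return rule


def table_rule(header_keyword):
    """Build a rule that returns every row below the header line as a list of cells."""

    def rule(text):
        lines = text.splitlines()
        for index, row in enumerate(lines):
            if row.startswith(header_keyword):
                return [r.split() for r in lines[index + 1:] if r.strip()]
        return []

    return rule


# Three single-value fields need three rules; the whole table needs only one.
RULES = {
    "invoice_date": field_rule("Invoice Date"),
    "invoice_number": field_rule("Invoice Number"),
    "invoice_total": field_rule("Total"),
    "line_items": table_rule("Item"),
}


def parse(text, rules):
    """Apply every rule to the text and collect the results by field name."""
    return {name: rule(text) for name, rule in rules.items()}


result = parse(SAMPLE, RULES)
```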
Building a parsing rule from scratch can take anything from one minute to a couple of minutes, depending on the complexity of your data. To get familiar with how Docparser works, we recommend starting with a simple parsing rule that extracts data from a specific position in your document. All you need to do is choose 'Text Fixed Position' from the parsing rule templates presented to you.
In the next screen you will be prompted to select the area where the data is located. Simply draw a rectangle around the position of your data field and confirm the selection in the bottom right.
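Conceptually, selecting a rectangle means keeping only the words whose position falls inside it. The Python sketch below illustrates that idea under simplifying assumptions: each word is reduced to one (x, y) point, and the `Word` and `Rect` types and the coordinates are hypothetical, not Docparser's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # horizontal position of the word's left edge
    y: float  # vertical position of the word's top edge


@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word) -> bool:
        """True if the word's anchor point lies inside the rectangle."""
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


def text_in_rect(words, rect):
    """Join all words inside the rectangle in reading order (top to bottom, left to right)."""
    inside = sorted((w for w in words if rect.contains(w)), key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)


# Hypothetical word positions for a single page.
words = [
    Word("Invoice", 10, 10),
    Word("Date:", 60, 10),
    Word("2023-03-14", 120, 10),
    Word("Total:", 10, 40),
]
selected = text_in_rect(words, Rect(left=100, top=0, right=200, bottom=20))
```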
In the final screen you can chain multiple text filters to manipulate your data. This step is optional, and you can save your parsing rule without adding any filters. In the following articles we will cover in detail how you can use text filters to further process your extracted data until it exactly matches the format you need.
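Chaining filters simply means that the output of one filter becomes the input of the next. Here is a minimal Python sketch of that idea; the filter names (`trim`, `remove_prefix`, `reformat_date`) are hypothetical and do not correspond to Docparser's actual filter list.

```python
from datetime import datetime
from functools import reduce


def trim(value):
    """Remove leading and trailing whitespace."""
    return value.strip()


def remove_prefix(prefix):
    """Build a filter that drops a leading label such as 'Date:' if present."""
    def apply(value):
        return value[len(prefix):] if value.startswith(prefix) else value
    return apply


def reformat_date(source_format, target_format):
    """Build a filter that re-renders a date string in another format."""
    def apply(value):
        return datetime.strptime(value, source_format).strftime(target_format)
    return apply


def apply_filters(value, filters):
    """Run the value through each filter in order."""
    return reduce(lambda current, text_filter: text_filter(current), filters, value)


raw = "  Date: 14/03/2023  "
filters = [trim, remove_prefix("Date:"), trim, reformat_date("%d/%m/%Y", "%Y-%m-%d")]
cleaned = apply_filters(raw, filters)
```

An empty filter list returns the extracted value unchanged, which mirrors the fact that this step is optional.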
That's it! You just created your first parsing rule which extracts data from a fixed position. This new parsing rule will be used to extract data from all imported documents in your document parser, existing documents and new ones.
As you have probably noticed, Docparser provides a variety of parsing rule templates for all kinds of use cases. 'Text Fixed Position' is the easiest way to get data from a fixed position. Other templates allow you to extract tables, data points located at a variable position and much more.
Want to dive into parsing rule creation right now? Check out the following articles in our knowledge base: