Docparser offers various tools to extract table data from PDFs and scanned documents. The easiest way to extract tables is to use our "Table Data" parsing rule preset or, if you are processing invoices, our "Line Items" preset. Both presets let you define the column boundaries of your table with a simple point-and-click interface. But what if your tables have multiple rows or lines per item, as in the example below?
The "Group and Merge Rows" filter lets you combine those description lines into the rows above them, leaving 4 rows of data. To learn how to do this, just follow the steps below.
The first step is to select Add Parsing Rule and choose either the "Line Items" or the "Table Data" parsing rule type.
The next step is to identify the column boundaries and the area where the items are located on the page. To do this, click the red + buttons on the left or right of the screen to add column guides, then drag them to the correct location on your file. You can then use a crop area selection to mark where the line items are on the page or, alternatively, use filters such as "Keep Table Section" or "Keep Rows Where" to select the area or rows you would like to keep.
Once you have isolated only the line item rows within your document, add the "Group and Merge Rows" table filter.
By default, this filter is set to "Automatic Merge". This setting looks for values in the first column of your table and merges every following row into the current one until a value is found in column 1 again. This works for many documents. If it does not work for yours, you can change the setting from "Automatic Merge" to "Define Rules".
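To make the "Automatic Merge" behaviour concrete, here is a minimal Python sketch of the idea described above. It is an illustration only, not Docparser's actual implementation: the function name auto_merge, the sample rows, and the choice to join merged cell values with a space are all assumptions.

```python
def auto_merge(rows):
    """Fold every row whose first cell is empty into the row above it.

    Hypothetical helper: it assumes all rows have the same number of
    columns and joins merged cell values with a single space.
    """
    merged = []
    for row in rows:
        starts_group = bool(row[0].strip())
        if starts_group or not merged:
            # A value in column 1 opens a new row group.
            merged.append(list(row))
        else:
            # Continuation line: merge its non-empty cells into the open group.
            last = merged[-1]
            for i, cell in enumerate(row):
                if cell.strip():
                    last[i] = f"{last[i]} {cell}".strip()
    return merged


# Made-up sample table: item number, description, amount.
rows = [
    ["1", "Widget", "10.00"],
    ["", "blue, large", ""],
    ["2", "Gadget", "5.50"],
    ["", "with charger", ""],
]
result = auto_merge(rows)
for row in result:
    print(row)
```

In this sample the four extracted lines collapse into two items because only the item rows carry a value in the first column.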
Once "Define Rules" is selected, the filter opens up options that let you define when a group begins and ends. In the example below, many options would work, for example:
A row group begins when column #2 has a value
A row group begins when column #2 is an integer number
A row group begins when column #3 has a value
A row group begins when column #3 is an amount
*and more
There are a number of options to choose from when defining the beginning of a row group. The right choice will be unique to each PDF.
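The begin conditions listed above can be thought of as small tests applied to one cell of each row. The following Python sketch shows that idea under stated assumptions: the helper names (has_value, is_integer, is_amount, group_rows), the sample data, and in particular the pattern used to recognise an amount are hypothetical and do not reflect how Docparser evaluates its rules.

```python
import re


def has_value(cell):
    """True when the cell contains anything other than whitespace."""
    return bool(cell.strip())


def is_integer(cell):
    """True when the cell is a whole number such as '2'."""
    return re.fullmatch(r"\d+", cell.strip()) is not None


def is_amount(cell):
    """True for values like '20.00' or '$1,200.00' (assumed amount format)."""
    return re.fullmatch(r"[$€£]?\s?\d[\d,]*\.\d{2}", cell.strip()) is not None


def group_rows(rows, column, condition):
    """Start a new group whenever `condition` holds for the given column.

    `column` is 1-based to mirror the "column #2" wording of the rules.
    """
    groups = []
    for row in rows:
        if condition(row[column - 1]) or not groups:
            groups.append([row])
        else:
            groups[-1].append(row)
    return groups


# Made-up sample: column 1 has a value on every line, so a first-column
# check cannot tell item rows and description rows apart.
rows = [
    ["Widget", "2", "20.00"],
    ["blue, large", "", ""],
    ["Gadget", "1", "1,200.00"],
    ["with charger", "", ""],
    ["two-year warranty", "", ""],
]
by_quantity = group_rows(rows, 2, is_integer)
by_amount = group_rows(rows, 3, is_amount)
print([len(group) for group in by_quantity])
```

For this sample, checking column #2 for an integer and checking column #3 for an amount produce the same grouping, which is why several of the rules above can work for one document.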
Once that has been defined, your row groups will merge together. You can then define where a row group ends, based on the unique value of the last row you require, by going to "When does a row group end?" and selecting either "When cell conditions match" or "After [x] rows".
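As a rough illustration of the two end conditions, the Python sketch below closes a group either after a fixed number of rows or when a row matches a condition. Everything here is assumed for the example: the function group_with_end, its parameters, and the sample rows are hypothetical. The article does not say what happens to rows that fall between the end of one group and the start of the next, so this sketch simply leaves them out, which may differ from Docparser's behaviour.

```python
def group_with_end(rows, begins, max_rows=None, ends=None):
    """Group rows with an explicit end condition.

    begins:   function(row) -> bool, opens a new group
    max_rows: close the group once it holds this many rows ("After [x] rows")
    ends:     function(row) -> bool, close the group on a matching row
              ("When cell conditions match")
    """
    groups = []
    open_group = None
    for row in rows:
        if begins(row):
            open_group = [row]
            groups.append(open_group)
        elif open_group is not None:
            open_group.append(row)
        else:
            # Assumption: rows outside any group are skipped.
            continue
        reached_limit = max_rows is not None and len(open_group) >= max_rows
        matched_end = ends is not None and ends(row)
        if reached_limit or matched_end:
            open_group = None
    return groups


# Made-up sample with a stray line between the two items.
rows = [
    ["Widget", "2", "20.00"],
    ["blue, large", "", ""],
    ["SKU: W-1", "", ""],
    ["Subtotal carried forward", "", ""],
    ["Gadget", "1", "5.50"],
    ["SKU: G-7", "", ""],
]


def begins(row):
    # A group begins when column #2 has a value.
    return bool(row[1].strip())


by_count = group_with_end(rows, begins, max_rows=2)
by_match = group_with_end(rows, begins, ends=lambda row: row[0].startswith("SKU:"))
print([len(group) for group in by_count], [len(group) for group in by_match])
```

A fixed row count suits tables where every item spans the same number of lines, while a cell condition suits items of varying length that finish with a recognisable last line.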
By default, all rows will merge cell values. However, if your use case requires it, you can choose to append the values of the additional rows to the end using either of the two options in the image below:
These options add additional columns to the right of the table containing the values of the additional rows.
Once your table appears the way you need, simply select "SAVE PARSING RULE".
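The difference between merging cell values and appending them as extra columns can be sketched in Python as follows. The two append options themselves are only shown in the image, so this example covers just the general idea and assumes one extra column per additional row. The helper names merge_group, append_group, and pad, and the sample groups, are hypothetical.

```python
def merge_group(group):
    """Default behaviour as described: combine values column by column."""
    width = len(group[0])
    return [
        " ".join(row[i] for row in group if row[i].strip())
        for i in range(width)
    ]


def append_group(group):
    """Keep the first row as is and add one extra column per additional row."""
    first, extras = group[0], group[1:]
    out = list(first)
    for row in extras:
        out.append(" ".join(cell for cell in row if cell.strip()))
    return out


def pad(table):
    """Give every row the same number of columns by padding with blanks."""
    width = max(len(row) for row in table)
    return [row + [""] * (width - len(row)) for row in table]


# Made-up row groups, for example the output of a grouping step.
groups = [
    [["Widget", "2", "20.00"], ["blue, large", "", ""]],
    [["Gadget", "1", "5.50"], ["with charger", "", ""], ["two-year warranty", "", ""]],
]
merged = [merge_group(group) for group in groups]
appended = pad([append_group(group) for group in groups])
print(merged)
print(appended)
```

With merging, the description lines end up inside the original description column. With appending, the original columns stay untouched and the extra lines land in new columns on the right.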