The Custom Dictionary allows you to define your own keywords and extraction patterns so Docparser can automatically detect and extract specific values from your documents. This is useful when your documents contain consistent labels such as “Invoice”, “Order Number”, “Account ID”, “Reference”, or any other term your business relies on.
Once you add keywords, Docparser scans your documents for the matching terms and extracts the text according to the rules you define.
1. Adding Search Terms
Use the Search Term input to define the keywords you want Docparser to look for in your documents.
Example:
Enter Invoice in the Search Term field to extract the value that follows “Invoice”.
Add a second entry (using the + button) for terms such as Order Number, Reference, or Customer ID.
You can add as many terms as you need. Docparser will check for them in the order listed.
Turn on Case Sensitive if the keyword must match the capitalization used in the document exactly.
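Docparser applies these matching rules for you in its interface, so no code is needed. To make the ordering and case-sensitivity behavior concrete, here is a minimal Python sketch that models it. The function name find_first_term is hypothetical, and this is an illustration under assumptions, not Docparser's implementation.

```python
import re
from typing import Optional, Tuple


def find_first_term(
    text: str, terms: list[str], case_sensitive: bool = False
) -> Optional[Tuple[str, int, int]]:
    """Hypothetical model: return (term, line_number, column) for the first
    listed term that appears in the text, or None if no term is found.

    Terms are tried in the order they are listed, so an earlier term wins
    even if a later term appears higher up in the document.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    lines = text.splitlines()
    for term in terms:
        # re.escape treats the search term as literal text, not as a regex.
        pattern = re.compile(re.escape(term), flags)
        for line_number, line in enumerate(lines):
            match = pattern.search(line)
            if match:
                return term, line_number, match.start()
    return None


doc = "Customer ID: C-88\ninvoice no. 1042"
print(find_first_term(doc, ["Invoice", "Customer ID"]))
# ('Invoice', 1, 0)
print(find_first_term(doc, ["Invoice", "Customer ID"], case_sensitive=True))
# ('Customer ID', 0, 0)
```

Because the terms are checked in the order listed, Invoice wins over Customer ID even though Customer ID appears earlier in the text. With Case Sensitive on, the lowercase “invoice” no longer matches, so the second term is used instead.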
2. Data Format
The Data Format setting controls what type of information Docparser should extract after the keyword is found. This helps you avoid incorrect matches.
Available options include:
Any - Extracts all characters.
Characters only - Extracts letters only.
Alphanumeric - Extracts letters and numbers.
Alphanumeric with punctuation - Extracts letters, numbers, and punctuation (e.g., “INV-2025”).
Numbers - Extracts digits only.
Numbers with punctuation - Extracts numbers with punctuation, which is useful for amounts (e.g., “1,250.50”).
Alphanumeric with number - Extracts letters and numbers, and requires at least one number in the result.
Alphanumeric with punctuation and number - Extracts letters, numbers, and punctuation, and requires at least one number; the most flexible structured option.
Example:
If you are looking for an invoice number such as INV-23411, choose Alphanumeric with punctuation.
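The Data Format options behave like character filters applied to the candidate value. The sketch below approximates each option with a Python regular expression. The exact character sets, especially which symbols count as punctuation, are assumptions for illustration and may differ from Docparser's own rules.

```python
import re

# Hypothetical approximations of the Data Format options.
# The punctuation set is an assumption, not Docparser's definition.
PUNCT = ".,:;/#-"  # hyphen last so it is literal inside a character class
FORMATS = {
    "Any": r".+",
    "Characters only": r"[A-Za-z]+",
    "Alphanumeric": r"[A-Za-z0-9]+",
    "Alphanumeric with punctuation": rf"[A-Za-z0-9{PUNCT}]+",
    "Numbers": r"[0-9]+",
    # The lookahead requires at least one digit somewhere in the value.
    "Numbers with punctuation": rf"(?=[{PUNCT}]*[0-9])[0-9{PUNCT}]+",
    "Alphanumeric with number": r"(?=[A-Za-z]*[0-9])[A-Za-z0-9]+",
    "Alphanumeric with punctuation and number":
        rf"(?=[A-Za-z{PUNCT}]*[0-9])[A-Za-z0-9{PUNCT}]+",
}


def matches_format(value: str, data_format: str) -> bool:
    """True if the whole value fits the chosen (hypothetical) format."""
    return re.fullmatch(FORMATS[data_format], value) is not None


print(matches_format("INV-23411", "Alphanumeric"))                   # False
print(matches_format("INV-23411", "Alphanumeric with punctuation"))  # True
print(matches_format("1,250.50", "Numbers"))                         # False
print(matches_format("1,250.50", "Numbers with punctuation"))        # True
```

This shows why Alphanumeric with punctuation is the right choice for INV-23411: plain Alphanumeric rejects the hyphen.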
3. Keyword Start Position
This defines where extraction begins in relation to the keyword you entered.
Options include:
Starts with the keyword - Extraction begins at the keyword itself.
Contains the keyword - Extracts anywhere the keyword appears in the line.
Next Nearest Match - Skips the keyword and extracts the next matching pattern.
Next Word On Same Line - Extracts the first word after the keyword on the same line.
Next Keyword Match On Same Line - Useful when multiple keywords appear on one line.
Next Keyword Match On Same Line Or Below - Searches downward if not found on the same line.
Nearest Match On The Line Below The Current Keyword - Extracts from the next line only.
Example:
For a line like:
Invoice: 556712
use Next Word On Same Line.
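Two of these start positions, Next Word On Same Line and Nearest Match On The Line Below The Current Keyword, can be modeled roughly as follows. This is a hypothetical Python sketch: the function names are illustrative, and skipping a colon after the label is an assumption about how separators are treated.

```python
import re
from typing import Optional


def next_word_on_same_line(text: str, keyword: str) -> Optional[str]:
    """Hypothetical 'Next Word On Same Line': first word after the keyword."""
    for line in text.splitlines():
        pos = line.find(keyword)
        if pos == -1:
            continue
        # Assumption: whitespace or a colon right after the label is skipped.
        rest = line[pos + len(keyword):].lstrip(" \t:")
        words = rest.split()
        return words[0] if words else None
    return None


def match_on_line_below(
    text: str, keyword: str, pattern: str = r"\S+"
) -> Optional[str]:
    """Hypothetical 'Nearest Match On The Line Below The Current Keyword':
    return the first pattern match on the line directly under the keyword."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if keyword in line and i + 1 < len(lines):
            match = re.search(pattern, lines[i + 1])
            return match.group(0) if match else None
    return None


print(next_word_on_same_line("Invoice: 556712", "Invoice"))        # 556712
print(match_on_line_below("Order Number\n44512", "Order Number"))  # 44512
```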
4. Keyword End Position
This defines where extraction stops, letting you control the boundaries of the captured value.
Options include:
End of line - Extracts until the end of the line.
End Keyword - Stops when another keyword is found.
End On The Next Line - Extracts across two lines.
End After Keyword Found - Stops immediately after detecting the keyword.
End Of Line (of found keyword) - A variation of End of line that stops at the end of the line where the keyword was found.
End After [x] consecutive spaces - Useful for data separated by spacing instead of punctuation.
End After [x] consecutive new-lines - Extracts multi-line values.
Example:
If your target line looks like:
Order Number 923419
and the spacing varies across documents, choose End After [1] consecutive spaces.
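A rough Python model of three end positions is below. The functions are hypothetical and operate on the text that follows the keyword; they illustrate the idea of a stop boundary and are not Docparser's implementation.

```python
import re


def end_at_line_end(fragment: str) -> str:
    """Hypothetical 'End of line': keep text up to the first line break."""
    return fragment.split("\n", 1)[0].strip()


def end_after_spaces(fragment: str, count: int) -> str:
    """Hypothetical 'End After [x] consecutive spaces'.

    Stops at the first run of at least `count` spaces, never past the line end.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    fragment = fragment.lstrip(" ")  # ignore the gap right after the keyword
    match = re.search(" {%d,}" % count, fragment)
    value = fragment[: match.start()] if match else fragment
    return end_at_line_end(value)


def end_at_keyword(fragment: str, end_keyword: str) -> str:
    """Hypothetical 'End Keyword': stop where another keyword begins."""
    pos = fragment.find(end_keyword)
    return (fragment[:pos] if pos != -1 else fragment).strip()


# Text assumed to follow the keyword "Order Number" in a document.
after_keyword = "   923419   Date 2025-01-02"
print(end_after_spaces(after_keyword, 1))          # 923419
print(end_at_keyword(after_keyword, "Date"))       # 923419
print(end_after_spaces("Acme Corp   Order 9", 3))  # Acme Corp
```

The last call shows why the space count matters: with [3], a value that contains single spaces, such as a company name, stays intact, while a wider gap still ends the value.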
How Docparser Uses Your Custom Dictionary
Once you configure your keywords and extraction rules:
Docparser scans each document for the keywords you supplied.
When a keyword is found, the system applies the Start Position, End Position, and Data Format logic.
The extracted value is returned as the output of that dictionary rule.
You can create multiple dictionary entries to extract multiple fields.
This gives you fine-grained control over values that may appear in different places or formats within your documents.
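The steps above can be sketched as a small pipeline: find the keyword, pick where extraction starts, and keep the first value that fits the data format. The DictionaryRule class, its fields, the two simplified start options, and the regexes are all hypothetical; ending at the first matching run of characters stands in for the End Position setting.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical regexes for two Data Format options (assumed character sets).
FORMATS = {
    "Numbers": r"[0-9]+",
    "Alphanumeric with punctuation": r"[A-Za-z0-9.,:;/#-]+",
}


@dataclass
class DictionaryRule:
    """One hypothetical dictionary entry: which field to fill and how."""
    field: str
    search_term: str
    data_format: str
    start: str  # simplified start position: "same_line" or "line_below"


def apply_rule(text: str, rule: DictionaryRule) -> Optional[str]:
    """Return the value for one rule, or None if nothing is found."""
    lines = text.splitlines()
    pattern = re.compile(FORMATS[rule.data_format])
    for i, line in enumerate(lines):
        pos = line.find(rule.search_term)
        if pos == -1:
            continue  # keyword not on this line; keep scanning
        if rule.start == "same_line":
            # Text after the keyword, skipping a label colon (assumption).
            candidate = line[pos + len(rule.search_term):].lstrip(" \t:")
        elif rule.start == "line_below":
            if i + 1 >= len(lines):
                return None
            candidate = lines[i + 1]
        else:
            raise ValueError(f"unsupported start position: {rule.start}")
        # The first run of characters that fits the format is the result.
        match = pattern.search(candidate)
        return match.group(0) if match else None
    return None


def run_dictionary(
    text: str, rules: list[DictionaryRule]
) -> dict[str, Optional[str]]:
    """Apply every rule and return one output value per field."""
    return {rule.field: apply_rule(text, rule) for rule in rules}


doc = "Invoice: INV-20451\nOrder Number\n44512"
rules = [
    DictionaryRule("invoice_number", "Invoice",
                   "Alphanumeric with punctuation", "same_line"),
    DictionaryRule("order_number", "Order Number", "Numbers", "line_below"),
]
print(run_dictionary(doc, rules))
# {'invoice_number': 'INV-20451', 'order_number': '44512'}
```

Each rule produces one field, so adding another DictionaryRule is the code equivalent of adding another dictionary entry.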
Examples
Example 1: Extracting an Invoice Number
Search Term: Invoice
Data Format: Alphanumeric with punctuation
Start Position: Next Word On Same Line
End Position: End of line
Result: Extracts values such as INV-20451.
Example 2: Extracting an Order Number From a Label Below
Document:
Order Number
44512
Search Term: Order Number
Start Position: Nearest Match On The Line Below The Current Keyword
End Position: End of line
Result: 44512