The Content Summarization Filter uses OpenAI to extract structured insights from your documents. When set to output bullet points, the filter returns a table-style summary listing the key fields found in your document, such as Invoice Number, Invoice Date, Total, and more.
This summary table can be used as the starting point for building precise parsing rules, especially when you're working with complex or unfamiliar document layouts.
What the Bullet Point Summary Does
When enabled, the bullet point option in the Content Summarization Filter analyzes the content of your document and returns a structured list of fields in a table format. For example:
| Field Name | Value |
|---|---|
| Invoice Number | INV-2024-0567 |
| Invoice Date | 2025-06-15 |
| Total Amount | £1,245.00 |
You can then filter this table to isolate the field you're interested in and build a rule directly from it.
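To make the shape of this output concrete, here is a minimal Python sketch that reads a pipe-delimited summary like the one above into field/value pairs. It is an illustration only: `parse_summary_table` is a hypothetical helper, not part of the product, and it assumes the summary is available as plain text in exactly the two-column layout shown.

```python
def parse_summary_table(text: str) -> list[tuple[str, str]]:
    """Parse a two-column pipe-delimited table into (field, value) pairs.

    Hypothetical helper for illustration; assumes the layout shown above.
    """
    rows = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip the separator row (only dashes/colons) and anything that is not two columns.
        if len(cells) != 2 or all(set(cell) <= set("-:") for cell in cells):
            continue
        rows.append((cells[0], cells[1]))
    return rows[1:]  # drop the "Field Name | Value" header row


summary = """
| Field Name | Value |
|---|---|
| Invoice Number | INV-2024-0567 |
| Invoice Date | 2025-06-15 |
| Total Amount | £1,245.00 |
"""

fields = dict(parse_summary_table(summary))
print(fields["Invoice Number"])  # INV-2024-0567
```

Each row becomes one field/value pair, which is why a single row can later be isolated and reduced to its value.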
Step-by-Step: Creating a Parsing Rule from a Summary Field
Let’s walk through an example of how to extract a single field—like the Invoice Number—using the bullet point summary output:
Step 1: Add the Content Summarization Filter
- Navigate to your parser.
- Add a new filter and select Content Summarization.
- Set the output style to Bullet Points (Table Format).
Step 2: Retain the Row for Your Target Field
- Add a "Keep Rows Where" filter.
- Set it to keep rows where the first column contains "invoice_number" (or your desired field name).
This keeps only the row that contains the invoice number, for example:
| Invoice Number | INV-2024-0567 |
Step 3: Keep Only the Field Value
- Add a "Keep Columns from [X] to [Y]" filter.
- Set both X and Y to 2 so that only the second column, which holds the value, is retained.
Step 4: Rename and Save Your Rule
- Give your rule a descriptive name, such as "Invoice Number (from Summary)".
- Save the rule and test it on a few documents to confirm it's extracting the value correctly.
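The logic of Steps 2 and 3 can be sketched in Python as two small functions applied in sequence. This is a rough model of what the filters do, not the product's implementation: `keep_rows_where` and `keep_columns` are hypothetical names, columns are numbered from 1 as in the filter settings, and the case-insensitive "contains" match is an assumption.

```python
# Summary rows as they might look after the Content Summarization Filter (sample data).
rows = [
    ["Invoice Number", "INV-2024-0567"],
    ["Invoice Date", "2025-06-15"],
    ["Total Amount", "£1,245.00"],
]


def keep_rows_where(rows, column, text):
    """Keep rows whose cell in the 1-based `column` contains `text`.

    Case-insensitive matching is an assumption made for this sketch.
    """
    needle = text.lower()
    return [row for row in rows if needle in row[column - 1].lower()]


def keep_columns(rows, first, last):
    """Keep the 1-based columns `first` through `last`, inclusive."""
    return [row[first - 1:last] for row in rows]


# Step 2: retain the row for the target field.
matched = keep_rows_where(rows, column=1, text="Invoice Number")
# Step 3: keep only the second column (the value).
value_only = keep_columns(matched, first=2, last=2)
print(value_only)  # [['INV-2024-0567']]
```

Note that a match text that is too short can retain more than one row: in this sample, matching on "Invoice" alone would keep both the Invoice Number and Invoice Date rows, so use a match that is specific to the field you want.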
Why Use This Approach?
- Fast Setup: Ideal for quickly pulling out structured fields without manually building complex filter chains.
- Document Discovery: Useful for exploring unfamiliar document types or getting a sense of what fields are available.
- Flexible: You can repeat this process for any number of fields returned in the bullet point summary.
Tips
- Add a Convert to Running Text filter at the end of your rule so that each single field is output as a reliable plain text string.
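As a rough picture of what this final conversion achieves, the Python sketch below flattens a one-cell table into a plain string. The function name `to_running_text` and the separators it uses are assumptions for illustration; the actual filter may join cells and rows differently.

```python
def to_running_text(rows, cell_sep=" ", row_sep="\n"):
    """Flatten table rows into one string (separators are assumed, not documented)."""
    return row_sep.join(cell_sep.join(cell.strip() for cell in row) for row in rows)


table_output = [["INV-2024-0567"]]  # a single remaining cell, still in table form
text_output = to_running_text(table_output)
print(repr(text_output))  # 'INV-2024-0567'
```

The point is the change of type: a downstream integration receives a simple string instead of a one-row, one-column table.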