Docparser was designed primarily to process "transactional business documents" such as invoices, purchase orders, and work orders. We aim to support all common file formats used to exchange business documents between organizations, as well as the file formats produced by scanning software.
At the time of writing, Docparser can read the following file formats:
- Native PDF documents with text
- Scanned PDF documents with images only
- Microsoft DOCX and DOC files
- JPG images
- PNG images
- TIFF images
- CSV
- Microsoft XLS files
- TXT
- XML
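
If you upload files from a script, it can help to check the file type before sending anything. The following is a minimal sketch in Python, not part of Docparser itself: the `SUPPORTED_EXTENSIONS` set and the `is_supported` helper are hypothetical names, and the extensions are simply read off the list above. It only looks at the file name, so it cannot tell whether the content actually matches the extension, and alternate spellings such as `.jpeg` or `.tif` are deliberately left out because the list does not mention them.

```python
from pathlib import Path

# Hypothetical set of extensions, taken from the format list above.
# This is an assumption for illustration, not an official Docparser constant.
SUPPORTED_EXTENSIONS = {
    ".pdf",   # native and scanned PDF documents
    ".docx",
    ".doc",
    ".jpg",
    ".png",
    ".tiff",
    ".csv",
    ".xls",
    ".txt",
    ".xml",
}


def is_supported(filename: str) -> bool:
    """Return True if the file name ends in one of the listed extensions.

    The comparison is case-insensitive, so "Invoice.PDF" is accepted.
    Only the name is inspected; the file content is never opened.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported("invoice.PDF")` is true, while `is_supported("archive.zip")` and a name without any extension are both rejected.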