OCR Parsing of PDF Files and Images

Parsio can extract data from text-based PDFs, scanned PDFs and images. It uses machine learning for both OCR and data extraction. A set of prebuilt AI models automatically extracts data from commonly used document types:

  • Invoices
  • Receipts
  • Business cards
  • Identity documents: passports, driving licenses, ID cards, etc.
  • W-2 forms (US)
  • General documents and forms, including handwritten text in different languages.

You can also extract line items (e.g. tables and other repeating data) from PDFs and images. The AI engine costs 5 credits per page.
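As a quick illustration of the per-page pricing, here is a minimal sketch that estimates the credits needed for a batch of documents. The function name `estimate_credits` is hypothetical and not part of Parsio; only the rate of 5 credits per page comes from this article.

```python
CREDITS_PER_PAGE = 5  # rate stated in this article


def estimate_credits(page_counts):
    """Return the total credits needed for documents with the given page counts."""
    if any(n < 0 for n in page_counts):
        raise ValueError("page counts must be non-negative")
    return CREDITS_PER_PAGE * sum(page_counts)


# Example: three documents of 1, 2 and 10 pages (13 pages in total)
print(estimate_credits([1, 2, 10]))  # 65
```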

How to use the AI engine

1. Create a mailbox, choose "I will parse PDFs and images" and select a prebuilt model.

2. Send emails with attachments as usual, upload files manually, or use our API to import PDF files.

3. Parsio will automatically identify fields, tables and data to extract.

4. You can then export the parsed data as usual (to Google Sheets, automation platforms, webhooks or files).
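If you receive parsed data through a webhook or a file export, you may want to flatten extracted tables into rows yourself. The sketch below shows one way to do this, assuming the parsed document arrives as a JSON object with top-level fields plus a list of line items. The shape of the payload, the field names (`invoice_number`, `total`, `items`) and the helper `line_items_to_csv` are illustrative assumptions, not Parsio's documented output format.

```python
import csv
import io


def line_items_to_csv(parsed, table_key="items"):
    """Flatten a parsed document into CSV text, one row per line item.

    `parsed` is assumed to be a dict of simple fields plus a list of
    row dicts stored under `table_key` (a hypothetical payload shape).
    """
    rows = parsed.get(table_key, [])
    # Simple document-level fields are repeated on every row.
    header_fields = {
        key: value
        for key, value in parsed.items()
        if key != table_key and not isinstance(value, (list, dict))
    }
    # Column order: document-level fields first, then line-item fields.
    columns = list(header_fields)
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({**header_fields, **row})
    return out.getvalue()


# Hypothetical parsed invoice with two line items
sample = {
    "invoice_number": "INV-001",
    "total": "30.00",
    "items": [
        {"description": "Widget", "amount": "10.00"},
        {"description": "Gadget", "amount": "20.00"},
    ],
}
print(line_items_to_csv(sample))
```

Repeating the document-level fields on each row keeps every line self-describing, which tends to suit spreadsheets; adjust `table_key` to match the field that holds the table in your own data.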

