What changes

Forty minutes per invoice, or thirty seconds.

Without us

Manual data entry, on every page.

Open the PDF in one window, ERPNext in another.

Retype the supplier, invoice number, date, total.

Type each line item, hope you didn’t miss a digit.

If the column on the PDF is rotated, start over.

End of week: backlog of unentered invoices.

With us

Drafts ready before you open them.

PDF lands in ERPNext via email, drop, or upload.

OCR + AI extracts supplier, fields, line items.

Validation rules check for missing or weird values.

Draft Purchase Invoice ready for review and submit.

End of week: nothing in the queue.

Data entry is the slowest part of accounts payable.

Suppliers don’t send PEPPOL XML — they send PDFs in email. Someone retypes those into ERPNext. The cost isn’t one invoice; it’s 200 a week and the typos that ripple through the ledger. AI does the typing now. Your team does the approving.

What you get

OCR + AI, scoped to your documents.

OCR for any PDF

pytesseract reads the file, including scanned and photo-of-paper invoices. PDFs with embedded text are read directly.

AI field extraction

An LLM extracts the supplier, invoice number, date, totals, and line items. Trained on the kind of documents your business actually receives.

Document classification

Sales order, purchase order, invoice, receipt — classified automatically. The right draft DocType is created.

Validation rules

Missing VAT number? Suspicious total? Misclassified document? Validation catches it before you waste time approving a bad draft.
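To make the idea concrete, here is a minimal sketch of what rules like these could look like. The `ExtractedInvoice` shape, the field names, and the one-cent tolerance are assumptions made for illustration; they are not the product's actual schema or rule set.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class LineItem:
    description: str
    qty: Decimal
    rate: Decimal


@dataclass
class ExtractedInvoice:
    # Hypothetical shape of an extraction result, for illustration only.
    supplier: str
    vat_number: str
    net_total: Decimal
    items: list[LineItem] = field(default_factory=list)


def validate(invoice: ExtractedInvoice, tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Return human-readable problems; an empty list means no rule fired."""
    problems = []
    if not invoice.vat_number.strip():
        problems.append("missing VAT number")
    if invoice.net_total <= 0:
        problems.append("net total is not positive")
    # Compare the header total against the sum of the extracted line items.
    line_sum = sum((i.qty * i.rate for i in invoice.items), Decimal("0"))
    if invoice.items and abs(line_sum - invoice.net_total) > tolerance:
        problems.append(
            f"line items sum to {line_sum}, header total is {invoice.net_total}"
        )
    return problems
```

A draft with a non-empty problem list would be flagged for a human instead of going straight to approval.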

Field-level confidence

Each extracted field carries a confidence score. High-confidence fields auto-fill; low-confidence ones surface for review.
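A minimal sketch of that routing, assuming each field arrives with a confidence between 0.0 and 1.0. The `Field` type, the `route_fields` helper, and the 0.9 threshold are hypothetical, chosen only to show the split.

```python
from typing import NamedTuple


class Field(NamedTuple):
    name: str
    value: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


def route_fields(fields, threshold=0.9):
    """Split fields into (auto_fill, needs_review) by confidence."""
    # Fields at or above the threshold are filled in without asking.
    auto_fill = {f.name: f.value for f in fields if f.confidence >= threshold}
    # Everything else is listed by name so a reviewer can check it.
    needs_review = [f.name for f in fields if f.confidence < threshold]
    return auto_fill, needs_review
```

In practice the threshold would likely be a setting, so a team can start strict and relax it as trust builds.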

Process queue + reprocessing

Re-run extraction with different prompts or models. Reprocess a batch when supplier formats change. No code, just settings.

How a PDF becomes a draft

Drop. Extract. Validate. Approve.

PDF arrives

Email, drop folder, or upload. ERPNext sees the file as a Document Processor record.

OCR + AI extract

Pytesseract reads the page; an LLM extracts the fields and line items.

Validated

Validation rules flag missing or weird values. Low-confidence fields surface for review.

Draft created

A draft Purchase Invoice (or other DocType) is created with all fields filled. You review, edit, submit.
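The last step can be sketched as a plain mapping from extracted values to a draft document payload. The input keys (`invoice_number`, `line_items`, and so on) are hypothetical. The output field names (`bill_no`, `bill_date`, `items`) follow the standard ERPNext Purchase Invoice as commonly configured, and should be checked against your own site before relying on them.

```python
def build_draft_purchase_invoice(extracted: dict) -> dict:
    """Map an extraction result to a draft Purchase Invoice payload."""
    return {
        "doctype": "Purchase Invoice",
        "docstatus": 0,  # 0 means draft in Frappe; nothing is submitted
        "supplier": extracted["supplier"],
        "bill_no": extracted["invoice_number"],   # supplier's invoice number
        "bill_date": extracted["invoice_date"],   # supplier's invoice date
        "items": [
            {"item_name": li["description"], "qty": li["qty"], "rate": li["rate"]}
            for li in extracted["line_items"]
        ],
    }


# Inside a Frappe site, a payload like this would typically be saved with
# frappe.get_doc(payload).insert(); that call is left out so the sketch
# runs without a Frappe installation.
```

Because the payload is an ordinary ERPNext document, the reviewer sees it in the same list and form as an invoice typed by hand.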

Built for

Anyone who receives PDFs by default.

Accounts Payable

Supplier invoices that arrive by email or post. OCR + AI replaces the typist.

Inbound logistics

Delivery notes, packing slips, customs documents — all PDFs, all OCR-able.

Anywhere paper meets ERP

If your team retypes anything from a PDF into ERPNext, this product replaces that step.

Under the hood

Pytesseract + pdf2image + LLM. Open source.

OCR uses pytesseract over pdf2image and OpenCV preprocessing. Field extraction goes through an LLM provider (OpenAI GPT-4, or bring your own — the same setup as the Pilot copilot). Extracted records are draft ERPNext documents (Sales Order, Purchase Order, Sales Invoice, Purchase Invoice). No parallel data model; documents land where they would have landed if a human had typed them. Open source.
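For readers who want to see the moving parts, the sketch below shows one way the OCR stage described above could be wired together. It is an illustration under assumptions, not the product's code: `ocr_pdf` and `build_prompt` are hypothetical names, the 300 dpi setting and Otsu thresholding are choices made for the example, and the prompt wording is invented. Running `ocr_pdf` needs the Tesseract and Poppler binaries installed alongside the Python packages.

```python
def ocr_pdf(path: str, dpi: int = 300, lang: str = "eng") -> list[str]:
    """Return OCR text for each page of a PDF."""
    # Imported lazily so this module loads without the OCR stack installed.
    import cv2
    import numpy as np
    import pytesseract
    from pdf2image import convert_from_path

    texts = []
    for page in convert_from_path(path, dpi=dpi):  # one PIL image per page
        gray = cv2.cvtColor(np.array(page), cv2.COLOR_RGB2GRAY)
        # Otsu thresholding picks the black/white cut-off automatically.
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        texts.append(pytesseract.image_to_string(binary, lang=lang))
    return texts


def build_prompt(page_texts: list[str]) -> str:
    """Assemble an extraction prompt for the LLM (wording is illustrative)."""
    body = "\n\n--- page break ---\n\n".join(t.strip() for t in page_texts)
    return (
        "Extract supplier, invoice number, date, total and line items "
        "as JSON from this invoice text:\n\n" + body
    )
```

The prompt string would then go to whichever LLM provider is configured; that call is omitted here because it depends on the provider chosen.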

Ready for the next step?

30 minutes. No sales pitch.

A free discovery call with a senior architect. We ask a few questions, you get an honest estimate on scope, timeline and cost. That is it.

Book your call

Email us instead →

What happens next

You book 30 min

Through Calendly. English or Dutch.

We ask a few questions

About your current system, team and pain points.

You get an estimate

Scope, timeline, cost in a PDF, within 48 hours.

PDFs in. ERPNext records out.