Information Extraction from Invoices

Hamdi, Ahmed; Carel, Elodie; Joseph, Aurélie; Coustaty, Mickael; Doucet, Antoine

doi:10.5281/zenodo.5562412

Published September 2, 2021 | Version v1

Conference paper Open

1. Université de La Rochelle
2. Yooz

This paper focuses on extracting information from key fields of invoices using two different methods based on sequence labeling. Invoices are semi-structured documents in which data can be located based on the context. Common information extraction systems are model-driven, using heuristics and lists of trigger words curated by domain experts. Their performance is generally high on the documents they have been trained for, but processing new templates often requires new manual annotations, which are tedious and time-consuming to produce. Recent work on deep learning applied to business documents has claimed gains in both time and performance. While these systems do not need manual curation, they nevertheless require a large amount of data to achieve good results. In this paper, we present a series of experiments using neural network approaches to study the trade-off between data requirements and performance in the extraction of information from key fields of invoices (such as dates, document numbers, types, and amounts). The main contribution of this paper is a system that achieves competitive results using a small amount of data, compared to state-of-the-art systems that need to be trained on large datasets, which are costly and impractical to produce in real-world applications.
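To make the sequence-labeling framing concrete, the following is a minimal sketch of how per-token labels could be turned back into invoice fields. It assumes a BIO tagging scheme and the hypothetical field names DOC_NUMBER, DATE and AMOUNT; the abstract does not state which tagging scheme or label set the authors use, and the tags here are written by hand rather than predicted by a model.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (field_label, field_text) pairs.

    The BIO scheme and the label names are assumptions made for
    illustration; they are not taken from the paper.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    fields: List[Tuple[str, str]] = []
    current_label = None
    current_tokens: List[str] = []

    for token, tag in zip(tokens, tags):
        # "B-DATE" -> ("B", "DATE"); "O" -> ("O", "")
        prefix, _, label = tag.partition("-")
        # A field starts on "B-", or on an "I-" that does not continue
        # the field currently being built.
        starts_new = prefix == "B" or (prefix == "I" and label != current_label)

        # Close the open field when we leave it or when a new one starts.
        if current_label is not None and (prefix == "O" or starts_new):
            fields.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []

        if prefix == "O":
            continue
        if starts_new:
            current_label, current_tokens = label, [token]
        else:
            current_tokens.append(token)

    # Flush a field that runs to the end of the sequence.
    if current_label is not None:
        fields.append((current_label, " ".join(current_tokens)))
    return fields


# Hand-written example line from a hypothetical invoice.
example_tokens = ["Invoice", "No", "INV-0042", "Date", "02/09/2021",
                  "Total", "1", "250.00", "EUR"]
example_tags = ["O", "O", "B-DOC_NUMBER", "O", "B-DATE",
                "O", "B-AMOUNT", "I-AMOUNT", "O"]
example_fields = decode_bio(example_tokens, example_tags)
```

In a full system the tags would come from a trained sequence labeler; the decoding step stays the same regardless of which model produces them.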

Files

Files (464.5 kB)

Name	Size	Checksum
ICDAR_2021_Data_Extraction_from_Invoices.pdf	464.5 kB	md5:0d3e6aecc289779f273e7a6a9f85ab63

Additional details

European Commission
NewsEye: A Digital Investigator for Historical Newspapers (grant 770299)
Agence Nationale de la Recherche
IDEAS: International Document Engineering, Analysis and Security lab (grant ANR-18-LCV3-0008)

	All versions	This version
Views	173	173
Downloads	170	169
Data volume	84.1 MB	83.6 MB
