Journal article Open Access

Automatic Table Detection, Structure Recognition and Data Extraction from Document Images

Borra Vineetha; D. N. D. Harini; Ravi Yelesvarupu


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam##2200000uu#4500</leader>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Deep Learning, OCR, Scanned documents, Table detection, Structure recognition, Table data extraction</subfield>
  </datafield>
  <controlfield tag="005">20210915134823.0</controlfield>
  <controlfield tag="001">5509822</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Department of Computer Science and Engineering, GVP College of Engineering, Visakhapatnam (A.P.), India.</subfield>
    <subfield code="a">D. N. D. Harini</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">CEO, Hallmark Solutions, Visakhapatnam (A.P.), India</subfield>
    <subfield code="a">Ravi Yelesvarupu</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Publisher</subfield>
    <subfield code="4">spn</subfield>
    <subfield code="a">Blue Eyes Intelligence Engineering &amp; Sciences Publication (BEIESP)</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">757524</subfield>
    <subfield code="z">md5:88586f3ad8e4818690fc4d37ca9dcaee</subfield>
    <subfield code="u">https://zenodo.org/record/5509822/files/I93490710921.pdf</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2021-07-30</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire</subfield>
    <subfield code="o">oai:zenodo.org:5509822</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="4">
    <subfield code="c">73-79</subfield>
    <subfield code="n">9</subfield>
    <subfield code="p">International Journal of Innovative Technology and Exploring Engineering (IJITEE)</subfield>
    <subfield code="v">10</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Department of Computer Science and Engineering, GVP College of Engineering, Visakhapatnam (A.P.), India.</subfield>
    <subfield code="a">Borra Vineetha</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Automatic Table Detection, Structure Recognition and Data Extraction from Document Images</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2=" ">
    <subfield code="a">ISSN</subfield>
    <subfield code="0">(issn)2278-3075</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2=" ">
    <subfield code="a">Retrieval Number</subfield>
    <subfield code="0">(handle)100.1/ijitee.I93490710921</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;In the recent advancement, the extensive usage of electronic devices to photograph and upload documents, the requirement for extracting the information present in the unstructured document images is becoming progressively intense. The major obstacle to the objective is, these images often contain information in tabular form and extracting the data from table images presents a series of challenges due to the various layouts and encodings of the tables. It includes the accurate detection of the table present in an image and eventually recognizing the internal structure of the table and extracting the information from it. Although some progress has been made in table detection, obtaining the table contents is still a challenge since this involves more fine-grained table structure (rows and columns) recognition. The digitization of critical information has to be carried out automatically since there are millions of documents. Based on the motivation that AI-based solutions are automating many processors, this work comprises three different stages: First, the table detection using Faster R-CNN algorithm. Second, table internal structure recognition process using morphology operation and refine operation and last the table data extraction using contours algorithm. The dataset used in this work was taken from the UNLV dataset&amp;nbsp;&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">issn</subfield>
    <subfield code="i">isCitedBy</subfield>
    <subfield code="a">2278-3075</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.35940/ijitee.I9349.0710921</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">publication</subfield>
    <subfield code="b">article</subfield>
  </datafield>
</record>
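
The abstract in the record above names a "morphology operation" as the second stage (structure recognition) but does not spell it out. The following is a minimal sketch of one common reading, assuming a binarized crop of a ruled table in which 1 marks ink. It is not the authors' implementation: the function names, the toy image, and the minimum line length `k` are all hypothetical. Opening with a 1×k horizontal (or k×1 vertical) structuring element keeps only ink runs at least `k` pixels long, which isolates ruling lines from text.

```python
from typing import List, Tuple

Grid = List[List[int]]  # binary image: 1 = ink, 0 = background


def open_horizontal(img: Grid, k: int) -> Grid:
    """Binary opening with a 1 x k line: keep horizontal runs of >= k ones."""
    out = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= k:  # run is long enough to be a ruling line
                    for i in range(start, x):
                        out[y][i] = 1
            else:
                x += 1
    return out


def transpose(img: Grid) -> Grid:
    return [list(col) for col in zip(*img)]


def open_vertical(img: Grid, k: int) -> Grid:
    """Binary opening with a k x 1 line, done by transposing the image."""
    return transpose(open_horizontal(transpose(img), k))


def line_rows(mask: Grid) -> List[int]:
    """Indices of rows that contain at least one line pixel."""
    return [y for y, row in enumerate(mask) if any(row)]


def table_cells(img: Grid, k: int) -> List[Tuple[int, int, int, int]]:
    """Return cell interiors as (top, left, bottom, right), inclusive."""
    ys = line_rows(open_horizontal(img, k))            # horizontal rulings
    xs = line_rows(transpose(open_vertical(img, k)))   # vertical rulings
    cells = []
    for top, bottom in zip(ys, ys[1:]):
        for left, right in zip(xs, xs[1:]):
            if bottom - top > 1 and right - left > 1:  # skip thick-line gaps
                cells.append((top + 1, left + 1, bottom - 1, right - 1))
    return cells


# Hypothetical 7 x 9 toy table: a 2 x 2 grid with two stray "text" pixels.
TOY_TABLE: Grid = [[int(c) for c in line] for line in (
    "111111111",
    "101010001",
    "100010001",
    "111111111",
    "100010101",
    "100010001",
    "111111111",
)]

if __name__ == "__main__":
    for cell in table_cells(TOY_TABLE, k=5):
        print(cell)
```

On real scans the same idea is usually expressed with an image library (for example OpenCV's `cv2.getStructuringElement` and `cv2.morphologyEx`), and each recovered cell box would then be cropped and passed to OCR. The pure-Python version here only illustrates the principle.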
Views 26
Downloads 26
Data volume 19.7 MB
Unique views 20
Unique downloads 25
