QUESTION ANSWERING MODULE LEVERAGING HETEROGENEOUS DATASETS
Description
Question answering has been a well-researched area of NLP in recent years. Users increasingly need to
query the wide variety of information available to them, whether structured or unstructured. In
this paper, we propose a question answering module which a) consumes a variety of data formats through a
heterogeneous data pipeline that ingests data from product manuals, technical data forums, internal
discussion forums, groups, etc., b) addresses practical challenges faced in real-life situations by pointing to
the exact segment of the manual or chat thread that can resolve a user query, and c) provides segments of text
when deemed relevant, based on the user query and business context. Our solution provides a comprehensive,
detailed pipeline composed of data ingestion, data parsing, indexing, and querying
modules, and it handles a range of data sources such as text, images, tables,
community forums, and flow charts. Our studies on a variety of business-specific datasets
demonstrate the need for custom pipelines like the proposed one to solve real-world document
question-answering problems.
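
The four stages named in the description (ingestion, parsing, indexing, querying) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Segment` class, the function names, the sample documents, and the naive token-overlap scoring are all assumptions made for illustration, and the scoring stands in for whatever retrieval method the paper actually uses. The sketch only shows how a query can be answered by pointing to an exact segment of a manual or forum thread rather than to a whole document.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    source: str   # hypothetical source label, e.g. "manual" or "forum"
    locator: str  # hypothetical pointer to the segment, e.g. "router-guide#1"
    text: str


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ingest(documents):
    """Ingestion and parsing: split each (source, name, raw_text) into paragraph segments."""
    segments = []
    for source, name, raw in documents:
        paragraphs = [p.strip() for p in raw.split("\n\n") if p.strip()]
        for i, para in enumerate(paragraphs, start=1):
            segments.append(Segment(source, f"{name}#{i}", para))
    return segments


def build_index(segments):
    """Indexing: one bag-of-words Counter per segment."""
    return [Counter(tokenize(s.text)) for s in segments]


def query(question, segments, index, top_k=1):
    """Querying: rank segments by token overlap with the question (an assumed, naive score)."""
    q_tokens = set(tokenize(question))
    scored = []
    for seg, bag in zip(segments, index):
        score = sum(bag[t] for t in q_tokens)
        if score > 0:
            scored.append((score, seg))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for _, seg in scored[:top_k]]


# Made-up sample data from two different kinds of source.
docs = [
    ("manual", "router-guide",
     "To reset the router, hold the reset button for ten seconds.\n\n"
     "The power LED blinks green during startup."),
    ("forum", "thread-42",
     "My printer shows a paper jam error after every page."),
]
segments = ingest(docs)
index = build_index(segments)
best = query("How do I reset the router?", segments, index)
print(best[0].source, best[0].locator)
```

Returning the `locator` alongside the text is the design choice that matters here: it lets the answer cite the exact segment, whichever kind of source it came from.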
Files
10621ijnlc01.pdf (1.5 MB, md5:b02a3adf13bfb04706c51391ba1b1879)