Azure Cognitive Search for Oil Industry
- 1. Openstream Technologies, Bangalore, India.
- 2. Department of Computer Science and Engineering, National Institute of Technology (NIT), Warangal, India.
Description
Oil companies typically store vast amounts of valuable information about past research and exploratory work. Subsurface teams use this data to identify potential oil prospects. Because most of this data is hosted on-premises, it poses challenges in (1) access from geographically distant locations, (2) search, and (3) hosting cost. This paper proposes a cloud-based solution using Azure PaaS (Platform as a Service) that migrates on-premises data to Azure Blob Storage to reduce on-premises storage cost and make the data accessible across regions. We implemented a knowledge extraction framework using the cognitive pipeline feature of the Azure Search service to perform data clustering, metadata extraction from documents, duplicate file detection, and search. We used the K-Means clustering algorithm to categorize documents and tag them with additional metadata so that they can be searched easily. We also present a search interface that includes a thumbnail view, fast web view, and image compression, giving all subsurface users quick access to information and helping business users identify document groups and related information. Experimental results on an oil dataset containing 466 PDF and image files demonstrate the viability of our approach to cognitive search and the performance of the pipeline in the oil industry. The presented approach is generic and can be applied with other cloud service providers and in other industrial domains.
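As an illustration of the clustering step, the following is a minimal sketch of K-Means used to group documents and attach a cluster tag as extra metadata. It is not the implementation from the paper: the file names, the sample texts, the term-count features, and the fixed initial centroids are all assumptions chosen to keep the example small and deterministic. A production pipeline would more likely use text extracted by the search service, richer features such as TF-IDF, and randomized or k-means++ initialization.

```python
from collections import Counter

# Hypothetical extracted text per document (illustrative only, not the paper's data).
docs = {
    "well_log_a.pdf": "well log porosity depth well",
    "well_log_b.pdf": "well log depth gamma",
    "seismic_a.pdf": "seismic survey amplitude seismic",
    "seismic_b.pdf": "seismic survey migration",
}

# Shared vocabulary so every document maps to a vector of the same length.
vocab = sorted({word for text in docs.values() for word in text.lower().split()})


def term_counts(text):
    """Represent a document as raw term counts over the shared vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_of(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(points, initial_centroids, max_iter=100):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute means."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = centroid_of(members)
    return labels, centroids


names = list(docs)
points = [term_counts(docs[name]) for name in names]

# Assumption: seed with two specific documents so the run is reproducible.
labels, _ = kmeans(points, [points[0], points[2]])

# Attach the cluster id to each document as an additional metadata field.
tags = {name: {"cluster": label} for name, label in zip(names, labels)}
for name, metadata in tags.items():
    print(name, metadata)
```

The resulting `cluster` value could then be stored alongside each document in the search index as a filterable field, so that users can narrow a query to one document group.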