Enhancing Industrial Data Analysis through Machine Learning-based Classification of Petrochemical Datasets

Rastislav Fáber; Karol Ľubušký; Martin Mojto; Radoslav Paulen

doi:10.5281/zenodo.8284097

Published May 17, 2023 | Version v1

Poster | Open access


1. Slovak University of Technology in Bratislava
2. Slovnaft, a.s.

Incorporating data analytics and machine learning (ML) algorithms into industrial decision making has proven to be a promising way to boost production efficiency. By using ML algorithms to classify historical measurements from online sensors and laboratory analyses, it is possible to provide operational guidance that was previously unavailable. We apply rigorous data treatment to prepare the raw data for ML-based classifier design. This process includes data cleaning, data standardization, data averaging, variable removal (based on linear dependency analysis), and distant outlier detection, to ensure the quality and reliability of the available data. Selection of a suitable classifier model depends on the complexity of the industrial process, the level of its automation (implementation effort), and the ability to handle data outliers. We employ Density-Based Spatial Clustering of Applications with Noise (DBSCAN) for initial ground-truth labeling, after which we use well-understood ML algorithms, namely k-Means, k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), and SVM with time differences, to engineer a framework for real-time classification. Accurate categorization of measurements is crucial for identifying slight deviations from real values that could impact the quality of the final product. Moreover, the complexity of the data plays a significant role in the performance of ML algorithms. With precise categorization of real-time data, the need for human intervention in process control can be minimized. To evaluate the performance of the designed classifiers, we compare their classification accuracy against the synthetic ground-truth labels obtained from DBSCAN. This comparison is carried out on a testing dataset that was not used during the framework design. Overall, our results demonstrate that the ML-based classifiers achieve comparable results in real-time classification. The most accurate classifier was the SVM model that uses not only the absolute data but also their time differences; it achieved the highest anomaly detection rate, 82 %.
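The idea of classifying on both absolute values and their time differences can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it swaps in k-NN (one of the classifiers named above) for the SVM to avoid third-party libraries, and the function names, sensor readings, and labels are hypothetical. In the described workflow the labels would come from DBSCAN rather than being written by hand.

```python
import math
from collections import Counter


def standardize(rows):
    """Scale each column to zero mean and unit variance (z-score)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def add_time_differences(rows):
    """Append x[t] - x[t-1] to each sample; the first sample gets zeros."""
    out = [list(rows[0]) + [0.0] * len(rows[0])]
    for prev, cur in zip(rows, rows[1:]):
        out.append(list(cur) + [c - p for c, p in zip(cur, prev)])
    return out


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical single-sensor series; label 1 marks an anomalous measurement.
raw = [[10.0], [10.1], [9.9], [10.0], [12.5], [10.1],
       [10.0], [9.9], [7.4], [10.0], [10.1]]
labels = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Standardize first, then augment each sample with its time difference.
features = add_time_differences(standardize(raw))

# Hold out the last sample and classify it from the earlier ones.
predicted = knn_predict(features[:-1], labels[:-1], features[-1], k=3)
print("predicted label for the held-out sample:", predicted)
```

Appending the differences doubles the feature dimension, so a sudden jump between consecutive measurements becomes visible to the classifier even when the absolute value alone looks ordinary.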

Files

Files (54.5 kB)

Name	Size
2511.pdf md5:85b57d3bbe82080083a4530a33220d8e	54.5 kB

Additional details

European Commission
FrontSeat – Fostering Opportunities Towards Slovak Excellence in Advanced Control for Smart Industries 101079342

	All versions	This version
Views	113	113
Downloads	52	52
Data volume	3.1 MB	3.1 MB
