Published April 30, 2020 | Version v1
Journal article Open

Data Pre-Processing for Machine Learning Models using Python Libraries

  • 1. Department of Computer Science and Engineering, Bhilai Institute of Technology Durg, Bhilai, Chhattisgarh, India.
  • 1. Publisher


Data pre-processing is the process of transforming raw data into a usable dataset. It is one of the most important phases in building any machine learning model, because the quality and efficiency of the model depend directly on the dataset: if this step is skipped and a model is built on data containing missing values, the resulting model will be inefficient and inconsistent. This paper describes a methodology for pre-processing data in a sequence of seven steps using open-source Python libraries: pandas, a high-level data manipulation tool, and scikit-learn, a machine learning library that supports both supervised and unsupervised learning and provides tools for model fitting, data pre-processing, model selection, and many other utilities. The steps include importing datasets, handling missing values, and encoding categorical values, among others. The analysis helps in cleaning and transforming datasets, which can then be applied to any learning model to produce an efficient machine learning model.
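As a rough illustration of the kind of workflow the abstract describes, the following minimal sketch handles missing values and categorical values with pandas and scikit-learn. It is not the paper's own seven-step procedure, which the abstract does not list in full. The toy data, the column names (`country`, `age`, `salary`, `purchased`), the mean-imputation strategy, and the added splitting and scaling steps are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for an imported file
# (in practice this would typically come from pd.read_csv).
raw = pd.DataFrame({
    "country": ["India", "France", "India", "Spain", "France", "Spain"],
    "age": [44.0, 27.0, np.nan, 38.0, 35.0, 48.0],
    "salary": [72000.0, 48000.0, 54000.0, np.nan, 58000.0, 79000.0],
    "purchased": ["no", "yes", "no", "no", "yes", "yes"],
})

# Separate the features from the target and encode the target as 0/1.
X = raw[["country", "age", "salary"]]
y = (raw["purchased"] == "yes").astype(int)

# Numeric columns: fill missing values with the column mean, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

# Categorical column: one-hot encode; categories unseen during fit are ignored.
categorical = OneHotEncoder(handle_unknown="ignore")

# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["age", "salary"]),
        ("cat", categorical, ["country"]),
    ],
    sparse_threshold=0,
)

# Split first, then fit the transformers on the training rows only,
# so that no statistics from the test rows leak into the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=2, random_state=0
)
X_train_ready = preprocess.fit_transform(X_train)
X_test_ready = preprocess.transform(X_test)
```

Fitting the imputer, scaler, and encoder on the training split only is a design choice of this sketch; the resulting arrays contain no missing values and can be passed to any scikit-learn estimator.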




Additional details

Related works

Is cited by
Journal article: 2249-8958 (ISSN)


Retrieval Number