Calidad de datos con Python: un enfoque práctico
Description
Although data quality is vital to analysis and decision-making, few studies give clear steps for assessing and improving it with the Python programming language. This research therefore aims to design a guide for evaluating and improving data quality using Python. The research takes a qualitative approach and is applied to a practical case, measured against four quality characteristics: Accuracy, Completeness, Free of Error, and Value Added. The results indicate that, after applying the proposed 12-step methodology in Python, the data meet the required quality characteristics.
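This record does not list the 12 steps, so the following is only a minimal sketch of the kind of check such a guide might include, not the authors' methodology. It assumes that Completeness can be approximated by the share of non-missing values per field, and that duplicate rows and out-of-range values are reasonable proxies for Free of Error and Accuracy. The sample records, field names, and the 0 to 120 age range are hypothetical. The sketch uses only the standard library; pandas offers equivalent checks through `DataFrame.isna()` and `DataFrame.duplicated()`.

```python
from typing import Any

# Hypothetical sample records; field names and values are illustrative only.
records = [
    {"id": 1, "age": 34, "city": "Quito"},
    {"id": 2, "age": None, "city": "Guayaquil"},
    {"id": 3, "age": 150, "city": ""},
    {"id": 1, "age": 34, "city": "Quito"},  # exact duplicate of the first row
]


def is_missing(value: Any) -> bool:
    """Treat None and blank strings as missing values."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness(rows: list[dict], fields: list[str]) -> dict[str, float]:
    """Share of non-missing cells per field, from 0.0 to 1.0."""
    total = len(rows)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(not is_missing(row.get(field)) for row in rows) / total
        for field in fields
    }


def count_duplicates(rows: list[dict]) -> int:
    """Number of rows that repeat an earlier row exactly."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent row fingerprint
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


def count_out_of_range(rows: list[dict], field: str, low: float, high: float) -> int:
    """Number of non-missing values in `field` that fall outside [low, high]."""
    values = [row.get(field) for row in rows if not is_missing(row.get(field))]
    return sum(not (low <= value <= high) for value in values)


report = {
    "completeness": completeness(records, ["id", "age", "city"]),
    "duplicate_rows": count_duplicates(records),
    "age_out_of_range": count_out_of_range(records, "age", 0, 120),
}
print(report)
```

Running the same report before and after cleaning gives a simple way to see whether the data moved closer to the required characteristics.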
Additional details
Additional titles
- Translated title (English)
- Data quality with Python: a practical approach
Identifiers
- ISSN
- 2960-8317
- DOI
- 10.61347/ei.v2i2.55
Dates
- Submitted
- 2023-08-04
- Accepted
- 2023-09-07