Machine Learning-Based Name Matching: A Logistic Regression Perspective

Ashlin Darius Govindasamy

doi:10.5281/zenodo.8115376

Published July 3, 2023 | Version v2

Journal article Open


Ashlin Darius Govindasamy¹

1. University of South Africa

In this study, we conducted experiments to investigate the use of logistic regression in developing a name matching system. The primary objective was to create a system capable of identifying potential matches between names in a given dataset and a query. To achieve this, we employed established techniques, namely Levenshtein distance and fuzzywuzzy similarity, to assess the similarity between names.
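As an illustration of the two similarity measures named above, the following is a minimal sketch using only the Python standard library. The function names `levenshtein` and `similarity_percent` are hypothetical, and `difflib.SequenceMatcher` is used here as a stand-in for a fuzzywuzzy-style percentage; the study's actual implementation is not shown in this record.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> int:
    """Case-insensitive similarity on a 0-100 scale (assumed stand-in for a fuzzy ratio)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())
```

A lower edit distance and a higher percentage both indicate closer names, so the two features carry related but not identical signals.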

Initially, we preprocessed the dataset by calculating the Levenshtein distance and fuzzywuzzy percentages for each name in comparison to the query. These calculated features were then appended as additional columns to the dataset. Subsequently, we utilized a logistic regression model that had been previously trained using a labeled dataset.
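The feature computation and the training of a logistic regression model on labeled pairs could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the toy labeled pairs, the normalization of the edit distance, the plain gradient-descent trainer, and the helper names `features`, `train_logistic`, and `match_probability`. It is not the model or data used in the study.

```python
import math
from difflib import SequenceMatcher


def levenshtein(a, b):
    """Edit distance via a row-by-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ch_a != ch_b)))
        previous = current
    return previous[-1]


def features(name, query):
    """Two features per pair: length-normalized edit distance and a 0-1 similarity ratio."""
    a, b = name.lower(), query.lower()
    return [levenshtein(a, b) / max(len(a), len(b), 1),
            SequenceMatcher(None, a, b).ratio()]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on the log loss; returns (weights, bias)."""
    weights, bias, n = [0.0] * len(rows[0]), 0.0, len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical labeled pairs: (candidate name, query, 1 if they match else 0).
labeled = [
    ("Jon Smith", "John Smith", 1), ("John Smyth", "John Smith", 1),
    ("Katherine Lee", "Catherine Lee", 1), ("Mohamed Ali", "Mohammed Ali", 1),
    ("Maria Garcia", "John Smith", 0), ("Peter Jones", "Catherine Lee", 0),
    ("Li Wei", "Mohammed Ali", 0), ("Anna Brown", "John Smith", 0),
]
weights, bias = train_logistic([features(c, q) for c, q, _ in labeled],
                               [label for _, _, label in labeled])


def match_probability(name, query):
    """Predicted probability that name matches query under the toy model."""
    x = features(name, query)
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
```

Normalizing the edit distance by the longer name keeps both features on a comparable 0 to 1 scale, which helps a simple gradient-descent trainer converge; a library implementation of logistic regression would be the usual choice outside a sketch.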

To evaluate the performance of the model, we employed it to predict the likelihood of a name being a match for each entry in the dataset. These predictions were incorporated as a new column within the dataset. Finally, we sorted the dataset in descending order based on the prediction values to identify the most probable name matches.
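The scoring and ranking step described above can be sketched as follows. The coefficients `WEIGHT_DISTANCE`, `WEIGHT_SIMILARITY`, and `BIAS` are hypothetical values standing in for a previously trained model, and the list of dictionaries stands in for the dataset with its appended columns; neither comes from the study itself.

```python
import math
from difflib import SequenceMatcher

# Hypothetical coefficients standing in for a previously trained logistic regression.
WEIGHT_DISTANCE = -4.0
WEIGHT_SIMILARITY = 6.0
BIAS = -2.0


def levenshtein(a, b):
    """Edit distance via a row-by-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + (ch_a != ch_b)))
        previous = current
    return previous[-1]


def rank_matches(names, query):
    """Append feature and prediction columns, then sort by prediction, highest first."""
    rows = []
    for name in names:
        a, b = name.lower(), query.lower()
        distance = levenshtein(a, b)
        similarity = SequenceMatcher(None, a, b).ratio()
        z = (WEIGHT_DISTANCE * distance / max(len(a), len(b), 1)
             + WEIGHT_SIMILARITY * similarity + BIAS)
        rows.append({
            "name": name,
            "levenshtein": distance,
            "similarity": round(100 * similarity),
            "prediction": 1.0 / (1.0 + math.exp(-z)),  # logistic function
        })
    return sorted(rows, key=lambda row: row["prediction"], reverse=True)


ranked = rank_matches(["Maria Garcia", "Jon Smith", "John Smith"], "John Smith")
```

Here `ranked` holds one row per candidate name, with the most probable match first.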

The developed name matching system provides a scalable and efficient approach, enabling users to input a query and obtain a ranked list of potential name matches. To further assess the accuracy and efficacy of the system, it is possible to compare the predicted matches with known ground truth data.
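One way to compare a ranked list with known ground truth is precision and recall over the top k results. The sketch below assumes that choice of metric and a hypothetical helper named `precision_recall_at_k`; the record does not state which evaluation measure, if any, was used.

```python
def precision_recall_at_k(ranked_names, true_matches, k):
    """Precision and recall of the top-k ranked names against a set of known matches."""
    top_k = ranked_names[:k]
    hits = sum(1 for name in top_k if name in true_matches)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(true_matches) if true_matches else 0.0
    return precision, recall


# Hypothetical ranked output and ground truth for a single query.
ranked_names = ["John Smith", "Jon Smith", "Anna Brown", "John Smyth"]
true_matches = {"John Smith", "Jon Smith", "John Smyth"}
precision, recall = precision_recall_at_k(ranked_names, true_matches, k=3)
```

Precision penalizes non-matches that rank highly, while recall penalizes true matches that fall outside the top k, so reporting both gives a fuller picture than either alone.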

The results obtained from our study demonstrate the effectiveness of the name matching system in identifying potential matches based on the computed features and the trained logistic regression model. The system holds significant value in various applications, including data integration, record linkage, and identity verification.

Files

Machine_Learning_Based_Name_Matching__A_Logistic_Regression_Perspective .pdf (395.3 kB, md5:0da89df11d085489a3aac244951999b6)

