A Semi-Supervised Approach to Anomaly Detection for Tax Compliance

Leo Corman; Jason Bono; Catherine Acton; Danielle Gewurz; Evan Schulz

doi:10.5281/zenodo.14009896

Published October 29, 2024 | Version v1

Conference proceeding (open access)


A semi-supervised autoencoder model is applied to predict anomalies in tax form data consisting of hundreds of sparse, irregularly distributed features. Historical tax data generally contains a small number of labeled examples (i.e., returns with known compliance outcomes) and a much larger number of unlabeled examples (i.e., returns with unknown compliance status). Compliance outcomes can be measured by various success metrics; these metrics tend to have right-tailed distributions, with higher values indicating outcomes that are more beneficial to the IRS. Building on recent literature on semi-supervised anomaly detection and on our previous work applying unsupervised autoencoders to tax data, we introduce new techniques for incorporating labeled data into model training and for using both binary (i.e., compliant vs. non-compliant) and continuous (i.e., success metric) outcomes to update the model. These techniques make the influence of the labeled data on training adjustable (from no influence at all to nearly total influence) while addressing the sparsity of the data. We also apply novel ensemble methods to improve anomaly detection. Testing across different datasets and population segments shows favorable performance for the semi-supervised autoencoder compared to existing operational models.
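The abstract does not spell out the training objective. As a rough illustration of how labeled returns can be given a tunable level of influence, the sketch below adapts the loss from Deep SAD (Ruff et al., listed in the references) to per-example autoencoder reconstruction errors. It is not the authors' method: the function name `semi_supervised_recon_loss`, the label encoding (`None` = unlabeled, `0` = known compliant, `1` = known non-compliant), the inverse-error penalty for non-compliant returns, and the `log1p` weighting of the success metric are all assumptions made for illustration. Setting `eta = 0` removes the labeled data's influence entirely, while large values of `eta` let the labeled terms dominate.

```python
import math
from typing import Optional, Sequence


def semi_supervised_recon_loss(
    recon_errors: Sequence[float],
    labels: Sequence[Optional[int]],
    success_metrics: Optional[Sequence[float]] = None,
    eta: float = 1.0,
    eps: float = 1e-6,
) -> float:
    """Hypothetical Deep SAD-style objective over reconstruction errors.

    labels[i] is None for an unlabeled return, 0 for a return known to be
    compliant, and 1 for a return known to be non-compliant (anomalous).
    eta scales the labeled terms: 0 ignores labels, large values let them dominate.
    """
    if not recon_errors:
        raise ValueError("recon_errors must be non-empty")
    if len(recon_errors) != len(labels):
        raise ValueError("recon_errors and labels must have the same length")

    total = 0.0
    for i, (err, label) in enumerate(zip(recon_errors, labels)):
        if label is None:
            # Unlabeled: ordinary autoencoder objective, minimize the error.
            total += err
        elif label == 0:
            # Known compliant: also pull the error down, scaled by eta.
            total += eta * err
        elif label == 1:
            # Known non-compliant: penalize *small* errors via the inverse,
            # pushing these returns toward high anomaly scores.
            weight = 1.0
            if success_metrics is not None:
                # Assumption: dampen the right-tailed success metric with log1p.
                weight += math.log1p(max(success_metrics[i], 0.0))
            total += eta * weight / (err + eps)
        else:
            raise ValueError(f"unexpected label {label!r} at index {i}")

    # Average over all examples, as in the Deep SAD objective.
    return total / len(recon_errors)
```

In a real training loop the per-example errors would come from the autoencoder's forward pass and the loss would be differentiated by a deep learning framework; plain floats are used here only to keep the sketch self-contained.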

Files

A Semi-Supervised Approach to Anomaly Detection for Tax Compliance_vF.pdf (1.2 MB, md5:29c60a141982182626279dbc2ea8802d)

References

A. S. Parker, D. Gewurz, and W. J. J. Roberts. "Quality and Validity Testing of Sparse Form Data using Gaussian Mixture Models," JSM Proceedings, Social Statistics Section, 2018.
A. S. Parker, D. Gewurz, and W. J. J. Roberts. "Recommender Algorithms for Form Anomaly Detection," JSM Proceedings, Government Statistics Section, 2020.
C. Acton, L. Corman, J. Bono, D. Gewurz, C. Walsh, and E. Schulz. "Anomaly Detection on Sparse Data with Autoencoders," JSM Proceedings, 2023, DOI: 10.5281/zenodo.10001050.
N. Merrill and A. Eskandarian. "Modified Autoencoder Training and Scoring for Robust Unsupervised Anomaly Detection in Deep Learning," IEEE Access, vol. 8, pp. 101824-101833, 2020, DOI: 10.1109/ACCESS.2020.2997327.
L. Ruff, R. Vandermeulen, N. Görnitz, A. Binder, E. Müller, K.-R. Müller, and M. Kloft. "Deep Semi-Supervised Anomaly Detection," International Conference on Learning Representations, 2020, DOI: 10.48550/arXiv.1906.02694.

