Scalability of SageMaker Autopilot's Preprocessing Pipeline and Fairness Metrics on Large-Scale Tabular Datasets

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20653545

Published June 12, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

The modern approach to artificial intelligence (AI) aims to design algorithms that learn directly from data. This approach has achieved impressive results and contributed significantly to the progress of AI, particularly in supervised deep learning. It has also simplified the design of machine learning systems, since the learning process is highly automated. However, not all data processing tasks in conventional deep learning pipelines have been automated. In most cases, data has to be manually collected, preprocessed, and further extended through data augmentation before it can be e

Research goal: How does the scalability of SageMaker Autopilot's preprocessing pipeline affect fairness metrics (e.g., group fairness) when applied to large-scale tabular datasets (e.g., Criteo, Kaggle datasets) compared to distributed fairness-aware preprocessing frameworks like Turi Create?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.1/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.1/10.

Files

paper.pdf

Files (74.2 kB)

Name	Size
paper.pdf md5:9a21356128b516bce7accce620784d71	74.2 kB

