TonalityPrint: A Contrast-Structured Voice Dataset for Exploring Functional Tonal Intent, Ambivalence, and Inference-Time Prosodic Alignment v1.0
Description
TonalityPrint is a specialized single-speaker speech corpus designed to support fine-tuning of functional tonal intents - Trust, Attention, Reciprocity, Empathy Resonance, and Cognitive Energy - in voice AI systems. Unlike emotion-recognition datasets, TonalityPrint annotates functional tonal intents (what speakers do with tone), not just what they feel.
Annotations cover the five Functional Tonal Intents plus an explicit ambivalence condition, conceptualized as a transitional perceptual-entropy state rather than a discrete emotion. A core innovation of TonalityPrint is its treatment of Ambivalence (systematically annotated as "ambivalex"): rather than discarding mixed or transitional signals as noise, the dataset treats tonal complexity as a perceptual-entropy feature essential for real-world inference-time alignment.
Using its Fixed-Phrase Octet design, the dataset delivers 144 audio samples: 18 utterances, each recorded in 8 parallel prosodic states. It is accompanied by a detailed README describing design philosophy, ethical constraints, and proposed evaluation affordances.
Grounded in real-world practitioner experience from 8,873+ consequential interactions, the corpus captures an observed “AI-adjacent yet trusted” vocal profile that may challenge assumptions about ‘uncanny valley’ effects and offer provocative insights for humanoid robotics, companion AI, human-agent interaction, and reasoning-based voice interfaces.
TonalityPrint is intended as a hypothesized contrast substrate, not a training corpus for general-purpose speech models. It is designed for researchers exploring inference-time alignment, prosodic interpretability, style-conditioned synthesis, human-AI voice calibration, and evaluation of safety-critical voice agents (e.g., healthcare, autonomous systems) that must audibly sound uncertain when hallucinating.
Featuring:
• 144 high-fidelity, unprocessed WAVs (48 kHz / 32-bit), preserving tonal fidelity.
• 18 unique utterances across 8 prosodic states.
• Continuous intensity indices (0-100) for the five core functional intents.
• Comprehensive metadata, including "ambivalex" flags and practitioner-verified outcome associations.
All recordings are 100% authentic human voice (the author's), made with explicit consent, and released under CC BY-NC 4.0 (free for academic/research use; commercial licensing available).
This work emerges from independent practitioner research conducted without institutional funding.
Supplement to: Polhill, R. (2025) "Tonality as Attention" white paper (DOI: 10.5281/zenodo.17410581).
Why Download Now:
TonalityPrint is designed to enable precision isolation of functional prosodic signals in voice AI - a growing priority for labs focused on safe, nuanced, and human-aligned speech interfaces. Dataset v1.0 is available today for benchmarking; collaborative validation and multi-speaker extensions are actively sought.
Abstract
TonalityPrint is a specialized, controlled, single-speaker prosody dataset designed for precision-tuning the functional tonal intents that govern complex human conversation in voice AI systems. The corpus provides 144 human-verified audio samples across 18 utterances, each recorded in 8 parallel prosodic states: Baseline/Neutral, five core Functional Tonal Intents (Trust, Attention, Reciprocity, Empathy Resonance, Cognitive Energy) (Picard, 1997; Cutler et al., 1997), and systematically annotated Ambivalence states, treating tonal complexity as a learnable feature rather than annotation error (Cowen & Keltner, 2017; Pell & Kotz, 2011).
Core Innovation: TonalityPrint proposes hypothesized data-level substrates that may enable prosodic AI alignment:
- Ambivalence Annotation: Unlike traditional emotion datasets that treat mixed or transitional signals as noise, TonalityPrint systematically annotates ambivalence as a cross-intent perceptual-entropy feature, aiming to provide an operational reference signal for AI systems that must navigate real-world tonal complexity during inference.
- Differential Latent Analysis (DLA): By holding speaker identity and lexical content constant across parallel prosodic states (the "Fixed-Phrase Octet"), TonalityPrint may enable researchers to perform contrastive approximation of tonal intent vectors, analogous to established activation-steering methods in LLMs but applied to voice prosody (Rimsky et al., 2024; Anthropic, 2024). These represent proposed methodological extensions and hypothesis-generating proposals requiring empirical validation at scale.
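As a hypothetical illustration (not the dataset's reference implementation), the DLA contrast can be sketched as a paired difference of prosody embeddings between matched baseline and intent-state recordings. The `embed` function and the toy inputs below are assumptions; any pretrained speech encoder could stand in for it:

```python
import numpy as np

def intent_vector(embed, baseline_wavs, intent_wavs):
    """Contrastive approximation of a tonal intent vector.

    embed: any function mapping a recording to a fixed-size prosody
    embedding (a hypothetical stand-in for a pretrained speech encoder).
    baseline_wavs / intent_wavs: parallel lists of the same 18 utterances
    in the Baseline state vs. one intent state (Fixed-Phrase Octet).
    """
    base = np.stack([embed(w) for w in baseline_wavs])
    tone = np.stack([embed(w) for w in intent_wavs])
    # Paired difference, then mean over utterances: lexical content and
    # speaker identity cancel, leaving the prosodic contrast direction.
    return (tone - base).mean(axis=0)

# Toy demo with random 16-dim arrays standing in for real embeddings.
rng = np.random.default_rng(0)
fake_embed = lambda w: w  # pretend inputs are already embeddings
baseline = [rng.normal(size=16) for _ in range(18)]
trust = [b + 0.5 for b in baseline]  # simulated constant prosodic offset
v_trust = intent_vector(fake_embed, baseline, trust)
print(v_trust.shape)  # (16,)
```

The design mirrors activation-steering practice in LLMs: a direction estimated from matched contrastive pairs, which could then be added to or projected out of a model's latent state at inference time.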
Empirical Context: Annotations are grounded in ecological observations from 8,873+ consequential customer interactions, where the observed tonal patterns correlated with a stable ~35.85% average conversion rate (ecological provenance rather than proof). During this period, 68 listeners (~0.76% of interactions) spontaneously described the speaker's voice using AI-adjacent descriptors ('automated,' 'TV voice') while maintaining high trust and positive outcomes. While these observations are confounded by numerous variables and cannot establish causation, they motivated the hypothesis that certain prosodic patterns merit systematic investigation for human-AI voice alignment - particularly the counterintuitive observation that "AI-adjacent" vocal qualities may co-occur with trust-building (Breazeal, 2003) rather than necessarily triggering the discomfort typically associated with the well-documented 'uncanny valley' effect.
Theoretical Foundation: These innovations operationalize the Tonality as Attention framework (Polhill, 2025), which proposes that human vocal prosody serves as an active attention mechanism functionally analogous to computational attention mechanisms in AI architectures - a shared signaling system that may bridge human-machine communication. TonalityPrint aims to provide an early controlled dataset for testing this framework's core hypothesis: that prosodic patterns function as steerable intent vectors during inference.
Dataset Structure: The corpus provides exploratory continuous indices (0-100) for Trust, Attention, Reciprocity, Empathy Resonance, and Cognitive Energy intensity, enabling both categorical classification and continuous gradient modeling. All 144 samples are ethically sourced (100% human recordings by the author with explicit consent), unprocessed to preserve tonal fidelity, and include comprehensive metadata for reproducible research.
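A minimal sketch of how the continuous indices might be consumed downstream. The column names (`utterance_id`, `state`, `trust`, `attention`, `ambivalex`) and values are assumptions about the metadata schema, not confirmed field names; the released README/DATACARD documents the actual layout:

```python
import csv, io

# Toy stand-in for the dataset's metadata file (schema is assumed).
toy_metadata = """utterance_id,state,trust,attention,ambivalex
U01,baseline,12,20,0
U01,trust,87,41,0
U01,ambivalence,55,48,1
"""

rows = list(csv.DictReader(io.StringIO(toy_metadata)))

# Continuous 0-100 indices support gradient modeling rather than
# one-hot labels: e.g., normalize to [0, 1] as regression targets.
targets = [int(r["trust"]) / 100 for r in rows]

# The ambivalex flag isolates the perceptual-entropy condition
# for ambiguity-aware training or evaluation splits.
ambivalent = [r for r in rows if r["ambivalex"] == "1"]
print(targets, len(ambivalent))  # [0.12, 0.87, 0.55] 1
```

Framing the intents as continuous targets rather than class labels is what distinguishes this schema from standard categorical emotion corpora.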
Research Applications: TonalityPrint may function as a precision-tuning resource for investigating inference-time prosodic alignment in reasoning-based voice models, ambiguity-aware dialogue systems, style-conditioned speech synthesis, embodied AI voice-appearance synchrony, and socio-pragmatic attention mechanisms (e.g., truthfulness calibration: aligning model confidence scores with vocal doubt (Ambivalence) to prevent deceptive confidence and audibly signal uncertainty when hallucinating). As a controlled single-speaker corpus, it complements rather than replaces large-scale multi-speaker datasets - designed for architectural development, feature extraction research, and transfer learning evaluation, it opens new research directions.
Limitations: Findings are derived from a single speaker in one professional context and require independent validation across speakers, cultures, and domains. Observational conversion data reflect correlation rather than causation and are subject to numerous confounding variables. The dataset provides a replicable annotation methodology and a controlled substrate for validation studies, but does not itself constitute proof of the Tonality as Attention framework or of downstream efficacy.
Availability: Dataset available now for immediate precision-tuning experiments with full methodological documentation released under CC BY-NC 4.0 at Zenodo https://doi.org/10.5281/zenodo.17913895, with commercial licensing available. The accompanying Tonality as Attention white paper is available separately at https://doi.org/10.5281/zenodo.17410581.
TonalityPrint aims to address a critical gap in voice AI training data by moving beyond discrete emotion recognition to capture functional tonal intent, including ambivalent prosodic signals as potentially essential nuances for inference-time alignment.
Files (43.9 MB total)
- DATACARD.zip
- File (42.9 MB) - md5:6ace62ab158583dd36f40796f87dbf6b
- File (1.0 MB) - md5:acdd18ccb3fead4e57f4b0bb35d6a3df
Additional details
Related works
- Is referenced by
- Report: 10.5281/zenodo.19237818 (DOI)
- Is supplement to
- Report: 10.5281/zenodo.17410581 (DOI)
Dates
- Copyrighted: 2026-01-24