Published August 18, 2025 | Version v1
Publication | Open access

HES-118: An Energy–Information Framework for the Periodic Table (BE–Ce–IS)
Unsupervised Clustering in BE–Ce–IS Space: K-Selection, Stability, and External Validation

  • 1. Business and Technology University

Description

This record releases the data and reproducible code for a study that unifies two complementary directions: the HES-118 energy–information framework (BE–Ce–IS) and unsupervised clustering performed in the same feature space. Each chemical element is represented as a three-dimensional vector, where BE is the average nuclear binding energy per nucleon (MeV/nucleon), Ce is the correlation energy of the valence electrons (eV), and IS is an information metric of orbital abundance, IS = log₂N (bits). Units and measurement standards are harmonized across sources; features are standardized with StandardScaler, with a robustness check reported in the appendix. Outliers are flagged, not removed, using a Mahalanobis χ² test (df = 3, p < 0.01).

 
x = (BE, Ce, IS)^T IS = log2 N D^2 = (x' − μ)^T Σ^{-1} (x' − μ) Flag if D^2 > χ^2_{3;0.99}
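The outlier rule can be sketched in Python with NumPy, SciPy and scikit-learn. This is a minimal illustration, not the released preprocessing script: the function name `flag_mahalanobis_outliers`, the synthetic data, and the hypothetical orbital count `n_states` standing in for N are all assumptions. Because D² is invariant to per-feature rescaling, computing it on standardized or raw features gives the same values; the standardization step only mirrors the described pipeline.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.preprocessing import StandardScaler


def flag_mahalanobis_outliers(X, alpha=0.01):
    """Return (D^2, flags, threshold); rows are flagged, never removed.
    Hypothetical helper, not the record's own code."""
    Xs = StandardScaler().fit_transform(X)                  # x' = standardized features
    mu = Xs.mean(axis=0)
    sigma_inv = np.linalg.pinv(np.cov(Xs, rowvar=False))    # pseudo-inverse for numerical safety
    diff = Xs - mu
    # D^2 = (x' - mu)^T Sigma^{-1} (x' - mu), evaluated for every row at once
    d2 = np.einsum("ij,jk,ik->i", diff, sigma_inv, diff)
    threshold = chi2.ppf(1.0 - alpha, df=X.shape[1])        # chi^2_{3;0.99} for 3 features
    return d2, d2 > threshold, threshold


# Synthetic stand-in for the element table (NOT the released data).
rng = np.random.default_rng(0)
be = rng.normal(8.0, 0.5, 118)           # BE, MeV per nucleon
ce = rng.normal(-1.0, 0.3, 118)          # Ce, eV
n_states = rng.integers(1, 33, 118)      # hypothetical N
is_bits = np.log2(n_states)              # IS = log2 N, bits
X = np.column_stack([be, ce, is_bits])
X[-1] = [12.0, 2.0, 10.0]                # planted outlier for illustration

d2, flags, thr = flag_mahalanobis_outliers(X)
print(round(thr, 3), int(flags.sum()), bool(flags[-1]))
```

With three features the 0.99 quantile of χ² with 3 degrees of freedom evaluates to about 11.34; flagged rows stay in the data set and are only marked.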

Clustering pipeline. K-means (init = "k-means++", n_init = 50, max_iter = 1000, tol = 1e-6) is the base algorithm and is compared against Agglomerative (Ward linkage), Gaussian mixture model (GMM), Spectral, and HDBSCAN baselines. The number of clusters K is selected using the Elbow method, the average silhouette (Silhouette_avg), the Calinski–Harabasz index (higher is better), and the Davies–Bouldin index (lower is better), followed by an explicit K±1 sensitivity test.
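A K-selection scan with the K-means settings quoted above might look like the sketch below. It assumes scikit-learn's `KMeans` together with its `silhouette_score`, `calinski_harabasz_score` and `davies_bouldin_score` metrics; the helper names `scan_k` and `k_window`, the K range 2 to 10, and the synthetic blobs are illustrative assumptions. The rule for combining the four criteria is left open because the record does not specify it.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.preprocessing import StandardScaler


def scan_k(Xs, k_values=range(2, 11), seed=0):
    """Fit K-means for each K and record the four selection criteria."""
    rows = []
    for k in k_values:
        km = KMeans(n_clusters=k, init="k-means++", n_init=50,
                    max_iter=1000, tol=1e-6, random_state=seed).fit(Xs)
        rows.append({
            "k": k,
            "inertia": km.inertia_,                                   # Elbow: look for the bend
            "silhouette": silhouette_score(Xs, km.labels_),           # higher is better
            "calinski_harabasz": calinski_harabasz_score(Xs, km.labels_),  # higher is better
            "davies_bouldin": davies_bouldin_score(Xs, km.labels_),   # lower is better
        })
    return rows


def k_window(rows, k_star):
    """K±1 sensitivity: keep only the rows for K*-1, K* and K*+1."""
    return [r for r in rows if abs(r["k"] - k_star) <= 1]


# Synthetic stand-in data (NOT the HES-118 table).
X, _ = make_blobs(n_samples=118, n_features=3, centers=5, random_state=1)
Xs = StandardScaler().fit_transform(X)
rows = scan_k(Xs)
k_by_silhouette = max(rows, key=lambda r: r["silhouette"])["k"]
for r in k_window(rows, k_by_silhouette):
    print(r)
```

Here the silhouette maximum serves only as an example pivot for the K±1 window; in practice all four criteria would be inspected side by side.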

Stability & validation. Stability is assessed with a 100-seed sweep (variance of ARI across seeds) and 100 bootstrap resamples, each containing 80% of the elements, summarized by ARI and Jaccard distributions. External validation compares the clusters with known chemical families and with the s/p/d/f block structure, using a row-normalized confusion matrix and ARI. PCA (2D/3D) and UMAP projections are used for visualization.
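The stability and external-validation steps can be sketched as follows, under several assumptions: seed-sweep ARI is measured against the seed-0 partition (pairwise comparison across all seeds would be an equally valid choice), the 80% resamples are drawn without replacement, and the Jaccard score is the per-cluster best-match overlap. Helper names and the synthetic data are hypothetical, and the seed and resample counts are reduced in the usage lines to keep the run short.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


def fit_labels(X, k, seed):
    """K-means with the settings quoted in the text."""
    return KMeans(n_clusters=k, init="k-means++", n_init=50, max_iter=1000,
                  tol=1e-6, random_state=seed).fit_predict(X)


def seed_sweep(X, k, n_seeds=100):
    """ARI of every seed against the seed-0 partition; mean and variance."""
    ref = fit_labels(X, k, 0)
    aris = np.array([adjusted_rand_score(ref, fit_labels(X, k, s))
                     for s in range(1, n_seeds)])
    return aris.mean(), aris.var()


def best_jaccard(ref, new, clusters):
    """Per reference cluster: best Jaccard overlap with any new cluster."""
    out = []
    for c in clusters:
        a = ref == c
        if not a.any():                       # cluster absent from this subsample
            out.append(np.nan)
            continue
        out.append(max(np.sum(a & (new == d)) / np.sum(a | (new == d))
                       for d in np.unique(new)))
    return out


def subsample_stability(X, k, n_boot=100, frac=0.8, seed=0):
    """Refit on random 80% subsamples and compare with the full-data
    partition restricted to the same points."""
    rng = np.random.default_rng(seed)
    ref = fit_labels(X, k, 0)
    clusters = np.unique(ref)
    m = int(round(frac * len(X)))
    aris, jac = [], []
    for b in range(n_boot):
        idx = rng.choice(len(X), size=m, replace=False)
        new = fit_labels(X[idx], k, b)
        aris.append(adjusted_rand_score(ref[idx], new))
        jac.append(best_jaccard(ref[idx], new, clusters))
    return np.array(aris), np.array(jac)      # shapes (n_boot,), (n_boot, k)


def external_validation(reference, labels):
    """Row-normalized confusion matrix (each reference class sums to 1) and ARI."""
    table = pd.crosstab(pd.Series(reference, name="reference"),
                        pd.Series(labels, name="cluster"), normalize="index")
    return table, adjusted_rand_score(reference, labels)


# Synthetic, well-separated stand-in data (NOT the HES-118 table).
X, y = make_blobs(n_samples=118, centers=[[0, 0, 0], [8, 0, 0], [0, 8, 0]],
                  cluster_std=0.5, random_state=0)
mean_ari, var_ari = seed_sweep(X, k=3, n_seeds=10)
aris, jac = subsample_stability(X, k=3, n_boot=10)
table, ext_ari = external_validation(y, fit_labels(X, 3, 0))
print(mean_ari, var_ari, aris.mean(), np.nanmean(jac, axis=0), ext_ari)
```

In the study the `reference` argument would hold the chemical-family or s/p/d/f block labels; the synthetic `y` only stands in for them here.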

Findings. The resulting groups align with established chemical families. We identify "empty zones" (regions of the energetic–informational space that no element occupies) and highlight borderline elements (e.g., Ni, Y, Nd) for targeted interpretation. We also provide a prediction module that combines cluster ID, distance to centroid, and local density with auxiliary features to evaluate new hypotheses.
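The record does not document the prediction module's interface, so the following is only a hypothetical sketch of how the three cluster-derived features could be assembled for new points: a one-hot cluster ID, the distance to the assigned K-means centroid, and a local-density proxy taken as the inverse mean distance to the k nearest reference points. The function name, k = 5, the density definition, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors


def cluster_feature_block(km, nn, X_new, aux=None):
    """Hypothetical feature block: one-hot cluster ID, distance to the
    assigned centroid, k-NN density proxy, plus optional auxiliary columns."""
    cluster_id = km.predict(X_new)
    one_hot = np.eye(km.n_clusters)[cluster_id]
    dist_to_centroid = km.transform(X_new).min(axis=1)    # nearest centroid = assigned one
    knn_dist, _ = nn.kneighbors(X_new)
    local_density = 1.0 / (knn_dist.mean(axis=1) + 1e-12)  # inverse mean k-NN distance
    parts = [one_hot, dist_to_centroid[:, None], local_density[:, None]]
    if aux is not None:
        parts.append(np.asarray(aux, dtype=float))
    return np.hstack(parts)


# Fit on reference points (synthetic stand-ins, NOT HES-118), then describe new ones.
X_ref, _ = make_blobs(n_samples=118, n_features=3, centers=5, random_state=2)
km = KMeans(n_clusters=5, n_init=50, random_state=0).fit(X_ref)
nn = NearestNeighbors(n_neighbors=5).fit(X_ref)
X_new = X_ref[:4] + 0.1                 # hypothetical "new" candidates
aux = np.zeros((4, 2))                  # hypothetical auxiliary features
F = cluster_feature_block(km, nn, X_new, aux)
print(F.shape)  # (4, 9): 5 one-hot + distance + density + 2 auxiliary
```

One-hot encoding the cluster ID avoids implying an order between clusters when the block feeds a downstream model.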

Quantitative signal (reference values reported in the study).
K* = 5; ARI = 0.82; Silhouette_avg = 0.42; Davies–Bouldin (DB) = 0.75.

Reproducibility & contents. This open-science package includes the HES-118 dataset, preprocessing scripts, clustering and validation notebooks, environment specifications, and exportable figures and tables needed to reproduce the full pipeline and results.

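Limitations & future work. K-means favors roughly spherical, similarly sized clusters and can perform poorly when the data contain non-spherical groups; a robustness analysis based on Spectral clustering and HDBSCAN is proposed for follow-up studies.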
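The limitation can be illustrated on a toy example. The sketch below uses synthetic half-moon data rather than HES-118 and compares K-means with scikit-learn's `SpectralClustering` on a k-nearest-neighbor affinity graph; the data set, `n_neighbors = 10`, and the seeds are assumptions chosen for illustration.

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Two interleaved half-moons: non-spherical clusters (synthetic, NOT HES-118).
X, y = make_moons(n_samples=300, noise=0.05, random_state=0)
Xs = StandardScaler().fit_transform(X)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xs)
sc_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0).fit_predict(Xs)

ari_kmeans = adjusted_rand_score(y, km_labels)
ari_spectral = adjusted_rand_score(y, sc_labels)
print(f"K-means ARI = {ari_kmeans:.2f}, Spectral ARI = {ari_spectral:.2f}")
```

On such elongated, interleaved shapes K-means cuts the moons with a straight boundary, whereas the graph-based method follows each moon, so its ARI against the true labels is much higher. HDBSCAN (`sklearn.cluster.HDBSCAN`, available since scikit-learn 1.3) is another option, but it labels low-density points as noise (-1), which complicates a direct ARI comparison.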

Keywords: BE–Ce–IS; HES-118; unsupervised clustering; K-selection; stability analysis; ARI; Mahalanobis distance; PCA; UMAP; chemical families; energetic–informational space.

Files

article 16.08.2025.pdf (3.6 MB, md5:c61ab7260c85896892391eb5c10b225b)

Additional details

Dates

Accepted: 2025-08-18