Published January 29, 2025 | Version v1
Dataset Open

A New Bayesian Approach to Increase Measurement Accuracy Using a Precision Entropy Indicator

Description

"We believe that by accounting for the inherent uncertainty in the system during each measurement, the relationship between cause and effect can be assessed more accurately, potentially reducing the duration of research."

Short description

This dataset was created as part of a research project investigating the efficiency and learning mechanisms of a novel Bayesian adaptive search algorithm supported by the Imprecision Entropy Indicator (IEI). It includes detailed statistical results, posterior probability values, and the weighted averages of IEI across multiple simulations aimed at target localization within a defined spatial environment. Control experiments, including random search, random walk, and genetic algorithm-based approaches, were also performed to benchmark the system's performance and validate its reliability.

The task involved locating a target area centered at (100; 100) within a radius of 10 units (Research_area.png), inside a circular search space with a radius of 100 units. The search process continued until 1,000 successful target hits were achieved.

The control datasets from these experiments (random search, random walk, and genetic algorithm-based approaches) serve as baselines, enabling comprehensive comparisons of efficiency, randomness, and convergence behavior across search methods, thereby demonstrating the effectiveness of our novel approach.

Uploaded files

  1. The first dataset contains the average IEI values, generated by randomly simulating 300 x 1 hits for each of the 10 bins per quadrant
    (4 quadrants in total) in Python and calculating the corresponding IEI values. This yielded a total of 4 x 10 x 300 x 1 = 12,000 data points. A summary of the IEI values by quadrant and bin is provided in the file results_1_300.csv. The IEI averages are calculated from likelihoods, using an absolute-difference-based approach for the likelihood probability computation.
    IEI_Likelihood_Based_Data.zip
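As an illustration of the absolute-difference-based likelihood idea described above, the sketch below turns the distance between a measured IEI value and pre-computed per-bin IEI averages into a normalized likelihood. The inverse-distance weighting, the function name, and the sample values are illustrative assumptions, not the exact formula used in the dataset.

```python
def abs_diff_likelihood(measured_iei, bin_averages):
    """Toy likelihood: bins whose pre-computed average IEI lies closer to
    the measured IEI receive proportionally higher probability.
    The inverse-distance form is an illustrative assumption."""
    eps = 1e-9  # avoid division by zero on an exact match
    weights = [1.0 / (abs(measured_iei - avg) + eps) for avg in bin_averages]
    total = sum(weights)
    return [w / total for w in weights]  # normalize to a probability vector

# Hypothetical per-bin averages; a measurement of 0.52 favors the middle bin.
bin_avgs = [0.2, 0.5, 0.8]
likelihood = abs_diff_likelihood(0.52, bin_avgs)
```

Any monotone decreasing function of the absolute difference would serve the same purpose; the key property is that the resulting vector sums to one and peaks at the closest bin average.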

  2. The weighted IEI average values for likelihood calculation (Bayes formula) are provided in the file Weighted_IEI_Average_08_01_2025.xlsx

  3. This dataset contains the results of a simulated target search experiment using Bayesian posterior updates and Imprecision Entropy Indicators (IEI). Each row represents a hit during the search process, including metrics such as Shannon entropy (H), Gini index (G), average distance, angular deviation, and calculated IEI values. The dataset also includes bin-specific posterior probability updates and likelihood calculations for each iteration. The simulation explores adaptive learning and posterior penalization strategies to optimize search efficiency. The source code of our Bayesian adaptive search system (the search algorithm, performing 1,000 target searches) is IEI_Self_Learning_08_01_2025.py

    This dataset contains the results of 1,000 iterations of a successful target search simulation; each iteration runs until the target is located.
    The dataset includes three further main outputs: 
    a) Results files (results{iteration_number}.csv): Details of each hit during the search process, including entropy measures, Gini index, average distance and angle, Imprecision Entropy Indicators (IEI), coordinates, and the bin number of the hit. 
    b) Posterior updates (Pbin_all_steps_{iter_number}.csv): Tracks the posterior probability updates for all bins across multiple steps of the search process.
    c) Likelihood analysis (likelihood_analysis_{iteration_number}.csv): Contains the calculated likelihood values for each bin at every step, based on the difference between the measured IEI and the pre-defined IEI bin averages.
    IEI_Self_Learning_08_01_2025.py

  4. Using the Python source code described in point 3 (the Bayesian adaptive search method with IEI values), we performed 1,000 successful target searches; the outputs were saved in the file
    Self_learning_model_test_output.zip

  5. Bayesian Search (IEI) from different quadrants. This dataset contains the results of Bayesian adaptive target search simulations, including various outputs that represent the performance and analysis of the search algorithm.
    The dataset includes: 
    a) Heatmaps (Heatmap_I_Quadrant, Heatmap_II_Quadrant, Heatmap_III_Quadrant, Heatmap_IV_Quadrant): These heatmaps represent the search results and the paths taken from each quadrant during the simulations. They indicate how frequently the system selected each bin during the search process.
    b) Posterior Distributions (All_posteriors, Probability_distribution_posteriors_values, CDF_posteriors_values): Generated based on posterior values, these files track the posterior probability updates, including cumulative distribution functions (CDF) and probability distributions.
    c) Macro Summary (summary_csv_macro): This file aggregates metrics and key statistics from the simulation. It summarizes the results from the individual results.csv files.
    d) Heatmap Searching Method Documentation (Bayesian_Heatmap_Searching_Method_05_12_2024): This document visualizes the search algorithm's path, showing how frequently each bin was selected during the 1,000 successful target searches.
    e) One-Way ANOVA Analysis (Anova_analyze_dataset, One_way_Anova_analysis_results): This includes the dataset and SPSS calculations used to examine whether the starting quadrant influences the number of search steps required. The analysis was conducted at a 5% significance level, followed by a Games-Howell post hoc test [43] to identify which target-surrounding quadrants differed significantly in the number of search steps. Results were saved in
    Self_learning_model_test_results.zip

  6. This dataset contains randomly generated sequences of bin selections (1-40) from a control search algorithm (random search) used to benchmark the performance of Bayesian-based methods. The process iteratively generates random numbers until a stopping condition is met (reaching target bins 1, 11, 21, or 31). This dataset serves as a baseline for analyzing the efficiency, randomness, and convergence of non-adaptive search strategies.
    The dataset includes the following:
    a) The Python source code of the random search algorithm.
    b) A file (summary_random_search.csv) containing the results of 1,000 successful target hits.
    c) A heatmap visualizing the frequency of search steps for each bin, providing insight into the distribution of steps across the bins.
    Random_search.zip
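The random-search control described above reduces to uniform draws over bins 1-40 until one of the stopping bins (1, 11, 21, 31) appears. A minimal sketch, with an illustrative function name:

```python
import random

TARGETS = {1, 11, 21, 31}  # stopping condition from the dataset description

def random_search(n_bins=40, seed=0):
    """Draw uniformly random bins until a target bin appears.
    Returns the full sequence of bins drawn, ending at the target."""
    rng = random.Random(seed)
    path = []
    while True:
        b = rng.randint(1, n_bins)
        path.append(b)
        if b in TARGETS:
            return path
```

With 4 targets among 40 bins, the number of draws per run is geometrically distributed with success probability 0.1, giving an expected path length of 10 steps, which is the kind of baseline the summary file lets one compare against.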

  7. This dataset contains the results of a random walk search algorithm, designed as a control mechanism to benchmark adaptive search strategies (Bayesian-based methods). The random walk operates within a defined space of 40 bins, where each bin has a set of neighboring bins. The search begins from a randomly chosen starting bin and proceeds iteratively, moving to a randomly selected neighboring bin, until one of the stopping conditions is met (bins 1, 11, 21, or 31).
    The dataset provides detailed records of 1,000 random walk iterations, with the following key components:
    a) Individual Iteration Results: Each iteration's search path is saved in a separate CSV file (random_walk_results_<iteration>.csv), listing the sequence of steps taken and the corresponding bin at each step.
    b) Summary File: A combined summary of all iterations is available in random_walk_results_summary.csv, which aggregates the step-by-step data for all 1,000 random walks.
    c) Heatmap Visualization: A heatmap file is included to illustrate the frequency distribution of steps across bins, highlighting the relative visit frequencies of each bin during the random walks.
    d) Python Source Code: The Python script used to generate the random walk dataset is provided, allowing reproducibility and customization for further experiments.
    Random_walk.zip
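The random-walk control can be sketched as below. The dataset's own Python source defines the actual neighbour set of each bin; the ring adjacency used here is purely an illustrative assumption so the sketch is self-contained.

```python
import random

N_BINS = 40
TARGETS = {1, 11, 21, 31}  # stopping bins from the dataset description

def neighbors(b):
    """Illustrative ring adjacency: each bin touches its two ring
    neighbours. The dataset defines its own neighbour sets."""
    return [(b - 2) % N_BINS + 1, b % N_BINS + 1]

def random_walk(seed=0):
    """Start at a random bin and hop to a random neighbour until a
    target bin is reached; return the visited path."""
    rng = random.Random(seed)
    b = rng.randint(1, N_BINS)
    path = [b]
    while b not in TARGETS:
        b = rng.choice(neighbors(b))
        path.append(b)
    return path
```

Unlike the uniform random search, each step here depends on the current bin, so path lengths reflect the adjacency structure rather than a fixed hit probability.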

  8. This dataset contains the results of a genetic search algorithm implemented as a control method to benchmark adaptive Bayesian-based search strategies. The algorithm operates in a 40-bin search space with predefined target bins (1, 11, 21, 31) and evolves solutions through random initialization, selection, crossover, and mutation over 1,000 successful runs.
    Dataset Components:
    a) Run Results: Individual run data is stored in separate files (genetic_algorithm_run_<run_id>.csv), detailing:
       Generation: the generation number.
       Fitness: the fitness score of the solution.
       Steps: the path length in bins.
       Solution: the sequence of bins visited.
    b) Summary File: summary.csv consolidates the best solutions from all runs, including their fitness scores, path lengths, and sequences.
    c) All Steps File: summary_all_steps.csv records all bins visited during the runs for distribution analysis.
    d) A heatmap was also generated for the genetic search algorithm, illustrating the frequency of bins chosen during the search process as a representation of the search pathways.
    Genetic_search_algorithm.zip
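A compact sketch of the evolutionary loop named above (random initialization, selection, one-point crossover, mutation). The fitness function, population size, and rates below are illustrative assumptions; the dataset's own source code defines the actual operators and parameters.

```python
import random

N_BINS = 40
TARGETS = {1, 11, 21, 31}  # target bins from the dataset description

def fitness(sol):
    """Illustrative fitness: reward solutions that reach a target bin
    early in the sequence; 0 if no target bin is visited."""
    for i, b in enumerate(sol):
        if b in TARGETS:
            return len(sol) - i
    return 0

def evolve(pop_size=20, sol_len=15, generations=30, seed=0):
    """Evolve bin sequences: keep the fitter half as parents, then fill
    the population with one-point-crossover children plus mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(1, N_BINS) for _ in range(sol_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, sol_len)
            child = a[:cut] + b[cut:]           # one-point crossover
            if rng.random() < 0.1:              # mutation rate (assumed)
                child[rng.randrange(sol_len)] = rng.randint(1, N_BINS)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

The per-run CSV columns described above (Generation, Fitness, Steps, Solution) map directly onto the generation counter, the fitness score, the sequence length, and the bin sequence in such a loop.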

Technical Information

The dataset files have been compressed into a standard ZIP archive using Total Commander (version 9.50). The ZIP format ensures compatibility across various operating systems and tools.

The XLSX files were created using Microsoft Excel Standard 2019 (Version 1808, Build 10416.20027).

The Python program was developed using Visual Studio Code (Version 1.96.2, user setup), with the following environment details: Commit fabd6a6b30b49f79a7aba0f2ad9df9b399473380f, built on 2024-12-19. The Electron version is 32.6, and the runtime environment includes Chromium 128.0.6263.186, Node.js 20.18.1, and V8 12.8.374.38-electron.0. The operating system is Windows NT x64 10.0.19045.

The statistical analysis included in this dataset was partially conducted using IBM SPSS Statistics, Version 29.0.1.0.

The CSV files in this dataset follow European conventions: they use a semicolon (;) as the delimiter instead of a comma, are encoded in UTF-8 to ensure compatibility with a wide range of software and support for special characters, and use a comma (,) as the decimal separator for easy readability in European spreadsheet applications such as Microsoft Excel.
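Files following these conventions can be parsed, for example, with Python's standard csv module, converting the comma decimals manually; the column names in this sketch are illustrative, not the dataset's actual headers.

```python
import csv
import io

# Illustrative sample in the dataset's format: semicolon-delimited,
# comma as decimal separator.
SAMPLE = "step;IEI\n1;0,35\n2;0,28\n"

def read_euro_csv(text):
    """Parse a semicolon-delimited CSV and convert comma-decimal
    numbers to floats."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        rows.append({k: float(v.replace(",", ".")) for k, v in row.items()})
    return rows

rows = read_euro_csv(SAMPLE)
```

With pandas the same files can be read in one call via `pd.read_csv(path, sep=";", decimal=",", encoding="utf-8")`.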

The Python source code creates a folder on the C:\ drive, so please review the source code and verify that the necessary permissions to create the folder and files are available before running it.

Corresponding author: 
Peter Domjan
domjan.peter@phd.semmelweis.hu
Semmelweis University, Doctoral College, PhD student

This dataset comprises supplementary files supporting our research article.

The preprint is available at the following link: https://www.researchsquare.com/article/rs-5926277/v1

Files


Files (561.3 MB)

md5:69e049a4905fc194338f4c3d65b26786 (1.8 MB)
md5:0a7f6b2ea025ad72f4fedb378c250774 (501.6 MB)
md5:6c183b96369a075939671e915478a80f (20.9 kB)
md5:3e12ca7d77bec5116f3269a0812dc65f (126.9 kB)
md5:04961205ed51c71f8e7f9f0cc6e52486 (924.8 kB)
md5:ff24804227a4062af04a772e106672e5 (279.3 kB)
md5:e47f2a1a42051fe4b117794053a75f27 (33.2 MB)
md5:58970ddc5da9fe3a66629cbb6fb8dd56 (23.4 MB)
md5:c3373d1e5c2f0b362e0377473adb845d (16.7 kB)

Additional details

Dates

Submitted
2025-01-29
Dataset 1.0

Software

Programming language
Python
Development Status
Active