Published January 1988 | Version v2
Dataset Open

Asia Lung Diseases

Description

This synthetic dataset relates lung diseases (tuberculosis, lung cancer, bronchitis) to smoking and visits to Asia. It was introduced in Lauritzen and Spiegelhalter (1988).

Task: The dataset can be used to study causal discovery algorithms.
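
One common way to study a causal discovery algorithm on this dataset is to compare the graph it returns with the known ground-truth graph. The sketch below is a minimal, illustrative scoring routine and is not part of the dataset. It assumes the ground truth is the Asia network as it is usually drawn from Lauritzen and Spiegelhalter (1988), written with the one-letter feature codes of this record. `TRUE_EDGES`, `edge_scores`, and the example `learned` graph are hypothetical names and values chosen for illustration, and the edge list should be compared with ground_truth.csv before it is relied on.

```python
# Assumed ground-truth edges of the Asia network, as (parent, child) pairs.
# This list is an assumption of the sketch; verify it against ground_truth.csv.
TRUE_EDGES = {
    ("A", "T"), ("S", "L"), ("S", "B"), ("T", "E"),
    ("L", "E"), ("B", "D"), ("E", "D"), ("E", "X"),
}


def edge_scores(learned, truth):
    """Return (precision, recall, shd) for two sets of directed edges.

    The structural Hamming distance (SHD) used here counts one error for
    each missing adjacency, each extra adjacency, and each adjacency that
    is present in both graphs but points the wrong way.
    """
    learned, truth = set(learned), set(truth)
    correct = learned & truth
    precision = len(correct) / len(learned) if learned else 0.0
    recall = len(correct) / len(truth) if truth else 0.0

    # Compare the undirected skeletons to find missing and extra adjacencies.
    skeleton_learned = {frozenset(edge) for edge in learned}
    skeleton_truth = {frozenset(edge) for edge in truth}
    missing = skeleton_truth - skeleton_learned
    extra = skeleton_learned - skeleton_truth

    # Edges whose reverse is in the ground truth count as orientation errors.
    reversed_edges = {
        (u, v) for (u, v) in learned
        if (v, u) in truth and (u, v) not in truth
    }
    shd = len(missing) + len(extra) + len(reversed_edges)
    return precision, recall, shd


# Made-up algorithm output: A->T is missing, S->L is reversed, B->X is extra.
learned = {
    ("L", "S"), ("S", "B"), ("T", "E"), ("L", "E"),
    ("B", "D"), ("E", "D"), ("E", "X"), ("B", "X"),
}
print(edge_scores(learned, TRUE_EDGES))  # (0.75, 0.75, 3)
```

Many discovery algorithms return an equivalence class rather than a single DAG, in which case a comparison that ignores the direction of undirected edges may be more appropriate than the strict directed comparison sketched here.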

Summary: 

  • Size of dataset: 5,000 x 8
  • Task: Causal Discovery Problem
  • Data Type: Binary Data
  • Dataset Scope: Standalone Dataset
  • Ground Truth: Known Graph
  • Temporal Structure: Static Data
  • License: CC0 (generated for bnlearn)
  • Missing Values: No Missing Values

Missingness Statement: There are no missing values.
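
The summary properties above (binary values, no missing entries) can be checked on a local copy of the data. The following is a minimal sketch using only the Python standard library. It assumes that the CSV has a header row of one-letter feature codes and that values are stored as the strings `yes` and `no`; neither detail is confirmed by this record, and `summarize_binary_csv` is a hypothetical helper name. The inline sample is made up for illustration and is not taken from asia.csv.

```python
import csv
import io


def summarize_binary_csv(handle, levels=("yes", "no")):
    """Count data rows and list every cell that is empty or not in `levels`."""
    reader = csv.reader(handle)
    header = next(reader)
    n_rows = 0
    bad_cells = []  # entries are (row_number, column_name, value)
    for n_rows, row in enumerate(reader, start=1):
        for name, value in zip(header, row):
            if value.strip() not in levels:
                bad_cells.append((n_rows, name, value))
    return {"rows": n_rows, "columns": header, "bad_cells": bad_cells}


# Made-up sample in the assumed layout, with one deliberately empty cell.
sample = io.StringIO(
    "A,S,T,L,B,E,X,D\n"
    "no,yes,no,no,yes,no,no,yes\n"
    "no,no,no,no,no,no,no,no\n"
    "yes,no,,no,no,no,yes,no\n"
)
report = summarize_binary_csv(sample)
print(report["rows"], len(report["columns"]), report["bad_cells"])
# 3 8 [(3, 'T', '')]
```

For the real file, the same function can be called on `open("asia.csv", newline="")`. If the file matches the description, `bad_cells` should come back empty. If the values turn out to be coded differently (for example `0` and `1`), pass the actual levels through the `levels` argument.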

Features:

  • D: Dyspnoea (yes / no)
  • T: Tuberculosis (yes / no)
  • L: Lung cancer (yes / no)
  • B: Bronchitis (yes / no)
  • A: Visit to Asia (yes / no)
  • S: Smoking (yes / no)
  • X: Chest X-ray (yes / no)
  • E: Either tuberculosis or lung cancer (yes / no)

Files:

  • asia.csv: dataset
  • ground_truth.csv: DAG used for data generation (Lauritzen and Spiegelhalter, 1988).
  • asia.bif: Bayesian network from Scutari (2010), license CC BY-SA 3.0. This is the network from Lauritzen and Spiegelhalter (1988) that was used for data generation.

Files (209.9 kB)

  • asia.bif: 1.1 kB, md5:02fd219b796829a56f6fee1e90b97d83
  • asia.csv: 208.8 kB, md5:279c59b8dde3b0a8957ab362e4d469d3
  • ground_truth.csv: 50 Bytes, md5:2b9670a86cc591ecb9e4babcbc403e2d
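
The md5 values in the file listing can be used to confirm that a download is intact. The sketch below shows one way to do this with Python's standard `hashlib` module. `md5_of_file` and `matches_listed_md5` are hypothetical helper names, and the local file path is whatever location the file was saved to.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file with a listed value such as 'md5:0123abcd...'."""
    # Drop an optional 'md5:' prefix and normalise the case before comparing.
    expected = listed.strip().lower().split(":", 1)[-1]
    return md5_of_file(path) == expected
```

For instance, `matches_listed_md5("asia.csv", "md5:279c59b8dde3b0a8957ab362e4d469d3")` would check a local download, assuming the 208.8 kB entry in the listing is asia.csv.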

Additional details

Related works

Is documented by
10.1007/978-1-4757-3502-4_6 (DOI)

References

  • Lauritzen S, Spiegelhalter D (1988). "Local Computation with Probabilities on Graphical Structures and their Application to Expert Systems (with discussion)." Journal of the Royal Statistical Society: Series B, 50(2):157–224.
  • Scutari M (2010). "Learning Bayesian Networks with the bnlearn R Package." Journal of Statistical Software, 35(3):1–22. doi:10.18637/jss.v035.i03