There is a newer version of the record available.

Published February 25, 2023 | Version v1

bnlearn datasets

Description

This collection consists of 5 structure learning datasets from the Bayesian Network Repository (Scutari, 2010).

 

Task: The dataset collection can be used to study causal discovery algorithms.

 

Summary: 

  • Size of collection: 5 datasets of various sizes, with 3 to 56 columns each
  • Task: Causal Discovery
  • Data Type: Discrete
  • Dataset Scope: Collection
  • Ground Truth: Known / Estimated
  • Temporal Structure: No
  • License: TBD
  • Missing Values: No

 

Missingness Statement: There are no missing values.
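
The statement above could be checked after download with a short script. The following is a minimal sketch, assuming a dataset is available as CSV text with a header row; the set of missing-value markers and the sample rows are illustrative assumptions, not details taken from this record.

```python
import csv
import io

# Assumed markers for a missing cell; the record does not specify any.
MISSING_MARKERS = {"", "NA", "NaN", "?"}


def count_missing(csv_text):
    """Return the number of missing cells per column in CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]
            # DictReader yields None for cells absent from a short row.
            if value is None or value.strip() in MISSING_MARKERS:
                counts[name] += 1
    return counts


# Illustrative rows only, not copied from the actual files.
sample = "Species,Height,Diameter\nSagrei,high,narrow\nDistichus,low,wide\n"
missing = count_missing(sample)
```

If the statement holds for a file, every count in the returned dictionary is zero.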

 

Collection: 

The alarm dataset contains the following 37 variables:

  • CVP (central venous pressure): a three-level factor with levels LOW, NORMAL and HIGH.
  • PCWP (pulmonary capillary wedge pressure): a three-level factor with levels LOW, NORMAL and HIGH.
  • HIST (history): a two-level factor with levels TRUE and FALSE.
  • TPR (total peripheral resistance): a three-level factor with levels LOW, NORMAL and HIGH.
  • ... (33 more variables, see the corresponding .html file)

 

The binary synthetic asia dataset contains the following 8 variables:

  • D (dyspnoea): a two-level factor with levels yes and no.
  • T (tuberculosis): a two-level factor with levels yes and no.
  • L (lung cancer): a two-level factor with levels yes and no.
  • B (bronchitis): a two-level factor with levels yes and no.
  • A (visit to Asia): a two-level factor with levels yes and no.
  • S (smoking): a two-level factor with levels yes and no.
  • X (chest X-ray): a two-level factor with levels yes and no.
  • E (tuberculosis versus lung cancer/bronchitis): a two-level factor with levels yes and no.
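
Because every variable is a factor with a fixed set of levels, a loaded table can be checked against the listing above. The following is a minimal sketch for the asia variables, assuming each record is represented as a dictionary from column name to string value; the helper name `invalid_cells` and the sample rows are hypothetical.

```python
# Allowed levels for the asia variables, as listed above.
ASIA_LEVELS = {name: {"yes", "no"} for name in "DTLBASXE"}


def invalid_cells(rows, levels):
    """Return (row_index, column, value) for every cell outside its allowed levels."""
    problems = []
    for index, row in enumerate(rows):
        for column, allowed in levels.items():
            value = row.get(column)  # None if the column is absent
            if value not in allowed:
                problems.append((index, column, value))
    return problems


# Illustrative rows only, not copied from the actual files.
good_row = {name: "no" for name in "DTLBASXE"}
bad_row = dict(good_row, X="maybe")
problems = invalid_cells([good_row, bad_row], ASIA_LEVELS)
```

The same helper applies to the other datasets once their level sets are written out in the same form.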

 

The binary coronary dataset contains the following 6 variables:

  • Smoking (smoking): a two-level factor with levels no and yes.
  • M. Work (strenuous mental work): a two-level factor with levels no and yes.
  • P. Work (strenuous physical work): a two-level factor with levels no and yes.
  • Pressure (systolic blood pressure): a two-level factor with levels <140 and >140.
  • Proteins (ratio of beta and alpha lipoproteins): a two-level factor with levels <3 and >3.
  • Family (family anamnesis of coronary heart disease): a two-level factor with levels neg and pos.

 

The hailfinder dataset contains the following 56 variables:

  • N07muVerMo (10.7mu vertical motion): a four-level factor with levels StrongUp, WeakUp, Neutral and Down.
  • SubjVertMo (subjective judgment of vertical motion): a four-level factor with levels StrongUp, WeakUp, Neutral and Down.
  • QGVertMotion (quasigeostrophic vertical motion): a four-level factor with levels StrongUp, WeakUp, Neutral and Down.
  • CombVerMo (combined vertical motion): a four-level factor with levels StrongUp, WeakUp, Neutral and Down.
  • AreaMesoALS (area of meso-alpha): a four-level factor with levels StrongUp, WeakUp, Neutral and Down.
  • SatContMoist (satellite contribution to moisture): a four-level factor with levels VeryWet, Wet, Neutral and Dry.
  • ... (50 more variables, see the corresponding .html file)

 

The lizards dataset contains the following 3 variables:

  • Species (the species of the lizard): a two-level factor with levels Sagrei and Distichus.
  • Height (perch height): a two-level factor with levels high (greater than 4.75 feet) and low (less than or equal to 4.75 feet).
  • Diameter (perch diameter): a two-level factor with levels narrow (greater than 4 inches) and wide (less than or equal to 4 inches).

Files

bnlearn_data.zip (2.1 MB)
md5:f123ea701227cfd8a43996183b7c5279
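
The md5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the assumption that the archive is saved locally under its listed name are illustrative.

```python
import hashlib

# Checksum as listed for bnlearn_data.zip in this record.
EXPECTED_MD5 = "f123ea701227cfd8a43996183b7c5279"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest matches the expected value."""
    return md5_of_file(path) == expected
```

With the archive in the working directory, `is_intact("bnlearn_data.zip")` would return True for an unmodified download.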

Additional details

Related works

Is documented by
Book chapter: 10.1007/978-1-4757-3502-4_6 (DOI)

References

  • Elidan G (2001). "Bayesian Network Repository." https://www.cs.huji.ac.il/w~galel/Repository/
  • Beinlich I, Suermondt HJ, Chavez RM, Cooper GF (1989). "The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks". Proceedings of the 2nd European Conference on Artificial Intelligence in Medicine, 247–256.
  • Scutari M (2010). "Learning Bayesian Networks with the bnlearn R Package." Journal of Statistical Software, 35(3), 1–22. doi:10.18637/jss.v035.i03