Published March 8, 2023 | Version v4
Dataset Open

Causal Dataset for cause-effect pairs from Tübingen repository

Description

Cause-effect is a database of two-variable cause-effect pairs selected from the Database with cause-effect pairs created by the Max Planck Institute for Biological Cybernetics in Tübingen, Germany.

Size: 83 datasets of various sizes

Number of features: 2 in every dataset

Ground truth: yes

Type of Graph: directed
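Every dataset has exactly two variables, so a loader can check that shape as it reads a pair. The sketch below is a minimal example built on an assumption: each pair is a plain-text file with two whitespace-separated numeric columns, similar to the original Tübingen `pairNNNN.txt` files. The actual layout inside `datasets.zip` is not documented on this page, and `load_pair` is a hypothetical helper name.

```python
import numpy as np


def load_pair(source) -> np.ndarray:
    """Load one cause-effect pair as an (n_samples, 2) float array.

    Assumption (hypothetical layout): `source` is a path, or a list of text
    lines, holding two whitespace-separated numeric columns per row.
    """
    data = np.loadtxt(source, ndmin=2)  # default delimiter: any whitespace
    if data.shape[1] != 2:
        raise ValueError(f"expected 2 columns, found {data.shape[1]}")
    return data
```

Usage, with a hypothetical file name: `x, y = load_pair("pair0001.txt").T`.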

Each dataset consists of samples of a pair of statistically dependent random variables, one of which is known to cause the other. The task is to identify, for each pair, which variable is the cause and which is the effect, using only the observed samples.
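Additive noise models (ANMs) are one family of methods benchmarked in the Mooij et al. (2016) reference listed on this page. The sketch below illustrates the idea only; it is not the benchmark's implementation.

- Fit effect = f(cause) + noise in both directions.
- Measure how dependent the residuals are on the putative cause, using the Hilbert-Schmidt Independence Criterion (HSIC).
- Prefer the direction with the more independent residuals.

Several choices here are illustrative assumptions:

- Polynomial regression of degree 5 stands in for the nonparametric regression used in the literature.
- The Gaussian-kernel bandwidth is set by a simple median heuristic.
- All function names are hypothetical.

```python
import numpy as np


def rbf_gram(v: np.ndarray) -> np.ndarray:
    """Gaussian kernel matrix; bandwidth set from the median squared pairwise distance."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / np.median(d2[d2 > 0]))


def hsic(a: np.ndarray, b: np.ndarray) -> float:
    """Biased empirical HSIC estimate: larger values indicate stronger dependence."""
    n = len(a)
    h = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
    k_a, k_b = rbf_gram(a), rbf_gram(b)
    return float(np.trace(k_a @ h @ k_b @ h)) / (n - 1) ** 2


def standardize(v: np.ndarray) -> np.ndarray:
    return (v - v.mean()) / v.std()


def anm_score(cause: np.ndarray, effect: np.ndarray, degree: int = 5) -> float:
    """Fit effect = f(cause) + noise with a polynomial (an assumed stand-in for
    nonparametric regression) and return HSIC(cause, residuals); lower is better."""
    c, e = standardize(cause), standardize(effect)
    coeffs = np.polyfit(c, e, degree)
    residuals = e - np.polyval(coeffs, c)
    return hsic(c, residuals)


def infer_direction(x: np.ndarray, y: np.ndarray) -> str:
    """Prefer the direction whose residuals look more independent of the input."""
    return "X->Y" if anm_score(x, y) < anm_score(y, x) else "Y->X"
```

On a synthetic pair such as `y = x**3 + noise` with uniform noise, the residuals of the backward fit (x on y) vary in spread with y. The backward direction therefore receives the larger HSIC score. Real pairs from the table can be noisier, discretized, or otherwise violate the additive-noise assumption.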

Link/Origin: https://webdav.tuebingen.mpg.de/cause-effect/

The following table lists the pairs selected from the Database with cause-effect pairs:

| Dataset no. | Variable 1 | Variable 2 | Origin |
|---|---|---|---|
| 001 | Altitude | Temperature | DWD dataset |
| 002 | Altitude | Precipitation | DWD dataset |
| 003 | Longitude | Temperature | DWD dataset |
| 004 | Altitude | Sunshine hours | DWD dataset |
| 005 | Age | Length | Abalone dataset |
| 006 | Age | Shell weight | Abalone dataset |
| 007 | Age | Diameter | Abalone dataset |
| 008 | Age | Height | Abalone dataset |
| 009 | Age | Whole weight | Abalone dataset |
| 010 | Age | Shucked weight | Abalone dataset |
| 011 | Age | Viscera weight | Abalone dataset |
| 012 | Age | Wage per hour | Census income dataset |
| 013 | Displacement | Fuel consumption | Auto MPG dataset |
| 014 | Horsepower | Fuel consumption | Auto MPG dataset |
| 015 | Weight | Fuel consumption | Auto MPG dataset |
| 016 | Horsepower | Acceleration | Auto MPG dataset |
| 017 | Age | Dividends from stocks | Census income dataset |
| 018 | Age | Concentration GA | R package MASS |
| 019 | Current duration | Next interval | geyser |
| 020 | Latitude | Temperature | DWD dataset |
| 021 | Longitude | Precipitation | DWD dataset |
| 022 | Age | Height | arrhythmia |
| 023 | Age | Weight | arrhythmia |
| 024 | Age | Heart rate | arrhythmia |
| 025 | Cement | Compressive strength | concrete_data |
| 026 | Blast furnace slag | Compressive strength | concrete_data |
| 027 | Fly ash | Compressive strength | concrete_data |
| 028 | Water | Compressive strength | concrete_data |
| 029 | Superplasticizer | Compressive strength | concrete_data |
| 030 | Coarse aggregate | Compressive strength | concrete_data |
| 031 | Fine aggregate | Compressive strength | concrete_data |
| 032 | Age | Compressive strength | concrete_data |
| 033 | Alcohol consumption | Mean corpuscular volume | liver disorders |
| 034 | Alcohol consumption | Alkaline phosphatase | liver disorders |
| 035 | Alcohol consumption | Alanine aminotransferase | liver disorders |
| 036 | Alcohol consumption | Aspartate aminotransferase | liver disorders |
| 037 | Alcohol consumption | Gamma-glutamyl transpeptidase | liver disorders |
| 038 | Age | Body mass index | Pima Indian diabetes |
| 039 | Age | Serum insulin | Pima Indian diabetes |
| 040 | Age | Diastolic blood pressure | Pima Indian diabetes |
| 041 | Age | Plasma glucose concentration | Pima Indian diabetes |
| 042 | Day of the year | Temperature | B. Janzing |
| 049 | Ozone concentration | Temperature | Bafu |
| 050 | Ozone concentration | Temperature | Bafu |
| 051 | Ozone concentration | Temperature | Bafu |
| 056 | Female life expectancy, 2000-2005 | Latitude | UNdata |
| 057 | Female life expectancy, 1995-2000 | Latitude | UNdata |
| 058 | Female life expectancy, 1990-1995 | Latitude | UNdata |
| 059 | Female life expectancy, 1985-1990 | Latitude | UNdata |
| 060 | Male life expectancy, 2000-2005 | Latitude | UNdata |
| 061 | Male life expectancy, 1995-2000 | Latitude | UNdata |
| 062 | Male life expectancy, 1990-1995 | Latitude | UNdata |
| 068 | Bytes sent | Open http connections | P. Stark & Janzing |
| 072 | Sunspots | Global mean temperature | sunspot data |
| 076 | Population growth | Food consumption growth | Food and Agriculture Organization of the United Nations |
| 078 | PPFD | Net Ecosystem Productivity | A.M. Moffat |
| 079 | Net Ecosystem Productivity | Diffuse PPFDdif | A.M. Moffat |
| 080 | Net Ecosystem Productivity | Direct PPFDdir | A.M. Moffat |
| 086 | Size of apartment | Monthly rent | J.M. Mooij |
| 088 | Age | Relative spinal bone mineral density | “bone” dataset of R ElemStatLearn package |
| 089 | Mass loss Oct 2012 in % | Mass loss April 2012 in % | Solly et al. (2014), Plant and Soil, 382(1-2), 203-218 |
| 090 | Root decomposition Oct (forest) | Root decomposition Apr | Solly et al. (2014), Plant and Soil, 382(1-2), 203-218 |
| 091 | Clay cont. in soil (forest) | Soil moisture | Solly et al. (2014), Plant and Soil, 382(1-2), 203-218 |
| 092 | Organic carbon in soil (forest) | Clay cont. in soil (forest) | Solly et al. (2014), Plant and Soil, 382(1-2), 203-218 |
| 093 | Precipitation | Runoff | MOPEX (ftp://hydrology.nws.noaa.gov/pub/gcip/mopex/US_Data/Us_438_Daily/) |
| 094 | Hour of day | Temperature | S. Armagan Tarim |
| 095 | Hour of day | Electricity load | S. Armagan Tarim |
| 096 | Temperature | Electricity load | S. Armagan Tarim |
| 097 | Speed at the beginning | Speed at the end | Recorded by Dominik Janzing using a ball track equipped with two pairs of light barriers. The first pair measures the initial speed of the ball; the second measures its speed at a later position on the track. The speeds are in arbitrary units that differ between the two measurements (X and Y), because each is obtained by inverting the time the ball needed to pass the distance between the two light barriers of one pair. |
| 098 | Speed at the beginning | Speed at the end | D. Janzing |
| 099 | Language test score | Socio-economic status of family | “nlschools” dataset of R MASS package |
| 100 | Cycle time of CPU | Performance | “cpus” dataset of R MASS package |
| 101 | Grey value of a pixel | Brightness of the screen | D. Janzing |
| 102 | Position of a ball | Time for passing a track segment | D. Janzing |
| 103 | Position of a ball | Time for passing a track segment | D. Janzing |
| 106 | Time required for one round | Voltage | D. Janzing |
| 107 | Strength of contrast | Answer correct or not | Psychophysics experiment with human subjects. A screen shows Gabor patches (patterns of stripes frequently used as stimuli in psychological experiments), tilted either to the left or to the right. The subjects are asked to infer the direction of tilt while the patches are shown with stronger or weaker contrast. X is the contrast, ranging from 0.0150 to 0.0500 in steps of 0.0025. Y is binary, indicating whether the direction was identified correctly (Y=1) or not (Y=0). For low contrast values, the fraction of correct decisions approaches chance level (50%). |
| 108 | Time for 1/6 rotation | Temperature | Recorded by D. Janzing. This pair shows how the inverse velocity of a Stirling engine depends on the temperature of its heat bath. The engine is driven by a cup of hot water placed underneath. The inverse velocity is measured as the time the engine's wheel needs for 1/6 of a rotation (the wheel has six radial arms). The temperature is measured by a sensor placed in the cup. |
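For pair 107, the "fraction of correct decisions" is simply the mean of the binary Y within each contrast level. The minimal sketch below makes two assumptions: X and Y have already been loaded as two arrays, and the helper name `fraction_correct` is hypothetical. It also reconstructs the stated contrast grid.

```python
import numpy as np

# Contrast grid described for pair 107: 0.0150 to 0.0500 in steps of 0.0025,
# i.e. (0.0500 - 0.0150) / 0.0025 + 1 = 15 levels. linspace avoids the
# end-point ambiguity of np.arange with a floating-point step.
CONTRAST_LEVELS = np.linspace(0.0150, 0.0500, 15)


def fraction_correct(contrast, correct) -> dict:
    """Group binary answers (1 = correct, 0 = wrong) by contrast level and
    return the fraction of correct answers per level."""
    contrast = np.round(np.asarray(contrast, dtype=float), 4)  # merge float noise
    correct = np.asarray(correct, dtype=float)
    return {
        float(level): float(correct[contrast == level].mean())
        for level in np.unique(contrast)
    }
```

Values close to 0.5 at the lowest contrast levels would correspond to the chance-level behaviour described for this pair.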

More information about the dataset is given in the causal_description.html file.

Reference

J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, B. Schoelkopf: “Distinguishing cause from effect using observational data: methods and benchmarks”, Journal of Machine Learning Research 17(32):1-102, 2016

Files (1.2 MB), including datasets.zip

Size | MD5 checksum
636.2 kB | md5:c8609a6c3b267a9e52de6b50abbfdd8b
8.5 kB | md5:a1482ed003ff95f5b4783e2a3b5c662c
366.1 kB | md5:a53413b2b8ddbbf09295c1a55fce9993
187.7 kB | md5:8b05a3f64928afcbc7b298730a86fa38
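The listed MD5 checksums can be used to confirm that a downloaded file arrived intact. Below is a minimal sketch using Python's standard `hashlib`; the helper names are hypothetical. Which checksum belongs to which file has to be read from the record itself, since file names other than datasets.zip are not reproduced in this listing.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed: str) -> bool:
    """Compare with a checksum written as on this record, e.g. 'md5:<32 hex digits>'."""
    expected = listed.removeprefix("md5:").strip().lower()
    return md5_of(path) == expected
```

A `False` result means the local file differs from the published one, for example because of an incomplete download.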

Additional details