Causal Dataset for cause-effect pairs from Tübingen repository
Description
Cause-effect is a database of two-variable cause-effect pairs chosen from different datasets created by the Max Planck Institute for Biological Cybernetics in Tübingen, Germany.
Size: 83 datasets of various sizes
Number of features: 2 in every dataset
Ground truth: yes
Type of Graph: directed
Each dataset consists of samples of a pair of statistically dependent random variables, where one variable is known to cause the other one. The task is to identify for each pair which of the two variables is the cause and which one the effect, using the observed samples only.
Link/Origin: https://webdav.tuebingen.mpg.de/cause-effect/
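The task stated above (deciding from the samples alone which variable is the cause) can be framed in code roughly as follows. This is a minimal sketch, not the method of the reference paper: the names `load_pair`, `guess_direction`, and `accuracy` are hypothetical, the file layout (two whitespace-separated numeric columns per pair) is an assumption about the repository, and the residual-based score is only a crude toy heuristic loosely in the spirit of additive noise models.

```python
import numpy as np


def load_pair(source):
    """Read one pair into (x, y).

    Assumption: `source` holds two whitespace-separated numeric columns,
    one sample per line. It may be a filename, a file object, or a list of lines.
    """
    data = np.loadtxt(source, ndmin=2)
    return data[:, 0], data[:, 1]


def residual_dependence(cause, effect, degree=3):
    """Fit effect ~ polynomial(cause) and score residual/input dependence.

    The score is the absolute correlation between squared residuals and the
    squared centred input: a rough proxy, not a proper independence test.
    """
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    spread_of_input = (cause - cause.mean()) ** 2
    return abs(np.corrcoef(spread_of_input, residuals ** 2)[0, 1])


def guess_direction(x, y, degree=3):
    """Return "X->Y" or "Y->X", preferring the direction with the lower score."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    forward = residual_dependence(x, y, degree)
    backward = residual_dependence(y, x, degree)
    return "X->Y" if forward <= backward else "Y->X"


def accuracy(predictions, ground_truth):
    """Fraction of pairs whose predicted direction matches the ground truth."""
    hits = sum(p == g for p, g in zip(predictions, ground_truth))
    return hits / len(ground_truth)
```

Any real method would replace `guess_direction`; the point of the sketch is only the shape of the problem: two sample vectors in, one of two direction labels out, scored against the known ground truth.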
The following table lists the datasets selected from the database of cause-effect pairs:
Dataset number | Variable 1 | Variable 2 | Origin of dataset |
---|---|---|---|
001 | Altitude | Temperature | DWD dataset |
002 | Altitude | Precipitation | DWD dataset |
003 | Longitude | Temperature | DWD dataset |
004 | Altitude | Sunshine hours | DWD dataset |
005 | Age | Length | Abalone dataset |
006 | Age | Shell weight | Abalone dataset |
007 | Age | Diameter | Abalone dataset |
008 | Age | Height | Abalone dataset |
009 | Age | Whole weight | Abalone dataset |
010 | Age | Shucked weight | Abalone dataset |
011 | Age | Viscera weight | Abalone dataset |
012 | Age | Wage per hour | Census income dataset |
013 | Displacement | Fuel consumption | Auto mpg dataset |
014 | Horsepower | Fuel consumption | Auto mpg dataset |
015 | Weight | Fuel consumption | Auto mpg dataset |
016 | Horsepower | Acceleration | Auto mpg dataset |
017 | Age | Dividends from stocks | Census income dataset |
018 | Age | Concentration GA | R packages MASS |
019 | Current duration | Next interval | geyser |
020 | Latitude | Temperature | DWD dataset |
021 | Longitude | Precipitation | DWD dataset |
022 | Age | Height | arrhythmia |
023 | Age | Weight | arrhythmia |
024 | Age | Heart rate | arrhythmia |
025 | Cement | Compressive strength | concrete_data |
026 | Blast furnace slag | Compressive strength | concrete_data |
027 | Fly ash | Compressive strength | concrete_data |
028 | Water | Compressive strength | concrete_data |
029 | Superplasticizer | Compressive strength | concrete_data |
030 | Coarse aggregate | Compressive strength | concrete_data |
031 | Fine aggregate | Compressive strength | concrete_data |
032 | Age | Compressive strength | concrete_data |
033 | Alcohol consumption | Mean corpuscular volume | liver disorders |
034 | Alcohol consumption | Alkaline phosphatase | liver disorders |
035 | Alcohol consumption | Alanine aminotransferase | liver disorders |
036 | Alcohol consumption | Aspartate aminotransferase | liver disorders |
037 | Alcohol consumption | Gamma-glutamyl transpeptidase | liver disorders |
038 | Age | Body mass index | pima indian diabetes |
039 | Age | Serum insulin | pima indian diabetes |
040 | Age | Diastolic blood pressure | pima indian diabetes |
041 | Age | Plasma glucose concentration | pima indian diabetes |
042 | Day of the year | Temperature | B. Janzing |
049 | Ozone concentration | Temperature | Bafu |
050 | Ozone concentration | Temperature | Bafu |
051 | Ozone concentration | Temperature | Bafu |
056 | Female life expectancy, 2000-2005 | Latitude | UNdata |
057 | Female life expectancy, 1995-2000 | Latitude | UNdata |
058 | Female life expectancy, 1990-1995 | Latitude | UNdata |
059 | Female life expectancy, 1985-1990 | Latitude | UNdata |
060 | Male life expectancy, 2000-2005 | Latitude | UNdata |
061 | Male life expectancy, 1995-2000 | Latitude | UNdata |
062 | Male life expectancy, 1990-1995 | Latitude | UNdata |
068 | Bytes sent | Open http connections | P. Stark & Janzing |
072 | Sunspots | Global mean temperature | sunspot data |
076 | Population growth | Food consumption growth | Food and Agriculture Organization of the United Nations |
078 | PPFD | Net Ecosystem Productivity | Moffat A.M. |
079 | Net Ecosystem Productivity | Diffuse PPFDdif | Moffat A.M. |
080 | Net Ecosystem Productivity | Direct PPFDdir | Moffat A.M. |
086 | Size of apartment | Monthly rent | J.M. Mooij |
088 | Age | Relative spinal bone mineral density | “bone” dataset of R ElemStatLearn package |
089 | Mass loss Oct 2012 in % | Mass loss April 2012 in % | Solly et al. (2014). Plant and Soil, 382(1-2), 203-218. |
090 | root decomposition Oct | (forest) root decomposition Apr | Solly et al. (2014). Plant and Soil, 382(1-2), 203-218. |
091 | clay cont. in soil | (forest) soil moisture | Solly et al. (2014). Plant and Soil, 382(1-2), 203-218. |
092 | organic carbon in soil (forest) | clay cont. in soil (forest) | Solly et al. (2014). Plant and Soil, 382(1-2), 203-218. |
093 | precipitation | runoff | MOPEX (ftp://hydrology.nws.noaa.gov/pub/gcip/mopex/US_Data/Us_438_Daily/) |
094 | hour of day | temperature | S. Armagan Tarim |
095 | hour of day | electricity load | S. Armagan Tarim |
096 | temperature | electricity load | S. Armagan Tarim |
097 | speed at the beginning | speed at the end | The data has been recorded by Dominik Janzing using a ball track that has been equipped with two pairs of light barriers. The first pair measures the initial speed and the second pair the speed of a ball at some later position of the track. The units of the speeds are arbitrary and differ for both measurements (X and Y) since they are obtained by inverting the time the ball needed to pass the distance between two light barriers of one pair. |
098 | speed at the beginning | speed at the end | D. Janzing |
099 | language test score | social-economic status family | “nlschools” dataset of R MASS package |
100 | cycle time of CPU | performance | “cpus” dataset of R MASS package |
101 | grey value of a pixel | brightness of the screen | D. Janzing |
102 | position of a ball | time for passing a track segment | D. Janzing |
103 | position of a ball | time for passing a track segment | D. Janzing |
106 | time required for one round | voltage | D. Janzing |
107 | strength of contrast | answer correct or not | The dataset is from a psychophysics experiment with human subjects. A screen shows Gabor patches (patterns of stripes frequently used as stimuli in psychological experiments), tilted either to the left or to the right. The subjects are asked to infer the direction, while the patches are shown with stronger or weaker contrast. The variable X describes the contrast values, ranging from 0.0150 to 0.0500 in steps of 0.0025. The variable Y is a binary variable indicating whether the direction has been identified correctly (Y=1) or not (Y=0). For low values of the contrast, the fraction of correct decisions approaches chance level (50%). |
108 | time for 1/6 rotation | temperature | This pair shows the dependence between the inverse velocity and the temperature of the heat bath of a Stirling engine. The engine is driven by a cup of hot water that is put underneath. The inverse velocity is measured by the time the engine’s wheel needs for 1/6 rotation (because the wheel has 6 radius arms). The temperature is measured by a sensor that was put into the cup. Recorded by D. Janzing. |
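
The table above can also be read programmatically, for example to group pairs by their origin. The following is a minimal sketch under the assumption that each row keeps the pipe-delimited layout used here (four cells and a trailing `|`); `PairEntry`, `parse_row`, and `pairs_by_origin` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PairEntry:
    number: str       # dataset number as printed in the table, e.g. "001"
    variable_1: str
    variable_2: str
    origin: str       # may be empty if the table gives no origin


def parse_row(line):
    """Split one pipe-delimited table row into a PairEntry."""
    text = line.strip()
    if text.endswith("|"):
        text = text[:-1]  # drop the single trailing delimiter
    cells = [cell.strip() for cell in text.split("|")]
    if len(cells) != 4:
        raise ValueError(f"expected 4 cells, got {len(cells)}: {line!r}")
    return PairEntry(*cells)


def pairs_by_origin(entries):
    """Map each origin string to the list of dataset numbers that cite it."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.origin].append(entry.number)
    return dict(grouped)
```

Dataset numbers are kept as strings so that the leading zeros shown in the table survive parsing.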
More information about the dataset is contained in the causal_description.html file.
Reference
J. M. Mooij, J. Peters, D. Janzing, J. Zscheischler, B. Schoelkopf: “Distinguishing cause from effect using observational data: methods and benchmarks”, Journal of Machine Learning Research 17(32):1-102, 2016