Examples¶
bnlearn contains several examples within the library that can be used to practice with the functionalities of bnlearn.structure_learning(), bnlearn.parameter_learning() and bnlearn.inference().
Example with DataFrames¶
In bnlearn, there is one example dataset that can be imported: the sprinkler dataset. Note that this dataset is already one-hot encoded and contains no missing values, so it does not require any further pre-processing steps. The example DAG models (see the Example DAG section) can, however, be converted from model to DataFrame.
# Import dataset
df = bnlearn.import_example()
# Structure learning
model = bnlearn.structure_learning.fit(df)
# Plot
G = bnlearn.plot(model)
Example with DAG¶
bnlearn contains several example Directed Acyclic Graphs:
‘sprinkler’ (default)
‘alarm’
‘andes’
‘asia’
‘pathfinder’
‘sachs’
‘miserables’
Each DAG can be loaded using the bnlearn.bnlearn.import_DAG() function. With the bnlearn.bnlearn.sampling() function a DataFrame can be created for n samples.
The sprinkler DAG is a special case because it is not loaded from a bif file but created manually. Therefore, the sprinkler model can be generated without CPDs by setting CPD=False.
# Import dataset
DAG = bnlearn.import_DAG('sachs', CPD=True)
# plot the keys of the DAG
DAG.keys()
# dict_keys(['model', 'adjmat'])
# The model contains the BayesianModel with the CPDs.
# The adjmat contains the adjacency matrix with the relationships between the nodes.
# plot ground truth
G = bnlearn.plot(DAG)
# Sampling
df = bnlearn.sampling(DAG, n=1000)
Import from BIF¶
Each Bayesian DAG model that is loaded with bnlearn.bnlearn.import_DAG() is derived from a bif file. The bif file is a common format for Bayesian networks that is used for the exchange of knowledge and experimental results in the community. More information can be found [here](http://www.cs.washington.edu/dm/vfml/appendixes/bif.htm).
# Import dataset
DAG = bnlearn.import_DAG('filepath/to/model.bif')
Start with RAW data¶
Let's demonstrate by example how to process your own dataset containing mixed variables, using the Titanic case. This dataset contains both continuous and categorical variables and can easily be imported using bnlearn.bnlearn.import_example().
The function bnlearn.bnlearn.df2onehot() helps convert the mixed dataset into a one-hot matrix. The settings are adjustable; by default, the percentage of unique non-zero values per variable must be above 80%, and each variable must have at least 10 samples.
# Load titanic dataset containing mixed variables
df_raw = bnlearn.import_example(data='titanic')
# Pre-processing of the input dataset
dfhot, dfnum = bnlearn.df2onehot(df_raw)
# Structure learning
DAG = bnlearn.structure_learning.fit(dfnum)
# Plot
G = bnlearn.plot(DAG)
From this point we can learn the parameters using the DAG and input dataframe.
# Parameter learning
model = bnlearn.parameter_learning.fit(DAG, dfnum)
Finally, we can start making inferences. Note that the variable and evidence names should exactly match the input data (case sensitive).
# Print CPDs
bnlearn.print_CPD(model)
# Make inference
q = bnlearn.inference.fit(model, variables=['Survived'], evidence={'Sex':0, 'Pclass':1})
print(q.values)
print(q.variables)
print(q)
+-------------+-----------------+
| Survived    | phi(Survived)   |
+=============+=================+
| Survived(0) | 0.3312          |
+-------------+-----------------+
| Survived(1) | 0.6688          |
+-------------+-----------------+
Adjacency matrix¶
The adjacency matrix is an important way to store relationships between variables or nodes.
In graph theory, it is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent in the graph.
Several bnlearn functions output an adjacency matrix. A value of 0 or False indicates that nodes are not connected, whereas pairs of vertices with a value >0 or True are connected.
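To make the directed semantics concrete, here is a minimal plain-Python sketch (no bnlearn required; the node names follow the sprinkler example) of how an adjacency matrix encodes directed edges: rows are sources, columns are targets.

```python
# Adjacency matrix for a tiny directed graph:
# Cloudy -> Sprinkler -> Wet_Grass. Rows are sources, columns are targets.
adjmat = {
    'Cloudy':    {'Cloudy': False, 'Sprinkler': True,  'Wet_Grass': False},
    'Sprinkler': {'Cloudy': False, 'Sprinkler': False, 'Wet_Grass': True},
    'Wet_Grass': {'Cloudy': False, 'Sprinkler': False, 'Wet_Grass': False},
}

def is_connected(source, target):
    """True when a directed edge source -> target exists."""
    return adjmat[source][target]

print(is_connected('Cloudy', 'Sprinkler'))   # True: directed edge exists
print(is_connected('Sprinkler', 'Cloudy'))   # False: direction matters
```

Note that the matrix is not symmetric: a True at (row, column) means the row node influences the column node, not the other way around.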
Importing a DAG
Extracting adjacency matrix from imported DAG:
import bnlearn
# Import DAG
model = bnlearn.import_DAG('sachs', verbose=0)
# adjacency matrix:
model['adjmat']
# print
print(model['adjmat'])
Reading the table from left to right, we see that gene Erk is connected to Akt in a directed manner: Erk influences Akt, but not the other way around, because Akt does not show an edge towards Erk. The columns hidden behind the “…” may contain additional connections.
|      | Erk   | Akt   | PKA   | Mek   | Jnk   | …   | Raf   | P38   | PIP3  | PIP2  | Plcg  |
|------|-------|-------|-------|-------|-------|-----|-------|-------|-------|-------|-------|
| Erk  | False | True  | False | False | False | …   | False | False | False | False | False |
| Akt  | False | False | False | False | False | …   | False | False | False | False | False |
| PKA  | True  | True  | False | True  | True  | …   | True  | True  | False | False | False |
| Mek  | True  | False | False | False | False | …   | False | False | False | False | False |
| Jnk  | False | False | False | False | False | …   | False | False | False | False | False |
| PKC  | False | False | True  | True  | True  | …   | True  | True  | False | False | False |
| Raf  | False | False | False | True  | False | …   | False | False | False | False | False |
| P38  | False | False | False | False | False | …   | False | False | False | False | False |
| PIP3 | False | False | False | False | False | …   | False | False | False | True  | False |
| PIP2 | False | False | False | False | False | …   | False | False | False | False | False |
| Plcg | False | False | False | False | False | …   | False | False | True  | True  | False |
Structure learning
Extracting adjacency matrix after structure learning:
# Load dataframe
df = bnlearn.import_example()
# Learn structure
model = bnlearn.structure_learning.fit(df)
# adjacency matrix:
model['adjmat']
# print
print(model['adjmat'])
Reading the table from left to right, we see that Cloudy is connected to both Sprinkler and Rain in a directed manner. Sprinkler is connected to Wet_Grass, Rain is connected to Wet_Grass, and Wet_Grass has no outgoing connections.
|           | Cloudy | Sprinkler | Rain  | Wet_Grass |
|-----------|--------|-----------|-------|-----------|
| Cloudy    | False  | True      | True  | False     |
| Sprinkler | False  | False     | False | True      |
| Rain      | False  | False     | False | True      |
| Wet_Grass | False  | False     | False | False     |
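The edge list can be recovered from such an adjacency matrix with a few lines of pandas. This is a sketch, not a bnlearn function; the matrix below is hard-coded with the same values as the learned sprinkler structure above.

```python
import pandas as pd

# Adjacency matrix matching the learned sprinkler structure shown above.
names = ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
adjmat = pd.DataFrame(
    [[False, True,  True,  False],
     [False, False, False, True],
     [False, False, False, True],
     [False, False, False, False]],
    index=names, columns=names,
)

# Stack into (source, target) pairs and keep only the True entries.
edges = [(s, t) for (s, t), v in adjmat.stack().items() if v]
print(edges)
# [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'),
#  ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]
```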
Parameter learning
Extracting adjacency matrix after Parameter learning:
# Load dataframe
df = bnlearn.import_example()
# Import DAG
DAG = bnlearn.import_DAG('sprinkler', CPD=False)
# Learn parameters
model = bnlearn.parameter_learning.fit(DAG, df)
# adjacency matrix:
model['adjmat']
# print
print(model['adjmat'])
|           | Cloudy | Sprinkler | Rain  | Wet_Grass |
|-----------|--------|-----------|-------|-----------|
| Cloudy    | False  | True      | True  | False     |
| Sprinkler | False  | False     | False | True      |
| Rain      | False  | False     | False | True      |
| Wet_Grass | False  | False     | False | False     |
Converting adjacency matrix into vector
# Load DAG
DAG = bnlearn.import_DAG('sprinkler')
# Convert adjmat to vector:
vector = bnlearn.adjmat2vec(DAG['adjmat'])
| source    | target    | weight |
|-----------|-----------|--------|
| Cloudy    | Sprinkler | True   |
| Cloudy    | Rain      | True   |
| Sprinkler | Wet_Grass | True   |
| Rain      | Wet_Grass | True   |
Converting vector into adjacency matrix
# Convert vector back to adjacency matrix:
adjmat = bnlearn.vec2adjmat(vector['source'], vector['target'])
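Conceptually, the two conversions invert each other. The following plain-Python sketch (not bnlearn's implementation) illustrates the round trip between an edge vector and a boolean adjacency matrix:

```python
# Edge vector <-> adjacency matrix round trip, using a list of
# (source, target) pairs and a dict-of-dicts of booleans.
def adjmat2vec(adjmat):
    """Collect every True cell as a directed (source, target) edge."""
    return [(s, t) for s, row in adjmat.items() for t, v in row.items() if v]

def vec2adjmat(edges):
    """Build a square boolean matrix over all nodes seen in the edges."""
    nodes = sorted({n for edge in edges for n in edge})
    edge_set = set(edges)
    return {s: {t: (s, t) in edge_set for t in nodes} for s in nodes}

edges = [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'),
         ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]
adjmat = vec2adjmat(edges)
# The round trip preserves exactly the original set of edges.
assert sorted(adjmat2vec(adjmat)) == sorted(edges)
```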
Create a Bayesian Network, learn its parameters from data and perform the inference¶
Let's make an example where we have data with many measurements and expert information about the relations between the nodes. Our goal is to create a DAG based on the expert knowledge and learn the CPDs. To showcase this, I will use the sprinkler example.
Import the example sprinkler dataset.
from tabulate import tabulate
df = bnlearn.import_example('sprinkler')
print(tabulate(df.head(), tablefmt="grid", headers="keys"))
|   | Cloudy | Sprinkler | Rain | Wet_Grass |
|---|--------|-----------|------|-----------|
| 0 | 0      | 0         | 0    | 0         |
| 1 | 1      | 0         | 1    | 1         |
| 2 | 0      | 1         | 0    | 1         |
| 3 | 1      | 1         | 1    | 1         |
| 4 | 1      | 1         | 1    | 1         |
| … | …      | …         | …    | …         |
Define the network structure. This can be based on expert knowledge.
edges = [('Cloudy', 'Sprinkler'),
('Cloudy', 'Rain'),
('Sprinkler', 'Wet_Grass'),
('Rain', 'Wet_Grass')]
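An expert-defined edge list must describe an acyclic graph before it can serve as a DAG. As a quick sanity check, here is a sketch using Kahn's algorithm in plain Python (bnlearn.make_DAG performs its own validation; this is only illustrative):

```python
from collections import defaultdict

def is_acyclic(edges):
    """Kahn's algorithm: repeatedly remove nodes with in-degree zero.
    All nodes get removed if and only if the graph has no directed cycle."""
    indeg, children = defaultdict(int), defaultdict(list)
    nodes = {n for edge in edges for n in edge}
    for s, t in edges:
        children[s].append(t)
        indeg[t] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while queue:
        node = queue.pop()
        seen += 1
        for child in children[node]:
            indeg[child] -= 1
            if indeg[child] == 0:
                queue.append(child)
    return seen == len(nodes)

edges = [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'),
         ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]
print(is_acyclic(edges))  # True: no directed cycle
```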
Make the actual Bayesian DAG
DAG = bnlearn.make_DAG(edges)
# [BNLEARN] Bayesian DAG created.
# Print the CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.
Plot the DAG
bnlearn.plot(DAG)
Parameter learning on the user-defined DAG and input data using maximum likelihood estimation.
DAG = bnlearn.parameter_learning.fit(DAG, df, methodtype='maximumlikelihood')
Let's print the learned CPDs:
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] Independencies:
# (Cloudy _|_ Wet_Grass | Rain, Sprinkler)
# (Sprinkler _|_ Rain | Cloudy)
# (Rain _|_ Sprinkler | Cloudy)
# (Wet_Grass _|_ Cloudy | Rain, Sprinkler)
# [BNLEARN.print_CPD] Nodes: ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
# [BNLEARN.print_CPD] Edges: [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'), ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]
- CPD of Cloudy:
+-----------+-------+
| Cloudy(0) | 0.494 |
+-----------+-------+
| Cloudy(1) | 0.506 |
+-----------+-------+
- CPD of Sprinkler:
+--------------+--------------------+--------------------+
| Cloudy       | Cloudy(0)          | Cloudy(1)          |
+--------------+--------------------+--------------------+
| Sprinkler(0) | 0.4807692307692308 | 0.7075098814229249 |
+--------------+--------------------+--------------------+
| Sprinkler(1) | 0.5192307692307693 | 0.2924901185770751 |
+--------------+--------------------+--------------------+
- CPD of Rain:
+---------+--------------------+---------------------+
| Cloudy  | Cloudy(0)          | Cloudy(1)           |
+---------+--------------------+---------------------+
| Rain(0) | 0.6518218623481782 | 0.33695652173913043 |
+---------+--------------------+---------------------+
| Rain(1) | 0.3481781376518219 | 0.6630434782608695  |
+---------+--------------------+---------------------+
- CPD of Wet_Grass:
+--------------+--------------------+---------------------+---------------------+---------------------+
| Rain         | Rain(0)            | Rain(0)             | Rain(1)             | Rain(1)             |
+--------------+--------------------+---------------------+---------------------+---------------------+
| Sprinkler    | Sprinkler(0)       | Sprinkler(1)        | Sprinkler(0)        | Sprinkler(1)        |
+--------------+--------------------+---------------------+---------------------+---------------------+
| Wet_Grass(0) | 0.7553816046966731 | 0.33755274261603374 | 0.25588235294117645 | 0.37910447761194027 |
+--------------+--------------------+---------------------+---------------------+---------------------+
| Wet_Grass(1) | 0.2446183953033268 | 0.6624472573839663  | 0.7441176470588236  | 0.6208955223880597  |
+--------------+--------------------+---------------------+---------------------+---------------------+
Let's make an inference:
q1 = bnlearn.inference.fit(DAG, variables=['Wet_Grass'], evidence={'Rain':1, 'Sprinkler':0, 'Cloudy':1})
+--------------+------------------+
| Wet_Grass | phi(Wet_Grass) |
+==============+==================+
| Wet_Grass(0) | 0.2559 |
+--------------+------------------+
| Wet_Grass(1) | 0.7441 |
+--------------+------------------+
Print the values:
print(q1.values)
# array([0.25588235, 0.74411765])
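As a sanity check, this result can be verified by hand: because Rain and Sprinkler (the only parents of Wet_Grass) are both observed, the posterior equals the corresponding column of the learned CPD of Wet_Grass. The sketch below hard-codes the rounded CPD values from the table above; it is illustrative only, not part of bnlearn.

```python
# Rounded CPD of Wet_Grass from the parameter-learning output above:
# (Rain, Sprinkler) -> [P(Wet_Grass=0), P(Wet_Grass=1)]
cpd_wet_grass = {
    (0, 0): [0.7554, 0.2446],
    (0, 1): [0.3376, 0.6624],
    (1, 0): [0.2559, 0.7441],
    (1, 1): [0.3791, 0.6209],
}

# Evidence Rain=1, Sprinkler=0 selects one column of the CPD directly.
# Cloudy is irrelevant here: Wet_Grass is independent of Cloudy given
# its parents (see the independencies printed by print_CPD above).
posterior = cpd_wet_grass[(1, 0)]
print(posterior)  # [0.2559, 0.7441], matching q1.values
```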