Examples

bnlearn contains several examples that can be used to practice the functionalities of bnlearn.structure_learning(), bnlearn.parameter_learning() and bnlearn.inference().

Example with DataFrames

In bnlearn, there is one example dataset that can be imported: the sprinkler dataset. Note that this dataset is already one-hot encoded and contains no missing values, and as such requires no further pre-processing. The example DAG models (see the Example DAG section) can, however, be converted from model to DataFrame.

# Import dataset
df = bnlearn.import_example()
# Structure learning
model = bnlearn.structure_learning.fit(df)
# Plot
G = bnlearn.plot(model)

Example with DAG

bnlearn contains several example Directed Acyclic Graphs:
  • ‘sprinkler’ (default)

  • ‘alarm’

  • ‘andes’

  • ‘asia’

  • ‘pathfinder’

  • ‘sachs’

  • ‘miserables’

Each DAG can be loaded using the bnlearn.bnlearn.import_DAG() function, and with the bnlearn.bnlearn.sampling() function a DataFrame with n samples can be generated. The sprinkler DAG is a special case because it is not loaded from a bif file but created manually. The sprinkler model can therefore be generated with or without CPDs by setting CPD=True or CPD=False.

# Import dataset
DAG = bnlearn.import_DAG('sachs', CPD=True)
# Show the keys of the DAG
DAG.keys()
# dict_keys(['model', 'adjmat'])

# The model contains the BayesianModel with the CPDs.
# The adjmat contains the adjacency matrix with the relationships between the nodes.

# plot ground truth
G = bnlearn.plot(DAG)

# Sampling
df = bnlearn.sampling(DAG, n=1000)
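Conceptually, sampling from a DAG proceeds in topological order: each node is drawn from its conditional distribution after its parents have been sampled (ancestral sampling). A minimal sketch of this idea for a two-node chain, with made-up CPD values rather than the sprinkler model's actual ones:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Toy CPDs: P(Cloudy=1) and P(Rain=1 | Cloudy)
p_cloudy = 0.5
p_rain_given_cloudy = {0: 0.2, 1: 0.8}

# Sample the parent first, then the child conditioned on the sampled parent
cloudy = (rng.random(n) < p_cloudy).astype(int)
rain = np.array([rng.random() < p_rain_given_cloudy[c] for c in cloudy], dtype=int)

print(rain.mean())  # roughly 0.5 * 0.2 + 0.5 * 0.8 = 0.5
```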

Import from BIF

Each Bayesian DAG model that is loaded with bnlearn.bnlearn.import_DAG() is derived from a bif file. The bif file is a common format for Bayesian networks that can be used for the exchange of knowledge and experimental results in the community. More information can be found [here](http://www.cs.washington.edu/dm/vfml/appendixes/bif.htm).

# Import dataset
DAG = bnlearn.import_DAG('filepath/to/model.bif')

Start with RAW data

Let's demonstrate by example how to process your own dataset containing mixed variables, using the titanic use case. This dataset contains both continuous and categorical variables and can easily be imported using bnlearn.bnlearn.import_example(). The function bnlearn.bnlearn.df2onehot() helps to convert the mixed dataset into a one-hot matrix. The settings are adjustable; by default, the unique non-zero values must be above 80% per variable, and the minimal number of samples must be at least 10 per variable.

# Load titanic dataset containing mixed variables
df_raw = bnlearn.import_example(data='titanic')
# Pre-processing of the input dataset
dfhot, dfnum = bnlearn.df2onehot(df_raw)
# Structure learning
DAG = bnlearn.structure_learning.fit(dfnum)
# Plot
G = bnlearn.plot(DAG)
[Figure: fig_titanic.png — DAG learned on the titanic dataset]
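The core idea behind the one-hot conversion can be sketched with plain pandas. Note this is only an illustration on a hypothetical two-column frame, not bnlearn.df2onehot() itself, which additionally applies the thresholds described above:

```python
import pandas as pd

# Hypothetical mixed-type frame standing in for the titanic data
df = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male'],
                   'Pclass': [3, 1, 3, 2]})

# One-hot encode the categorical columns; each category becomes its own column
dfhot = pd.get_dummies(df, columns=['Sex', 'Pclass'])
print(dfhot.columns.tolist())
# ['Sex_female', 'Sex_male', 'Pclass_1', 'Pclass_2', 'Pclass_3']
```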

From this point we can learn the parameters using the DAG and input dataframe.

# Parameter learning
model = bnlearn.parameter_learning.fit(DAG, dfnum)

Finally, we can start making inferences. Note that the variable and evidence names should exactly match the input data (case sensitive).

# Print CPDs
bnlearn.print_CPD(model)
# Make inference
q = bnlearn.inference.fit(model, variables=['Survived'], evidence={'Sex':0, 'Pclass':1})

print(q.values)
print(q.variables)
print(q)

+-------------+-----------------+
| Survived    |   phi(Survived) |
+=============+=================+
| Survived(0) |          0.3312 |
+-------------+-----------------+
| Survived(1) |          0.6688 |
+-------------+-----------------+

Adjacency matrix

The adjacency matrix is an important way to store relationships across variables or nodes. In graph theory, it is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent in the graph. Several bnlearn functionalities output an adjacency matrix. A value of 0 or False indicates that two nodes are not connected, whereas a value >0 or True indicates a (directed) connection.
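The correspondence between an edge list and an adjacency matrix can be illustrated with plain pandas (a conceptual sketch using the sprinkler nodes, independent of bnlearn):

```python
import pandas as pd

nodes = ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
edges = [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'),
         ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]

# Square boolean matrix: rows are sources, columns are targets
adjmat = pd.DataFrame(False, index=nodes, columns=nodes)
for source, target in edges:
    adjmat.loc[source, target] = True

print(adjmat)
```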

Importing a DAG

Extracting adjacency matrix from imported DAG:

import bnlearn
# Import DAG
model = bnlearn.import_DAG('sachs', verbose=0)
# adjacency matrix:
model['adjmat']

# print
print(model['adjmat'])

Reading the table from left to right, we see that gene Erk is connected to Akt in a directed manner. This indicates that Erk influences Akt but not the other way around, because the Akt row does not contain an edge towards Erk. In general, each True entry marks a directed edge from the row variable to the column variable.

|      | Erk   | Akt   | PKA   | Mek   | Jnk   | PKC   | Raf   | P38   | PIP3  | PIP2  | Plcg  |
|------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| Erk  | False | True  | False | False | False | False | False | False | False | False | False |
| Akt  | False | False | False | False | False | False | False | False | False | False | False |
| PKA  | True  | True  | False | True  | True  | False | True  | True  | False | False | False |
| Mek  | True  | False | False | False | False | False | False | False | False | False | False |
| Jnk  | False | False | False | False | False | False | False | False | False | False | False |
| PKC  | False | False | True  | True  | True  | False | True  | True  | False | False | False |
| Raf  | False | False | False | True  | False | False | False | False | False | False | False |
| P38  | False | False | False | False | False | False | False | False | False | False | False |
| PIP3 | False | False | False | False | False | False | False | False | False | True  | False |
| PIP2 | False | False | False | False | False | False | False | False | False | False | False |
| Plcg | False | False | False | False | False | False | False | False | True  | True  | False |

Structure learning

Extracting adjacency matrix after structure learning:

# Load dataframe
df = bnlearn.import_example()
# Learn structure
model = bnlearn.structure_learning.fit(df)
# adjacency matrix:
model['adjmat']

# print
print(model['adjmat'])

Reading the table from left to right, we see that Cloudy is connected to Sprinkler and to Rain in a directed manner. Sprinkler is connected to Wet_Grass, Rain is connected to Wet_Grass, and Wet_Grass has no outgoing connections.

|           | Cloudy | Sprinkler | Rain  | Wet_Grass |
|-----------|--------|-----------|-------|-----------|
| Cloudy    | False  | True      | True  | False     |
| Sprinkler | False  | False     | False | True      |
| Rain      | False  | False     | False | True      |
| Wet_Grass | False  | False     | False | False     |

Parameter learning

Extracting adjacency matrix after parameter learning:

# Load dataframe
df = bnlearn.import_example()
# Import DAG
DAG = bnlearn.import_DAG('sprinkler', CPD=False)
# Learn parameters
model = bnlearn.parameter_learning.fit(DAG, df)
# adjacency matrix:
model['adjmat']

# print
print(model['adjmat'])

|           | Cloudy | Sprinkler | Rain  | Wet_Grass |
|-----------|--------|-----------|-------|-----------|
| Cloudy    | False  | True      | True  | False     |
| Sprinkler | False  | False     | False | True      |
| Rain      | False  | False     | False | True      |
| Wet_Grass | False  | False     | False | False     |

Converting adjacency matrix into vector

# Load DAG
DAG = bnlearn.import_DAG('sprinkler')
# Convert adjmat to vector:
vector = bnlearn.adjmat2vec(DAG['adjmat'])

| source    | target    | weight |
|-----------|-----------|--------|
| Cloudy    | Sprinkler | True   |
| Cloudy    | Rain      | True   |
| Sprinkler | Wet_Grass | True   |
| Rain      | Wet_Grass | True   |

Converting vector into adjacency matrix

# Convert the vector back into an adjacency matrix
adjmat = bnlearn.vec2adjmat(vector['source'], vector['target'])
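Conceptually, these two conversions are a stack/pivot of the matrix. A plain-pandas sketch of the round trip (an illustration of the idea, not bnlearn's exact implementation, which may differ in details such as handling nodes without edges):

```python
import pandas as pd

nodes = ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
adjmat = pd.DataFrame(False, index=nodes, columns=nodes)
adjmat.loc['Cloudy', 'Sprinkler'] = True
adjmat.loc['Cloudy', 'Rain'] = True

# adjmat -> edge vector: flatten all (source, target) pairs, keep the connected ones
vector = adjmat.stack().rename_axis(['source', 'target']).reset_index(name='weight')
vector = vector[vector['weight']]

# edge vector -> adjmat: pivot back into a matrix of source rows and target columns
back = vector.pivot(index='source', columns='target', values='weight')
print(vector)
```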

Create a Bayesian network, learn its parameters from data, and perform inference

Let's work through an example in which we have a dataset with many measurements and expert knowledge about the relations between the nodes. Our goal is to create a DAG from the expert knowledge and then learn the CPDs from the data. To showcase this, I will use the sprinkler example.

Import the sprinkler example dataset:

from tabulate import tabulate
df = bnlearn.import_example('sprinkler')
print(tabulate(df.head(), tablefmt="grid", headers="keys"))

+----+----------+-------------+--------+-------------+
|    |   Cloudy |   Sprinkler |   Rain |   Wet_Grass |
+====+==========+=============+========+=============+
|  0 |        0 |           0 |      0 |           0 |
+----+----------+-------------+--------+-------------+
|  1 |        1 |           0 |      1 |           1 |
+----+----------+-------------+--------+-------------+
|  2 |        0 |           1 |      0 |           1 |
+----+----------+-------------+--------+-------------+
|  3 |        1 |           1 |      1 |           1 |
+----+----------+-------------+--------+-------------+
|  4 |        1 |           1 |      1 |           1 |
+----+----------+-------------+--------+-------------+

Define the network structure. This can be based on expert knowledge.

edges = [('Cloudy', 'Sprinkler'),
         ('Cloudy', 'Rain'),
         ('Sprinkler', 'Wet_Grass'),
         ('Rain', 'Wet_Grass')]

Make the actual Bayesian DAG

DAG = bnlearn.make_DAG(edges)
# [BNLEARN] Bayesian DAG created.

# Print the CPDs
bnlearn.print_CPD(DAG)
# [BNLEARN.print_CPD] No CPDs to print. Use bnlearn.plot(DAG) to make a plot.

Plot the DAG

bnlearn.plot(DAG)
[Figure: DAG_sprinkler.png — the sprinkler DAG]

Parameter learning on the user-defined DAG and input data using maximum likelihood estimation.

DAG = bnlearn.parameter_learning.fit(DAG, df, methodtype='maximumlikelihood')
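Under the hood, maximum likelihood estimation of a CPD amounts to conditional frequency counting: P(child | parents) is estimated as the fraction of each child value within every parent configuration. A small illustration with pandas on hypothetical toy data (not the actual sprinkler dataset):

```python
import pandas as pd

# Toy observations of a parent (Cloudy) and child (Rain)
df = pd.DataFrame({'Cloudy': [0, 0, 0, 0, 1, 1, 1, 1],
                   'Rain':   [0, 0, 0, 1, 0, 1, 1, 1]})

# Conditional relative frequencies: rows are Cloudy values, columns are Rain values
cpd = pd.crosstab(df['Cloudy'], df['Rain'], normalize='index')
print(cpd)
# P(Rain=1 | Cloudy=0) = 0.25, P(Rain=1 | Cloudy=1) = 0.75
```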

Let's print the learned CPDs:

bnlearn.print_CPD(DAG)

# [BNLEARN.print_CPD] Independencies:
# (Cloudy _|_ Wet_Grass | Rain, Sprinkler)
# (Sprinkler _|_ Rain | Cloudy)
# (Rain _|_ Sprinkler | Cloudy)
# (Wet_Grass _|_ Cloudy | Rain, Sprinkler)
# [BNLEARN.print_CPD] Nodes: ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
# [BNLEARN.print_CPD] Edges: [('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'), ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')]
CPD of Cloudy:

+-----------+-------+
| Cloudy(0) | 0.494 |
+-----------+-------+
| Cloudy(1) | 0.506 |
+-----------+-------+

CPD of Sprinkler:

+--------------+--------------------+--------------------+
| Cloudy       | Cloudy(0)          | Cloudy(1)          |
+--------------+--------------------+--------------------+
| Sprinkler(0) | 0.4807692307692308 | 0.7075098814229249 |
+--------------+--------------------+--------------------+
| Sprinkler(1) | 0.5192307692307693 | 0.2924901185770751 |
+--------------+--------------------+--------------------+

CPD of Rain:

+---------+--------------------+---------------------+
| Cloudy  | Cloudy(0)          | Cloudy(1)           |
+---------+--------------------+---------------------+
| Rain(0) | 0.6518218623481782 | 0.33695652173913043 |
+---------+--------------------+---------------------+
| Rain(1) | 0.3481781376518219 | 0.6630434782608695  |
+---------+--------------------+---------------------+

CPD of Wet_Grass:

+--------------+--------------------+---------------------+---------------------+---------------------+
| Rain         | Rain(0)            | Rain(0)             | Rain(1)             | Rain(1)             |
+--------------+--------------------+---------------------+---------------------+---------------------+
| Sprinkler    | Sprinkler(0)       | Sprinkler(1)        | Sprinkler(0)        | Sprinkler(1)        |
+--------------+--------------------+---------------------+---------------------+---------------------+
| Wet_Grass(0) | 0.7553816046966731 | 0.33755274261603374 | 0.25588235294117645 | 0.37910447761194027 |
+--------------+--------------------+---------------------+---------------------+---------------------+
| Wet_Grass(1) | 0.2446183953033268 | 0.6624472573839663  | 0.7441176470588236  | 0.6208955223880597  |
+--------------+--------------------+---------------------+---------------------+---------------------+

Let's make an inference:

q1 = bnlearn.inference.fit(DAG, variables=['Wet_Grass'], evidence={'Rain':1, 'Sprinkler':0, 'Cloudy':1})

+--------------+------------------+
| Wet_Grass    |   phi(Wet_Grass) |
+==============+==================+
| Wet_Grass(0) |           0.2559 |
+--------------+------------------+
| Wet_Grass(1) |           0.7441 |
+--------------+------------------+

Print the values:

print(q1.values)
# array([0.25588235, 0.74411765])
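As a sanity check: because the evidence fixes both parents of Wet_Grass (Rain=1, Sprinkler=0), the posterior is simply the matching column of the learned CPD of Wet_Grass, so no summation over hidden variables is needed. Reading it off by hand, with values copied from the learned CPD above (rounded):

```python
# Learned CPD of Wet_Grass, keyed by the parent configuration (Rain, Sprinkler);
# each entry is [P(Wet_Grass=0 | parents), P(Wet_Grass=1 | parents)]
cpd_wet_grass = {
    (0, 0): [0.7554, 0.2446],
    (0, 1): [0.3376, 0.6624],
    (1, 0): [0.2559, 0.7441],
    (1, 1): [0.3791, 0.6209],
}

# All parents are observed, so the posterior is the corresponding CPD column
posterior = cpd_wet_grass[(1, 0)]
print(posterior)  # [0.2559, 0.7441], matching q1.values (rounded)
```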