bnlearn.bnlearn

Bayesian techniques for structure learning, parameter learning, inference and sampling.

bnlearn.bnlearn.adjmat2dict(adjmat)

Convert adjacency matrix to dict.

Parameters

adjmat (pd.DataFrame) – Adjacency matrix.

Returns

graph – Graph.

Return type

dict
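
The conversion can be pictured with a minimal sketch. It is not bnlearn's implementation: a plain nested dict stands in for the pd.DataFrame, the helper name adjmat_to_dict is hypothetical, and it assumes the returned graph maps each node to the list of nodes it points to.

```python
def adjmat_to_dict(adjmat):
    """Map every source node to the list of targets it points to."""
    graph = {}
    for source, row in adjmat.items():
        graph[source] = [target for target, connected in row.items() if connected]
    return graph


# Rows are sources, columns are targets (edges of the sprinkler network).
nodes = ['Cloudy', 'Sprinkler', 'Rain', 'Wet_Grass']
edges = {('Cloudy', 'Sprinkler'), ('Cloudy', 'Rain'),
         ('Sprinkler', 'Wet_Grass'), ('Rain', 'Wet_Grass')}
adjmat = {s: {t: (s, t) in edges for t in nodes} for s in nodes}

graph = adjmat_to_dict(adjmat)
print(graph)
```

Nodes without outgoing edges, such as Wet_Grass here, map to an empty list in this sketch.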

bnlearn.bnlearn.adjmat2vec(adjmat, min_weight=1)

Convert adjacency matrix into vector with source and target.

Parameters
  • adjmat (pd.DataFrame()) – Adjacency matrix.

  • min_weight (float) – Only edges with a weight of at least min_weight are returned.

Returns

nodes that are connected based on source and target

Return type

pd.DataFrame()

Examples

>>> import bnlearn as bn
>>> source=['Cloudy','Cloudy','Sprinkler','Rain']
>>> target=['Sprinkler','Rain','Wet_Grass','Wet_Grass']
>>> adjmat = bn.vec2adjmat(source, target)
>>> vector = bn.adjmat2vec(adjmat)
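
The effect of min_weight can be illustrated without the library. The sketch below is an assumption-laden illustration, not bnlearn's code: a nested dict replaces the pd.DataFrame, adjmat_to_vec is a hypothetical helper, and "minimum weight" is assumed to mean weight >= min_weight.

```python
def adjmat_to_vec(adjmat, min_weight=1):
    """Return (source, target, weight) rows for edges with weight >= min_weight."""
    rows = []
    for source, row in adjmat.items():
        for target, weight in row.items():
            if weight >= min_weight:
                rows.append((source, target, weight))
    return rows


# Weighted adjacency matrix: rows are sources, columns are targets.
adjmat = {
    'Cloudy':    {'Cloudy': 0, 'Sprinkler': 1, 'Rain': 2, 'Wet_Grass': 0},
    'Sprinkler': {'Cloudy': 0, 'Sprinkler': 0, 'Rain': 0, 'Wet_Grass': 1},
    'Rain':      {'Cloudy': 0, 'Sprinkler': 0, 'Rain': 0, 'Wet_Grass': 3},
    'Wet_Grass': {'Cloudy': 0, 'Sprinkler': 0, 'Rain': 0, 'Wet_Grass': 0},
}

all_edges = adjmat_to_vec(adjmat)                   # the four edges
strong_edges = adjmat_to_vec(adjmat, min_weight=2)  # only weights 2 and 3
```
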
bnlearn.bnlearn.compare_networks(model_1, model_2, pos=None, showfig=True, figsize=(15, 8), verbose=3)

Compare networks of two models.

Parameters
  • model_1 (dict) – Results of model 1.

  • model_2 (dict) – Results of model 2.

  • pos (graph, optional) – Coordinates of the network. If these are provided, the same structure will be used to plot the network. The default is None.

  • showfig (bool, optional) – Plot the figure. The default is True.

  • figsize (tuple, optional) – Figure size. The default is (15, 8).

  • verbose (int, optional) – Print progress to screen. The default is 3. 0: None, 1: ERROR, 2: WARN, 3: INFO (default), 4: DEBUG, 5: TRACE

Returns

  • scores – Score of the differences between the two input models.

  • adjmat_diff – Adjacency matrix depicting the differences between the two input models.

Return type

tuple containing (scores, adjmat_diff)

bnlearn.bnlearn.dag2adjmat(model, verbose=3)

Convert model into adjacency matrix.

Parameters
  • model (object) – Model object.

  • verbose (int, optional) – Print progress to screen. The default is 3. 0: None, 1: ERROR, 2: WARN, 3: INFO (default), 4: DEBUG, 5: TRACE

Returns

adjacency matrix.

Return type

pd.DataFrame

Examples

>>> import bnlearn as bn
>>> # Load DAG
>>> DAG = bn.import_DAG('sprinkler')
>>> # Extract edges from model and store in adjacency matrix
>>> adjmat=bn.dag2adjmat(DAG['model'])
bnlearn.bnlearn.df2onehot(df, y_min=10, perc_min_num=0.8, dtypes='pandas', excl_background=None, verbose=3)

Convert dataframe to one-hot matrix.

Parameters
  • df (pd.DataFrame()) – Input dataframe for which the rows are the features, and columns are the samples.

  • dtypes (list of str or 'pandas', optional) – Representation of the columns in the form of ['cat', 'num']. By default the dtype is determined based on the pandas dataframe.

  • y_min (int [0..len(y)], optional) – Minimal number of samples that must be present in a group. All groups with fewer than y_min samples are labeled as _other_ and are not used in the enriching model. The default is 10.

  • perc_min_num (float [None, 0..1], optional) – Force a column (int or float) to be numerical if the percentage of unique non-zero values is above this value. The default is 0.8.

  • verbose (int, optional) – Print message to screen. The default is 3. 0: None, 1: ERROR, 2: WARN, 3: INFO (default), 4: DEBUG, 5: TRACE

Returns

One-hot dataframe.

Return type

pd.DataFrame()
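
A minimal sketch of the idea behind y_min, for a single categorical column. This is an illustration only and not the library's implementation: the helper one_hot, the column naming scheme name_category, and the use of plain lists instead of a dataframe are all assumptions.

```python
from collections import Counter


def one_hot(column, name, y_min=2):
    """One-hot encode a list of categories, pooling rare ones as '_other_'."""
    counts = Counter(column)
    # Assumption: categories seen fewer than y_min times are pooled as '_other_'.
    pooled = [v if counts[v] >= y_min else '_other_' for v in column]
    categories = sorted(set(pooled))
    return {f'{name}_{c}': [int(v == c) for v in pooled] for c in categories}


weather = ['sun', 'rain', 'sun', 'snow', 'rain', 'sun']
encoded = one_hot(weather, 'weather', y_min=2)
for column_name, values in encoded.items():
    print(column_name, values)
```

Here 'snow' occurs only once, so with y_min=2 it ends up in the pooled '_other_' column.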

bnlearn.bnlearn.get_edge_properties(model, color='#000000', weight=1, minscale=1, maxscale=10, verbose=3)

Collect edge properties.

Parameters
  • model (dict) – dict containing (initialized) model.

  • color (str, (Default: '#000000')) – The default color of the edges.

  • weight (float, (Default: 1)) – The default weight of the edges.

  • minscale (float, (Default: 1)) – The minimum weight of the edge when test statistics are used.

  • maxscale (float, (Default: 10)) – The maximum weight of the edge when test statistics are used.

  • verbose (int, optional) – Print progress to screen. The default is 3. 0: None, 1: ERROR, 2: WARN, 3: INFO (default), 4: DEBUG, 5: TRACE

Returns

Edge properties.

Return type

dict.

Examples

>>> # Example 1:
>>> import bnlearn as bn
>>> edges = [('A', 'B'), ('A', 'C'), ('A', 'D')]
>>> # Create DAG and store in model
>>> model = bn.make_DAG(edges)
>>> edge_properties = bn.get_edge_properties(model)
>>> # Adjust the properties
>>> edge_properties[('A', 'B')]['weight']=10
>>> edge_properties[('A', 'B')]['color']='#8A0707'
>>> # Make plot
>>> bn.plot(model, interactive=False, edge_properties=edge_properties)
>>>
>>> # Example 2:
>>> # Load the asia example dataset
>>> df = bn.import_example(data='asia')
>>> # Structure learning of sampled dataset
>>> model = bn.structure_learning.fit(df)
>>> # Compute edge weights based on chi_square test statistic
>>> model = bn.independence_test(model, df, test='chi_square')
>>> # Get the edge properties
>>> edge_properties = bn.get_edge_properties(model)
>>> # Make adjustments
>>> edge_properties[('tub', 'either')]['color']='#8A0707'
>>> # Make plot
>>> bn.plot(model, interactive=True, edge_properties=edge_properties)
bnlearn.bnlearn.get_node_properties(model, node_color='#1f456e', node_size=None, verbose=3)

Collect node properties.

Parameters
  • model (dict) – dict containing (initialized) model.

  • node_color (str, (Default: '#1f456e')) – The default color of the nodes.

  • node_size (float, (Default: None)) – The default size of the nodes.

  • verbose (int, optional) – Print progress to screen. The default is 3. 0: None, 1: ERROR, 2: WARN, 3: INFO (default), 4: DEBUG, 5: TRACE

Returns

Node properties.

Return type

dict.

Examples

>>> import bnlearn as bn
>>> edges = [('A', 'B'), ('A', 'C'), ('A', 'D')]
>>> # Create DAG and store in model
>>> model = bn.make_DAG(edges)
>>> node_properties = bn.get_node_properties(model)
>>> # Adjust the properties
>>> node_properties['A']['node_size']=2000
>>> node_properties['A']['node_color']='#000000'
>>> # Make plot
>>> bn.plot(model, interactive=False, node_properties=node_properties)
>>>
>>> # Example: Specify all nodes
>>> node_properties = bn.get_node_properties(model, node_size=1000, node_color='#000000')
>>> bn.plot(model, interactive=False, node_properties=node_properties)
bnlearn.bnlearn.import_DAG(filepath='sprinkler', CPD=True, checkmodel=True, verbose=3)

Import Directed Acyclic Graph.

Parameters
  • filepath (str, (default: sprinkler)) – Name of a pre-defined example, or the absolute file path to a .bif model file. Pre-defined examples: 'sprinkler', 'alarm', 'andes', 'asia', 'sachs'. A custom model is given as 'filepath/to/model.bif'. The default is 'sprinkler'.

  • CPD (bool, optional) – Include the conditional probability distributions (CPDs) in the Directed Acyclic Graph (DAG). The default is True.

  • checkmodel (bool) – Check the validity of the model. The default is True

  • verbose (int, optional) – Print progress to screen. The default is 3. 0: None, 1: ERROR, 2: WARN, 3: INFO (default), 4: DEBUG, 5: TRACE

Returns

  • model – BayesianNetwork

  • adjmat – Adjacency matrix

Return type

dict containing model and adjmat.

Examples

>>> import bnlearn as bn
>>> model = bn.import_DAG('sprinkler')
>>> bn.plot(model)
bnlearn.bnlearn.import_example(data='sprinkler', n=10000, verbose=3)

Load example dataset.

Parameters
  • data (str, (default: sprinkler)) – Pre-defined examples. ‘titanic’, ‘sprinkler’, ‘alarm’, ‘andes’, ‘asia’, ‘sachs’, ‘water’, ‘random’, ‘stormofswords’

  • n (int, optional) – Number of samples to generate. The default is 10000.

  • verbose (int, (default: 3)) – Print progress to screen. 0: None, 1: Error, 2: Warning, 3: Info, 4: Debug, 5: Trace

Returns

df

Return type

pd.DataFrame()

bnlearn.bnlearn.independence_test(model, df, test='chi_square', alpha=0.05, prune=False, verbose=3)

Compute edge strength using test statistic.

Compute the edge strength using a statistical test of independence, based on the model structure (DAG) and the data. For each pair of connected variables in the DAG (obtained either by structure learning or user-defined), a statistical test is performed. Two variables are considered associated if the test's p-value < alpha.

Parameters
  • model (Instance of bnlearn.structure_learning.) – The (learned) model which needs to be tested.

  • df (pandas.DataFrame instance) – The dataset against which to test the model structure.

  • test (str or function) –

    The statistical test to compute associations.
    • chi_square

    • g_sq

    • log_likelihood

    • freeman_tuckey

    • modified_log_likelihood

    • neyman

    • cressie_read

  • alpha (float) – Significance level, a value between 0 and 1. If p_value < alpha, the hypothesis of independence is rejected and the variables are considered associated.

  • prune (bool (default: False)) – True: Keep only edges that are significant (<=alpha) based on the independence test.

Returns

model – The input model, extended with the key 'independence_test' that holds the test results.

Return type

dict

Examples

>>> import bnlearn as bn
>>> df = bn.import_example(data='asia')
>>> # Structure learning of sampled dataset
>>> model = bn.structure_learning.fit(df)
>>> # Compute arc strength
>>> model = bn.independence_test(model, df, test='chi_square')
>>> print(model['independence_test'])
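
To make the decision rule concrete, the following worked sketch computes a Pearson chi-square test by hand for one edge between two binary variables. It is not how bnlearn computes the statistic internally: the counts are hypothetical, chi_square_2x2 is an illustrative helper, and no continuity correction is applied.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows are Rain = 0/1, columns are Wet_Grass = 0/1.
table = [[40, 10], [10, 40]]
stat, p_value = chi_square_2x2(table)

alpha = 0.05
associated = p_value < alpha  # True: independence is rejected for this edge
print(stat, p_value, associated)
```

Every expected count in this table is 25, so each cell contributes (15^2)/25 = 9 and the statistic is 36, far beyond what independence would produce at alpha = 0.05.
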
bnlearn.bnlearn.load(filepath='bnlearn_model.pkl', verbose=3)

Load learned model.

Parameters
  • filepath (str) – Pathname to stored pickle files.

  • verbose (int, optional) – Show message. A higher number gives more information. The default is 3.

Return type

Object.

bnlearn.bnlearn.make_DAG(DAG, CPD=None, methodtype='bayes', checkmodel=True, verbose=3)

Create Directed Acyclic Graph based on list.

Parameters
  • DAG (list) – list containing source and target in the form of [('A', 'B'), ('B', 'C')].

  • CPD (list, array-like) – Containing TabularCPD for each node.

  • methodtype (str (default: 'bayes')) –

    • 'bayes': Bayesian model

    • 'nb' or 'naivebayes': Special case of a Bayesian model where the only edges in the model are from the feature variables to the dependent variable. In other words, each tuple should start with the same variable name, such as: edges = [('A', 'B'), ('A', 'C'), ('A', 'D')]

  • checkmodel (bool) – Check the validity of the model. The default is True

  • verbose (int, optional) – Print progress to screen. The default is 3. 0: None, 1: ERROR, 2: WARN, 3: INFO (default), 4: DEBUG, 5: TRACE

Returns

  • 'adjmat': Adjacency matrix

  • 'model': pgmpy.models

  • 'methodtype': methodtype

  • 'model_edges': Edges

Return type

dict keys

Examples

>>> import bnlearn as bn
>>> edges = [('A', 'B'), ('A', 'C'), ('A', 'D')]
>>> DAG = bn.make_DAG(edges, methodtype='naivebayes')
>>> bn.plot(DAG)
bnlearn.bnlearn.plot(model, pos=None, scale=1, interactive=False, title='bnlearn_causal_network', node_color=None, node_size=None, node_properties=None, edge_properties=None, params_interactive={'bgcolor': '#ffffff', 'font_color': False, 'height': '800px', 'layout': None, 'notebook': False, 'width': '70%'}, params_static={'alpha': 0.8, 'arrowsize': 30, 'arrowstyle': '-|>', 'edge_alpha': 0.8, 'facecolor': 'white', 'font_color': '#000000', 'font_family': 'sans-serif', 'font_size': 14, 'height': 8, 'layout': 'fruchterman_reingold', 'maxscale': 10, 'minscale': 1, 'node_shape': 'o', 'width': 15}, verbose=3)

Plot the learned structure.

Parameters
  • model (dict) – Learned model from the .fit() function.

  • pos (graph, optional) – Coordinates of the network. If these are provided, the same structure will be used to plot the network. The default is None.

  • scale (int, optional) – Scaling parameter for the network. A larger number will linearly increase the size of the network. The default is 1.

  • interactive (bool, (default: False)) – True: Interactive web-based graph. False: Static plot

  • title (str, optional) – Title for the plots.

  • node_color (str, optional) – Color each node in the network using a hex-color, such as ‘#8A0707’

  • node_size (int, optional) – Set the node size for each node in the network. The default size when using static plots is 800, and for interactive plots it is 10.

  • node_properties (dict (default: None)) –

    Dictionary containing custom node_color and node_size parameters for the network. The node properties can be retrieved with: node_properties = bn.get_node_properties(model). Example: node_properties = {'node1': {'node_color': '#8A0707', 'node_size': 10}, 'node2': {'node_color': '#000000', 'node_size': 30}}

  • edge_properties (dict (default: None)) – Dictionary containing custom color and weight parameters for the edges of the network. The edge properties can be retrieved with: edge_properties = bn.get_edge_properties(model)

  • params_interactive (dict.) – Dictionary containing various settings in case of creating interactive plots.

  • params_static (dict.) – Dictionary containing various settings in case of creating static plots.

  • verbose (int, optional) – Print progress to screen. The default is 3. 0: None, 1: Error, 2: Warning, 3: Info (default), 4: Debug, 5: Trace

Returns

  • pos (list) – Positions of the nodes.

  • G (Graph) – Graph model.

  • node_properties (dict) – Node properties.

Return type

dict containing pos and G

Examples

>>> import bnlearn as bn
>>>
>>> # Load the asia example dataset
>>> df = bn.import_example(data='asia')
>>>
>>> # Structure learning of sampled dataset
>>> model = bn.structure_learning.fit(df)
>>>
>>> # plot static
>>> G = bn.plot(model)
>>>
>>> # plot interactive
>>> G = bn.plot(model, interactive=True)
>>>
>>> # plot interactive with various settings
>>> bn.plot(model, node_color='#8A0707', node_size=35, interactive=True, params_interactive = {'height':'800px', 'width':'70%', 'layout':None, 'bgcolor':'#0f0f0f0f'})
>>>
>>> # plot with node properties
>>> node_properties = bn.get_node_properties(model)
>>> # Make some changes
>>> node_properties['xray']['node_color']='#8A0707'
>>> node_properties['xray']['node_size']=50
>>> # Plot
>>> bn.plot(model, interactive=True, node_properties=node_properties)
>>>
bnlearn.bnlearn.predict(model, df, variables, to_df=True, method='max', verbose=3)

Predict on data from a Bayesian network.

The inference on the dataset is performed sample-wise by using all the available nodes as evidence (obviously, with the exception of the node whose values we are predicting). The states with highest probability are returned.

Parameters
  • model (Object) – An object of class from bn.fit.

  • df (pd.DataFrame) – Each row in the DataFrame will be predicted

  • variables (str or list of str) – The label(s) of node(s) to be predicted.

  • to_df (Bool, (default is True)) – The output is converted to dataframe output. Note that this heavily impacts the speed.

  • method (str) – The method used to select the returned states. 'max': Return the variable values with the maximum probability. None: Return all probabilities.

  • verbose (int, optional) – Print progress to screen. The default is 3. 0: None, 1: ERROR, 2: WARN, 3: INFO (default), 4: DEBUG, 5: TRACE

Returns

P – Predict() returns a dict with the evidence and states that resulted in the highest probability for the input variable.

Return type

dict or DataFrame

Examples

>>> import bnlearn as bn
>>> model = bn.import_DAG('sprinkler')
>>>
>>> # Make single inference
>>> query = bn.inference.fit(model, variables=['Rain', 'Cloudy'], evidence={'Wet_Grass':1})
>>> print(query)
>>> print(bn.query2df(query))
>>>
>>> # Let's create an example dataset with 1000 samples and make inferences on the entire dataset.
>>> df = bn.sampling(model, n=1000)
>>>
>>> # Each sample will be assessed and the states with the highest probability are returned.
>>> Pout = bn.predict(model, df, variables=['Rain', 'Cloudy'])
>>>
>>> print(Pout)
>>> #     Cloudy  Rain         p
>>> # 0        0     0  0.647249
>>> # 1        0     0  0.604230
>>> # ..     ...   ...       ...
>>> # 998      0     0  0.604230
>>> # 999      1     1  0.878049
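
The selection step behind method='max' can be sketched as an argmax over the posterior of one sample. The posterior values below are hypothetical and most_probable_state is an illustrative helper, not part of bnlearn.

```python
def most_probable_state(posterior):
    """Return the (state, probability) pair with the highest probability."""
    return max(posterior.items(), key=lambda item: item[1])


# Hypothetical posterior over (Cloudy, Rain) for one sample, given its evidence.
posterior = {(0, 0): 0.60, (0, 1): 0.10, (1, 0): 0.05, (1, 1): 0.25}

state, probability = most_probable_state(posterior)
cloudy, rain = state
print(cloudy, rain, probability)
```
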
bnlearn.bnlearn.print_CPD(DAG, checkmodel=False)

Print DAG-model to screen.

Parameters
  • DAG (pgmpy.models.BayesianNetwork) – model of the DAG.

  • checkmodel (bool) – Check the validity of the model. The default is False

Return type

None.

bnlearn.bnlearn.query2df(query, variables=None)

Convert query from inference model to a dataframe.

Parameters
  • query (Object from the inference model.) – Convert query object to a dataframe.

  • variables (list) – Order or select variables.

Returns

df – Dataframe with inferences.

Return type

pd.DataFrame()

bnlearn.bnlearn.sampling(DAG, n=1000, verbose=3)

Generate sample(s) using forward sampling from the joint distribution of the Bayesian network.

Parameters
  • DAG (dict) – Contains model and adjmat of the DAG.

  • n (int, optional) – Number of samples to generate. The default is 1000.

  • verbose (int, optional) – Print progress to screen. The default is 3. 0: None, 1: ERROR, 2: WARN, 3: INFO (default), 4: DEBUG, 5: TRACE

Returns

df – Dataframe containing sampled data from the input DAG model.

Return type

pd.DataFrame().

Example

>>> import bnlearn as bn
>>> DAG = bn.import_DAG('sprinkler')
>>> df = bn.sampling(DAG, n=1000)
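
Forward sampling draws every node after its parents, so each draw can condition on values that are already fixed. The sketch below shows this for a sprinkler-like network using only the standard library. The probabilities are hypothetical placeholders and forward_sample is an illustrative helper; neither is taken from bnlearn.

```python
import random


def forward_sample(n, seed=0):
    """Draw n samples from a small sprinkler-like network in topological order."""
    rng = random.Random(seed)
    # Hypothetical P(Wet_Grass=1 | Sprinkler, Rain).
    p_wet_table = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}
    samples = []
    for _ in range(n):
        # Root node first, then children conditioned on their sampled parents.
        cloudy = int(rng.random() < 0.5)
        sprinkler = int(rng.random() < (0.1 if cloudy else 0.5))
        rain = int(rng.random() < (0.8 if cloudy else 0.2))
        wet_grass = int(rng.random() < p_wet_table[(sprinkler, rain)])
        samples.append({'Cloudy': cloudy, 'Sprinkler': sprinkler,
                        'Rain': rain, 'Wet_Grass': wet_grass})
    return samples


samples = forward_sample(1000)
print(samples[:3])
```
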
bnlearn.bnlearn.save(model, filepath='bnlearn_model.pkl', overwrite=False, verbose=3)

Save learned model in pickle file.

Parameters
  • filepath (str, (default: 'bnlearn_model.pkl')) – Pathname to store pickle files.

  • overwrite (bool, (default=False)) – Overwrite the file if it exists.

  • verbose (int, optional) – Show message. A higher number gives more information. The default is 3.

Returns

bool – Status whether the file is saved.

Return type

[True, False]
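
A minimal sketch of the save and load round trip, using the standard pickle module. It assumes that an existing file is left untouched unless overwrite=True and that this is reported through the boolean return value. The helpers save_model and load_model are hypothetical stand-ins, not the bnlearn functions.

```python
import os
import pickle
import tempfile


def save_model(model, filepath, overwrite=False):
    """Pickle model to filepath; return True if the file was written."""
    if os.path.exists(filepath) and not overwrite:
        return False
    with open(filepath, 'wb') as handle:
        pickle.dump(model, handle)
    return True


def load_model(filepath):
    """Load a pickled model from filepath."""
    with open(filepath, 'rb') as handle:
        return pickle.load(handle)


model = {'model_edges': [('A', 'B'), ('A', 'C')], 'methodtype': 'bayes'}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'bnlearn_model.pkl')
    first = save_model(model, path)                   # new file: written
    second = save_model(model, path)                  # exists, overwrite=False
    third = save_model(model, path, overwrite=True)   # exists, overwritten
    restored = load_model(path)
```

As with any pickle file, only load files from a trusted source, because unpickling can execute arbitrary code.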

bnlearn.bnlearn.to_bayesiannetwork(model, verbose=3)

Convert adjacency matrix to BayesianNetwork.

Convert an adjacency matrix to a Bayesian model. This is required because some functionalities, such as structure_learning, output a DAG model. If the output of structure_learning is provided, the adjmat is extracted and processed.

Parameters

model (pd.DataFrame()) – Adjacency matrix.

Raises

Exception – The input should not be None and if a model (as dict) is provided, the key ‘adjmat’ should be included.

Returns

BayesianNetwork – BayesianNetwork that can be used in parameter_learning.fit.

Return type

Object

bnlearn.bnlearn.to_undirected(adjmat)

Transform directed adjacency matrix to undirected.

Parameters

adjmat (np.array()) – Adjacency matrix.

Returns

Undirected adjacency matrix – Converted adjmat with undirected edges.

Return type

pd.DataFrame()
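
Conceptually, making the matrix undirected means that an edge in either direction marks both entries. A minimal sketch under that assumption, with a nested dict in place of the matrix and a hypothetical helper name:

```python
def symmetrize(adjmat):
    """Mark (s, t) and (t, s) whenever either direction has an edge."""
    nodes = list(adjmat)
    return {s: {t: bool(adjmat[s][t] or adjmat[t][s]) for t in nodes}
            for s in nodes}


# Directed chain A -> B -> C.
adjmat = {
    'A': {'A': 0, 'B': 1, 'C': 0},
    'B': {'A': 0, 'B': 0, 'C': 1},
    'C': {'A': 0, 'B': 0, 'C': 0},
}
undirected = symmetrize(adjmat)
```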

bnlearn.bnlearn.topological_sort(adjmat, start=None)

Topological sort.

Get nodes list in the topological sort order.

Parameters
  • adjmat (pd.DataFrame or bnlearn object.) – Adjacency matrix.

  • start (str, optional) – Start position. The default is None and the whole network is examined.

Returns

Topological sort order.

Return type

list

Example

>>> import bnlearn as bn
>>> DAG = bn.import_DAG('sprinkler', verbose=0)
>>> bn.topological_sort(DAG, 'Rain')
>>> bn.topological_sort(DAG)

References

https://stackoverflow.com/questions/47192626/deceptively-simple-implementation-of-topological-sorting-in-python
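
For readers unfamiliar with the technique, a depth-first topological sort can be sketched as follows. This is a generic illustration and not necessarily the library's exact algorithm: it works on a {node: [children]} dict, assumes the graph is acyclic, and the helper name topological_order is hypothetical.

```python
def topological_order(graph, start=None):
    """Depth-first topological sort of an acyclic {node: [children]} graph."""
    visited, order = set(), []

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for child in graph.get(node, []):
            visit(child)
        order.append(node)  # appended only after all of its descendants

    # With a start node, only that node and its descendants are examined.
    for node in ([start] if start is not None else graph):
        visit(node)
    return order[::-1]


graph = {
    'Cloudy': ['Sprinkler', 'Rain'],
    'Sprinkler': ['Wet_Grass'],
    'Rain': ['Wet_Grass'],
    'Wet_Grass': [],
}
full_order = topological_order(graph)
from_rain = topological_order(graph, 'Rain')
print(full_order)
print(from_rain)
```

Every parent appears before its children in the result. Several valid orders can exist, and which one is returned depends on the traversal order.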

bnlearn.bnlearn.vec2adjmat(source, target, weights=None, symmetric=True)

Convert source and target into adjacency matrix.

Parameters
  • source (list) – The source node.

  • target (list) – The target node.

  • weights (list of int) – The weights of the source-target edges.

  • symmetric (bool, optional) – Make the adjacency matrix symmetric with the same number of rows as columns. The default is True.

Returns

adjacency matrix.

Return type

pd.DataFrame

Examples

>>> import bnlearn as bn
>>> source=['Cloudy','Cloudy','Sprinkler','Rain']
>>> target=['Sprinkler','Rain','Wet_Grass','Wet_Grass']
>>> adjmat = bn.vec2adjmat(source, target)
>>> weights=[1,2,1,3]
>>> adjmat = bn.vec2adjmat(source, target, weights=weights)
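
What the resulting matrix looks like can be sketched with plain dicts. The helper vec_to_adjmat is hypothetical and only illustrates the idea: every node is used as both a row and a column, so the matrix is square, and each source-target pair stores its weight.

```python
def vec_to_adjmat(source, target, weights=None):
    """Build a square nested-dict adjacency matrix from edge lists."""
    if weights is None:
        weights = [1] * len(source)
    # Using every node as both row and column keeps the matrix square.
    nodes = sorted(set(source) | set(target))
    adjmat = {s: {t: 0 for t in nodes} for s in nodes}
    for s, t, w in zip(source, target, weights):
        adjmat[s][t] = w
    return adjmat


source = ['Cloudy', 'Cloudy', 'Sprinkler', 'Rain']
target = ['Sprinkler', 'Rain', 'Wet_Grass', 'Wet_Grass']
adjmat = vec_to_adjmat(source, target, weights=[1, 2, 1, 3])
```
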
bnlearn.bnlearn.vec2df(source, target, weights=None)

Convert source-target edges into sparse dataframe.

Convert edges between source and target into a dataframe based on the weight. An edge with a weight of 2 results in two rows for that edge.

Parameters
  • source (array-like) – The source node.

  • target (array-like) – The target node.

  • weights (array-like of int) – The weights of the source-target edges.

Return type

pd.DataFrame

Examples

>>> # Example 1
>>> import bnlearn as bn
>>> source=['Cloudy','Cloudy','Sprinkler','Rain']
>>> target=['Sprinkler','Rain','Wet_Grass','Wet_Grass']
>>> weights=[1,2,1,3]
>>> df = bn.vec2df(source, target, weights=weights)
>>> # Example 2
>>> import bnlearn as bn
>>> vec = bn.import_example("stormofswords")
>>> df = bn.vec2df(vec['source'], vec['target'], weights=vec['weight'])
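
The weight-based repetition can be illustrated with a small sketch. It only shows how a weight turns into repeated rows and makes no claim about the exact column layout of the returned dataframe. The helper expand_edges is hypothetical.

```python
def expand_edges(source, target, weights=None):
    """Repeat each (source, target) pair as many times as its weight."""
    if weights is None:
        weights = [1] * len(source)
    rows = []
    for s, t, w in zip(source, target, weights):
        rows.extend([(s, t)] * w)
    return rows


source = ['Cloudy', 'Cloudy', 'Sprinkler', 'Rain']
target = ['Sprinkler', 'Rain', 'Wet_Grass', 'Wet_Grass']
rows = expand_edges(source, target, weights=[1, 2, 1, 3])
```

With weights 1, 2, 1 and 3 this yields 7 rows in total, and the edge from Rain to Wet_Grass appears three times.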