bnlearn.structure_learning

Structure learning. Given a set of data samples, estimate a DAG that captures the dependencies between the variables.

bnlearn.structure_learning.fit(df, methodtype='hc', scoretype='bic', black_list=None, white_list=None, bw_list_method=None, max_indegree=None, tabu_length=100, epsilon=0.0001, max_iter=1000000.0, root_node=None, class_node=None, fixed_edges=None, return_all_dags=False, n_jobs=-1, verbose=3)

Structure learning fit model.

Description

Search strategies for structure learning. The search space of DAGs is super-exponential in the number of variables, and the scoring functions described below admit local maxima, so the choice of search strategy matters.
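To make "super-exponential" concrete, the number of labelled DAGs on n nodes can be computed with Robinson's recurrence. The sketch below is a standalone illustration (it does not use bnlearn); `count_dags` is a hypothetical helper name.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k)
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 8):
    print(n, count_dags(n))
```

Already for 5 variables there are 29281 candidate DAGs and for 6 variables 3781503, which is why exhaustive search is only practical for very small networks.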

To learn model structure (a DAG) from a data set, there are three broad techniques:
  1. Score-based structure learning (BIC/BDeu/K2 score; exhaustive search, hill climb/tabu search)
    • exhaustivesearch

    • hillclimbsearch

    • chow-liu

    • Tree-augmented Naive Bayes (tan)

    • NaiveBayesian

  2. Constraint-based structure learning (PC)
    1. chi-square test

  3. Hybrid structure learning, a combination of both techniques (MMHC)
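The chi-square test listed under constraint-based structure learning decides whether two discrete variables are independent. A minimal sketch of the Pearson chi-square statistic, assuming the data is given as a list of (x, y) value pairs; `chi_square` is a hypothetical helper, not part of bnlearn, and it returns only the statistic and degrees of freedom (no p-value).

```python
from collections import Counter


def chi_square(pairs):
    """Pearson chi-square statistic and degrees of freedom for two discrete variables."""
    n = len(pairs)
    joint = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n  # count expected under independence
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(x_counts) - 1) * (len(y_counts) - 1)
    return stat, dof


# Y copies X: strongly dependent
dependent = [(0, 0)] * 50 + [(1, 1)] * 50
# every combination equally often: independent
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25

print(chi_square(dependent))    # large statistic, 1 degree of freedom
print(chi_square(independent))  # statistic 0.0
```

With 1 degree of freedom the 5% critical value is about 3.841, so a constraint-based learner would keep an edge between X and Y in the first case and drop it in the second.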

Score-based Structure Learning. This approach construes model selection as an optimization task. It has two building blocks:
  • A scoring function $s_D: M \to \mathbb{R}$ that maps models to a numerical score, based on how well they fit a given data set D.

  • A search strategy to traverse the search space of possible models M and select a model with optimal score.

Commonly used scoring functions to measure the fit between model and data are Bayesian Dirichlet scores such as BDeu or K2, and the Bayesian Information Criterion (BIC, also called MDL). BDeu depends on an equivalent sample size.
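As an illustration of a scoring function, the sketch below computes a BIC score for a discrete DAG: the log-likelihood term sum N_ijk log(N_ijk / N_ij) minus the penalty 0.5 log(N) q_i (r_i - 1) per node, where q_i is the number of parent configurations and r_i the number of states of node i. It assumes the data is a list of dicts and the DAG is a dict mapping each node to its parents; this is a textbook formulation, not bnlearn's internal code, and the toy data is hypothetical.

```python
import math
from collections import Counter


def bic_score(rows, dag):
    """BIC of a discrete DAG (higher is better). dag maps node -> list of parents."""
    n = len(rows)
    total = 0.0
    for node, parents in dag.items():
        r = len({row[node] for row in rows})  # number of states of the node
        q = 1  # number of parent configurations
        for p in parents:
            q *= len({row[p] for row in rows})
        joint = Counter((tuple(row[p] for p in parents), row[node]) for row in rows)
        parent_counts = Counter(tuple(row[p] for p in parents) for row in rows)
        # log-likelihood: sum over configurations of N_ijk * log(N_ijk / N_ij)
        loglik = sum(c * math.log(c / parent_counts[cfg]) for (cfg, _), c in joint.items())
        # complexity penalty: 0.5 * log(N) * q_i * (r_i - 1)
        total += loglik - 0.5 * math.log(n) * q * (r - 1)
    return total


# Hypothetical data where B is a copy of A
rows = [{"A": a, "B": a} for a in (0, 1) for _ in range(50)]
print(bic_score(rows, {"A": [], "B": []}))     # no edge
print(bic_score(rows, {"A": [], "B": ["A"]}))  # edge A -> B scores higher
```

The penalty grows with the number of parent configurations, so an extra edge is only accepted if it raises the likelihood by more than it costs in complexity.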

param df

Input dataframe.

type df

pd.DataFrame()

param methodtype

Search strategy for structure learning:
  • 'hc' or 'hillclimbsearch' (default)

  • 'ex' or 'exhaustivesearch'

  • 'cs' or 'constraintsearch'

  • 'cl' or 'chow-liu' (requires the root_node parameter)

  • 'nb' or 'naivebayes' (requires root_node)

  • 'tan' (requires root_node and class_node)

type methodtype

str, (default : 'hc')
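The 'cl' option requires root_node because the Chow-Liu tree is undirected until a root is chosen. A minimal sketch of the textbook Chow-Liu algorithm, assuming data as a list of dicts: it grows a maximum spanning tree on pairwise mutual information with Prim's algorithm starting from root_node, so every edge is automatically oriented away from the root. Function names are hypothetical and this is not bnlearn's implementation.

```python
import math
from collections import Counter


def mutual_information(rows, x, y):
    """Empirical mutual information (in nats) between columns x and y."""
    n = len(rows)
    pxy = Counter((r[x], r[y]) for r in rows)
    px = Counter(r[x] for r in rows)
    py = Counter(r[y] for r in rows)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def chow_liu(rows, root_node):
    """Directed Chow-Liu tree as a list of (parent, child) edges."""
    nodes = list(rows[0])
    in_tree = {root_node}
    edges = []
    while len(in_tree) < len(nodes):
        # Prim's step: strongest edge from the current tree to a new node
        u, v = max(
            ((a, b) for a in nodes if a in in_tree for b in nodes if b not in in_tree),
            key=lambda e: mutual_information(rows, *e),
        )
        edges.append((u, v))  # oriented away from the root
        in_tree.add(v)
    return edges


# Hypothetical data: B copies A, C is independent of both
rows = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1) for _ in range(10)]
print(chow_liu(rows, "A"))
print(chow_liu(rows, "C"))
```

Choosing a different root_node yields the same undirected skeleton family but different edge directions.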

param scoretype

Scoring function used to evaluate candidate DAGs: 'bic', 'k2' or 'bdeu'.

type scoretype

str, (default : 'bic')

param black_list

List of edges or nodes to black-list. When filtering on nodes, the black-listed nodes are removed from the dataframe, so the resulting model will not contain any node in black_list. See also parameter: bw_list_method.

type black_list

List or None, (default : None)

param white_list

List of edges or nodes to white-list. When filtering on edges, the search is limited to those edges (only for methodtype='hc'). When filtering on nodes, the dataframe is restricted to those nodes, so the resulting model will only contain nodes in white_list. See also parameter: bw_list_method.

type white_list

List or None, (default : None)

param bw_list_method
Defines how black_list and white_list are interpreted, to exclude edges or nodes from the search or to limit the search to them:
  • 'edges' : [('A', 'B'), ('C', 'D'), (…)] This option is only available for methodtype='hc'.

  • 'nodes' : ['A', 'B', …] Filter the dataframe on the nodes in black_list or white_list. Filtering on nodes works for every methodtype and scoretype.

type bw_list_method

str, 'edges' or 'nodes', (default : None)
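The two filtering modes can be pictured with a small hypothetical helper: with 'nodes' the list acts on the columns of the dataframe, with 'edges' it acts on the candidate edges the search may consider. This only illustrates the semantics described above; `apply_bw_list` is not a bnlearn function.

```python
def apply_bw_list(columns, black_list=None, white_list=None, bw_list_method=None):
    """Hypothetical illustration: returns (kept nodes, allowed candidate edges or None)."""
    if bw_list_method == "nodes":
        # Node filtering: drop or keep whole columns of the dataframe
        keep = [
            c for c in columns
            if (white_list is None or c in white_list)
            and (black_list is None or c not in black_list)
        ]
        return keep, None
    if bw_list_method == "edges":
        # Edge filtering: restrict which directed edges the search may add
        candidates = [(u, v) for u in columns for v in columns if u != v]
        if white_list is not None:
            allowed = set(white_list)
            candidates = [e for e in candidates if e in allowed]
        if black_list is not None:
            banned = set(black_list)
            candidates = [e for e in candidates if e not in banned]
        return list(columns), candidates
    return list(columns), None


print(apply_bw_list(["A", "B", "C"], black_list=["B"], bw_list_method="nodes"))
print(apply_bw_list(["A", "B", "C"], white_list=[("A", "B"), ("B", "C")],
                    black_list=[("B", "C")], bw_list_method="edges"))
```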

param max_indegree

If provided (not None), the search only considers models in which every node has at most max_indegree parents (only for methodtype='hc').

type max_indegree

int, (default : None)

param epsilon

Defines the exit condition. If the improvement in score is less than epsilon, the learned model is returned (only for methodtype='hc').

type epsilon

float (default: 1e-4)

param max_iter

The maximum number of iterations allowed. Once max_iter iterations have been performed, the learned model is returned (only for methodtype='hc').

type max_iter

int (default: 1e6)
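To show how max_indegree, epsilon, max_iter and fixed_edges interact, here is a minimal sketch of greedy hill climbing over add, remove and reverse moves. It is not bnlearn's or pgmpy's implementation: `score` is any callable on an edge set, and the toy score used below is hypothetical (in practice it would be BIC, K2 or BDeu).

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    # Kahn-style check: repeatedly remove nodes without incoming edges.
    remaining, edges = set(nodes), set(edges)
    while remaining:
        sources = {n for n in remaining if all(v != n for _, v in edges)}
        if not sources:
            return False
        remaining -= sources
        edges = {(u, v) for u, v in edges if u not in sources}
    return True


def hill_climb(nodes, score, max_indegree=None, epsilon=1e-4, max_iter=1e6, fixed_edges=()):
    fixed = set(fixed_edges)
    edges = set(fixed)  # fixed edges are added at the start and never changed
    current = score(edges)
    for _ in range(int(max_iter)):
        best_delta, best_edges = 0.0, None
        for u, v in permutations(nodes, 2):
            if (u, v) in edges:
                if (u, v) in fixed:
                    continue
                # remove the edge, or reverse it
                candidates = [edges - {(u, v)}, (edges - {(u, v)}) | {(v, u)}]
            elif (v, u) not in edges:
                candidates = [edges | {(u, v)}]  # add the edge
            else:
                continue
            for cand in candidates:
                if max_indegree is not None and any(
                    sum(1 for _, b in cand if b == n) > max_indegree for n in nodes
                ):
                    continue
                if not is_acyclic(nodes, cand):
                    continue
                delta = score(cand) - current
                if delta > best_delta:
                    best_delta, best_edges = delta, cand
        # stop when no move improves the score by at least epsilon
        if best_edges is None or best_delta < epsilon:
            break
        edges, current = best_edges, score(best_edges)
    return edges


# Hypothetical toy score: reward two target edges, penalize any other edge
TARGET = {("A", "B"), ("B", "C")}


def toy_score(edges):
    return len(edges & TARGET) - 0.5 * len(edges - TARGET)


print(hill_climb(["A", "B", "C"], toy_score))
```

Because each step only accepts the single best local move, the result depends on the starting point; this is where the local maxima mentioned in the description come from.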

param root_node

The root node for tree-search based methods.

type root_node

str, (default : None). Required for methodtype 'cl' (chow-liu), 'nb' (naivebayes) and 'tan' (Tree-augmented Naive Bayes).

param class_node

The class node; required for Tree-augmented Naive Bayes (TAN).

type class_node

str, (default : None)
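The role of root_node and class_node in the naive Bayes and TAN structures can be sketched as edge construction. Naive Bayes makes every other variable a child of the root node. TAN keeps a tree over the feature variables (rooted at root_node; in the textbook algorithm it is a maximum spanning tree on mutual information conditioned on the class) and adds class_node as an extra parent of every feature. The helpers and the example tree below are hypothetical.

```python
def naive_bayes_edges(columns, root_node):
    # Every other variable becomes a child of the root (class) node.
    return [(root_node, c) for c in columns if c != root_node]


def tan_edges(feature_tree, class_node):
    # Keep the feature tree and add class_node as a parent of every feature.
    features = {n for edge in feature_tree for n in edge}
    return list(feature_tree) + [(class_node, f) for f in sorted(features)]


print(naive_bayes_edges(["y", "a", "b"], root_node="y"))
# Hypothetical feature tree rooted at "a"
print(tan_edges([("a", "b"), ("a", "c")], class_node="y"))
```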

param fixed_edges

A list of edges that will always be present in the final learned model. The algorithm adds these edges at the start of the search and never changes them.

type fixed_edges

iterable, (default : None). Only for methodtype='hc' (HillClimbSearch).

param return_all_dags

Return all possible DAGs (only for methodtype='exhaustivesearch').

type return_all_dags

Bool, (default: False)
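Exhaustive search scores every DAG over the variables, which is what makes returning all of them possible. A minimal sketch, assuming a `score` callable on an edge tuple: it enumerates every subset of directed edges, keeps the acyclic ones, and returns either the best DAG or all (score, edges) pairs sorted best first, mirroring the idea of return_all_dags. Names and the toy score are hypothetical, not bnlearn's API.

```python
from itertools import combinations, permutations


def is_acyclic(nodes, edges):
    # Kahn-style check: repeatedly remove nodes without incoming edges.
    remaining, edges = set(nodes), set(edges)
    while remaining:
        sources = {n for n in remaining if all(v != n for _, v in edges)}
        if not sources:
            return False
        remaining -= sources
        edges = {(u, v) for u, v in edges if u not in sources}
    return True


def exhaustive_search(nodes, score, return_all_dags=False):
    possible = list(permutations(nodes, 2))
    scored = []
    # enumerate every subset of directed edges and keep only the DAGs
    for k in range(len(possible) + 1):
        for edges in combinations(possible, k):
            if is_acyclic(nodes, edges):
                scored.append((score(edges), edges))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored if return_all_dags else scored[0][1]


# Hypothetical toy score: reward A -> B, small penalty per edge
def toy_score(edges):
    return (("A", "B") in edges) - 0.1 * len(edges)


print(exhaustive_search(["A", "B", "C"], toy_score))
print(len(exhaustive_search(["A", "B", "C"], toy_score, return_all_dags=True)))
```

The number of enumerated subsets is 2^(n(n-1)), so this approach is only feasible for a handful of variables.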

param verbose

0: None, 1: Error, 2: Warning, 3: Info (default), 4: Debug, 5: Trace

type verbose

int, (default : 3)

rtype

dict with model.

Examples

>>> # Import bnlearn
>>> import bnlearn as bn
>>>
>>> # Load DAG
>>> model = bn.import_DAG('asia')
>>>
>>> # plot ground truth
>>> G = bn.plot(model)
>>>
>>> # Sampling
>>> df = bn.sampling(model, n=10000)
>>>
>>> # Structure learning of sampled dataset
>>> model_sl = bn.structure_learning.fit(df, methodtype='hc', scoretype='bic')
>>>
>>> # Compute edge strength using chi-square independence test
>>> model_sl = bn.independence_test(model_sl, df)
>>>
>>> # Plot based on structure learning of sampled data
>>> bn.plot(model_sl, pos=G['pos'])
>>>
>>> # Compare networks and make plot
>>> bn.compare_networks(model, model_sl, pos=G['pos'])