bnlearn.structure_learning
Structure learning. Given a set of data samples, estimate a DAG that captures the dependencies between the variables.
- bnlearn.structure_learning.fit(df, methodtype='hc', scoretype='bic', black_list=None, white_list=None, bw_list_method=None, max_indegree=None, tabu_length=100, epsilon=0.0001, max_iter=1000000.0, root_node=None, class_node=None, fixed_edges=None, return_all_dags=False, n_jobs=-1, verbose=3)
Structure learning fit model.
Description
Search strategies for structure learning
The search space of DAGs is super-exponential in the number of variables, and the scoring functions used in score-based learning allow for local maxima.
To learn model structure (a DAG) from a data set, there are three broad techniques:
- Score-based structure learning (BIC/BDeu/K2 score; exhaustive search, hill climb/tabu search)
  - exhaustivesearch
  - hillclimbsearch
  - chow-liu
  - Tree-augmented Naive Bayes (tan)
  - NaiveBayesian
- Constraint-based structure learning (PC)
  - chi-square test
- Hybrid structure learning (the combination of both techniques) (MMHC)
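To make the super-exponential growth of the DAG search space concrete, the following minimal sketch counts labeled DAGs with Robinson's recurrence. It is an illustration only and is not part of bnlearn; the function name `count_dags` is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    if n == 0:
        return 1
    total = 0
    for k in range(1, n + 1):
        # Inclusion-exclusion over the k nodes that have no incoming edge.
        sign = 1 if k % 2 == 1 else -1
        total += sign * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
    return total


# Print the count for small networks to show how quickly it grows.
for n in range(1, 9):
    print(n, count_dags(n))
```

Even for a handful of variables the count is large enough that exhaustive search quickly becomes impractical, which is why heuristic strategies such as hill climbing are commonly used for larger networks.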
Score-based Structure Learning
This approach construes model selection as an optimization task. It has two building blocks:
- A scoring function s_D: M → R that maps models to a numerical score, based on how well they fit a given data set D.
- A search strategy to traverse the search space of possible models M and select a model with optimal score.
Commonly used scoring functions to measure the fit between model and data are Bayesian Dirichlet scores such as BDeu or K2, and the Bayesian Information Criterion (BIC, also called MDL). BDeu depends on an equivalent sample size.
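As an illustration of what a scoring function does, here is a minimal standard-library sketch of a BIC score for a discrete DAG. It is not bnlearn's implementation: the function `bic_score`, the data layout (a list of dicts) and the DAG layout (node mapped to a tuple of parents) are assumptions made for this example, and parent configurations are counted from the observed levels only.

```python
import math
from collections import Counter


def bic_score(data, dag):
    """BIC of a discrete DAG: log-likelihood minus (log N / 2) * free parameters.

    data: list of dicts, one dict per sample (hypothetical layout).
    dag:  dict mapping each node to a tuple of its parents (hypothetical layout).
    """
    n = len(data)
    levels = {v: {row[v] for row in data} for v in dag}
    score = 0.0
    for node, parents in dag.items():
        # Counts of (parent configuration, node value) and of parent configurations.
        joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
        loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
        # Free parameters: (#parent configurations) * (#node levels - 1).
        q = math.prod(len(levels[p]) for p in parents)
        n_params = q * (len(levels[node]) - 1)
        score += loglik - 0.5 * math.log(n) * n_params
    return score


# Toy data in which B is an exact copy of A.
data = [{"A": a, "B": a} for a in (0, 1)] * 50
empty = {"A": (), "B": ()}
a_to_b = {"A": (), "B": ("A",)}
print(bic_score(data, empty), bic_score(data, a_to_b))
```

On this toy data the DAG with the edge A -> B scores higher than the empty DAG, because the gain in likelihood outweighs the penalty for the extra parameter.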
- param df
Input dataframe.
- type df
pd.DataFrame()
- param methodtype
Search strategy for structure learning.
- ‘hc’ or ‘hillclimbsearch’ (default)
- ‘ex’ or ‘exhaustivesearch’
- ‘cs’ or ‘constraintsearch’
- ‘cl’ or ‘chow-liu’ (requires the root_node parameter)
- ‘nb’ or ‘naivebayes’ (requires the root_node parameter)
- ‘tan’ (requires the root_node and class_node parameters)
- type methodtype
str, (default : ‘hc’)
- param scoretype
Scoring function used by the search: ‘bic’, ‘k2’ or ‘bdeu’.
- type scoretype
str, (default : ‘bic’)
- param black_list
List of black-listed edges or nodes. When filtering on nodes, the black-listed nodes are removed from the dataframe, so the resulting model will not contain any nodes that are in black_list.
- type black_list
List or None, (default : None)
- param white_list
List of white-listed edges or nodes. The search is limited to the white-listed edges (works only in case of methodtype=‘hc’). When filtering on nodes, the resulting model will only contain nodes that are in white_list. See also parameter: bw_list_method
- type white_list
List or None, (default : None)
- param bw_list_method
A list of edges can be passed as black_list or white_list to exclude or to limit the search.
- ‘edges’ : [(‘A’, ‘B’), (‘C’, ‘D’), (…)] This option is limited to methodtype=‘hc’.
- ‘nodes’ : [‘A’, ‘B’, …] Filter the dataframe based on the nodes for black_list or white_list. Filtering can be done for every methodtype/scoretype.
- type bw_list_method
list of str or tuple, (default : None)
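The difference between the ‘nodes’ and ‘edges’ modes can be sketched as follows. This is only an illustration of the semantics described above, not bnlearn's implementation; the helper names `filter_nodes` and `filter_edges` are hypothetical.

```python
def filter_nodes(columns, white_list=None, black_list=None):
    """'nodes' mode: decide which dataframe columns remain before learning."""
    kept = [c for c in columns if white_list is None or c in white_list]
    return [c for c in kept if black_list is None or c not in black_list]


def filter_edges(candidate_edges, white_list=None, black_list=None):
    """'edges' mode: decide which directed edges the search may consider."""
    kept = [e for e in candidate_edges if white_list is None or e in white_list]
    return [e for e in kept if black_list is None or e not in black_list]


columns = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "A"), ("C", "D")]

# Node filtering removes the variable D from the data altogether.
print(filter_nodes(columns, black_list=["D"]))
# Edge filtering keeps every variable but forbids the directed edge B -> A.
print(filter_edges(edges, black_list=[("B", "A")]))
```

In the ‘nodes’ mode the variable disappears from the learned model entirely, whereas in the ‘edges’ mode all variables remain and only specific directed edges are excluded from, or exclusively allowed in, the search.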
- param max_indegree
If provided (not None), the procedure only searches among models where all nodes have at most max_indegree parents. (only in case of methodtype=‘hc’)
- type max_indegree
int, (default : None)
- param epsilon
Defines the exit condition. If the improvement in score is less than epsilon, the learned model is returned. (only in case of methodtype=’hc’)
- type epsilon
float (default: 1e-4)
- param max_iter
The maximum number of iterations allowed. Returns the learned model when the number of iterations is greater than max_iter. (only in case of methodtype=’hc’)
- type max_iter
int (default: 1e6)
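The roles of epsilon and max_iter as exit conditions can be sketched with a generic greedy hill climb. This is a minimal illustration under the assumption that the search repeatedly moves to the best-scoring neighbor; it is not the search code used by bnlearn, and `hill_climb`, `neighbors` and `score` are hypothetical names.

```python
def hill_climb(start, neighbors, score, epsilon=1e-4, max_iter=1_000_000):
    """Greedy ascent that stops on a small improvement or after max_iter steps."""
    current, current_score = start, score(start)
    for _ in range(int(max_iter)):
        # Find the best-scoring neighbor of the current state.
        best, best_score = None, current_score
        for candidate in neighbors(current):
            s = score(candidate)
            if s > best_score:
                best, best_score = candidate, s
        # Exit condition: no neighbor improves the score by at least epsilon.
        if best is None or best_score - current_score < epsilon:
            break
        current, current_score = best, best_score
    return current, current_score


def toy_score(x):
    # Toy objective with a single maximum at x = 3.
    return -(x - 3) ** 2


def toy_neighbors(x):
    # Each integer has two neighbors: one step down and one step up.
    return [x - 1, x + 1]


print(hill_climb(0, toy_neighbors, toy_score))
print(hill_climb(0, toy_neighbors, toy_score, max_iter=2))
```

With the default settings the toy search stops at the maximum because no neighbor improves the score; with max_iter=2 it stops early after two moves. In structure learning the states would be DAGs and the neighbors single-edge additions, removals, or reversals.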
- param root_node
The root node for treeSearch based methods.
- type root_node
String. (only in case of chow-liu, Tree-augmented Naive Bayes (TAN))
- param class_node
The class node is required for Tree-augmented Naive Bayes (TAN)
- type class_node
String
- param fixed_edges
A list of edges that will always be present in the final learned model. The algorithm adds these edges at the start and never changes them.
- type fixed_edges
iterable, Only in case of HillClimbSearch.
- param return_all_dags
Return all possible DAGs. Only in case methodtype=’exhaustivesearch’
- type return_all_dags
Bool, (default: False)
- param verbose
0: None, 1: Error, 2: Warning, 3: Info (default), 4: Debug, 5: Trace
- type verbose
int, (default : 3)
- rtype
dict with model.
Examples
>>> # Import bnlearn
>>> import bnlearn as bn
>>>
>>> # Load DAG
>>> model = bn.import_DAG('asia')
>>>
>>> # plot ground truth
>>> G = bn.plot(model)
>>>
>>> # Sampling
>>> df = bn.sampling(model, n=10000)
>>>
>>> # Structure learning of sampled dataset
>>> model_sl = bn.structure_learning.fit(df, methodtype='hc', scoretype='bic')
>>>
>>> # Compute edge strength using chi-square independence test
>>> model_sl = bn.independence_test(model_sl, df)
>>>
>>> # Plot based on structure learning of sampled data
>>> bn.plot(model_sl, pos=G['pos'])
>>>
>>> # Compare networks and make plot
>>> bn.compare_networks(model, model_sl, pos=G['pos'])