Developer notes

Contribution Guidelines

Commit messages

Python code

Please install the pre-commit hooks using

pre-commit install

to automatically run the checks before every commit.

We use type hints, which we feel are a good form of documentation and help us find bugs with mypy.
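As a minimal sketch of this style (the function below is illustrative, not part of PyPAL):

```python
from __future__ import annotations


def mean_objective(values: list[float]) -> float:
    """Average a list of objective values.

    With these annotations, mypy flags callers that pass,
    e.g., a list of strings instead of floats.
    """
    return sum(values) / len(values)
```

Running mypy over the module then catches such type mismatches before the tests do.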

Some of the pre-commit hooks modify the files, e.g., they trim trailing whitespace or format the code. If they modify your file, you will have to run git add and git commit again. To skip the pre-commit checks (not recommended) you can use git commit --no-verify.

New features

Please create a new branch for the development of new features. Rebase it on the upstream master and include a test for your new feature. (The CI checks for a drop in code coverage.)

Documentation

Currently, the documentation is hosted on GitHub Pages. Build it locally by running make html in the docs directory and then push it to GitHub Pages using

git subtree push --prefix docs/_build/html origin gh-pages

Implementing a new PAL class

If you want to use PyPAL with a model that we do not support yet, i.e., one that is neither a GPy nor an sklearn Gaussian process regression model, it is easy to write your own class. For this, you need to inherit from PALBase and implement the _train and _predict functions (and possibly also the _set_hyperparameters and _should_optimize_hyperparameters functions) using the design_space and y attributes of the class.

For instance, if we develop some multioutput model that has train() and predict() methods, we could simply use the following design pattern:

from pypal import PALBase

class PALMyModel(PALBase):
    def _train(self):
        self.models[0].train(self.design_space[self.sampled], self.y[self.sampled])

    def _predict(self):
        self.mu, self.std = self.models[0].predict(self.design_space)

Note that we typically provide the models, even if there is only one, in a list to keep the API consistent.

In some instances, you might want to perform an operation in parallel, e.g., train the models for different objectives in parallel. One convenient way to do this in Python is concurrent.futures. The only hitch is that this approach requires the function to be picklable. To ensure this, you may want to implement the function that is to be run in parallel outside the class. For example, you could use the following design pattern:

from pypal import PALBase
import concurrent.futures
from functools import partial

def _train_model_picklable(i, models, design_space, objectives, sampled):
    # Defined at module level so that ProcessPoolExecutor can pickle it.
    model = models[i]
    model.fit(
        design_space[sampled[:, i]],
        objectives[sampled[:, i], i].reshape(-1, 1),
    )
    return model

class MyPal(PALBase):
    def __init__(self, *args, **kwargs):
        n_jobs = kwargs.pop("n_jobs", 1)
        validate_njobs(n_jobs)
        self.n_jobs = n_jobs
        super().__init__(*args, **kwargs)

        validate_number_models(self.models, self.ndim)

    def _train(self):
        train_single_partial = partial(
            _train_model_picklable,
            models=self.models,
            design_space=self.design_space,
            objectives=self.y,
            sampled=self.sampled,
        )
        models = []
        with concurrent.futures.ProcessPoolExecutor(
            max_workers=self.n_jobs
        ) as executor:
            for model in executor.map(train_single_partial, range(self.ndim)):
                models.append(model)
        self.models = models
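The picklability constraint can be checked directly: module-level functions (and functools.partial objects wrapping them) can be pickled, while lambdas generally cannot. A minimal, self-contained illustration (the scale function is hypothetical, not part of PyPAL):

```python
import pickle
from functools import partial


def scale(x, factor):
    """Module-level function: picklable, so usable with ProcessPoolExecutor."""
    return x * factor


# A partial of a module-level function round-trips through pickle.
restored = pickle.loads(pickle.dumps(partial(scale, factor=2)))
print(restored(21))  # 42

# A lambda fails to pickle, which is why the training helper above
# is defined outside the class rather than inline.
try:
    pickle.dumps(lambda x: x * 2)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False
print(lambda_picklable)  # False
```

This is the same reason _train_model_picklable lives at module level: ProcessPoolExecutor sends the callable to worker processes via pickle.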