To automatically generate the changelog and version numbers we use conventional commits. Use the prefix `feat` for new features, `chore` for maintenance tasks (e.g., updating grunt tasks; no production code change), `fix` for bug fixes, and `docs` for changes to the documentation.
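For example, a commit adding a new feature could be recorded as follows (the message itself is hypothetical):

```bash
git commit -m "feat: add support for a new surrogate model"
```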
Please install the pre-commit hooks using `pre-commit install` to automatically

- format the code with black
- sort the imports with isort
- lint the code with prospector
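A typical setup, assuming the hooks are configured in the repository's `.pre-commit-config.yaml`, looks like this:

```bash
pip install pre-commit       # install the pre-commit tool itself
pre-commit install           # register the git hooks for this repository
pre-commit run --all-files   # optionally run all hooks on the whole codebase once
```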
We use type hints, which we feel are a good form of documentation and help us find bugs using mypy.
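A minimal sketch of the style we aim for (the `mean_std` function is hypothetical, not part of the codebase):

```python
from typing import Tuple

import numpy as np


def mean_std(predictions: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Return the per-column mean and standard deviation of a 2D array."""
    return predictions.mean(axis=0), predictions.std(axis=0)
```

Running `mypy path/to/file.py` then flags calls that violate these annotations.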
Some of the pre-commit hooks modify the files, e.g., they trim whitespace or format the code. If they modify your file, you will have to run `git add` and `git commit` again. To skip the pre-commit checks (not recommended), you can use `git commit --no-verify`.
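In practice, the cycle after a hook reformats a file looks like this (the commit message is hypothetical):

```bash
git commit -m "fix: handle empty design space"   # a hook reformats files and aborts the commit
git add -u                                       # re-stage the files the hooks modified
git commit -m "fix: handle empty design space"   # commit again; hooks now pass
```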
Please make a new branch for the development of new features. Rebase on the upstream master and include a test for your new feature. (The CI checks for a drop in code coverage.)
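A typical branch workflow, assuming you track the main repository as the `upstream` remote, is:

```bash
git checkout -b my-new-feature   # hypothetical branch name
git fetch upstream               # get the latest upstream history
git rebase upstream/master       # replay your work on top of upstream master
```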
Currently, documentation is hosted on GitHub pages. Build it locally using `make html` in the `doc` directory and then push it to GitHub pages using

```bash
git subtree push --prefix docs/_build/html origin gh-pages
```
If you want to use PyPAL with a model that we do not support yet, i.e., neither a GPy nor a sklearn Gaussian process regression model, it is easy to write your own class. For this, you need to inherit from PALBase and implement your `_train` and `_predict` functions (and maybe also the `_set_hyperparameters` and `_should_optimize_hyperparameters` functions) using the `design_space` and `y` attributes of the class.
For instance, if we develop some multioutput model that has a `train()` and a `predict()` method, we could simply use the following design pattern:

```python
from pypal import PALBase


class PALMyModel(PALBase):
    def _train(self):
        # Fit the model on the sampled points only
        self.models[0].train(self.design_space[self.sampled], self.y[self.sampled])

    def _predict(self):
        # Predict means and standard deviations for the full design space
        self.mu, self.std = self.models[0].predict(self.design_space)
```
Note that we typically provide the models, even if there is only one, in a list to keep the API consistent.
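Usage would then follow the pattern of the built-in classes. A minimal sketch, assuming PALBase is instantiated with the design space, the list of models, and the number of objectives (check the PALBase signature for the exact arguments; the dummy model below is purely illustrative):

```python
import numpy as np


class DummyModel:  # hypothetical stand-in exposing train() and predict()
    def train(self, X, y):
        self._y_mean = y.mean(axis=0)

    def predict(self, X):
        mu = np.tile(self._y_mean, (len(X), 1))
        return mu, np.ones_like(mu)


X = np.random.rand(100, 3)  # hypothetical design space: 100 candidates, 3 features
palinstance = PALMyModel(X, [DummyModel()], 2)  # assumed argument order; see PALBase
```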
In some instances, you might want to perform an operation in parallel, e.g., train the models for different objectives in parallel. One convenient way to do this in Python is `concurrent.futures`. The only hitch is that this approach requires that the function is picklable. To ensure this, you may want to implement the function that is to be run in parallel outside the class. For example, you could use the following design pattern:
```python
from pypal import PALBase
import concurrent.futures
from functools import partial


def _train_model_picklable(i, models, design_space, objectives, sampled):
    # Module-level function so the ProcessPoolExecutor can pickle it
    model = models[i]
    model.fit(
        design_space[sampled[:, i]],
        objectives[sampled[:, i], i].reshape(-1, 1),
    )
    return model


class MyPal(PALBase):
    def __init__(self, *args, **kwargs):
        n_jobs = kwargs.pop("n_jobs", 1)
        # validate_njobs and validate_number_models are input-validation
        # helpers from the pypal codebase
        validate_njobs(n_jobs)
        self.n_jobs = n_jobs

        super().__init__(*args, **kwargs)

        validate_number_models(self.models, self.ndim)

    def _train(self):
        # Bind the shared arguments so only the model index varies per task
        train_single_partial = partial(
            _train_model_picklable,
            models=self.models,
            design_space=self.design_space,
            objectives=self.y,
            sampled=self.sampled,
        )
        models = []
        with concurrent.futures.ProcessPoolExecutor(
            max_workers=self.n_jobs
        ) as executor:
            # Train one model per objective, up to n_jobs at a time
            for model in executor.map(train_single_partial, range(self.ndim)):
                models.append(model)
        self.models = models
```
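The decisive design choice here is that `_train_model_picklable` lives at module level and receives everything it needs as explicit arguments: `ProcessPoolExecutor` must pickle the callable and its inputs to ship them to the worker processes, and lambdas or locally defined closures cannot be pickled, while pickling a bound method would drag the whole PAL instance along. Using `functools.partial` to bind the shared data keeps the per-task payload down to the model index.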