fincoretails: Maximum-likelihood fits of piece-wise Pareto distributions with finite and non-zero core

Maier, Benjamin F.

doi:10.5281/zenodo.8349920

Published September 15, 2023 | Version 0.0.0

Software Open

fincoretails: Maximum-likelihood fits of piece-wise Pareto distributions with finite and non-zero core

Maier, Benjamin F.¹

1. Technical University of Denmark

# fincoretails

Fit some heavy-tailed distributions to data. Has a focus on piece-wise defined
probability densities that have finite values on the core (are nonzero and finite
in the support region x <= xmin).

This should not be considered a fully functional fledged, smooth-edged library,
it just contains the bare necessities to fit some stuff. Please feel free
to fork and tinker with this project.

* repository: https://github.com/benmaier/fincoretails/

![Example distributions of the piecewise models](https://github.com/benmaier/fincoretails/blob/main/cookbook/example_plots/example_plots.png?raw=true)
![Example fits of the generalized models](https://github.com/benmaier/fincoretails/blob/main/cookbook/example_plots/fit_plots_0.png?raw=true)
![Example fits of the forced models](https://github.com/benmaier/fincoretails/blob/main/cookbook/example_plots/fit_plots_1.png?raw=true)

## Example

```python
from fincoretails import lognormal, general_algpareto, loglikelihood_ratio, aic

alpha = 2
xmin = 4.5
beta = alpha
Nsample = 10_000
data = general_algpareto.sample(Nsample, alpha, xmin, beta)

alpha, xmin, beta = general_algpareto.fit_params(data)
mu, sigma = lognormal.fit_params(data)

ll_fcp = general_algpareto.loglikelihoods(data, alpha, xmin, beta)
ll_logn = lognormal.loglikelihoods(data, mu, sigma)
logL_fcp = ll_fcp.sum()
logL_logn = ll_logn.sum()

R, p = loglikelihood_ratio(ll_fcp, ll_logn)
AIC_fcp = aic(logL_fcp, number_of_free_parameters=3, nsamples=Nsample)
AIC_logn = aic(logL_logn, number_of_free_parameters=2, nsamples=Nsample)

print(f"finite-core pareto: {alpha=:4.2f}, {xmin=:4.2f}, {beta=:4.2f} ")
print(f"lognormal: {mu=:4.2f}, {sigma=:4.2f}")
print(f"logL finite-core pareto = {logL_fcp:4.2f}")
print(f"logL lognormal = {logL_logn:4.2f}")
print(f"log-likelihood ratio R={R:4.2f} with significance level p={p:4.2e}")
print(f"AIC finite-core pareto = {AIC_fcp:4.2f}")
print(f"AIC lognormal = {AIC_logn:4.2f}")
```

With output:

finite-core pareto: alpha=2.01, xmin=4.47, beta=1.97
lognormal: mu=1.16, sigma=1.44
logL finite-core pareto = -28833.94
logL lognormal = -29474.10
log-likelihood ratio R=11.24 with significance level p=2.65e-29
AIC finite-core pareto = 57673.89
AIC lognormal = 58952.20

## Install

pip install fincoretails

`fincoretails` was developed and tested for

* Python 3.6
* Python 3.7
* Python 3.8

So far, the package's functionality was tested on Mac OS X and CentOS only.

## Dependencies

`fincoretails` directly depends on the following packages which will be installed by `pip` during the installation process

* `numpy>=1.23`
* `scipy>=1.9`
* `sympy>=1.12`

## Documentation

### Distributions

Every distribution is treated as a separate module in this package. There's
one list that makes all these distributions accessible.

```python
from fincoretails import distributions
```

will give you this list:

```python
distributions = [
# constant core (uniform distribution), power-law tail
fincoretails.unipareto,

# core: (2-(x/xmin)^alpha), tail: (xmin/x)^alpha
fincoretails.algpareto,

# core: (x/xmin)^alpha, tail: (xmin/x)^alpha
fincoretails.powpareto,

# core: exp[-alpha(x/xmin-1)], tail: (xmin/x)^alpha
fincoretails.expareto,

# core: (2-(x/xmin)^beta), tail: (xmin/x)^alpha
fincoretails.general_algpareto,

# core: (x/xmin)^beta, tail: (xmin/x)^alpha
fincoretails.general_powpareto,

# core: exp[-beta(x/xmin-1)], tail: (xmin/x)^alpha
fincoretails.general_expareto,

# log-normal as reference
fincoretails.lognormal,

# see Appendix D of the paper
fincoretails.santafe,
]
```

We'll comment on each of them further below.

Each distribution module can also be imported as e.g.

```python
from fincoretails import lognormal

rvs = lognormal.sample(Nsample=1000,mu=1,sigma=1)
```

The distribution modules all contain similar functions, we'll list some in the following.

#### Distribution Properties

```python
def sample(Nsample, alpha, xmin, beta):
"""Samples from the distribution based on parameters."""

def quantile(q, *parameters):
"""
Computes the quantile of the distribution for given parameters
and probability q.
"""

def get_normalization_constant(alpha, xmin, beta):
"""Computes the normalization constant."""

def Pcrit(alpha, xmin, beta):
"""Computes the CDF at xmin given distribution parameters."""

def cdf(x, alpha, xmin, beta):
"""
Compute the cumulative distribution function (CDF)
for single value or array-like x.
"""

def cdf_left(x, alpha, xmin, beta):
"""
Compute the CDF for values less than xmin.
(single data point or array-like)
"""

def cdf_right(x, alpha, xmin, beta):
"""
Compute the CDF for values greater than xmin.
(single data point or array-like)
"""

def ccdf(x, *args, **kwargs):
"""
Compute the complementary CDF of this distribution
for single value or array-like x.
"""

def pdf(x, alpha, xmin, beta):
"""
Compute the probability density function at x
(single data point or array-like).
"""

def pdf_left(x, alpha, xmin, beta):
"""
Compute the probability density function for values less than xmin.
(single data point or array-like).
"""

def pdf_right(x, alpha, xmin, beta):
"""
Compute the probability density function for values greater than xmin
(single data point or array-like).
"""

def mean(alpha, xmin, beta):
"""Calculate the mean of the distribution"""

def second_moment(alpha, xmin, beta):
"""Calculate the second moment of the distribution"""

def variance(*args, **kwargs):
"""Calculate the variance of the distribution"""

def neighbor_degree(*args, **kwargs):
"""Returns second_moment / mean"""
```

#### Distribution Fitting

```python
def fit_params(data, beta_initial_values=(1.,), minxmin=None, maxxmin=None):
"""
Fits distribution parameters based on data.

Parameters
----------
data : array-like
Data points.
beta_initial_values : tuple, optional
Initial values to start searching for beta hat.
Default is (1.,). Usually you already have an idea
what beta might be (or even if it's gonna be negative
or positive) from what the data looks like and
providing a sensible guess speeds up the fitting.
minxmin : float, optional
Minimum xmin value. Default is None.
maxxmin : float, optional
Maximum xmin value. Default is None.

Returns
-------
alpha_hat : float

xmin_hat : float

beta_hat : float
"""

def loglikelihood(data, *parameters):
"""
Computes the total log-likelihood of the distribution
for given data and parameters.
"""

def loglikelihoods(data, *parameters):
"""
Computes the individual log-likelihood of the distribution
for each data point (expects an array, returns an array)
"""

def alpha_and_log_likelihood_fixed_xmin(data, xmin, beta):
"""
Calculates alpha and log likelihood for a fixed minimum value
and fixed beta

Parameters
----------
data : array-like
Data for computation.
xmin : float
Fixed minimum value.
beta : float, optional
beta parameter of the distribution

Returns
-------
tuple
alpha, log-likelihood.
"""

def alpha_beta_and_log_likelihood_fixed_xmin(data, xmin, b0=1.5):
"""
Calculates alpha, beta, and log likelihood for a fixed minimum value.

Parameters
----------
data : array-like
Data for computation.
xmin : float
Fixed minimum value.
b0 : float, optional
Initial value of beta. Default is 1.5.

Returns
-------
tuple
alpha, beta, and log-likelihood.
"""

def alpha_xmin_beta_and_log_likelihood_fixed_xmin_index(data, j, xmins=None, beta0=[2.,]):
"""
Computes alpha, xmin, beta, and log likelihood values for a fixed minimum value index.

Parameters
----------
data : array-like
Data for computation.
j : int
Index of the fixed minimum value.
xmins : array-like, optional
Array of minimum values. If None, will consider unique sorted data values.
beta0 : list, optional
Initial beta values to try. Default is [2.,].

Returns
-------
tuple
alpha, xmin, beta, and log likelihood values.
"""

def alpha_xmin_beta_and_log_likelihood(data, beta0=[2.,], stop_at_first_max=False, minxmin=None, maxxmin=None):
"""
Computes optimal alpha, xmin, beta, and log likelihood.

Parameters
----------
data : array-like
Data for computation.
beta0 : list, optional
Initial beta values. Default is [2.,].
stop_at_first_max : bool, optional
If True, stops the search at the first maximum value. Default is False.
minxmin : float, optional
Minimum boundary for xmin. Default is None.
maxxmin : float, optional
Maximum boundary for xmin. Default is None.

Returns
-------
tuple
Optimal alpha, xmin, beta, and log likelihood.
"""
```

### Some other functionalities

You can do a complete fit analysis with the experimental analysis function

```python
from fincoretails import distributions, unipareto
from fincoretails.analysis import analysis

Nsample = 2000
atrue = 2
ytrue = 3
data = unipareto.sample(Nsample,atrue,ytrue)
dists = distributions[:-1] # disregard Santa Fe distribution

analysis(data, dists)
```

There's also functionalities to do goodness-of-fit:

```python
from fincoretails.tools import loglikelihood_ratio, aic
```

with function headers

```python
def loglikelihood_ratio(loglikelihoodsA, loglikelihoodsB, normalized_ratio=True):
"""
Compute the log-likelihood ratio and the significance level.
Return the log-likelihood ratio R and the probability p
that a random sample from a normally distributed R-value
distribution would be larger or equal to the computed
R-value.

Parameters
----------
loglikelihoodsA : array-like
Log-likelihoods under null hypothesis.
loglikelihoodsB : array-like
Log-likelihoods under alternative hypothesis.
normalized_ratio : bool, optional, default = True
If True (default), return the normalized log-likelihood ratio.

Returns
-------
R : float
The log-likelihood ratio. It is normalized if `normalized_ratio` is True.
p : float
The significance level, i.e., the probability that a random sample
from a normally distributed R-value distribution would be larger or
equal to the computed R-value.

Notes
-----
Typically, one classifies the R-value as 'significant' if p < 0.05. However,
also consider the context of the problem rather than strictly following this rule.
"""

def aic(logLL, number_of_free_parameters, nsamples=None):
"""
Compute the Akaike Information Criterion (AIC).

Parameters
----------
logLL : float
The log-likelihood.
number_of_free_parameters : int
The number of free parameters in the model.
nsamples : int, optional, default = None
The number of samples. If not provided or if the sample size is too small,
the function will return the regular AIC.

Returns
-------
AIC : float
The Akaike Information Criterion.

Notes
-----
If the number of samples and the number of free parameters are large enough,
the function will return the corrected AIC.
"""
```

## Changelog

Changes are logged in a [separate file](https://github.com/benmaier/fincoretails/blob/main/CHANGELOG.md).

## License

This project is licensed under the [MIT License](https://github.com/benmaier/fincoretails/blob/main/LICENSE).
Note that this excludes any images/pictures/figures shown here or in the documentation.

## Contributing

If you want to contribute to this project, please make sure to read the [code of conduct](https://github.com/benmaier/fincoretails/blob/main/CODE_OF_CONDUCT.md) and the [contributing guidelines](https://github.com/benmaier/fincoretails/blob/main/CONTRIBUTING.md). In case you're wondering about what to contribute, we're always collecting ideas of what we want to implement next in the [outlook notes](https://github.com/benmaier/fincoretails/blob/main/OUTLOOK.md).

[![Contributor Covenant](https://img.shields.io/badge/Contributor%20Covenant-v1.4%20adopted-ff69b4.svg)](code-of-conduct.md)

## Dev notes

Fork this repository, clone it, and install it in dev mode.

```bash
git clone git@github.com:YOURUSERNAME/fincoretails.git
make
```

If you want to upload to PyPI, first convert the new `README.md` to `README.rst`

```bash
make readme
```

It will give you warnings about bad `.rst`-syntax. Fix those errors in `README.rst`. Then wrap the whole thing

```bash
make pypi
```

It will probably give you more warnings about `.rst`-syntax. Fix those until the warnings disappear. Then do

```bash
make upload
```

Files

fincoretails-main.zip

Files (451.4 kB)

Name	Size	Download all
fincoretails-main.zip md5:b978b01b409804a2cb27a2ad9fd2379d	451.4 kB	Preview Download

Citations

Oops! Something went wrong while fetching results.

	All versions	This version
Views	72	65
Downloads	19	19
Data volume	8.6 MB	8.6 MB

fincoretails: Maximum-likelihood fits of piece-wise Pareto distributions with finite and non-zero core

Creators

Description

Files

fincoretails-main.zip

Files (451.4 kB)