Namespaces
Automatically generated documentation for harmonic APIs. All functionality is accessible through a pip installation of the harmonic package.
- class harmonic.Chains(ndim: int)
Class to store samples from multiple MCMC chains.
- __init__(ndim: int)
Construct empty Chains for parameter space of dimension ndim.
The constructor simply sets ndim. Chain samples are added later by the add_chain* methods, which support setting up chain data from different input formats (e.g. data from a single chain or from multiple chains at once).
- Parameters:
ndim (int) – Dimension of the parameter space.
- add(other)
Add other Chain object to this object.
- Parameters:
other (Chains) – Other Chain object to be added to this object.
- Raises:
ValueError – Raised if the new chain has a different ndim.
- add_chain(samples: ndarray, ln_posterior: ndarray)
Add a single chain to a Chains object.
- Parameters:
samples (np.ndarray[nsamples, ndim]) – Samples of a single chain.
ln_posterior (np.ndarray[n_new_samples]) – log_e posterior values.
- Raises:
ValueError – Raised when ndim of new chain does not match previous chains.
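For example, a single chain produced by an MCMC sampler can be added as follows. This is a minimal sketch: the array sizes and the Gaussian-like placeholder samples and log_e posterior values are illustrative only, not part of the API.
>>> import numpy as np
>>> import harmonic as hm
>>> ndim, nsamples = 2, 1000                          # illustrative sizes
>>> samples = np.random.randn(nsamples, ndim)         # placeholder MCMC samples
>>> ln_posterior = -0.5 * np.sum(samples**2, axis=1)  # placeholder log_e posterior values
>>> chains = hm.Chains(ndim)
>>> chains.add_chain(samples, ln_posterior)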
- add_chains_2d(samples: ndarray, ln_posterior: ndarray, nchains_in: int)
Add a number of chains to a Chains object assuming all chains are of the same length.
- Parameters:
samples (np.ndarray[nsamples_in * nchains_in, ndim]) – Samples of multiple chains.
ln_posterior (np.ndarray[nsamples_in * nchains_in]) – log_e posterior values.
nchains_in (int) – Number of chains to be added.
- Raises:
ValueError – Raised when number of samples is not multiple of the number of chains.
ValueError – Raised when ndim of new chains does not match previous chains.
ValueError – Raised when posterior and samples first length are different.
- add_chains_2d_list(samples: ndarray, ln_posterior: ndarray, nchains_in: int, chain_indexes: List)
Add a number of chains to the chain class. Uses a list of indexes to determine where each chain starts and stops.
- Parameters:
samples (np.ndarray[nsamples_in * nchains_in, ndim]) – Samples of multiple chains.
ln_posterior (np.ndarray[nsamples_in * nchains_in]) – log_e posterior values.
nchains_in (int) – Number of chains to be added.
chain_indexes (list) – List of the starting index of the chains.
- Raises:
ValueError – Raised when ndim of new chains does not match previous chains.
ValueError – Raised when posterior and samples first length are different.
ValueError – Raised when the length of the list is not nchains_in + 1.
- add_chains_3d(samples: ndarray, ln_posterior: ndarray)
Add a number of chains to a Chain object from 3D array.
- Parameters:
samples (np.ndarray[nchains_in, nsamples_in, ndim]) – Samples from multiple chains.
ln_posterior (np.ndarray[nchains_in, nsamples_in]) – log_e posterior values.
- Raises:
ValueError – Raised when ndim of new chains does not match previous chains.
ValueError – Raised when posterior and samples first and second length are different.
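Samplers that return all chains at once as a 3D array of shape (nchains_in, nsamples_in, ndim) can be added in a single call. A minimal sketch with illustrative shapes and placeholder values:
>>> import numpy as np
>>> import harmonic as hm
>>> nchains, nsamples, ndim = 10, 500, 2                  # illustrative shapes
>>> samples = np.random.randn(nchains, nsamples, ndim)    # placeholder samples
>>> ln_posterior = -0.5 * np.sum(samples**2, axis=-1)     # shape (nchains, nsamples)
>>> chains = hm.Chains(ndim)
>>> chains.add_chains_3d(samples, ln_posterior)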
- deepcopy()
Performs deep copy of the chain class (calls the module copy).
- get_chain_indices(i: int)
Gets the start and end index of samples from a chain.
The end index specifies the index one past the end of the chain, i.e. the chain samples can be accessed by self.samples[start:end,:].
- Parameters:
i (int) – Index of chain of which to determine start and end indices.
- Returns:
A tuple of the start and end index, i.e. (start, end).
- Return type:
(int, int)
- Raises:
ValueError – Raised when chain number invalid.
- get_sub_chains(chains_wanted: List)
Creates a new chain instance with the chains indexed in chains_wanted. (Useful for cross-validation.)
- Parameters:
chains_wanted (List) – List of indexes of chains that the new chain instance will contain.
- Returns:
Chains object containing the chains wanted.
- Return type:
(Chains)
- Raises:
ValueError – If any of the chains_wanted indexes are out of bounds i.e. outside of range 0 to nchains - 1.
- nsamples_per_chain()
Compute list containing number of samples in each chain.
- Parameters:
None.
- Returns:
1D list of length self.nchains containing the number of samples in each chain.
- Return type:
nsamples_per_chain (list)
- remove_burnin(nburn: int = 100)
Remove burn-in samples from each chain.
- Parameters:
nburn (int) – Number of burn-in samples to remove from each chain.
- Raises:
ValueError – Raised when nburn is not less than the number of samples in each chain.
- shallowcopy()
Performs shallow copy of the chain class (calls the module copy).
- split_into_blocks(nblocks: int = 100)
Split chains into larger number of blocks.
The intention of this method is to break chains into blocks that are (approximately) independent in order to get more independent chains for computing various statistics.
Each existing chain is split into blocks (i.e. new chains), proportionally to the size of the current chains. Final blocks within each chain end up containing slightly different numbers of samples (since we do not ever want to throw away samples!). One could improve this, if required, to distribute the additional samples across all of the blocks of the chain.
- Parameters:
nblocks (int) – Number of new (blocked) chains to split existing chains into.
- Raises:
ValueError – Raised if nblocks is less than the number of chains.
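A minimal sketch of re-blocking a single long chain into (approximately) independent blocks; the chain contents and nblocks value are illustrative:
>>> import numpy as np
>>> import harmonic as hm
>>> ndim, nsamples = 2, 5000                              # illustrative sizes
>>> samples = np.random.randn(nsamples, ndim)             # placeholder single chain
>>> ln_posterior = -0.5 * np.sum(samples**2, axis=1)      # placeholder log_e posterior values
>>> chains = hm.Chains(ndim)
>>> chains.add_chain(samples, ln_posterior)
>>> chains.split_into_blocks(nblocks=100)                 # break into ~independent blocks
>>> nsamples_per_block = chains.nsamples_per_chain()      # samples in each new block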
- class harmonic.Evidence(nchains: int, model, shift=Shifting.MEAN_SHIFT)
Compute inverse evidence values from chains, using posterior model.
Multiple chains can be added in sequence (to avoid having to store very long chains).
- __init__(nchains: int, model, shift=Shifting.MEAN_SHIFT)
Construct evidence class for computing inverse evidence values from set number of chains and initialised posterior model.
- Parameters:
nchains (int) – Number of chains that will be used in the computation.
model (Model) – An instance of a posterior model class that has been fitted.
shift (Shifting) – Shifting method used to avoid over/underflow during computation. Selected from the Shifting enumeration class.
- Raises:
ValueError – Raised if the number of chains is not positive.
ValueError – Raised if the number of dimensions is not positive.
ValueError – Raised if model not fitted.
- add_chains(chains, num_slices=None)
Add new chains and calculate an estimate of the inverse evidence, its variance, and the variance of the variance.
Calculations are performed by using running averages of the totals for each chain. Consequently, the method can be called many times with new samples for each chain so that the evidence estimate will improve. The rationale is that not all samples need to be stored in memory for high-dimensional problems. Note that the same number of chains needs to be considered for each call.
- Parameters:
chains (Chains) – An instance of the chains class containing the chains to be used in the calculation.
num_slices (int) – Number of slices into which the samples are divided row-wise when using flow models to avoid memory issues. If None, the samples are considered all-together. Defaults to None.
- Raises:
ValueError – Raised if the input number of chains does not match the number of chains already set up.
ValueError – Raised if both max and mean shift are set.
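A sketch of the typical end-to-end usage, assuming the usual import alias and illustrative placeholder chains; any fitted posterior model can be used (here the RealNVP flow model documented later on this page, with an illustrative number of training epochs):
>>> import numpy as np
>>> import harmonic as hm
>>> from harmonic import model as md, utils
>>> ndim, nchains, nsamples = 2, 10, 500                  # illustrative sizes
>>> samples = np.random.randn(nchains, nsamples, ndim)    # placeholder MCMC output
>>> ln_posterior = -0.5 * np.sum(samples**2, axis=-1)
>>> chains = hm.Chains(ndim)
>>> chains.add_chains_3d(samples, ln_posterior)
>>> chains_train, chains_infer = utils.split_data(chains, training_proportion=0.5)
>>> flow = md.RealNVPModel(ndim, standardize=True)        # posterior model to fit
>>> flow.fit(chains_train.samples, epochs=10)             # epochs value is illustrative
>>> ev = hm.Evidence(chains_infer.nchains, flow)
>>> ev.add_chains(chains_infer)
>>> ln_evidence, ln_evidence_std = ev.compute_ln_evidence()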
- check_basic_diagnostic()
Perform basic diagnostic check on the sanity of the evidence calculations.
If these tests pass it does not necessarily mean the evidence is accurate and other tests should still be performed.
- Returns:
Whether diagnostic tests pass.
- Return type:
Boolean
- Raises:
Warnings – Raised if the diagnostic tests fail.
- compute_evidence()
Compute evidence from the inverse evidence.
- Returns:
Tuple containing the following.
evidence (double): Estimate of evidence.
evidence_std (double): Estimate of standard deviation of evidence.
- Return type:
(double, double)
- Raises:
ValueError – if inverse evidence or its variance overflows.
- compute_ln_evidence()
Compute log_e of evidence from the inverse evidence.
- Returns:
Tuple containing the following.
ln_evidence (double): Estimate of log_e of evidence.
ln_evidence_std (double): Estimate of log_e of standard deviation of evidence.
- Return type:
(double, double)
- compute_ln_inv_evidence_errors()
Compute lower and upper errors on the log_e of the inverse evidence.
Compute the log-space error \(\hat{\zeta}_\pm\) defined by
\[\log ( \hat{\rho} \pm \hat{\sigma} ) = \log (\hat{\rho}) + \hat{\zeta}_\pm .\]
Computed in a numerically stable way by
\[\hat{\zeta}_\pm = \log(1 \pm \hat{\sigma} / \hat{\rho}) .\]
- Returns:
Tuple containing the following.
ln_evidence_err_neg (double): Lower error for log_e of inverse evidence.
ln_evidence_err_pos (double): Upper error for log_e of inverse evidence.
- Return type:
(double, double)
- classmethod deserialize(filename)
Deserialize Evidence object from file.
- Parameters:
filename (string) – Name of file from which to read evidence object.
- Returns:
Evidence object deserialized from file.
- Return type:
(Evidence)
- get_masks(chain_start_ixs: Array) Array
Create mask array for a 2D array of concatenated chains of different lengths.
- Parameters:
chain_start_ixs (Array) – Start indices of chains in the Chains object.
- Returns:
Mask array with each row corresponding to a chain and entries with boolean values indicating whether the sample at that position is in that chain.
- Return type:
jnp.ndarray[nchains,nsamples]
- process_run()
Use the running totals of realspace running_sum and nsamples_per_chain to calculate an estimate of the inverse evidence, its variance, and the variance of the variance.
This method is run each time chains are added to update the inverse variance estimates from the running totals.
- serialize(filename)
Serialize evidence object.
- Parameters:
filename (string) – Name of file to save evidence object.
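A sketch of saving and restoring an Evidence object; the filename is illustrative and ev is assumed to be an existing Evidence instance with chains added (as in the sketch under add_chains above):
>>> import harmonic as hm
>>> ev.serialize("evidence.dat")                          # ev: Evidence instance with chains added (assumed)
>>> ev_restored = hm.Evidence.deserialize("evidence.dat")
>>> ln_evidence, ln_evidence_std = ev_restored.compute_ln_evidence()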
- set_shift(shift_value: float)
Set the shift value of log_e posterior values to aid numerical stability.
- Parameters:
shift_value (float) – Shift value.
- Raises:
ValueError – Raised if shift_value is NaN.
ValueError – Raised if one attempts to set shift when another shift is already set.
- class harmonic.Shifting(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Enumeration to define which log-space shifting to adopt. Different choices may prove optimal for certain settings.
- ABS_MAX_SHIFT = 4
- MAX_SHIFT = 2
- MEAN_SHIFT = 1
- MIN_SHIFT = 3
- harmonic.evidence.compute_bayes_factor(ev1, ev2)
Compute Bayes factor of two models.
- Parameters:
ev1 (float) – Evidence value of model 1 with chains added.
ev2 (float) – Evidence value of model 2 with chains added.
- Returns:
Tuple containing the following.
bf12: Estimate of the Bayes factor Z_1 / Z_2.
bf12_std: Estimate of the standard deviation of the Bayes factor sqrt( var ( Z_1 / Z_2 ) ).
- Return type:
(float, float)
- Raises:
ValueError – Raised if model 1 does not have chains added.
ValueError – Raised if model 2 does not have chains added.
ValueError – If inverse evidence or its variance for model 1 or model 2 too large to store in non-log space.
- harmonic.evidence.compute_ln_bayes_factor(ev1, ev2)
Computes log_e of Bayes factor of two models.
- Parameters:
ev1 (float) – Evidence object of model 1 with chains added.
ev2 (float) – Evidence object of model 2 with chains added.
- Returns:
Tuple containing the following.
ln_bf12: Estimate of log_e of the Bayes factor ln ( Z_1 / Z_2 ).
ln_bf12_std: Estimate of log_e of the standard deviation of the Bayes factor ln ( sqrt( var ( Z_1 / Z_2 ) ) ).
- Return type:
(float, float)
- Raises:
ValueError – Raised if model 1 does not have chains added.
ValueError – Raised if model 2 does not have chains added.
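For example, given two Evidence objects with chains added (ev1 and ev2 below are assumed to be such instances, one per model), Bayes factors can be computed in linear or log space:
>>> from harmonic import evidence
>>> bf12, bf12_std = evidence.compute_bayes_factor(ev1, ev2)          # ev1, ev2: Evidence instances (assumed)
>>> ln_bf12, ln_bf12_std = evidence.compute_ln_bayes_factor(ev1, ev2)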
- class harmonic.flows.RealNVP(n_features: int, n_scaled_layers: int = 2, n_unscaled_layers: int = 4, parent: ~flax.linen.module.Module | ~flax.core.scope.Scope | ~flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>, name: str | None = None)
Real-valued non-volume preserving flow using flax and tfp-jax.
- Parameters:
n_features (int) – Number of features in the data.
n_scaled_layers (int, optional) – Non-zero number of layers in the flow. Defaults to 2.
n_unscaled_layers (int, optional) – Number of unscaled layers in the flow. Defaults to 4.
- log_prob(x: array, temperature: float = 1.0) array
Evaluate the log probability of the flow for a batched input.
- Parameters:
x (jnp.ndarray (batch_size, ndim)) – Sample for which to predict posterior values.
temperature (float, optional) – Factor by which base Gaussian unit covariance matrix is scaled. Should be between 0 and 1 for use in evidence estimation. Defaults to 1.
- Returns:
Predicted log_e posterior value.
- Return type:
jnp.ndarray (batch_size,)
- make_flow(temperature: float = 1.0)
Make tfp-jax distribution object containing the RealNVP flow.
- Parameters:
temperature (float, optional) – Factor by which base Gaussian unit covariance matrix is scaled. Should be between 0 and 1 for use in evidence estimation. Defaults to 1.
- Returns:
Base Gaussian transformed by the scaled affine coupling layers contained in the scaled_layers attribute, followed by the unscaled affine coupling layers contained in the unscaled_layers attribute.
- Return type:
tfb.Distribution
- Raises:
ValueError – If n_scaled_layers is not positive.
- sample(rng: PRNGKey, num_samples: int, temperature: float = 1.0) array
Sample from the flow.
- Parameters:
rng (Union[Array, PRNGKeyArray]) – Key used in random number generation process.
num_samples (int) – Number of samples generated.
temperature (float, optional) – Factor by which base Gaussian unit covariance matrix is scaled. Should be between 0 and 1 for use in evidence estimation. Defaults to 1.
- Returns:
Samples from fitted distribution.
- Return type:
jnp.array (num_samples, ndim)
- setup()
Initializes a Module lazily (similar to a lazy __init__).
setup is called once lazily on a module instance when a module is bound, immediately before any other methods like __call__ are invoked, or before a setup-defined attribute on self is accessed.
This can happen in three cases:
1. Immediately when invoking apply(), init() or init_and_output().
2. Once the module is given a name by being assigned to an attribute of another module inside the other module’s setup method (see __setattr__()):
>>> class MyModule(nn.Module):
...   def setup(self):
...     submodule = nn.Conv(...)
...     # Accessing `submodule` attributes does not yet work here.
...     # The following line invokes `self.__setattr__`, which gives
...     # `submodule` the name "conv1".
...     self.conv1 = submodule
...     # Accessing `submodule` attributes or methods is now safe and
...     # either causes setup() to be called once.
3. Once a module is constructed inside a method wrapped with compact(), immediately before another method is called or a setup-defined attribute is accessed.
- class harmonic.flows.RQSpline(n_features: int, num_layers: int, hidden_size: ~typing.Sequence[int], num_bins: int, spline_range: ~typing.Sequence[float] = (-10.0, 10.0), parent: ~flax.linen.module.Module | ~flax.core.scope.Scope | ~flax.linen.module._Sentinel | None = <flax.linen.module._Sentinel object>, name: str | None = None)
Rational quadratic spline normalizing flow model using distrax.
- Parameters:
n_features (int) – Number of features in the data.
num_layers (int) – Number of layers in the flow.
num_bins (int) – Number of bins in the spline.
hidden_size (Sequence[int]) – Size of the hidden layers in the conditioner.
spline_range (Sequence[float], optional) – Range of the spline. Defaults to (-10, 10)
Note
Adapted from github.com/kazewong/flowMC
- log_prob(x: array, temperature: float = 1.0) array
Evaluate the log probability of the flow for a batched input.
- Parameters:
x (jnp.ndarray (batch_size, ndim)) – Sample for which to predict posterior values.
temperature (float, optional) – Factor by which base Gaussian unit covariance matrix is scaled. Should be between 0 and 1 for use in evidence estimation. Defaults to 1.
- Returns:
Predicted log_e posterior value.
- Return type:
jnp.ndarray (batch_size,)
- make_flow(temperature: float = 1.0)
Make distrax distribution containing the rational quadratic spline flow.
- Parameters:
temperature (float, optional) – Factor by which base Gaussian unit covariance matrix is scaled. Should be between 0 and 1 for use in evidence estimation. Defaults to 1.
- Returns:
Base Gaussian transformed by rational quadratic spline flow.
- sample(rng: PRNGKey, num_samples: int, temperature: float = 1.0) array
Sample from the flow.
- Parameters:
rng (Union[Array, PRNGKeyArray]) – Key used in random number generation process.
num_samples (int) – Number of samples generated.
temperature (float, optional) – Factor by which base Gaussian unit covariance matrix is scaled. Should be between 0 and 1 for use in evidence estimation. Defaults to 1.
- Returns:
Samples from fitted distribution.
- Return type:
jnp.array (num_samples, ndim)
- setup()
Initializes a Module lazily (similar to a lazy __init__).
setup is called once lazily on a module instance when a module is bound, immediately before any other methods like __call__ are invoked, or before a setup-defined attribute on self is accessed.
This can happen in three cases:
1. Immediately when invoking apply(), init() or init_and_output().
2. Once the module is given a name by being assigned to an attribute of another module inside the other module’s setup method (see __setattr__()):
>>> class MyModule(nn.Module):
...   def setup(self):
...     submodule = nn.Conv(...)
...     # Accessing `submodule` attributes does not yet work here.
...     # The following line invokes `self.__setattr__`, which gives
...     # `submodule` the name "conv1".
...     self.conv1 = submodule
...     # Accessing `submodule` attributes or methods is now safe and
...     # either causes setup() to be called once.
3. Once a module is constructed inside a method wrapped with compact(), immediately before another method is called or a setup-defined attribute is accessed.
- class harmonic.model.FlowModel(ndim_in: int, learning_rate: float = 0.001, momentum: float = 0.9, standardize: bool = False, temperature: float = 0.8)
Normalizing flow model to approximate the log_e posterior by a normalizing flow.
- fit(X: Array, batch_size: int = 64, epochs: int = 3, key=Array([0, 1000], dtype=uint32), verbose: bool = False)
Fit the parameters of the model.
- Parameters:
X (jnp.ndarray (nsamples, ndim)) – Training samples.
batch_size (int, optional) – Batch size used when training flow. Default = 64.
epochs (int, optional) – Number of epochs flow is trained for. Default = 3.
key (Union[jax.Array, jax.random.PRNGKeyArray], optional) – Key used in random number generation process.
verbose (bool, optional) – Controls if progress bar and current loss are displayed when training. Default = False.
- Raises:
ValueError – Raised if the second dimension of X is not the same as ndim.
NotImplementedError – If called directly from FlowModel class.
- predict(x: Array) Array
Predict the value of log_e posterior at batched input x.
- Parameters:
x (jnp.ndarray (batch_size, ndim)) – Sample for which to predict posterior values.
- Returns:
Predicted log_e posterior value.
- Return type:
jnp.ndarray (batch_size,)
- Raises:
ValueError – If temperature is negative or greater than 1.
- sample(n_sample: int, rng_key=Array([0, 0], dtype=uint32)) Array
Sample from trained flow.
- Parameters:
n_sample (int) – Number of samples generated.
rng_key (Union[jax.Array, jax.random.PRNGKeyArray], optional) – Key used in random number generation process.
- Raises:
ValueError – If temperature is negative or greater than 1.
- Returns:
Samples from fitted distribution.
- Return type:
jnp.array (n_sample, ndim)
- class harmonic.model.RQSplineModel(ndim_in: int, n_layers: int = 8, n_bins: int = 8, hidden_size: Sequence[int] = [64, 64], spline_range: Sequence[float] = (-10.0, 10.0), standardize: bool = False, learning_rate: float = 0.001, momentum: float = 0.9, temperature: float = 0.8)
Rational quadratic spline flow model to approximate the log_e posterior by a normalizing flow.
- __init__(ndim_in: int, n_layers: int = 8, n_bins: int = 8, hidden_size: Sequence[int] = [64, 64], spline_range: Sequence[float] = (-10.0, 10.0), standardize: bool = False, learning_rate: float = 0.001, momentum: float = 0.9, temperature: float = 0.8)
Constructor setting the hyper-parameters and domains of the model.
- Parameters:
ndim_in (int) – Dimension of the problem to solve.
n_layers (int, optional) – Number of layers in the flow. Defaults to 8.
n_bins (int, optional) – Number of bins in the spline. Defaults to 8.
hidden_size (Sequence[int], optional) – Size of the hidden layers in the conditioner. Defaults to [64, 64].
spline_range (Sequence[float], optional) – Range of the spline. Defaults to (-10.0, 10.0).
standardize (bool, optional) – Indicates if mean and variance should be removed from training data when training the flow. Defaults to False.
learning_rate (float, optional) – Learning rate for adam optimizer used in the fit method. Defaults to 0.001.
momentum (float, optional) – Momentum for the Adam optimizer used in the fit method. Defaults to 0.9.
temperature (float, optional) – Scale factor by which the base distribution Gaussian is compressed in the prediction step. Should be positive and <=1. Defaults to 0.8.
- Raises:
ValueError – If the ndim_in is not positive.
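A minimal sketch of constructing and fitting the spline flow model using the fit method documented on FlowModel above; the hyper-parameter values and training data are illustrative, not recommendations:
>>> import numpy as np
>>> from harmonic import model as md
>>> ndim = 5                                              # illustrative dimension
>>> X = np.random.randn(10000, ndim)                      # placeholder training samples
>>> flow = md.RQSplineModel(ndim, n_layers=8, n_bins=8, hidden_size=[64, 64],
...                         standardize=True, temperature=0.8)
>>> flow.fit(X, epochs=20, verbose=True)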
- class harmonic.model.RealNVPModel(ndim_in: int, n_scaled_layers: int = 2, n_unscaled_layers: int = 4, learning_rate: float = 0.001, momentum: float = 0.9, standardize: bool = False, temperature: float = 0.8)
Normalizing flow model to approximate the log_e posterior by an NVP normalizing flow.
- __init__(ndim_in: int, n_scaled_layers: int = 2, n_unscaled_layers: int = 4, learning_rate: float = 0.001, momentum: float = 0.9, standardize: bool = False, temperature: float = 0.8)
Constructor setting the hyper-parameters of the model.
- Parameters:
ndim_in (int) – Dimension of the problem to solve.
n_scaled_layers (int, optional) – Number of layers with scaler in RealNVP flow. Default = 2.
n_unscaled_layers (int, optional) – Number of layers without scaler in RealNVP flow. Default = 4.
learning_rate (float, optional) – Learning rate for adam optimizer used in the fit method. Default = 0.001.
momentum (float, optional) – Momentum for the Adam optimizer used in the fit method. Default = 0.9.
standardize (bool, optional) – Indicates if mean and variance should be removed from training data when training the flow. Default = False
temperature (float, optional) – Scale factor by which the base distribution Gaussian is compressed in the prediction step. Should be positive and <=1. Default = 0.8.
- Raises:
ValueError – If the ndim_in is not positive.
ValueError – If n_scaled_layers is not positive.
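A corresponding sketch for the RealNVP model, including prediction and sampling with the FlowModel methods documented above (all values illustrative):
>>> import numpy as np
>>> from harmonic import model as md
>>> ndim = 2                                              # illustrative dimension
>>> X = np.random.randn(5000, ndim)                       # placeholder training samples
>>> flow = md.RealNVPModel(ndim, n_scaled_layers=2, n_unscaled_layers=4,
...                        standardize=True, temperature=0.8)
>>> flow.fit(X, epochs=10)
>>> ln_prob = flow.predict(X[:10])                        # log_e model values at samples
>>> flow_samples = flow.sample(1000)                      # draw samples from the flow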
- harmonic.model.make_training_loop(model)
Create a function that trains an NF model.
- Parameters:
model – a neural network model with a log_prob method.
- Returns:
wrapper function that trains the model.
- Return type:
train_flow (Callable)
Note
Adapted from github.com/kazewong/flowMC
- class harmonic.model_legacy.HyperSphere(ndim_in, domains, hyper_parameters=None)
HyperSphere Model to approximate the log_e posterior by a hyper-ellipsoid.
- __init__(ndim_in, domains, hyper_parameters=None)
Constructor setting the parameters of the model.
- Parameters:
ndim_in (long) – Dimension of the problem to solve.
domains (list) – A list of length 1 containing a 1D array of length 2 containing the lower and upper bound of the radius of the hyper-sphere.
hyper_parameters (None) – Should not be set as there are no hyper-parameters for this model (in general, however, models can have hyper-parameters).
- Raises:
ValueError – If the hyper_parameters variable is not None.
ValueError – If the length of domains list is not one.
ValueError – If the ndim_in is not positive.
- fit(X, Y)
Fit the parameters of the model (i.e. its radius).
- Parameters:
X (double ndarray[nsamples, ndim]) – Sample x coordinates.
Y (double ndarray[nsamples]) – Target log_e posterior values for each sample in X.
- Returns:
A tuple containing the following objects.
success (bool): Whether fit successful.
objective (double): Value of objective at optimal point.
- Return type:
(bool, double)
- Raises:
ValueError – Raised if the first dimension of X is not the same as Y.
ValueError – Raised if the second dimension of X is not the same as ndim.
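A sketch of constructing and fitting the legacy hyper-sphere model; the radius domain and placeholder data are illustrative:
>>> import numpy as np
>>> from harmonic import model_legacy as mdl
>>> ndim = 2                                              # illustrative dimension
>>> X = np.random.randn(1000, ndim)                       # placeholder samples
>>> Y = -0.5 * np.sum(X**2, axis=1)                       # placeholder log_e posterior values
>>> domains = [np.array([1e-1, 1e1])]                     # lower/upper bound on the radius
>>> sphere = mdl.HyperSphere(ndim, domains)
>>> success, objective = sphere.fit(X, Y)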
- predict(x)
Use model to predict the value of log_e posterior at point x.
- Parameters:
x (double ndarray[ndim]) – Sample of which to predict posterior value.
- Returns:
Predicted posterior value.
- Return type:
(double)
- set_R(R)
Set the radius of the hyper-sphere and calculate its volume.
- Parameters:
R (double) – The radius of the hyper-sphere.
- Raises:
ValueError – If the radius is a NaN.
ValueError – If the radius is not positive.
- set_centre(centre_in)
Set centre of the hyper-sphere.
- Parameters:
centre_in (double ndarray[ndim]) – Centre of sphere.
- Raises:
ValueError – If the length of the centre array is not the same as ndim.
ValueError – If the centre array contains a NaN.
- set_inv_covariance(inv_covariance_in)
Set diagonal inverse covariances for the hyper-sphere.
Only diagonal covariance structure is supported.
- Parameters:
inv_covariance_in (double ndarray[ndim]) – Diagonal of inverse covariance matrix that defines the ellipse.
- Raises:
ValueError – If the length of the inv_covariance array is not equal to ndim.
ValueError – If the inv_covariance array contains a NaN.
ValueError – If the inv_covariance array contains a value that is not positive.
- set_precomputed_values()
Precompute volume of the hyper-sphere (scaled ellipse) and squared radius.
- class harmonic.model_legacy.KernelDensityEstimate(ndim, domains, hyper_parameters=[0.1])
KernelDensityEstimate model to approximate the log_e posterior using kernel density estimation.
- __init__(ndim, domains, hyper_parameters=[0.1])
Constructor setting the hyper-parameters and domains of the model.
- Parameters:
ndim (long) – Dimension of the problem to solve.
domains (list) – List of length 0 since domain not considered for Kernel Density Estimation.
hyper_parameters (list) – A list of length 1 containing the diameter in scaled units of the hyper-spheres to use in the Kernel Density Estimate.
- Raises:
ValueError – If the hyper_parameters list is not length 1.
ValueError – If the length of domains list is not 0.
ValueError – If the ndim_in is not positive.
- fit(X, Y)
Fit the parameters of the model.
Fit is performed as follows.
1. Set the scales of the model from the samples.
2. Create the dictionary containing all the information on which samples are in which pixel, in a grid where each pixel size is the same as the diameter of the hyper-spheres to be placed on each sample. The key is an index of the grid (C-type ordering) and the value is a list containing the indexes in the sample array of all the samples in that pixel.
3. Precompute the normalisation factor.
- Parameters:
X (double ndarray[nsamples, ndim]) – Sample x coordinates.
Y (double ndarray[nsamples]) – Target log_e posterior values for each sample in X.
- Returns:
Whether fit successful.
- Return type:
(bool)
- Raises:
ValueError – Raised if the first dimension of X is not the same as Y.
ValueError – Raised if the second dimension of X is not the same as ndim.
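Similarly, a sketch for the kernel density estimate model; the kernel diameter hyper-parameter and placeholder data are illustrative:
>>> import numpy as np
>>> from harmonic import model_legacy as mdl
>>> ndim = 2
>>> X = np.random.randn(1000, ndim)                       # placeholder samples
>>> Y = -0.5 * np.sum(X**2, axis=1)                       # placeholder log_e posterior values
>>> kde = mdl.KernelDensityEstimate(ndim, [], hyper_parameters=[0.1])
>>> success = kde.fit(X, Y)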
- precompute_normalising_factor(X)
Precompute the log_e normalisation factor of the density estimation.
- Parameters:
X (double ndarray[nsamples, ndim]) – Sample x coordinates.
- Raises:
ValueError – Raised if the second dimension of X is not the same as ndim.
- predict(x)
Predict the value of the posterior at point x.
- Parameters:
x (double ndarray[ndim]) – 1D array of sample of shape (ndim) to predict posterior value.
- Returns:
Predicted log_e posterior value.
- Return type:
(double)
- set_scales(X)
Set the scales of the hyper-spheres based on the min and max sample in each dimension.
- Parameters:
X (double ndarray[nsamples, ndim]) – Sample x coordinates.
- Raises:
ValueError – Raised if the second dimension of X is not the same as ndim.
- class harmonic.model_legacy.ModifiedGaussianMixtureModel(ndim, domains, hyper_parameters=[3, 1e-08, None, None, None])
ModifiedGaussianMixtureModel (MGMM) to approximate the log_e posterior by a modified Gaussian mixture model.
- __init__(ndim, domains, hyper_parameters=[3, 1e-08, None, None, None])
Constructor setting the hyper-parameters and domains of the model of the MGMM which models the posterior as a group of Gaussians.
- Parameters:
ndim (long) – Dimension of the problem to solve.
domains (list) – A list of length 1 with the range of scale parameter of the covariance matrix, i.e. the range of alpha, where C’ = alpha * C_samples, and C_samples is the diagonal of the covariance in the samples in each cluster.
hyper_parameters (list) – A list of length 5, the first of which should be number of clusters, the second is the regularisation parameter gamma, the third is the learning rate, the fourth is the maximum number of iterations and the fifth is the batch size.
- Raises:
ValueError – Raised if the hyper_parameters list is not length 5.
ValueError – Raised if the length of domains list is not 1.
ValueError – Raised if the ndim is not positive.
- fit(X, Y)
Fit the parameters of the model as follows.
If centres and inv_covariances are not set:
- Find clusters using k-means clustering from scikit-learn.
- Use the samples in the clusters to find the centres and covariance matrices.
Then minimize the objective function using the gradients and mini-batch stochastic descent.
- Parameters:
X (double ndarray[nsamples, ndim]) – Sample x coordinates.
Y (double ndarray[nsamples]) – Target log_e posterior values for each sample in X.
- Returns:
Whether fit successful.
- Return type:
(bool)
- Raises:
ValueError – Raised if the first dimension of X is not the same as Y.
ValueError – Raised if the second dimension of X is not the same as ndim.
- predict(x)
Predict the value of the posterior at point x.
- Parameters:
x (double ndarray[ndim]) – Sample of shape (ndim) at which to predict posterior value.
- Returns:
Predicted log_e posterior value.
- Return type:
(double)
- set_alphas(alphas_in)
Set the alphas (i.e. scales).
- Parameters:
alphas_in (double ndarray[ngaussians]) – Alpha scalings.
- Raises:
ValueError – Raised if the input array length is not ngaussians.
ValueError – Raised if the input array contains a NaN.
ValueError – Raised if at least one of the alphas not positive.
- set_centres(centres_in)
Set the centres of the Gaussians.
- Parameters:
centres_in (double ndarray[ndim, ngaussians]) – Centres.
- Raises:
ValueError – Raised if the input array is not the correct shape.
ValueError – Raised if the input array contains a NaN.
- set_centres_and_inv_covariance(centres_in, inv_covariance_in)
Set the centres and inverse covariance of the Gaussians.
- Parameters:
centres_in (double ndarray[ndim, ngaussians]) – Centres.
inv_covariance (double ndarray[ndim, ngaussians]) – Inverse covariance of the Gaussians.
- Raises:
ValueError – Raised if the input arrays are not the correct shape.
ValueError – Raised if the input arrays contain a NaN.
ValueError – Raised if the input covariance contains a number that is not positive.
- set_inv_covariance(inv_covariance_in)
Set the inverse covariance of the Gaussians.
- Parameters:
inv_covariance (double ndarray[ndim, ngaussians]) – Inverse covariance of the Gaussians.
- Raises:
ValueError – Raised if the input array is not the correct shape.
ValueError – Raised if the input array contains a NaN.
ValueError – Raised if the input array contains a number that is not positive.
- set_weights(weights_in)
Set the weights of the Gaussians.
The weights are the softmax of the betas (without normalisation), i.e. the betas are the log_e of the weights.
- Parameters:
weights_in (double ndarray[ngaussians]) – 1D array containing the weights (no need to normalise).
- Raises:
ValueError – Raised if the input array length is not ngaussians.
ValueError – Raised if the input array contains a NaN.
ValueError – Raised if at least one of the weights is negative.
ValueError – Raised if the sum of the weights is too close to zero.
- harmonic.model_legacy.beta_to_weights_wrap(beta, ngaussians)
Wrapper to calculate the weights from the beta_weights.
- Parameters:
beta (double ndarray[ngaussians]) – Beta values to be converted.
ngaussians (long) – The number of Gaussians in the model.
- Returns:
Weight values.
- Return type:
(double ndarray[ngaussians])
- harmonic.model_legacy.calculate_gaussian_normalisation_wrap(alpha, inv_covariance, ndim)
Wrapper to calculate the normalisation for evaluate_one_gaussian.
- Parameters:
alpha (double) – The scaling parameter of the covariance matrix.
inv_covariance (double ndarray[ndim]) – Diagonal of inverse covariance matrix.
ndim (long) – Dimension of the problem.
- Returns:
The normalisation factor.
- Return type:
(double)
- harmonic.model_legacy.delta_theta_ij_wrap(x, mu, inv_covariance, ndim)
Wrapper to evaluate delta_theta_ij squared which is part of the gradient of the objective function.
- Parameters:
x (double ndarray[ndim]) – Position of current sample.
mu (double ndarray[ndim]) – Centre of the Gaussian.
inv_covariance (double ndarray[ndim]) – Diagonal of inverse covariance matrix.
ndim (long) – Dimension of the problem.
- Returns:
Value of delta_theta_ij squared.
- Return type:
(double)
- harmonic.model_legacy.evaluate_one_gaussian_wrap(x, mu, inv_covariance, alpha, weight, ndim)
Wrapper to evaluate one Gaussian.
- Parameters:
x (double ndarray[ndim]) – Position where the Gaussian is to be evaluated.
mu (double ndarray[ndim]) – Centre of the Gaussian.
inv_covariance (double ndarray[ndim]) – Diagonal of inverse covariance matrix.
alpha (double) – Scaling parameter of the covariance matrix.
weight (double) – Weight applied to that Gaussian.
ndim (long) – Dimension of the problem.
- Returns:
Height of the Gaussian.
- Return type:
(double)
- harmonic.logs.critical_log(message)
Log a critical message (e.g. core code failures etc).
- Parameters:
message – Message to log.
- harmonic.logs.debug_log(message)
Log a debug message (e.g. for background logs to assist debugging).
- Parameters:
message – Message to log.
- harmonic.logs.info_log(message)
Log an information message (e.g. evidence value printing, run completion etc).
- Parameters:
message – Message to log.
- harmonic.logs.setup_logging(custom_yaml_path=None, default_level=10)
Initialise and configure logging.
Should be called at the beginning of code to initialise and configure the desired logging level. Logging levels can be ints in [0,50] where 10 is debug logging and 50 is critical logging.
- Parameters:
custom_yaml_path (string) – Complete pathname of desired yaml logging configuration. If empty will provide default logging config.
default_level (int) – Logging level at which to configure.
- Raises:
ValueError – Raised if logging.yaml is not in ./logs/ directory.
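For example, logging might be configured and used as follows (the message is illustrative):
>>> from harmonic import logs
>>> logs.setup_logging()                                  # default configuration and level
>>> logs.info_log("Evidence computation complete.")       # illustrative message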
- harmonic.logs.warning_log(message)
Log a warning (e.g. for internal code warnings such as large dynamic ranges).
- Parameters:
message – Warning to log.
- harmonic.utils.cross_validation(chains, domains: ~typing.List, hyper_parameters: ~typing.List, nfold=2, modelClass=<class 'harmonic.model_legacy.KernelDensityEstimate'>, seed: int = -1) List
Perform n-fold validation for given model using chains to be split into validation and training data.
First, splits data into nfold chunks. Second, fits the model using each of the hyper-parameters given using all but one of the chunks (the validation chunk). This procedure is performed for all the chunks and the average (mean) log-space variance from all the chunks is computed and returned. This can be used to decide which hyper-parameters list was better.
- Parameters:
chains (Chains) – Chains containing samples (to be split into training and validation data herein).
domains (List) – Domains of the model’s parameters.
hyper_parameters (List) – List of hyper_parameters where each entry is a hyper_parameter list to be considered.
modelClass (Model) – Model that is being cross validated (default = KernelDensityEstimate).
seed (int) – Seed for random number generator when drawing the chains (if this is negative the seed is not set).
- Returns:
Mean log validation variance (averaged over nfolds) for each hyper-parameter.
- Return type:
(List)
- Raises:
ValueError – Raised if model is not one of the possible models.
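A sketch of cross-validating kernel density estimate hyper-parameters over illustrative candidate kernel diameters, selecting the set with the smallest mean log validation variance:
>>> import numpy as np
>>> import harmonic as hm
>>> from harmonic import utils, model_legacy
>>> ndim, nchains, nsamples = 2, 10, 500                  # illustrative sizes
>>> samples = np.random.randn(nchains, nsamples, ndim)    # placeholder chains
>>> ln_posterior = -0.5 * np.sum(samples**2, axis=-1)
>>> chains = hm.Chains(ndim)
>>> chains.add_chains_3d(samples, ln_posterior)
>>> domains = []                                          # KDE takes an empty domain list
>>> hyper_parameters = [[0.1], [0.2], [0.5]]              # candidate kernel diameters
>>> variances = utils.cross_validation(chains, domains, hyper_parameters, nfold=2,
...                                    modelClass=model_legacy.KernelDensityEstimate, seed=0)
>>> best = hyper_parameters[int(np.argmin(variances))]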
- harmonic.utils.eval_func_on_grid(func, xmin, xmax, ymin, ymax, nx, ny)
Evaluate a 2D function on a grid.
- Parameters:
func – Function to evaluate.
xmin – Minimum x value to consider in grid domain.
xmax – Maximum x value to consider in grid domain.
ymin – Minimum y value to consider in grid domain.
ymax – Maximum y value to consider in grid domain.
nx – Number of samples to include in grid in x direction.
ny – Number of samples to include in grid in y direction.
- Returns:
Function values evaluated on the 2D grid.
x_grid: x values over the 2D grid.
y_grid: y values over the 2D grid.
- Return type:
func_eval_grid
- harmonic.utils.plot_getdist(samples, labels=None)
Plot triangle plot of marginalised distributions using getdist package.
- Parameters:
samples – 2D array of shape (ndim, nsamples) containing samples.
labels – Array of strings containing axis labels.
- Returns:
None
- harmonic.utils.plot_getdist_compare(samples1, samples2, labels=None, fontsize=17, legend_fontsize=15)
Plot triangle plot of marginalised distributions using getdist package.
- Parameters:
samples1 – 2D array of shape (ndim, nsamples) containing samples from the posterior.
samples2 – 2D array of shape (ndim, nsamples) containing samples from the concentrated flow.
labels – Array of strings containing axis labels for both sets of samples.
fontsize – Plot fontsize.
legend_fontsize – Plot legend fontsize.
- Returns:
None
- harmonic.utils.split_data(chains, training_proportion: float = 0.5) Tuple
Split the data in a chains instance into two (e.g. training and test sets).
The new chains instances can be used for training and for calculating the evidence on the “test” set.
Chains are split so that the first chains in the original chains object go into the training set and the following go into the test set.
- Parameters:
chains (Chains) – Instance of a chains class containing the data to be split.
training_proportion (float) – Proportion of data to be used in training (default=0.5)
- Returns:
A tuple containing the following two Chains.
chains_train (Chains): Instance of a chains class containing chains to be used to fit the model (e.g. training).
chains_test (Chains): Instance of a chains class containing chains to be used to calculate the evidence (e.g. testing).
- Return type:
(Chains, Chains)
- Raises:
ValueError – Raised if training_proportion is not strictly between 0 and 1.
ValueError – Raised if resulting nchains in training set is less than 1.
ValueError – Raised if resulting nchains in test set is less than 1.
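A minimal sketch (chains below is assumed to be a populated Chains instance with at least two chains; the proportion is illustrative):
>>> from harmonic import utils
>>> chains_train, chains_test = utils.split_data(chains, training_proportion=0.5)  # chains: populated Chains instance (assumed)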
- harmonic.utils.validation_fit_indexes(i_fold: int, nchains_in_val_set: int, nfold: int, indexes) Tuple[List, List]
Extract the correct indexes for the chains of the validation and training sets.
- Parameters:
i_fold (int) – Cross-validation iteration to perform.
nchains_in_val_set (int) – The number of chains that will go in each validation set.
nfold (int) – Number of fold validation sets to be made.
indexes (List) – List of the chains to be used in fold validation that need to be split.
- Returns:
A tuple containing the following two lists of indices.
indexes_val (List): List of indexes for the validation set.
indexes_fit (List): List of indexes for the training set.
- Return type:
(List, List)
- Raises:
ValueError – Raised if the value of i_fold does not fall between 0 and nfold-1.