Namespaces
Automatically generated documentation for harmonic APIs. All functionality is accessible through a pip installation of the harmonic package.
- class harmonic.Chains(ndim)
Class to store samples from multiple MCMC chains.
- __init__(ndim)
Construct empty Chains for parameter space of dimension ndim.
Constructor simply sets ndim. Chain samples are added by the add_chain* methods since we want to support setting up data for chains from different input data formats (e.g. data from a single chain or multiple chains at once).
- Parameters
ndim (long) – Dimension of the parameter space.
- add(other)
Add another Chains object to this object.
- Parameters
other (Chains) – Other Chains object to be added to this object.
- Raises
ValueError – Raised if the new chain has a different ndim.
- add_chain(samples, ln_posterior)
Add a single chain to a Chains object.
- Parameters
samples (double ndarray[nsamples, ndim]) – Samples of a single chain.
ln_posterior (double ndarray[nsamples]) – log_e posterior values.
- Raises
ValueError – Raised when ndim of new chain does not match previous chains.
- add_chains_2d(samples, ln_posterior, nchains_in)
Add a number of chains to a Chains object assuming all chains are of the same length.
- Parameters
samples (double ndarray[nsamples_in * nchains_in, ndim]) – Samples of multiple chains.
ln_posterior (double ndarray[nsamples_in * nchains_in]) – log_e posterior values.
nchains_in (long) – Number of chains to be added.
- Raises
ValueError – Raised when number of samples is not multiple of the number of chains.
ValueError – Raised when ndim of new chains does not match previous chains.
ValueError – Raised when posterior and samples first length are different.
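The layout expected by add_chains_2d can be illustrated with a minimal pure-Python sketch, assuming the equally sized chains are stacked one after another along the first axis. The helper name `equal_chain_start_indices` is hypothetical and not part of harmonic; it only mirrors the documented length checks.

```python
def equal_chain_start_indices(nsamples_total, n_ln_posterior, nchains_in):
    """Return the start index of each chain plus the final end index.

    Hypothetical helper for illustration; not harmonic's implementation.
    """
    if nsamples_total != n_ln_posterior:
        raise ValueError("samples and ln_posterior have different lengths")
    if nsamples_total % nchains_in != 0:
        raise ValueError("number of samples is not a multiple of nchains_in")
    nsamples_in = nsamples_total // nchains_in  # samples per chain
    return [i * nsamples_in for i in range(nchains_in + 1)]


indices = equal_chain_start_indices(12, 12, 3)
print(indices)  # [0, 4, 8, 12]
```

Under this assumption, rows 0 to 3 belong to the first chain, rows 4 to 7 to the second, and rows 8 to 11 to the third.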
- add_chains_2d_list(samples, ln_posterior, nchains_in, chain_indexes)
Add a number of chains to the chain class. Uses a list of indexes to determine where each chain starts and stops.
- Parameters
samples (double ndarray[nsamples_in * nchains_in, ndim]) – Samples of multiple chains.
ln_posterior (double ndarray[nsamples_in * nchains_in]) – log_e posterior values.
nchains_in (long) – Number of chains to be added.
chain_indexes (list) – List of the starting index of the chains.
- Raises
ValueError – Raised when ndim of new chains does not match previous chains.
ValueError – Raised when posterior and samples first length are different.
ValueError – Raised when the length of the list is not nchains_in + 1.
- add_chains_3d(samples, ln_posterior)
Add a number of chains to a Chains object from a 3D array.
- Parameters
samples (double ndarray[nchains_in, nsamples_in, ndim]) – Samples from multiple chains.
ln_posterior (double ndarray[nchains_in, nsamples_in]) – log_e posterior values.
- Raises
ValueError – Raised when ndim of new chains does not match previous chains.
ValueError – Raised when posterior and samples first and second length are different.
- deepcopy()
Performs a deep copy of the Chains object (using the copy module).
- get_chain_indices(i)
Gets the start and end index of samples from a chain.
The end index specifies the index one past the end of the chain, i.e. the chain samples can be accessed by self.samples[start:end,:].
- Parameters
i (long) – Index of chain of which to determine start and end indices.
- Returns
A tuple of the start and end index, i.e. (start, end).
- Return type
(long, long)
- Raises
ValueError – Raised when chain number invalid.
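The half-open (start, end) convention can be sketched as follows, assuming chain boundaries are kept in a list of start indices of length nchains + 1 (as for chain_indexes in add_chains_2d_list). The function `chain_bounds` and the toy data are hypothetical, and plain lists stand in for ndarrays.

```python
def chain_bounds(start_indices, i):
    """Return (start, end) for chain i; end is one past the last sample."""
    nchains = len(start_indices) - 1
    if i < 0 or i >= nchains:
        raise ValueError("chain number invalid")
    return start_indices[i], start_indices[i + 1]


start_indices = [0, 3, 5, 9]              # three chains of lengths 3, 2 and 4
samples = [[float(k)] for k in range(9)]  # 9 samples with ndim = 1

start, end = chain_bounds(start_indices, 1)
chain_1 = samples[start:end]              # samples of the second chain
print(start, end, chain_1)                # 3 5 [[3.0], [4.0]]
```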
- get_sub_chains(chains_wanted)
Creates a new chain instance with the chains indexed in chains_wanted. (Useful for cross-validation.)
- Parameters
chains_wanted (list) – List of indexes of chains that the new chain instance will contain.
- Returns
Chains object containing the chains wanted.
- Return type
(Chains)
- Raises
ValueError – If any of the chains_wanted indexes are out of bounds i.e. outside of range 0 to nchains - 1.
- nsamples_per_chain()
Compute list containing number of samples in each chain.
- Returns
1D list of length self.nchains containing the number of samples in each chain.
- Return type
(list)
- remove_burnin(nburn=100)
Remove burn-in samples from each chain.
- Parameters
nburn (int) – Number of burn-in samples to remove from each chain.
- Raises
ValueError – Raised when nburn is not less than the number of samples in each chain.
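As a rough illustration of the burn-in rule, the sketch below drops the first nburn samples of every chain, assuming each chain is held as a separate list. The function name `remove_burnin_sketch` is hypothetical; harmonic operates on its own internal sample arrays.

```python
def remove_burnin_sketch(chains, nburn=100):
    """Drop the first nburn samples of every chain (illustrative only)."""
    for chain in chains:
        if nburn >= len(chain):
            raise ValueError("nburn must be less than the number of samples in each chain")
    return [chain[nburn:] for chain in chains]


chains = [list(range(10)), list(range(100, 106))]
trimmed = remove_burnin_sketch(chains, nburn=4)
print([len(c) for c in trimmed])  # [6, 2]
```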
- shallowcopy()
Performs a shallow copy of the Chains object (using the copy module).
- split_into_blocks(nblocks=100)
Split chains into larger number of blocks.
The intention of this method is to break chains into blocks that are (approximately) independent in order to get more independent chains for computing various statistics.
Each existing chain is split into blocks (i.e. new chains), proportionally to the size of the current chains. Final blocks within each chain end up containing slightly different numbers of samples (since we do not ever want to throw away samples!). One could improve this, if required, to distribute the additional samples across all of the blocks of the chain.
- Parameters
nblocks (int) – Number of new (blocked) chains to split existing chains into.
- Raises
ValueError – Raised if nblocks is less than the number of chains.
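The "no samples are thrown away" behaviour can be sketched for a single chain, assuming equal-sized blocks with the final block absorbing the remainder. The function `split_chain_into_blocks` is a hypothetical illustration; how harmonic allocates the number of blocks to each chain is not reproduced here.

```python
def split_chain_into_blocks(chain, nblocks_chain):
    """Split one chain into nblocks_chain blocks; the last block takes the remainder."""
    if nblocks_chain < 1 or nblocks_chain > len(chain):
        raise ValueError("invalid number of blocks for this chain")
    block_size = len(chain) // nblocks_chain
    blocks = [chain[b * block_size:(b + 1) * block_size]
              for b in range(nblocks_chain - 1)]
    blocks.append(chain[(nblocks_chain - 1) * block_size:])  # remainder goes here
    return blocks


blocks = split_chain_into_blocks(list(range(10)), 3)
print([len(b) for b in blocks])  # [3, 3, 4]
```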
- class harmonic.Evidence(nchains, model, shift=Shifting.MEAN_SHIFT)
Compute inverse evidence values from chains, using posterior model.
Multiple chains can be added in sequence (to avoid having to store very long chains).
- __init__(nchains, model, shift=Shifting.MEAN_SHIFT)
Construct evidence class for computing inverse evidence values from set number of chains and initialised posterior model.
- Parameters
nchains (long) – Number of chains that will be used in the computation.
model (Model) – An instance of a posterior model class that has been fitted.
shift (Shifting) – Shifting method used to avoid over/underflow during computation. Selected from the Shifting enumeration.
- Raises
ValueError – Raised if the number of chains is not positive.
ValueError – Raised if the number of dimensions is not positive.
ValueError – Raised if model not fitted.
- add_chains(chains)
Add new chains and calculate an estimate of the inverse evidence, its variance, and the variance of the variance.
Calculations are performed by using running averages of the totals for each chain. Consequently, the method can be called many times with new samples for each chain so that the evidence estimate will improve. The rationale is that not all samples need to be stored in memory for high-dimensional problems. Note that the same number of chains needs to be considered for each call.
- Parameters
chains (Chains) – An instance of the chains class containing the chains to be used in the calculation.
- Raises
ValueError – Raised if the input number of chains does not match the number of chains already set up.
ValueError – Raised if both max and mean shift are set.
- check_basic_diagnostic()
Perform basic diagnostic check on sanity of evidence calculations.
If these tests pass it does not necessarily mean the evidence is accurate and other tests should still be performed.
- Returns
Whether diagnostic tests pass.
- Return type
Boolean
- Raises
Warnings – Raised if the diagnostic tests fail.
- compute_evidence()
Compute evidence from the inverse evidence.
- Returns
Tuple containing the following.
evidence (double): Estimate of evidence.
evidence_std (double): Estimate of standard deviation of evidence.
- Return type
(double, double)
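The relationship between an inverse evidence estimate and the evidence can be sketched with first-order (delta-method) error propagation: if rho estimates 1/Z with variance rho_var, then Z is approximately 1/rho with standard deviation sqrt(rho_var)/rho^2. The function name and arguments below are hypothetical, and harmonic's own estimator may include higher-order corrections that this sketch omits.

```python
import math


def evidence_from_inverse(rho, rho_var):
    """First-order estimate of Z = 1/rho and its standard deviation."""
    if rho <= 0.0:
        raise ValueError("inverse evidence must be positive")
    evidence = 1.0 / rho
    evidence_std = math.sqrt(rho_var) / rho ** 2  # |dZ/drho| * std(rho)
    return evidence, evidence_std


evidence, evidence_std = evidence_from_inverse(rho=0.5, rho_var=1.0e-4)
print(evidence, evidence_std)
```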
- compute_ln_evidence()
Compute log_e of evidence from the inverse evidence.
- Returns
Tuple containing the following.
ln_evidence (double): Estimate of log_e of evidence.
ln_evidence_std (double): Estimate of log_e of standard deviation of evidence.
- Return type
(double, double)
- classmethod deserialize(filename)
Deserialize Evidence object from file.
- Parameters
filename (string) – Name of file from which to read evidence object.
- Returns
Evidence object deserialized from file.
- Return type
(Evidence)
- process_run()
Use the running totals of realspace running_sum and nsamples_per_chain to calculate an estimate of the inverse evidence, its variance, and the variance of the variance.
This method is run each time chains are added to update the inverse variance estimates from the running totals.
- serialize(filename)
Serialize evidence object.
- Parameters
filename (string) – Name of file to save evidence object.
- set_shift(shift_value)
Set the shift value of log_e posterior values to aid numerical stability.
- Parameters
shift_value (double) – Shift value.
- Raises
ValueError – Raised if shift_value is NaN.
ValueError – Raised if one attempts to set shift when another shift is already set.
- class harmonic.Shifting(value)
Enumeration to define which log-space shifting to adopt. Different choices may prove optimal for certain settings.
- ABS_MAX_SHIFT = 4
- MAX_SHIFT = 2
- MEAN_SHIFT = 1
- MIN_SHIFT = 3
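Why a log-space shift matters can be seen in a small sketch that averages exponentials of very negative log values. The function `ln_mean_exp` is hypothetical and only demonstrates the general idea of subtracting a shift before exponentiating and adding it back afterwards; it is not the quantity harmonic shifts internally.

```python
import math


def ln_mean_exp(ln_values, shift):
    """Compute ln(mean(exp(ln_values))) after subtracting shift for stability."""
    total = sum(math.exp(v - shift) for v in ln_values)
    return shift + math.log(total / len(ln_values))


ln_values = [-1000.0, -1001.0, -1002.0]

# Without a shift every term underflows to zero and the logarithm is undefined.
naive_terms = [math.exp(v) for v in ln_values]  # [0.0, 0.0, 0.0]

mean_shift = sum(ln_values) / len(ln_values)    # analogous to MEAN_SHIFT
max_shift = max(ln_values)                      # analogous to MAX_SHIFT

result_mean = ln_mean_exp(ln_values, mean_shift)
result_max = ln_mean_exp(ln_values, max_shift)
print(result_mean, result_max)  # both shifts agree to rounding error
```

In exact arithmetic the result does not depend on the shift; the choice only affects which terms risk overflow or underflow.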
- harmonic.evidence.compute_bayes_factor(ev1, ev2)
Compute Bayes factor of two models.
- Parameters
ev1 (Evidence) – Evidence object of model 1 with chains added.
ev2 (Evidence) – Evidence object of model 2 with chains added.
- Returns
Tuple containing the following.
bf12: Estimate of the Bayes factor Z_1 / Z_2.
bf12_std: Estimate of the standard deviation of the Bayes factor sqrt( var ( Z_1 / Z_2 ) ).
- Return type
(double, double)
- Raises
ValueError – Raised if model 1 does not have chains added.
ValueError – Raised if model 2 does not have chains added.
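For intuition about the returned pair, the sketch below propagates uncertainties through the ratio Z_1 / Z_2 to first order, assuming the two evidence estimates are independent. The function `bayes_factor_first_order` and its scalar arguments are hypothetical; the documented function takes Evidence objects and may use a different (for example higher-order) expression.

```python
import math


def bayes_factor_first_order(z1, z1_std, z2, z2_std):
    """Ratio z1 / z2 with first-order error propagation (independent inputs)."""
    bf12 = z1 / z2
    relative_var = (z1_std / z1) ** 2 + (z2_std / z2) ** 2
    bf12_std = abs(bf12) * math.sqrt(relative_var)
    return bf12, bf12_std


bf12, bf12_std = bayes_factor_first_order(4.0, 0.4, 2.0, 0.2)
print(bf12, bf12_std)
```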
- harmonic.evidence.compute_ln_bayes_factor(ev1, ev2)
Computes log_e of Bayes factor of two models.
- Parameters
ev1 (Evidence) – Evidence object of model 1 with chains added.
ev2 (Evidence) – Evidence object of model 2 with chains added.
- Returns
Tuple containing the following.
ln_bf12: Estimate of log_e of the Bayes factor ln ( Z_1 / Z_2 ).
ln_bf12_std: Estimate of log_e of the standard deviation of the Bayes factor ln ( sqrt( var ( Z_1 / Z_2 ) ) ).
- Return type
(double, double)
- Raises
ValueError – Raised if model 1 does not have chains added.
ValueError – Raised if model 2 does not have chains added.
- class harmonic.model.HyperSphere(ndim_in, domains, hyper_parameters=None)
HyperSphere Model to approximate the log_e posterior by a hyper-ellipsoid.
- __init__(ndim_in, domains, hyper_parameters=None)
Constructor setting the parameters of the model.
- Parameters
ndim_in (long) – Dimension of the problem to solve.
domains (list) – A list of length 1 containing a 1D array of length 2 containing the lower and upper bound of the radius of the hyper-sphere.
hyper_parameters (None) – Should not be set as there are no hyper-parameters for this model (in general, however, models can have hyper-parameters).
- Raises
ValueError – If the hyper_parameters variable is not None.
ValueError – If the length of domains list is not one.
ValueError – If the ndim_in is not positive.
- fit(X, Y)
Fit the parameters of the model (i.e. its radius).
- Parameters
X (double ndarray[nsamples, ndim]) – Sample x coordinates.
Y (double ndarray[nsamples]) – Target log_e posterior values for each sample in X.
- Returns
A tuple containing the following objects.
success (bool): Whether fit successful.
objective (double): Value of objective at optimal point.
- Return type
(bool, double)
- Raises
ValueError – Raised if the first dimension of X is not the same as Y.
ValueError – Raised if the second dimension of X is not the same as ndim.
- is_fitted()
Specify whether model has been fitted.
- Returns
Whether the model has been fitted.
- Return type
(bool)
- predict(x)
Use model to predict the value of log_e posterior at point x.
- Parameters
x (double) – Sample at which to predict posterior value.
- Returns
Predicted posterior value.
- Return type
(double)
- set_R(R)
Set the radius of the hyper-sphere and calculate its volume.
- Parameters
R (double) – The radius of the hyper-sphere.
- Raises
ValueError – If the radius is a NaN.
ValueError – If the radius is not positive.
- set_centre(centre_in)
Set centre of the hyper-sphere.
- Parameters
centre_in (double ndarray[ndim]) – Centre of sphere.
- Raises
ValueError – If the length of the centre array is not the same as ndim.
ValueError – If the centre array contains a NaN.
- set_inv_covariance(inv_covariance_in)
Set diagonal inverse covariances for the hyper-sphere.
Only diagonal covariance structure is supported.
- Parameters
inv_covariance_in (double ndarray[ndim]) – Diagonal of inverse covariance matrix that defines the ellipse.
- Raises
ValueError – If the length of the inv_covariance array is not equal to ndim.
ValueError – If the inv_covariance array contains a NaN.
ValueError – If the inv_covariance array contains a value that is not positive.
- set_precomputed_values()
Precompute volume of the hyper-sphere (scaled ellipse) and squared radius.
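The volume in question follows from the standard formula for an n-dimensional ball, scaled by the semi-axes of the ellipsoid. The sketch below assumes the region is defined by sum_i inv_covariance[i] * (x[i] - centre[i])^2 <= R^2, so each semi-axis is R / sqrt(inv_covariance[i]). It also assumes, as an illustration only, that the model density is uniform inside the region. Both function names are hypothetical.

```python
import math


def ln_ellipsoid_volume(R, inv_covariance):
    """log_e volume of the axis-aligned ellipsoid with radius R."""
    ndim = len(inv_covariance)
    # Unit n-ball volume: pi^(n/2) / Gamma(n/2 + 1)
    ln_unit_ball = 0.5 * ndim * math.log(math.pi) - math.lgamma(0.5 * ndim + 1.0)
    # Product of semi-axes R / sqrt(inv_covariance[i]), in log space
    ln_axes = sum(math.log(R) - 0.5 * math.log(ic) for ic in inv_covariance)
    return ln_unit_ball + ln_axes


def ln_uniform_density(x, centre, R, inv_covariance):
    """log_e of a normalised uniform density on the ellipsoid (assumed form)."""
    r_squared = sum(ic * (xi - ci) ** 2
                    for xi, ci, ic in zip(x, centre, inv_covariance))
    if r_squared < R * R:
        return -ln_ellipsoid_volume(R, inv_covariance)
    return -math.inf


# A unit circle has area pi.
print(math.exp(ln_ellipsoid_volume(1.0, [1.0, 1.0])))
```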
- class harmonic.model.KernelDensityEstimate(ndim, domains, hyper_parameters=[0.1])
KernelDensityEstimate model to approximate the log_e posterior using kernel density estimation.
- __init__(ndim, domains, hyper_parameters=[0.1])
Constructor setting the hyper-parameters and domains of the model.
- Parameters
ndim (long) – Dimension of the problem to solve.
domains (list) – List of length 0 since domain not considered for Kernel Density Estimation.
hyper_parameters (list) – A list of length 1 containing the diameter in scaled units of the hyper-spheres to use in the Kernel Density Estimate.
- Raises
ValueError – If the hyper_parameters list is not length 1.
ValueError – If the length of domains list is not 0.
ValueError – If ndim is not positive.
- fit(X, Y)
Fit the parameters of the model.
Fit is performed as follows.
1. Set the scales of the model from the samples.
2. Create the dictionary containing all the information on which samples are in which pixel in a grid where each pixel size is the same as the diameter of the hyper-spheres to be placed on each sample. The key is an index of the grid (C-type ordering) and the value is a list containing the indexes in the sample array of all the samples in that pixel.
3. Precompute the normalisation factor.
- Parameters
X (double ndarray[nsamples, ndim]) – Sample x coordinates.
Y (double ndarray[nsamples]) – Target log_e posterior values for each sample in X.
- Returns
Whether fit successful.
- Return type
(bool)
- Raises
ValueError – Raised if the first dimension of X is not the same as Y.
ValueError – Raised if the second dimension of X is not the same as ndim.
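The grid dictionary described in the fit procedure can be sketched as below, assuming samples are rescaled to the unit interval in each dimension using the per-dimension minimum and maximum, and that the pixel width equals the diameter in those scaled units. The function `build_grid` and its arguments are hypothetical, and plain lists stand in for ndarrays.

```python
import math


def build_grid(samples, low, high, diameter):
    """Map a flattened (C-ordered) pixel index to the sample indexes it contains.

    Assumes high[d] > low[d] in every dimension.
    """
    ndim = len(low)
    npix = int(math.ceil(1.0 / diameter))  # pixels per dimension
    grid = {}
    for i_sample, x in enumerate(samples):
        key = 0
        for d in range(ndim):
            scaled = (x[d] - low[d]) / (high[d] - low[d])  # in [0, 1]
            pix = min(int(scaled / diameter), npix - 1)    # clamp the upper edge
            key = key * npix + pix  # C ordering: last dimension varies fastest
        grid.setdefault(key, []).append(i_sample)
    return grid


samples = [[0.0, 0.0], [0.05, 0.05], [0.95, 0.95], [1.0, 0.0]]
grid = build_grid(samples, low=[0.0, 0.0], high=[1.0, 1.0], diameter=0.5)
print(grid)  # {0: [0, 1], 3: [2], 2: [3]}
```

With such a dictionary, a prediction at a point only needs to inspect the samples in its own pixel and the neighbouring pixels, rather than every sample.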
- is_fitted()
Specify whether model has been fitted.
- Returns
Whether the model has been fitted.
- Return type
(bool)
- precompute_normalising_factor(X)
Precompute the log_e normalisation factor of the density estimation.
- Parameters
X (double ndarray[nsamples, ndim]) – Sample x coordinates.
- Raises
ValueError – Raised if the second dimension of X is not the same as ndim.
- predict(x)
Predict the value of the posterior at point x.
- Parameters
x (double ndarray[ndim]) – 1D array of sample of shape (ndim) to predict posterior value.
- Returns
Predicted log_e posterior value.
- Return type
(double)
- set_scales(X)
Set the scales of the hyper-spheres based on the min and max sample in each dimension.
- Parameters
X (double ndarray[nsamples, ndim]) – Sample x coordinates.
- Raises
ValueError – Raised if the second dimension of X is not the same as ndim.
- class harmonic.model.ModifiedGaussianMixtureModel(ndim, domains, hyper_parameters=[3, 1e-08, None, None, None])
ModifiedGaussianMixtureModel (MGMM) to approximate the log_e posterior by a modified Gaussian mixture model.
- __init__(ndim, domains, hyper_parameters=[3, 1e-08, None, None, None])
Constructor setting the hyper-parameters and domains of the model of the MGMM which models the posterior as a group of Gaussians.
- Parameters
ndim (long) – Dimension of the problem to solve.
domains (list) – A list of length 1 with the range of scale parameter of the covariance matrix, i.e. the range of alpha, where C’ = alpha * C_samples, and C_samples is the diagonal of the covariance in the samples in each cluster.
hyper_parameters (list) – A list of length 5, the first of which should be number of clusters, the second is the regularisation parameter gamma, the third is the learning rate, the fourth is the maximum number of iterations and the fifth is the batch size.
- Raises
ValueError – Raised if the hyper_parameters list is not length 5.
ValueError – Raised if the length of domains list is not 1.
ValueError – Raised if the ndim is not positive.
- fit(X, Y)
Fit the parameters of the model as follows.
If centres and inv_covariances are not set:
- Find clusters using the k-means clustering from scikit-learn.
- Use the samples in the clusters to find the centres and covariance matrices.
Then minimize the objective function using the gradients and mini-batch stochastic gradient descent.
- Parameters
X (double ndarray[nsamples, ndim]) – Sample x coordinates.
Y (double ndarray[nsamples]) – Target log_e posterior values for each sample in X.
- Returns
Whether fit successful.
- Return type
(bool)
- Raises
ValueError – Raised if the first dimension of X is not the same as Y.
ValueError – Raised if the second dimension of X is not the same as ndim.
- is_fitted()
Specify whether model has been fitted.
- Returns
Whether the model has been fitted.
- Return type
(bool)
- predict(x)
Predict the value of the posterior at point x.
- Parameters
x (double ndarray[ndim]) – Sample of shape (ndim) at which to predict posterior value.
- Returns
Predicted log_e posterior value.
- Return type
(double)
- set_alphas(alphas_in)
Set the alphas (i.e. scales).
- Parameters
alphas_in (double ndarray[ngaussians]) – Alpha scalings.
- Raises
ValueError – Raised if the input array length is not ngaussians.
ValueError – Raised if the input array contains a NaN.
ValueError – Raised if at least one of the alphas not positive.
- set_centres(centres_in)
Set the centres of the Gaussians.
- Parameters
centres_in (double ndarray[ndim, ngaussians]) – Centres.
- Raises
ValueError – Raised if the input array is not the correct shape.
ValueError – Raised if the input array contains a NaN.
- set_centres_and_inv_covariance(centres_in, inv_covariance_in)
Set the centres and inverse covariance of the Gaussians.
- Parameters
centres_in (double ndarray[ndim, ngaussians]) – Centres.
inv_covariance_in (double ndarray[ndim, ngaussians]) – Inverse covariance of the Gaussians.
- Raises
ValueError – Raised if the input arrays are not the correct shape.
ValueError – Raised if the input arrays contain a NaN.
ValueError – Raised if the input covariance contains a number that is not positive.
- set_inv_covariance(inv_covariance_in)
Set the inverse covariance of the Gaussians.
- Parameters
inv_covariance_in (double ndarray[ndim, ngaussians]) – Inverse covariance of the Gaussians.
- Raises
ValueError – Raised if the input array is not the correct shape.
ValueError – Raised if the input array contains a NaN.
ValueError – Raised if the input array contains a number that is not positive.
- set_weights(weights_in)
Set the weights of the Gaussians.
The weights are the softmax of the betas (without normalisation), i.e. the betas are the log_e of the weights.
- Parameters
weights_in (double ndarray[ngaussians]) – 1D array containing the weights (no need to normalise).
- Raises
ValueError – Raised if the input array length is not ngaussians.
ValueError – Raised if the input array contains a NaN.
ValueError – Raised if at least one of the weights is negative.
ValueError – Raised if the sum of the weights is too close to zero.
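The relation between weights and betas can be checked numerically: taking betas as the log_e of (unnormalised) weights, the softmax of the betas recovers the normalised weights. The helper names and the `1e-8` threshold below are hypothetical choices for illustration, and strictly positive weights are used so that the logarithm is defined.

```python
import math


def normalise_weights(weights_in, tol=1e-8):
    """Validate and normalise weights (tol is an assumed threshold)."""
    if any(math.isnan(w) for w in weights_in):
        raise ValueError("weights contain a NaN")
    if any(w < 0.0 for w in weights_in):
        raise ValueError("weights must be non-negative")
    total = sum(weights_in)
    if total < tol:
        raise ValueError("sum of weights is too close to zero")
    return [w / total for w in weights_in]


def softmax(betas):
    """Numerically stable softmax."""
    b_max = max(betas)
    exps = [math.exp(b - b_max) for b in betas]
    norm = sum(exps)
    return [e / norm for e in exps]


weights_in = [1.0, 3.0]
betas = [math.log(w) for w in weights_in]
print(normalise_weights(weights_in), softmax(betas))
```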
- harmonic.logs.critical_log(message)
Log a critical message (e.g. core code failures etc).
- Parameters
message – Message to log.
- harmonic.logs.debug_log(message)
Log a debug message (e.g. for background logs to assist debugging).
- Parameters
message – Message to log.
- harmonic.logs.info_log(message)
Log an information message (e.g. evidence value printing, run completion etc).
- Parameters
message – Message to log.
- harmonic.logs.setup_logging(custom_yaml_path=None, default_level=10)
Initialise and configure logging.
Should be called at the beginning of code to initialise and configure the desired logging level. Logging levels can be ints in [0,50] where 10 is debug logging and 50 is critical logging.
- Parameters
custom_yaml_path (string) – Complete pathname of desired yaml logging configuration. If empty will provide default logging config.
default_level (int) – Logging level at which to configure.
- Raises
ValueError – Raised if logging.yaml is not in ./logs/ directory.
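The integer levels mentioned above coincide with those of Python's standard logging module, which the following sketch illustrates. The logger name `harmonic_sketch` is hypothetical; harmonic's own logger and its YAML configuration are not reproduced here.

```python
import logging

# Hypothetical logger used only for this illustration.
logger = logging.getLogger("harmonic_sketch")
logger.setLevel(logging.DEBUG)  # logging.DEBUG == 10, the documented default_level
logger.addHandler(logging.StreamHandler())

logger.debug("background detail to assist debugging")   # level 10
logger.info("evidence value or run completion")         # level 20
logger.warning("e.g. large dynamic range encountered")  # level 30
logger.critical("core code failure")                    # level 50
```

Raising the level (for example to 30) would suppress the debug and info messages while keeping warnings and critical messages.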
- harmonic.logs.warning_log(message)
Log a warning (e.g. for internal code warnings such as large dynamic ranges).
- Parameters
message – Warning to log.
- harmonic.utils.cross_validation(chains, domains, hyper_parameters, nfold=2, modelClass=<class 'harmonic.model.KernelDensityEstimate'>, seed=-1)
Perform n-fold validation for given model using chains to be split into validation and training data.
First, splits data into nfold chunks. Second, fits the model using each of the hyper-parameters given using all but one of the chunks (the validation chunk). This procedure is performed for all the chunks and the average (mean) log-space variance from all the chunks is computed and returned. This can be used to decide which hyper-parameters list was better.
- Parameters
chains (Chains) – Chains containing samples (to be split into training and validation data herein).
domains (list) – Domains of the model’s parameters.
hyper_parameters (list) – List of hyper_parameters where each entry is a hyper_parameter list to be considered.
nfold (long) – Number of fold validation sets to be made (default = 2).
modelClass (Model) – Model that is being cross validated (default = KernelDensityEstimate).
seed (long) – Seed for random number generator when drawing the chains (if this is negative the seed is not set).
- Returns
Mean log validation variance (averaged over nfolds) for each hyper-parameter.
- Return type
(list)
- Raises
ValueError – Raised if model is not one of the possible models.
- harmonic.utils.split_data(chains, training_proportion=0.5)
Split the data in a chains instance into two (e.g. training and test sets).
New chains instances can be used for training and for calculating the evidence on the “test” set.
Chains are split so that the first chains in the original chains object go into the training set and the following go into the test set.
- Parameters
chains (Chains) – Instance of a chains class containing the data to be split.
training_proportion (double) – Proportion of data to be used in training (default=0.5)
- Returns
A tuple containing the following two Chains.
chains_train (Chains): Instance of a chains class containing chains to be used to fit the model (e.g. training).
chains_test (Chains): Instance of a chains class containing chains to be used to calculate the evidence (e.g. testing).
- Return type
(Chains, Chains)
- Raises
ValueError – Raised if training_proportion is not strictly between 0 and 1.
ValueError – Raised if resulting nchains in training set is less than 1.
ValueError – Raised if resulting nchains in test set is less than 1.
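The splitting rule can be sketched in terms of chain indexes, assuming the number of training chains is obtained by truncating nchains * training_proportion. That rounding rule and the function name `split_chain_indexes` are assumptions made for illustration.

```python
def split_chain_indexes(nchains, training_proportion=0.5):
    """Return (training indexes, test indexes); the first chains go to training."""
    if not 0.0 < training_proportion < 1.0:
        raise ValueError("training_proportion must be strictly between 0 and 1")
    nchains_train = int(nchains * training_proportion)  # assumed truncation
    nchains_test = nchains - nchains_train
    if nchains_train < 1:
        raise ValueError("training set would contain no chains")
    if nchains_test < 1:
        raise ValueError("test set would contain no chains")
    return list(range(nchains_train)), list(range(nchains_train, nchains))


train_indexes, test_indexes = split_chain_indexes(10, 0.5)
print(train_indexes, test_indexes)  # [0, 1, 2, 3, 4] [5, 6, 7, 8, 9]
```

Index lists of this kind have the same form as the chains_wanted argument of get_sub_chains.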
- harmonic.utils.validation_fit_indexes(i_fold, nchains_in_val_set, nfold, indexes)
Extract the correct indexes for the chains of the validation and training sets.
- Parameters
i_fold (long) – Cross-validation iteration to perform.
nchains_in_val_set (long) – The number of chains that will go in each validation set.
nfold (long) – Number of fold validation sets to be made.
indexes (list) – List of the chains to be used in fold validation that need to be split.
- Returns
A tuple containing the following two lists of indices.
indexes_val (list): List of indexes for the validation set.
indexes_fit (list): List of indexes for the training set.
- Return type
(list, list)
- Raises
ValueError – Raised if the value of i_fold does not fall between 0 and nfold-1.
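One plausible way to carve a (shuffled) list of chain indexes into validation and training parts for a given fold is sketched below. It assumes contiguous slices of nchains_in_val_set indexes per fold, with the last fold taking any leftover chains. The function name `fold_indexes_sketch` and that leftover rule are assumptions, not harmonic's implementation.

```python
def fold_indexes_sketch(i_fold, nchains_in_val_set, nfold, indexes):
    """Return (indexes_val, indexes_fit) for cross-validation fold i_fold."""
    if i_fold < 0 or i_fold >= nfold:
        raise ValueError("i_fold must be between 0 and nfold - 1")
    start = i_fold * nchains_in_val_set
    stop = start + nchains_in_val_set
    if i_fold == nfold - 1:
        stop = len(indexes)  # assumed: the last fold absorbs leftover chains
    indexes_val = indexes[start:stop]
    indexes_fit = indexes[:start] + indexes[stop:]
    return indexes_val, indexes_fit


indexes = [3, 0, 4, 1, 2]  # e.g. a random permutation of five chains
for i_fold in range(2):
    print(fold_indexes_sketch(i_fold, 2, 2, indexes))
# ([3, 0], [4, 1, 2])
# ([4, 1, 2], [3, 0])
```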