Parameter estimation¶
colfi.nde¶
- class colfi.nde.NDEs(obs_data, model, param_names, params_dict=None, cov_matrix=None, init_chain=None, init_params=None, nde_type='MNN', num_train=3000, num_vali=100, base_N=1000, space_type='hyperellipsoid', local_samples=None, chain_n=3, chain_leng=10000)[source]¶
Bases: PlotPosterior
Estimating (cosmological) parameters with Neural Density Estimators (NDEs).
- Parameters:
obs_data (array-like or list) – The observations (measurements) with shape (obs_length,3), or a list of observations with shape [(obs_length_1,3), (obs_length_2,3), …]. The first column is the observational variable, the second column is the best values of the measurement, and the third column is the error of the measurement.
model (cosmological (or theoretical) model instance) – A cosmological (or theoretical) model instance that is used to simulate the training set. It should contain a ‘simulate’ method that accepts cosmological parameters as input. If you use local data sets, it should also contain ‘load_params’ and ‘load_sample’ methods.
param_names (list) – A list which contains the parameter names, e.g. [‘H0’,’ombh2’,’omch2’].
params_dict (dict or None, optional) – Information of cosmological parameters that include the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None
cov_matrix (array-like, list, or None, optional) – Covariance matrix of the observational data. It should be an array with shape (obs_length, obs_length), or a list of covariance matrices with shape [(obs_length_1, obs_length_1), (obs_length_2, obs_length_2), …]. If there is no covariance for some observations, the corresponding covariance matrix should be set to None, e.g. [cov_matrix_1, None, cov_matrix_3]. Default: None
init_chain (None or array-like, optional) – The initial ANN or MCMC chain, which is usually based on a previous parameter estimation. Default: None
init_params (None or array-like, optional) – The initial settings of the parameter space. If init_chain is given, init_params will be ignored. Default: None
nde_type (str or list, optional) – A string (or a list of two strings) that indicates which NDE should be used. Four NDEs can be used: ‘ANN’, ‘MDN’, ‘MNN’, or ‘ANNMC’. If a string is given, such as ‘ANN’, only ANN will be used for parameter estimation. If a list that contains two NDEs is given, such as [‘ANN’, ‘MNN’], then ANN will be used in the burn-in phase to find the burn-in end step, and MNN will be used after the burn-in phase to obtain the posterior. Default: ‘MNN’
num_train (int, optional) – The number of samples of the training set. Default: 3000
num_vali (int, optional) – The number of samples of the validation set. Default: 100
base_N (int, optional) – The basic (or minimum) number of samples in the training set, which works only in the burn-in phase. Default: 1000
space_type (str, optional) – The type of parameter space. It can be ‘hypercube’, ‘LHS’, ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: ‘hyperellipsoid’
local_samples (None, str, or list, optional) – Path of local samples, None, or ‘sample’ or [‘sample’] or [‘sample_1’, ‘sample_2’, …]. If None, no local samples are used. Default: None
chain_n (int, optional) – The number of ANN chains to be obtained, which also equals the number of steps after the burn-in phase. It is used to stop the whole training process and only takes effect after the burn-in phase. Default: 3
chain_leng (int, optional) – The length of each ANN chain. Default: 10000
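The layout expected for obs_data and cov_matrix can be sketched as follows. This is only an illustration with made-up numbers and plain Python lists (in practice these would usually be arrays); the helper `shape` and the diagonal covariance built from the errors are assumptions for the example, not part of colfi.

```python
# Illustrative only: synthetic numbers, plain lists instead of arrays.

# One data set: each row is [observational variable, measured value, error].
obs_1 = [
    [0.1, 70.5, 2.0],
    [0.5, 88.1, 3.5],
    [1.0, 120.3, 6.0],
]
obs_2 = [
    [0.2, 40.1, 0.3],
    [0.8, 43.6, 0.4],
]


def shape(rows):
    """Return (n_rows, n_cols) of a nested list, checking that rows are regular."""
    n_cols = len(rows[0])
    assert all(len(r) == n_cols for r in rows), "ragged rows"
    return (len(rows), n_cols)


# Several data sets are passed as a list. The covariance matrices follow the
# same order, with None for a data set that has no covariance.
obs_data = [obs_1, obs_2]

# Hypothetical covariance for obs_1: diagonal, built from the quoted errors.
cov_1 = [
    [row[2] ** 2 if i == j else 0.0 for j in range(len(obs_1))]
    for i, row in enumerate(obs_1)
]
cov_matrix = [cov_1, None]

for obs, cov in zip(obs_data, cov_matrix):
    n, n_cols = shape(obs)
    assert n_cols == 3                 # (obs_length, 3)
    if cov is not None:
        assert shape(cov) == (n, n)    # (obs_length, obs_length)
```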
- Variables:
activation_func (str, optional) – The name of the activation function, which can be ‘ReLU’, ‘LeakyReLU’, ‘PReLU’, ‘RReLU’, ‘ReLU6’, ‘ELU’, ‘CELU’, ‘SELU’, ‘SiLU’, ‘Sigmoid’, ‘LogSigmoid’, ‘Tanh’, ‘Tanhshrink’, ‘Softsign’, or ‘Softplus’ (see activation()). Default: ‘Softplus’
hidden_layer (int, optional) – The number of hidden layers of the network (for a single branch network). Default: 3
branch_hiddenLayer (int, optional) – The number of hidden layers for the branch part of the network (for a multibranch network). Default: 1
trunk_hiddenLayer (int, optional) – The number of hidden layers for the trunk part of the network (for a multibranch network). Default: 2
lr (float, optional) – The learning rate setting of the network. Default: 1e-2
lr_min (float, optional) – The minimum of the learning rate. Default: 1e-8
batch_size (int, optional) – The batch size setting of the network. Default: 1250
auto_batchSize (bool, optional) – If True, the batch size will be set automatically in the training process; otherwise, the setting of batch_size is used. Default: True
epoch (int, optional) – The number of epochs of the training process. Default: 2000
epoch_branch (int, optional) – The number of epochs of the training process (for the branch part of the multibranch network). Default: 2000
auto_epoch (bool, optional) – If True, the epoch will be set automatically in the training process; otherwise, the setting of epoch is used. Default: False
comp_type (str, optional) – The name of the component used in the MDN method, which can be ‘Gaussian’ or ‘Beta’. Default: ‘Gaussian’
comp_n (int, optional) – The number of components used in the MDN method. Default: 3
spaceSigma (int or array-like, optional) – The size of parameter space to be learned. It is an int or a numpy array with shape (n,), where n is the number of parameters, e.g. for spaceSigma=5, the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5
noise_type (str, optional) – The type of Gaussian noise added to the training set, which should be ‘singleNormal’ or ‘multiNormal’. It only works for the NDEs ANN, MDN, and MNN. For ANN and MNN, both ‘singleNormal’ and ‘multiNormal’ can be used, but ‘multiNormal’ is recommended; for MDN, only ‘singleNormal’ can be used. Default: ‘multiNormal’
factor_sigma (float, optional) – For noise_type = ‘singleNormal’, factor_sigma should be set to 1. For noise_type = ‘multiNormal’, it is the standard deviation of the coefficient of the observational error (standard deviation). Default: 0.2
multi_noise (int, optional) – The number of realizations of noise added to the measurement in one epoch. Default: 5
scale_obs (bool, optional) – If True, the observational data (measurements) will be scaled based on the base values of the data. Default: False
scale_params (bool, optional) – If True, the cosmological parameters will be scaled based on the base values of parameters. See ParamsScaling. Default: True
norm_obs (bool, optional) – If True, the observational data fed to the network will be normalized. Default: True
norm_params (bool, optional) – If True, the cosmological parameters will be normalized. Default: True
independent_norm_obs (bool, optional) – If True, each data point in the observational data (measurements) will be normalized independently. Default: False
independent_norm_params (bool, optional) – If True, each cosmological parameter will be normalized independently. Default: True
norm_type (str, optional) – The method of normalization, which can be ‘z_score’, ‘minmax’, or ‘mean’ (see Normalize). Default: ‘z_score’
train_branch (bool, optional) – If True, the branch part of the multibranch network will be trained before training the entire network. Default: False
repeat_n (int, optional) – The number of iterations using the same batch of data during network training, which is usually set to 1 or 3. Default: 3
fast_training (bool, optional) – If True, the batch size will be set to batch_size*multi_noise and the network will be trained faster. Default: False
randn_num (float or str, optional) – A random number that identifies the saved results. Default: float
file_identity (str, optional) – A string that identifies the files saved to the disk, which is useful to identify the saved files. Default: ‘’
expectedBurnInEnd_step (int, optional) – The expected burn-in end step. If the burn-in phase does not end at a step equal to expectedBurnInEnd_step, the training process will be terminated, which means the setting of hyperparameters is not good or the NDE used is not suitable.
chain_true_path (str, optional) – The path of the true chain of the posterior which can be obtained by using other methods, such as the MCMC method. Note: only .npy and .txt files are supported. Default: ‘’
label_true (str, optional) – The legend label of the true chain. Default: ‘True’
fiducial_params (list, optional) – A list that contains the fiducial cosmological parameters. Default: []
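As a rough illustration of two of the norm_type options, the sketch below uses the textbook definitions of z-score and min-max normalization on a single column of values. The function names are hypothetical, and colfi's own Normalize class may differ in details (for example, in which standard deviation convention it uses); the ‘mean’ option is not shown.

```python
import statistics


def z_score(values):
    """Textbook z-score: subtract the mean, divide by the (population) std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def minmax(values):
    """Textbook min-max scaling onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Made-up values of a single parameter across a small training set.
h0_values = [1.0, 2.0, 3.0]
z = z_score(h0_values)
m = minmax(h0_values)
```

With independent normalization, a transform like this would be applied column by column (one parameter or one data point at a time) rather than once over the whole array.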
Note
The number of samples of the training set should be large enough to ensure the network learns a reliable mapping. For example, set num_train to 1000, or a larger value like 3000.
The epoch should also be set large enough to ensure a well-learned network. e.g. set epoch to 2000, or a larger value like 3000.
The initial parameter space should be set large enough to cover the true parameters. In this case, it will be easier for the network to find the best-fit values of the parameters.
It is better to set the number of ANN chains chain_n to a larger value like 3, which will minimize the effect of randomness on the results. However, it is also acceptable to set a smaller value like 1.
The advantage of this method is that we can analyze the results before the end of the training process, and determine how many steps can be used to estimate parameters.
Local samples can be used as training set to save time, so when using this method, you can generate a sample library for later reuse.
- property base_epoch¶
- property chain¶
Combined ANN chain using the result of steps after burn-in.
- property cov_copy¶
- property file_identity_str¶
- property good_chains¶
The ANN chains after the burn-in phase.
- property good_losses¶
- property num_train_burnIn¶
- property obs_errors¶
- property obs_variables¶
- simulate(nde_type, space_type, step, burnInEnd, param_devs, error_devs, spaceSigma_all, space_type_all=[], prev_space=None, chain_all=[], sim_obs=None, sim_params=None)[source]¶
Simulate training data.
- split_data(sim_obs, sim_params, burnInEnd=False)[source]¶
Split the simulated data into training set and validation set.
- class colfi.nde.Predict(obs_data=None, cov_matrix=None, path='ann', randn_num=<class 'float'>, steps_n=None)[source]¶
Bases: PlotPosterior
Reanalysis using the saved chains or the well-trained NDEs.
- Parameters:
obs_data (array-like, list, or None, optional) – The observations (measurements) with shape (obs_length,3), or a list of observations with shape [(obs_length_1,3), (obs_length_2,3), …]. The first column is the observational variable, the second column is the best values of the measurement, and the third column is the error of the measurement. If None, only the saved chains can be used for parameter estimations, and will not check variables. Default: None
cov_matrix (array-like, list, or None, optional) – Covariance matrix of the observational data. It should be an array with shape (obs_length, obs_length), or a list of covariance matrix with shape [(obs_length_1, obs_length_1), (obs_length_2, obs_length_2), …]. If there is no covariance for some observations, the covariance matrix should be set to None. e.g. [cov_matrix_1, None, cov_matrix_3]. Default: None
path (str, optional) – The path of the results saved. Default: ‘ann’
randn_num (float or str, optional) – A random number that identifies the saved results. Default: float
steps_n (None or int, optional) – The number of steps of the training process to be used. If None, the files will be found automatically. Default: None
- Variables:
chain_leng (int, optional) – The length of each ANN chain, which equals the number of samples to be generated by an NDE model when predicting an ANN chain. This only works when using the from_net() method. Default: 10000
chain_true_path (str, optional) – The path of the true chain of the posterior which can be obtained by using other methods, such as the MCMC method. Note: only .npy and .txt files are supported. Default: ‘’
label_true (str, optional) – The legend label of the true chain. Default: ‘True’
fiducial_params (list, optional) – A list that contains the fiducial cosmological parameters. Default: []
show_idx (None or list, optional) – The index of cosmological parameters when plotting contours. This allows us to change the order of the cosmological parameters. If None, the order of parameters follows that in the ANN chain. If list, the minimum value of it should be 1. See PlotPosterior. Default: None
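The 1-based convention of show_idx can be sketched as below. The helper `reorder_columns` is hypothetical and only illustrates one reading of the description (each entry selects a parameter column, counting from 1); it is not how colfi necessarily applies the index internally.

```python
def reorder_columns(chain, show_idx=None):
    """Reorder the parameter columns of a chain using 1-based indices."""
    if show_idx is None:
        # Keep the order of parameters as stored in the chain.
        return [list(row) for row in chain]
    if min(show_idx) < 1:
        raise ValueError("show_idx is 1-based; its minimum value should be 1")
    return [[row[i - 1] for i in show_idx] for row in chain]


# Two made-up samples of (H0, ombh2, omch2).
chain = [[67.4, 0.0224, 0.120], [68.0, 0.0223, 0.119]]

# Show omch2 first, then H0, then ombh2.
reordered = reorder_columns(chain, show_idx=[3, 1, 2])
```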
- property cov_copy¶
- from_chain()[source]¶
Predict using saved chains.
- Raises:
ValueError – If variables of the input observational data are different from those used to train the NDE, an error will be raised.
- from_net()[source]¶
Predict using saved NDEs.
- Raises:
ValueError – If variables of the input observational data are different from those used to train the NDE, an error will be raised.
- property obs_variables¶
- property same_variables¶
- property trained_variables¶
- class colfi.nde.PredictNDEs(path='ann', randn_nums=[1.123, 1.123])[source]¶
Bases: PlotMultiPosterior
Reanalysis using the saved chains for several NDE results.
- Parameters:
- Variables:
chain_true_path (str, optional) – The path of the true chain of the posterior which can be obtained by using other methods, such as the MCMC method. Note: only .npy and .txt files are supported. Default: ‘’
label_true (str, optional) – The legend label of the true chain. Default: ‘True’
show_idx (None or list, optional) – The index of cosmological parameters when plotting contours. This allows us to change the order of the cosmological parameters. If None, the order of parameters follows that in the ANN chain. If list, the minimum value of it should be 1. See PlotPosterior. Default: None
colfi.space_updater¶
- class colfi.space_updater.Chains[source]¶
Bases: object
- static bestFit(chain, best_type='mode', out_sigma=1, symmetry_error=True, bins=100, smooth=5)[source]¶
Get the best-fit parameters from the chain.
- Parameters:
chain (array-like) – The ANN chain.
best_type (str, optional) – The type of the best values of parameters, ‘mode’ or ‘median’. If ‘mode’, it will take the mode as the best value. If ‘median’, it will take the median as the best value. Default: ‘mode’
out_sigma (int) – The output sigma, which can be 1, 2, or 3. Default: 1
symmetry_error (bool, optional) – If True, obtain symmetrical errors, otherwise, obtain unsymmetrical errors. Default: True
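The difference between the two best_type options can be sketched for a single parameter column as follows. This is a minimal stand-alone illustration, not the bestFit implementation: the function `best_value` is hypothetical, the mode is taken as the center of the fullest histogram bin, and the smoothing implied by the smooth argument is not reproduced.

```python
import random
import statistics


def best_value(samples, best_type="mode", bins=100):
    """Return the median, or a histogram-based mode, of a 1-D list of samples."""
    if best_type == "median":
        return statistics.median(samples)
    if best_type != "mode":
        raise ValueError("best_type should be 'mode' or 'median'")
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        # The largest sample falls on the right edge; put it in the last bin.
        k = min(int((s - lo) / width), bins - 1)
        counts[k] += 1
    k_max = counts.index(max(counts))
    return lo + (k_max + 0.5) * width  # center of the fullest bin


# Synthetic chain column: Gaussian samples around 70 (made-up numbers).
rng = random.Random(0)
column = [rng.gauss(70.0, 2.0) for _ in range(20000)]
median = best_value(column, "median")
mode = best_value(column, "mode", bins=50)
```

For a symmetric, single-peaked posterior the two estimates agree closely; they differ for skewed posteriors, where the mode tracks the peak and the median tracks the center of probability mass.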
- static cov_matrix(chain, max_error=True, expand_factor=0)[source]¶
Get the covariance matrix.
- Parameters:
chain (array-like) – The ANN chain.
max_error (bool, optional) – If True, the diagonal elements of the covariance matrix will be replaced by the estimated maximum errors, which is useful for non-Gaussian distributions. If it is set to False, the obtained covariance matrix may be non-positive definite in some cases, so it is recommended to set it to True. Default: True
expand_factor (float, optional) – The expansion factor that is used to expand the error (the standard deviation) of each cosmological parameter. For example, if expand_factor=0.05, the error will be expanded by 5%. It only works when max_error is True. Default: 0
- Returns:
cov – The covariance matrix.
- Return type:
array-like
- static error_devs(chain_1, chain_true)[source]¶
Get the absolute values of the relative deviations of error of parameters obtained from two chains.
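One way to read "absolute values of the relative deviations of error" is |σ₁ − σ_true| / σ_true per parameter. The sketch below follows that reading, taking the "error" to be the sample standard deviation of each column; both the function name and that choice are assumptions, and colfi may instead use the errors returned by its own sigma estimate.

```python
import statistics


def relative_error_deviation(chain_1, chain_true):
    """Per-parameter |sigma_1 - sigma_true| / sigma_true for two chains."""
    n_params = len(chain_true[0])
    devs = []
    for j in range(n_params):
        sigma_1 = statistics.stdev([row[j] for row in chain_1])
        sigma_true = statistics.stdev([row[j] for row in chain_true])
        devs.append(abs(sigma_1 - sigma_true) / sigma_true)
    return devs


# Tiny made-up chains with one parameter each.
chain_true = [[0.0], [2.0]]
chain_est = [[0.0], [4.0]]   # twice the spread of chain_true
devs = relative_error_deviation(chain_est, chain_true)
```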
- static sigma(chain, best_values, out_sigma=1)[source]¶
Calculate the standard deviations.
- Parameters:
chain (array-like) – The ANN chain.
best_values (1-dimension array) – The best values of parameters.
out_sigma (int) – The output sigma, which can be 1, 2, or 3. Default: 1
- Returns:
sigma_1l, sigma_2l, sigma_3l (1-dimension array) – The left 1 sigma, 2 sigma, or 3 sigma deviations.
sigma_1r, sigma_2r, sigma_3r (1-dimension array) – The right 1 sigma, 2 sigma, or 3 sigma deviations.
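Left and right deviations around a best value can be illustrated with an equal-tailed interval, as in the sketch below. This is one common convention and an assumption here: the function `one_sided_deviations` is hypothetical, and the actual sigma method may define the left and right errors differently.

```python
def one_sided_deviations(samples, best_value, level=0.6827):
    """Left/right deviations of an equal-tailed interval around best_value."""
    ordered = sorted(samples)
    n = len(ordered)

    def quantile(q):
        # Linear interpolation between neighboring order statistics.
        pos = q * (n - 1)
        i = int(pos)
        frac = pos - i
        if i + 1 >= n:
            return ordered[-1]
        return ordered[i] * (1 - frac) + ordered[i + 1] * frac

    lower = quantile((1 - level) / 2)
    upper = quantile((1 + level) / 2)
    return best_value - lower, upper - best_value


# Made-up, uniformly spaced samples 0, 1, ..., 100 with best value 50.
samples = [float(v) for v in range(101)]
left, right = one_sided_deviations(samples, best_value=50.0, level=0.5)
```

Using level=0.6827, 0.9545, or 0.9973 would correspond to the 1, 2, or 3 sigma choices of out_sigma under this convention.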
- class colfi.space_updater.CheckParameterSpace[source]¶
Bases: object
- static check_limit(p_space, limit_space)[source]¶
Check the parameter space to ensure that the parameter space does not exceed its limit range.
- Parameters:
p_space (array-like) – The parameter space to be checked.
limit_space (array-like) – The limit range of parameter space.
- Returns:
A parameter space being limited by its limit range.
- Return type:
array-like
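The idea of limiting a parameter space can be sketched as a per-parameter clip. The layout assumed here (one [min, max] pair per parameter, with infinities for an unbounded side) and the function name `clip_to_limits` are illustrative assumptions, not the documented layout of p_space or limit_space.

```python
def clip_to_limits(p_space, limit_space):
    """Clip each [min, max] pair of p_space to the matching pair in limit_space."""
    clipped = []
    for (p_min, p_max), (l_min, l_max) in zip(p_space, limit_space):
        clipped.append([max(p_min, l_min), min(p_max, l_max)])
    return clipped


# Made-up ranges for two parameters.
p_space = [[50.0, 90.0], [-0.01, 0.05]]
# The second parameter must be non-negative but has no upper limit.
limit_space = [[60.0, 80.0], [0.0, float("inf")]]

limited = clip_to_limits(p_space, limit_space)
```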
- class colfi.space_updater.UpdateParameterSpace(step, param_names, chain_1, chain_0=None, init_params=None, spaceSigma=5, params_dict=None)[source]¶
Bases: CheckParameterSpace
Update parameter space.
- Parameters:
step (int) – The step number in the training process.
param_names (list) – A list that contains parameter names.
chain_1 (array-like) – The ANN chain of the i-th step, where \(i\geq2\).
chain_0 (None or array-like, optional) – The ANN chain of the (i-1)-th step, where \(i\geq2\). If step \(\leq2\), chain_0 should be set to None; otherwise, chain_0 should be an array. Default: None
init_params (None or array-like) – The initial settings of the parameter space. If chain_0 is given, init_params will be ignored. Default: None
spaceSigma (int or array-like, optional) – The size of parameter space to be learned. It is an int or a numpy array with shape (n,), where n is the number of parameters, e.g. for spaceSigma=5, the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5
params_dict (dict or None, optional) – Information of cosmological parameters that include the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None
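The meaning of spaceSigma can be illustrated by building a range of ±spaceSigma·σ around each best-fit value. The sketch assumes symmetric errors and a range centered on the best-fit value; the function `sigma_space` is hypothetical, and the class itself may build the space differently (for example, from asymmetric errors).

```python
def sigma_space(best_values, sigmas, space_sigma=5):
    """Return [best - k*sigma, best + k*sigma] for each parameter."""
    if isinstance(space_sigma, (int, float)):
        # A single number applies to every parameter.
        space_sigma = [space_sigma] * len(best_values)
    return [
        [b - k * s, b + k * s]
        for b, s, k in zip(best_values, sigmas, space_sigma)
    ]


# Made-up best-fit values and 1-sigma errors for two parameters.
best = [70.0, 0.30]
sigma = [2.0, 0.01]

space_5 = sigma_space(best, sigma, space_sigma=5)         # same width for both
space_mixed = sigma_space(best, sigma, space_sigma=[5, 3])  # one value per parameter
```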
- property error_devs¶
- property param_devs¶
- params_space()[source]¶
Obtain the parameter space to be learned from chain.
- Returns:
Limited parameter space.
- Return type:
array-like
- small_dev(limit_dev=0.01)[source]¶
A small value of deviation of parameters between two steps used to end the training process.
- property spaceSigma_all¶
- property spaceSigma_max¶
- property spaceSigma_min¶
- colfi.space_updater.get_CovMatrix(chain, params_n, best_values=None)[source]¶
Calculate covariance matrix from a chain.
- Parameters:
chain (array-like) – An ANN or MCMC chain with shape (N, M), where N is the number of samples in the chain and M is the number of parameters.
params_n (array-like) – The number of parameters.
best_values (array-like) – The best-fit values.
- Returns:
cov_matrix – Covariance matrix.
- Return type:
array-like
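For reference, the standard sample covariance of a chain with shape (N, M) can be computed as in the sketch below. It centers on the column means and divides by N − 1; how get_CovMatrix uses best_values is not described in the text, so that argument is not modeled here, and the function name is hypothetical.

```python
def covariance_matrix(chain):
    """Sample covariance (M x M) of a chain given as N rows of M parameters."""
    n = len(chain)
    m = len(chain[0])
    means = [sum(row[j] for row in chain) / n for j in range(m)]
    cov = [[0.0] * m for _ in range(m)]
    for row in chain:
        for i in range(m):
            d_i = row[i] - means[i]
            for j in range(m):
                cov[i][j] += d_i * (row[j] - means[j])
    # Unbiased estimate: divide the summed products by N - 1.
    return [[c / (n - 1) for c in cov_row] for cov_row in cov]


# Made-up chain: the second parameter is exactly twice the first.
chain = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = covariance_matrix(chain)
```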
colfi.plotter¶
- class colfi.plotter.BestFitsData(chain_all, chain, param_labels='', burnInEnd_step=None, nde_type_pair=['ANN', 'MNN'], show_initParams=False, init_params=<class 'float'>, chain_true=None, label_true='True', show_idx=None)[source]¶
Bases: object
Best-fit values of each step, used to plot steps in PlotPosterior.
- property bestFits_all¶
- property best_fit¶
- property best_fit_true¶
- class colfi.plotter.BestPredictedData(params_testSet, predParams_testSet, params_trainingSet=None, predParams_trainingSet=None, param_labels='', show_reErr=True, coef_type='R2')[source]¶
Bases: object
- class colfi.plotter.LossesData(good_losses, alpha=0.6, title_labels='', text_labels='', show_minLoss=True)[source]¶
Bases: object
Losses of the training set and validation set for steps after the burn-in phase, which are used to plot losses in PlotPosterior.
- class colfi.plotter.PlotHparamsEffect(fiducial_params, chain_mcmc=None, randn_nums=0.123, path='ann')[source]¶
Bases: object
- property xlabel¶
- class colfi.plotter.PlotMultiPosterior(chains, param_names, path='ann', nde_types=['ANN', 'MDN'], randn_nums=[1.123, 1.123], params_dict=None)[source]¶
Bases: PosteriorInfo
Plot posterior distribution for multiple ANN chains.
- Parameters:
chains (list) – The ANN chains obtained after burn-in phase.
param_names (list) – A list which contains the parameter names, e.g. [‘H0’,’ombh2’,’omch2’].
path (str, optional) – The path of the results saved. Default: ‘ann’
nde_types (list, optional) – A list that contains names of NDEs. Default: [‘ANN’,’MDN’]
randn_nums (list, optional) – A list that contains the random numbers which identify the saved results. Default: [1.123,1.123]
params_dict (dict or None, optional) – Information of cosmological parameters that include the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None
- Variables:
chain_true_path (str, optional) – The path of the true chain of the posterior which can be obtained by using other methods, such as the MCMC method. Note: only .npy and .txt files are supported. Default: ‘’
label_true (str, optional) – The legend label of the true chain. Default: ‘True’
fiducial_params (list, optional) – A list that contains the fiducial cosmological parameters. Default: []
show_idx (None or list, optional) – The index of cosmological parameters when plotting contours. This allows us to change the order of the cosmological parameters. If None, the order of parameters follows that in the ANN chain. If list, the minimum value of it should be 1. See PlotPosterior. Default: None
file_identity_str (str, optional) – A string that identifies the files saved to the disk, which is useful to identify the saved files. Default: ‘’
- property contour_name¶
- class colfi.plotter.PlotPosterior(chain_all, chain, param_names, path='ann', nde_type_pair=['ANN', 'MNN'], randn_num=1.234, burnInEnd_step=None, params_dict=None, good_losses=None)[source]¶
Bases: PosteriorInfo
Plot posterior distribution using the ANN chains.
- Parameters:
chain_all (list) – The ANN chains obtained in all steps.
chain (array-like) – The good ANN chain obtained after burn-in phase.
param_names (list) – A list which contains the parameter names, e.g. [‘H0’,’ombh2’,’omch2’].
path (str, optional) – The path of the results saved. Default: ‘ann’
nde_type_pair (list, optional) – A list that contains two NDEs: the first NDE is used to estimate parameters in the burn-in phase, and the second NDE is used to estimate parameters after the burn-in phase. Therefore, the first NDE is used to find the burn-in end step and the second NDE is used to obtain the posterior. Default: [‘ANN’,’MNN’]
randn_num (float or str, optional) – A random number that identifies the saved results. Default: 1.234
burnInEnd_step (None or int, optional) – The burn-in end step. Default: None
params_dict (dict or None, optional) – Information of cosmological parameters that include the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None
good_losses (None or list, optional) – The losses of training set and validation set after the burn-in phase. Default: None
- Variables:
chain_true_path (str, optional) – The path of the true chain of the posterior which can be obtained by using other methods, such as the MCMC method. Note: only .npy and .txt files are supported. Default: ‘’
label_true (str, optional) – The legend label of the true chain. Default: ‘True’
fiducial_params (list, optional) – A list that contains the fiducial cosmological parameters. Default: []
show_idx (None or list, optional) – The index of cosmological parameters when plotting contours. This allows us to change the order of the cosmological parameters. If None, the order of parameters follows that in the ANN chain. If list, the minimum value of it should be 1. See PlotPosterior. Default: None
file_identity_str (str, optional) – A string that identifies the files saved to the disk, which is useful to identify the saved files. Default: ‘’
- class colfi.plotter.PlotPrediction(params_testSet, predParams_testSet, param_names, params_trainingSet=None, predParams_trainingSet=None, params_dict=None, show_reErr=True, randn_num='', path='ann', nde_type='ANN', dataSet_type='testSet', coef_type='R2')[source]¶
Bases: object
Plot predicted cosmological parameters.
- Parameters:
params_testSet (array-like) – Cosmological parameters in the test set.
predParams_testSet (array-like) – Predicted cosmological parameters for the test set.
param_names (list) – A list which contains the parameter names, e.g. [‘H0’,’ombh2’,’omch2’].
params_trainingSet (array-like, optional) – Cosmological parameters in the training set. Default: None
predParams_trainingSet (array-like, optional) – Predicted cosmological parameters for the training set. Default: None
params_dict (dict or None, optional) – Information of cosmological parameters that include the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None
show_reErr (bool, optional) – If True, will calculate and show the best-fit values and standard deviations of the relative errors between the predicted parameters and the true parameters. Default: True
randn_num (float or str, optional) – A random number that identifies the saved results. Default: ‘’
path (str, optional) – The path of the results saved. Default: ‘ann’
nde_type (str, optional) – A string that indicates which NDE should be used. See NDEs. Default: ‘ANN’
dataSet_type (str, optional) – The type of the data set. Default: ‘testSet’
coef_type (str, optional) – A quantity that quantifies the degree of linear correlation, which can be Pearson correlation coefficient (‘r’) or coefficient of determination (‘R2’). Default: ‘R2’
show_idx (None or list, optional) – The index of cosmological parameters when plotting figures. This allows us to change the order of the cosmological parameters. If None, the order of parameters follows that in the training data. If list, the minimum value of it should be 1. See PlotPosterior. Default: None
file_identity_str (str, optional) – A string that identifies the files saved to the disk, which is useful to identify the saved files. Default: ‘’
- Return type:
None.
- class colfi.plotter.PosteriorInfo(param_names, path='ann', params_dict=None)[source]¶
Bases: object
Some information of NDEs, cosmological parameters, and chains, which will be used in PlotPosterior.
- property chain_true¶
- property param_labels¶
- colfi.plotter.R2(obs, pred)[source]¶
Coefficient of determination.
References: https://en.wikipedia.org/wiki/Coefficient_of_determination https://baike.baidu.com/item/%E5%8F%AF%E5%86%B3%E7%B3%BB%E6%95%B0/8020809?fromtitle=coefficient%20of%20determination&fromid=18081717&fr=aladdin https://doi.org/10.1093/mnras/stz010
obs: observed data
pred: predicted data
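For reference, the usual definition of the coefficient of determination is \(R^2 = 1 - SS_{res}/SS_{tot}\). The sketch below implements that textbook formula in plain Python; the name `r2_score` is chosen for the example and the code is not taken from colfi.plotter.

```python
def r2_score(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)          # total sum of squares
    return 1.0 - ss_res / ss_tot


perfect = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # predictions match exactly
baseline = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predicting the mean
```

A perfect prediction gives 1, while always predicting the mean of the observed data gives 0.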
- colfi.plotter.pcc(x, y)[source]¶
Pearson correlation coefficient https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
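The Pearson correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. A minimal plain-Python sketch of that standard formula follows; the name `pearson_r` is chosen for the example and the code is not taken from colfi.plotter.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r_pos = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # perfectly correlated
r_neg = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])   # perfectly anti-correlated
```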