Training sets

colfi.cosmic_params

class colfi.cosmic_params.ParamsProperty(param_names, params_dict=None)[source]

Bases: object

property labels
property param_fullNames
property params_limit
colfi.cosmic_params.params_dict_zoo()[source]

Information about the (cosmological) parameters, including their labels and physical limits: [label, limit_min, limit_max]

The labels are used to plot figures. The physical limits are used to ensure the simulated parameters have physical meaning.

Note

If the physical limits of a parameter are unknown, or the parameter has no physical limits, they should be set to np.nan.
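
A minimal sketch of what such a dictionary might look like, assuming the documented [label, limit_min, limit_max] layout. The keys, labels, and limits below are illustrative and are not taken from params_dict_zoo(); math.nan stands in for np.nan so the snippet needs only the standard library, and the helper within_limits is hypothetical.

```python
import math

# Hypothetical entries in the documented [label, limit_min, limit_max] layout.
# The keys and numbers are illustrative, not copied from params_dict_zoo().
example_params_dict = {
    "H0": [r"$H_0$", 0.0, math.nan],         # non-negative, no upper limit
    "omm": [r"$\Omega_{\rm m}$", 0.0, 1.0],  # bounded on both sides
    "w": [r"$w$", math.nan, math.nan],       # no physical limits
}


def within_limits(name, value, params_dict):
    """Return True if value respects the limits; NaN means 'no limit'."""
    _, lo, hi = params_dict[name]
    above_lo = math.isnan(lo) or value >= lo
    below_hi = math.isnan(hi) or value <= hi
    return above_lo and below_hi
```

A NaN limit is treated as "no constraint" on that side, which is the reading suggested by the note above.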

colfi.data_simulator

class colfi.data_simulator.AddGaussianNoise(obs, params=None, obs_errors=None, cholesky_factor=None, noise_type='multiNormal', factor_sigma=0.2, multi_noise=5, use_GPU=True)[source]

Bases: object

Add Gaussian noise to simulated data.

Parameters:
  • obs (torch tensor or a list of torch tensors) – The simulated observations (measurements) with shape (N, obs_length), or a list of observations with shapes [(N,obs_length_1), (N,obs_length_2), …]

  • params (torch tensor or None) – The simulated cosmological parameters. Default: None

  • obs_errors (torch tensor, a list of torch tensors, or None, optional) – Observational errors (standard deviations) with shape (obs_length,), or a list of errors with shapes [(obs_length_1,), (obs_length_2,), …]. If cholesky_factor is None, the observational errors should be given. Default: None

  • cholesky_factor (torch tensor, a list of torch tensors, or None, optional) – Cholesky factor of the covariance matrix with shape (obs_length, obs_length), or a list of Cholesky factors of covariance matrices with shapes [(obs_length_1, obs_length_1), (obs_length_2, obs_length_2), …]. If the Cholesky factor is given, obs_errors will be ignored. Default: None

  • noise_type (str, optional) – The type of Gaussian noise added to the training set, ‘singleNormal’ or ‘multiNormal’. Default: ‘multiNormal’

  • factor_sigma (float, optional) – For the case of noise_type = ‘singleNormal’, factor_sigma should be set to 1. For the case of noise_type = ‘multiNormal’, it is the standard deviation of the coefficient of the observational error (standard deviation). Default: 0.2

  • multi_noise (int, optional) – The number of noise realizations added to a measurement. Default: 5

  • use_GPU (bool, optional) – If True, the noise will be generated by GPU, otherwise, it will be generated by CPU. Default: True
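
The relationship between obs_errors and cholesky_factor can be sketched as follows. This is a pure-Python stand-in under stated assumptions, not the class's torch implementation: the function name add_gaussian_noise and its rng argument are hypothetical, and only the documented precedence rule (a Cholesky factor overrides obs_errors) is taken from the parameter list above.

```python
import random


def add_gaussian_noise(obs, obs_errors=None, cholesky_factor=None, rng=random):
    """Return one noisy realization of a single measurement (a list of floats)."""
    n = len(obs)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent N(0, 1) draws
    if cholesky_factor is not None:
        # Correlated noise: L z has covariance L L^T; obs_errors is ignored.
        noise = [sum(cholesky_factor[i][j] * z[j] for j in range(n))
                 for i in range(n)]
    elif obs_errors is not None:
        # Uncorrelated noise: scale each draw by its standard deviation.
        noise = [obs_errors[i] * z[i] for i in range(n)]
    else:
        raise ValueError("give obs_errors or cholesky_factor")
    return [obs[i] + noise[i] for i in range(n)]


rng = random.Random(0)
obs = [1.0, 2.0, 3.0]
# Several realizations of the same measurement (5 is the documented
# default of multi_noise).
realizations = [add_gaussian_noise(obs, obs_errors=[0.1, 0.1, 0.2], rng=rng)
                for _ in range(5)]
```

Each call produces an independent realization, so repeating it multi_noise times for one measurement gives the kind of augmented sample the parameter describes.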

multiNoisyObs()[source]
multiNoisySample(reorder=True)[source]
multiNormalObs(factor_sigma=0.2)[source]
multiParams()[source]
noisyObs()[source]
noisySample()[source]
obs_noise(measurement, obs_error=None, cholesky_factor=None)[source]
singleNormalObs(error_factor=1)[source]
class colfi.data_simulator.CutParams(param_names, params_dict=None)[source]

Bases: object

Cut parameter samples that crossed the parameter limits.

Parameters:
  • param_names (list) – A list that contains parameter names.

  • params_dict (dict or None, optional) – Information about the cosmological parameters, including the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None
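
A minimal sketch of cutting samples that cross the limits, assuming that a NaN limit means "unbounded" as described for params_dict_zoo(). The function below is a hypothetical stand-alone helper that works on plain lists; it only mirrors the name of the cut_params method and is not its implementation.

```python
import math


def cut_params(params, params_limit):
    """Keep only the rows of params whose values all lie inside the limits.

    params: list of rows, one value per parameter.
    params_limit: list of (limit_min, limit_max); NaN means no limit.
    """
    def inside(value, lo, hi):
        return ((math.isnan(lo) or value >= lo)
                and (math.isnan(hi) or value <= hi))

    return [row for row in params
            if all(inside(v, lo, hi) for v, (lo, hi) in zip(row, params_limit))]


limits = [(0.0, 1.0), (math.nan, math.nan)]  # second parameter is unbounded
samples = [[0.3, -2.0], [1.2, 0.5], [-0.1, 0.0], [0.9, 10.0]]
kept = cut_params(samples, limits)  # the first and last rows survive
```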

cut_params(params, params_limit=None)[source]
class colfi.data_simulator.ParametersFilter(param_names, sim_params, params_space, prev_space, check_include=True, rel_dev_limit=0.2)[source]

Bases: object

Select cosmological parameters from a data set according to a given parameter space.

Parameters:
  • param_names (list) – A list that contains parameter names.

  • sim_params (array-like) – The simulated cosmological parameters with the shape of (N, n), where N is the number of samples and n is the number of parameters.

  • params_space (array-like) – The parameter space with the shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].

  • prev_space (array-like) – The parameter space of local simulated data (or mock data in previous step), with shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].

  • check_include (bool, optional) – If True, it will check whether params_space is in the space of sim_params, otherwise, do nothing. Default: True

  • rel_dev_limit (float, optional) – The limit on the relative deviation allowed when params_space is not in the space of sim_params. The default is 20% (this means that if params_space is \([-5\sigma, +5\sigma]\), it can deviate \(<1\sigma\) from sim_params). Note that it should be \(<0.4\) (a deviation \(<2\sigma\) for the parameter space \([-5\sigma, +5\sigma]\)). Default: 0.2

filter_index()[source]
filter_params()[source]
property include

Check whether params_space is in the space of the sim_params.

Returns:

If params_space is in the space of the sim_params, return True, otherwise, return False.

Return type:

bool
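
The selection and the inclusion check can be sketched on plain lists as follows. This is an illustration under assumptions: the two functions are hypothetical stand-alone helpers named after filter_index() and the include property, and the inclusion test is assumed to compare params_space against prev_space axis by axis, without the rel_dev_limit tolerance.

```python
def filter_index(sim_params, params_space):
    """Indices of the samples that lie inside params_space on every axis."""
    return [i for i, row in enumerate(sim_params)
            if all(lo <= v <= hi for v, (lo, hi) in zip(row, params_space))]


def include(params_space, prev_space):
    """True if params_space lies inside prev_space for every parameter."""
    return all(p_lo <= lo and hi <= p_hi
               for (lo, hi), (p_lo, p_hi) in zip(params_space, prev_space))


sim_params = [[0.1, 60.0], [0.3, 70.0], [0.5, 80.0], [0.35, 68.0]]
params_space = [(0.2, 0.4), (65.0, 75.0)]
prev_space = [(0.0, 1.0), (50.0, 90.0)]

idx = filter_index(sim_params, params_space)
selected = [sim_params[i] for i in idx]
```

Under the documented reading of rel_dev_limit, a value of 0.2 with a \([-5\sigma, +5\sigma]\) space corresponds to a tolerated deviation of 0.2 × 5σ = 1σ, and the stated upper bound of 0.4 corresponds to 2σ.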

class colfi.data_simulator.SimMultiObservations(branch_n, N, model, param_names, chain=None, params_space=None, spaceSigma=5, params_dict=None, space_type='hypercube', cut_crossedLimit=True, cut_crossedBest=True, cross_best=False, local_samples=None, prevStep_data=None, check_include=True, rel_dev_limit=0.2)[source]

Bases: SimObservations

Simulate training set containing multiple observations (for multi-branch network).

Parameters:
  • branch_n (int) – The number of branches of the network.

  • N (int) – The number of samples to be simulated.

  • model (cosmological (or theoretical) model instance) – A cosmological (or theoretical) model instance that is used to simulate the training set. It should contain a ‘simulate’ method that accepts cosmological parameters as input. If you use local data sets, it should also contain ‘load_params’, ‘load_params_space’, and ‘load_sample’ methods.

  • param_names (list) – A list that contains parameter names.

  • chain (array-like or None) – The predicted ANN chain in the previous step. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None

  • params_space (array-like or None) – The parameter space with the shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit]. This is only used for space_type=‘hypercube’ and space_type=‘LHS’. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None

  • spaceSigma (int or array-like, optional) – The size of the parameter space to be learned. It is an int or a numpy array with shape (n,), where n is the number of parameters. For example, for spaceSigma=5 the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5

  • params_dict (dict or None, optional) – Information about the cosmological parameters, including the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None

  • space_type (str, optional) – The type of parameter space. It can be ‘hypercube’, ‘LHS’, ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: ‘hypercube’

  • cut_crossedLimit (bool, optional) – If True, the data points that cross the parameter limits will be cut. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: True

  • cut_crossedBest (bool, optional) – If True, the folded data points that cross the best values will be cut. It is recommended to set it to True. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False. Default: True

  • cross_best (bool, optional) – If True, the folded data points will cross the best values, otherwise, the folded data points will not cross the best values. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False and cut_crossedBest=False. Default: False

  • local_samples (None, str, or list, optional) – Path of local samples, None, ‘sample’ or [‘sample’] or [‘sample_1’, ‘sample_2’, …]. If None, no local samples are used. Default: None

  • prevStep_data (None or list, optional) – Samples simulated in the previous step. If it is a list, it should be [obs, params], where obs and params each have shape (N, n): N is the number of samples and n is the number of data points in one measurement (for obs) or the number of parameters (for params). Default: None

  • check_include (bool, optional) – If True, will check whether params_space is in the space of local_samples, otherwise, do nothing. Default: True

  • rel_dev_limit (float, optional) – The limit on the relative deviation allowed when params_space is not in the space of sim_params. The default is 20% (this means that if params_space is \([-5\sigma, +5\sigma]\), it can deviate \(<1\sigma\) from sim_params). Note that it should be \(<0.4\) (a deviation \(<2\sigma\) for the parameter space \([-5\sigma, +5\sigma]\)). Default: 0.2

Variables:
  • prev_space (array-like) – The parameter space of local simulated data (or mock data in previous step), with shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].

  • seed (None or int, optional) – Seed number which controls random draws. Default: None

Note

Either chain or params_space should be given to simulate samples.

save_samples(path='sim_data', branch_paths=['comp1', 'comp2'])[source]
simulate_obs(N)[source]
class colfi.data_simulator.SimObservations(N, model, param_names, chain=None, params_space=None, spaceSigma=5, params_dict=None, space_type='hypercube', cut_crossedLimit=True, cut_crossedBest=True, cross_best=False, local_samples=None, prevStep_data=None, check_include=True, rel_dev_limit=0.2)[source]

Bases: SimParameters

Simulate training set.

Parameters:
  • N (int) – The number of samples to be simulated.

  • model (cosmological (or theoretical) model instance) – A cosmological (or theoretical) model instance that is used to simulate the training set. It should contain a ‘simulate’ method that accepts cosmological parameters as input. If you use local data sets, it should also contain ‘load_params’, ‘load_params_space’, and ‘load_sample’ methods.

  • param_names (list) – A list that contains parameter names.

  • chain (array-like or None) – The predicted ANN chain in the previous step. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None

  • params_space (array-like or None) – The parameter space with the shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit]. This is only used for space_type=‘hypercube’ and space_type=‘LHS’. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None

  • spaceSigma (int or array-like, optional) – The size of the parameter space to be learned. It is an int or a numpy array with shape (n,), where n is the number of parameters. For example, for spaceSigma=5 the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5

  • params_dict (dict or None, optional) – Information about the cosmological parameters, including the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None

  • space_type (str, optional) – The type of parameter space. It can be ‘hypercube’, ‘LHS’, ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: ‘hypercube’

  • cut_crossedLimit (bool, optional) – If True, the data points that cross the parameter limits will be cut. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: True

  • cut_crossedBest (bool, optional) – If True, the folded data points that cross the best values will be cut. It is recommended to set it to True. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False. Default: True

  • cross_best (bool, optional) – If True, the folded data points will cross the best values, otherwise, the folded data points will not cross the best values. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False and cut_crossedBest=False. Default: False

  • local_samples (None, str, or list, optional) – Path of local samples, None, ‘sample’ or [‘sample’] or [‘sample_1’, ‘sample_2’, …]. If None, no local samples are used. Default: None

  • prevStep_data (None or list, optional) – Samples simulated in the previous step. If it is a list, it should be [obs, params], where obs and params each have shape (N, n): N is the number of samples and n is the number of data points in one measurement (for obs) or the number of parameters (for params). Default: None

  • check_include (bool, optional) – If True, will check whether params_space is in the space of local_samples, otherwise, do nothing. Default: True

  • rel_dev_limit (float, optional) – The limit on the relative deviation allowed when params_space is not in the space of sim_params. The default is 20% (this means that if params_space is \([-5\sigma, +5\sigma]\), it can deviate \(<1\sigma\) from sim_params). Note that it should be \(<0.4\) (a deviation \(<2\sigma\) for the parameter space \([-5\sigma, +5\sigma]\)). Default: 0.2

Variables:
  • prev_space (array-like) – The parameter space of local simulated data (or mock data in previous step), with shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].

  • seed (None or int, optional) – Seed number which controls random draws. Default: None

Note

Either chain or params_space should be given to simulate samples.

comb_obs(obs_1, obs_2)[source]
filter_localSample(local_sample, N_local)[source]

Select samples from the local data sets.

Parameters:
  • local_sample (str) – Folders of local samples.

  • N_local (int) – The number of local samples to be selected.

Returns:

The selected observations and parameters.

Return type:

array-like

Note

Parameter space of the local samples should be in the initial parameter space.

filter_localSamples(N_local)[source]
filter_previousSamples(N_pre)[source]

Select samples from the mock data simulated in the previous step.

Parameters:

N_pre (int) – The number of samples to be selected.

Returns:

The selected observations and parameters.

Return type:

array-like

save_samples(path='sim_data/sample')[source]
save_samples_2(multi_params=1, path='sim_data/sample', use_dataSeed=False)[source]
save_samples_3(part_size=10, multi_params=1, path='sim_data/sample')[source]
save_samples_3_onePart(params, part_size=10, part_idx=0, path='sim_data/sample')[source]
simulate()[source]
simulate_obs(N)[source]
class colfi.data_simulator.SimParameters(param_names, chain=None, params_space=None, spaceSigma=5, params_dict=None, space_type='hypercube', cut_crossedLimit=True, cut_crossedBest=True, cross_best=False)[source]

Bases: CutParams

Simulate parameters.

Parameters:
  • param_names (list) – A list that contains parameter names.

  • chain (array-like or None) – The predicted ANN chain in the previous step. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None

  • params_space (array-like or None) – The parameter space with the shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit]. This is only used for space_type=‘hypercube’ and space_type=‘LHS’. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None

  • spaceSigma (int or array-like, optional) – The size of the parameter space to be learned. It is an int or a numpy array with shape (n,), where n is the number of parameters. For example, for spaceSigma=5 the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5

  • params_dict (dict or None, optional) – Information about the cosmological parameters, including the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None

  • space_type (str, optional) – The type of parameter space. It can be ‘hypercube’, ‘LHS’, ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: ‘hypercube’

  • cut_crossedLimit (bool, optional) – If True, the data points that cross the parameter limits will be cut. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: True

  • cut_crossedBest (bool, optional) – If True, the folded data points that cross the best values will be cut. It is recommended to set it to True. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False. Default: True

  • cross_best (bool, optional) – If True, the folded data points will cross the best values, otherwise, the folded data points will not cross the best values. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False and cut_crossedBest=False. Default: False

Variables:

seed (None or int, optional) – Seed number which controls random draws. Default: None

Note

Either chain or params_space should be given to simulate samples.
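
When a chain is supplied instead of params_space, the space to be learned is described as \([-n\sigma, +n\sigma]\) with n = spaceSigma. A minimal sketch of that idea, assuming the centre is the chain mean and σ is the sample standard deviation; how the library actually estimates the best values and σ from the chain is not stated here, and the helper space_from_chain is hypothetical.

```python
import statistics


def space_from_chain(chain, space_sigma=5):
    """Build (lower, upper) limits per parameter as mean -/+ space_sigma * std."""
    n_params = len(chain[0])
    space = []
    for j in range(n_params):
        column = [row[j] for row in chain]
        best = statistics.mean(column)    # assumed centre of the space
        sigma = statistics.stdev(column)  # assumed 1-sigma width
        space.append((best - space_sigma * sigma, best + space_sigma * sigma))
    return space


chain = [[0.29, 69.0], [0.31, 71.0], [0.30, 70.0], [0.32, 72.0], [0.28, 68.0]]
params_space = space_from_chain(chain, space_sigma=5)
```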

property combinations
fold_sphere(params)[source]

Fold the simulated parameters using the extremum of the parameters.

https://en.wikipedia.org/wiki/Folded_normal_distribution
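
The folded normal reference suggests that samples crossing a limit are mirrored back rather than discarded. A minimal sketch of that reflection, assuming the fold maps x to 2·limit − x about a lower or upper limit; whether fold_sphere does exactly this is an assumption, and the helper name fold is hypothetical.

```python
def fold(value, lo=None, hi=None):
    """Reflect a value that crossed a limit back inside: x -> 2 * limit - x."""
    if lo is not None and value < lo:
        return 2.0 * lo - value
    if hi is not None and value > hi:
        return 2.0 * hi - value
    return value


# A parameter limited to [0, 1]: -0.2 folds about the lower limit,
# 1.3 folds about the upper limit, 0.5 is left unchanged.
folded = [fold(v, lo=0.0, hi=1.0) for v in (-0.2, 1.3, 0.5)]
```

With a lower limit of 0 this reduces to taking the absolute value, which is the folded normal case described in the linked article.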

get_contour_edges(sigma=3)[source]
get_edge_space(sigma=3)[source]
get_multiParams(N, multi_params=1, use_dataSeed=False, reorder=True)[source]
get_params(N)[source]
hypercube(N)[source]

Generate samples uniformly in a hypercube parameter space using a uniform distribution.

Parameters:

N (int) – The number of samples to be simulated.

Returns:

Parameters.

Return type:

array-like
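
Uniform sampling in a hypercube amounts to drawing each coordinate independently from its own [lower_limit, upper_limit] range. A minimal standard-library sketch, where the params_space and seed arguments are illustrative additions (the method itself takes only N and reads the space from the instance):

```python
import random


def hypercube(n_samples, params_space, seed=None):
    """Draw n_samples points, each coordinate uniform in its [lower, upper] range."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for lo, hi in params_space]
            for _ in range(n_samples)]


samples = hypercube(1000, [(0.1, 0.5), (60.0, 80.0)], seed=1)
```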

hyperellipsoid(N)[source]

Generate samples uniformly in a hyperellipsoid parameter space using covariance between parameters.

https://scipy-cookbook.readthedocs.io/items/CorrelatedRandomSamples.html https://blogs.sas.com/content/iml/2012/02/08/use-the-cholesky-transformation-to-correlate-and-uncorrelate-variables.html

Parameters:

N (int) – The number of samples to be simulated.

Returns:

Parameters.

Return type:

array-like

Note

For Cholesky decomposition, the covariance matrix \(C = LL^T\). So, the transformation relationship between correlated parameters \(P_{corr}\) and uncorrelated parameters \(P_{uncorr}\) is \(P_{corr} = LP_{uncorr}\), \(P_{uncorr} = L^{-1}P_{corr}\)
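
The transformation in the note can be checked numerically. The sketch below is a pure-Python illustration, not the method's implementation: it factors a small covariance matrix as \(C = LL^T\) and applies \(P_{corr} = LP_{uncorr}\). The 2×2 matrix and the helper names cholesky and correlate are illustrative assumptions.

```python
import math
import random


def cholesky(c):
    """Lower-triangular L with L L^T = c, for a symmetric positive-definite c."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(c[i][i] - s)
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return low


def correlate(low, p_uncorr):
    """P_corr = L P_uncorr for one sample."""
    n = len(low)
    return [sum(low[i][k] * p_uncorr[k] for k in range(i + 1)) for i in range(n)]


cov = [[4.0, 2.0], [2.0, 3.0]]
L = cholesky(cov)  # [[2, 0], [1, sqrt(2)]]

rng = random.Random(0)
uncorr = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(5)]
corr = [correlate(L, p) for p in uncorr]
```

Gaussian draws are used here only to exercise the transform; the same matrix product applies if the uncorrelated points are instead drawn uniformly inside a ball.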

hypersphere(N)[source]

Generate samples uniformly in a hypersphere parameter space.

Parameters:

N (int) – The number of samples to be simulated.

Returns:

Parameters.

Return type:

array-like

in_polygon(edge, x, y, get_points=True)[source]

Determine whether the given points lie inside the area enclosed by the polygon.

Parameters:
  • edge (array-like) – 2-D array with shape (N, 2). The vertices of a polygon.

  • x (array-like) – 1-D array with shape (M,). The x coordinate of the data points.

  • y (array-like) – 1-D array with shape (M,). The y coordinate of the data points.

  • get_points (bool, optional) – If True, return the data points inside the area. If False, return a bool array that is True where the (closed) path contains the corresponding point. Default: True

Returns:

Points in the polygon.

Return type:

array-like
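
One common way to test points against a polygon is the even-odd (ray casting) rule, sketched below on plain lists. This is a swapped-in technique for illustration; the documentation does not say which algorithm the method uses, and the stand-alone function only mirrors the documented argument names.

```python
def in_polygon(edge, x, y, get_points=True):
    """Even-odd (ray casting) test of points (x[i], y[i]) against a polygon."""
    n = len(edge)
    mask = []
    for px, py in zip(x, y):
        inside = False
        for i in range(n):
            x1, y1 = edge[i]
            x2, y2 = edge[(i + 1) % n]  # wrap around to close the path
            # Does a horizontal ray from the point cross this edge?
            if (y1 > py) != (y2 > py):
                x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                if px < x_cross:
                    inside = not inside
        mask.append(inside)
    if get_points:
        return [(px, py) for px, py, m in zip(x, y, mask) if m]
    return mask


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
xs = [1.0, 3.0, 0.5, -1.0]
ys = [1.0, 1.0, 1.5, 1.0]
```

Calling in_polygon(square, xs, ys) returns the two points inside the square, while get_points=False returns one boolean per input point.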

lhs(N)[source]

Generate samples uniformly in a hypercube parameter space using Latin hypercube sampling.

https://en.wikipedia.org/wiki/Latin_hypercube_sampling https://blog.csdn.net/yuxeaotao/article/details/108952326

Parameters:

N (int) – The number of samples to be simulated.

Returns:

Parameters.

Return type:

array-like
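
Latin hypercube sampling splits each axis into N equal-width strata and places exactly one sample in every stratum, with the strata order shuffled independently per axis. A minimal standard-library sketch; the params_space and seed arguments are illustrative additions, since the method itself takes only N.

```python
import random


def lhs(n_samples, params_space, seed=None):
    """Latin hypercube sample: one point per stratum along every axis."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in params_space:
        # One uniform draw inside each of n_samples equal-width strata of [0, 1).
        unit = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(unit)  # decouple the strata order between axes
        columns.append([lo + u * (hi - lo) for u in unit])
    # Transpose the columns into rows of shape (n_samples, n_params).
    return [list(row) for row in zip(*columns)]


samples = lhs(10, [(0.0, 1.0), (60.0, 80.0)], seed=3)
```

Compared with plain uniform sampling, every one-dimensional projection of the sample covers its range evenly, which is the usual reason for choosing this space type.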

normal_params(N, best, sigma_max, spaceSigma)[source]
property params_n
posterior_hyperellipsoid(N, factor=<class 'float'>)[source]
random_ball(N, dimension, radius=1)[source]

Generate N samples uniformly in a ball of the given dimension (hypersphere).

https://www.cnpython.com/qa/349434 https://www.zhihu.com/question/277712372 https://blogs.sas.com/content/iml/2016/04/06/generate-points-uniformly-in-ball.html https://arxiv.org/pdf/1404.1347.pdf https://www.sciencedirect.com/science/article/pii/S0047259X10001211
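
A standard recipe for uniform sampling in a ball, in the spirit of the references above, normalizes a Gaussian vector to get a uniform direction and draws the radius as \(U^{1/d}\). The sketch below follows that recipe with the standard library only; it is not necessarily the variant the method implements, and the seed argument is an illustrative addition.

```python
import math
import random


def random_ball(n_samples, dimension, radius=1.0, seed=None):
    """Draw n_samples points uniformly inside a ball of the given dimension."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_samples):
        # A normalized Gaussian vector is uniform on the unit sphere.
        g = [rng.gauss(0.0, 1.0) for _ in range(dimension)]
        norm = math.sqrt(sum(x * x for x in g))
        # A radius distributed as U^(1/d) makes the density uniform in volume.
        r = radius * rng.random() ** (1.0 / dimension)
        points.append([r * x / norm for x in g])
    return points


points = random_ball(2000, dimension=3, radius=2.0, seed=0)
```

Drawing the radius uniformly instead would crowd points near the centre, because the volume of a shell grows as \(r^{d-1}\).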

uniform_params(N, p_space)[source]
unique_elements(list_array)[source]

Find the unique elements of a list that contains several arrays.

Parameters:

list_array (list) – A list that contains several arrays.

Returns:

The sorted unique elements of the list.

Return type:

array-like
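
A minimal sketch of the described behaviour on plain Python sequences, assuming the arrays are one-dimensional and hold hashable values; the stand-alone function only mirrors the method's name.

```python
def unique_elements(list_array):
    """Sorted unique values pooled from a list of sequences."""
    pooled = set()
    for arr in list_array:
        pooled.update(arr)
    return sorted(pooled)


result = unique_elements([[3, 1, 2], [2, 5], [1, 1, 4]])
```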

colfi.data_processor

class colfi.data_processor.DataPreprocessing(obs, params, obs_base, params_base, params_vali=None)[source]

Bases: object

Data preprocessing of measurements and cosmological parameters.

get_MB_statistic()[source]
get_statistic(max_idx=None)[source]

Get statistics of observations and parameters.

Parameters:

max_idx (None or int, optional) – The maximum index of obs used when calculating the statistics of the observations. Setting a maximum index is useful for large training sets because it reduces the use of computing resources. Default: None

Return type:

None.

inverseNormalize_MB_obs(obs, obs_base)[source]
inverseNormalize_obs(obs, obs_base)[source]
inverseNormalize_params(params, params_base)[source]
normalize_MB_obs(obs, obs_base)[source]
normalize_obs(obs, obs_base)[source]
normalize_params(params, params_base)[source]
class colfi.data_processor.InverseNormalize(x1, statistic={}, norm_type='z_score', a=1e-06, b=0.999999)[source]

Bases: object

Inverse transformation of class Normalize.

inverseNorm()[source]
mean()[source]
minmax()[source]
z_score()[source]
class colfi.data_processor.Normalize(x, statistic={}, norm_type='z_score', a=1e-06, b=0.999999)[source]

Bases: object

Normalize data.

mean()[source]

Mean normalization.

minmax()[source]

Min-max normalization.

Rescale the range of features to [0, 1] or [a, b]. https://en.wikipedia.org/wiki/Feature_scaling
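
Min-max rescaling to [a, b] is the linear map \(x' = a + (x - x_{min})(b - a)/(x_{max} - x_{min})\). The sketch below applies it and its inverse to a plain list; the function names are hypothetical, and the values of a and b are the class defaults shown in the signature.

```python
def minmax(x, a=0.0, b=1.0):
    """Rescale values linearly so that min(x) maps to a and max(x) maps to b."""
    xmin, xmax = min(x), max(x)
    return [a + (v - xmin) * (b - a) / (xmax - xmin) for v in x]


def inverse_minmax(x_norm, xmin, xmax, a=0.0, b=1.0):
    """Undo minmax given the statistics of the original data."""
    return [xmin + (v - a) * (xmax - xmin) / (b - a) for v in x_norm]


data = [2.0, 4.0, 6.0, 10.0]
scaled = minmax(data, a=1e-6, b=0.999999)
restored = inverse_minmax(scaled, min(data), max(data), a=1e-6, b=0.999999)
```

The inverse needs the original minimum and maximum, which is why the statistics are stored alongside the normalized data.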

norm()[source]
z_score()[source]

Standardization (z-score, zero-mean normalization).
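
Standardization subtracts the mean and divides by the standard deviation, \(z = (x - \mu)/\sigma\), and the inverse is \(x = z\sigma + \mu\). A minimal sketch on a plain list, assuming the population standard deviation; which estimator the library uses is not stated here, and the function names are hypothetical.

```python
import statistics


def z_score(x):
    """Standardize values to zero mean and unit standard deviation."""
    mean = statistics.mean(x)
    std = statistics.pstdev(x)  # population std, an assumption of this sketch
    return [(v - mean) / std for v in x], mean, std


def inverse_z_score(x_norm, mean, std):
    """Map standardized values back to the original scale."""
    return [v * std + mean for v in x_norm]


data = [2.0, 4.0, 6.0, 8.0]
normed, mean, std = z_score(data)
restored = inverse_z_score(normed, mean, std)
```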

class colfi.data_processor.Statistic(x, dim=None)[source]

Bases: object

property mean
statistic()[source]
statistic_torch(use_GPU=True)[source]
property std
property xmax
property xmin
class colfi.data_processor.Transfer(net, obs, params, obs_base, obs_vali=None, params_vali=None, obs_errors=None, cholesky_factor=None, branch_n=<class 'int'>)[source]

Bases: object

Network and data transfer.

call_GPU(prints=True)[source]
check_GPU()[source]
transfer_MB_data()[source]
transfer_MB_trainSet()[source]
transfer_MB_valiSet()[source]
transfer_base()[source]
transfer_data()[source]
transfer_net(use_DDP=False, device_ids=None, prints=True)[source]
transfer_trainSet(transfer_base=True)[source]
transfer_valiSet()[source]
colfi.data_processor.cpu2cuda(data)[source]

Transfer data from CPU to GPU.

Parameters:

data (array-like or tensor) – Numpy array or torch tensor.

Raises:

TypeError – The data type should be np.ndarray or torch.Tensor.

Returns:

Torch tensor.

Return type:

Tensor

colfi.data_processor.cuda2numpy(data)[source]

Transfer data from the torch tensor (on GPU) to the numpy array (on CPU).

colfi.data_processor.cuda2torch(data)[source]

Transfer data (torch tensor) from GPU to CPU.

colfi.data_processor.numpy2cuda(data, device=None, dtype=<class 'torch.cuda.FloatTensor'>)[source]

Transfer data from the numpy array (on CPU) to the torch tensor (on GPU).

colfi.data_processor.numpy2torch(data, dtype=<class 'torch.FloatTensor'>)[source]

Transfer data from the numpy array (on CPU) to the torch tensor (on CPU).

colfi.data_processor.torch2cuda(data, device=None)[source]

Transfer data (torch tensor) from CPU to GPU.

colfi.data_processor.torch2numpy(data)[source]

Transfer data from the torch tensor (on CPU) to the numpy array (on CPU).