Training sets¶
colfi.cosmic_params¶
- class colfi.cosmic_params.ParamsProperty(param_names, params_dict=None)[source]¶
Bases: object
- property labels¶
- property param_fullNames¶
- property params_limit¶
- colfi.cosmic_params.params_dict_zoo()[source]¶
Information about (cosmological) parameters, including their labels and physical limits: [label, limit_min, limit_max].
The labels are used when plotting figures. The physical limits ensure that the simulated parameters are physically meaningful.
Note
If the physical limits of a parameter are unknown, or the parameter has no physical limits, they should be set to np.nan.
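As an illustration of the [label, limit_min, limit_max] layout described above, a minimal sketch follows. The parameter names, labels, limits, and the helper `limits` are hypothetical and are not taken from colfi.

```python
import numpy as np

# Hypothetical entries; names, labels, and limits are illustrative only.
params_dict = {
    "H0": [r"$H_0$", np.nan, np.nan],        # no physical limits assumed
    "omm": [r"$\Omega_{\rm m}$", 0.0, 1.0],  # a fraction bounded in [0, 1]
    "w": [r"$w$", np.nan, np.nan],           # limits treated as unknown
}

def limits(param_names, params_dict):
    """Return an (n, 2) array of [limit_min, limit_max] for the given names."""
    return np.array([params_dict[name][1:] for name in param_names], dtype=float)

params_limit = limits(["H0", "omm"], params_dict)
```

Using np.nan for missing limits keeps the array numeric, so a later range check can simply skip the entries that are NaN.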
colfi.data_simulator¶
- class colfi.data_simulator.AddGaussianNoise(obs, params=None, obs_errors=None, cholesky_factor=None, noise_type='multiNormal', factor_sigma=0.2, multi_noise=5, use_GPU=True)[source]¶
Bases: object
Add Gaussian noise to simulated data.
- Parameters:
obs (torch tensor or a list of torch tensors) – The simulated observations (measurements) with shape (N, obs_length), or a list of observations with shapes [(N, obs_length_1), (N, obs_length_2), …]
params (torch tensor or None) – The simulated cosmological parameters. Default: None
obs_errors (torch tensor, a list of torch tensors, or None, optional) – Observational errors (standard deviations) with shape (obs_length,), or a list of errors with shapes [(obs_length_1,), (obs_length_2,), …]. If cholesky_factor is None, the observational errors should be given. Default: None
cholesky_factor (torch tensor, a list of torch tensors, or None, optional) – Cholesky factor of the covariance matrix with shape (obs_length, obs_length), or a list of Cholesky factors with shapes [(obs_length_1, obs_length_1), (obs_length_2, obs_length_2), …]. If the Cholesky factor is given, obs_errors will be ignored. Default: None
noise_type (str, optional) – The type of Gaussian noise added to the training set, ‘singleNormal’ or ‘multiNormal’. Default: ‘multiNormal’
factor_sigma (float, optional) – For noise_type=‘singleNormal’, factor_sigma should be set to 1. For noise_type=‘multiNormal’, it is the standard deviation of the coefficient of the observational error (standard deviation). Default: 0.2
multi_noise (int, optional) – The number of noise realizations added to a measurement. Default: 5
use_GPU (bool, optional) – If True, the noise is generated on the GPU; otherwise, it is generated on the CPU. Default: True
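The two ways of specifying the noise (per-point standard deviations or a Cholesky factor of the covariance matrix) can be sketched in NumPy as below. This is only an illustration under simple assumptions: it uses NumPy arrays instead of torch tensors, it does not reproduce the ‘multiNormal’ scaling by factor_sigma, and the numerical values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
N, obs_length = 4, 3
obs = np.ones((N, obs_length))  # stand-in for simulated observations

# Case 1: independent errors (standard deviations) with shape (obs_length,).
obs_errors = np.array([0.1, 0.2, 0.3])
noisy_diag = obs + rng.standard_normal((N, obs_length)) * obs_errors

# Case 2: correlated errors described by a covariance matrix C = L L^T.
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])
cholesky_factor = np.linalg.cholesky(cov)  # lower-triangular L
z = rng.standard_normal((N, obs_length))   # uncorrelated unit-variance draws
noisy_corr = obs + z @ cholesky_factor.T   # each row is drawn with covariance C
```

Multiplying unit-variance draws by the Cholesky factor is what makes the added noise follow the full covariance, which is why obs_errors becomes redundant once the factor is given.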
- class colfi.data_simulator.CutParams(param_names, params_dict=None)[source]¶
Bases: object
Cut parameter samples that cross the parameter limits.
- Parameters:
param_names (list) – A list that contains parameter names.
params_dict (dict or None, optional) – Information about cosmological parameters, including the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None
- class colfi.data_simulator.ParametersFilter(param_names, sim_params, params_space, prev_space, check_include=True, rel_dev_limit=0.2)[source]¶
Bases: object
Select cosmological parameters from a data set according to a given parameter space.
- Parameters:
param_names (list) – A list that contains parameter names.
sim_params (array-like) – The simulated cosmological parameters with the shape of (N, n), where N is the number of samples and n is the number of parameters.
params_space (array-like) – The parameter space with the shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].
prev_space (array-like) – The parameter space of local simulated data (or mock data in previous step), with shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].
check_include (bool, optional) – If True, check whether params_space lies within the space of sim_params; otherwise, do nothing. Default: True
rel_dev_limit (float, optional) – The limit of the relative deviation when params_space is not within the space of sim_params. The default is 20% (this means that if params_space is \([-5\sigma, +5\sigma]\), it can deviate \(<1\sigma\) from sim_params). Note that it should be \(<0.4\) (a deviation \(<2\sigma\) for the parameter space \([-5\sigma, +5\sigma]\)). Default: 0.2
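A rough sketch of the two ideas in this parameter list: selecting samples that fall inside params_space, and measuring how far params_space sticks out of a previous space. The function names and the exact definition of the relative deviation (overshoot divided by the half-width of params_space, which matches the 1σ out of 5σ = 20% example above) are assumptions, not colfi's code.

```python
import numpy as np

def select_in_space(sim_params, params_space):
    """Boolean mask of samples inside [lower_limit, upper_limit] for every parameter."""
    sim_params = np.asarray(sim_params, dtype=float)
    space = np.asarray(params_space, dtype=float)
    return np.all((sim_params >= space[:, 0]) & (sim_params <= space[:, 1]), axis=1)

def relative_deviation(params_space, prev_space):
    """Largest overshoot of params_space beyond prev_space, per parameter,
    in units of the half-width of params_space (assumed definition)."""
    space = np.asarray(params_space, dtype=float)
    prev = np.asarray(prev_space, dtype=float)
    half_width = (space[:, 1] - space[:, 0]) / 2.0
    below = np.clip(prev[:, 0] - space[:, 0], 0.0, None)  # overshoot at the lower edge
    above = np.clip(space[:, 1] - prev[:, 1], 0.0, None)  # overshoot at the upper edge
    return np.maximum(below, above) / half_width

sim_params = np.array([[0.1, 1.0], [0.5, 2.5], [0.9, 1.5]])
params_space = np.array([[0.0, 1.0], [0.0, 2.0]])
mask = select_in_space(sim_params, params_space)

prev_space = np.array([[0.0, 1.0], [0.2, 2.0]])
rel_dev = relative_deviation(params_space, prev_space)
```

Under this definition, a value of rel_dev above rel_dev_limit for any parameter would signal that the stored samples do not cover the requested space well enough.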
- class colfi.data_simulator.SimMultiObservations(branch_n, N, model, param_names, chain=None, params_space=None, spaceSigma=5, params_dict=None, space_type='hypercube', cut_crossedLimit=True, cut_crossedBest=True, cross_best=False, local_samples=None, prevStep_data=None, check_include=True, rel_dev_limit=0.2)[source]¶
Bases: SimObservations
Simulate a training set containing multiple observations (for a multi-branch network).
- Parameters:
branch_n (int) – The number of branches of the network.
N (int) – The number of data to be simulated.
model (cosmological (or theoretical) model instance) – A cosmological (or theoretical) model instance used to simulate the training set. It should contain a ‘simulate’ method that accepts cosmological parameters as input. If you use local data sets, it should also contain ‘load_params’, ‘load_params_space’, and ‘load_sample’ methods.
param_names (list) – A list that contains parameter names.
chain (array-like or None) – The predicted ANN chain in the previous step. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None
params_space (array-like or None) – The parameter space with the shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit]. This is only used for space_type=‘hypercube’ and space_type=‘LHS’. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None
spaceSigma (int or array-like, optional) – The size of the parameter space to be learned. It is an int or a numpy array with shape of (n,), where n is the number of parameters, e.g. for spaceSigma=5, the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5
params_dict (dict or None, optional) – Information about cosmological parameters, including the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None
space_type (str, optional) – The type of parameter space. It can be ‘hypercube’, ‘LHS’, ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: ‘hypercube’
cut_crossedLimit (bool, optional) – If True, the data points that cross the parameter limits will be cut. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: True
cut_crossedBest (bool, optional) – If True, the folded data points that cross the best values will be cut. It is recommended to set it to True. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False. Default: True
cross_best (bool, optional) – If True, the folded data points will cross the best values; otherwise, they will not. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False and cut_crossedBest=False. Default: False
local_samples (None, str, or list, optional) – Path of local samples: None, ‘sample’, [‘sample’], or [‘sample_1’, ‘sample_2’, …]. If None, no local samples are used. Default: None
prevStep_data (None or list, optional) – Samples simulated in the previous step. If it is a list, it should be [obs, params]. The obs or params has shape (N, n), where N is the number of samples and n is the number of data points in one measurement (or the number of parameters). Default: None
check_include (bool, optional) – If True, check whether params_space lies within the space of local_samples; otherwise, do nothing. Default: True
rel_dev_limit (float, optional) – The limit of the relative deviation when params_space is not within the space of sim_params. The default is 20% (this means that if params_space is \([-5\sigma, +5\sigma]\), it can deviate \(<1\sigma\) from sim_params). Note that it should be \(<0.4\) (a deviation \(<2\sigma\) for the parameter space \([-5\sigma, +5\sigma]\)). Default: 0.2
- Variables:
prev_space (array-like) – The parameter space of local simulated data (or mock data in previous step), with shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].
seed (None or int, optional) – Seed number which controls random draws. Default: None
Note
Either chain or params_space should be given to simulate samples.
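When a chain is supplied instead of params_space, the space to be learned is described above as \([-5\sigma, +5\sigma]\) for spaceSigma=5. One way to picture this is the sketch below, which assumes the best value is the chain mean and \(\sigma\) is the chain standard deviation. Both choices, and the helper name `space_from_chain`, are assumptions made for illustration.

```python
import numpy as np

def space_from_chain(chain, space_sigma=5):
    """Build an (n, 2) space [best - k*sigma, best + k*sigma] from a chain of shape (N, n)."""
    chain = np.asarray(chain, dtype=float)
    best = chain.mean(axis=0)   # assumption: the mean is used as the best value
    sigma = chain.std(axis=0)   # assumption: symmetric 1-sigma error
    # space_sigma may be a scalar or an array with shape (n,)
    k = np.broadcast_to(np.asarray(space_sigma, dtype=float), best.shape)
    return np.stack([best - k * sigma, best + k * sigma], axis=1)

rng = np.random.default_rng(1)
chain = rng.normal(loc=[70.0, 0.3], scale=[1.0, 0.02], size=(2000, 2))
params_space = space_from_chain(chain, space_sigma=5)
```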
- class colfi.data_simulator.SimObservations(N, model, param_names, chain=None, params_space=None, spaceSigma=5, params_dict=None, space_type='hypercube', cut_crossedLimit=True, cut_crossedBest=True, cross_best=False, local_samples=None, prevStep_data=None, check_include=True, rel_dev_limit=0.2)[source]¶
Bases: SimParameters
Simulate a training set.
- Parameters:
N (int) – The number of data to be simulated.
model (cosmological (or theoretical) model instance) – A cosmological (or theoretical) model instance used to simulate the training set. It should contain a ‘simulate’ method that accepts cosmological parameters as input. If you use local data sets, it should also contain ‘load_params’, ‘load_params_space’, and ‘load_sample’ methods.
param_names (list) – A list that contains parameter names.
chain (array-like or None) – The predicted ANN chain in the previous step. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None
params_space (array-like or None) – The parameter space with the shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit]. This is only used for space_type=‘hypercube’ and space_type=‘LHS’. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None
spaceSigma (int or array-like, optional) – The size of the parameter space to be learned. It is an int or a numpy array with shape of (n,), where n is the number of parameters, e.g. for spaceSigma=5, the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5
params_dict (dict or None, optional) – Information about cosmological parameters, including the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None
space_type (str, optional) – The type of parameter space. It can be ‘hypercube’, ‘LHS’, ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: ‘hypercube’
cut_crossedLimit (bool, optional) – If True, the data points that cross the parameter limits will be cut. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: True
cut_crossedBest (bool, optional) – If True, the folded data points that cross the best values will be cut. It is recommended to set it to True. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False. Default: True
cross_best (bool, optional) – If True, the folded data points will cross the best values; otherwise, they will not. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False and cut_crossedBest=False. Default: False
local_samples (None, str, or list, optional) – Path of local samples: None, ‘sample’, [‘sample’], or [‘sample_1’, ‘sample_2’, …]. If None, no local samples are used. Default: None
prevStep_data (None or list, optional) – Samples simulated in the previous step. If it is a list, it should be [obs, params]. The obs or params has shape (N, n), where N is the number of samples and n is the number of data points in one measurement (or the number of parameters). Default: None
check_include (bool, optional) – If True, check whether params_space lies within the space of local_samples; otherwise, do nothing. Default: True
rel_dev_limit (float, optional) – The limit of the relative deviation when params_space is not within the space of sim_params. The default is 20% (this means that if params_space is \([-5\sigma, +5\sigma]\), it can deviate \(<1\sigma\) from sim_params). Note that it should be \(<0.4\) (a deviation \(<2\sigma\) for the parameter space \([-5\sigma, +5\sigma]\)). Default: 0.2
- Variables:
prev_space (array-like) – The parameter space of local simulated data (or mock data in previous step), with shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit].
seed (None or int, optional) – Seed number which controls random draws. Default: None
Note
Either chain or params_space should be given to simulate samples.
- filter_localSample(local_sample, N_local)[source]¶
Select samples from the local data sets.
- Parameters:
- Returns:
The selected observations and parameters.
- Return type:
array-like
Note
Parameter space of the local samples should be in the initial parameter space.
- class colfi.data_simulator.SimParameters(param_names, chain=None, params_space=None, spaceSigma=5, params_dict=None, space_type='hypercube', cut_crossedLimit=True, cut_crossedBest=True, cross_best=False)[source]¶
Bases: CutParams
Simulate parameters.
- Parameters:
param_names (list) – A list that contains parameter names.
chain (array-like or None) – The predicted ANN chain in the previous step. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None
params_space (array-like or None) – The parameter space with the shape of (n, 2), where n is the number of parameters. For each parameter, it is: [lower_limit, upper_limit]. This is only used for space_type=‘hypercube’ and space_type=‘LHS’. If chain is an array, params_space will be ignored. If chain is None, params_space should be given. Default: None
spaceSigma (int or array-like, optional) – The size of the parameter space to be learned. It is an int or a numpy array with shape of (n,), where n is the number of parameters, e.g. for spaceSigma=5, the parameter space to be learned is \([-5\sigma, +5\sigma]\). Default: 5
params_dict (dict or None, optional) – Information about cosmological parameters, including the labels, the minimum values, and the maximum values. See params_dict_zoo(). Default: None
space_type (str, optional) – The type of parameter space. It can be ‘hypercube’, ‘LHS’, ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: ‘hypercube’
cut_crossedLimit (bool, optional) – If True, the data points that cross the parameter limits will be cut. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’. Default: True
cut_crossedBest (bool, optional) – If True, the folded data points that cross the best values will be cut. It is recommended to set it to True. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False. Default: True
cross_best (bool, optional) – If True, the folded data points will cross the best values; otherwise, they will not. This only works when space_type is ‘hypersphere’, ‘hyperellipsoid’, or ‘posterior_hyperellipsoid’, and when cut_crossedLimit=False and cut_crossedBest=False. Default: False
- Variables:
seed (None or int, optional) – Seed number which controls random draws. Default: None
Note
Either chain or params_space should be given to simulate samples.
- property combinations¶
- hypercube(N)[source]¶
Generate samples uniformly in a hypercube parameter space using a uniform distribution.
- Parameters:
N (int) – The number of data to be simulated.
- Returns:
Parameters.
- Return type:
array-like
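A minimal sketch of uniform sampling in a hypercube, assuming the space is given as an (n, 2) array of [lower_limit, upper_limit] rows. The standalone function signature (with explicit params_space and seed arguments) is an assumption made so the example is self-contained.

```python
import numpy as np

def hypercube(N, params_space, seed=None):
    """Draw N samples uniformly inside the box defined by params_space, shape (n, 2)."""
    rng = np.random.default_rng(seed)
    space = np.asarray(params_space, dtype=float)
    # low and high broadcast against the requested (N, n) output shape
    return rng.uniform(space[:, 0], space[:, 1], size=(N, space.shape[0]))

samples = hypercube(1000, [[0.0, 1.0], [-2.0, 3.0]], seed=0)
```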
- hyperellipsoid(N)[source]¶
Generate samples uniformly in a hyperellipsoid parameter space using covariance between parameters.
https://scipy-cookbook.readthedocs.io/items/CorrelatedRandomSamples.html https://blogs.sas.com/content/iml/2012/02/08/use-the-cholesky-transformation-to-correlate-and-uncorrelate-variables.html
- Parameters:
N (int) – The number of data to be simulated.
- Returns:
Parameters.
- Return type:
array-like
Note
For Cholesky decomposition, the covariance matrix \(C = LL^T\). So, the transformation relationship between correlated parameters \(P_{corr}\) and uncorrelated parameters \(P_{uncorr}\) is \(P_{corr} = LP_{uncorr}\), \(P_{uncorr} = L^{-1}P_{corr}\)
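The relation in the note can be checked numerically. The sketch below draws uncorrelated points in the unit disk (by rejection sampling, chosen only for brevity), maps them with \(L\), and maps them back with \(L^{-1}\). The covariance values are made up, and this is not colfi's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])
L = np.linalg.cholesky(cov)  # lower-triangular factor with C = L L^T

# Uncorrelated points, uniform in the unit disk (rejection from the square).
pts = rng.uniform(-1.0, 1.0, size=(4000, 2))
p_uncorr = pts[np.sum(pts**2, axis=1) <= 1.0]

# Row-wise form of P_corr = L P_uncorr (each row is one sample).
p_corr = p_uncorr @ L.T

# P_uncorr = L^{-1} P_corr, solved without forming the inverse explicitly.
p_back = np.linalg.solve(L, p_corr.T).T
```

Because the map is linear, points that are uniform in the unit ball stay uniform inside the ellipsoid \(x^T C^{-1} x \le 1\).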
- hypersphere(N)[source]¶
Generate samples uniformly in a hypersphere parameter space.
- Parameters:
N (int) – The number of data to be simulated.
- Returns:
Parameters.
- Return type:
array-like
- in_polygon(edge, x, y, get_points=True)[source]¶
Judge whether the given points are in the area surrounded by the polygon.
- Parameters:
edge (array-like) – 2-D array with shape (N, 2). The vertices of a polygon.
x (array-like) – 1-D array with shape (M,). The x coordinate of the data points.
y (array-like) – 1-D array with shape (M,). The y coordinate of the data points.
get_points (bool, optional) – If True, return the data points inside the area; if False, return a bool array that is True where the (closed) path contains the corresponding point. Default: True
- Returns:
Points in the polygon.
- Return type:
array-like
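For illustration, a point-in-polygon test with the same arguments can be written with the even-odd (ray casting) rule in NumPy. This is a swapped-in technique chosen for a dependency-free sketch; the library's own implementation may differ, and points exactly on the boundary are not handled specially here.

```python
import numpy as np

def in_polygon(edge, x, y, get_points=True):
    """Even-odd rule: a point is inside if a ray towards +x crosses the boundary an odd number of times."""
    edge = np.asarray(edge, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    x0, y0 = edge[:, 0], edge[:, 1]
    x1, y1 = np.roll(x0, -1), np.roll(y0, -1)  # next vertex, closing the path
    for xa, ya, xb, yb in zip(x0, y0, x1, y1):
        # the edge straddles the horizontal line through the point
        crosses = (ya > y) != (yb > y)
        # x coordinate where the edge meets that line (unused for horizontal edges)
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
        inside ^= crosses & (x < x_cross)
    if get_points:
        return np.column_stack([x[inside], y[inside]])
    return inside

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
px = np.array([0.5, 1.5, 0.2, -0.1])
py = np.array([0.5, 0.5, 0.8, 0.5])
flags = in_polygon(square, px, py, get_points=False)
points = in_polygon(square, px, py)
```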
- lhs(N)[source]¶
Generate samples uniformly in a hypercube parameter space using Latin hypercube sampling.
https://en.wikipedia.org/wiki/Latin_hypercube_sampling https://blog.csdn.net/yuxeaotao/article/details/108952326
- Parameters:
N (int) – The number of data to be simulated.
- Returns:
Parameters.
- Return type:
array-like
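Latin hypercube sampling splits each parameter range into N equal strata and places exactly one sample in each stratum. A minimal NumPy sketch follows; the standalone signature with params_space and seed is an assumption for self-containment, not colfi's method signature.

```python
import numpy as np

def lhs(N, params_space, seed=None):
    """Latin hypercube sample of shape (N, n) inside params_space, shape (n, 2)."""
    rng = np.random.default_rng(seed)
    space = np.asarray(params_space, dtype=float)
    n = space.shape[0]
    # one random point inside each of the N equal-width strata, per parameter
    unit = (np.arange(N)[:, None] + rng.random((N, n))) / N
    # shuffle the strata independently in every column
    for j in range(n):
        unit[:, j] = rng.permutation(unit[:, j])
    return space[:, 0] + unit * (space[:, 1] - space[:, 0])

lhs_samples = lhs(10, [[0.0, 1.0], [-2.0, 2.0]], seed=0)
```

Compared with plain uniform sampling, this guarantees that every one-dimensional projection covers its range evenly, even for small N.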
- property params_n¶
- random_ball(N, dimension, radius=1)[source]¶
Generate samples uniformly in a ball with N dimension (hypersphere).
https://www.cnpython.com/qa/349434 https://www.zhihu.com/question/277712372 https://blogs.sas.com/content/iml/2016/04/06/generate-points-uniformly-in-ball.html https://arxiv.org/pdf/1404.1347.pdf https://www.sciencedirect.com/science/article/pii/S0047259X10001211
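A common way to sample uniformly inside a ball, sketched here under the assumption that N is the number of samples and dimension is the dimension of the ball: draw Gaussian directions, normalize them onto the unit sphere, and scale the radius by \(u^{1/d}\) with \(u\) uniform in [0, 1). The seed argument is added for reproducibility and is not part of the documented signature.

```python
import numpy as np

def random_ball(N, dimension, radius=1.0, seed=None):
    """Draw N points uniformly inside a ball of the given dimension and radius."""
    rng = np.random.default_rng(seed)
    # isotropic Gaussian draws give uniformly distributed directions
    directions = rng.standard_normal((N, dimension))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # u**(1/d) compensates for the volume growing like r**d
    radii = radius * rng.random(N) ** (1.0 / dimension)
    return directions * radii[:, None]

ball = random_ball(500, 3, radius=2.0, seed=0)
```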
colfi.data_processor¶
- class colfi.data_processor.DataPreprocessing(obs, params, obs_base, params_base, params_vali=None)[source]¶
Bases: object
Data preprocessing of measurements and cosmological parameters.
- get_statistic(max_idx=None)[source]¶
Get statistics of observations and parameters.
- Parameters:
max_idx (None or int, optional) – The maximum index of obs when calculating statistics of observations. It is useful to set a maximum index for the training set with a lot of data, which will reduce the use of computer resources. Default: None
- Return type:
None.
- class colfi.data_processor.InverseNormalize(x1, statistic={}, norm_type='z_score', a=1e-06, b=0.999999)[source]¶
Bases: object
Inverse transformation of class Normalize.
- class colfi.data_processor.Normalize(x, statistic={}, norm_type='z_score', a=1e-06, b=0.999999)[source]¶
Bases: object
Normalize data.
- minmax()[source]¶
Min-max normalization.
Rescale the range of features to [0, 1] or [a, b]. https://en.wikipedia.org/wiki/Feature_scaling
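The rescaling to [a, b] and its inverse can be sketched as follows. The helper names are hypothetical, and whether the library applies a and b in exactly this way is an assumption; the values 1e-6 and 0.999999 are taken from the default arguments in the class signature.

```python
import numpy as np

def minmax(x, xmin, xmax, a=0.0, b=1.0):
    """Map x from [xmin, xmax] to [a, b]."""
    return a + (x - xmin) * (b - a) / (xmax - xmin)

def inverse_minmax(x1, xmin, xmax, a=0.0, b=1.0):
    """Map x1 from [a, b] back to [xmin, xmax]."""
    return xmin + (x1 - a) * (xmax - xmin) / (b - a)

x = np.array([[1.0, 10.0], [2.0, 30.0], [4.0, 20.0]])
xmin, xmax = x.min(axis=0), x.max(axis=0)  # per-feature statistics
x1 = minmax(x, xmin, xmax, a=1e-6, b=0.999999)
x_back = inverse_minmax(x1, xmin, xmax, a=1e-6, b=0.999999)
```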
- class colfi.data_processor.Statistic(x, dim=None)[source]¶
Bases: object
- property mean¶
- property std¶
- property xmax¶
- property xmin¶
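The four properties above are the quantities needed for the ‘z_score’ and min-max normalizations. A small sketch of collecting them and applying a z-score round trip is given below; the dictionary keys and function names are assumptions and need not match the library's internal layout.

```python
import numpy as np

def statistic(x, dim=None):
    """Collect mean, std, xmin, and xmax of x along the given axis (None: over all elements)."""
    x = np.asarray(x, dtype=float)
    return {
        "mean": x.mean(axis=dim),
        "std": x.std(axis=dim),
        "xmin": x.min(axis=dim),
        "xmax": x.max(axis=dim),
    }

def z_score(x, stat):
    """Standardize x to zero mean and unit standard deviation."""
    return (x - stat["mean"]) / stat["std"]

def inverse_z_score(x1, stat):
    """Undo z_score using the same statistics."""
    return x1 * stat["std"] + stat["mean"]

data = np.array([[1.0, 10.0], [2.0, 30.0], [4.0, 20.0]])
stat = statistic(data, dim=0)
data_norm = z_score(data, stat)
data_back = inverse_z_score(data_norm, stat)
```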
- class colfi.data_processor.Transfer(net, obs, params, obs_base, obs_vali=None, params_vali=None, obs_errors=None, cholesky_factor=None, branch_n=<class 'int'>)[source]¶
Bases: object
Network and data transfer.
- colfi.data_processor.cpu2cuda(data)[source]¶
Transfer data from CPU to GPU.
- Parameters:
data (array-like or tensor) – Numpy array or torch tensor.
- Raises:
TypeError – The data type should be np.ndarray or torch.Tensor.
- Returns:
Torch tensor.
- Return type:
Tensor
- colfi.data_processor.cuda2numpy(data)[source]¶
Transfer data from the torch tensor (on GPU) to the numpy array (on CPU).
- colfi.data_processor.numpy2cuda(data, device=None, dtype=<class 'torch.cuda.FloatTensor'>)[source]¶
Transfer data from the numpy array (on CPU) to the torch tensor (on GPU).
- colfi.data_processor.numpy2torch(data, dtype=<class 'torch.FloatTensor'>)[source]¶
Transfer data from the numpy array (on CPU) to the torch tensor (on CPU).