# mystats module

mystats.bootstrap(data, num_samples)[source]

Bootstraps data to determine errors. Resamples the data num_samples times. The resampled sets can then be passed to calc_bootstrap_error to derive errors at a chosen confidence interval.

Parameters:

• data : array-like
  Array of data in the form of a numpy.ndarray.
• num_samples : int
  Number of times to resample the data.

Returns:

• samples : array-like
  Array of each resampled data set. Has one more dimension than the data, of length num_samples, representing each simulation.

Notes

-> Arrays can be initialized with numpy.empty.
-> Random samples can be retrieved from an array with random.sample.

Examples

>>> import numpy as np
>>> data = np.random.f(1, 2, 100)
>>> data.shape
(100,)
>>> samples = bootstrap(data, 50)
>>> samples.shape
(50, 100)
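As a sketch of the resampling step (using the numpy.empty pre-allocation mentioned in the notes), a minimal pure-NumPy version might look like the following; `bootstrap_sketch` is a hypothetical name, not part of mystats, and the actual implementation may differ:

```python
import numpy as np

def bootstrap_sketch(data, num_samples):
    """Resample `data` with replacement `num_samples` times.

    A minimal sketch; mystats.bootstrap may differ in detail.
    """
    data = np.asarray(data)
    n = data.shape[0]
    # Pre-allocate one row per simulation (cf. the numpy.empty note).
    samples = np.empty((num_samples,) + data.shape)
    for i in range(num_samples):
        # Draw n indices with replacement and take those elements.
        indices = np.random.randint(0, n, size=n)
        samples[i] = data[indices]
    return samples

data = np.random.f(1, 2, 100)
print(bootstrap_sketch(data, 50).shape)  # (50, 100)
```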

mystats.bootstrap_model(data, model, num_samples=100, alpha=0.05, data_error=None, sigma=None, verbose=True)[source]

Bootstraps the data and model a given number of times and calculates the goodness of fit for each run. The standard deviation of the goodness-of-fit values is then used to estimate the confidence interval.

Parameters:

• data : array_like
  The observed data; must be the same size as the model.
• model : array_like
  The model data; must be the same size as the observed data.
• num_samples : int, optional
  Number of runs in the bootstrapping.
• alpha : float, optional
  Significance of the confidence interval.
• data_error : float, array_like, optional
  If unset, the error will be the standard deviation of the data. If an array, it must have the same dimensions as the observed data.
• sigma : float, optional
  If set, the confidence interval will be calculated using the number of standard deviations from the mean.
• verbose : bool, optional
  Print out progress?

Returns:

• out : list
  A list, [confidence interval, goodness-of-fit array].
mystats.bootstrap_residuals(data, model, num_samples=100, statistic=<function mean>)[source]

Bootstraps the data and model a given number of times and calculates the goodness of fit for each run. The standard deviation of the goodness-of-fit values is then used to estimate the confidence interval.

Parameters:

• data : array_like
  The observed data; must be the same size as the model.
• model : array_like
  The model data; must be the same size as the observed data.
• num_samples : int, optional
  Number of runs in the bootstrapping.
• statistic : function, optional
  Statistic applied to each bootstrap resampling; defaults to the mean.
mystats.calc_bootstrap_error(samples, alpha)[source]

Returns errors of a bootstrap simulation at the 100.*(1 - alpha) confidence interval. Errors are computed by deriving a cumulative distribution function of the means of the sampled data and determining the distance between the mean and the value including alpha/2 of the data, and the value including 1 - alpha/2 of the data.

Parameters:

• samples : array-like
  Array of each resampled data set.
• alpha : float
  Significance of the confidence interval.

Returns:

• conf_int : tuple, float
  Mean of the data, the lower error, and the upper error at 100*(1-alpha) confidence of the data.

Notes

-> To find the index in an array closest to a given value, apply the numpy.argmin function to the absolute differences. For example, to find the value closest to 11.1 in an array of 10, 11, and 12:

>>> import numpy as np
>>> a = np.array([10, 11, 12])
>>> print(np.argmin(np.abs(a - 11.1)))
1


Examples

>>> import numpy as np
>>> data = np.random.f(1, 2, 100)
>>> samples = bootstrap(data, 50)
>>> errors = calc_bootstrap_error(samples, 0.05)
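A percentile form of the calculation above can be sketched as follows; `calc_bootstrap_error_sketch` is a hypothetical name, and mystats derives the bounds from a CDF of the sample means rather than calling np.percentile:

```python
import numpy as np

def calc_bootstrap_error_sketch(samples, alpha):
    # Mean of each resampled data set gives an empirical
    # distribution of the mean.
    means = samples.mean(axis=-1)
    mean = means.mean()
    # Values bounding the central 100*(1 - alpha)% of the distribution.
    lower = np.percentile(means, 100.0 * (alpha / 2.0))
    upper = np.percentile(means, 100.0 * (1.0 - alpha / 2.0))
    # conf_int: the mean, the lower error, and the upper error.
    return (mean, mean - lower, upper - mean)
```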

mystats.calc_cdf(y, return_axis=False)[source]
mystats.calc_cdf_error(y, alpha=0.32)[source]
mystats.calc_chisq(model, data, uncertainty, reduced=True, dof=1)[source]

Calculates chi squared statistic given a model, data, and associated uncertainty.

Parameters:

• model : array-like
  Model of the data.
• data : array-like
  Observed data, same shape as the model.
• uncertainty : float, array-like
  Uncertainty on the data.
• reduced : bool
  Calculate the reduced chi squared?
• dof : int
  Degrees of freedom.
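The statistic amounts to a sum of squared, uncertainty-weighted residuals; a minimal sketch (`calc_chisq_sketch` is a hypothetical name, not part of mystats) is:

```python
import numpy as np

def calc_chisq_sketch(model, data, uncertainty, reduced=True, dof=1):
    # Sum of squared residuals, weighted by the uncertainty.
    chisq = np.sum(((np.asarray(data) - np.asarray(model)) / uncertainty) ** 2)
    if reduced:
        # Reduced chi squared: divide by the degrees of freedom.
        chisq = chisq / dof
    return chisq
```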
mystats.calc_hessian(x)[source]

Calculates the Hessian matrix with finite differences.

Parameters:

• x : ndarray

Returns:

• an array of shape (x.ndim, x.ndim) + x.shape, where array[i, j, ...] corresponds to the second derivative x_ij
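One common way to build this array is to differentiate twice with numpy.gradient; the sketch below (hypothetical name, mystats' exact finite-difference scheme may differ) produces the (x.ndim, x.ndim) + x.shape layout described above:

```python
import numpy as np

def calc_hessian_sketch(x):
    x = np.asarray(x, dtype=float)
    # First derivatives along every axis; np.gradient returns a
    # bare array for 1-D input, so normalize to a list.
    grad = np.gradient(x)
    if x.ndim == 1:
        grad = [grad]
    hessian = np.empty((x.ndim, x.ndim) + x.shape)
    for i, g in enumerate(grad):
        # Differentiate each first derivative again along every axis.
        second = np.gradient(g)
        if x.ndim == 1:
            second = [second]
        for j, gg in enumerate(second):
            hessian[i, j] = gg
    return hessian
```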
mystats.calc_likelihood_conf(likelihoods, conf, df=1)[source]

Calculates confidence intervals for each axis.

mystats.calc_logL(model, data, data_error=None, weights=None)[source]

Calculates the log-likelihood.

http://www.physics.utah.edu/~detar/phys6720/handouts/curve_fit/curve_fit/node2.html
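Per the linked handout, the Gaussian log-likelihood is ln L = -chi^2/2 up to an additive constant; a minimal sketch (hypothetical name, constant normalization term omitted, weights not handled) is:

```python
import numpy as np

def calc_logL_sketch(model, data, data_error=None):
    data = np.asarray(data, dtype=float)
    model = np.asarray(model, dtype=float)
    if data_error is None:
        # Fall back to the standard deviation of the data.
        data_error = data.std()
    # Gaussian log-likelihood, dropping the constant term.
    return -0.5 * np.sum(((data - model) / data_error) ** 2)
```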

mystats.calc_pdf(x, y)[source]

Calculates probability density function of the data. Uses a non-parametric approach to estimate the PDF.

mystats.calc_symmetric_error(x, y=None, alpha=0.05)[source]
Parameters:

• x : array-like
• y : array-like, optional
  If provided, treated as the PDF of x.
• alpha : float, optional
  Significance of the confidence interval.
mystats.ftest(chi1, chi2, dof1, dof2)[source]

The function ftest() computes the probability for a value drawn from the F-distribution to equal or exceed the given value of F. This can be used for confidence testing of a measured value obeying the F-distribution (i.e., for testing the ratio of variances, or equivalently for the addition of parameters to a fitted model).

P_F(X > F; DOF1, DOF2) = PROB

In specifying the returned probability level the user has three choices:

• return the confidence level when the /CLEVEL keyword is passed; OR
• return the significance level (i.e., 1 - confidence level) when the /SLEVEL keyword is passed (default); OR
• return the “sigma” of the probability (i.e., compute the probability based on the normal distribution) when the /SIGMA keyword is passed.

Note that /SLEVEL, /CLEVEL and /SIGMA are mutually exclusive.

For the ratio of variance test, the two variances, VAR1 and VAR2, should be distributed according to the chi-squared distribution with degrees of freedom DOF1 and DOF2 respectively. The F-value is computed as:

F = (VAR1/DOF1) / (VAR2/DOF2)

and then the probability is computed as:

PROB = MPFTEST(F, DOF1, DOF2, ... )

For the test of additional parameters in least squares fitting, the user should perform two separate fits, and have two chi-squared values. One fit should be the “original” fit with no additional parameters, and one fit should be the “new” fit with M additional parameters.

CHI1 - chi-squared value for the original fit
DOF1 - number of degrees of freedom of CHI1 (number of data points minus number of original parameters)
CHI2 - chi-squared value for the new fit
DOF2 - number of degrees of freedom of CHI2

Note that according to this formalism, the number of degrees of freedom in the “new” fit, DOF2, should be less than the number of degrees of freedom in the “original” fit, DOF1 (DOF2 < DOF1); and also CHI2 < CHI1.

With the above definition, the F value is computed as:

F = ( (CHI1-CHI2)/(DOF1-DOF2) ) / (CHI2/DOF2)

where DOF1-DOF2 is equal to M, and then the F-test probability is computed as:

PROB = MPFTEST(F, DOF1-DOF2, DOF2, ... )

Note that this formalism assumes that the addition of the M parameters is a small perturbation to the overall fit. If the additional parameters dramatically change the character of the model, then the first "ratio of variance" test is more appropriate, where F = (CHI1/DOF1) / (CHI2/DOF2).
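The default significance-level branch of the additional-parameters test can be sketched with scipy.stats.f.sf, which gives P(X > F) directly; `ftest_sketch` is a hypothetical name, not the mystats implementation:

```python
from scipy import stats

def ftest_sketch(chi1, chi2, dof1, dof2):
    # F statistic for the additional-parameters test,
    # where DOF1 - DOF2 = M new parameters.
    F = ((chi1 - chi2) / (dof1 - dof2)) / (chi2 / dof2)
    # Survival function: P(X > F) for the F-distribution.
    prob = stats.f.sf(F, dof1 - dof2, dof2)
    return F, prob
```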

mystats.fvalue(chi1, chi2, dof1, dof2)[source]
mystats.gauss(x, width, amp, x0)[source]
mystats.get_rms(x, axis=None)[source]

Calculates the rms of an array.
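A one-line sketch of the rms (hypothetical name):

```python
import numpy as np

def get_rms_sketch(x, axis=None):
    # Root mean square: square root of the mean of the squared values.
    return np.sqrt(np.mean(np.square(x), axis=axis))
```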

mystats.logL2L(logL, normalize=True)[source]
mystats.main()[source]
mystats.normalize_logL(logL)[source]
class mystats.rv2d_discrete(likelihoods=None, param_grid1=None, param_grid2=None, param_name1='param1', param_name2='param2', L_scalar=None)[source]

Bases: object

A generic discrete 2D random variable class meant for subclassing. Similar to scipy.stats.rv_discrete.

Parameters:

• likelihoods : array-like
  N x M array of relative likelihoods corresponding to parameter 1 and 2 values.
• param_grid1, param_grid2 : array-like
  Parameter values corresponding to element positions of likelihoods. The lengths of param_grid1 and param_grid2 must be N and M, respectively.
• param_name1, param_name2 : str
  Names of parameters 1 and 2.
• L_scalar : float
  Inverse scalar applied to the likelihoods to calculate the 2D PDF. If unset, the likelihoods are divided by the minimum non-zero likelihood.

Methods

rvs()[source]

Returns a random sample of the parameters from the PDF.

class mystats.rv3d_discrete(likelihoods=None, param_grid1=None, param_grid2=None, param_grid3=None, param_name1='param1', param_name2='param2', param_name3='param3', L_scalar=None)[source]

Bases: object

A generic discrete 3D random variable class meant for subclassing. Similar to scipy.stats.rv_discrete.

Parameters:

• likelihoods : array-like
  N x M x P array of relative likelihoods corresponding to parameter 1, 2, and 3 values.
• param_grid1, param_grid2, param_grid3 : array-like
  Parameter values corresponding to element positions of likelihoods. The lengths of param_grid1, param_grid2, and param_grid3 must be N, M, and P, respectively.
• param_name1, param_name2, param_name3 : str
  Names of parameters 1, 2, and 3.
• L_scalar : float
  Inverse scalar applied to the likelihoods to calculate the 3D PDF. If unset, the likelihoods are divided by the minimum non-zero likelihood.

Methods

rvs()[source]

Returns a random sample of the parameters from the PDF.

mystats.sigfig(value, sig_digits=1)[source]

Rounds a number to the given number of significant digits.
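Rounding to significant digits can be sketched by shifting by the base-10 exponent; `sigfig_sketch` is a hypothetical name, and mystats' implementation may differ:

```python
import math

def sigfig_sketch(value, sig_digits=1):
    # Round to sig_digits significant figures using the base-10 exponent.
    if value == 0:
        return 0
    exponent = int(math.floor(math.log10(abs(value))))
    return round(value, -exponent + (sig_digits - 1))

print(sigfig_sketch(1234, 2))     # 1200
print(sigfig_sketch(0.01234, 2))  # 0.012
```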

mystats.test_bootstrap()[source]
mystats.unique_permutations(elements)[source]

Calculates the unique permutations of a set of elements.

Parameters:

• elements : array-like
  List of N elements to be permuted.

Returns:

• permutations : array-like
  N x M array, where M is the number of unique permutations.
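For hashable elements, the unique permutations can be sketched with itertools; `unique_permutations_sketch` is a hypothetical name, and duplicate orderings from repeated elements are collapsed by the set:

```python
from itertools import permutations

def unique_permutations_sketch(elements):
    # set() removes duplicate orderings that arise from repeated elements.
    return sorted(set(permutations(elements)))

print(unique_permutations_sketch([1, 1, 2]))
# [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```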