cstat

Synopsis

A maximum likelihood function from XSPEC

Description

The cstat statistic is equivalent to the XSPEC implementation of the Cash statistic, in which the background must be modelled. Use the wstat statistic instead if the background data should be treated as a model component. New in CIAO 4.18 is the cstatnegativepenalty statistic, which can help the optimiser when the model can predict negative values.

This statistic corresponds to the "Poisson data (cstat)" section of the Statistics in XSPEC appendix.

Counts are sampled from the Poisson distribution, and so the best way to assess the quality of model fits is to use the product of individual Poisson probabilities computed in each bin i, or the likelihood L:

L = (product)_i { [ M(i)^D(i) / D(i)! ] * exp[-M(i)] }

where M(i) = S(i) + B(i) is the sum of source and background model amplitudes, and D(i) is the number of observed counts, in bin i.

The cstat statistic (Cash 1979, ApJ 228, 939) is derived by (1) taking the logarithm of the likelihood function, (2) changing its sign, (3) dropping the factorial term (which remains constant during fits to the same dataset), (4) adding an extra data-dependent term, and (5) multiplying by two:

C = 2 * (sum)_i [ M(i) - D(i) + D(i)*[log D(i) - log M(i)] ]
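
As an illustration only, the sum above can be evaluated directly in plain Python. This is a minimal sketch, not Sherpa's implementation: the function name cstat and the example arrays are hypothetical, the model is assumed to be strictly positive, and the D(i)*[log D(i) - log M(i)] term is taken to be zero when D(i) = 0.

```python
import math


def cstat(data, model):
    """Evaluate C = 2 * sum_i [ M - D + D * (log D - log M) ].

    Hypothetical helper for illustration; not part of Sherpa.
    Assumes every model value is > 0 and every data value is >= 0.
    """
    total = 0.0
    for d, m in zip(data, model):
        if m <= 0:
            raise ValueError("model values must be positive in this sketch")
        if d < 0:
            raise ValueError("data values must be non-negative in this sketch")
        term = m - d
        if d > 0:
            # For d == 0 the d * [log d - log m] term contributes nothing.
            term += d * (math.log(d) - math.log(m))
        total += term
    return 2.0 * total


# Made-up counts and model predictions, one value per bin.
observed = [3, 0, 5, 2]
predicted = [2.5, 0.4, 4.0, 2.0]
print(cstat(observed, predicted))
```

The statistic is zero when the model matches the data exactly in every bin, and grows as the model moves away from the data.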

The factor of two exists so that the change in cstat statistic from one model fit to the next, (Delta)C, is distributed approximately as (Delta)chi-square when the number of counts in each bin is high (> 5). One can then in principle use (Delta)C instead of (Delta)chi-square in certain model comparison tests. However, unlike chi-square, the cstat statistic may be used regardless of the number of counts in each bin.

An advantage of cstat over Sherpa's implementation of cash is that one can assign an approximate goodness-of-fit measure to a given value of the cstat statistic, i.e. the observed statistic, divided by the number of degrees of freedom, should be of order 1 for good fits. Note that simulations should be used to verify that this reduced statistic is robust.
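
One way to carry out such a check is sketched below, under stated assumptions: a constant model, 200 bins, a true mean of 20 counts per bin, and a simple Poisson sampler written for this example. All function names are hypothetical and none of this is Sherpa code; it only shows how the distribution of the reduced statistic could be examined for a given setup.

```python
import math
import random


def cstat(data, model):
    """C = 2 * sum_i [ M - D + D * (log D - log M) ], for model values > 0."""
    total = 0.0
    for d, m in zip(data, model):
        term = m - d
        if d > 0:
            term += d * (math.log(d) - math.log(m))
        total += term
    return 2.0 * total


def poisson_sample(rng, mean):
    """Draw one Poisson variate with Knuth's method (fine for modest means)."""
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_reduced_cstat(mean=20.0, nbins=200, nsim=50, seed=1):
    """Return the reduced statistic from nsim simulated constant-model fits."""
    rng = random.Random(seed)
    values = []
    for _ in range(nsim):
        data = [poisson_sample(rng, mean) for _ in range(nbins)]
        # For a constant model the maximum-likelihood value is the data mean,
        # so one free parameter is removed from the degrees of freedom.
        best = sum(data) / nbins
        stat = cstat(data, [best] * nbins)
        values.append(stat / (nbins - 1))
    return values


reduced = simulate_reduced_cstat()
print(min(reduced), sum(reduced) / len(reduced), max(reduced))
```

Repeating the experiment with a much smaller mean (for example, below one count per bin) is a way to see how far the reduced statistic can drift from 1 even when the model is correct.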

Background Subtraction

When using the CStat statistic, the background should not be subtracted but modelled simultaneously with the source; that is, a background model should be used. The WStat statistic is the extension of CStat to include the background data as a model component (i.e. without the need to write a model for the background).

Zero and negative data values

The cstat statistic function evaluates the logarithm of each data point. If the number of counts is zero or negative, it's not possible to take the log of that number. The behavior in this case is controlled by the truncate and trunc_value settings in the .sherpa.rc file; see "ahelp sherparc" for details on this file.

If truncate is set to True (the default), then log(<trunc_value>) is substituted into the equation, and the statistics calculation proceeds. The default trunc_value is 1.0e-25.

If truncate is set to False, cstat returns an error and stops the calculation when the number of counts in a bin is zero or negative. The trunc_value setting is not used.
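
The two behaviours can be pictured with the following sketch. It is an assumption-laden illustration of the rule described above, not the code Sherpa runs: the helper name log_with_truncation is hypothetical, and its truncate and trunc_value arguments simply mirror the .sherpa.rc settings of the same names.

```python
import math


def log_with_truncation(value, truncate=True, trunc_value=1.0e-25):
    """Return log(value), applying the truncation rule for value <= 0.

    Hypothetical helper mirroring the truncate / trunc_value settings.
    """
    if value > 0:
        return math.log(value)
    if truncate:
        # Substitute log(trunc_value) so the calculation can continue.
        return math.log(trunc_value)
    # With truncation disabled the calculation stops with an error.
    raise ValueError("cannot take the log of a zero or negative value")


print(log_with_truncation(0.0))  # uses log(1.0e-25)
try:
    log_with_truncation(0.0, truncate=False)
except ValueError as exc:
    print("error:", exc)
```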

Models that go negative

The statistic is also unable to handle models which produce negative values, because of the "log M(i)" term above. The same replacement scheme as that used for data values is applied when calculating the statistic. This means that any model that predicts negative values is penalised, but there is little information that the optimiser can use to improve the fit. This can result in fits failing to find a local minimum. New to CIAO 4.18 is the "cstatnegativepenalty" statistic, which adds a penalty for each negative model value, with the penalty varying depending on the absolute size of the values. This can help the optimiser move out of the problem area of the search space.

Example

sherpa> set_stat("cstat")
sherpa> show_stat()
Statistic: CStat
...

Set the fitting statistic and then confirm the new value.

Bugs

See the bugs pages on the Sherpa website for an up-to-date listing of known bugs.