User blog:Wizongod/Calculating Confidence Intervals for Underlying Probabilities

About
This post introduces formulas that should prove useful for players who research the game's mechanics by running experiments and collecting data. It deals with back-calculating the underlying probability from a collected data set, including the xx% confidence bounds, thus allowing hypothesis testing.

Background
(Feel free to skip reading this section)

Kancolle is a game of numbers and statistics. With many of the underlying probabilities not released to the player, there is a need to collect statistics to form conclusions regarding the non-obvious mechanics of the game, allowing qualitative results or hunches to be backed by statistical reasoning. Even monitoring the JSON stream in communications with the server does not reveal any of the underlying probability values, although it certainly makes the collection of statistics easier. An obvious application of knowing the underlying probabilities is crafting, an often frustrating process. Other applications are usually found in combat, e.g. cut-in chance based on the type of equipment.

Collecting statistics to draw conclusions about the game mechanics is not new. Many enthusiastic players have spent hours on end collecting data sets and displaying their results, often in tables, for others to see and judge, while publishing their conclusions and participating in discussions. However, what I have noticed so far is that everyone decides in an arbitrary way whether a set of data is “significant”, “not so significant”, or “not significant”. There is a quantitative way of doing so, and the following sections develop it.

Motivation
Calculating the underlying probability may seem like an easy task. “If 10 attempts were made at crafting and only 1 succeeded, then the success rate is 10%.” This is true, but it is the success rate of one particular series of crafts; moreover, would you trust someone who said that the chance of getting Yamato in LSC was 100% because they tried once and got her immediately? The success rate of a series of data is not the same as the underlying probability. To appreciate the difference, consider crafting 46cm guns. Say someone crafted one success out of 10 tries and told you that the success rate is therefore 10%. Another person tells you they tried 1000 times and succeeded 100 times, therefore the success rate is 10%. Which dataset would you trust more? Whose dataset gives you more certainty? And most importantly, is that certainty measurable?

It is, of course, possible to place some figures on that certainty, which would then allow better judgement. Continuing from the example, it is pointless to claim that the probability is exactly 10%, i.e. 10.00000...%. For all we know, it could be 10.01%, and that would still have produced the same observed results. A band is needed instead, so that we could say the probability is likely to be within a range of, say, 8–12%. Further, we could attach a confidence level to that band, such as 99% (or 95%, 90%, etc.). This means there is a 99% chance of the actual underlying probability falling within the stated range.

So far, the above figures are just examples. The band and the confidence level have to be calculated, and the next section shows how. As one might expect, this is not applicable to crafting alone. For example, a certain ship cuts in 50 times over 100 tries. After changing the equipment, it cuts in 58 times over 100 tries. Is that a significant change? From the data, the bands at various confidence levels can be calculated, and thus the result can be judged significant or not significant, rather than leaving it to “gut feel”.

Calculations
To calculate the upper and lower bounds of the band, the whole space over which results are possible must be established. That is to say, if all underlying probabilities from 0% to 100% are considered and totalled, the result should be 100%. Another way of saying this is that it is absolutely certain (100% certain) that the underlying probability lies between 0% and 100%, which is obvious. However, this forms the basis for the formula:

$$\int_{0}^{1}{}^{N}C_{r}(1-x)^{N-r}x^{r}dx=1$$

As can be seen, all that is really done is to integrate over all candidate probabilities x for a particular outcome of r successes in N attempts under a binomial distribution, where NCr is the “n choose r” binomial coefficient. However, the integral does not actually equal 1: by a standard Beta function identity, the integral of the binomial term alone evaluates to 1/(N+1). The missing normalising constant is therefore (N+1), and the complete formula is:

$$(N+1){}^{N}C_{r}\int_{0}^{1}(1-x)^{N-r}x^{r}dx=1$$
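The normalisation can be checked numerically. The sketch below (my own illustration, not from the original post) approximates the integral with a midpoint Riemann sum for a couple of arbitrary example values of N and r, and should return a value very close to 1 in each case:

```python
from math import comb

def normalised_integral(N, r, steps=100000):
    """Midpoint-rule approximation of (N+1) * C(N,r) * integral from 0 to 1
    of (1-x)^(N-r) * x^r dx, which should equal 1."""
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) / steps          # midpoint of each sub-interval
        total += (1 - x) ** (N - r) * x ** r
    return (N + 1) * comb(N, r) * total / steps

print(normalised_integral(10, 1))    # close to 1.0
print(normalised_integral(100, 37))  # close to 1.0
```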

To calculate a confidence interval (CI) in which x is located, all that is needed is to replace the integration limits with two variables, xupper and xlower, as such:

$$(N+1){}^{N}C_{r}\int_{x_{lower}}^{x_{upper}}(1-x)^{N-r}x^{r}dx=CI$$

Now the difficult part is to find xupper and xlower, which is tedious when N is large, as is needed for a narrow interval. Firstly, note that the distribution is asymmetrical (except when r = N/2, where it peaks at x = 50%), so xupper and xlower cannot be calculated by starting at the peak of the distribution and moving outwards until the area between the bounds equals the confidence level. Instead, xupper and xlower should be moved inwards from 1 and 0 respectively, such that the two tails carry equal probability. Thus:

$$(N+1){}^{N}C_{r}\int_{x_{upper}}^{1}(1-x)^{N-r}x^{r}dx=\frac{1-CI}{2}$$

$$(N+1){}^{N}C_{r}\int_{0}^{x_{lower}}(1-x)^{N-r}x^{r}dx=\frac{1-CI}{2}$$

which unfortunately has no closed-form solution (the integrals are incomplete Beta functions). However, the boundaries can be found numerically; I have written a MATLAB script that does so.
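The same numerical search can be sketched in Python (the original script is in MATLAB; this standalone version is my own, using only the standard library). It evaluates the normalised cumulative integral with a midpoint Riemann sum and inverts it by bisection, since the cumulative integral is monotonically increasing in x:

```python
from math import comb

def posterior_cdf(x, N, r, steps=20000):
    """(N+1) * C(N,r) * integral from 0 to x of (1-t)^(N-r) * t^r dt,
    approximated with a midpoint Riemann sum."""
    total = 0.0
    for i in range(steps):
        t = x * (i + 0.5) / steps
        total += (1 - t) ** (N - r) * t ** r
    return (N + 1) * comb(N, r) * total * x / steps

def solve_bound(target, N, r, tol=1e-9):
    """Find x such that posterior_cdf(x, N, r) == target, by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if posterior_cdf(mid, N, r) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_bounds(N, r, CI=0.95):
    """Equal-tailed bounds: each tail outside the band holds (1-CI)/2."""
    tail = (1 - CI) / 2
    return solve_bound(tail, N, r), solve_bound(1 - tail, N, r)

# Example from the post: 1 success in 10 attempts, at 95% confidence.
low, high = confidence_bounds(10, 1)
print(f"95% band: {low:.3f} .. {high:.3f}")
```

For 1 success in 10 attempts, the 95% band comes out wide (roughly 2% to 41%), which quantifies exactly why the small dataset in the motivating example deserves little trust.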