User blog comment:Homuhomu123/Estimation of Radar Modifiers on Artillery Spotting Chance/@comment-25534439-20141213144852/@comment-25534439-20141213210604

Pulling primarily from http://en.wikipedia.org/wiki/Checking_whether_a_coin_is_fair, the standard error is s = (p*(1-p)/n)^0.5, and the confidence limit is E = Z*s. Taking the experimental CI/DA rates you had as a fair estimate of the actual probability p, the n tests for each boat, and a Z value of 1.6 (roughly a 90% confidence interval; the exact value is 1.645), I get the intervals posted above.
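To make that concrete, here's a short sketch of the calculation in Python (the function name and the 75%-over-100-trials numbers are just an illustration, not your actual data):

```python
import math

def wald_interval(p_hat, n, z=1.6):
    """Normal-approximation confidence interval for a binomial proportion:
    p_hat +/- Z * sqrt(p_hat * (1 - p_hat) / n)."""
    s = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error
    e = z * s                               # confidence limit (half-width)
    return p_hat - e, p_hat + e

# e.g. an observed 75% rate over 100 trials at ~90% confidence (Z = 1.6):
lo, hi = wald_interval(0.75, 100)  # roughly (0.681, 0.819)
```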

As I mentioned, the proper method is the integral of a beta function that the Wikipedia article describes about halfway down, but that's more math. It works better because the actual probability density isn't normal (Gaussian), and it respects the fact that p is restricted to a bounded interval (0 to 100%).
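If you do want the beta version, here's a rough stdlib-only sketch. It takes the Beta(k+1, n-k+1) posterior from the Wikipedia article (uniform prior) and finds an equal-tailed interval by numerically integrating the density on a grid — an illustration of the idea, not a polished implementation (a stats library's beta quantile function would be the real tool):

```python
import math

def beta_credible_interval(successes, n, conf=0.90, grid=200_000):
    """Equal-tailed interval for a binomial proportion from the
    Beta(successes+1, n-successes+1) posterior, found by crude
    numerical integration of the beta density."""
    a, b = successes + 1, n - successes + 1
    # log of the beta-function normalizing constant B(a, b)
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    dp = 1.0 / grid
    tail = (1.0 - conf) / 2
    cdf, lo, hi = 0.0, 0.0, 1.0
    for i in range(1, grid):
        p = i * dp
        # beta density at p, accumulated into a running CDF
        cdf += math.exp((a - 1) * math.log(p)
                        + (b - 1) * math.log(1 - p) - log_norm) * dp
        if cdf < tail:
            lo = p
        if cdf < 1.0 - tail:
            hi = p
    return lo, hi
```

For 75 successes in 100 trials this gives roughly (0.67, 0.82) at 90% — close to the normal approximation here, but it stays inside [0, 1] even when p is near 0% or 100%, where the normal version misbehaves.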

As for the sample size, it's a function of how precise you want to be, what the probability actually is, and how confident you want to be in your results. If you want to compare numbers as close as +13.5% vs +15%, you probably want your precision to be +-0.5% or better. Lastly, a higher confidence level means a larger Z value and more tests to reach the same precision (95% confidence of a +-5% precision takes more tests than 80% confidence of the same +-5% precision). Rearranging the standard error calculation for n: n = (p*(1-p))/(E/Z)^2

Some examples to give you a feel for the numbers (Z values from wikipedia):

- CI/DA rate 75% (p=0.75), precision +-5% (E=0.05), 90% confidence (Z=1.6): n=192
- CI/DA rate 50% (p=0.50), precision +-5% (E=0.05), 90% confidence (Z=1.6): n=256
- CI/DA rate 75% (p=0.75), precision +-2% (E=0.02), 90% confidence (Z=1.6): n=1200
- CI/DA rate 75% (p=0.75), precision +-1% (E=0.01), 90% confidence (Z=1.6): n=4800
- CI/DA rate 75% (p=0.75), precision +-5% (E=0.05), 68% confidence (Z=1.0): n=75
- CI/DA rate 75% (p=0.75), precision +-1% (E=0.01), 68% confidence (Z=1.0): n=1875
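The rearranged formula is a one-liner if you want to plug in your own targets:

```python
def required_n(p, e, z):
    """Trials needed so the confidence limit is at most e: rearranging
    e = z * sqrt(p * (1 - p) / n) gives n = p*(1-p) / (e/z)**2."""
    return p * (1 - p) / (e / z) ** 2

# reproducing the first example above:
required_n(0.75, 0.05, 1.6)  # 192
```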

This is, as mentioned, the rough version of the calculation; as you can see, nailing down a percentage to high precision and confidence takes an enormous amount of testing. As I said, you're in the right ballpark for comparing values ~5% apart (like Air vs Surface radar impacts). My recommendation would be to aim for 300 tests while you're working alone; once you've got a working formula and want more precision, it can be tested with more people simultaneously.