With what probability is one coin better than the other?



Let's say we have two biased coins, C1 and C2, each with a different probability of turning up heads.





We toss C1 n1 times and get H1 heads, and toss C2 n2 times and get H2 heads. We find that the observed proportion of heads for one coin is higher than for the other.





With what probability can we say that one coin is better than the other? ('Better' here means a higher true probability of turning up heads.)





Isn't this a hypothesis-testing problem where you have to test whether the estimates of the probability of turning up heads for the coins are different?
– naive
Sep 7 '18 at 10:33





If you want a probability that one coin is better than another, you can do that with a Bayesian approach; all you need is a prior distribution on the probability. It's a fairly straightforward calculation.
– Glen_b
Sep 7 '18 at 12:49






There is no way this data alone is going to give you a probability that one coin is better than the other. You need additional information/assumptions in the form of a prior distribution for how the coins are distributed. In a Bayesian approach the result will differ a lot based on your prior assumptions. For instance, if those coins are regular coins, then they are (a priori) very likely to be equal, and you would need a large number of trials to find a decent probability that one is 'better' than the other (because the probability is high that you got two similar coins that accidentally test differently).
– Martijn Weterings
Sep 9 '18 at 14:08






@katosh, it is as exact as the prior. Say you have coins with p between 0.49 and 0.51. If in such a case you use the assumption that the coins are uniformly distributed between 0 and 1, then the probabilities that you assign are more like updated bad assumptions than updated probabilities. You will too often assign a high probability of being 'better' to the wrong coin. It would be wrong to consider the outcome as 'the' probability of one coin being 'better' than the other. Bad data in = bad results out. In this problem we shouldn't work on solving the math but on solving the knowledge problem.
– Martijn Weterings
Sep 11 '18 at 6:20






In that way it is stated correctly. It is a 'belief', and that is not so easily equated to a 'probability'.
– Martijn Weterings
Sep 11 '18 at 12:05




2 Answers



It is easy to calculate the probability of making this observation given that the two coins are equal. This can be done with Fisher's exact test. Given these observations



$$
\begin{array}{l|c|c}
 & \text{coin 1} & \text{coin 2} \\
\hline
\text{heads} & H_1 & H_2 \\
\hline
\text{tails} & n_1-H_1 & n_2-H_2
\end{array}
$$



the probability of observing these numbers, while the coins are equal, given the numbers of tosses $n_1$, $n_2$ and the total number of heads $H_1+H_2$, is
$$
p(H_1, H_2 \mid n_1, n_2, H_1+H_2) = \frac{(H_1+H_2)!\,(n_1+n_2-H_1-H_2)!\;n_1!\,n_2!}{H_1!\,H_2!\,(n_1-H_1)!\,(n_2-H_2)!\,(n_1+n_2)!}.
$$
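As a quick numerical check (my own sketch, not part of the original answer): this point probability is hypergeometric, so it can be evaluated with R's dhyper, and fisher.test gives the corresponding p-value. The counts below are made up for illustration.

n1 <- 10; H1 <- 8
n2 <- 10; H2 <- 4

# p(H1, H2 | n1, n2, H1 + H2): H1 + H2 heads distributed over n1 + n2 tosses
dhyper(H1, n1, n2, H1 + H2)

# two-sided p-value from Fisher's exact test on the 2x2 table
tab <- matrix(c(H1, n1 - H1, H2, n2 - H2), nrow = 2,
              dimnames = list(c("heads", "tails"), c("coin 1", "coin 2")))
fisher.test(tab)$p.value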



But what you are asking for is the probability that one coin is better.
Since we reason about a belief in how biased the coins are, we have to use a Bayesian approach to calculate the result. Note that in Bayesian inference a belief is modelled as a probability and the two terms are used interchangeably (see Bayesian probability). We call the probability that coin $i$ comes up heads $p_i$. The posterior distribution of $p_i$ after the observations is given by Bayes' theorem:
$$
f(p_i \mid H_i, n_i) = \frac{f(H_i \mid p_i, n_i)\, f(p_i)}{f(n_i, H_i)}
$$

The probability $f(H_i \mid p_i, n_i)$ is given by the binomial distribution, since the individual tosses are Bernoulli trials:
$$
f(H_i \mid p_i, n_i) = \binom{n_i}{H_i} p_i^{H_i} (1-p_i)^{n_i-H_i}
$$

I assume the prior knowledge on $f(p_i)$ is that $p_i$ could lie anywhere between $0$ and $1$ with equal probability, hence $f(p_i) = 1$. So the numerator is $f(H_i \mid p_i, n_i)\, f(p_i) = f(H_i \mid p_i, n_i)$.



In order to calculate $f(n_i, H_i)$ we use the fact that a pdf must integrate to one: $\int_0^1 f(p \mid H_i, n_i)\,\mathrm dp = 1$. The denominator is therefore just the constant factor that achieves this. There is a known pdf that differs from the numerator only by a constant factor, namely the beta distribution. Hence
$$
f(p_i \mid H_i, n_i) = \frac{1}{B(H_i+1,\, n_i-H_i+1)}\, p_i^{H_i} (1-p_i)^{n_i-H_i}.
$$



The pdf for the pair of probabilities of independent coins is
$$
f(p_1, p_2 \mid H_1, n_1, H_2, n_2) = f(p_1 \mid H_1, n_1)\, f(p_2 \mid H_2, n_2).
$$



Now we need to integrate this over the cases in which $p_1 > p_2$ in order to find out how probable it is that coin $1$ is better than coin $2$:
$$\begin{align}
\mathbb P(p_1>p_2)
&= \int_0^1 \int_0^{p'_1} f(p'_1, p'_2 \mid H_1, n_1, H_2, n_2)\,\mathrm dp'_2\,\mathrm dp'_1\\
&= \int_0^1 \frac{B(p'_1;\, H_2+1,\, n_2-H_2+1)}{B(H_2+1,\, n_2-H_2+1)}\,
f(p'_1 \mid H_1, n_1)\,\mathrm dp'_1
\end{align}$$



I cannot solve this last integral analytically, but one can solve it numerically after plugging in the numbers. $B(\cdot,\cdot)$ is the beta function and $B(\cdot;\cdot,\cdot)$ is the incomplete beta function. Note that $\mathbb P(p_1 = p_2) = 0$, because $p_1$ is a continuous variable and is never exactly equal to $p_2$.



Concerning the prior assumption on $f(p_i)$ and the remarks on it: a good alternative to model many beliefs is to use a beta distribution $\mathrm{Beta}(a_i+1,\, b_i+1)$. This leads to the final probability
$$
\mathbb P(p_1>p_2)
= \int_0^1 \frac{B(p'_1;\, H_2+1+a_2,\, n_2-H_2+1+b_2)}{B(H_2+1+a_2,\, n_2-H_2+1+b_2)}\,
f(p'_1 \mid H_1+a_1,\, n_1+a_1+b_1)\,\mathrm dp'_1.
$$

That way one could model a strong prior bias towards regular coins by choosing large but equal $a_i$ and $b_i$. It is equivalent to tossing coin $i$ an additional $a_i+b_i$ times and observing $a_i$ heads, hence equivalent to simply having more data; $a_i+b_i$ is the number of tosses we would not have to make if we include this prior.
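A minimal sketch (my own addition, not part of the original answer) of how the uniform-prior function P defined further below generalises to such a beta prior; the pseudo-count arguments a1, b1, a2, b2 and the name P_prior are purely illustrative:

# P(p1 > p2) with beta pseudo-counts; a_i = b_i = 0 recovers the uniform prior
P_prior <- function(n1, H1, n2, H2, a1 = 0, b1 = 0, a2 = 0, b2 = 0) {
  integrand <- function(p1) {
    pbeta(p1, H2 + 1 + a2, n2 - H2 + 1 + b2) *  # B(p1; .)/B(.) term for coin 2
      dbeta(p1, H1 + 1 + a1, n1 - H1 + 1 + b1)  # posterior density of p1
  }
  integrate(integrand, 0, 1)$value
}

# Same (made-up) data, once with the uniform prior and once with a strong
# prior towards regular coins; the latter pulls the result back towards 1/2
P_prior(4, 3, 4, 1)
P_prior(4, 3, 4, 1, a1 = 50, b1 = 50, a2 = 50, b2 = 50)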



The OP stated that the two coins are both biased to an unknown degree, so I understood that all knowledge has to be inferred from the observations. This is why I opted for an uninformative prior that does not bias the result, e.g. towards regular coins.



All information can be conveyed in the form of $(H_i, n_i)$ per coin. The lack of an informative prior only means more observations are needed to decide which coin is better with high probability.



Here is the code in R that provides a function P(n1, H1, n2, H2) $= \mathbb P(p_1>p_2)$ using the uniform prior $f(p_i)=1$:




mp <- function(p1, n1, H1, n2, H2) {
  f1 <- pbeta(p1, H2 + 1, n2 - H2 + 1)  # P(p2 < p1): regularised incomplete beta
  f2 <- dbeta(p1, H1 + 1, n1 - H1 + 1)  # posterior density of p1
  return(f1 * f2)
}

P <- function(n1, H1, n2, H2) {
  return(integrate(mp, 0, 1, n1, H1, n2, H2))
}
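A quick usage sketch (the counts are made up for illustration):

# belief that coin 1 is better after seeing 3/4 heads versus 1/4 heads
P(4, 3, 4, 1)$value  # roughly 0.9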



You can draw $\mathbb P(p_1>p_2)$ for different experimental results and fixed $n_1$, $n_2$, e.g. $n_1=n_2=4$, with this code snippet:


library(lattice)
n1 <- 4
n2 <- 4
heads <- expand.grid(H1 = 0:n1, H2 = 0:n2)
heads$P <- apply(heads, 1, function(H) P(n1, H[1], n2, H[2])$value)
levelplot(P ~ H1 + H2, heads, main = "P(p1 > p2)")



[Figure: levelplot of P(p1 > p2) for n1 = n2 = 4]



You may need to install.packages("lattice") first.





One can see that, even with the uniform prior and a small sample size, the probability (or belief) that one coin is better can become quite firm when $H_1$ and $H_2$ differ enough. An even smaller relative difference suffices when $n_1$ and $n_2$ are larger. Here is a plot for $n_1=100$ and $n_2=200$:
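The plot presumably comes from the same snippet as above with the sizes changed; a sketch (note the grid now has roughly 20,000 cells, so it takes a while to compute):

n1 <- 100
n2 <- 200
heads <- expand.grid(H1 = 0:n1, H2 = 0:n2)
heads$P <- apply(heads, 1, function(H) P(n1, H[1], n2, H[2])$value)
levelplot(P ~ H1 + H2, heads, main = "P(p1 > p2)")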



[Figure: levelplot of P(p1 > p2) for n1 = 100, n2 = 200]



Martijn Weterings suggested calculating the posterior probability distribution for the difference between $p_1$ and $p_2$. This can be done by integrating the pdf of the pair over the set $S(d) = \{(p_1, p_2) : |p_1 - p_2| = d\}$:
$$\begin{align}
f(d \mid H_1, n_1, H_2, n_2)
&= \int_{S(d)} f(p_1, p_2 \mid H_1, n_1, H_2, n_2)\,\mathrm d\gamma\\
&= \int_0^{1-d} f(p,\, p+d \mid H_1, n_1, H_2, n_2)\,\mathrm dp + \int_d^1 f(p,\, p-d \mid H_1, n_1, H_2, n_2)\,\mathrm dp
\end{align}$$



Again, not an integral I can solve analytically but the R code would be:


d1 <- function(p, d, n1, H1, n2, H2) {
  f1 <- dbeta(p, H1 + 1, n1 - H1 + 1)      # density of p1 at p
  f2 <- dbeta(p + d, H2 + 1, n2 - H2 + 1)  # density of p2 at p + d
  return(f1 * f2)
}

d2 <- function(p, d, n1, H1, n2, H2) {
  f1 <- dbeta(p, H1 + 1, n1 - H1 + 1)      # density of p1 at p
  f2 <- dbeta(p - d, H2 + 1, n2 - H2 + 1)  # density of p2 at p - d
  return(f1 * f2)
}

fd <- function(d, n1, H1, n2, H2) {
  if (d == 1) return(0)
  s1 <- integrate(d1, 0, 1 - d, d, n1, H1, n2, H2)  # branch p2 = p1 + d
  s2 <- integrate(d2, d, 1, d, n1, H1, n2, H2)      # branch p2 = p1 - d
  return(s1$value + s2$value)
}



I plotted $f(d|n_1,H_1,n_2,H_2)$ for $n_1=4$, $H_1=3$, $n_2=4$ and all values of $H_2$:


n1 <- 4
n2 <- 4
H1 <- 3
d <- seq(0, 1, length = 500)

get_f <- function(H2) sapply(d, fd, n1, H1, n2, H2)
dat <- sapply(0:n2, get_f)

matplot(d, dat, type = "l", ylab = "Density",
main = "f(d | 4, 3, 4, H2)")
legend("topright", legend = paste("H2 =", 0:n2),
col = 1:(n2 + 1), pch = "-")



[Figure: f(d | 4, 3, 4, H2) for H2 = 0, ..., 4]



You can calculate the probability that $|p_1-p_2|$ exceeds a value $d$ with integrate(Vectorize(fd), d, 1, n1, H1, n2, H2); the Vectorize wrapper is needed because integrate expects an integrand that accepts a vector of points. Note that nesting numerical integrals introduces some numerical error: for example, integrate(Vectorize(fd), 0, 1, n1, H1, n2, H2) should always equal $1$, since $d$ always takes a value between $0$ and $1$, but the result often deviates slightly.
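For example (counts made up for illustration):

# probability that the two coins differ by more than 0.2 after observing
# 3/4 heads for coin 1 and 1/4 heads for coin 2
fd_vec <- Vectorize(fd, vectorize.args = "d")
integrate(fd_vec, 0.2, 1, n1 = 4, H1 = 3, n2 = 4, H2 = 1)$value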







I don't know the actual value of p1
– Thirupathi Thangavel
Sep 9 '18 at 12:55





I am sorry for my bad notation 😅 but you do not need to plug in a fixed value for $p_1$. The (now changed) $p'_1$ in the integral is the variable you integrate over, just like you can integrate $\int_0^1 x^2\,\mathrm dx$ without having a fixed value for $x$.
– katosh
Sep 9 '18 at 13:38





Fisher's exact test is more specifically about the hypothesis that the coins have the same probability of heads and that the marginal totals are fixed. That is not the case in this coin problem: if you did the test again, you could observe a different total number of heads.
– Martijn Weterings
Sep 9 '18 at 14:04





@MartijnWeterings in my case, the probability of turning up heads is always fixed for each coin. Isn't that enough?
– Thirupathi Thangavel
Sep 9 '18 at 14:12





@ThirupathiThangavel the problem with Fisher's test is the non-fixed marginal totals. The exact test assumes not only that the probabilities of heads are the same and fixed, but also that the marginals are fixed before the experiment. That second part is not the case here, and it gives a different conditional probability for extreme values. Overall the Fisher test will be conservative: the probability of the result under the true hypothesis (i.e. fixed and equal probability of heads, but not necessarily fixed marginal totals) is smaller than calculated (you get higher p-values).
– Martijn Weterings
Sep 9 '18 at 14:43




I've made a numerical simulation with R. You're probably looking for an analytical answer, but I thought this could be interesting to share.




set.seed(123)
# coin 1
N1 = 20
theta1 = 0.7

toss1 <- rbinom(n = N1, size = 1, prob = theta1)

# coin 2
N2 = 25
theta2 = 0.5

toss2 <- rbinom(n = N2, size = 1, prob = theta2)

# frequency
sum(toss1)/N1 # [1] 0.65
sum(toss2)/N2 # [1] 0.52



In this first chunk of code I simply simulate two series of coin tosses. Because theta1 > theta2, the observed frequency of heads for coin 1 will tend to be higher than for coin 2. Note the different sizes N1 and N2.




Let's see what we can do with different thetas. Note the code is not optimal. At all.


thetas


simulation <- function(N1, N2, theta1, theta2, nsim = 100) {
  count1 <- count2 <- 0

  for (i in 1:nsim) {
    toss1 <- rbinom(n = N1, size = 1, prob = theta1)
    toss2 <- rbinom(n = N2, size = 1, prob = theta2)

    # count how often coin 1 shows the higher observed frequency of heads
    if (sum(toss1)/N1 > sum(toss2)/N2) count1 = count1 + 1
    #if (sum(toss1)/N1 < sum(toss2)/N2) count2 = count2 + 1
  }

  count1/nsim
}


set.seed(123)
simulation(20, 25, 0.7, 0.5, 100)
#[1] 0.93



So 0.93 is the fraction of the 100 simulations in which the first coin showed the higher frequency of heads. This seems reasonable given the theta1 and theta2 used.





Let's look at two vectors of thetas.




theta1_v <- seq(from = 0.1, to = 0.9, by = 0.1)
theta2_v <- seq(from = 0.9, to = 0.1, by = -0.1)

res_v <- c()
for (i in seq_along(theta1_v)) {
  res <- simulation(1000, 1500, theta1_v[i], theta2_v[i], 100)
  res_v[i] <- res
}

plot(theta1_v, res_v, type = "l")



Remember that res_v contains the fractions of the 100 simulations in which coin 1 showed the higher frequency of heads.





[Figure: res_v plotted against theta1_v]



So as theta1 increases, the probability that coin 1 shows the higher frequency of heads increases, as expected.





I've done some other simulations, and it seems that the sizes N1 and N2 are less important.
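As a rough probe of that claim (my own addition, values chosen arbitrarily), one can rerun the simulation for a fixed pair of thetas and several sample sizes:

# how often coin 1 shows the higher frequency of heads for N = 10, 50, 200
set.seed(123)
sapply(c(10, 50, 200), function(N) simulation(N, N, 0.7, 0.5, 1000))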





If you're familiar with R, you can use this code to shed some light on the problem. I'm aware this is not a complete analysis, and it can be improved.







Interesting how res_v changes continuously when the thetas meet. I understood the question as asking about the coins' intrinsic bias after making just a single set of observations. You seem to answer what observations one would make after knowing the bias.
– katosh
Sep 7 '18 at 13:30




