Cmput 455 Lecture 15, details for slide "Wrong Choice in Bandits", code binomial-select.py

Detailed explanation for the first result computed by the program and shown in binomial-select-experiment.txt

p1 = 0.9, p2 = 0.2
Both arms have 1 simulation each.
Prob. of wrong arm choice: 0.15

Why? There are 4 cases, since each of the 2 simulations can be either a win or a loss.
Compute the probability of each case.

50/50  p1 win and p2 win:    0.9*0.2 = 0.18
GOOD   p1 win and p2 loss:   0.9*0.8 = 0.72
BAD    p1 loss and p2 win:   0.1*0.2 = 0.02
50/50  p1 loss and p2 loss:  0.1*0.8 = 0.08

50/50: if both arms have the same number of wins, we choose randomly
GOOD: the better arm (p1 here) has more wins
BAD: the worse arm (p2 here) has more wins

The four cases cover all outcomes: 0.18 + 0.72 + 0.02 + 0.08 = 1.

p(make the right choice) = 0.72 + 0.5(0.18 + 0.08) = 0.72 + 0.13 = 0.85
p(make the wrong choice) = 0.02 + 0.5(0.18 + 0.08) = 0.02 + 0.13 = 0.15
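
The case analysis above can be checked with a short exact computation. The sketch below is not the course's binomial-select.py; the function names binomial_pmf and prob_wrong_choice are hypothetical, and it assumes both arms get the same number n of simulations, with ties broken by a fair coin flip as described above.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k wins in n independent simulations with win rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_wrong_choice(p_better, p_worse, n=1):
    """Exact probability of picking the worse arm after n simulations per arm.

    Assumption (hypothetical helper, not the course code): the arm with more
    wins is chosen, and a tie is broken randomly with probability 0.5.
    """
    wrong = 0.0
    for wins_better in range(n + 1):
        for wins_worse in range(n + 1):
            # Probability of this particular pair of win counts.
            case = binomial_pmf(wins_better, n, p_better) * binomial_pmf(wins_worse, n, p_worse)
            if wins_worse > wins_better:
                wrong += case        # BAD: the worse arm has more wins
            elif wins_worse == wins_better:
                wrong += 0.5 * case  # 50/50: tie, choose randomly
    return wrong

# The example above: p1 = 0.9, p2 = 0.2, 1 simulation each.
print(round(prob_wrong_choice(0.9, 0.2), 4))  # 0.15
```

With n = 1 the double loop visits exactly the four cases listed above. Larger values of n give the exact probability for more simulations per arm under the same tie-breaking assumption.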