Given by Geoffrey Fox at NPAC Java Academy February--April 99 on Mar 20 1999. Foils prepared Mar 20 1999
Outside Index
Summary of Material
We learnt how to build Applets that simulate Tossing of Coins and Throwing Dice |
Here we describe implications for Polling |
Details at Java Academy Site |
If you follow an election, or watch the White House or the Republicans deciding on the correct strategy, you will see that they go out and take a poll. |
Often the same issue is polled by lots of different groups: CNN, the New York Times, political parties, etc. You may have noticed that most polls use between 500 and 1500 people. Here we will try to show you why! |
You may have also noticed that such polls quote a margin of error in their results |
e.g. they might say that 50% of the people polled thought Donald Duck would win the election, with a margin of error of 3 percentage points. |
They predict that somewhere between 47% and 53% of the public would vote for Donald Duck. |
If all 100 million or so of the US voting population vote, you get the real result. However you can get to within 3% of the answer by just talking to 1000 people – a tiny fraction .00001 of the total. |
This could save a lot of money on Election Day and says you can stay home and learn more Java unless it is a close election. Actually this is not a very democratic way to think, and everybody should vote – otherwise those that love Java would always stay at home and we would never elect Politicians who program... |
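To see where a figure like 3% can come from, here is a minimal Java sketch. It assumes each person polled answers independently, like a coin toss, and it assumes the common convention that the quoted margin is 1.96 standard errors (a 95% level); the foils do not say which convention the pollsters use. The class name PollMargin and its methods are made up for this illustration.

```java
// Sketch only: statistical error of a yes/no poll with independent answers.
class PollMargin {
    // One standard deviation of the measured fraction,
    // for a true "yes" fraction p and n people polled.
    static double standardError(int n, double p) {
        return Math.sqrt(p * (1.0 - p) / n);
    }

    // Assumption (not from the foils): margin of error = 1.96 standard errors.
    static double marginOfError(int n, double p) {
        return 1.96 * standardError(n, p);
    }

    public static void main(String[] args) {
        int[] sizes = {500, 1000, 1500};
        for (int n : sizes) {
            System.out.printf("n=%d  standard error=%.1f%%  margin=%.1f%%%n",
                    n, 100 * standardError(n, 0.5), 100 * marginOfError(n, 0.5));
        }
    }
}
```

For 1000 people and an evenly split vote this gives a margin a little over 3 percentage points, in line with the figure quoted above.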
So if one makes a poll, the error comes from two sources – the so-called statistical error and the so-called bias.
|
There is not much point in using more than 1000 or so people in a poll: you will decrease the statistical error, but it will be hard to decrease the overall error much more. |
Let us consider the case of rolling 200 Dice and summing the spots. This will lead to a result between 200 and 1200 with an average of 700. |
Subtracting 200, we get the equivalent of 1000 people with a poll whose result is 50% yes and 50% no. Any one summing of 200 Dice will not give exactly 700 but a value near this. The applet repeats this process totalFrq times and plots the result as a histogram or frequency plot of dice sums. |
[Figure: Frequency against Value of 200 Summed Dice faces; mean 700, with +24 and -24 marked] |
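The dice experiment can be sketched in plain Java as below. This is not the applet's own source: the class DiceSumHistogram and its methods are hypothetical, and only the name totalFrq is taken from the text. It fills a frequency array instead of drawing a plot, and then measures the mean and spread of the sums.

```java
import java.util.Random;

// Sketch only: roll 200 dice, sum the spots, repeat totalFrq times.
class DiceSumHistogram {
    static final int NUM_DICE = 200;

    // Roll NUM_DICE dice once and return the sum of the spots.
    static int rollSum(Random rng) {
        int sum = 0;
        for (int d = 0; d < NUM_DICE; d++) {
            sum += rng.nextInt(6) + 1;  // one die face, 1..6
        }
        return sum;
    }

    // histogram[s] counts how often the sum s occurred in totalFrq repeats.
    static int[] histogram(int totalFrq, Random rng) {
        int[] histogram = new int[6 * NUM_DICE + 1];
        for (int i = 0; i < totalFrq; i++) {
            histogram[rollSum(rng)]++;
        }
        return histogram;
    }

    // Mean of the recorded sums.
    static double mean(int[] histogram) {
        long count = 0, weighted = 0;
        for (int s = 0; s < histogram.length; s++) {
            count += histogram[s];
            weighted += (long) s * histogram[s];
        }
        return (double) weighted / count;
    }

    // Standard deviation of the recorded sums.
    static double standardDeviation(int[] histogram) {
        double m = mean(histogram);
        long count = 0;
        double sumSq = 0;
        for (int s = 0; s < histogram.length; s++) {
            count += histogram[s];
            sumSq += histogram[s] * (s - m) * (s - m);
        }
        return Math.sqrt(sumSq / count);
    }

    public static void main(String[] args) {
        int totalFrq = 10000;  // number of repeats, chosen here for illustration
        int[] h = histogram(totalFrq, new Random());
        System.out.println("mean = " + mean(h));
        System.out.println("standard deviation = " + standardDeviation(h));
    }
}
```

Each run gives slightly different numbers, but the mean should sit close to 700 and the standard deviation close to 24.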
Note that the histogram is quite a pretty bell shape and in fact there is a fancy theory called the "central limit theorem" which derives the shape as a so-called Gaussian. |
There is a separate discussion of Gaussians on the web site |
[Figure: histogram with mean 700 and +24 / -24 marked] |
One finds a standard deviation of 24, which is a percentage error of 2.4% on the 1000-wide range of results. |
10% of events are in the upper yellow region, more than 3% above the mean. |
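The 2.4% figure can also be checked by arithmetic rather than by reading it off the plot. The sketch below computes the variance of one fair die from its six faces, scales it up to 200 independent dice, and then estimates by simulation how often the sum lands more than 30 (3% of the 1000-wide range) above the mean. The class DiceError, its methods and the threshold 730 are assumptions made for this illustration.

```java
import java.util.Random;

// Sketch only: standard deviation of summed dice, and the size of the upper tail.
class DiceError {
    // Variance of a single fair die, computed directly from the six faces.
    static double singleDieVariance() {
        double mean = 3.5;
        double sum = 0;
        for (int face = 1; face <= 6; face++) {
            sum += (face - mean) * (face - mean);
        }
        return sum / 6.0;
    }

    // Variances of independent dice add, so the sum of numDice dice
    // has standard deviation sqrt(numDice * variance of one die).
    static double sumStdDev(int numDice) {
        return Math.sqrt(numDice * singleDieVariance());
    }

    // Fraction of simulated sums that come out strictly above the threshold.
    static double fractionAbove(int numDice, int threshold, int trials, Random rng) {
        int above = 0;
        for (int i = 0; i < trials; i++) {
            int sum = 0;
            for (int d = 0; d < numDice; d++) {
                sum += rng.nextInt(6) + 1;
            }
            if (sum > threshold) {
                above++;
            }
        }
        return (double) above / trials;
    }

    public static void main(String[] args) {
        double sigma = sumStdDev(200);
        System.out.printf("standard deviation = %.2f (%.1f%% of 1000)%n",
                sigma, 100 * sigma / 1000);
        System.out.printf("fraction above 730 = %.3f%n",
                fractionAbove(200, 730, 20000, new Random()));
    }
}
```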
Actually Dice are not a very good representation of polling, as a result with 1000 cells corresponds to only 200 Dice. |
This is like a poll where you went to 200 households and asked 5 people in each household. It is not 1000 independent samples. |
So a better model is tossing a coin 1000 times. |
Shift the coin toss result over by 200 so that results lie in the range 200..1200 as in the Dice accumulation. |
Now the frequency distribution is narrower. |
It is still a beautiful bell shaped Gaussian but the percentage error is 1.6% not 2.4%. |
You will get better answers from asking 1 person in each of 1000 households rather than 5 people in each of 200 households. |
[Figure: histogram of the number of heads in 1000 tosses, with 200 added; mean 700, with +16 and -16 marked] |
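The coin version of the experiment can be sketched the same way. Again this is a stand-in for the applet and not its source: the class CoinTossPoll and its methods are hypothetical. It counts heads in 1000 tosses, adds 200 so the result sits on the same scale as the dice sums, and measures the mean and standard deviation over many repeats.

```java
import java.util.Random;

// Sketch only: 1000 coin tosses as a model of 1000 independent poll answers.
class CoinTossPoll {
    static final int TOSSES = 1000;
    static final int SHIFT = 200;  // puts the result on the same scale as the dice sums

    // Count heads in TOSSES fair coin tosses and add the shift.
    static int shiftedHeads(Random rng) {
        int heads = 0;
        for (int t = 0; t < TOSSES; t++) {
            if (rng.nextBoolean()) {
                heads++;
            }
        }
        return heads + SHIFT;
    }

    // Returns {mean, standard deviation} of the shifted head count over many trials.
    static double[] meanAndStdDev(int trials, Random rng) {
        double sum = 0;
        double sumSq = 0;
        for (int i = 0; i < trials; i++) {
            int v = shiftedHeads(rng);
            sum += v;
            sumSq += (double) v * v;
        }
        double mean = sum / trials;
        double variance = sumSq / trials - mean * mean;
        return new double[] {mean, Math.sqrt(variance)};
    }

    public static void main(String[] args) {
        double[] result = meanAndStdDev(10000, new Random());
        System.out.println("mean = " + result[0]);
        System.out.println("standard deviation = " + result[1]);
    }
}
```

The standard deviation should come out near 16, noticeably smaller than the 24 found for 200 dice, which is the narrowing described above.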
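With 100 coin tosses (a polling sample of 100), the statistical error is 5% -- too big to be comfortable. There is a simple rule for such errors: the percentage error falls as one over the square root of the number of tosses, so four times as many tosses halves the error. |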
100 Coin Tosses: Error 5% |
400 Coin Tosses: Error 2.5% |
Shape is still a Gaussian |
1000 Coin Tosses: Error a factor of Sqrt(10) smaller than for 100 Tosses |
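The errors quoted for 100, 400 and 1000 tosses all fit one formula: for a fair coin, one standard deviation of the head count is 0.5 times the square root of the number of tosses, so as a percentage of the number of tosses the error is 50 divided by that square root. A small sketch, assuming a fair coin (the class name SqrtRule is made up here):

```java
// Sketch only: percentage error of a fair-coin poll as a function of sample size.
class SqrtRule {
    // One standard deviation of the head count is 0.5 * sqrt(n);
    // as a percentage of n tosses that is 50 / sqrt(n).
    static double percentError(int n) {
        return 50.0 / Math.sqrt(n);
    }

    public static void main(String[] args) {
        int[] sizes = {100, 400, 1000};
        for (int n : sizes) {
            System.out.printf("%d tosses: error %.1f%%%n", n, percentError(n));
        }
    }
}
```

Because the error only shrinks with the square root, going from 1000 to 4000 people would cost four times as much and only halve the statistical error.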
Suppose we reduce the number of trials from the excessive 300,000 to a more immediate 10,000. |
Then the means and error estimates come out just fine but the histogram is more ragged. |
The histogram is not so near the red lines (The Gaussian Bell) but has random deviations above and below it |
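A rough way to see why fewer trials give a more ragged histogram is to estimate how many counts land in the tallest bin. The sketch below makes two assumptions that are not in the foils: that the histogram is the 1000 coin toss one with unit-wide bins, and that the count in a bin fluctuates by about the square root of its expected value (ordinary counting noise). The class HistogramNoise is hypothetical.

```java
// Sketch only: expected size of the random wiggles on the histogram peak.
class HistogramNoise {
    // Expected count in the peak bin of a Gaussian-shaped histogram
    // with unit-wide bins and standard deviation sigma.
    static double expectedPeakCount(int trials, double sigma) {
        return trials / (sigma * Math.sqrt(2.0 * Math.PI));
    }

    // Assumed counting noise: relative fluctuation of a bin is 1 / sqrt(count).
    static double relativeNoise(double expectedCount) {
        return 1.0 / Math.sqrt(expectedCount);
    }

    public static void main(String[] args) {
        double sigma = Math.sqrt(1000 * 0.25);  // 1000 fair coin tosses
        int[] trialCounts = {300000, 10000};
        for (int trials : trialCounts) {
            double peak = expectedPeakCount(trials, sigma);
            System.out.printf("%d trials: peak bin about %.0f counts, noise about %.1f%%%n",
                    trials, peak, 100 * relativeNoise(peak));
        }
    }
}
```

Under these assumptions the peak bin holds roughly 250 counts for 10,000 trials, so scatter of several percent around the red curve is expected, against roughly one percent for 300,000 trials.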
I got a bit worried in these long computations. Was the program running OK? |
So I used the useful dodge of inserting print statements that write to the Java Console |
if (i % 5000 == 0) System.out.println("Generating " + i); |
The program wrote a message every 5000 trials. |
I also wrote out some messages in the paint method to show it had got there. |
Note that messages in paint are repeated each time the page reloads, but the earlier messages from init are not. |
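Here is the progress-message dodge in a self-contained form. Only the if statement comes from the foils; the surrounding class ProgressDemo, the method runWithProgress and the returned message count are hypothetical additions so that the sketch can run outside an applet. In the applet itself the loop would presumably live in init.

```java
// Sketch only: print a progress line every so many trials of a long loop.
class ProgressDemo {
    // Runs `trials` iterations, printing a message every `interval` trials.
    // Returns how many messages were printed.
    static int runWithProgress(int trials, int interval) {
        int messages = 0;
        for (int i = 0; i < trials; i++) {
            if (i % interval == 0) {
                System.out.println("Generating " + i);
                messages++;
            }
            // ... one trial of the simulation would go here ...
        }
        return messages;
    }

    public static void main(String[] args) {
        runWithProgress(20000, 5000);
    }
}
```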