Bayes’ Theorem

Bayes’ Theorem mathematically represents relations between parts and wholes that involve probability. To illustrate, let’s use a 10 x 10 grid of rectangles of equal area:

[Figure: Dart Board 1]

Let’s say that it is equally probable that something is positioned within the bounds of any one of these rectangles. Why might we think this? Well, let’s say that the grid represents a dart board, and you are throwing a dart behind your head. You hear the dart hit the board with its distinct, wooden thud, which sounds nothing like a dart hitting the drywall around it (yes, we play dangerously). The grid divides the dart board into equal sections that together cover the whole board. And so, since there are 100 rectangles, the probability that the dart is in any particular rectangle, based on the information we have, is 1/100.

Next, let’s say that we receive new information. We catch a glimpse of the board in the reflection of a copper urn, and, although the image is mirrored and distorted, we are able to tell that the dart has ended up in one particular area of the board, the one marked “here” on the grid above. Anything more specific than this is too hard to make out. What is the probability, as far as our new information allows, that the dart is on one particular rectangle marked “here” as opposed to some other “here” rectangle? Well, there are 4 “here” rectangles, and we have no further information to decide between them (or even to favor one over another). So the answer is 1/4.

Our ability to make these simple yet crucial updates to how likely some event is given new information is reflected in Bayes’ Theorem.

The probability we are measuring in the example is the probability that the dart landed on a particular “here” rectangle (let’s pick one and call it H1) granted that it did land in some “here” rectangle (let’s call this entire area of “here” rectangles H). Bayes’ Theorem is an equation that calculates this probability. It goes as follows:

The probability that the dart lands on H1 given that it lands on H as a whole equals the probability that it lands on H1 (granted the first information that it hits the board) multiplied by the probability that the dart lands on H granted it has hit the particular one (H1), and this is all divided by the probability that the dart lands on H (granted the first information that it hits the board). This equation is summarized as

$$P(H_1 \mid H) = \frac{P(H_1)\, P(H \mid H_1)}{P(H)}$$

“P (H)” means the probability that event H has occurred, and “|” indicates “given that” or “granted that” or “on the condition that” so that “P (H1 | H)” means the probability that the event H1 occurs granted that H has occurred (such probabilities are usually called conditional probabilities).

With this, we can fill in the numbers and calculate this simple equation. The probability that the dart lands on H1 on the initial throw is 1/100, the probability that it lands in H granted that it lands on H1 is 1 (or 100%), and the probability that it lands in H on the initial throw is 4/100. This gets us

$$P(H_1 \mid H) = \frac{1/100 \times 1}{4/100} = \frac{1/100}{4/100} = \frac{1}{4}$$

So the probability that the dart has landed on the particular “here” rectangle we called H1 given that it has hit some “here” rectangle in H is 1/4.
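The same calculation can be sketched in a few lines of Python. This is only an illustration: the coordinates chosen for the four “here” rectangles are an assumption (the original figure is not reproduced here), and the variable names are made up for the example. Exact fractions are used so the result is not blurred by rounding.

```python
from fractions import Fraction

# Model the board as 100 equally likely cells, identified by (row, column).
board = {(r, c) for r in range(10) for c in range(10)}

# Assumed placement: a hypothetical 2 x 2 block of "here" cells.
here = {(4, 4), (4, 5), (5, 4), (5, 5)}
h1 = (4, 4)  # the particular "here" cell called H1

p_h1 = Fraction(1, len(board))          # P(H1) = 1/100
p_h = Fraction(len(here), len(board))   # P(H)  = 4/100
# P(H | H1): if the dart is on H1, it is certainly inside H.
p_h_given_h1 = Fraction(1) if h1 in here else Fraction(0)

# Bayes' Theorem: P(H1 | H) = P(H1) * P(H | H1) / P(H)
p_h1_given_h = p_h1 * p_h_given_h1 / p_h
print(p_h1_given_h)  # 1/4
```

Moving the block of “here” cells elsewhere on the board leaves the answer unchanged, since only the counts matter.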

In general, the probability of some target outcome (often called the hypothesis) given some information or evidence, or P (T | E), equals the probability of that target outcome P (T) multiplied by the probability of the evidence given the target outcome P (E | T) divided by the probability of the evidence P (E):

$$P(T \mid E) = \frac{P(T)\, P(E \mid T)}{P(E)}$$
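As a rough sketch, the general form can be written as a small Python function. The function name `bayes` and its parameter names are invented for this illustration, not taken from any library.

```python
from fractions import Fraction

def bayes(prior, likelihood, evidence):
    """Return P(T | E) = P(T) * P(E | T) / P(E).

    prior      -- P(T), the probability of the target outcome
    likelihood -- P(E | T), the probability of the evidence given the target
    evidence   -- P(E), the probability of the evidence
    """
    if evidence == 0:
        raise ValueError("P(E) must be nonzero to condition on E")
    return prior * likelihood / evidence

# The dart example: T is H1, E is H.
posterior = bayes(Fraction(1, 100), Fraction(1), Fraction(4, 100))
print(posterior)  # 1/4
```

The check for a zero `evidence` value is there because conditioning on something that cannot happen leaves the division undefined.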

To understand more about why the right side of Bayes’ Theorem involves the three parts it does, let’s consider how the probabilities should shift when the probability of these parts of it shift, and how the equation accurately reflects these shifts.

Consider the same dart board as before, except that now it has a known defect. The wooden board is missing behind the two left squares marked “here” as well as the square below the bottom-left “here” square. There is only a thin paper over this area, so any dart that hits it goes right through. You throw the dart behind you and you hear it go through the paper. With just this information, what’s the probability that it went through one of the previously marked “here” squares?

Where WT is “went through”, or indicates the event that the dart has gone through the defective paper part of the board, and H is the area marked with “here” squares, the equation is

$$P(H \mid WT) = \frac{P(H)\, P(WT \mid H)}{P(WT)}$$

With the probabilities filled in, that makes

$$P(H \mid WT) = \frac{4/100 \times 1/2}{3/100} = \frac{2}{3}$$

And 2/3 is clearly correct. Just look at the grid (where the squares with the light-green borders are only paper):

[Figure: Dart Board 2]

Now, what should happen if it is very probable to hit a defective area of the board? Let’s say that, in addition to the squares already defective, an additional 47 are, for 50 defective squares in total. If so, knowing that a dart went through a defective area would not make hitting a “here” square so likely this time. If all of the additional defective squares are non-“here” squares, the likelihood that the dart has hit a “here” square given that it went through the board would be

$$P(H \mid WT) = \frac{4/100 \times 1/2}{50/100} = \frac{1}{25}$$

This makes sense. If we know that the dart went through the board, then it hit one of the 50 defective squares, and only 2 of those 50 are marked “here”. Parts and wholes in action!
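Both defect scenarios can be checked by counting squares directly and comparing with the formula. The sketch below assumes a hypothetical placement of the “here” squares and of the extra 47 defective squares (any placement that avoids the “here” squares gives the same counts); the function names are invented for the example.

```python
from fractions import Fraction

def posterior_by_bayes(board, here, defective):
    """P(H | WT) computed with Bayes' Theorem."""
    p_h = Fraction(len(here), len(board))                       # P(H)
    p_wt_given_h = Fraction(len(here & defective), len(here))   # P(WT | H)
    p_wt = Fraction(len(defective), len(board))                 # P(WT)
    return p_h * p_wt_given_h / p_wt

def posterior_by_counting(here, defective):
    """P(H | WT) as the share of defective squares that are "here"."""
    return Fraction(len(here & defective), len(defective))

board = {(r, c) for r in range(10) for c in range(10)}
here = {(4, 4), (4, 5), (5, 4), (5, 5)}   # assumed 2 x 2 block
# The two left "here" squares plus the square below the bottom-left one.
small_defect = {(4, 4), (5, 4), (6, 4)}

# 47 further defective squares, none of them "here" (arbitrary choice).
extra = sorted(board - here - small_defect)[:47]
large_defect = small_defect | set(extra)

print(posterior_by_bayes(board, here, small_defect))   # 2/3
print(posterior_by_bayes(board, here, large_defect))   # 1/25
```

In both cases `posterior_by_counting` returns the same value as the formula, which is the parts-and-wholes reading of the theorem.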

What happens if we return the defect to how it was previously, with the three squares, and change only the “here” squares, so that they now cover the entire board except for the one defective square that was not a “here” square before? (That square remains unmarked, and it is now the only non-“here” square on the board.) We should expect the probability to stay the same, even though the initial probability has changed: before the throw, there is a mere 1/100 chance of hitting the non-“here” square. Once the dart is known to have gone through the paper part of the board, the probability should jump up to 1/3 that it hit the non-“here” square, or 2/3 that it hit a “here” square:

$$P(H \mid WT) = \frac{99/100 \times 2/99}{3/100} = \frac{2}{3}$$

Notice how two of the components of the equation have changed, but they balance one another out to produce the same expected result of 2/3 (4/100 * 1/2 and 99/100 * 2/99 both equal 2/100, which is then divided by the same number). This is because the product tracks the same relation of part to whole in both cases: the share of the whole board that is both “here” and defective, which is 2 squares out of 100. This is clearly the same in each case, and the equation reflects this.

Furthermore, if the previous scenario is repeated, except that the one non-“here” square is a functional square of the board, then all three defective squares are “here” squares, and there is a 100% chance that a “here” square was hit, as faithfully reflected in the result of the equation:

$$P(H \mid WT) = \frac{99/100 \times 3/99}{3/100} = 1$$
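The two 99-square scenarios can be sketched the same way. The positions of the defective squares and of the single non-“here” square are assumptions made for the illustration, as is the function name `posterior`.

```python
from fractions import Fraction

def posterior(board, here, defective):
    """P(H | WT) = P(H) * P(WT | H) / P(WT), with all squares equally likely."""
    p_h = Fraction(len(here), len(board))
    p_wt_given_h = Fraction(len(here & defective), len(here))
    p_wt = Fraction(len(defective), len(board))
    return p_h * p_wt_given_h / p_wt

board = {(r, c) for r in range(10) for c in range(10)}
defective = {(4, 4), (5, 4), (6, 4)}   # assumed positions of the 3 paper squares

# Scenario A: the only non-"here" square is one of the defective squares.
here_a = board - {(6, 4)}
# Scenario B: the only non-"here" square is a functional square (arbitrary pick).
here_b = board - {(0, 0)}

print(posterior(board, here_a, defective))  # 2/3
print(posterior(board, here_b, defective))  # 1
```

In scenario A the prior rises to 99/100 while P(WT | H) falls to 2/99, so their product is still 2/100, just as in the four-square case.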

Bayes’ Theorem is a way to calculate relations between parts and wholes of probability. It is particularly useful for updating a probability in light of new pieces of information or evidence that are themselves probabilistic. Simple scenarios can go a long way toward understanding how the equation works so that it may be more readily used in much more complex cases, for example ones where drawing a grid is impractical because the probabilities extend to many decimal places. Yet the simple way in which all of the parts hang together is always the same!
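For cases where drawing a grid is impractical, one rough way to check a Bayes calculation is to simulate many throws and count. The sketch below does this for the three-square defect scenario; the square positions, the seed, and the number of throws are all arbitrary assumptions, and the estimate will only approximate the exact answer of 2/3.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

here = {(4, 4), (4, 5), (5, 4), (5, 5)}   # assumed 2 x 2 block of "here" squares
defective = {(4, 4), (5, 4), (6, 4)}      # assumed positions of the paper squares

went_through = 0
went_through_and_here = 0
for _ in range(200_000):
    # Each throw lands on one of the 100 squares with equal probability.
    square = (rng.randrange(10), rng.randrange(10))
    if square in defective:
        went_through += 1
        if square in here:
            went_through_and_here += 1

# Among the throws that went through, what share hit a "here" square?
estimate = went_through_and_here / went_through
print(f"estimated P(H | WT) = {estimate:.3f}")
```

The estimate gets closer to 2/3 as the number of throws grows, since it is computed only from the small share of throws that go through the paper.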
