---
title: "MITx 6.431x -- Probability - The Science of Uncertainty and Data + Unit_2.Rmd"
author: "John HHU"
date: "2022-11-05"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
## Course / Unit 2: Conditioning and independence / Unit overview
# 1. Motivation
You build a local communication network consisting
of routers, computers, and some hard-wired links
that connect them.
However, each link has a small probability
of being nonfunctional because of a hardware failure.
Assuming that failures at different links
happen independently, how likely is it
that the message from router A will still
be able to reach your computer?
Suppose now that you received notice that one of the routers
is down.
Given this information, how would you
update the probability that the message from router A
will still be able to reach your computer?
By the end of this unit, you will
be able to answer questions of this type
by using the concept of conditional probability
and by giving a precise meaning to the concept
of independent failures.
# 2. Unit 2 overview
In this unit we introduce the concepts of conditioning and independence. Conditioning leads to revised ("conditional") probabilities that take into account partial information on the outcome of a probabilistic experiment. Conditioning is a very useful tool that allows us to "divide and conquer" complex problems. Independence is used to model situations involving non-interacting probabilistic phenomena and also plays an important role in building complex models from more elementary ones.
In the first lecture, we introduced probabilities as a
way of describing our beliefs about the likelihood that a
given event will occur.
But our beliefs will in general depend on the
information that we have.
Taking into account new information leads us to
consider so-called conditional probabilities.
These are revised probabilities that take into
account the new information.
Conditional probabilities are very useful whenever we want
to break up a model into simpler pieces using a divide
and conquer strategy.
This is done using certain tools that we will develop and
which we will keep applying throughout this course in
different guises.
They are also the foundation of the field of inference.
And we will see how they arise in that context.
Then, in the second lecture of this unit, we will consider a
special case where one event does not convey useful
information about another, a situation that we call
independence.
Independence usually describes a situation where the
occurrence or non-occurrence of different events is
determined by factors that are completely unrelated.
Independence is what allows us to build complex models out of
simple ones.
This is because it is often the case that a complex system
is made up of several components that are affected
by unrelated, that is, independent sources of
randomness.
And so with the tools to be developed in this unit, we
will be ready to calculate probabilities in fairly
complex probabilistic models.
## Course / Unit 2: Conditioning and independence / Lec. 2: Conditioning and Bayes' rule
# 1. Lecture 2 overview and slides
This lecture sequence introduces conditional probabilities and three basic tools: the multiplication rule, the total probability theorem, and Bayes' rule.
Suppose I look at the registry of residents of my town and
pick a person at random.
What is the probability that this person is
under 18 years of age?
The answer is about 25%.
Suppose now that I tell you that this person is married.
Will you give the same answer?
Of course not.
The probability of being less than 18 years
old is now much smaller.
What happened here?
We started with some initial probabilities that reflect
what we know or believe about the world.
But we then acquired some additional
knowledge, some new evidence--
for example, about this person's family situation.
This new knowledge should cause our beliefs to change,
and the original probabilities must be replaced with new
probabilities that take into account the new information.
These revised probabilities are what we call conditional
probabilities.
And this is the subject of this lecture.
We will start with a formal definition of conditional
probabilities together with the motivation behind this
particular definition.
We will then proceed to develop three tools that rely
on conditional probabilities, including the Bayes rule,
which provides a systematic way for incorporating new
evidence into a probability model.
The three tools that we introduce in this lecture
involve very simple and elementary mathematical
formulas, yet they encapsulate some very powerful ideas.
It is not an exaggeration to say that much of this class
will revolve around the repeated application of
variations of these three tools to increasingly
complicated situations.
In particular, the Bayes rule is the foundation for the
field of inference.
It is a guide on how to process data and make
inferences about unobserved quantities or phenomena.
As such, it is a tool that is used all the time, all over
science and engineering.
Printable transcript available here.
https://courses.edx.org/assets/courseware/v1/d967d5ec3112ed1d5f2c25ef76b2e0e9/asset-v1:MITx+6.431x+2T2022+type@asset+block/transcripts_L02-Overview.pdf
Lecture slides: [clean] [annotated]
https://courses.edx.org/assets/courseware/v1/fbd43b5de350748c166d3590b6ab2806/asset-v1:MITx+6.431x+2T2022+type@asset+block/lectureslides_L02Cleanslides.pdf
https://courses.edx.org/assets/courseware/v1/7e33e8ca9cfaf01b34ebcfd3e463735a/asset-v1:MITx+6.431x+2T2022+type@asset+block/lectureslides_L02annotatedslides.pdf
The same material, in live lecture hall format, can be found here and here.
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-041-probabilistic-systems-analysis-and-applied-probability-fall-2010/video-lectures/lecture-2-conditioning-and-bayes-rule/
http://www.youtube.com/watch?v=TluTv5V0RmE
More information is given in the text:
Conditional probability: Section 1.3
https://courses.edx.org/courses/course-v1:MITx+6.431x+2T2022/pdfbook/0/chapter/1/5
Total probability theorem and Bayes' rule: Section 1.4
https://courses.edx.org/courses/course-v1:MITx+6.431x+2T2022/pdfbook/0/chapter/1/5
# 2. Conditional probabilities



Conditional probabilities are probabilities associated with
a revised model that takes into account some additional
information about the outcome of a probabilistic experiment.
The question is how to carry out this
revision of our model.
We will give a mathematical definition of conditional
probabilities, but first let us motivate this definition by
examining a simple concrete example.
Consider a probability model with 12 equally likely
possible outcomes, and so each one of them has probability
equal to 1/12.
We will focus on two particular events, event A and
B, two subsets of the sample space.
Event A has five elements, so its probability is 5/12, and
event B has six elements, so it has probability 6/12.
Suppose now that someone tells you that event B has occurred,
but tells you nothing more about the outcome.
How should the model change?
First, those outcomes that are outside event B
are no longer possible.
So we can either eliminate them, as was done in this
picture, or we might keep them in the picture but assign them
0 probability, so that they cannot occur.
How about the outcomes inside the event B?
So we're told that one of these has occurred.
Now these 6 outcomes inside the event B were equally
likely in the original model, and there is no reason to
change their relative probabilities.
So they should remain equally likely in the revised model as
well, so each one of them should now have probability
1/6, since there are 6 of them.
And this is our revised model, the
conditional probability law.
It assigns 0 probability to outcomes outside B, and probability 1/6
to each one of the outcomes that is inside the event B.
Let us write now this down mathematically.
We will use this notation to describe the conditional
probability of an event A given that some other event B
is known to have occurred.
We read this expression as probability of A given B. So
what are these conditional probabilities in our example?
So in the new model, where these outcomes are equally
likely, we know that event A can occur in
two different ways.
Each one of them has probability 1/6.
So the probability of event A is 2/6 which
is the same as 1/3.
How about event B? Well, B consists of 6 possible
outcomes, each with probability 1/6.
So event B in this revised model should have probability
equal to 1.
Of course, this is just saying the obvious.
Given that we already know that B has occurred, the
probability that B occurs in this new model
should be equal to 1.
How about now, if the sample space does not consist of
equally likely outcomes, but instead we're given the
probabilities of different pieces of the sample space, as
in this example.
Notice here that the probabilities are consistent
with what was used in the original example.
So this part of A that lies outside B has probability
3/12, but in this case I'm not telling you how that
probability is made up.
I'm not telling you that it consists of 3
equally likely outcomes.
So all I'm telling you is that the collective probability in
this region is 3/12.
The total probability of A is, again, 5/12 as before.
The total probability of B is 2/12 plus 4/12, which equals
6/12, exactly as before.
So it is a situation similar to the one before.
How should we revise our probabilities and construct
conditional probabilities once we are told
that event B has occurred?
First, this relation should remain true.
Once we are told that B has occurred, then B is certain to
occur, so it should have conditional
probability equal to 1.
How about the conditional probability of A given that B
has occurred?
Well, we can reason as follows.
In the original model, and if we just look inside event B,
those outcomes that make event A happen had a collective
probability which was 1/3 of the total probability assigned
to B. So out of the overall probability assigned to B, 1/3
of that probability corresponds to outcomes in
which event A is happening.
So therefore, if I tell you that B has occurred, I should
assign probability equal to 1/3 that event A is
also going to happen.
So that, given that B happened, the conditional
probability of A given B should be equal to 1/3.
By now, we should be satisfied that this approach is a
reasonable way of constructing conditional probabilities.
But now let us translate our reasoning into a formula.
So we wish to come up with a formula that gives us the
conditional probability of an event given another event.
The particular formula that captures our way of thinking,
as motivated before, is the following.
Out of the total probability assigned to B--
which is this--
we ask the question, which fraction of that probability
is assigned to outcomes under which event A also happens?
So we are living inside event B, but within that event, we
look at those outcomes for which event A also happens.
So this is the intersection of A and B. And we ask, out of
the total probability of B, what fraction of that
probability is allocated to that intersection of A with B?
So this formula, this definition, captures our
intuition of what we did before to construct
conditional probabilities in our particular example.
Let us check that the definition indeed does what
it's supposed to do.
In this example, the probability of the
intersection was 2/12 and the total probability of B was
6/12, which gives us 1/3, which is the answer that we
had gotten intuitively a little earlier.
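This check is easy to reproduce in R. In the minimal sketch below, the outcome labels 1 through 12 and the particular members of A and B are an arbitrary encoding chosen here for illustration; only the counts matter, and they match the lecture (|A| = 5, |B| = 6, |A intersect B| = 2):
```{r conditional-definition}
# Numerical check of the definition P(A|B) = P(A and B) / P(B),
# using the 12-outcome example from the lecture. The labels 1:12
# are an arbitrary encoding of the equally likely outcomes.
omega <- 1:12
A <- 1:5                      # P(A) = 5/12
B <- 4:9                      # P(B) = 6/12
p <- rep(1/12, length(omega)) # uniform probability law

p_B <- sum(p[B])
p_A_and_B <- sum(p[intersect(A, B)])
p_A_and_B / p_B               # P(A|B) = (2/12)/(6/12) = 1/3
```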
At this point, let me also make a comment that this
definition of conditional probabilities makes sense only
if we do not attempt to divide by zero.
That is, only if the event B on which we're conditioning
has positive probability.
If an event B has 0 probability, then conditional
probabilities given B will be left undefined.
And one final comment.
This is a definition.
It's not a theorem.
What does that mean?
It means that there is no question whether this equality
is correct or not.
It's just a definition.
There's no issue of correctness.
The earlier argument that we gave was just a motivation of
the definition.
We tried to figure out what the definition should be if we
want to have a certain intuitive and meaningful
interpretation of the conditional probabilities.
Let us now continue with a simple example.
# 3. Exercise: Conditional probabilities

Discussion
Topic: Unit 2: Conditioning and independence:Lec. 2: Conditioning and Bayes' rule / 3. Exercise: Conditional probabilities
Not quite understanding the difference between these two questions
question posted by hjiangfighting
What is the difference between the "conditional probability law on B" and the "conditional probability law on Omega"? Can anyone explain? Thank you!
Abolfazl66
The conditional probability law on B is defined for all events inside B, given that B has already happened; whereas the conditional probability law on Omega is defined for all events, whether inside B or inside B's complement, given that B has already happened.
OK! I see. So that means that, under the conditional probability law on Omega given that B occurred, the probability of Omega is 1. I think I know what is going on! Thank you!!
posted by hjiangfighting
You are right that P(Omega|B) = 1. However, the probability law on Omega and the probability of the set Omega (the sample space) are two different things. The latter is always 1, because that is the definition of the sample space.
The probability law on any set (including a universal set like the sample space) specifies how the probability is distributed among the elements of that particular set. We are not talking about the overall probability of a set here; we are talking about the relationships between the probabilities of the different elements inside that set.
posted by lockedarte
Thank you, lockedarte, for clarifying! This is very important and useful to me!
The probability law on any set (including a universal set like the sample space) specifies how the probability is distributed among the elements of that particular set.
posted by hjiangfighting
mueed27
Dear Abolfazl66, your comment helped me answer the question correctly on the first attempt; otherwise I was going to get carried away.
So thank you.
Man, this professor's course is very tricky.
[discrete uniform probability]
In probability theory and statistics, the discrete uniform distribution is a symmetric probability distribution wherein a finite number of values are equally likely to be observed; every one of n values has equal probability 1/n. Wikipedia
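A one-line R illustration of this distribution; the choice n = 4, matching the four-sided die in the next segment, is ours:
```{r discrete-uniform}
# Draw from a discrete uniform distribution on {1, ..., n}:
# each value has probability 1/n.
n <- 4
draws <- sample(1:n, size = 1e5, replace = TRUE)
prop.table(table(draws))  # empirical frequencies, each close to 1/4
```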
# 4. A die roll example
This is a simple example where we want to just apply the
formula for conditional probabilities
and see what we get.
The example involves a four-sided die, if you can
imagine such an object, which we roll twice, and we record
the first roll, and the second roll.
So there are 16 possible outcomes.
We assume to keep things simple, that each one of those
16 possible outcomes, each one of them has the same
probability, so each outcome has the probability 1/16.
Let us consider now a particular event B on which
we're going to condition.
This is the event under which the smaller of the two die
rolls is equal to 2, which means that one of the dice
must have resulted in two, and the other die has resulted in
something which is 2 or larger.
So this can happen in multiple ways.
And here are the different ways that it can happen.
So a (2,2), or a (2,3), or a (2,4); then a (3,2) and a (4,2).
All of these are outcomes in which one of the dice has a
value equal to 2, and the other die
is at least as large.
So we condition on this event.
This results in a conditional model where each one of those
five outcomes is equally likely, since they used to be
equally likely in the original model.
Now let's look at this quantity.
The maximum of the two die rolls--
that is, the largest of the results.
And let us try to calculate the following quantity--
the conditional probability that the maximum is equal to 1
given that the minimum is equal to 2.
So this is the conditional probability of
this particular outcome.
Well, this particular outcome cannot happen.
If I tell you that the smaller number is 2, then the larger
number cannot be equal to 1, so this outcome is impossible,
and therefore this conditional probability is equal to 0.
Let's do something a little more interesting.
Let us now look at the conditional probability that
the maximum is equal to 3 given the information that
event B has occurred.
It's best to draw a picture and see what that event
corresponds to.
M is equal to 3--
the maximum is equal to 3--
if one of the dice resulted in a 3, and the other die
resulted in something that's 3 or less.
So this event here corresponds to the blue
region in this diagram.
Now let us try to calculate the conditional probability by
just following the definition.
The conditional probability of one event given another is the
probability that both of them--
both of the two events--
occur, divided by the probability of the
conditioning event.
That is, out of the total probability in the
conditioning event, we ask, what fraction of that
probability is assigned to outcomes in which the event of
interest is also happening?
So what is this event?
The maximum is equal to 3, which is the blue event.
And simultaneously, the red event is happening.
These two events intersect only in two places.
This is the intersection of the two events.
And the probability of that intersection is 2 out of 16,
since there's 16 outcomes and that event happens only with
two particular outcomes.
So this gives us 2/16 in the numerator.
How about the denominator?
Event B consists of a total of five possible outcomes.
Each one has probability 1/16, so this is 5/16, so the final
answer is 2/5.
We could have gotten that same answer in a simple and perhaps
more intuitive way.
In the original model, all outcomes were equally likely.
Therefore, in the conditional model, the five outcomes that
belong to B should also be equally likely.
Out of those five, there are two that make the event of
interest occur.
So given that we live in B, there are two ways out of five
that the event of interest will materialize.
So the event of interest has
conditional probability [equal to]
2/5.
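The same answer can be obtained by brute-force enumeration in R, a minimal sketch of the counting argument above:
```{r die-roll-conditional}
# Enumerate the 16 equally likely outcomes of two rolls of a
# four-sided die and count, following the definition.
rolls <- expand.grid(first = 1:4, second = 1:4)
B <- pmin(rolls$first, rolls$second) == 2  # conditioning event: min = 2
M <- pmax(rolls$first, rolls$second) == 3  # event of interest: max = 3

sum(M & B) / sum(B)  # P(max = 3 | min = 2) = (2/16)/(5/16) = 2/5
```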
# 5. Exercise: Conditional probabilities in a continuous model


# 6. Conditional probabilities obey the same axioms

I now want to emphasize an important point.
Conditional probabilities are just the same as ordinary
probabilities applied to a different situation.
They do not taste or smell or behave any differently than
ordinary probabilities.
What do I mean by that?
I mean that they satisfy the usual probability axioms.
For example, ordinary probabilities must also be
non-negative.
Is this true for conditional probabilities?
Of course it is true, because conditional probabilities are
defined as a ratio of two probabilities.
Probabilities are non-negative.
So the ratio will also be non-negative, of course as
long as it is well-defined.
And here we need to remember that we only talk about
conditional probabilities when we condition on an event that
itself has positive probability.
How about another axiom?
What is the probability of the entire sample space,
given the event B?
Let's check it out.
By definition, the conditional probability is the probability
of the intersection of the two events involved divided by the
probability of the conditioning event.
Now, what is the intersection of omega with B?
B is a subset of omega.
So when we intersect the two sets, we're
left just with B itself.
So the numerator becomes the probability of B. We're
dividing by the probability of B, and so the
answer is equal to 1.
So indeed, the sample space has unit probability, even
under the conditional model.
Now, remember that when we condition on an event B, we
could still work with the original sample space.
However, possible outcomes that do not belong to B are
considered impossible, so we might as well think of B
itself as being our sample space.
If we proceed like that and think now of B as being our
new sample space, what is the probability of this new sample
space in the conditional model?
Let's apply the definition once more.
It's the probability of the intersection of the two events
involved, B intersection B, divided by the probability of
the conditioning event.
What is the numerator?
The intersection of B with itself is just B, so the
numerator is the probability of B. We're dividing by the
probability of B. So the answer is, again, 1.
Finally, we need to check the additivity axiom.
Recall what the additivity axiom says.
If we have two events, two subsets of the sample space
that are disjoint, then the probability of their union is
equal to the sum of their individual probabilities.
Is this going to be the case if we now condition on a
certain event?
What we want to prove is the following statement.
If we take two events that are disjoint, they have empty
intersection, then the probability of the union is
the sum of their individual probabilities, but where now
the probabilities that we're employing are the conditional
probabilities, given the event B. So let us verify whether
this relation, this fact is correct or not.
Let us take this quantity and use the
definition to write it out.
By definition, this conditional probability is the
probability of the intersection of the first
event of interest, the one that appears on this side of
the conditioning, intersection with the event on which we are
conditioning.
And then we divide by the probability of the
conditioning event, B. Now, let's look at this quantity,
what is it?
We're taking the union of A with C, and then intersect it
with B. This union consists of these two pieces.
When we intersect with B, what is left is
these two pieces here.
So A union C intersected with B is the union of two pieces.
One piece is A intersection B, this piece here.
And another piece, which is C intersection B, this is the
second piece here.
So here we basically used a set theoretic identity.
And now we divide by the same [denominator]
as before.
And now let us continue.
Here's an interesting observation.
The events A and C are disjoint.
The piece of A that also belongs in B, therefore, is
disjoint from the piece of C that also belongs to B.
Therefore, this set here and that set here are disjoint.
Since they are disjoint, the probability of their union has
to be equal to the sum of their individual
probabilities.
So here we're using the additivity axiom on the
original probabilities to break this probability up into
two pieces.
And now we observe that here we have the ratio of an
intersection by the probability of B. This is just
the conditional probability of A given B using the definition
of conditional probabilities.
And the second part is the conditional probability of C
given B, where, again, we're using the definition of
conditional probabilities.
So we have indeed checked that this additivity property is
true for the case of conditional probabilities when
we consider two disjoint events.
Now, we could repeat the same derivation and verify that it
is also true for the case of a disjoint union of finitely
many events, or even of countably
many disjoint events.
So we do have finite and countable additivity.
We're not proving it, but the argument is exactly the same
as for the case of two events.
So conditional probabilities do satisfy all of the standard
axioms of probability theory.
So conditional probabilities are just like ordinary
probabilities.
This actually has a very important implication.
Since conditional probabilities satisfy all of
the probability axioms, any formula or theorem that we
ever derive for ordinary probabilities will remain true
for conditional probabilities as well.
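These properties are easy to confirm numerically. The chunk below rechecks normalization and additivity in the conditional model of the die-roll example; this is our own sanity check, not part of the lecture:
```{r conditional-axioms}
# Check the axioms for the conditional law P(. | B) in the
# four-sided die example, where B is the event {min = 2}.
rolls <- expand.grid(first = 1:4, second = 1:4)
B <- pmin(rolls$first, rolls$second) == 2
cond_prob <- function(A) sum(A & B) / sum(B)

# Normalization: P(Omega | B) = 1
cond_prob(rep(TRUE, nrow(rolls)))

# Additivity: {max = 2} and {max = 3} are disjoint, so the conditional
# probability of their union equals the sum of their conditional
# probabilities (compared with a tolerance to avoid floating-point issues).
A1 <- pmax(rolls$first, rolls$second) == 2
A2 <- pmax(rolls$first, rolls$second) == 3
isTRUE(all.equal(cond_prob(A1 | A2), cond_prob(A1) + cond_prob(A2)))
```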
# 7. A radar example: models based on conditional probabilities and three basic tools

Let us now examine what conditional
probabilities are good for.
We have already discussed that they are used to revise a
model when we get new information, but there is
another way in which they arise.
We can use conditional probabilities to build a
multi-stage model of a probabilistic experiment.
We will illustrate this through an example involving
the detection of an object up in the sky by a radar.
We will keep our example very simple.
On the other hand, it turns out to have all the basic
elements of a real-world model.
So, we are looking up in the sky, and either there's an
airplane flying up there or not.
Let us call Event A the event that an airplane is indeed
flying up there, and we have two possibilities.
Either Event A occurs, or the complement of A occurs, in
which case nothing is flying up there.
At this point, we can also assign some probabilities to
these two possibilities.
Let us say that through prior experience, perhaps, or some
other knowledge, we know that the probability that something
is indeed flying up there is 5% and with probability 95%
nothing is flying.
Now, we also have a radar that looks up there, and there are
two things that can happen.
Either something registers on the radar
screen or nothing registers.
Of course, if it's a good radar, probably Event B will
tend to go together with Event A. But it's also possible that
the radar will make some mistakes.
And so we have various possibilities.
If there's a plane up there, it's possible that the radar
will detect it, in which case Event B will also happen.
But it's also conceivable that the radar will not detect it,
in which case we have a so-called miss.
And so a plane is flying up there, but the radar missed
it, did not detect it.
Another possibility is that nothing is flying up there,
but the radar does detect something, and this is a
situation that's called a false alarm.
Finally, there's the possibility that nothing is
flying up there, and the radar did not see anything either.
Now, let us focus on this particular situation.
Suppose that Event A has occurred.
So we are living inside this particular universe.
In this universe, there are two possibilities, and we can
assign probabilities to these two possibilities.
So let's say that if something is flying up there, our radar
will find it with probability 99%, but will also miss it
with probability 1%.
What's the meaning of this number, 99%?
Well, this is a probability that applies to a situation
where an airplane is up there.
So it is really a conditional probability.
It's the conditional probability that we will
detect something, the radar will detect the plane, given
that the plane is already flying up there.
And similarly, this 1% can be thought of as the conditional
probability that the complement of B occurs, so the
radar doesn't see anything, given that there is a plane up
in the sky.
We can assign similar probabilities
under the other scenario.
If there is no plane, there is a probability that there will
be a false alarm, and there is a probability that the radar
will not see anything.
Those four numbers here are, in essence, the
specs of our radar.
They describe how the radar behaves in a world in which an
airplane has been placed in the sky, and some other
numbers that describe how the radar behaves in a world where
nothing is flying up in the sky.
So we have described various probabilistic properties of
our model, but is it a complete model?
Can we calculate anything that we might wish to calculate?
Let us look at this question.
Can we calculate the probability that
both A and B occur?
It's this particular scenario here.
How can we calculate it?
Well, let us remember the definition of conditional
probabilities.
The conditional probability of an event given another event
is the probability of their intersection divided by the
probability of the conditioning event.
But this doesn't quite help us because if we try to calculate
the numerator, we do not have the value of the probability
of A given B. We have the value of the probability of B
given A. What can we do?
Well, we notice that we can use this definition of
conditional probabilities, but use it in the reverse
direction, interchanging the roles of A and B. If we
interchange the roles of A and B, our definition leads to the
following expression.
The conditional probability of B given A is the probability
that both events occur divided by the probability, again, of
the conditioning event.
Therefore, the probability that A and B occur is equal to
the probability that A occurs times the conditional
probability that B occurs given that A occurred.
And in our example, this is 0.05 times the conditional
probability that B occurs, which is 0.99.
So we can calculate the probability of this particular
event by multiplying probabilities and conditional
probabilities along the path in this tree diagram that
leads us here.
And we can do the same for any other leaf in this diagram.
So for example, the probability that this happens
is going to be the probability of this event times the
conditional probability of B complement given that A
complement has occurred.
How about a different question?
What is the probability, the total probability, that the
radar sees something?
Let us try to identify this event.
The radar can see something under two scenarios.
There's the scenario where there is a plane up in the sky
and the radar sees it.
And there's another scenario where nothing is up in the
sky, but the radar thinks that it sees something.
So these two possibilities together make up the event B.
And so to calculate the probability of B, we need to
add the probabilities of these two events.
For the first event, we already calculated it.
It's 0.05 times 0.99.
For the second possibility, we need to do a similar
calculation.
The probability that this occurs is equal to 0.95 times
the conditional probability of B occurring under the scenario
where A complement has occurred, and this is 0.1.
If we add those two numbers together, the answer turns out
to be 0.1445.
Finally, a last question, which is perhaps the most
interesting one.
Suppose that the radar registered something.
What is the probability that there is an
airplane out there?
How do we do this calculation?
Well, we can start from the definition of the conditional
probability of A given B, and note that we already have in
our hands both the numerator and the denominator.
So the numerator is this number, 0.05 times 0.99, and
the denominator is 0.1445, and we can use our calculators to
see that the answer is approximately 0.34.
So there is a 34% probability that an airplane is there
given that the radar has seen or thinks
that it sees something.
So the numerical value of this answer is somewhat interesting
because it's pretty small, even
though we have a very good radar that tells us the
right thing 99% of the time under one scenario and 90%
of the time under the other scenario.
Despite that, given that the radar has seen something, this
is not really convincing or compelling evidence that there
is an airplane up there.
The probability that there's an airplane up there is only
34% in a situation where the radar thinks
that it has seen something.
So in the next few segments, we are going to revisit these
three calculations and see how they can generalize.
In fact, a large part of what is to happen in the remainder
of this class will be elaboration
on these three ideas.
They are three types of calculations that will show up
over and over, of course, in more complicated forms, but
the basic ideas are essentially captured in this
simple example.
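All three calculations from this segment fit in a few lines of R, using exactly the numbers from the lecture:
```{r radar-example}
# Radar example: P(A) = 0.05, P(B|A) = 0.99, P(B|A^c) = 0.10.
p_A <- 0.05           # prior probability that a plane is present
p_B_given_A <- 0.99   # detection probability
p_B_given_Ac <- 0.10  # false-alarm probability

# 1. Multiplication rule: P(A and B)
p_A_and_B <- p_A * p_B_given_A               # 0.0495

# 2. Total probability theorem: P(B)
p_B <- p_A_and_B + (1 - p_A) * p_B_given_Ac  # 0.1445

# 3. Bayes' rule: P(A | B)
p_A_and_B / p_B                              # approximately 0.34
```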
# 8. The multiplication rule

As promised, we will now start developing generalizations of
the different calculations that we carried out in the
context of the radar example.
The first kind of calculation that we carried out goes under
the name of the multiplication rule.
And it goes as follows.
Our starting point is the definition of conditional
probabilities.
The conditional probability of A given another event, B, is
the probability that both events have occurred divided
by the probability of the conditioning event.
We now take the denominator term and send it to the other
side of this equality to obtain this relation, which we
can interpret as follows.
The probability that two events occur is equal to the
probability that a first event occurs, event B in this case,
times the conditional probability that the second
event, event A, occurs, given that event B has occurred.
Now, out of the two events, A and B, we're of course free to
choose which one we call the first event and which one we
call the second event.
So the probability of the two events happening is also equal
to an expression of this form, the probability that A occurs
times the conditional probability that B occurs,
given that A has occurred.
We used this formula in the context of a tree diagram.
And we used it to calculate the probability of a leaf of
this tree by multiplying the probability of taking this
branch, the probability that A occurs, times the conditional
probability of taking this branch, the probability that
event B also occurs given that event A has occurred.
How do we generalize this calculation?
Consider a situation in which the experiment has an
additional third stage that has to do with another event,
C, that may or may not occur.
For example, if we have arrived here, A and B have
both occurred.
And then C also occurs, then we reach this particular leaf
of the tree.
Or there could be other scenarios.
For example, it could be the case that A did not occur.
Then event B occurred, and finally, event C did not
occur, in which case we end up at this particular leaf.
What is the probability of this scenario happening?
Let us try to do a calculation similar to the one that we
used for the case of two events.
However, we need to deal here with three events.
What should we do?
Well, we look at the intersection of these three
events and think of it as the intersection of a composite
event, A complement intersection B, then
intersected with the event C complement.
Clearly, you can form the intersection of three events
by first taking the intersection of two of them
and then intersecting with a third.
After we group things this way, we're dealing with the
probability of two events happening, this composite
event and this ordinary event.
And the probability of two events happening is equal to
the probability that the first event happens, times the
conditional probability that the second event happens, given
that the first one has happened.
Can we simplify this even further?
Yes.
The first term is the probability
of two events happening.
So it can be simplified further as the probability
that A complement occurs times the conditional probability
that B occurs, given that A complement has occurred.
And then we carry over the last term
exactly the way it is.
The conclusion is that we can calculate the probability of
this leaf by multiplying the probability of the first
branch times the conditional probability of the second
branch, given that the first branch was taken, and then
finally multiply with the probability of the third
branch, which is the probability that C complement
occurs, given that A complement and B
have already occurred.
In other words, we can calculate the probability of a
leaf by just multiplying the probabilities of the different
branches involved and where we use conditional probabilities
for the intermediate branches.
At this point, you can use your imagination to see that
such a formula should also be valid for the case of more
than three events.
The probability that a bunch of events all occur should be
the probability of the first event times a number of
factors, each corresponding to a branch in a
tree of this kind.
In particular, the probability that events A1, A2, up to An
all occur is going to be the probability that the first
event occurs times a product of conditional probabilities
that the i-th event occurs, given that all of the previous
events have already occurred.
And we obtain a term of this kind for every event, Ai,
after the first one, so this product ranges from 2 up to n.
And this is the most general version of the multiplication
rule and allows you to calculate the probability of
several events happening by multiplying probabilities and
conditional probabilities.
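As an illustration, the chunk below evaluates the probability of the three-branch leaf discussed above. The lecture assigns no numbers to this tree, so the branch probabilities here are hypothetical placeholders:
```{r multiplication-rule}
# P(A^c and B and C^c) = P(A^c) * P(B | A^c) * P(C^c | A^c and B).
# All three branch probabilities below are hypothetical, chosen
# only to illustrate the multiplication along the tree.
p_Ac <- 0.40            # P(A^c)
p_B_given_Ac <- 0.70    # P(B | A^c)
p_Cc_given_AcB <- 0.20  # P(C^c | A^c and B)

p_Ac * p_B_given_Ac * p_Cc_given_AcB  # probability of the leaf: 0.056
```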
# 9. Exercise: The multiplication rule




# 10. Total probability theorem

Let us now revisit the second calculation that we carried
out in the context of our earlier example.
In that example, we calculated the total probability of an
event that can occur under different scenarios.
And it involves the powerful idea of divide and conquer
where we break up complex situations
into simpler pieces.
Here is what is involved.
We have our sample space.
And our sample space is partitioned into a number of
subsets or events.
In this picture we take that number to be 3, so we'll have
it partitioned into three possible scenarios.
It is a partition, which means that these events cover the
entire sample space and are
disjoint from each other.
For each one of the scenarios we're given their
probabilities.
If you prefer, you can also draw this situation
in terms of a tree.
There are three different scenarios that can happen.
We're interested in a particular event, B. That
event B can happen in three different ways.
It can happen under scenario one, under scenario two, or
under scenario three.
And this corresponds to these particular sub-events.
So for example, this is the event
where scenario A1 happens.
And then event B happens as well.
In terms of a tree diagram, the
picture becomes as follows.
If scenario A1 materializes, event B may occur or event B