---
title: "MITx 6.431x -- Probability - The Science of Uncertainty and Data + Unit_5.Rmd"
author: "John HHU"
date: "2022-11-05"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
## Course / Unit 5: Continuous random variables / Lec. 9: Conditioning on an event; Multiple r.v.'s
# 1. Lecture 9 overview and slides

In this lecture, we continue our discussion of continuous
random variables.
We will start by bringing conditioning into the picture
and discussing how the PDF of a continuous random variable
changes when we are told that a certain event has occurred.
We will take the occasion to develop counterparts of some
of the tools that we developed in the discrete case such as
the total probability and total expectation theorems.
In fact, we will push the analogy even further.
In the discrete case, we looked at the geometric PMF in
some detail and recognized an important memorylessness
property that it possesses.
In the continuous case, there is an entirely analogous story
that we will follow, this time involving the exponential
distribution which has a similar
memorylessness property.
We will then move to a second theme which is how to describe
the joint distribution of multiple random variables.
We did this in the discrete case by
introducing joint PMFs.
In the continuous case, we can do the same using
appropriately defined joint PDFs and by
replacing sums by integrals.
As usual, we will illustrate the various concepts through
some simple examples and also take the opportunity to
introduce some additional concepts such as mixed random
variables and the joint cumulative
distribution function.
# 2. Conditioning a continuous random variable on an event
















In this segment, we pursue two themes. Every concept has a conditional counterpart. We know about PDFs, but if we live in a conditional universe, then we deal with conditional probabilities, and so we need to use conditional PDFs. The second theme is that discrete formulas have continuous counterparts in which summations get replaced by integrals, and PMFs by PDFs.
So let us recall the definition of a conditional
PMF, which is just the same as an ordinary PMF but applied to
a conditional universe.
In the same spirit, we can start with a PDF, which we can
interpret, for example, in terms of probabilities of
small intervals.
If we move to a conditional model in which event A is
known to have occurred, probabilities of small
intervals will then be determined by a conditional
PDF, which we denote in this manner.
Of course, we need to assume throughout that the
probability of the conditioning event is positive
so that conditional probabilities are
well-defined.
Let us now push the analogy further.
We can use a PMF to calculate probabilities.
The probability that X takes a value in a certain set is
the sum of the probabilities of all the possible
values in that set.
And a similar formula is true if we're dealing with a
conditional model.
Now, in the continuous case, we use a PDF to calculate the
probability that X takes values in a certain set.
And by analogy, we use a conditional PDF to calculate
conditional probabilities.
We can take this relation here to be the definition of a
conditional PDF.
So a conditional PDF is a function that allows us to
calculate probabilities by integrating this function over
the event or set of interest.
Of course, probabilities need to sum to 1.
This is true in the discrete setting.
And by analogy, it should also be true in
the continuous setting.
This is just an ordinary PDF, except that it applies to a
model in which event A is known to have occurred.
But it still is a legitimate PDF.
It has to be non-negative, of course.
But also, it needs to integrate to 1.
When we condition on an event and without any further
assumption, there's not much we can say about the form of
the conditional PDF.
However, if we condition on an event of a special kind, that
X takes values in a certain set, then we can actually
write down a formula.
So let us start with a random variable X that has a given
PDF, as in this diagram.
And suppose that A is a subset of the real line, for example,
this subset here.
What is the form of the conditional PDF?
We start with the interpretation of PDFs and
conditional PDFs in terms of
probabilities of small intervals.
The probability that X lies in a small interval is equal to
the value of the PDF somewhere in that interval times the
length of the interval.
And if we're dealing with conditional probabilities,
then we use the corresponding conditional PDF.
To find the form of the conditional PDF, we will work
in terms of the left-hand side in this equation and try to
rewrite it.
Let us distinguish two cases.
Suppose that little X lies somewhere out here, and we
want to evaluate the conditional PDF at that point.
So trying to evaluate this expression, we consider a
small interval from little x to little x plus delta.
And now, let us write the definition of a conditional
probability.
A conditional probability, by definition, is equal to the
probability that both events occur divided by the
probability of the conditioning event.
Now, because the set A and this little interval are
disjoint, these two events cannot occur simultaneously.
So the numerator here is going to be 0.
And this will imply that the conditional PDF is
also going to be 0.
This, of course, makes sense.
Conditioned on the event that X took values in this set,
values of X out here cannot occur.
And therefore, the conditional density out here
should also be 0.
So the conditional PDF is 0 outside the set A. And this
takes care of one case.
Now, the second case to consider is when little x lies
somewhere inside here inside the set A. And in that case,
our little interval from little x to little x plus
delta might have this form.
In this case, the intersection of these two events, that X
lies in the big set and X lies in the small set, the
intersection of these two events is the event that X
lies in the small set.
So the numerator simplifies just to the probability that
the random variable X takes values in the interval from
little x to little x plus delta.
And then we rewrite the denominator.
Now, the numerator is just an ordinary probability that the
random variable takes values inside a small interval.
And by our interpretation of PDFs, this is approximately
equal to the PDF evaluated somewhere in that small
interval times delta.
At this point, we notice that we have deltas on both sides
of this equation.
By cancelling this delta with that delta, we finally end up
with a relation that the conditional PDF should be
equal to this expression that we have here.
So to summarize, we have shown a formula for
the conditional PDF.
The conditional PDF is 0 for those values of x that cannot
occur given the information that we are given, namely that
X takes values in that set.
But inside this interval, the conditional PDF has a form
which is proportional to the unconditional PDF.
But it is scaled by a certain constant.
So in terms of a picture, we might have
something like this.
And so this green diagram is the form of
the conditional PDF.
The particular factor that we have here in the denominator
is exactly that factor that is required, the scaling factor
that is required so that the total area under the green
curve, under the conditional PDF is equal to 1.
So we see once more the familiar theme, that
conditional probabilities maintain the same relative
sizes as the unconditional probabilities.
And the same is true for conditional PMFs or PDFs,
keeping the same shape as the unconditional ones, except
that they are re-scaled so that the total probability
under a conditional PDF is equal to 1.
We can now continue the same story and revisit everything
else that we had done for discrete random variables.
For example, we have the expectation of a discrete
random variable and the corresponding conditional
expectation, which is just the same kind of object, except
that we now rely on conditional probabilities.
Similarly, we can take the definition of the expectation
for the continuous case and define a conditional
expectation in the same manner, except that we now
rely on the conditional PDF.
So this formula here is the definition of the conditional
expectation of a continuous random variable given a
particular event.
We have a similar situation with the expected value rule,
which we have already seen for discrete random variables in
both of the unconditional and in the conditional setting.
We have a similar formula for the continuous case.
And at this point, you can guess the form that the
formula will take in the
continuous conditional setting.
This is the expected value rule in the conditional
setting, and it is proved exactly the same way as for
the unconditional continuous setting, except that here in
the proof, we need to work with conditional probabilities
and conditional PDFs, instead of the unconditional ones.
So to summarize, there is nothing really different when
we condition on an event in the continuous case compared
to the discrete case.
We just replace summations with integrations.
And we replace PMFs by PDFs.
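As a small illustration of this recipe (a sketch, using an arbitrarily chosen exponential PDF and conditioning interval rather than anything from the lecture), we can restrict a PDF to a set A, rescale by the probability of A, and check that the result integrates to 1:
```{r conditional-pdf-check}
# Hypothetical illustration: X is exponential with rate 1, conditioned on A = {1 <= X <= 2}.
f_X <- function(x) dexp(x, rate = 1)            # unconditional PDF
p_A <- pexp(2, rate = 1) - pexp(1, rate = 1)    # P(A)

# Conditional PDF: same shape as f_X on A, scaled by 1 / P(A), and 0 outside A.
f_X_given_A <- function(x) ifelse(x >= 1 & x <= 2, f_X(x) / p_A, 0)

integrate(f_X_given_A, lower = 1, upper = 2)$value                     # must equal 1
integrate(function(x) x * f_X_given_A(x), lower = 1, upper = 2)$value  # E[X | A]
```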
# 3. Exercise: A conditional PDF




# 4. Conditioning example






Let us now look at an example.
Consider a piecewise constant PDF of the form
shown in this diagram.
Suppose that we condition on the event that x lies between
a plus b over 2, which is here, and b.
So we're conditioning on x lying in this
particular red interval.
What is the conditional PDF?
The conditional PDF is going to be 0 outside of the
interval on which we are conditioning.
So the conditional PDF is 0 in this range, and also, it is 0
in this range.
Within the range of values of x that are allowed given the
conditioning information, the conditional PDF must retain
the same shape as the unconditional one.
And the unconditional one is constant in that range.
So the conditional PDF will also be a constant.
Because in this case the length of this interval is
half of the distance between a and b--
so the length of this interval is (b minus a) over 2--
in order for the area under this curve to be equal to 1,
the height of this curve has to be equal to
2 over (b minus a).
The conditional expectation in this example is just the
ordinary expectation but applied to
the conditional model.
Since the conditional PDF is uniform, the conditional
expectation will be the midpoint of the range of this
conditional PDF.
And in this case, the midpoint is 1/2 the left end of the
interval, which is a plus b over 2 plus 1/2 the right end
point of the interval, which is b.
And so this evaluates to 1/4 times a plus 3/4 times b.
We can also calculate the expected value of X squared in
the conditional model using the expected value rule.
According to the expected value rule, it's going to be
an integral of the conditional PDF, which is 2 over b minus a
multiplied by x squared.
And this integral runs over the range where the
conditional PDF is actually non-zero.
So it's an integral that ranges from a plus b
over 2 up to b.
And this is an integral which is not too hard to evaluate, and
there's no point in carrying out the evaluation to the end.
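Still, a quick numerical sketch (with arbitrary choices a = 0 and b = 4) can confirm the conditional expectation 1/4 times a plus 3/4 times b, and evaluate the integral for the conditional expectation of X squared:
```{r conditioning-example-check}
set.seed(1)
a <- 0; b <- 4                             # arbitrary endpoints for illustration
x <- runif(1e6, min = a, max = b)          # X uniform on [a, b]
x_cond <- x[x >= (a + b) / 2]              # keep only outcomes in the conditioning event

mean(x_cond)                               # simulated E[X | A]
a / 4 + 3 * b / 4                          # the formula from the lecture

mean(x_cond^2)                             # simulated E[X^2 | A]
integrate(function(x) x^2 * 2 / (b - a),   # the integral from the lecture
          lower = (a + b) / 2, upper = b)$value
```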
# 5. Memorylessness of the exponential PDF













We now revisit the exponential random variable that we
introduced earlier and develop some intuition about what it
represents.
We do this by establishing a memorylessness property,
similar to the one that we established earlier in the
discrete case for the geometric PMF.
Suppose that it is known that light bulbs have a lifetime
until they burn out, which is an
exponential random variable.
You go to a store, and you are given two choices, to buy a
new light bulb, or to buy a used light bulb that has been
working for some time and has not yet burned out.
Which one should you take?
We want to approach this question mathematically.
So let us denote by capital T the lifetime of the bulb.
So time starts at time 0, and then at some random time that
we denote by capital T, the light bulb will burn out.
And we assume that this random variable is exponential with
some given parameter lambda.
In one of our earlier calculations, we have shown
that the probability that capital T is larger than some
value little x falls exponentially
with that value x.
We are now told that a certain light bulb has already been
operating for t time units without failing.
So we know that the value of the random variable capital T
is larger than little t.
We are interested in how much longer the light bulb will be
operating, and so we look at capital X, which is the
remaining lifetime from the current time until the light
bulb burns out.
So capital X is this particular random variable
here, and it is equal to capital T minus little t.
Let us now calculate the probability that the light
bulb lasts for another little x time units.
That is, that this random variable, capital X, is at
least as large as some little x.
That is, that the light bulb remains alive
until time t plus x.
We use the definition of conditional probabilities to
write this expression as the probability that capital X is
bigger than little x.
On the other hand, capital X is T minus t, so we
write it this way--
T minus t is bigger than little x, and also that T is
bigger than little t, divided by the probability of the
conditioning event.
Let us just write this event in a cleaner form: capital T being
larger than little t plus x and being larger than little
t, again divided by the probability of the
conditioning event.
And now notice that capital T will be greater than little t
and also greater than little t plus x, that is, capital T is
larger than this number and this number, if and only if it
is larger than this second number here.
So in other words, the intersection of these two
events is just this event here, that capital T is larger
than little t plus x.
Now, we can use the formula for the probability that
capital T is larger than something.
We apply this formula, except that instead of little x, we
have t plus x.
And so here we have e to the minus lambda times (t plus x), divided
by the probability that capital T is bigger than t.
So we use this formula, but with little t in the place of
little x, and we obtain e to the minus lambda t.
We have a cancellation, and we're left with e to the minus
lambda x, which is a final answer in this calculation.
What do we observe here?
The probability that the used light bulb will live for another x time units is exactly the same as the corresponding probability that a new light bulb will be alive for another x time units. So new and used light bulbs are described by the same probabilities; they are probabilistically identical. Put differently, the used light bulb does not remember, and is not affected by, how long it has been running. And this is the memorylessness property of exponential random variables.
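A short simulation sketch (with an arbitrary rate lambda = 0.5 and arbitrary times t and x) makes this concrete: among simulated bulbs that survive t time units, the fraction surviving another x units matches e to the minus lambda x:
```{r memorylessness-check}
set.seed(2)
lambda <- 0.5; t0 <- 3; x <- 2             # arbitrary values for illustration
T_life <- rexp(1e6, rate = lambda)         # simulated light bulb lifetimes

survivors <- T_life[T_life > t0]           # bulbs still alive at time t0
mean(survivors > t0 + x)                   # estimate of P(T > t0 + x | T > t0)
exp(-lambda * x)                           # P(T > x): should be about the same
```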
Let us now build some additional insights on
exponential random variables.
We have a formula for the density, the PDF.
And from this, we can calculate the probability that
T lies in a small interval.
For example, for a small delta, this probability here
is going to be approximately equal to the density of T
evaluated at 0 times delta, which is lambda times e to the
0, which is 1, times delta.
What if we are told that the light bulb has been alive for
t time units?
What is the probability that it burns out during the next
delta time units?
Since a used but still alive light bulb is
probabilistically identical to a new one, this conditional
probability is the same as this probability here that a
new light bulb burns out in the next delta time units.
And so this is also approximately
equal to lambda delta.
So we see that independently of how long a light bulb has
been alive, during the next delta time units it will have
a lambda delta probability of failing.
One way of thinking about this situation is that the time
interval is split into little intervals of length delta.
And as long as the light bulb is alive, if it is alive at
this point, it will have probability lambda delta of
burning out during the next interval of length delta.
This is like flipping a coin.
Once every delta time steps, there is a probability lambda
delta that there is a success in that coin flip, where
success corresponds to having the light bulb actually burn
out, and the exponential random variable corresponds to
the total time elapsed until the first success.
In this sense, the exponential random variable is a close
analog of the geometric random variable, which was the time
until the first success in a discrete time setting.
This analogy turns out to be the foundation behind the
Poisson process that we will be studying
later in this course.
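To illustrate the coin-flipping analogy (a sketch with an arbitrary lambda and a small delta), we can flip a coin with success probability lambda times delta once every delta time units and compare the time of the first success with an exponential distribution:
```{r geometric-to-exponential}
set.seed(3)
lambda <- 1; delta <- 0.01                 # arbitrary rate and small time step
n <- 1e5

# Flip a coin with success probability lambda * delta once every delta time units.
# rgeom() returns the number of failures before the first success, so the first
# success happens on trial (rgeom + 1), at time delta * (rgeom + 1).
first_success_time <- delta * (rgeom(n, prob = lambda * delta) + 1)

mean(first_success_time > 2)               # empirical P(time of first success > 2)
exp(-lambda * 2)                           # exponential prediction e^(-lambda * 2)
```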
# 6. Exercise: Memorylessness of the exponential








# 7. Total probability and expectation theorems










We now continue with the development of continuous
analogs of everything we know for the discrete case.
We have already seen a few versions of the total
probability theorem, one version for events and one
version for PMFs.
Let us now develop a continuous analog.
Suppose, as always, that we have a partition of the sample
space into a number of disjoint scenarios.
Three scenarios in this picture.
More generally, n scenarios in these formulas.
Let X be a continuous random variable and let us take B to
be the event that the random variable takes a value less
than or equal to some little x.
By the total probability theorem, this is the
probability of the first scenario times the conditional
probability of this event given that the first scenario
has materialized, and then we have similar terms for the
other scenarios.
Let us now turn this equation into CDF notation.
The left-hand side is what we have defined as the CDF of the
random variable x.
On the right-hand side, what we have is the probability of
the first scenario multiplied, again, by a CDF of the random
variable X. But it is a CDF that applies in a conditional
model where event A1 has occurred.
And so we use this notation to denote the conditional CDF,
the CDF that applies to the conditional universe.
And then we have similar terms for the other scenarios.
Now, we know that the derivative of a CDF is a PDF.
We also know that any general fact, such as this one that
applies to unconditional models will also apply without
change to a conditional model, because a conditional model is
just like any other ordinary probability model.
So let us now take derivatives of both
sides of this equation.
On the left-hand side, we have the derivative of a
CDF, which is a PDF.
And on the right-hand side, we have the probability of the
first scenario, and then the derivative of the conditional
CDF, which has to be the same as the conditional PDF.
So we use here the fact that derivatives of CDFs are PDFs,
and then we have similar terms under the different scenarios.
So we now have a relation between densities.
To interpret this relation, we think as follows.
The probability of falling inside the little interval
around x is determined by the probability of falling inside
that little interval under each one of the different
scenarios and where each scenario is weighted by the
corresponding probability.
Now, we multiply both sides of this equation by x, and then
integrate over all x's.
We do this on the left-hand side.
And similarly, on the right-hand side to obtain a
term of this form.
And we have similar terms corresponding
to the other scenarios.
What do we have here?
On the left-hand side, we have the expected value of x.
On the right-hand side, we have this probability
multiplied by the conditional expectation of X given that
scenario A1 has occurred.
And so we obtain a version of the total expectation theorem.
It's exactly the same formula as we had in the discrete
case, except that now X is a continuous random variable.
Let us now look at a simple example that involves a model
with different scenarios.
Bill wakes up in the morning and wants to go to the
supermarket.
There are two scenarios.
With probability one third, a first scenario occurs.
And under that scenario, Bill will go at a time that's
uniformly distributed between 0 and 2 hours from now.
So the conditional PDF of X, in this case, is uniform on
the interval from 0 to 2.
There's a second scenario in which Bill will take a long nap and
will go later in the day.
That scenario has a probability of 2/3.
And under that case, the conditional PDF of X is going
to be uniform on the range between 6 and 8.
By the total probability theorem for densities, the
density of X, of the random variable--
the time at which he goes to the supermarket--
consists of two pieces.
One piece is a uniform between 0 and 2.
This uniform ordinarily would have a height of 1/2.
On the other hand, it gets weighted by the corresponding
probability, which is 1/3.
So we obtain a piece here that has a height of 1/6.
Under the alternative scenario, the conditional
density is a uniform on the interval between 6 and 8.
This uniform has a height of 1/2 again, but it gets
multiplied by a factor of 2/3.
And this results in a height for this term that we have
here, which is 1/3.
And this is the form of the PDF of the time at which Bill
will go to the supermarket.
We can now finally use the total expectation theorem.
The conditional expectation under the two scenarios can be
found as follows.
Under one scenario, we have a uniform between 0 and 2.
And so the conditional expectation is 1, and it gets
weighted by the corresponding probability, which is 1/3.
Under the second scenario, which has probability 2/3, the
conditional expectation is the midpoint of this uniform,
which is 7.
And this gives us the expected value of the
time at which he goes.
So this is a simple example, but it illustrates nicely how
we can construct a model that involves a number
of different scenarios.
And by knowing the probability distribution under each one of
the scenarios, we can find the probability
distribution overall.
And we can also find the expected value for the overall
experiment.
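The Bill example is also easy to simulate (a short sketch): draw the scenario first, then draw X from the corresponding uniform distribution, and compare the sample mean with the total expectation theorem value 1/3 times 1 plus 2/3 times 7, which is 5:
```{r total-expectation-check}
set.seed(4)
n <- 1e6
first_scenario <- runif(n) < 1/3                           # scenario A1, probability 1/3
x <- ifelse(first_scenario, runif(n, 0, 2), runif(n, 6, 8))

mean(x)                     # should be close to (1/3) * 1 + (2/3) * 7 = 5
hist(x, breaks = 100, freq = FALSE, xlab = "x",
     main = "PDF of the time Bill goes to the supermarket")
# heights of the two pieces are about 1/6 on [0, 2] and 1/3 on [6, 8]
```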
# 8. Exercise: Total probability theorem II



# 9. Mixed random variables







We now look at an example similar to the previous one,
in which we have again two scenarios, but in which we
have both discrete and continuous
random variables involved.
You have $1 and the opportunity
to play in the lottery.
With probability 1/2, you do nothing and you're left with
the dollar that you started with.
With probability 1/2, you decide to play the lottery.
And in that case, you get back an amount of money which is
random and uniformly distributed
between zero and two.
Is the random variable, X, discrete?
The answer is no, because it takes values on
a continuous range.
Is the random variable, X, continuous?
The answer is no, because the probability that X takes the
value of exactly one is equal to 1/2.
Even though X takes values in a continuous range, this is
not enough to make it a continuous random variable.
We defined continuous random variables to be those that can
be described by a PDF.
And as you have seen, in such a case any individual point
should have zero probability.
But this is not the case here, and so X is not continuous.
We call X a mixed random variable.
More generally, we can have a situation where the random
variable X with some probability is the same as a
particular discrete random variable, and with some other
probability it is equal to some other
continuous random variable.
Such a random variable, X, does not have a PMF because it
is not discrete.
Also, it does not have a PDF because it is not continuous.
How do we describe such a random variable?
Well, we can describe it in terms of a cumulative
distribution function.
CDFs are always well defined for all
kinds of random variables.
We have two scenarios, and so we can use the Total
Probability Theorem and write that the CDF is equal to the
probability of the first scenario, which is p, times
the probability that the random variable Y is less than
or equal to x.
This is a conditional model under the first scenario.
And with some probability, we have the second scenario.
And under that scenario, X will take a value less than
little x, if and only if our random variable Z will take a
value less than little x.
Or in CDF notation, this is p times the CDF of the random
variable Y evaluated at this particular x plus another
weighted term involving the CDF of the random variable Z.
We can also define the expected value of X in a way
that is consistent with the Total Expectation Theorem,
namely define the expected value of X to be the
probability of the first scenario, in which case X is
discrete times the expected value of the associated
discrete random variable, plus the probability of the second
scenario, under which X is continuous, times the expected
value of the associated continuous random variable.
Going back to our original example, we have two
scenarios, the scenarios that we can call A1 and A2.
Under the first scenario, we have a uniform PDF, and the
corresponding CDF is as follows.
It's flat until zero, then it rises linearly.
And then it stays flat, and the value
here is equal to one.
So the slope here is 1/2.
So the slope is equal to the corresponding PDF.
Under the second scenario, we have a discrete, actually a
constant random variable.
And so the CDF is flat at zero until this value, and at that
value we have a jump equal to one.
We then use the Total Probability Theorem, which
tells us that the CDF of the mixed random variable will be
1/2 times the CDF under the first scenario plus 1/2 times
the CDF under the second scenario.
So we take 1/2 of this plot and 1/2 of that plot
and add them up.
What we get is a function that rises now at the slope of 1/4.
Then we have a jump, and the size of that jump is going
to be equal to 1/2.
And then it continues at a slope of 1/4 until it reaches
this value.
And after that time, it remains flat.
So this is a simple illustration that for mixed
random variables it's not too hard to obtain the
corresponding CDF even though this random variable does not
have a PDF or a PMF of its own.
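A simulation sketch of the lottery example: with probability 1/2 we keep the dollar, so X equals 1, and otherwise X is uniform on the interval from 0 to 2. The empirical CDF shows the jump of size 1/2 at x equal to 1, and the sample mean agrees with the expected value 1/2 times 1 plus 1/2 times 1, which is 1:
```{r mixed-rv-check}
set.seed(5)
n <- 1e5
keep_dollar <- runif(n) < 1/2                  # with probability 1/2, X is exactly 1
x <- ifelse(keep_dollar, 1, runif(n, 0, 2))    # otherwise X is uniform on [0, 2]

mean(x)                    # close to 1/2 * 1 + 1/2 * 1 = 1
mean(x == 1)               # close to 1/2, so X is not a continuous random variable
plot(ecdf(x), main = "CDF of the mixed random variable X")   # jump of 1/2 at x = 1
```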
# 10. Exercise: A mixed random variable



# 11. Joint PDFs












In this segment, we start a discussion of multiple
continuous random variables.
Here are some objects that we're already familiar with.
But exactly as in the discrete case, if we are dealing with
two random variables, it is not enough to know their
individual PDFs.
We also need to model the relation between the two
random variables, and this is done through a joint PDF,
which is the continuous analog of the joint PMF.
We will use this notation to indicate joint PDFs where we
use f to indicate that we're dealing with a density.
So what remains to be done is to actually define this object
and see how we use it.
Let us start by recalling that joint PMFs were defined in
terms of the probability that the pair of random variables X
and Y take certain specific values little x and little y.
Regarding joint PDFs, we start by saying that it has to be
non-negative.
However, a more precise interpretation in terms of
probabilities has to wait a little bit.
Joint PDFs will be used to calculate probabilities.
And this will be done in analogy with
the discrete setting.
In the discrete setting, the probability that the pair of
random variables falls inside a certain set is just the sum
of the probabilities of all of the possible pairs inside that
particular set.
For the continuous case, we introduce an analogous formula. We use the joint density instead of the joint PMF. And instead of having a summation, we now integrate. As in the discrete setting, we have one total unit of probability. **The joint PDF tells us how this unit of probability is spread over the entire continuous two-dimensional plane**. And we use it, we use the joint PDF, to calculate the probability of a certain set by finding the volume under the joint PDF that lies on top of that set. This is what this integral really represents. We integrate over a particular two-dimensional set, and we take this value that we integrate. And we can think of this as the height of an object that's sitting on top of that set. Now, this relation here, this calculation of probabilities, is not something that we are supposed to prove. This is, rather, the definition of what a joint PDF does.
A legitimate joint PDF is any function of two variables,
which is non-negative and which integrates to 1.
And we will say that two random variables are jointly
continuous if there is a legitimate joint PDF that can
be used to calculate the associated probabilities
through this particular formula.
So we have really an indirect definition.
Instead of defining the joint PDF as a probability, we
actually define it indirectly by saying what it does, how it
will be used to calculate probabilities.
A picture will be helpful here.
Here's a plot of a possible joint PDF.
These are the x and y-axes.
And the function being plotted is the joint PDF of these two
random variables.
This joint PDF is higher at some places and lower at
others, indicating that certain regions of the x,y
plane are more likely than others.
The joint PDF determines the probability of a set B by
integrating over that set B. Let's say it's this set.
Integrating the PDF over that set.
Pictorially, what this means is that we look at the volume
that sits on top of that set, but below the PDF, below the
joint PDF, and so we obtain some three-dimensional object
of this kind.
And this integral corresponds to actually finding this
volume here, the volume that sits on top of the set B but
which is below the joint PDF.
Let us now develop some additional understanding of
joint PDFs.
As we just discussed, for any given set B, we can integrate
the joint PDF over that set.
And this will give us the probability of
that particular set.
Of particular interest is the case where we're dealing with
a set which is a rectangle, in which case the situation is a
little simpler.
So suppose that we have a rectangle where the
x-coordinate ranges from A to B and the y-coordinate ranges
from some C to some D. Then, the double integral over this
particular rectangle can be written in a form where we
first integrate with respect to one of the variables that
ranges from A to B. And then, we integrate over all possible
values of y as they range from C to D.
Of particular interest is the special case where we're
dealing with a small rectangle such as this one.
A rectangle with sizes equal to some delta where delta is a
small number.
In that case, the double integral, which is the volume
on top of that rectangle, is simpler to evaluate.
It is equal to the value of the function that we're
integrating at some point in the rectangle --- let's take
that corner ---
times the area of that little rectangle, which is equal to
delta square.
So we have an interpretation of the joint PDF in terms of
probabilities of small rectangles.
Joint PDFs are not probabilities.
But rather, they are probability densities.
They tell us the probability per unit area.
And one more important comment.
For the case of a single continuous random variable, we
know that any single point has 0 probability.
This is again, true for the case of two jointly continuous
random variables.
But more is true.
If you take a set B that has 0 area.
For example, a certain curve.
Suppose that this curve is the entire set B. Then, the volume
under the joint PDF that's sitting on top of that curve
is going to be equal to 0.
So 0 area sets have 0 probability.
And this is one of the characteristic features of
jointly continuous random variables.
Now, let's think of a particular situation.
Suppose that X is a continuous random variable, and let Y be
another random variable, which is identically equal to X.
Since X is a continuous random variable, Y is also a
continuous random variable.
However, in this situation, we are certain that the outcome
of the experiment is going to fall on the line
where x equals y.
All the probability lies on top of a line, and
a line has 0 area.
So we have positive probability on the set of 0
area, which contradicts what we discussed before.
Well, this simply means that X and Y are not jointly
continuous.
Each one of them is continuous, but together
they're not jointly continuous.
Essentially, joint continuity is something more than
requiring each random variable to be continuous by itself.
For joint continuity, we want the probability to be really
spread over two dimensions.
Probability is not allowed to be concentrated on a
one-dimensional set.
On the other hand, in this example, the probability is
concentrated on a one-dimensional set.
And we do not have joint continuity.
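As a numerical sketch (with an arbitrarily chosen joint PDF, not one from the lecture), we can compute the probability of a rectangle by integrating the joint PDF first with respect to x and then with respect to y, exactly as in the rectangle formula above:
```{r joint-pdf-rectangle}
# Hypothetical joint PDF on the unit square: f(x, y) = x + y for 0 <= x, y <= 1.
# It is non-negative and integrates to 1, so it is a legitimate joint PDF.
f_XY <- function(x, y) ifelse(x >= 0 & x <= 1 & y >= 0 & y <= 1, x + y, 0)

# P(0 <= X <= 0.5, 0 <= Y <= 0.5): inner integral over x, outer integral over y.
inner <- function(y) sapply(y, function(yy)
  integrate(function(x) f_XY(x, yy), lower = 0, upper = 0.5)$value)
integrate(inner, lower = 0, upper = 0.5)$value   # exact answer is 1/8
```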
# 12. Exercise: Jointly continuous r.v.'s


# 13. Exercise: From joint PDFs to probabilities




# 14. From the joint to the marginal















In the discrete case, we saw that we could recover the PMF
of X and the PMF of Y from the joint PMF.
Indeed, the joint PMF is supposed to contain a complete
probabilistic description of the two random variables.
It is their probability law, and any quantity of interest
can be computed if we know the joint.
Things are similar in the continuous setting.
You can easily guess the formula through
the standard recipe.
Replace sums by integrals, and replace PMFs by PDFs. But a proof of this formula is actually instructive. So let us start by first finding the CDF of X. **The CDF of X is, by definition, the probability that the random variable X takes a value less than or equal to a certain number little x**. And this is the probability of a particular set that we can visualize on the two dimensional plane. If here is the value of little x, then we're talking about the set of all pairs x, y, for which the x component is less than or equal to a certain number.
So we need to integrate the joint density over this two-dimensional set; it will be a double integral of the joint density over this particular two-dimensional set. Now, since we've used the symbol x here to mean something specific, let us use different symbols for the dummy variables that we will use in the integration. And we need to integrate with respect to the two variables, let's say with respect to t and with respect to s. The variable t can be anything. So it ranges from minus infinity to infinity. But the variable s, the first argument, ranges from minus infinity up to this point, which is x. *Think of this double integral as an integral with respect to the variable s of this complicated function inside the brackets.*
*Now, to find the density of X, all we need to do is to differentiate the CDF of X.* And when we have an integral of this kind and we differentiate with respect to the upper limit of the integration, what we are left with is the integrand. That is this expression here.
(Aside: the *integrand* is the function being integrated in either a definite or indefinite integral. For example, x^2 cos 3x is the integrand in ∫ x^2 cos 3x dx.)
It is an integral with respect to the second variable.
And it's an integral over the entire space, from minus
infinity to plus infinity.
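This marginal recipe is easy to check numerically (a sketch, reusing the same hypothetical joint PDF as in the previous chunk): for a fixed x, integrate the joint PDF over all values of y and compare with the known marginal:
```{r marginal-from-joint}
# Same hypothetical joint PDF as before: f(x, y) = x + y on the unit square.
f_XY <- function(x, y) ifelse(x >= 0 & x <= 1 & y >= 0 & y <= 1, x + y, 0)

# Marginal of X at a point: integrate the joint PDF over all values of y.
# The joint PDF vanishes outside [0, 1], so the integral reduces to that range.
f_X <- function(x) sapply(x, function(xx)
  integrate(function(y) f_XY(xx, y), lower = 0, upper = 1)$value)

f_X(0.3)        # numerical marginal at x = 0.3
0.3 + 1/2       # exact marginal on [0, 1] is f_X(x) = x + 1/2
```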
Here is an example.
The simplest kind of joint PDF is a PDF that is
constant on a certain set, S, and is 0 outside that set.
So the overall probability, one unit of probability, is
spread uniformly over that set.
Because the total volume under the joint PDF must be equal to
1, the height of the PDF must be equal to 1 over the area.
To calculate the probability of a certain set A, we want to
ask how much volume is sitting on top of that set.
And because in this case, the PDF is constant, we need to
take the height of the PDF times the relevant area.
What is the relevant area?
Well, actually, the PDF is 0 outside the set S. So the
relevant area is only this part here, which is the
intersection of the two sets, S and A.
So the total volume sitting on top of this little set is
going to be the base, the area of the base, which is the area
of A intersection S times the height of the
PDF at those places.
Now, the height of the PDF is 1 over the area of S. So this
is the formula for calculating the probability of a certain
set, A.
Let's now look at a specific example.
Suppose that we have a uniform PDF over this particular set,
S. This set has an area that is equal to 4.
It consists of four unit rectangles arranged next to
each other.
So the height of the joint PDF in this example