---
title: "MITx 6.431x -- Probability - The Science of Uncertainty and Data + Unit_6.Rmd"
author: "John HHU"
date: "2022-11-05"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
## Course / Unit 6: Further topics on random variables / Lec. 11: Derived distributions
# 1. Lecture 11 overview and slides







In this lecture, we will deal with a single topic.
How to find the distribution, that is, the PMF or PDF of a
random variable that is defined as a function of one
or more other random variables with known distributions.
Why is this useful?
Quite often, we construct a model by first defining some
basic random variables.
These random variables usually have simple distributions and
often they are independent.
But we may be interested in the distribution of some more
complicated random variables that are defined in terms of
our basic random variables.
In this lecture, we will develop systematic methods for
the task at hand.
After going through a warm-up, the case of discrete random
variables, we will see that there is a general, very
systematic 2-step procedure that relies on cumulative
distribution functions.
We will pay special attention to the easier case where we
have a linear function of a single random variable.
We will also see that when the function involved is
monotonic, we can bypass CDFs and jump directly to a formula
that is easy to apply.
We will also see an example involving a function of two
random variables.
In such examples, the calculations may be more
complicated but the basic approach based on CDFs is
really the same.
Let me close with a final comment.
Finding the distribution of the function g of X is indeed
possible, but we should only do it when we really need it.
If all we care about is the expected value of g of X we
can just use the expected value rule.
Printable transcript available here.
https://courses.edx.org/assets/courseware/v1/54c9d0ee91b8434d2ab1db4c0ae7d2bb/asset-v1:MITx+6.431x+2T2022+type@asset+block/transcripts_L11-Overview.pdf
Lecture slides: [clean] [annotated]
https://courses.edx.org/assets/courseware/v1/1b8585e226baa3938df06b2408ea9f1a/asset-v1:MITx+6.431x+2T2022+type@asset+block/lectureslides_L11cleanslides.pdf
https://courses.edx.org/assets/courseware/v1/22d77661646e89b846880bb0955256a3/asset-v1:MITx+6.431x+2T2022+type@asset+block/lectureslides_L11annotatedslides.pdf
More information is given in Section 4.1 of the text.
https://courses.edx.org/courses/course-v1:MITx+6.431x+2T2022/pdfbook/0/chapter/1/31
# 2. The PMF of a function of a discrete r.v.













As a warm-up towards finding the distribution of the
function of random variables, let us start by considering
the discrete case.
So let X be a discrete random variable and let Y be defined
as a given function of X. We know the PMF of X and wish to
find the PMF of Y. Here's a simple example.
The random variable X takes the values 2, 3, 4, and 5 with
the probabilities given in the figure, and Y is the function
indicated here.
Then, for example, the probability that Y takes a
value of 4.
This is also the value of the PMF of Y evaluated at 4.
This is simply the sum of the probabilities of the possible
values of X that give rise to a value of Y
that is equal to 4.
Therefore, this expression is equal to the probability that
X is equal to 4 plus the probability that
X is equal to 5.
Or, in PMF notation, we can write it in this manner.
And in this numerical example, it would be 0.3 plus 0.4.
More generally, for any given value of little y, the
probability that the random variable capital Y takes this
particular value is the sum of the probabilities of the
little x that result in that particular value.
So the probability that the random variable capital Y,
which is the same as g of X, takes on a specific value is
the sum of the probabilities of all possible values of
little x where we only consider those values of
little x that give rise to the specific value, little y, that
you're interested in.
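The summation rule just described is mechanical enough to sketch in code. Below is a minimal illustrative sketch in Python (the lecture only states the masses 0.3 and 0.4 for x = 4 and x = 5; the masses for x = 2 and x = 3, and the exact form of g, are assumptions for illustration -- g below maps both 4 and 5 to the value 4, as in the figure):

```python
# PMF of Y = g(X) for a discrete X: p_Y(y) is the sum of p_X(x) over all x
# with g(x) = y.  The masses 0.3 and 0.4 for x = 4, 5 are the lecture's;
# the masses for x = 2, 3 and the exact form of g are illustrative assumptions.

def pmf_of_function(p_x, g):
    """Given p_x as a dict {x: P(X = x)} and a function g, return the PMF of Y = g(X)."""
    p_y = {}
    for x, p in p_x.items():
        y = g(x)
        p_y[y] = p_y.get(y, 0.0) + p
    return p_y

p_x = {2: 0.1, 3: 0.2, 4: 0.3, 5: 0.4}
g = lambda x: 4 if x >= 4 else x        # collapses x = 4 and x = 5 into y = 4

p_y = pmf_of_function(p_x, g)
print(p_y[4])                           # P(Y = 4) = P(X = 4) + P(X = 5) = 0.7
```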
Let us now look into the special case where we have a
linear function of a discrete random variable.
Suppose that X is described by the PMF shown in this diagram,
and let us consider the random variable Z, which is defined
as 2 times X. We would like to plot the PMF of Z.
First, let us note the values that Z can take.
When X is equal to minus 1, Z is going to be
equal to minus 2.
When X is equal to 1, Z is going to be equal to 2.
And when X is equal to 2, Z is going to be equal to 4.
This event that X is equal to minus 1 happens with
probability 2/6, and when that event happens, Z will take a
value of minus 2.
So this event happens with probability 2/6.
With probability 1/6, X takes a value of 1 so that Z takes a
value of 2.
And this happens with probability 1/6.
And finally, this last event here happens
with probability 3/6.
We have thus found the PMF of Z. Notice that it has the same
shape as the PMF of X, except that it is stretched or scaled
horizontally by a factor of 2.
Let us now consider the random variable Y, defined as 2X plus
3, or, equivalently, Z plus 3.
With probability 2/6, Z is equal to minus 2.
And in that case, Y is going to be equal to plus 1.
And this event happens with probability 2/6.
With probability 1/6, Z takes a value of 2 so that Y
takes a value of 5.
And finally, with probability 3/6, Z takes a value of 4 so
that Y takes a value of 7.
What we see here is that the PMF of Y has exactly the same
shape as the PMF of Z, except that it is shifted to the
right by 3.
To summarize, in order to find the PMF of a linear function
such as 2X plus 3, what we do is that we first stretch the
PMF of X by a factor of 2 and then shift it
horizontally by 3.
We can also describe the PMF of Y through a formula.
For any given value of little y, the PMF is going to be
equal to the probability that our random variable Y takes on
the specific value little y.
Then we recall that Y has been defined in our example to be
equal to 2X plus 3, so we're looking at the probability of
this event.
But this is the same as the event that X takes a value
equal to y minus 3 divided by 2.
And in PMF notation, we can write it in this form.
So what this is saying is that the probability that Y takes
on a specific value is the same as the probability that X
takes on some other specific value.
And that value here is that value of X that would give
rise to this particular value little y.
Now, we can generalize the calculation that we just did.
And more generally, if we have a linear function of a
discrete random variable X, the PMF of the random variable
Y is given by this formula in terms of the PMF of the random
variable X. The derivation is the same.
We use b instead of the specific number 3, and we have a
general constant a instead of the 2 that
we had in this example.
And this formula describes exactly what we did
graphically in our previous example.
This factor of a here serves to stretch the PMF by a factor
of a, and this term b here serves to shift the PMF by b.
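As a quick check of the formula p_Y(y) = p_X((y - b)/a), here is a small Python sketch using the example's PMF; exact arithmetic with `fractions` avoids floating-point noise:

```python
# Linear case: if Y = aX + b, then p_Y(y) = p_X((y - b) / a).
# The PMF of X below is the lecture's example: P(X = -1) = 2/6,
# P(X = 1) = 1/6, P(X = 2) = 3/6, and Y = 2X + 3.

from fractions import Fraction as F

p_x = {-1: F(2, 6), 1: F(1, 6), 2: F(3, 6)}
a, b = 2, 3

def pmf_linear(p_x, a, b, y):
    """p_Y(y) = p_X((y - b)/a), and zero when (y - b)/a is not a support point."""
    return p_x.get(F(y - b, a), F(0))

# The support of Y is {2*(-1)+3, 2*1+3, 2*2+3} = {1, 5, 7}: the PMF of X
# stretched by 2 and shifted by 3, with the same shape.
p_y = {a * x + b: p for x, p in p_x.items()}
for y in sorted(p_y):
    print(y, p_y[y])        # masses 1/3, 1/6, 1/2 at y = 1, 5, 7
```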
# 3. Exercise: Linear functions of discrete r.v.'s




# 4. A linear function of a continuous r.v.


















We now move to the case of continuous random variables.
We will start with a special case where we want to find the
PDF of a linear function of a continuous random variable.
We will start by considering a simple example, and study it
using an intuitive argument.
And afterwards, we will justify our conclusions
mathematically.
So we start with a random variable X that has a PDF over
the form shown in this figure so that it is a piecewise
constant PDF.
We then consider a random variable Z, which is defined
to be 2 times X. The random variable X takes values
between minus 1 and 1.
So z takes values between minus 2 and 2.
Now, values of X between minus 1 and 0 correspond to values
of Z between minus 2 and 0.
The different values of X in this range are, in some sense,
equally likely, because we have a constant PDF.
And that argues that the corresponding values of Z
should also be, in some sense, equally likely.
So the PDF should be constant over this range.
By a similar argument, the PDF of Z should also be constant
over the range from 0 to 2.
And the PDF must, of course, be 0 outside this range,
because these are values of Z that are impossible.
Let us now try to figure out the parameters of this PDF.
The probability that X is positive is the
area of this rectangle.
And the area of this rectangle is 2/3.
So the area of this rectangle should also be 2/3.
And that means that the height of this rectangle should be
equal to 1/3.
Similarly, the probability that X is negative is the area
of this rectangle, and the area of this rectangle is
equal to 1/3.
When X is negative, Z is also negative, so the probability
of a negative value should be equal to 1/3.
And for the area of this rectangle to be 1/3, it means
that the height of this rectangle should be 1/6.
So what happened here?
We started with a PDF of X and essentially stretched it out
by a factor of 2 while keeping the same shape.
However, we also scaled it down by a
corresponding amount.
So 2/3 became 1/3, and 1/3 became 1/6.
The reason for this scaling down is because we need the
total probability, the total area under this PDF, to be
equal to 1.
If we now add a number, let's say 3, to the random variable
Z, what is going to happen?
The random variable Y now will take values from
minus 2 plus 3--
this is plus 1--
all the way up to 2 plus 3, which is plus 5.
Values in the range from 1 to 3 correspond to values of Z in
the range from minus 2 to 0.
These values are all, in some sense, equally likely.
So they should also be equally likely here.
And by a similar argument, these values in the range from
3 to 5 should also be equally likely.
This rectangle corresponds to this rectangle here.
So the area should be the same.
And therefore, the height should also be the same.
Therefore, the height here should be 1/6.
And by the same argument, the height here
should be equal to 1/3.
So what happens here is that when we add 3 to a random
variable, the PDF just gets shifted by 3 but otherwise
retains the same shape.
So the story is entirely similar to what happened in
the discrete case.
We start with a PDF of X. We stretch it horizontally by a
factor of 2.
And then we shift it horizontally by 3.
The only difference is that here in the continuous case,
we also need to scale the plot in the vertical dimension by a
factor of 2.
Actually, make it smaller by a factor of 2.
And this needs to be done in order to keep the total area
under the PDF equal to 1.
Let us now go through a mathematical argument with the
purpose of also finding a formula that represents what
we just did in our previous example.
Let Y be equal to aX plus b.
Here, X is a random variable with a given PDF.
a and b are given constants.
Now, if a is equal to 0, then Y is identically equal to b.
So it is a constant random variable and
does not have a PDF.
So let us exclude this case and start by assuming that a
is a positive number.
We can try to work, as in the discrete case, and try
something like the following.
The probability that Y takes on a specific value is the
same as the probability that aX plus b takes on a specific
value, which is the same as the probability that X takes
on the specific value, y minus b divided by a.
This equality was useful in the discrete case.
Is it useful here?
Unfortunately not.
When we're dealing with continuous random variables,
the probability that the continuous random variable is
exactly equal to a given number, this probability is
going to be equal to 0.
And the same applies to this side as well.
So we have that 0 is equal to 0.
And this is uninformative, and we have not made any progress.
So instead of working with probabilities of individual
points which will always be 0, we will work with
probabilities of intervals that generally have non-zero
probability.
The trick is to work with CDFs.
So let us try to find the CDF of Y. The CDF of the random
variable Y is defined as the probability that the random
variable is less than or equal to a certain number.
Now, in our case, Y is aX plus b.
We move b to the other side of the inequality and then divide
both sides of the inequality by a.
And we get that this is the same as the probability that X
is less than or equal to y minus b divided by a, which is
the same as the CDF of X evaluated at y minus b over a.
So we have a formula for the CDF of Y in terms of the CDF
of X.
How can we find the PDF?
Simply by differentiating.
We differentiate both sides of this equation.
The derivative of a CDF is a PDF.
And therefore, the PDF of Y is going to be equal to the
derivative of this side.
Here we need to use the chain rule.
First, we take the derivative of this function.
And the derivative of the CDF is a PDF, so the PDF of X
evaluated at this particular number.
But then we also need to take the derivative of the argument
inside with respect to y.
And that derivative is equal to 1/a.
And this gives us a formula for the PDF of Y in terms of
the PDF of X.
How about the case where a is less than 0?
What is going to change?
The first step up to here remains valid.
But now when we divide both sides of the inequality by a,
the direction of the inequality gets reversed.
So we obtain instead the probability that X is larger
than or equal to y minus b divided by a.
And this is 1 minus the probability that X is less
than y minus b over a.
Now, X is a continuous random variable, so the probability
is not going to change if here we make the inequality to be a
less than or equal sign.
And what we have here is 1 minus the CDF of X evaluated
at y minus b over a.
We use the chain rule once more, and we obtain that the
PDF of Y, in this case, is equal to minus the PDF of X
evaluated at y minus b over a times 1/a.
Now, when a is positive, a is the same as the
absolute value of a.
When a is negative and we have this formula, we have here a
minus a, which is the same as the absolute value of a.
So we can unify these two formulas by replacing the
occurrences of a and that minus sign by just using the
absolute value.
And this gives us this formula for the PDF of Y in terms of
the PDF of X. And it is a formula that's valid whether a
is positive or negative.
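Collecting the two cases, the result of this derivation can be written as a single formula:

$$
F_Y(y) =
\begin{cases}
F_X\!\left(\dfrac{y-b}{a}\right), & a > 0, \\[6pt]
1 - F_X\!\left(\dfrac{y-b}{a}\right), & a < 0,
\end{cases}
\qquad\Longrightarrow\qquad
f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y-b}{a}\right), \quad a \neq 0.
$$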
What this formula represents is the following.
Because of the factor of a that we have here, we take the
PDF of X and scale it horizontally by a factor of a.
Because of the term b that we have here, the PDF also gets
shifted horizontally by b.
And finally, this term here corresponds to a vertical
scaling of the plot that we have.
And the reason that this term is present is so that the PDF
of Y integrates to 1.
It is interesting to also compare with the corresponding
discrete formula that we derived earlier.
The discrete formula has exactly the same appearance
except that the scaling factor is not present.
So for the case of continuous random variables, we need to
scale vertically the PDF.
But in the discrete case, such a scaling is not present.
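The stretching-and-scaling story can be checked numerically. The sketch below (Python, with an assumed seed and sample size) samples the example's piecewise-constant X by inverting its CDF and verifies that Y = 2X + 3 puts mass 1/3 on (1, 3) and 2/3 on (3, 5), as the heights 1/6 and 1/3 predict:

```python
import random

random.seed(0)  # assumed seed, for reproducibility only

def sample_x():
    """Inverse-CDF sample from the example's piecewise-constant PDF:
    f_X = 1/3 on (-1, 0) and f_X = 2/3 on (0, 1)."""
    u = random.random()
    if u < 1/3:
        return 3 * u - 1            # maps (0, 1/3) onto (-1, 0)
    return (u - 1/3) * 3 / 2        # maps (1/3, 1) onto (0, 1)

a, b = 2, 3
n = 200_000
ys = [a * sample_x() + b for _ in range(n)]

# Y = 2X + 3 lives on (1, 5).  The heights 1/6 on (1, 3) and 1/3 on (3, 5)
# imply masses 2 * 1/6 = 1/3 and 2 * 1/3 = 2/3 on those intervals.
frac_low = sum(1 for y in ys if y < 3) / n
frac_high = sum(1 for y in ys if y > 3) / n
print(frac_low, frac_high)          # near 1/3 and 2/3
```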
# 5. Exercise: Linear functions of continuous r.v.'s




# 6. A linear function of a normal r.v.





Let us now consider an application of what we have
done so far.
Let X be a normal random variable with
given mean and variance.
This means that the PDF of X takes the familiar form.
We consider random variable Y, which is a linear function of
X. And to avoid trivialities, we assume that a is
different from zero.
We will just use the formula that we
have already developed.
So we have that the density of Y is equal to 1 over the
absolute value of a.
And then we have the density of X, but evaluated at x equal
to this expression.
So this expression will go in the place
of x in this formula.
And we have y minus b over a minus mu squared divided by 2
sigma squared.
And now we collect these constant terms here.
And then in the exponent, we multiply by a squared the
numerator and the denominator, which gives us this form here.
We recognize that this is again, a normal PDF.
It's a function of y.
We have a random variable Y. This is
the mean of the normal.
And this is the variance of that normal.
So the conclusion is that the random variable Y is normal
with mean equal to b plus a mu.
And with variance a squared, sigma squared.
The fact that this is the mean and this is the variance of Y
is not surprising.
This is how means and variances behave when you form
linear functions.
The interesting part is that the random variable Y is
actually normal.
Intuitively, what happened here is that we started with a
normal bell shaped curve.
A bell shaped PDF for X. We scale it vertically and
horizontally, and then shift it horizontally by b.
As we do these operations, the PDF still remains bell shaped.
And so the final PDF is again a bell shaped normal PDF.
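This conclusion is easy to test by simulation; the values of mu, sigma, a, and b below are arbitrary illustrative choices, not from the lecture:

```python
import random

random.seed(1)  # assumed seed
mu, sigma = 2.0, 3.0
a, b = -0.5, 4.0       # arbitrary illustrative constants, a != 0

n = 200_000
ys = [a * random.gauss(mu, sigma) + b for _ in range(n)]

# Y should be normal with mean b + a*mu = 3.0 and variance a**2 * sigma**2 = 2.25.
mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
print(mean_y, var_y)   # near 3.0 and 2.25
```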
# 7. The PDF of a general function






















In this important segment, we will develop a method for
finding the PDF of a general function of a continuous
random variable, a function g of X, which, in general, could
be nonlinear.
The method is very general and involves two steps.
The first step is to find the CDF of Y. And then the second
step is to take the derivative of the CDF and
then find the PDF.
Most of the work lies here in finding the CDF of Y. And how
do we do that?
Well, since Y is a function of the random variable X, we
replace Y by g of X. And now we're dealing with a
probability problem that involves a random variable, X,
with a known PDF.
And we somehow calculate this probability.
So let us illustrate this procedure
through some examples.
In our first example, we let X be a random variable which is
uniform on the range from 0 to 2.
And so the height of the PDF is 1/2.
And we wish to find the PDF of the random variable Y which is
defined as X cubed.
So since X goes all the way up to 2, Y goes all
the way up to 8.
The first step is to find the CDF of Y. And since Y is a
specific function of X, we replace that functional form.
And we write it this way.
So we want to calculate the probability that X cubed is
less than or equal to a certain number y.
Let us take cubic roots of both sides of this inequality.
This is the same as the probability that X is less
than or equal to y to the 1/3.
Now, we only care about values of y that are between 0 and 8.
So this calculation is going to be for those values of y.
For other values of y, we know that the PDF is equal to 0.
And there's no work that needs to be done there.
OK.
Now, y is less than or equal to 8, so the cubic root of y
is less than or equal to 2.
So y to the 1/3 is going to be a number
somewhere in this range.
Let's say this number.
We want the probability that X is less than or
equal to that value.
So that probability is equal to this area under the PDF of
X. And since it is uniform, this area is easy to find.
It's the height, which is 1/2 times the base,
which is y to the 1/3.
So we continue this calculation, and we get 1/2
times y to the 1/3.
So this is the formula for the CDF of Y for values of little
y between 0 and 8.
This completes step one.
The second step is simple calculus.
We just need to take the derivative of the CDF.
And the derivative is 1/2 times 1/3, this exponent, y to
the power of minus 2/3.
Or in a cleaner form, 1/6 times 1 over y
to the power 2/3.
So the form of this PDF is not a constant anymore.
Y is not a uniform random variable.
The PDF becomes larger and larger as y approaches 0.
And in fact, in this example, it even blows up when y
becomes closer and closer to 0.
So this is the shape of the PDF of Y.
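The two-step calculation for Y equal to X cubed can be mirrored in a short simulation; the sketch below (assumed seed and sample size) compares the empirical CDF of simulated values of Y against the formula just derived:

```python
import random

random.seed(2)  # assumed seed

# CDF method for Y = X**3 with X uniform on (0, 2):
# F_Y(y) = P(X <= y**(1/3)) = (1/2) * y**(1/3) for 0 <= y <= 8,
# and differentiating gives f_Y(y) = 1 / (6 * y**(2/3)).
def cdf_y(y):
    if y < 0:
        return 0.0
    if y > 8:
        return 1.0
    return y ** (1/3) / 2

# Monte Carlo check: the empirical CDF of simulated Y values tracks the formula.
n = 200_000
ys = [random.uniform(0, 2) ** 3 for _ in range(n)]
emp = {y0: sum(1 for y in ys if y <= y0) / n for y0 in (1.0, 4.0)}
print(emp[1.0], cdf_y(1.0))   # cdf_y(1.0) = 0.5
```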
Our second example is as follows.
You go to the gym, you jump on the treadmill, and you set the
speed on the treadmill to some random value which we call X.
And that random value is somewhere between 5 and 10
kilometers per hour.
And the way that you set it is chosen at random and uniformly
over this interval.
So X is uniformly distributed on the interval
between 5 and 10.
You want to run a total of 10 kilometers.
How long is it going to take you?
Let the time it takes you be denoted by Y. And the time
it's going to take you is the distance you want to travel,
which is 10 divided by the speed with
which you will be going.
So the random variable Y is defined in terms of X through
this particular expression.
We want to find the PDF of Y.
First let us look at the range of the random variable Y.
Since X takes values between 5 and 10, Y takes values
between 1 and 2.
Therefore, the PDF of Y is going to be 0
outside that range.
And let us now focus on values of Y that belong to this
interesting range.
So 1 less than y less than or equal to 2.
And now we start with our two-step program.
We want to find the CDF of Y, namely, the probability that
capital Y takes a value less than or equal to a certain
little y in this range.
We recall the definition of capital Y. So now we're
dealing with a probability problem that involves the
random variable capital X, whose
distribution is given to us.
Now, we rewrite this event as follows.
We move X to the other side.
After we also move the little y to the left-hand side,
this is the probability that X is larger
than or equal to 10 over little y.
Now, y is between 1 and 2.
10/y is going to be a number between 5 and 10.
So 10/y is going to be somewhere in this range.
We're interested in the probability that X is larger
than or equal to that number.
And this probability is going to be the area of this
rectangle here.
And the area of that rectangle is equal to the height of the
rectangle--
now, the height of this rectangle is going to be 1/5.
This is the choice that makes the total area under this
curve be equal to 1--
times the base.
And the length of the base is this number
10 minus that number.
It's 10 minus 10/y.
So this is the form of the CDF of Y for y's in this range.
To find the PDF of Y, we just take the derivative.
And we get 1/5 times the derivative of this term, which
is minus 10, divided by y squared.
But when we take the derivative of 1/y, that gives
us another minus sign.
The two minus signs cancel, and we
obtain 2 over y squared.
And if you wish to plot this, it starts at 2.
And then as y increases, the PDF actually decreases.
And this is the form of the PDF of the random variable Y.
This is the form which is true when y lies between 1 and 2.
And of course, the PDF is going to be 0 for other
choices of little y.
So what we have seen here is a pretty systematic approach
towards finding the PDF of the random variable Y. Again, the
first step is to look at the CDF, write the CDF in terms of
the random variable X, whose distribution is known, and
then solve a probability problem that involves this
particular random variable.
And then in the last step, we just need to differentiate the
CDF in order to obtain the PDF.
# 8. Exercise: PDF of a general function




# 9. The monotonic case





















We have already worked through some examples in which X was a
random variable with a given PDF, and we considered the
problem of finding the PDF of Y for the case where Y was the
function x cubed or the function of the form a/X. What
both of these examples have in common is that Y is a
monotonic function of X.
In this case, Y is increasing with X. In this case, Y was
decreasing with X. It turns out that there is a general
formula that gives us the PDF of Y in terms of the PDF of X
in the special case where we're dealing
with a monotonic function.
So, let us assume that g is a strictly increasing function.
And what that means is that, if x is a number smaller
than some other number x prime, the value of g of x is
going to be smaller than the value of g of x prime.
So, when you increase the argument of the function, the
function increases.
To keep things simple, we will also assume that the function
g is smooth, in particular that it is differentiable.
Then we have a diagram such as this one.
Here is x, and y is given by a function of x.
It's a smooth function, and that function keeps
increasing.
Now, because of the assumptions we have made on g,
we have an interesting situation.
Given a value of x, a corresponding value of y will
be determined according to the function g.
But we can also go the other way.
If I tell you a value of y, then you can specify for me
one and only one value of x that gives rise to this
particular y.
So, the function g takes us from x's to y's, but you can
also go back the opposite way from y's to values of x.
And the mapping that takes us from y's to x's, this is the
inverse of the function g.
And we give a name to that inverse function,
and we call it h.
So, h of y is the value of x that produces a
specific value y.
Let us now move on with the program of finding the PDF of
Y. We will follow the usual two step procedure.
And the first step is to find the CDF of Y.
So we fix some little y, and we want to find the
probability that the random variable Y takes a value in
this range.
When does this happen?
For Y to take a value in this range, it must be the case
that X takes a value in this range here.
Values of X smaller than this particular number result in
values of Y that are less than or equal to
this particular number.
So, we can rewrite the event of interest in terms of the
random variable X and write it as follows.
We need to have x less than or equal to h of little y.
But this is just the CDF of X evaluated at h of y.
We now carry out the second step of our program.
We take derivatives of both sides and we find that the PDF
of Y is equal to the derivative of the right hand
side, the derivative of the CDF is a PDF.
And then the chain rule tells us that we also need to take
the derivative of the term inside here with respect to
its argument.
And this is a general formula for the PDF of a strictly
increasing function of a random variable X. How about
the case of a decreasing function?
So, let us assume that g now is a strictly decreasing
function of X.
So, we might have a plot for g that looks
something like this.
What happens in this case?
We can start doing a calculation of this kind.
But now, how can we rewrite this event?
The random variable Y will take a value less than or
equal to this number little y.
When does this happen?
When the value of g of x is less than y.
And that happens for x's in this range.
So, this is the set of x's for which the value of g of x
is less than or equal to this particular number y.
So the event of interest in that case is the event that X
is larger than or equal to h of y, which is 1 minus the
probability that X is less than h of y.
Because X is a continuous random variable, we can change
this inequality to one that allows the
possibility of equality.
And so this is 1 minus the CDF of X evaluated at h of y.
Now we take the derivatives of both sides and we find the PDF
of Y being equal to, there's a minus sign here, then the
derivative of the CDF, which is the PDF.
And finally, the derivative of the function h.
Now in this case, g is a decreasing function of x.
So when x goes down, y goes up.
When x goes up, y goes down.
This means that when y goes up, x goes down.
So it means that the inverse function h is going to be also
monotonically decreasing.
Since it is decreasing, it means that the slope, the
derivative of the function h is going to
be either 0 or negative.
And so minus a negative value gives us the absolute value of
that number.
So we can rewrite this by removing this minus sign here,
and putting an absolute value in this place.
Of course, in the case where g is an increasing function,
when x goes up, y goes up.
This means that when y goes up, x goes up.
So h in that case would have been an increasing function,
so this number here would have been a non-negative number,
and so it would be the same as the absolute value.
So using these absolute values, we obtain formulas
that are exactly the same in both cases of increasing and
decreasing functions, and so our final conclusion is that
in either case, the PDF of Y is given in terms of the PDF
of X times the derivative of this inverse function.
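In summary, for $g$ strictly monotonic (increasing or decreasing), with $h = g^{-1}$:

$$f_Y(y) = f_X(h(y))\, \left| \frac{dh}{dy}(y) \right|.$$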
Let us now apply the formula that we have in our hands for
the monotonic case to a particular example, where y is
the square of X, and where X is uniform on the
interval 0 to 1.
So the function g, in our case, the function g is the
square function.
Now, you could argue here that this function is not
monotonic, so how can we apply our results?
On the other hand, the random variable X takes values on the
interval from 0 to 1, and therefore the form of the
function g outside that range does not concern us.
Over the range of values of interest, the function g is a
monotonic function.
So, what is the correspondence?
y is going to be equal to x squared.
That's the g of x function.
And when that happens, we have the relation that x is going
to be the square root of y.
This tells us that the inverse function, h of y, which tells
us what is the particular x associated with a given y, the
inverse function takes the form square root of y.
So now we can go ahead and use the formula.
The density at some particular little y, where that little y
belongs to the range of values of interest: X takes values
between 0 and 1, so Y also takes values between 0 and 1.
So over that range, the density of Y is the density of
X, which is uniform, therefore it is equal to 1, times the
derivative of the square root function.
And the derivative of the square root function is 1 over
2 times the square root of y.
As you can see, the amount of calculation involved here is
rather small compared to what we would have to do if we
were to go through our two-step program
and work with CDFs.
All that you need to do is essentially to identify the
inverse function that given a y produces x's, and write down
the corresponding derivative.
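As a quick sanity check (not part of the lecture), a short Monte Carlo simulation can compare an empirical density estimate for Y = X^2, with X uniform on (0, 1), against the derived formula $f_Y(y) = 1/(2\sqrt{y})$. The sample size and bin width below are arbitrary illustrative choices.

```python
# Monte Carlo check that for X ~ Uniform(0, 1) and Y = X^2,
# the density f_Y(y) = 1 / (2 * sqrt(y)) matches an empirical estimate.
import math
import random

random.seed(0)
n = 200_000
ys = [random.random() ** 2 for _ in range(n)]  # samples of Y = X^2

def empirical_density(samples, lo, hi):
    """Fraction of samples falling in [lo, hi), divided by the bin width."""
    count = sum(lo <= s < hi for s in samples)
    return count / (len(samples) * (hi - lo))

for y in (0.1, 0.25, 0.5, 0.9):
    est = empirical_density(ys, y - 0.01, y + 0.01)
    theory = 1.0 / (2.0 * math.sqrt(y))
    print(f"y={y:.2f}  empirical={est:.3f}  formula={theory:.3f}")
```

The two columns should agree up to sampling noise, confirming the formula without any CDF manipulation.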
# 10. Exercise: Using the formula for the monotonic case






# 11. The intuition for the monotonic case










The formula that we just derived for the monotonic case
has a nice intuitive explanation that we will
develop now.
Suppose that g is a monotonic function of x and that it's
monotonically increasing.
Let us fix a particular x and a corresponding y so that the
two of them are related as follows-- y is equal to g of
x, or we could argue in terms of the inverse function so
that x is equal to h of y.
Recall that h is the inverse function, that given a value
of y, tells us which one is the corresponding value of x.
Now let us consider a small interval in the
vicinity of this x.
Whenever x falls somewhere in this range, then y is going to
fall inside another small interval.
The event that x belongs here is the same as the event that
y belongs there.
So these two events have the same probability.
And we can, therefore, write that the probability that Y
falls in this interval is the same as the probability that X
falls in the corresponding little interval on the x-axis.
This interval has a certain length delta 1.
This interval has a certain length delta 2.
Now remember our interpretation of
probabilities of small intervals in terms of PDFs so
this probability here is approximately equal to the PDF
of Y evaluated at the point y times the length of the
corresponding interval.
Similarly, on the other side, the probability that X falls
on the interval is the PDF of X times the
length of that interval.
So this gives us already a relation between the PDF of Y
and the PDF of X, but it involves those two numbers
delta 1 and delta 2.
How are these two numbers related?
If x moves up by the amount of delta 1, how much is y going
to move up?
It's going to move up by an amount which is delta 1 times
the slope of the function g at that particular point.
So that gives us one relation that delta 2 is approximately
equal to delta 1 times the derivative of the function of
g at that particular x.
However, it's more useful to work the other way, thinking
in terms of the inverse function.
The inverse function maps y to x, and it maps y plus delta
2 to x plus delta 1.
When y advances by delta 2, x is going to advance by an
amount which is how much y advanced times the slope, or
the derivative, of the function that
maps y's into x's.
And this function is the inverse function.
So this is the relation that we're going to use.
And so we replace delta 1 by this expression that we have
here in terms of delta 2.
And now we cancel the delta 2 from both sides of this
equality, and we obtain the final formula that the PDF of
Y evaluated at a certain point is equal to the PDF of X
evaluated at the corresponding point. Or we could write this
as the PDF of X evaluated at the value x that is associated
to that y by the inverse function, times the
derivative of the function h, the inverse function.
And this is just the same formula as the one that we had
derived earlier using CDFs.
This derivation is quite intuitive.
It associates probabilities of small intervals on the x-axis
to probabilities of corresponding small intervals
on the y-axis.
These two probabilities have to be equal, and this implies
a certain relation between the two PDFs.
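The interval argument can be written compactly: with $x = h(y)$,

$$f_Y(y)\,\delta_2 \approx \mathbf{P}(y \le Y \le y + \delta_2) = \mathbf{P}(x \le X \le x + \delta_1) \approx f_X(x)\,\delta_1, \qquad \delta_1 \approx \frac{dh}{dy}(y)\,\delta_2,$$

and canceling $\delta_2$ recovers $f_Y(y) = f_X(h(y))\, \dfrac{dh}{dy}(y)$.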
# 12. A nonmonotonic example










All of our examples so far have involved functions, g of
x, that are monotonic in X, at least over the
range of x's of interest.
Let us now look at an example that involves a