---
title: "MITx 6.431x -- Probability - The Science of Uncertainty and Data + Unit_5.Rmd"
author: "John HHU"
date: "2022-11-05"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
## Course / Unit 5: Continuous random variables / Lec. 9: Conditioning on an event; Multiple r.v.'s
# 1. Lecture 9 overview and slides

In this lecture, we continue our discussion of continuous
random variables.
We will start by bringing conditioning into the picture
and discussing how the PDF of a continuous random variable
changes when we are told that a certain event has occurred.
We will take the occasion to develop counterparts of some
of the tools that we developed in the discrete case such as
the total probability and total expectation theorems.
In fact, we will push the analogy even further.
In the discrete case, we looked at the geometric PMF in
some detail and recognized an important memorylessness
property that it possesses.
In the continuous case, there is an entirely analogous story
that we will follow, this time involving the exponential
distribution which has a similar
memorylessness property.
We will then move to a second theme which is how to describe
the joint distribution of multiple random variables.
We did this in the discrete case by
introducing joint PMFs.
In the continuous case, we can do the same using
appropriately defined joint PDFs and by
replacing sums by integrals.
As usual, we will illustrate the various concepts through
some simple examples and also take the opportunity to
introduce some additional concepts such as mixed random
variables and the joint cumulative
distribution function.
# 2. Conditioning a continuous random variable on an event
















In this segment, we pursue two themes. Every concept has a conditional counterpart. We know about PDFs, but if we live in a conditional universe, then we deal with conditional probabilities, and so we need to use conditional PDFs. The second theme is that discrete formulas have continuous counterparts in which summations get replaced by integrals, and PMFs by PDFs.
So let us recall the definition of a conditional
PMF, which is just the same as an ordinary PMF but applied to
a conditional universe.
In the same spirit, we can start with a PDF, which we can
interpret, for example, in terms of probabilities of
small intervals.
If we move to a conditional model in which event A is
known to have occurred, probabilities of small
intervals will then be determined by a conditional
PDF, which we denote in this manner.
Of course, we need to assume throughout that the
probability of the conditioning event is positive
so that conditional probabilities are
well-defined.
Let us now push the analogy further.
We can use a PMF to calculate probabilities.
The probability that X takes a value in a certain set is
the sum of the probabilities of all the possible
values in that set.
And a similar formula is true if we're dealing with a
conditional model.
Now, in the continuous case, we use a PDF to calculate the
probability that X takes values in a certain set.
And by analogy, we use a conditional PDF to calculate
conditional probabilities.
We can take this relation here to be the definition of a
conditional PDF.
So a conditional PDF is a function that allows us to
calculate probabilities by integrating this function over
the event or set of interest.
Of course, probabilities need to sum to 1.
This is true in the discrete setting.
And by analogy, it should also be true in
the continuous setting.
This is just an ordinary PDF, except that it applies to a
model in which event A is known to have occurred.
But it still is a legitimate PDF.
It has to be non-negative, of course.
But also, it needs to integrate to 1.
When we condition on an event and without any further
assumption, there's not much we can say about the form of
the conditional PDF.
However, if we condition on an event of a special kind, that
X takes values in a certain set, then we can actually
write down a formula.
So let us start with a random variable X that has a given
PDF, as in this diagram.
And suppose that A is a subset of the real line, for example,
this subset here.
What is the form of the conditional PDF?
We start with the interpretation of PDFs and
conditional PDFs in terms of
probabilities of small intervals.
The probability that X lies in a small interval is equal to
the value of the PDF somewhere in that interval times the
length of the interval.
And if we're dealing with conditional probabilities,
then we use the corresponding conditional PDF.
To find the form of the conditional PDF, we will work
in terms of the left-hand side in this equation and try to
rewrite it.
Let us distinguish two cases.
Suppose that little X lies somewhere out here, and we
want to evaluate the conditional PDF at that point.
So trying to evaluate this expression, we consider a
small interval from little x to little x plus delta.
And now, let us write the definition of a conditional
probability.
A conditional probability, by definition, is equal to the
probability that both events occur divided by the
probability of the conditioning event.
Now, because the set A and this little interval are
disjoint, these two events cannot occur simultaneously.
So the numerator here is going to be 0.
And this will imply that the conditional PDF is
also going to be 0.
This, of course, makes sense.
Conditioned on the event that X took values in this set,
values of X out here cannot occur.
And therefore, the conditional density out here
should also be 0.
So the conditional PDF is 0 outside the set A. And this
takes care of one case.
Now, the second case to consider is when little x lies
somewhere inside here inside the set A. And in that case,
our little interval from little x to little x plus
delta might have this form.
In this case, the intersection of these two events, that X
lies in the big set and X lies in the small set, the
intersection of these two events is the event that X
lies in the small set.
So the numerator simplifies just to the probability that
the random variable X takes values in the interval from
little x to little x plus delta.
And then we rewrite the denominator.
Now, the numerator is just an ordinary probability that the
random variable takes values inside a small interval.
And by our interpretation of PDFs, this is approximately
equal to the PDF evaluated somewhere in that small
interval times delta.
At this point, we notice that we have deltas on both sides
of this equation.
By cancelling this delta with that delta, we finally end up
with a relation that the conditional PDF should be
equal to this expression that we have here.
So to summarize, we have shown a formula for
the conditional PDF.
The conditional PDF is 0 for those values of x that cannot
occur given the information that we are given, namely that
X takes values in that set.
But inside this interval, the conditional PDF has a form
which is proportional to the unconditional PDF.
But it is scaled by a certain constant.
So in terms of a picture, we might have
something like this.
And so this green diagram is the form of
the conditional PDF.
The particular factor that we have here in the denominator
is exactly that factor that is required, the scaling factor
that is required so that the total area under the green
curve, under the conditional PDF is equal to 1.
So we see once more the familiar theme, that
conditional probabilities maintain the same relative
sizes as the unconditional probabilities.
And the same is true for conditional PMFs or PDFs,
keeping the same shape as the unconditional ones, except
that they are re-scaled so that the total probability
under a conditional PDF is equal to 1.
We can now continue the same story and revisit everything
else that we had done for discrete random variables.
For example, we have the expectation of a discrete
random variable and the corresponding conditional
expectation, which is just the same kind of object, except
that we now rely on conditional probabilities.
Similarly, we can take the definition of the expectation
for the continuous case and define a conditional
expectation in the same manner, except that we now
rely on the conditional PDF.
So this formula here is the definition of the conditional
expectation of a continuous random variable given a
particular event.
We have a similar situation with the expected value rule,
which we have already seen for discrete random variables in
both of the unconditional and in the conditional setting.
We have a similar formula for the continuous case.
And at this point, you can guess the form that the
formula will take in the
continuous conditional setting.
This is the expected value rule in the conditional
setting, and it is proved exactly the same way as for
the unconditional continuous setting, except that here in
the proof, we need to work with conditional probabilities
and conditional PDFs, instead of the unconditional ones.
So to summarize, there is nothing really different when
we condition on an event in the continuous case compared
to the discrete case.
We just replace summations with integrations.
And we replace PMFs by PDFs.
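As a small illustration of this recipe (a sketch, using an arbitrarily chosen exponential PDF and conditioning interval rather than anything from the lecture), we can restrict a PDF to a set A, rescale by the probability of A, and check that the result integrates to 1:
```{r conditional-pdf-check}
# Hypothetical illustration: X is exponential with rate 1, conditioned on A = {1 <= X <= 2}.
f_X <- function(x) dexp(x, rate = 1)            # unconditional PDF
p_A <- pexp(2, rate = 1) - pexp(1, rate = 1)    # P(A)

# Conditional PDF: same shape as f_X on A, scaled by 1 / P(A), and 0 outside A.
f_X_given_A <- function(x) ifelse(x >= 1 & x <= 2, f_X(x) / p_A, 0)

integrate(f_X_given_A, lower = 1, upper = 2)$value                     # must equal 1
integrate(function(x) x * f_X_given_A(x), lower = 1, upper = 2)$value  # E[X | A]
```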
# 3. Exercise: A conditional PDF




# 4. Conditioning example






Let us now look at an example.
Consider a piecewise constant PDF of the form
shown in this diagram.
Suppose that we condition on the event that x lies between
a plus b over 2, which is here, and b.
So we're conditioning on x lying in this
particular red interval.
What is the conditional PDF?
The conditional PDF is going to be 0 outside of the
interval on which we are conditioning.
So the conditional PDF is 0 in this range, and also, it is 0
in this range.
Within the range of values of x that are allowed given the
conditioning information, the conditional PDF must retain
the same shape as the unconditional one.
And the unconditional one is constant in that range.
So the conditional PDF will also be a constant.
Because in this case the length of this interval is
half of the distance between a and b--
so the length of this interval is (b minus a) over 2--
in order for the area under this curve to be equal to 1,
the height of this curve has to be equal to
2 over (b minus a).
The conditional expectation in this example is just the
ordinary expectation but applied to
the conditional model.
Since the conditional PDF is uniform, the conditional
expectation will be the midpoint of the range of this
conditional PDF.
And in this case, the midpoint is 1/2 the left end of the
interval, which is a plus b over 2 plus 1/2 the right end
point of the interval, which is b.
And so this evaluates to 1/4 times a plus 3/4 times b.
We can also calculate the expected value of X squared in
the conditional model using the expected value rule.
According to the expected value rule, it's going to be
an integral of the conditional PDF, which is 2 over b minus a
multiplied by x squared.
And this integral runs over the range where the
conditional PDF is actually non-zero.
So it's an integral that ranges from a plus b
over 2 up to b.
And this is an integral which is not too hard to evaluate, and
there's no point in carrying out the evaluation to the end.
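Still, a quick numerical sketch (with arbitrary choices a = 0 and b = 4) can confirm the conditional expectation 1/4 times a plus 3/4 times b, and evaluate the integral for the conditional expectation of X squared:
```{r conditioning-example-check}
set.seed(1)
a <- 0; b <- 4                             # arbitrary endpoints for illustration
x <- runif(1e6, min = a, max = b)          # X uniform on [a, b]
x_cond <- x[x >= (a + b) / 2]              # keep only outcomes in the conditioning event

mean(x_cond)                               # simulated E[X | A]
a / 4 + 3 * b / 4                          # the formula from the lecture

mean(x_cond^2)                             # simulated E[X^2 | A]
integrate(function(x) x^2 * 2 / (b - a),   # the integral from the lecture
          lower = (a + b) / 2, upper = b)$value
```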
# 5. Memorylessness of the exponential PDF













We now revisit the exponential random variable that we
introduced earlier and develop some intuition about what it
represents.
We do this by establishing a memorylessness property,
similar to the one that we established earlier in the
discrete case for the geometric PMF.
Suppose that it is known that light bulbs have a lifetime
until they burn out, which is an
exponential random variable.
You go to a store, and you are given two choices, to buy a
new light bulb, or to buy a used light bulb that has been
working for some time and has not yet burned out.
Which one should you take?
We want to approach this question mathematically.
So let us denote by capital T the lifetime of the bulb.
So time starts at time 0, and then at some random time that
we denote by capital T, the light bulb will burn out.
And we assume that this random variable is exponential with
some given parameter lambda.
In one of our earlier calculations, we have shown
that the probability that capital T is larger than some
value little x falls exponentially
with that value x.
We are now told that a certain light bulb has already been
operating for t time units without failing.
So we know that the value of the random variable capital T
is larger than little t.
We are interested in how much longer the light bulb will be
operating, and so we look at capital X, which is the
remaining lifetime from the current time until the light
bulb burns out.
So capital X is this particular random variable
here, and it is equal to capital T minus little t.
Let us now calculate the probability that the light
bulb lasts for another little x time units.
That is, that this random variable, capital X, is at
least as large as some little x.
That is, that the light bulb remains alive
until time t plus x.
We use the definition of conditional probabilities to
write this expression as the probability that capital X is
bigger than little x.
On the other hand, capital X is T minus t, so we
write it this way--
T minus t is bigger than little x, and also that T is
bigger than little t, divided by the probability of the
conditioning event.
Let us just write this event in a cleaner form: capital T being
larger than little t plus x and being larger than little
t, again divided by the probability of the
conditioning event.
And now notice that capital T will be greater than little t
and also greater than little t plus x, that is, capital T is
larger than this number and this number, if and only if it
is larger than this second number here.
So in other words, the intersection of these two
events is just this event here, that capital T is larger
than little t plus x.
Now, we can use the formula for the probability that
capital T is larger than something.
We apply this formula, except that instead of little x, we
have t plus x.
And so here we have e to the minus lambda times (t plus x), divided
by the probability that capital T is bigger than t.
So we use this formula, but with little t in the place of
little x, and we obtain e to the minus lambda t.
We have a cancellation, and we're left with e to the minus
lambda x, which is a final answer in this calculation.
What do we observe here?
The probability that the used light bulb will live for another x time units is exactly the same as the corresponding probability that a new light bulb will be alive for another x time units. So new and used light bulbs are described by the same probabilities; they are probabilistically identical. Put differently, the used light bulb does not remember, and is not affected by, how long it has been running. And this is the memorylessness property of exponential random variables.
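A short simulation sketch (with an arbitrary rate lambda = 0.5 and arbitrary times t and x) makes this concrete: among simulated bulbs that survive t time units, the fraction surviving another x units matches e to the minus lambda x:
```{r memorylessness-check}
set.seed(2)
lambda <- 0.5; t0 <- 3; x <- 2             # arbitrary values for illustration
T_life <- rexp(1e6, rate = lambda)         # simulated light bulb lifetimes

survivors <- T_life[T_life > t0]           # bulbs still alive at time t0
mean(survivors > t0 + x)                   # estimate of P(T > t0 + x | T > t0)
exp(-lambda * x)                           # P(T > x): should be about the same
```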
Let us now build some additional insights on
exponential random variables.
We have a formula for the density, the PDF.
And from this, we can calculate the probability that
T lies in a small interval.
For example, for a small delta, this probability here
is going to be approximately equal to the density of T
evaluated at 0 times delta, which is lambda times e to the
0, which is 1, times delta.
What if we are told that the light bulb has been alive for
t time units?
What is the probability that it burns out during the next
delta time units?
Since a used but still alive light bulb is
probabilistically identical to a new one, this conditional
probability is the same as this probability here that a
new light bulb burns out in the next delta time units.
And so this is also approximately
equal to lambda delta.
So we see that independently of how long a light bulb has
been alive, during the next delta time units it will have
a lambda delta probability of failing.
One way of thinking about this situation is that the time
interval is split into little intervals of length delta.
And as long as the light bulb is alive, if it is alive at
this point, it will have probability lambda delta of
burning out during the next interval of length delta.
This is like flipping a coin.
Once every delta time steps, there is a probability lambda
delta that there is a success in that coin flip, where
success corresponds to having the light bulb actually burn
out, and the exponential random variable corresponds to
the total time elapsed until the first success.
In this sense, the exponential random variable is a close
analog of the geometric random variable, which was the time
until the first success in a discrete time setting.
This analogy turns out to be the foundation behind the
Poisson process that we will be studying
later in this course.
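To illustrate the coin-flipping analogy (a sketch with an arbitrary lambda and a small delta), we can flip a coin with success probability lambda times delta once every delta time units and compare the time of the first success with an exponential distribution:
```{r geometric-to-exponential}
set.seed(3)
lambda <- 1; delta <- 0.01                 # arbitrary rate and small time step
n <- 1e5

# Flip a coin with success probability lambda * delta once every delta time units.
# rgeom() returns the number of failures before the first success, so the first
# success happens on trial (rgeom + 1), at time delta * (rgeom + 1).
first_success_time <- delta * (rgeom(n, prob = lambda * delta) + 1)

mean(first_success_time > 2)               # empirical P(time of first success > 2)
exp(-lambda * 2)                           # exponential prediction e^(-lambda * 2)
```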
# 6. Exercise: Memorylessness of the exponential








# 7. Total probability and expectation theorems










We now continue with the development of continuous
analogs of everything we know for the discrete case.
We have already seen a few versions of the total
probability theorem, one version for events and one
version for PMFs.
Let us now develop a continuous analog.
Suppose, as always, that we have a partition of the sample
space into a number of disjoint scenarios.
Three scenarios in this picture.
More generally, n scenarios in these formulas.
Let X be a continuous random variable and let us take B to
be the event that the random variable takes a value less
than or equal to some little x.
By the total probability theorem, this is the
probability of the first scenario times the conditional
probability of this event given that the first scenario
has materialized, and then we have similar terms for the
other scenarios.
Let us now turn this equation into CDF notation.
The left-hand side is what we have defined as the CDF of the
random variable x.
On the right-hand side, what we have is the probability of
the first scenario multiplied, again, by a CDF of the random
variable X. But it is a CDF that applies in a conditional
model where event A1 has occurred.
And so we use this notation to denote the conditional CDF,
the CDF that applies to the conditional universe.
And then we have similar terms for the other scenarios.
Now, we know that the derivative of a CDF is a PDF.
We also know that any general fact, such as this one that
applies to unconditional models will also apply without
change to a conditional model, because a conditional model is
just like any other ordinary probability model.
So let us now take derivatives of both
sides of this equation.
On the left-hand side, we have the derivative of a
CDF, which is a PDF.
And on the right-hand side, we have the probability of the
first scenario, and then the derivative of the conditional
CDF, which has to be the same as the conditional PDF.
So we use here the fact that derivatives of CDFs are PDFs,
and then we have similar terms under the different scenarios.
So we now have a relation between densities.
To interpret this relation, we think as follows.
The probability of falling inside the little interval
around x is determined by the probability of falling inside
that little interval under each one of the different
scenarios and where each scenario is weighted by the
corresponding probability.
Now, we multiply both sides of this equation by x, and then
integrate over all x's.
We do this on the left-hand side.
And similarly, on the right-hand side to obtain a
term of this form.
And we have similar terms corresponding
to the other scenarios.
What do we have here?
On the left-hand side, we have the expected value of x.
On the right-hand side, we have this probability
multiplied by the conditional expectation of X given that
scenario A1 has occurred.
And so we obtain a version of the total expectation theorem.
It's exactly the same formula as we had in the discrete
case, except that now X is a continuous random variable.
Let us now look at a simple example that involves a model
with different scenarios.
Bill wakes up in the morning and wants to go to the
supermarket.
There are two scenarios.
With probability one third, a first scenario occurs.
And under that scenario, Bill will go at a time that's
uniformly distributed between 0 and 2 hours from now.
So the conditional PDF of X, in this case, is uniform on
the interval from 0 to 2.
There's a second scenario in which Bill will take a long nap and
will go later in the day.
That scenario has a probability of 2/3.
And under that case, the conditional PDF of X is going
to be uniform on the range between 6 and 8.
By the total probability theorem for densities, the
density of X, of the random variable--
the time at which he goes to the supermarket--
consists of two pieces.
One piece is a uniform between 0 and 2.
This uniform ordinarily would have a height of 1/2.
On the other hand, it gets weighted by the corresponding
probability, which is 1/3.
So we obtain a piece here that has a height of 1/6.
Under the alternative scenario, the conditional
density is a uniform on the interval between 6 and 8.
This uniform has a height of 1/2 again, but it gets
multiplied by a factor of 2/3.
And this results in a height for this term that we have
here, which is 1/3.
And this is the form of the PDF of the time at which Bill
will go to the supermarket.
We can now finally use the total expectation theorem.
The conditional expectation under the two scenarios can be
found as follows.
Under one scenario, we have a uniform between 0 and 2.
And so the conditional expectation is 1, and it gets
weighted by the corresponding probability, which is 1/3.
Under the second scenario, which has probability 2/3, the
conditional expectation is the midpoint of this uniform,
which is 7.
And this gives us the expected value of the
time at which he goes.
So this is a simple example, but it illustrates nicely how
we can construct a model that involves a number
of different scenarios.
And by knowing the probability distribution under each one of
the scenarios, we can find the probability
distribution overall.
And we can also find the expected value for the overall
experiment.
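The Bill example is also easy to simulate (a short sketch): draw the scenario first, then draw X from the corresponding uniform distribution, and compare the sample mean with the total expectation theorem value 1/3 times 1 plus 2/3 times 7, which is 5:
```{r total-expectation-check}
set.seed(4)
n <- 1e6
first_scenario <- runif(n) < 1/3                           # scenario A1, probability 1/3
x <- ifelse(first_scenario, runif(n, 0, 2), runif(n, 6, 8))

mean(x)                     # should be close to (1/3) * 1 + (2/3) * 7 = 5
hist(x, breaks = 100, freq = FALSE, xlab = "x",
     main = "PDF of the time Bill goes to the supermarket")
# heights of the two pieces are about 1/6 on [0, 2] and 1/3 on [6, 8]
```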
# 8. Exercise: Total probability theorem II



# 9. Mixed random variables







We now look at an example similar to the previous one,
in which we have again two scenarios, but in which we
have both discrete and continuous
random variables involved.
You have $1 and the opportunity
to play in the lottery.
With probability 1/2, you do nothing and you're left with
the dollar that you started with.
With probability 1/2, you decide to play the lottery.
And in that case, you get back an amount of money which is
random and uniformly distributed
between zero and two.
Is the random variable, X, discrete?
The answer is no, because it takes values on
a continuous range.
Is the random variable, X, continuous?
The answer is no, because the probability that X takes the
value of exactly one is equal to 1/2.
Even though X takes values in a continuous range, this is
not enough to make it a continuous random variable.
We defined continuous random variables to be those that can
be described by a PDF.
And as you have seen, in such a case any individual point
should have zero probability.
But this is not the case here, and so X is not continuous.
We call X a mixed random variable.
More generally, we can have a situation where the random
variable X with some probability is the same as a
particular discrete random variable, and with some other
probability it is equal to some other
continuous random variable.
Such a random variable, X, does not have a PMF because it
is not discrete.
Also, it does not have a PDF because it is not continuous.
How do we describe such a random variable?
Well, we can describe it in terms of a cumulative
distribution function.
CDFs are always well defined for all
kinds of random variables.
We have two scenarios, and so we can use the Total
Probability Theorem and write that the CDF is equal to the
probability of the first scenario, which is p, times
the probability that the random variable Y is less than
or equal to x.
This is a conditional model under the first scenario.
And with some probability, we have the second scenario.
And under that scenario, X will take a value less than
little x, if and only if our random variable Z will take a
value less than little x.
Or in CDF notation, this is p times the CDF of the random
variable Y evaluated at this particular x plus another
weighted term involving the CDF of the random variable Z.
We can also define the expected value of X in a way
that is consistent with the Total Expectation Theorem,
namely define the expected value of X to be the
probability of the first scenario, in which case X is
discrete times the expected value of the associated
discrete random variable, plus the probability of the second
scenario, under which X is continuous, times the expected
value of the associated continuous random variable.
Going back to our original example, we have two
scenarios, the scenarios that we can call A1 and A2.
Under the first scenario, we have a uniform PDF, and the
corresponding CDF is as follows.
It's flat until zero, then it rises linearly.
And then it stays flat, and the value
here is equal to one.
So the slope here is 1/2.
So the slope is equal to the corresponding PDF.
Under the second scenario, we have a discrete, actually a
constant random variable.
And so the CDF is flat at zero until this value, and at that
value we have a jump equal to one.
We then use the Total Probability Theorem, which
tells us that the CDF of the mixed random variable will be
1/2 times the CDF under the first scenario plus 1/2 times
the CDF under the second scenario.
So we take 1/2 of this plot and 1/2 of that plot
and add them up.
What we get is a function that rises now at the slope of 1/4.
Then we have a jump, and the size of that jump is going
to be equal to 1/2.
And then it continues at a slope of 1/4 until it reaches
this value.
And after that time, it remains flat.
So this is a simple illustration that for mixed
random variables it's not too hard to obtain the
corresponding CDF even though this random variable does not
have a PDF or a PMF of its own.
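A simulation sketch of the lottery example: with probability 1/2 we keep the dollar, so X equals 1, and otherwise X is uniform on the interval from 0 to 2. The empirical CDF shows the jump of size 1/2 at x equal to 1, and the sample mean agrees with the expected value 1/2 times 1 plus 1/2 times 1, which is 1:
```{r mixed-rv-check}
set.seed(5)
n <- 1e5
keep_dollar <- runif(n) < 1/2                  # with probability 1/2, X is exactly 1
x <- ifelse(keep_dollar, 1, runif(n, 0, 2))    # otherwise X is uniform on [0, 2]

mean(x)                    # close to 1/2 * 1 + 1/2 * 1 = 1
mean(x == 1)               # close to 1/2, so X is not a continuous random variable
plot(ecdf(x), main = "CDF of the mixed random variable X")   # jump of 1/2 at x = 1
```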
# 10. Exercise: A mixed random variable



# 11. Joint PDFs












In this segment, we start a discussion of multiple
continuous random variables.
Here are some objects that we're already familiar with.
But exactly as in the discrete case, if we are dealing with
two random variables, it is not enough to know their
individual PDFs.
We also need to model the relation between the two
random variables, and this is done through a joint PDF,
which is the continuous analog of the joint PMF.
We will use this notation to indicate joint PDFs where we
use f to indicate that we're dealing with a density.
So what remains to be done is to actually define this object
and see how we use it.
Let us start by recalling that joint PMFs were defined in
terms of the probability that the pair of random variables X
and Y take certain specific values little x and little y.
Regarding joint PDFs, we start by saying that it has to be
non-negative.
However, a more precise interpretation in terms of
probabilities has to wait a little bit.
Joint PDFs will be used to calculate probabilities.
And this will be done in analogy with
the discrete setting.
In the discrete setting, the probability that the pair of
random variables falls inside a certain set is just the sum
of the probabilities of all of the possible pairs inside that
particular set.
For the continuous case, we introduce an analogous formula. We use the joint density instead of the joint PMF. And instead of having a summation, we now integrate. As in the discrete setting, we have one total unit of probability. **The joint PDF tells us how this unit of probability is spread over the entire continuous two-dimensional plane**. And we use it, we use the joint PDF, to calculate the probability of a certain set by finding the volume under the joint PDF that lies on top of that set. This is what this integral really represents. We integrate over a particular two-dimensional set, and we take this value that we integrate. And we can think of this as the height of an object that's sitting on top of that set. Now, this relation here, this calculation of probabilities, is not something that we are supposed to prove. This is, rather, the definition of what a joint PDF does.
A legitimate joint PDF is any function of two variables,
which is non-negative and which integrates to 1.
And we will say that two random variables are jointly
continuous if there is a legitimate joint PDF that can
be used to calculate the associated probabilities
through this particular formula.
So we have really an indirect definition.
Instead of defining the joint PDF as a probability, we
actually define it indirectly by saying what it does, how it
will be used to calculate probabilities.
A picture will be helpful here.
Here's a plot of a possible joint PDF.
These are the x and y-axes.
And the function being plotted is the joint PDF of these two
random variables.
This joint PDF is higher at some places and lower at
others, indicating that certain regions of the x,y
plane are more likely than others.
The joint PDF determines the probability of a set B by
integrating over that set B. Let's say it's this set.
Integrating the PDF over that set.
Pictorially, what this means is that we look at the volume
that sits on top of that set, but below the PDF, below the
joint PDF, and so we obtain some three-dimensional object
of this kind.
And this integral corresponds to actually finding this
volume here, the volume that sits on top of the set B but
which is below the joint PDF.
Let us now develop some additional understanding of
joint PDFs.
As we just discussed, for any given set B, we can integrate
the joint PDF over that set.
And this will give us the probability of
that particular set.
Of particular interest is the case where we're dealing with
a set which is a rectangle, in which case the situation is a
little simpler.
So suppose that we have a rectangle where the
x-coordinate ranges from A to B and the y-coordinate ranges
from some C to some D. Then, the double integral over this
particular rectangle can be written in a form where we
first integrate with respect to one of the variables that
ranges from A to B. And then, we integrate over all possible
values of y as they range from C to D.
Of particular interest is the special case where we're
dealing with a small rectangle such as this one.
A rectangle with sizes equal to some delta where delta is a
small number.
In that case, the double integral, which is the volume
on top of that rectangle, is simpler to evaluate.
It is equal to the value of the function that we're
integrating at some point in the rectangle --- let's take
that corner ---
times the area of that little rectangle, which is equal to
delta square.
So we have an interpretation of the joint PDF in terms of
probabilities of small rectangles.
Joint PDFs are not probabilities.
But rather, they are probability densities.
They tell us the probability per unit area.
And one more important comment.
For the case of a single continuous random variable, we
know that any single point has 0 probability.
This is again, true for the case of two jointly continuous
random variables.
But more is true.
If you take a set B that has 0 area.
For example, a certain curve.
Suppose that this curve is the entire set B. Then, the volume
under the joint PDF that's sitting on top of that curve
is going to be equal to 0.
So 0 area sets have 0 probability.
And this is one of the characteristic features of
jointly continuous random variables.
Now, let's think of a particular situation.
Suppose that X is a continuous random variable, and let Y be
another random variable, which is identically equal to X.
Since X is a continuous random variable, Y is also a
continuous random variable.
However, in this situation, we are certain that the outcome
of the experiment is going to fall on the line
where x equals y.
All the probability lies on top of a line, and
a line has 0 area.
So we have positive probability on the set of 0
area, which contradicts what we discussed before.
Well, this simply means that X and Y are not jointly
continuous.
Each one of them is continuous, but together
they're not jointly continuous.
Essentially, joint continuity is something more than
requiring each random variable to be continuous by itself.
For joint continuity, we want the probability to be really
spread over two dimensions.
Probability is not allowed to be concentrated on a
one-dimensional set.
On the other hand, in this example, the probability is
concentrated on a one-dimensional set.
And we do not have joint continuity.
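As a numerical sketch (with an arbitrarily chosen joint PDF, not one from the lecture), we can compute the probability of a rectangle by integrating the joint PDF first with respect to x and then with respect to y, exactly as in the rectangle formula above:
```{r joint-pdf-rectangle}
# Hypothetical joint PDF on the unit square: f(x, y) = x + y for 0 <= x, y <= 1.
# It is non-negative and integrates to 1, so it is a legitimate joint PDF.
f_XY <- function(x, y) ifelse(x >= 0 & x <= 1 & y >= 0 & y <= 1, x + y, 0)

# P(0 <= X <= 0.5, 0 <= Y <= 0.5): inner integral over x, outer integral over y.
inner <- function(y) sapply(y, function(yy)
  integrate(function(x) f_XY(x, yy), lower = 0, upper = 0.5)$value)
integrate(inner, lower = 0, upper = 0.5)$value   # exact answer is 1/8
```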
# 12. Exercise: Jointly continuous r.v.'s


# 13. Exercise: From joint PDFs to probabilities




# 14. From the joint to the marginal















In the discrete case, we saw that we could recover the PMF
of X and the PMF of Y from the joint PMF.
Indeed, the joint PMF is supposed to contain a complete
probabilistic description of the two random variables.
It is their probability law, and any quantity of interest
can be computed if we know the joint.
Things are similar in the continuous setting.
You can easily guess the formula through
the standard recipe.
Replace sums by integrals, and replace PMFs by PDFs. But a proof of this formula is actually instructive. So let us start by first finding the CDF of X. **The CDF of X is, by definition, the probability that the random variable X takes a value less than or equal to a certain number little x**. And this is the probability of a particular set that we can visualize on the two dimensional plane. If here is the value of little x, then we're talking about the set of all pairs x, y, for which the x component is less than or equal to a certain number.
So we need to integrate the joint density over this two-dimensional set; it will be a double integral of the joint density over this particular two-dimensional set. Now, since we've used the symbol x here to mean something specific, let us use different symbols for the dummy variables that we will use in the integration. And we need to integrate with respect to the two variables, let's say with respect to t and with respect to s. The variable t can be anything. So it ranges from minus infinity to infinity. But the variable s, the first argument, ranges from minus infinity up to this point, which is x. *Think of this double integral as an integral with respect to the variable s of this complicated function inside the brackets.*
*Now, to find the density of X, all we need to do is to differentiate the CDF of X.* And when we have an integral of this kind and we differentiate with respect to the upper limit of the integration, what we are left with is the integrand. That is this expression here.
(Aside: the *integrand* is the function being integrated in either a definite or indefinite integral. For example, x^2 cos 3x is the integrand in ∫ x^2 cos 3x dx.)
It is an integral with respect to the second variable.
And it's an integral over the entire space, from minus
infinity to plus infinity.
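This marginal recipe is easy to check numerically (a sketch, reusing the same hypothetical joint PDF as in the previous chunk): for a fixed x, integrate the joint PDF over all values of y and compare with the known marginal:
```{r marginal-from-joint}
# Same hypothetical joint PDF as before: f(x, y) = x + y on the unit square.
f_XY <- function(x, y) ifelse(x >= 0 & x <= 1 & y >= 0 & y <= 1, x + y, 0)

# Marginal of X at a point: integrate the joint PDF over all values of y.
# The joint PDF vanishes outside [0, 1], so the integral reduces to that range.
f_X <- function(x) sapply(x, function(xx)
  integrate(function(y) f_XY(xx, y), lower = 0, upper = 1)$value)

f_X(0.3)        # numerical marginal at x = 0.3
0.3 + 1/2       # exact marginal on [0, 1] is f_X(x) = x + 1/2
```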
Here is an example.
The simplest kind of joint PDF is a PDF that is
constant on a certain set, S, and is 0 outside that set.
So the overall probability, one unit of probability, is
spread uniformly over that set.
Because the total volume under the joint PDF must be equal to
1, the height of the PDF must be equal to 1 over the area.
To calculate the probability of a certain set A, we want to
ask how much volume is sitting on top of that set.
And because in this case, the PDF is constant, we need to
take the height of the PDF times the relevant area.
What is the relevant area?
Well, actually, the PDF is 0 outside the set S. So the
relevant area is only this part here, which is the
intersection of the two sets, S and A.
So the total volume sitting on top of this little set is
going to be the base, the area of the base, which is the area
of A intersection S times the height of the
PDF at those places.
Now, the height of the PDF is 1 over the area of S. So this
is the formula for calculating the probability of a certain
set, A.
Let's now look at a specific example.
Suppose that we have a uniform PDF over this particular set,
S. This set has an area that is equal to 4.
It consists of four unit rectangles arranged next to
each other.
So the height of the joint PDF in this example