Commit 9ae004d ("Write more"), 1 parent 89fe98a

1 file changed: src/transforms.qmd (+160 −1 lines)

@@ -32,7 +32,7 @@ That's all great, and furthermore if you want to know the probability of observi

The probability density function for the normal distribution with mean 0 and standard deviation 1 is

-$$p(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2},$$
+$$p(x) = \frac{1}{\sqrt{2\pi}} \exp{\left(-\frac{x^2}{2}\right)},$$

so we could also have calculated this manually using:

@@ -157,4 +157,163 @@ You can try this out with the transformation $y = -\exp(x)$: you will have to fl

## The Jacobian

In general, we may have transforms that act on multivariate distributions, for example something mapping $p(x_1, x_2)$ to $q(y_1, y_2)$.
In this case, the rule above has to be extended by replacing the derivative $\mathrm{d}x/\mathrm{d}y$ with the determinant of the Jacobian matrix:

$$\mathbf{J} = \begin{pmatrix}
\partial x_1/\partial y_1 & \partial x_1/\partial y_2 \\
\partial x_2/\partial y_1 & \partial x_2/\partial y_2
\end{pmatrix},$$

and specifically,

$$q(y_1, y_2) = p(x_1, x_2) \left| \det(\mathbf{J}) \right|.$$

This is the same as equation (11.9) in Bishop, except that he denotes the absolute value of the determinant with just $|\mathbf{J}|$.

::: {.callout-note}
In different contexts the Jacobian can have different 'numerators' and 'denominators' in the partial derivatives.
For example, if $\mathbf{y} = f(\mathbf{x})$, then it's common to write $\mathbf{J}_f$ as the matrix of partial derivatives of the elements of $\mathbf{y}$ with respect to the elements of $\mathbf{x}$.
(For example, $\mathbf{J}_f$ is used in [Newton's method](https://en.wikipedia.org/wiki/Newton%27s_method) to find the zeroes of $f$, i.e. the values of $\mathbf{x}$ such that $\mathbf{y} = \mathbf{0}$.)
However, it is always the case that the elements of the 'numerator' vary with rows and the elements of the 'denominator' vary with columns.
:::
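As a quick numerical sanity check of the change-of-variables formula (a Python sketch with arbitrary illustrative values, not part of the original text), consider a linear map $\mathbf{y} = A\mathbf{x}$ applied to a standard bivariate normal: here $\mathbf{J} = A^{-1}$, and the result must match the known density of $N(\mathbf{0}, AA^\top)$.

```python
import math

# Numerical check of q(y) = p(x) |det(J)| for the linear map y = A x,
# with x ~ standard bivariate normal. Here J = dx/dy = A^{-1}, and the
# transformed variable is known to be N(0, A A^T).
a, b, c, d = 2.0, 0.5, 0.0, 1.5          # A = [[a, b], [c, d]], chosen arbitrarily
det_A = a * d - b * c

y1, y2 = 0.7, -1.2                        # arbitrary evaluation point

# x = A^{-1} y
x1 = (d * y1 - b * y2) / det_A
x2 = (-c * y1 + a * y2) / det_A

# p(x) for a standard bivariate normal, then the change of variables
p_x = math.exp(-0.5 * (x1**2 + x2**2)) / (2 * math.pi)
q_y = p_x * abs(1.0 / det_A)              # |det(J)| = |det(A^{-1})| = 1/|det(A)|

# direct evaluation of the N(0, Sigma) density with Sigma = A A^T
s11, s12, s22 = a * a + b * b, a * c + b * d, c * c + d * d
det_S = s11 * s22 - s12 * s12
quad = (s22 * y1**2 - 2 * s12 * y1 * y2 + s11 * y2**2) / det_S
q_direct = math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det_S))

print(math.isclose(q_y, q_direct))  # True
```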

The rest of this section will be devoted to an example to show that this works, and contains some slightly less pretty mathematics.
If you are already suitably convinced by this stage, then you can skip the rest of this section.
(Or if you prefer something more formal, the Wikipedia article on integration by substitution [discusses the multivariate case as well](https://en.wikipedia.org/wiki/Integration_by_substitution#Substitution_for_multiple_variables).)

### An example: the Box–Muller transform

A motivating example where one might like to use a Jacobian is the [Box–Muller transform](https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform), which is a technique for sampling from a normal distribution.

The Box–Muller transform works by first sampling two random variables from the uniform distribution between 0 and 1:

$$\begin{align}
x_1 &\sim U(0, 1) \\
x_2 &\sim U(0, 1).
\end{align}$$

Both of these have a probability density function of $p(x) = 1$ for $0 < x \leq 1$, and 0 otherwise.
Because they are independent, we can write

$$p(x_1, x_2) = p(x_1) p(x_2) = \begin{cases}
1 & \text{if } 0 < x_1 \leq 1 \text{ and } 0 < x_2 \leq 1, \\
0 & \text{otherwise}.
\end{cases}$$
203+
204+
The next step is to perform the transforms
205+
206+
$$\begin{align}
207+
y_1 &= \sqrt{-2 \log(x_1)} \cos(2\pi x_2); \\
208+
y_2 &= \sqrt{-2 \log(x_1)} \sin(2\pi x_2),
209+
\end{align}$$
210+
211+
and it turns out that with these transforms, both $y_1$ and $y_2$ are independent and normally distributed with mean 0 and standard deviation 1, i.e.
212+
213+
$$q(y_1, y_2) = \frac{1}{2\pi} \exp{\left(-\frac{y_1^2}{2}\right)} \exp{\left(-\frac{y_2^2}{2}\right)}.$$
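Before deriving this, we can check the claim empirically. The following Python sketch (sample size and seed are arbitrary, not from the original text) draws uniform pairs, applies the transforms, and confirms that the outputs have mean close to 0 and variance close to 1.

```python
import math
import random

random.seed(42)
n = 100_000
ys = []
for _ in range(n):
    # random.random() returns values in [0, 1); use 1 - u so that
    # log() never sees zero
    x1 = 1.0 - random.random()
    x2 = random.random()
    r = math.sqrt(-2 * math.log(x1))
    ys.append(r * math.cos(2 * math.pi * x2))   # y_1
    ys.append(r * math.sin(2 * math.pi * x2))   # y_2

mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
print(f"mean={mean:.3f}, var={var:.3f}")  # should be close to 0 and 1
```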

How can we show that this is the case?

There are many ways to work out the required calculus.
Some are more elegant and some rather less so!
One of the less headache-inducing ways is to define the intermediate variables:

$$r = \sqrt{-2 \log(x_1)}; \quad \theta = 2\pi x_2,$$

from which we can see that $y_1 = r\cos\theta$ and $y_2 = r\sin\theta$, and hence

$$\begin{align}
x_1 &= \exp{\left(-\frac{r^2}{2}\right)} = \exp{\left(-\frac{y_1^2}{2}\right)}\exp{\left(-\frac{y_2^2}{2}\right)}; \\
x_2 &= \frac{\theta}{2\pi} = \frac{1}{2\pi} \, \arctan\left(\frac{y_2}{y_1}\right).
\end{align}$$
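These inverse expressions can be sanity-checked against the forward transform numerically (a Python sketch; the input values are arbitrary, and `atan2` is used as the quadrant-aware version of $\arctan(y_2/y_1)$):

```python
import math

x1, x2 = 0.42, 0.17          # arbitrary point in the unit square

# forward Box–Muller transform
r = math.sqrt(-2 * math.log(x1))
theta = 2 * math.pi * x2
y1 = r * math.cos(theta)
y2 = r * math.sin(theta)

# inverse expressions derived above; atan2(y2, y1) is the
# quadrant-aware arctan(y2 / y1)
x1_back = math.exp(-0.5 * (y1**2 + y2**2))
x2_back = math.atan2(y2, y1) / (2 * math.pi)

print(math.isclose(x1, x1_back), math.isclose(x2, x2_back))  # True True
```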

This lets us obtain the requisite partial derivatives in a way that doesn't involve _too_ much algebra.
As an example, we have

$$\frac{\partial x_1}{\partial y_1} = -y_1 \exp{\left(-\frac{y_1^2}{2}\right)}\exp{\left(-\frac{y_2^2}{2}\right)} = -y_1 x_1,$$

(where we used the chain rule), and

$$\frac{\partial x_2}{\partial y_1} = \frac{1}{2\pi} \left(\frac{1}{1 + (y_2/y_1)^2}\right) \left(-\frac{y_2}{y_1^2}\right),$$

(where we used the chain rule, and the derivative $\mathrm{d}(\arctan(a))/\mathrm{d}a = 1/(1 + a^2)$).

Putting together the Jacobian matrix, we have:

$$\mathbf{J} = \begin{pmatrix}
-y_1 x_1 & -y_2 x_1 \\
-cy_2/y_1^2 & c/y_1
\end{pmatrix},$$

where $c = [2\pi(1 + (y_2/y_1)^2)]^{-1}$.
The determinant of this matrix is

$$\begin{align}
\det(\mathbf{J}) &= -cx_1 - cx_1(y_2/y_1)^2 \\
&= -cx_1\left[1 + \left(\frac{y_2}{y_1}\right)^2\right] \\
&= -\frac{1}{2\pi} x_1 \\
&= -\frac{1}{2\pi}\exp{\left(-\frac{y_1^2}{2}\right)}\exp{\left(-\frac{y_2^2}{2}\right)}.
\end{align}$$

Returning to our probability density, we have

$$\begin{align}
q(y_1, y_2) &= p(x_1, x_2) \cdot |\det(\mathbf{J})| \\
&= \frac{1}{2\pi}\exp{\left(-\frac{y_1^2}{2}\right)}\exp{\left(-\frac{y_2^2}{2}\right)},
\end{align}$$

as desired.
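The determinant calculation can also be verified numerically, by approximating the Jacobian of the inverse map with finite differences and comparing $|\det(\mathbf{J})|$ against $x_1/(2\pi)$ (a Python sketch; the evaluation point and step size are arbitrary):

```python
import math

def x_of_y(y1, y2):
    # inverse Box–Muller map derived above
    x1 = math.exp(-0.5 * (y1**2 + y2**2))
    x2 = math.atan(y2 / y1) / (2 * math.pi)
    return x1, x2

y1, y2 = 0.8, -0.3   # arbitrary point with y1 != 0
h = 1e-6

def column(j):
    # central finite-difference estimate of the j-th column of dx/dy
    dy = [h if k == j else 0.0 for k in range(2)]
    xp = x_of_y(y1 + dy[0], y2 + dy[1])
    xm = x_of_y(y1 - dy[0], y2 - dy[1])
    return [(xp[i] - xm[i]) / (2 * h) for i in range(2)]

c0, c1 = column(0), column(1)
det_J = c0[0] * c1[1] - c1[0] * c0[1]

# the derivation says |det(J)| = x1 / (2*pi)
x1, _ = x_of_y(y1, y2)
print(math.isclose(abs(det_J), x1 / (2 * math.pi), rel_tol=1e-6))  # True
```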

::: {.callout-note}
We haven't yet explicitly accounted for the fact that $p(x_1, x_2)$ is 0 if either $x_1$ or $x_2$ is outside the range $(0, 1]$.
For example, if this constraint on $x_1$ and $x_2$ were to result in inaccessible values of $y_1$ or $y_2$, then $q(y_1, y_2)$ should be 0 for those values.
Formally, for the transformation $f: X \to Y$ where $X$ is the unit square (i.e. $0 < x_1, x_2 \leq 1$), $q(y_1, y_2)$ should only take the above value on the [image](https://en.wikipedia.org/wiki/Image_(mathematics)) of $f$, and anywhere outside the image it should be 0.

In our case, the $\sqrt{-2 \log(x_1)}$ term in the transform varies between 0 and $\infty$, and the $\cos(2\pi x_2)$ term ranges from $-1$ to $1$.
Hence $y_1$, which is the product of these two terms, ranges from $-\infty$ to $\infty$, and likewise for $y_2$.
So the image of $f$ is the entire real plane, and we don't have to worry about this.
:::

## Bijectors.jl

All of the above has been a purely mathematical discussion of how distributions can be transformed.
Now, we turn to their implementation in Julia, specifically using the [Bijectors.jl package](https://github.com/TuringLang/Bijectors.jl).

A _bijection_ between two sets ([Wikipedia](https://en.wikipedia.org/wiki/Bijection)) is, essentially, a one-to-one correspondence between the elements of these sets.
That is to say, if we have two sets $X$ and $Y$, then a bijection maps each element of $X$ to a distinct element of $Y$, and every element of $Y$ is the image of exactly one element of $X$.
To return to our univariate example, where we transformed $x$ to $y$ using $y = \exp(x)$, the exponential function is a bijection because every value of $x$ maps to one unique value of $y$.
The input set (the domain) is $(-\infty, \infty)$, and the output set (the codomain) is $(0, \infty)$.

Since bijections are a one-to-one mapping between elements, we can also reverse the direction of this mapping to create an inverse function.
In the case of $y = \exp(x)$, the inverse function is $x = \log(y)$.
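As a concrete check of this bijection and its inverse (a Python sketch, not part of the original text): transforming a standard normal density through $y = \exp(x)$, with $|\mathrm{d}x/\mathrm{d}y| = 1/y$, reproduces the standard log-normal density.

```python
import math

def normal_pdf(x):
    # density of N(0, 1)
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

y = 2.5                    # arbitrary point in (0, inf)
x = math.log(y)            # the inverse bijection
q_y = normal_pdf(x) / y    # q(y) = p(x) |dx/dy|, with dx/dy = 1/y

# compare with the standard LogNormal(0, 1) density formula
lognorm = math.exp(-0.5 * math.log(y) ** 2) / (y * math.sqrt(2 * math.pi))
print(math.isclose(q_y, lognorm))  # True
```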

::: {.callout-note}
Technically, Bijectors.jl is concerned with functions $f: X \to Y$ for which:

- $f$ is continuously differentiable, i.e. the derivative $\mathrm{d}f(x)/\mathrm{d}x$ exists and is continuous (over the domain of interest $X$);
- the inverse $f^{-1}: Y \to X$ is also continuously differentiable (over _its_ own domain, i.e. $Y$).

Such functions are called diffeomorphisms ([Wikipedia](https://en.wikipedia.org/wiki/Diffeomorphism)).

When thinking about continuous differentiability, it's important to be conscious of the domains and codomains that we care about.
For example, taking the inverse function $\log(y)$ from above, its derivative is $1/y$, which is not even defined at $y = 0$.
However, we specified that the bijection $y = \exp(x)$ maps values of $x \in (-\infty, \infty)$ to $y \in (0, \infty)$, so the point $y = 0$ is not within the domain of the inverse function.
:::

It's still unclear to me how the term biject**or** was adopted over biject**ion**, which is the common mathematical term.
As far as I can tell, it's only used in this specific context of transforming distributions.

TODO: describe and illustrate API of Bijectors

Maybe TODO: describe how logabsdetjac is calculated (or can be calculated) via AD

## Why is this useful for sampling anyway?

Constrained vs unconstrained variables, sampling, etc.

## How does DynamicPPL use bijectors?

link, invlink, transform, varinfo etc.

See [https://turinglang.org/DynamicPPL.jl/stable/internals/transformations/](https://turinglang.org/DynamicPPL.jl/stable/internals/transformations/)
