src/transforms.qmd
import Random
Random.seed!(468);
```
This article seeks to motivate Bijectors.jl and to explain how distributions are transformed in the Turing.jl probabilistic programming language.

It assumes:

- some basic knowledge of probability distributions (the notions of sampling from them and calculating the probability density function for a given distribution); and
- some calculus (the chain and product rules for differentiation, and changes of variables in integrals).
## Sampling from a distribution
This equation is (11.5) in Bishop's textbook.

::: {.callout-note}
The absolute value here takes care of the case where $f$ is decreasing, i.e., the distribution is flipped.
You can try this out with the transformation $y = -\exp(x)$.
If $a < b$, then $-\exp(a) > -\exp(b)$, and so you will have to swap the integration limits to ensure that the integral comes out positive.
:::
## The Jacobian

In general, we may have transforms that act on multivariate distributions, for example something mapping $p(x_1, x_2)$ to $q(y_1, y_2)$.
In this case, the rule above has to be extended by replacing the derivative $\mathrm{d}x/\mathrm{d}y$ with the determinant of the Jacobian matrix:
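Written out explicitly for the two-dimensional case (a sketch, using the convention that $\mathbf{J}$ collects the partial derivatives of $\mathbf{x}$ with respect to $\mathbf{y}$, matching the $\mathrm{d}x/\mathrm{d}y$ factor it replaces):

$$
q(y_1, y_2) = p(x_1, x_2) \left| \det \mathbf{J} \right|,
\qquad
\mathbf{J} = \begin{pmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[6pt]
\dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2}
\end{pmatrix}.
$$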
This is the same as equation (11.9) in Bishop, except that he denotes the absolute value of the determinant with just $|\mathcal{J}|$.
::: {.callout-note}
In different contexts the Jacobian can have different 'numerators' and 'denominators' in the partial derivatives.
For example, if $\mathbf{y} = f(\mathbf{x})$, then it's common to write $\mathbf{J}$ as the matrix of partial derivatives of the elements of $\mathbf{y}$ with respect to the elements of $\mathbf{x}$.
Indeed, later in this article we will see that Bijectors.jl uses this convention.

It is always the case, though, that the elements of the 'numerator' vary with rows and the elements of the 'denominator' vary with columns.
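Under the convention just described, the two-dimensional Jacobian of $\mathbf{y} = f(\mathbf{x})$ would be written as (a sketch of the general pattern):

$$
\mathbf{J} = \begin{pmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} \\[6pt]
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2}
\end{pmatrix},
$$

with the 'numerator' elements $y_i$ indexing the rows and the 'denominator' elements $x_j$ indexing the columns.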
:::
The rest of this section will be devoted to an example to show that this works, and contains some slightly less pretty mathematics.
In the case of $y = \exp(x)$, the inverse function is $x = \log(y)$.
::: {.callout-note}
Technically, the bijections in Bijectors.jl are functions $f: X \to Y$ for which:
- $f$ is continuously differentiable, i.e. the derivative $\mathrm{d}f(x)/\mathrm{d}x$ exists and is continuous (over the domain of interest $X$);
- If $f^{-1}: Y \to X$ is the inverse of $f$, then that is also continuously differentiable (over _its_ own domain, i.e. $Y$).
For example, taking the inverse function $\log(y)$ from above, its derivative is $1/y$, which is not defined at $y = 0$.
However, we specified that the bijection $y = \exp(x)$ maps values of $x \in (-\infty, \infty)$ to $y \in (0, \infty)$, so the point $y = 0$ is not within the domain of the inverse function.
:::
It's not entirely clear to me who first coined the term biject**or** (as opposed to biject**ion**, which is the standard mathematical term).
As far as I can tell, it's only used in this specific context of transforming probability distributions: apart from Bijectors.jl itself, it also appears in [the TensorFlow deep learning framework](https://www.tensorflow.org/probability/api_docs/python/tfp/bijectors).

Specifically, one of the primary purposes of Bijectors.jl is to construct _bijections which map constrained distributions to unconstrained ones_.
For example, the log-normal distribution which we saw above is constrained: its _support_, i.e. the range over which $p(x) > 0$, is $(0, \infty)$.
However, we can transform that to an unconstrained distribution (the normal distribution) using the transformation $y = \log(x)$.
The `bijector` function, when applied to a distribution, returns a bijection $f$ that can be used to map the constrained distribution to an unconstrained one.

```{julia}
import Bijectors as B

f = B.bijector(LogNormal())
```
We can apply this transformation to samples from the original distribution, for example:
```{julia}
samples_lognormal = rand(LogNormal(), 5)

samples_normal = f(samples_lognormal)
```
We can also obtain the inverse of a bijection, $f^{-1}$:
```{julia}
f_inv = B.inverse(f)

f_inv(samples_normal) == samples_lognormal
```
We know that the transformation $y = \log(x)$ changes the log-normal distribution to the normal distribution.
Bijectors.jl also gives us a way to access that transformed distribution:
```{julia}
transformed_dist = B.transformed(LogNormal(), f)
```
This type doesn't immediately look like a `Normal()`, but it behaves in exactly the same way.
For example, we can sample from it and plot a histogram:
```{julia}
samples_plot = rand(transformed_dist, 5000)
histogram(samples_plot, bins=50)
```
We can also obtain the logpdf of the transformed distribution and check that it is the same as that of a normal distribution:
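A minimal check might look like the following (the setup lines repeat earlier cells so that the block stands alone, and the evaluation point $0.5$ is arbitrary):

```{julia}
using Distributions: LogNormal, Normal, logpdf
import Bijectors as B

f = B.bijector(LogNormal())
transformed_dist = B.transformed(LogNormal(), f)

# The transformed log-normal should have the same logpdf as a standard normal
logpdf(transformed_dist, 0.5), logpdf(Normal(), 0.5)
```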
Given the discussion in the previous sections, you might not be surprised to find that the transformed distribution is implemented using the Jacobian of the transformation.
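As a sketch of that relationship (assuming Bijectors.jl's `logabsdetjac` function, which returns $\log \lvert \det \mathbf{J}_f(x) \rvert$ for the forward map $f$), we can verify numerically that $\log q(y) = \log p(x) - \log \lvert \det \mathbf{J}_f(x) \rvert$:

```{julia}
using Distributions: LogNormal, Normal, logpdf
import Bijectors as B

f = B.bijector(LogNormal())   # maps (0, ∞) to (-∞, ∞), i.e. y = log(x)
x = rand(LogNormal())
y = f(x)

# log q(y) = log p(x) - log|det J_f(x)|
lhs = logpdf(Normal(), y)
rhs = logpdf(LogNormal(), x) - B.logabsdetjac(f, x)
lhs ≈ rhs
```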
From this, you can see that the equation above holds.
There are more functions available in the Bijectors.jl API; for full details do check out the [documentation](https://turinglang.org/Bijectors.jl/stable/).
For example, `logpdf_with_trans` can directly give us $\log(q(\mathbf{y}))$:
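A sketch of how this might look (assuming the convention that `logpdf_with_trans` takes the point `x` in the original, constrained space, plus a boolean indicating whether to apply the Jacobian correction):

```{julia}
using Distributions: LogNormal, Normal, logpdf
import Bijectors as B

x = rand(LogNormal())
y = B.bijector(LogNormal())(x)   # y = log(x)

# With the Jacobian correction applied, this equals log q(y)
B.logpdf_with_trans(LogNormal(), x, true) ≈ logpdf(Normal(), y)
```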