This is the same as equation (11.9) in Bishop, except that he denotes the absolute value of the determinant with just $|\mathbf{J}|$.

::: {.callout-note}
In different contexts the Jacobian can have different 'numerators' and 'denominators' in the partial derivatives.
For example, if $\mathbf{y} = f(\mathbf{x})$, then it's common to write $\mathbf{J}_f$ as the matrix of partial derivatives of the elements of $\mathbf{y}$ with respect to the elements of $\mathbf{x}$.
(For example, $\mathbf{J}_f$ is used in [Newton's method](https://en.wikipedia.org/wiki/Newton%27s_method) to find the zeroes of $f$, i.e. the values of $\mathbf{x}$ such that $\mathbf{y} = \mathbf{0}$.)
However, it is always the case that the elements of the 'numerator' vary with rows and the elements of the 'denominator' vary with columns.
:::
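To make this row/column convention concrete, take a hypothetical transformation (chosen purely for illustration) with $y_1 = x_1^2$ and $y_2 = x_1 x_2$. Its Jacobian is

$$\mathbf{J}_f = \begin{pmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} \end{pmatrix} = \begin{pmatrix} 2x_1 & 0 \\ x_2 & x_1 \end{pmatrix},$$

where the 'numerator' elements $y_1, y_2$ index the rows and the 'denominator' elements $x_1, x_2$ index the columns.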
The rest of this section will be devoted to an example to show that this works, and contains some slightly less pretty mathematics.
If you are already suitably convinced by this stage, then you can skip the rest of this section.
(Or if you prefer something more formal, the Wikipedia article on integration by substitution [discusses the multivariate case as well](https://en.wikipedia.org/wiki/Integration_by_substitution#Substitution_for_multiple_variables).)

### An example: the Box–Muller transform

A motivating example where one might like to use a Jacobian is the [Box–Muller transform](https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform), which is a technique for sampling from a normal distribution.

The Box–Muller transform works by first sampling two random variables from the uniform distribution between 0 and 1:

$$\begin{align}
x_1 &\sim U(0, 1) \\
x_2 &\sim U(0, 1).
\end{align}$$

Both of these have a probability density function of $p(x) = 1$ for $0 < x \leq 1$, and 0 otherwise.
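To make the setup concrete, here is a small Julia sketch (the function name is my own) that draws the two uniforms and applies the standard Box–Muller formulas; the resulting samples should behave like draws from a standard normal:

```julia
using Random, Statistics

# Draw two uniforms and apply the standard Box–Muller formulas
function box_muller(rng)
    # rand() returns values in [0, 1); shifting to (0, 1] keeps log(x1) finite
    x1, x2 = 1 - rand(rng), rand(rng)
    r = sqrt(-2 * log(x1))
    return r * cos(2π * x2), r * sin(2π * x2)
end

rng = Xoshiro(0)
ys = [y for _ in 1:100_000 for y in box_muller(rng)]
(mean(ys), std(ys))  # both should be close to 0 and 1 respectively
```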
We haven't yet explicitly accounted for the fact that $p(x_1, x_2)$ is 0 if either $x_1$ or $x_2$ is outside the range $(0, 1]$.
For example, if this constraint on $x_1$ and $x_2$ were to result in inaccessible values of $y_1$ or $y_2$, then $q(y_1, y_2)$ should be 0 for those values.
Formally, for the transformation $f: X \to Y$ where $X$ is the unit square (i.e. $0 < x_1, x_2 \leq 1$), $q(y_1, y_2)$ should only take the above value for the [image](https://en.wikipedia.org/wiki/Image_(mathematics)) of $f$, and anywhere outside of the image it should be 0.

In our case, the $\sqrt{-2\log(x_1)}$ term in the transform ranges from 0 to $\infty$, and the $\cos(2\pi x_2)$ term ranges from $-1$ to $1$.
Hence $y_1$, which is the product of these two terms, ranges from $-\infty$ to $\infty$, and likewise for $y_2$.
So the image of $f$ is the entire real plane, and we don't have to worry about this.
:::

## Bijectors.jl
All of the above has been a purely mathematical discussion of how distributions can be transformed.
Now, we turn to their implementation in Julia, specifically using the [Bijectors.jl package](https://github.com/TuringLang/Bijectors.jl).

A _bijection_ between two sets ([Wikipedia](https://en.wikipedia.org/wiki/Bijection)) is, essentially, a one-to-one mapping between the elements of these sets.
That is to say, if we have two sets $X$ and $Y$, then a bijection maps each element of $X$ to a distinct element of $Y$, and every element of $Y$ is paired with exactly one element of $X$.
To return to our univariate example, where we transformed $x$ to $y$ using $y = \exp(x)$, the exponentiation function is a bijection because every value of $x$ maps to one unique value of $y$.
The input set (the domain) is $(-\infty, \infty)$, and the output set (the codomain) is $(0, \infty)$.
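As a quick numerical sanity check of this example (a sketch; the helper names are my own), we can push the standard normal density through $y = \exp(x)$ using the univariate change-of-variables formula $q(y) = p(\log y) \, |\mathrm{d}x/\mathrm{d}y| = p(\log y)/y$, and compare it with the known log-normal density:

```julia
# Standard normal density
p(x) = exp(-x^2 / 2) / sqrt(2π)

# Density of y = exp(x), via the change-of-variables formula:
# q(y) = p(log(y)) * |dx/dy| = p(log(y)) / y, for y in (0, ∞)
q(y) = p(log(y)) / y

# The known standard log-normal density, for comparison
lognormal_pdf(y) = exp(-log(y)^2 / 2) / (y * sqrt(2π))

all(q(y) ≈ lognormal_pdf(y) for y in (0.1, 1.0, 2.5))  # true
```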
Since bijections are a one-to-one mapping between elements, we can also reverse the direction of this mapping to create an inverse function.
In the case of $y = \exp(x)$, the inverse function is $x = \log(y)$.

::: {.callout-note}
Technically, Bijectors.jl is concerned with functions $f: X \to Y$ for which:

- $f$ is continuously differentiable, i.e. the derivative $\mathrm{d}f(x)/\mathrm{d}x$ exists and is continuous (over the domain of interest $X$);
- if $f^{-1}: Y \to X$ is the inverse of $f$, then it is also continuously differentiable (over _its_ own domain, i.e. $Y$).

These are called diffeomorphisms ([Wikipedia](https://en.wikipedia.org/wiki/Diffeomorphism)).

When thinking about continuous differentiability, it's important to be conscious of the domains or codomains that we care about.
For example, taking the inverse function $\log(y)$ from above, its derivative is $1/y$, which is not continuous at $y = 0$.
However, we specified that the bijection $y = \exp(x)$ maps values of $x \in (-\infty, \infty)$ to $y \in (0, \infty)$, so the point $y = 0$ is not within the domain of the inverse function.
:::

It's still unclear to me why the term biject**or** was adopted over biject**ion**, which is the common mathematical term.
As far as I can tell, it's only used in this specific context of transforming distributions.
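As a rough preview of the API (this is my own sketch based on the Bijectors.jl README; names and behaviour may differ between package versions), the core operations look something like this:

```julia
using Bijectors, Distributions

d = LogNormal()    # a distribution constrained to (0, ∞)
b = bijector(d)    # a bijector mapping the support of d to all of ℝ

x = rand(d)
y = b(x)           # transform a sample into unconstrained space
binv = inverse(b)  # bijectors carry their inverses
binv(y) ≈ x        # true

# log |det J| of the transform at x, used for the density correction
logabsdetjac(b, x)

# A Distributions-compatible object for the transformed variable,
# with the Jacobian correction baked into its logpdf
td = transformed(d)
logpdf(td, y)
```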
TODO: describe and illustrate API of Bijectors

Maybe TODO: describe how logabsdetjac is calculated (or can be calculated) via AD

## Why is this useful for sampling anyway?
Constrained vs unconstrained variables, sampling, etc.

## How does DynamicPPL use bijectors?
link, invlink, transform, varinfo etc.

See [https://turinglang.org/DynamicPPL.jl/stable/internals/transformations/](https://turinglang.org/DynamicPPL.jl/stable/internals/transformations/)