MonteCarloMarkovChain

This is the final project of the Advanced Statistics for Physics Analysis course. It is an analysis and implementation of some adaptive Markov Chain Monte Carlo (MCMC) algorithms. The main code containing all the useful functions is the MCMC.R file: these functions are used to generate the results in the various notebooks.

Normal multi-dimensional MCMC

I implemented the Metropolis-Hastings algorithm, as done in class with the coda/rjags packages, but using plain code, which allowed me to:

  • enlarge the dimensionality of the generated data ('Normal Metropolis' in the file)
  • verify whether Gibbs sampling improves the performance or not ('Gibbs Sampling' in the file)

The code used from now on is a modification of a simple implementation of the one-dimensional Metropolis-Hastings algorithm, which is made up of the following functions:

  1. A function to generate the new guessed point (extended to many dimensions in all the following programs):
generation_s = function (x0, std) {

    # Normal CDF centred on the current point and its numerical inverse;
    # `inverse` is assumed to be provided by a helper in MCMC.R
    # (or, e.g., by the GoFKernel package)
    CDF = function (x) {pnorm(x, mean = x0, sd = std)}
    inv_CDF = inverse(CDF, lower = -Inf, upper = Inf)

    # Inverse-transform sampling: one uniform draw per dimension of x0
    random = runif(length(x0))
    new_value = inv_CDF(random)

    return(new_value)
}
  2. A function to evaluate the ratio between the two proposal densities in the case of an asymmetric proposal distribution:
evaluate_Q = function(x0, std, point) {

    # Proposal density of the new point given the current one, and vice versa
    dx = dnorm(point, mean = x0, sd = std)
    sx = dnorm(x0, mean = point, sd = std)

    return(sx/dx)
}
  3. A function to return the sequence of samples for a given target distribution, starting from an initial parameter. It returns the chain values:
random_steps = function (func_wanted, theta_init, n_samples, sigma, print_accept=FALSE) {

    # Initializing the parameters
    current_theta = theta_init
    current_function = func_wanted(theta_init)
    samples = matrix(data = NA, nrow = n_samples, ncol = length(theta_init) + 1)

    # For statistical purposes
    accepted = 0

    # Evolution loop
    for (n in 1:n_samples) {

        # Take a guessed new theta (s in the slides) and evaluate its probability
        guessed_theta = generation_s(current_theta, sigma)
        guessed_function = func_wanted(guessed_theta)

        # Acceptance conditions
        Q_ratio = evaluate_Q(current_theta, sigma, guessed_theta)
        rho = guessed_function/current_function * Q_ratio
        # cat(guessed_theta, guessed_function, Q_ratio, rho)

        # And then update the general conditions
        if (rho > runif(1)) {
            current_theta = guessed_theta
            current_function = guessed_function
            accepted = accepted + 1
        } # else they remain unchanged and are stored directly in the samples matrix

        # Save the target value and the current point in one row (an R function can only return a single object)
        samples[n,] = unlist(c(current_function, current_theta))

    }

    if(print_accept) {
        cat("Acceptance rate = ", round(accepted/n_samples*100, 5), '%\n')
    }

    return(samples)
}
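
As a quick usage sketch (the target, starting point and proposal width below are illustrative choices, not the ones used in the notebooks), the chain can be generated and inspected as follows:

# Hypothetical example: sample a 1-D standard normal target
target = function(theta) { dnorm(theta, mean = 0, sd = 1) }

chain = random_steps(target, theta_init = 0, n_samples = 10000,
                     sigma = 1, print_accept = TRUE)

# First column: target value; remaining columns: the sampled parameters
theta_chain = chain[, -1]
hist(theta_chain, breaks = 50)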

MCMC using "mvtnorm" package

After writing these functions, I found several R packages that can build and sample from a multivariate Gaussian given its mean and covariance. I chose the mvtnorm package because its interface mirrors the base R normal-distribution functions. For example, the density value used to evaluate the Q-ratio is retrieved with:

dmvnorm(point, mean = x0, sigma = cov, log = FALSE)

As before, I also added the option of Gibbs sampling in the multi-dimensional case.
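
A minimal sketch of how the multivariate proposal step and the Q-ratio can be written with mvtnorm is shown below (the function names here are illustrative; the actual implementations are in MCMC.R):

library(mvtnorm)

# Illustrative multivariate counterparts of generation_s and evaluate_Q
generation_s_mv = function(x0, cov) {
    # One proposal drawn from a multivariate normal centred on the current point
    as.vector(rmvnorm(1, mean = x0, sigma = cov))
}

evaluate_Q_mv = function(x0, cov, point) {
    # Ratio of the two proposal densities (equal to 1 for a symmetric proposal)
    dx = dmvnorm(point, mean = x0, sigma = cov, log = FALSE)
    sx = dmvnorm(x0, mean = point, sigma = cov, log = FALSE)
    return(sx/dx)
}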

Are these improvements worth it?

To see whether the effort of improving the function pays off, I ran it on a 3-D Gaussian target with fixed, different variances, timed the program and recorded the acceptance rate. The results are summarised in the table below:

| Function | Acceptance Rate (%) | Time needed (s) | Coda Autocorrelation |
|---|---|---|---|
| 1D MCMC | 60 | 30 | low |
| 3D MCMC Metropolis | 48 | 130 | low |
| 3D MCMC Gibbs | 59 | 135 | low |
| 3D MCMC mvtnorm | 50 | 40 | low |
| 3D MCMC mvtnorm Gibbs | 57 | 140 | low |

These results were obtained by running the program three times from different initial points, generating a total of 100,000 points each time, with a very wide (and unchanged) Gaussian proposal, so as to expose the strengths and weaknesses of the different methods. The table shows that:

  • 1D is very fast, but is limited to a single dimension
  • 3D Metropolis has a low acceptance rate: the Gaussian used to propose the new point is very wide, so many proposals are rejected
  • 3D Gibbs recovers the higher acceptance rate of the 1D case, since each update moves along one dimension at a time
  • 3D mvtnorm makes it possible to include covariance, but at the cost of speed and acceptance rate (it is run without Gibbs sampling to save time and computational resources, which would otherwise scale linearly with the number of dimensions)
  • 3D mvtnorm Gibbs, as in the previous case, increases the acceptance rate at the cost of time

All of them have a low autocorrelation thanks to the very wide proposal distribution, but unfortunately all of them suffer from the same problem: no covariance structure in the proposal distribution. We want an algorithm that keeps the autocorrelation low but increases the acceptance rate, shaping the proposal distribution at will in order to achieve these goals.

Attention: these algorithms cannot be used for adaptive MCMC, because they do not allow the multi-dimensional Gaussian to be shaped properly; I therefore need an algorithm that works directly with the covariance matrix.

The code generating this analysis and the plots is available in MCMC/MCMC_Gibbs.ipynb for the plain implementation and in MCMC/MCMC_mnorm.ipynb for the mvtnorm package implementation.

Adaptive MCMC

There are many different implementations of adaptive MCMC, each with strengths and weaknesses that I want to analyse: all of them change the parameters of the proposal distribution along the chain.

These algorithms are not hard to implement, as presented by Haario et al. $[1]$ and studied in many later works. The idea behind them is to sample from a very wide multi-dimensional distribution at the beginning and then to update its covariance matrix with increasingly small corrections (plus regularisation factors), so that the matrix does not blow up over time.

Adaptive Metropolis algorithms (AM, in file MCMC/MCMC_AM_simple.ipynb)

The logical scheme of the simplest adaptive MCMC algorithm, as presented by Andrieu et al. $[2]$, is:

  1. Initialize $X_0, \mu_0, \Sigma_0$ as the initial point, mean and covariance matrix of the proposal distribution
  2. Then draw a sample $X_t$ from the proposal distribution $q(X_0, \dots, X_{t-1})$ (usually a simple symmetric $N(0, \Sigma)$ is taken), accepted with probability $\alpha(X_{t-1}, Y)$ given by the ratio of the target values times the Q-ratio, so that:
$$X_t \sim P^{SRWM}_q = \int_{A-X_{t-1}} \alpha(X_{t-1}, X_{t-1} + z) q(z \ | \ \mu, \Sigma) dz + \mathbb{I}(X_{t-1} \in A)\int_{X-X_{t-1}} (1 - \alpha(X_{t-1}, X_{t-1} + z)) q(z \ | \ \mu, \Sigma) dz$$

where I report a simplification of the full formula, which is sufficient for most physical distributions.

  3. Update the parameters according to the new value drawn from the distribution, using a non-increasing sequence $\left[ \gamma_i \right]$ whose sum does not converge (a minimal sketch of this update is given after the list), as follows:

$$ \begin{cases} \mu_{i+1} = \mu_i + \gamma_{i+1} (X_{i+1} - \mu_i) \\ \Sigma_{i+1} = \Sigma_i + \gamma_{i+1} ((X_{i+1} - \mu_i) \cdot (X_{i+1} - \mu_i)^T - \Sigma_i) \end{cases} $$

  4. Repeat steps 2 and 3 until the maximum number of samples is reached
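
A minimal sketch of the recursive update of step 3 is given below, assuming `gamma` is the current element of the step-size sequence and `X_new` the latest chain point (the names are illustrative, not those used in MCMC.R):

# Recursive update of the proposal mean and covariance (step 3)
update_moments = function(mu, Sigma, X_new, gamma) {
    delta = X_new - mu
    mu    = mu + gamma * delta
    Sigma = Sigma + gamma * (delta %*% t(delta) - Sigma)
    return(list(mu = mu, Sigma = Sigma))
}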

NB: This implementation is of little practical use, because it is hard to carry out the computations and obtain an explicit form for $P^{SRWM}_q$, so one has to introduce approximations to retrieve a consistent chain. In my code I therefore show that this implementation (with a simplification) is not useful in practice.

Rao-Blackwellised algorithm (Rao, in file MCMC/MCMC_AM_rao.ipynb)

This algorithm is a slight modification of the previous one: to solve the problem of choosing the new value, it uses a normal proposal distribution $Y \sim P^{Rao} = N(X_i, \Sigma_i)$; the value is accepted with probability $\alpha(X_{t-1}, Y)$, otherwise $X_t = X_{t-1}$, as in the simple MCMC.

In this algorithm one is also free to choose the sequence $\gamma_{i+1}$, so I tried a few non-increasing functions, such as an exponential decay or a rescaled reciprocal (sketched below): one can pick a sequence at will and optimise the evolution of the chain.
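
Two illustrative choices of such a sequence (the constants are arbitrary examples, not the values used in the notebooks):

gamma_reciprocal  = function(i) { 1 / i }                # rescaled reciprocal
gamma_exponential = function(i) { 0.5 * exp(-i / 500) }  # exponential decay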

Useful adaptation (Haario): in many applications (and I tried it as well), the mean is updated recursively and, similarly, the covariance matrix used in the proposal, $C_i = s_d \cdot \Sigma_i$, follows:

$$ C_t = \begin{cases} C_0 & \text{for} \ t \leq t_0 \\ s_d \left[\text{cov}(x_0, \dots, x_{t-1}) + \epsilon I_d\right] & \text{for} \ t > t_0 \end{cases} $$

The scaling is taken as $s_d = \frac{2.38^2}{d}$, with $d$ the dimension of the sampled distribution $[1]$, and $\epsilon$ is a small parameter, much smaller than the typical scale of the distribution.
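
A minimal sketch of this update, assuming `history` is the matrix of chain points generated so far (one row per sample) and the function name is illustrative:

haario_covariance = function(history, C0, t0, eps = 1e-6) {
    t = nrow(history)
    d = ncol(history)
    if (t <= t0) { return(C0) }    # keep the initial covariance during the early phase
    s_d = 2.38^2 / d               # scaling factor as in [1]
    return(s_d * (cov(history) + eps * diag(d)))
}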

Global adaptive scaling algorithm (GAM, in file MCMC/MCMC_AM_global.ipynb)

In the previous implementations the covariance matrix could reshape itself, but it could not easily rescale the overall variance across the dimensions, which limited the increase of the acceptance rate. The global adaptive scaling algorithm introduces a global parameter $\lambda_i$ that scales the covariance matrix and enters the proposal as $Y \sim P^{GAM} = N(X_i, \lambda_i \cdot \Sigma_i)$. The updates become:

$$ \begin{cases} \mu_{i+1} = \mu_i + \gamma_{i+1} (X_{i+1} - \mu_i) \\ \Sigma_{i+1} = \Sigma_i + \gamma_{i+1} ((X_{i+1} - \mu_i) \cdot (X_{i+1} - \mu_i)^T - \Sigma_i) \\ \text{log}\left(\lambda_{i+1}\right) = \text{log}\left(\lambda_i\right) + \gamma_{i+1} \left(\alpha(X_i, X_{i+1}) - \bar{\alpha}\right) \end{cases} $$

where $\bar{\alpha}$ is the mean of the $\alpha$'s evaluated at the previous points (a minimal sketch of this extra update follows the list below). This allows the algorithm to:

  • enlarge the variances if $\alpha(X_{t-1}, Y) > \bar{\alpha}$
  • decrease them otherwise
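
A minimal sketch of the extra log-$\lambda$ update (the mean and covariance updates are as in the AM case; `alpha` and `alpha_bar` denote the last acceptance probability and its running mean, and the function name is illustrative):

update_log_lambda = function(log_lambda, alpha, alpha_bar, gamma) {
    return(log_lambda + gamma * (alpha - alpha_bar))
}

# The proposal covariance used at the next step is then exp(log_lambda) * Sigma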

Results

To check the improvements brought by these implementations, I ran all the algorithms (available in "class.hmm") for a number of iterations. I only used the algorithms from the introduction of the mvtnorm package onwards, with the same number of "burn_in" and "total" steps. I then compare their performance based on the target distribution and on the hyperparameters used, such as the number of "burn_in" and "total" steps.

I underline that, with a Gaussian proposal, the desired acceptance rate is around 68.3%, since one expects to move to the new point within $1\sigma$ of the Gaussian distribution.

1. 3D Gaussian with a well-proportioned proposal distribution (cases/case_proportionate.ipynb)

For a Gaussian in three dimensions, the fastest and most efficient way to get a good result is the "mvtnorm" version, when it is initialised with a proper correlation matrix. This is because the algorithms do not need to improve the proposal distribution, so the well-initialised one is rewarded: this always applies when initialising the proposal with a well-defined central shape and the right scale of the covariance matrix.

| Function | Acceptance Rate (%) | Time needed (s) | Coda Autocorrelation | Distribution |
|---|---|---|---|---|
| mvtnorm | 49 | 60 | zero | good |
| mvtnorm Gibbs | 60 | 190 | zero | good |
| Haario | 13 | 40 | zero | cresty |
| Haario Gibbs | 16 | 160 | zero | cresty |
| Rao | 50 | 35 | first 10%/zero * | peaked/good |
| Rao Gibbs | 60 | 160 | zero | good |
| Global | 40 | 65 | first 20%/zero * | peaked/good |
| Global Gibbs | 55 | 70 | zero | good |

$*$ = the results depend strongly on the run (they can show a small amount of correlation, which vanishes when using )

2. 3D Gaussian with a small proposal distribution (cases/case_small.ipynb)

To expose the difficulties of these algorithms, I take a smaller covariance matrix: this is a problem for all the algorithms that do not allow the matrix to reshape, because they will all end up with a high autocorrelation.

| Function | Acceptance Rate (%) | Time needed (s) | Coda Autocorrelation | Distribution |
|---|---|---|---|---|
| mvtnorm | 95 | 50 | circa 10% constant * | peaked |
| mvtnorm Gibbs | <95 | 160 | circa 10% constant * | peaked |
| Haario | 17 | 40 | zero | cresty |
| Haario Gibbs | 20 | 65 | zero | cresty |
| Rao | 55 | 35 | circa 10% at beginning * | good |
| Rao Gibbs | 65 | 60 | zero | perfect |
| Global | 50 | 60 | first 20%/zero * | cresty |
| Global Gibbs | 65 | 100 | zero | good |

3. 2D Gaussian times 1D Cauchy (cases/case_2g_1c.ipynb)

To stress the algorithms with harder distributions, I took a Cauchy component with two maxima, to see how the covariance matrix would reshape in order to find these two different maxima in the 3D space.

| Function | Acceptance Rate (%) | Time needed (s) | Coda Autocorrelation | Distribution |
|---|---|---|---|---|
| mvtnorm | 75 | 65 | high at beginning in Cauchy * | "too big" Cauchy tails |
| mvtnorm Gibbs | 85 | 160 | high at very beginning in Cauchy * | "too big" Cauchy tails |
| Haario | 17 | 35 | zero | cresty |
| Haario Gibbs | 35 | 65 | zero | good |
| Rao | 45 | 30 | first 10%/good * | peaked |
| Rao Gibbs | 60 | 60 | zero | good |
| Global | 55 | 70 | first 20%/good * | peaked |
| Global Gibbs | 70 | 100 | zero | good |

4. 1D Gaussian times 2D Cauchy (cases/case_1g_2c.ipynb)

To check whether the algorithm was clever enough, I extended the Cauchy distribution to another dimension, obtaining similar performance (even though I expected it to be worse).

| Function | Acceptance Rate (%) | Time needed (s) | Coda Autocorrelation | Distribution |
|---|---|---|---|---|
| mvtnorm | 80 | 65 | circa 10% constant * | "too big" Cauchy tails |
| mvtnorm Gibbs | 85 | 160 | circa 10% constant * | "too big" Cauchy tails |
| Haario | 17 | 35 | zero | cresty |
| Haario Gibbs | 40 | 70 | zero | good |
| Rao | 40 | 40 | zero | cresty |
| Rao Gibbs | 50 | 60 | zero | second peak not resolved |
| Global | 45 | 70 | zero | cresty |
| Global Gibbs | 75 | 90 | zero | good |

5. 5D Gaussian times 1D Cauchy (cases/case_5g.ipynb)

I also tried to increase the number of dimensions; the algorithms seemed to be fairly robust for centred distributions (and a little less so for those with multiple peaks).

| Function | Acceptance Rate (%) | Time needed (s) | Coda Autocorrelation | Distribution |
|---|---|---|---|---|
| mvtnorm | 70 | 80 | circa 10% constant * | "too big" Cauchy tails |
| mvtnorm Gibbs | 85 | 290 | high at very beginning in Cauchy * | "too big" Cauchy tails |
| Haario | 15 | 50 | zero | cresty |
| Haario Gibbs | 30 | 120 | zero | cresty in some dimensions |
| Rao | 35 | 40 | zero | peaked |
| Rao Gibbs | 50 | 70 | zero | good |
| Global | 50 | 50 | circa 10% at beginning | cresty |
| Global Gibbs | 70 | 90 | zero | good |

NB: I also tried other distributions, but the generated chains depend strongly on the choice of the initial point and covariance matrix, resulting in rather poor sampled distributions.

Further improvements

There are other algorithms I could experiment with in the future, with small adaptations of the code, which should (at least in theory) increase the performance, such as:

  • Promote the global parameter $\lambda$ to a vector, so as to scale each dimension of the Gaussian separately, on top of all the previous algorithms
  • Change only a few selected dimensions per iteration, which should give more robust results (as done in some ML algorithms)
  • Use principal components to obtain the best parameters for the covariance of the proposal distribution

Furthermore, my code could be improved and rewritten more efficiently, as well as collected into a single function where one only chooses the desired algorithm.

Bibliography

  1. Haario, H., Saksman, E., Tamminen, J.: "An adaptive Metropolis algorithm"
  2. Andrieu, C., Thoms, J.: "A tutorial on adaptive MCMC"
  3. Laine, M.: "Adaptive MCMC Methods with applications in environmental and geophysical models"
