In this example we use the package to infer the mean of a 2d Gaussian using stochastic gradient Langevin dynamics. So we assume we have independent and identically distributed data \(x_1, \dots, x_N\) with \(X_i | \theta \sim N( \theta, I_2 )\), and we want to infer \(\theta\).
First, let’s simulate the data with the following code, we set \(N\) to be \(10^4\)
library(sgmcmc)
library(MASS)
# Declare number of observations
N = 10^4
# Set theta to be 0 and simulate the data
theta = c( 0, 0 )
Sigma = diag(2)
set.seed(13)
X = mvrnorm( N, theta, Sigma )
dataset = list("X" = X)
In the last line we defined the dataset as it will be input to the relevant sgmcmc
function. A lot of the inputs to functions in sgmcmc
are defined as lists. This improves flexibility by enabling models to be specified with multiple parameters, datasets and allows separate tuning constants to be set for each parameter. We assume that observations are always accessed on the first dimension of each object, i.e. the point \(x_i\) is located at X[i,]
rather than X[,i]
. Similarly the observation \(i\) from a 3d object Y
would be located at Y[i,,]
.
The parameters are declared very similarly, but this time the value associated with each entry is its starting point. We have one parameter theta
, which we’ll just start at 0.
params = list( "theta" = c( 0, 0 ) )
Now we’ll define the functions logLik
and logPrior
. It should now become clear why the list names come in handy. The function logLik
should take two parameters as input: params
and dataset
. These parameters will be lists with the same names as those you defined for params
and dataset
earlier. There is one difference though, the objects in the lists will have automatically been converted to TensorFlow
objects for you. The params
list will contain TensorFlow
tensor variables; the dataset
list will contain TensorFlow
placeholders. The logLik
function should take these lists as input and return the value of the log-likelihood function as a tensor at point params
given data dataset
. The function should do this using TensorFlow
operations, as this allows the gradient to be automatically calculated; it also allows the wide range of distribution objects as well as matrix operations that TensorFlow
provides to be taken advantage of. A tutorial of TensorFlow
for R
is beyond the scope of this article, for more details we refer the reader to the website of TensorFlow for R.
Specifying the logLik
and logPrior
functions regularly requires specifying specific distributions. TensorFlow
already has a number of distributions implemented in the TensorFlow Probability
package. All of the distributions implemented in TensorFlow Probability are located in tf$distributions
, a list is given on the TensorFlow Probability website. More complex distributions can be specified by coding up the logLik
and logPrior
functions by hand, examples of this, as well as using various distribution functions, are given in the other tutorials. With this in place we can define the logLik
function as follows
logLik = function( params, dataset ) {
# Declare distribution of each observation
SigmaDiag = c( 1, 1 )
baseDist = tf$distributions$MultivariateNormalDiag( params$theta, SigmaDiag )
# Declare log-likelihood function and return
logLik = tf$reduce_sum( baseDist$log_prob( dataset$X ) )
return( logLik )
}
So this function basically states that our likelihood is \(\sum_{i=1}^N \log \mathcal N( x_i | \theta, I_2 )\), where \(\mathcal N( x | \mu, \Sigma )\) is a Gaussian density at \(x\) with mean \(\mu\) and variance \(\Sigma\). Most of the time just specifying the constants in these functions, such as SigmaDiag
as R
objects will be fine. But there are sometimes issues when these constants get automatically converted to tf$float64
objects by TensorFlow
rather than tf$float32
. If you run into errors involving tf$float64
then force the constants to be input as tf$float32
by using SigmaDiag = tf$constant( c( 1, 1 ), dtype = tf$float32 )
.
Next we want to define our log-prior density, which we assume is \(\log p( \theta_j ) = \log \mathcal N(\theta_j | 0,10)\), for each dimension \(j\) of \(\theta\). Similar to logLik
, logPrior
is defined as a function with input params
. In our case the definition is
logPrior = function( params ) {
baseDist = tf$distributions$Normal( 0, 10 )
logPrior = tf$reduce_sum( baseDist$log_prob( params$theta ) )
return( logPrior )
}
Before we begin running our SGLD algorithm, we need to specify the stepsize and minibatch size. A stepsize is required for each parameter, so this must be a list of numbers with names that are exactly the same as each of the parameters. The minibatch size is simply a number that is less than \(N\), or a number between 0 and 1 which will be taken to be the proportion of \(N\). It specifies how many observations are used in each iteration of SGMCMC, it is a trade off between accuracy and speed. The default is minibatchSize = 0.01
, we’ll set it to be 100.
stepsize = list( "theta" = 1e-5 )
n = 100
The stepsize parameters may require a bit of tuning before you get good results. The shorthand stepsize = 1e-5
can be used, which would set the stepsize of all parameters to be 1e-5
.
Now we can run our SGLD algorithm using the sgmcmc
function sgld
, which returns a list of Markov chains for each parameter as output. To make the results reproducible we’ll set the seed to 13. Use the argument verbose = FALSE
to hide the output of the function
chains = sgld( logLik, dataset, params, stepsize, logPrior = logPrior, minibatchSize = n,
verbose = FALSE, seed = 13 )
Finally we’ll plot the results after removing burn-in
library(ggplot2)
burnIn = 10^3
thetaOut = as.data.frame( chains$theta[-c(1:burnIn),] )
ggplot( thetaOut, aes( x = V1, y = V2 ) ) +
stat_density2d( size = 1.5 )
There are lots of other sgmcmc algorithms implemented in exactly the same way, such as sghmc
and sgnht
; as well as their control variate counterparts (sgldcv
, sghmccv
and sgnhtcv
) for improved efficiency, which take the additional small numeric input optStepsize
, the stepsize of the initial optimization step to find the MAP parameters.