Sparse GPs

GaussianProcesses.predict_f — Method
predict_f(gp::GPBase, X::Matrix{Float64}[; full_cov::Bool = false])

Return the posterior mean and variance of the Gaussian process gp at specific points, given as columns of the matrix X. If full_cov is true, the full covariance matrix is returned instead of only the variances.
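
A minimal usage sketch with synthetic 1-d data (assuming the package's standard GP, MeanZero, and SE constructors; inputs are stored as columns of X):

    using GaussianProcesses, Random
    Random.seed!(1)

    X = reshape(collect(range(-3.0, 3.0, length=20)), 1, :)   # 1×20 input matrix
    y = vec(sin.(X)) .+ 0.1 .* randn(20)

    gp = GP(X, y, MeanZero(), SE(0.0, 0.0), -2.0)   # log length-scale, log signal std, log noise

    Xtest = reshape(collect(range(-3.0, 3.0, length=50)), 1, :)
    μ, σ² = predict_f(gp, Xtest)                    # means and pointwise variances
    μ, Σ  = predict_f(gp, Xtest; full_cov=true)     # means and full covariance matrix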

GaussianProcesses.predictMVN — Method
Deterministic Training Conditional (DTC) multivariate normal predictions.

See Quiñonero-Candela and Rasmussen 2005, equation 20b.
    μ_DTC = μ_SoR
    Σ_DTC = Σxx - Qxx + Σ_SoR

where μ_SoR and Σ_SoR are the predictive mean and covariance
of the Subset of Regressors (SoR) approximation.
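
As a numeric illustration of equation 20b, here is a toy sketch with a hand-rolled squared-exponential helper kmat (not package API), writing Kxx for the test-point prior covariance Σxx:

    using LinearAlgebra

    k(a, b) = exp(-0.5 * sum(abs2, a .- b))
    kmat(A, B) = [k(A[:, i], B[:, j]) for i in 1:size(A, 2), j in 1:size(B, 2)]

    u = reshape(collect(-2.0:1.0:2.0), 1, :)   # 5 inducing points
    x, f = randn(1, 3), randn(1, 12)           # test and training locations
    σ² = 0.1
    Kuu = kmat(u, u) + 1e-8I                   # jitter for numerical stability
    Kuf, Kxu, Kxx = kmat(u, f), kmat(x, u), kmat(x, x)
    ΣQR = Kuu + Kuf * Kuf' / σ²

    Qxx   = Kxu * (Kuu \ Kxu')
    Σ_SoR = Kxu * (ΣQR \ Kxu')   # SoR predictive covariance, Kxu ΣQR⁻¹ Kux
    Σ_DTC = Kxx - Qxx + Σ_SoR    # equation 20b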
Base.:\ — Method
We have
    Σ ≈ Kuf' Kuu⁻¹ Kuf + Λ
where Λ is a diagonal matrix (here given by σ²I + diag(Kff - Qff))
The Woodbury matrix identity gives
    (A+UCV)⁻¹ = A⁻¹ - A⁻¹ U(C⁻¹ + V A⁻¹ U)⁻¹ V A⁻¹
which we use here with
    A ← Λ
    U = Kuf'
    V = Kuf
    C = Kuu⁻¹
which gives
    Σ⁻¹ = Λ⁻¹ - Λ⁻¹ Kuf'(Kuu + Kuf Λ⁻¹ Kuf')⁻¹ Kuf Λ⁻¹
        = Λ⁻¹ - Λ⁻¹ Kuf'(        ΣQR       )⁻¹ Kuf Λ⁻¹
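
A quick numeric check of this Woodbury form on toy matrices (plain arrays rather than the package's PDMat types):

    using LinearAlgebra

    m, n = 4, 10
    Kuf = randn(m, n)
    Kuu = let A = randn(m, m); A * A' + I end   # random SPD m×m matrix
    Λ   = Diagonal(rand(n) .+ 0.5)              # positive diagonal

    Σ   = Kuf' * (Kuu \ Kuf) + Λ
    ΣQR = Kuu + Kuf * (Λ \ Kuf')
    @assert inv(Λ) - (Λ \ Kuf') * (ΣQR \ (Kuf / Λ)) ≈ inv(Σ)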
GaussianProcesses.dmll_kern! — Method

dmll_kern!(dmll::AbstractVector, kernel::Kernel, X::AbstractMatrix, cK::AbstractPDMat, kerneldata::KernelData, alpha::AbstractVector, Kuu, Kuf, Kuu⁻¹Kuf, Kuu⁻¹KufΣ⁻¹y, Σ⁻¹Kfu, ∂Kuu, ∂Kuf, covstrat::FullScaleApproxStrat)

Derivative of the log likelihood under the Full Scale Approximation (FSA).

Helpful reference: Vanhatalo, Jarno, and Aki Vehtari. "Sparse log Gaussian processes via MCMC for spatial epidemiology." In Gaussian processes in practice, pp. 73-89. 2007.

Generally, for a multivariate normal with zero mean:

    ∂logp(Y|θ) = 1/2 y' Σ⁻¹ ∂Σ Σ⁻¹ y - 1/2 tr(Σ⁻¹ ∂Σ)
                 ╰─────────────────╯   ╰─────────────╯
                          V                   T

where Σ = Kff + σ²I.

Notation: f denotes the observations and u the inducing points. ∂X stands for ∂X/∂θ, where θ is a kernel hyperparameter.

In the case of the FSA approximation, we have Σ = Λ + Qff. The second component gives ∂(Qff) = ∂(Kfu Kuu⁻¹ Kuf), which is handled by the gradient function for the Subset of Regressors approximation, so it is not repeated here.

For ∂Λ we have (with `i` indexing each block):
Λi = Ki - Qi + σ²I
   = Ki - Kui' Kuu⁻¹ Kui + σ²I
∂Λi = ∂Ki - Kui' ∂(Kuu⁻¹) Kui - 2 ∂Kui' Kuu⁻¹ Kui
    = ∂Ki + Kui' Kuu⁻¹ ∂(Kuu) Kuu⁻¹ Kui - 2 ∂Kui' Kuu⁻¹ Kui
GaussianProcesses.dmll_noise — Method
dmll_noise(gp::GPE, precomp::SoRPrecompute, covstrat::FullScaleApproxStrat)

∂logp(Y|θ) = 1/2 y' Σ⁻¹ ∂Σ Σ⁻¹ y - 1/2 tr(Σ⁻¹ ∂Σ)

∂Σ = I for the derivative with respect to σ², so
    ∂logp(Y|θ) = 1/2 y' Σ⁻¹ Σ⁻¹ y - 1/2 tr(Σ⁻¹)
               = 1/2 [ dot(α,α) - tr(Σ⁻¹) ]

We have
    Σ⁻¹ = Λ⁻¹ - Λ⁻¹ Kuf'( ΣQR )⁻¹ Kuf Λ⁻¹
Using the identity tr(A'A) = dot(A,A) gives
    tr(Σ⁻¹) = tr(Λ⁻¹) - dot(Lk, Lk), where Lk ≡ ΣQR^(-1/2) Kuf Λ⁻¹
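
A toy verification of this trace identity, forming Lk with a dense matrix square root for simplicity:

    using LinearAlgebra

    m, n = 4, 10
    Kuf = randn(m, n)
    Kuu = let A = randn(m, m); A * A' + I end
    Λ   = Diagonal(rand(n) .+ 0.5)

    Σ   = Kuf' * (Kuu \ Kuf) + Λ
    ΣQR = Kuu + Kuf * (Λ \ Kuf')
    Lk  = sqrt(Symmetric(inv(ΣQR))) * (Kuf / Λ)   # Lk ≡ ΣQR^(-1/2) Kuf Λ⁻¹
    @assert tr(inv(Σ)) ≈ tr(inv(Λ)) - dot(Lk, Lk)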

GaussianProcesses.predictMVN — Method

predictMVN(xpred::AbstractMatrix, xtrain::AbstractMatrix, ytrain::AbstractVector, kernel::Kernel, meanf::Mean, logNoise::Real, alpha::AbstractVector, covstrat::FullScaleApproxStrat, Ktrain::FullScalePDMat)

See Quiñonero-Candela and Rasmussen 2005, equation 24b. Some derivations can be found below that are not spelled out in the paper.

Notation: Qab = Kau Kuu⁻¹ Kub
          ΣQR = Kuu + Kuf Λ⁻¹ Kuf'

          x: prediction (test) locations
          f: training (observed) locations
          u: inducing point locations

We have
    Σ ≈ Kuf' Kuu⁻¹ Kuf + Λ
By Woodbury
    Σ⁻¹ = Λ⁻¹ - Λ⁻¹ Kuf'(Kuu + Kuf Λ⁻¹ Kuf')⁻¹ Kuf Λ⁻¹
        = Λ⁻¹ - Λ⁻¹ Kuf'(        ΣQR       )⁻¹ Kuf Λ⁻¹

The predictive mean can be derived (assuming zero mean function for simplicity)
μ = (Qxf+Λxf) (Qff + Λff)⁻¹ y
  = Qxf (Qff + Λff)⁻¹ y   + Λxf (Qff + Λff)⁻¹ y
    ╰─────────────────╯
        same as FITC
  = Kxu ΣQR⁻¹ Kuf Λff⁻¹ y + Λxf (Qff + Λff)⁻¹ y
       ╰────────────────╯       ╰─────────────╯
          ≡ alpha_u                ≡ alpha

Similarly for the posterior predictive covariance:
Σ = Σxx - (Qxf+Λxf) (Qff + Λff)⁻¹ (Qxf+Λxf)'
  = Σxx - Kxu ΣQR⁻¹ Kuf Λ⁻¹ Qxf'                # substituting result from μ
  = Σxx - Kxu ΣQR⁻¹  Kuf Λ⁻¹ Kfu Kuu⁻¹ Kux      # definition of Qxf
  = Σxx - Kxu ΣQR⁻¹ (ΣQR - Kuu) Kuu⁻¹ Kux       # using definition of ΣQR
  = Σxx - Kxu Kuu⁻¹ Kux + Kxu ΣQR⁻¹ Kux         # expanding
  = Σxx - Qxx           + Kxu ΣQR⁻¹ Kux         # definition of Qxx
GaussianProcesses.trinv — Method
trinv(a::BlockDiagPDMat)

Trace of the inverse of a block diagonal positive definite matrix.

This is obtained as the sum of the traces of the inverse of each block.
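
A sketch of the same computation with plain SPD blocks (not the package's BlockDiagPDMat type):

    using LinearAlgebra

    blocks = [let A = randn(k, k); A * A' + I end for k in (3, 4, 5)]

    trinv_blockwise = sum(tr(inv(B)) for B in blocks)

    dense = cat(blocks...; dims=(1, 2))   # assemble the dense block-diagonal matrix
    @assert trinv_blockwise ≈ tr(inv(dense))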

GaussianProcesses.trinvAB — Method
trinvAB(A::FullScalePDMat, B::Diagonal)

Computes tr(A⁻¹ B) efficiently under the FSA approximation:
    Σ ≈ Kuf' Kuu⁻¹ Kuf + Λ

Derivation:
    tr(Σ⁻¹ B) = tr[ (Λ⁻¹ - Λ⁻¹ Kuf'(Kuu + Kuf Λ⁻¹ Kuf')⁻¹ Kuf Λ⁻¹) B ]
              = tr(Λ⁻¹ B) - tr( Λ⁻¹ Kuf' ΣQR⁻¹ Kuf Λ⁻¹ B )
              = tr(Λ⁻¹ B) - tr( Λ⁻¹ Kuf' ΣQR^(-1/2) ΣQR^(-1/2) Kuf Λ⁻¹ B )
                                                    ╰────────────────╯
                                                        ≡ L'
              = tr(Λ⁻¹ B) - tr(L*L'*B)
              = tr(Λ⁻¹ B) - dot(L, B*L)

LinearAlgebra.logdet — Method
The matrix determinant lemma states that
    logdet(A+UWV') = logdet(W⁻¹ + V'A⁻¹U) + logdet(W) + logdet(A)
So for
    Σ ≈ Kuf' Kuu⁻¹ Kuf + Λ
    logdet(Σ) = logdet(Kuu + Kuf Λ⁻¹ Kuf')       + logdet(Kuu⁻¹) + logdet(Λ)
              = logdet(        ΣQR             ) - logdet(Kuu)   + logdet(Λ)
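
Checked numerically on toy matrices:

    using LinearAlgebra

    m, n = 4, 10
    Kuf = randn(m, n)
    Kuu = let A = randn(m, m); A * A' + I end
    Λ   = Diagonal(rand(n) .+ 0.5)

    Σ   = Symmetric(Kuf' * (Kuu \ Kuf) + Λ)
    ΣQR = Symmetric(Kuu + Kuf * (Λ \ Kuf'))
    @assert logdet(Σ) ≈ logdet(ΣQR) - logdet(Symmetric(Kuu)) + logdet(Λ)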
LinearAlgebra.tr — Method
tr(a::FullScalePDMat)

Trace of the FSA approximation to the covariance matrix:

tr(Σ) = tr(Kuf' Kuu⁻¹ Kuf + Λ)
      = tr(Kuf' Kuu⁻¹ Kuf) + tr(Λ)
      = tr(Kuf' Kuu^{-1/2} Kuu^{-1/2} Kuf) + tr(Λ)
                          ╰──────────────╯
                             ≡  Lk
      = dot(Lk, Lk) + sum(diag(Λ))
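
The same trick on toy matrices:

    using LinearAlgebra

    m, n = 4, 10
    Kuf = randn(m, n)
    Kuu = let A = randn(m, m); A * A' + I end
    Λ   = Diagonal(rand(n) .+ 0.5)

    Lk = sqrt(Symmetric(inv(Kuu))) * Kuf   # Lk ≡ Kuu^(-1/2) Kuf
    @assert dot(Lk, Lk) + sum(diag(Λ)) ≈ tr(Kuf' * (Kuu \ Kuf) + Λ)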
Base.:\ — Method
We have
    Σ ≈ Kuf' Kuu⁻¹ Kuf + Λ
where Λ is a diagonal matrix (here given by σ²I + diag(Kff - Qff))
The Woodbury matrix identity gives
    (A+UCV)⁻¹ = A⁻¹ - A⁻¹ U(C⁻¹ + V A⁻¹ U)⁻¹ V A⁻¹
which we use here with
    A ← Λ
    U = Kuf'
    V = Kuf
    C = Kuu⁻¹
which gives
    Σ⁻¹ = Λ⁻¹ - Λ⁻¹ Kuf'(Kuu + Kuf Λ⁻¹ Kuf')⁻¹ Kuf Λ⁻¹
        = Λ⁻¹ - Λ⁻¹ Kuf'(        ΣQR       )⁻¹ Kuf Λ⁻¹
GaussianProcesses.dmll_kern! — Method

dmll_kern!(dmll::AbstractVector, kernel::Kernel, X::AbstractMatrix, cK::AbstractPDMat, kerneldata::KernelData, alpha::AbstractVector, Kuu, Kuf, Kuu⁻¹Kuf, Kuu⁻¹KufΣ⁻¹y, Σ⁻¹Kfu, ∂Kuu, ∂Kuf, covstrat::FullyIndepStrat)

Derivative of the log likelihood under the Fully Independent Training Conditional (FITC) approximation.

Helpful reference: Vanhatalo, Jarno, and Aki Vehtari. "Sparse log Gaussian processes via MCMC for spatial epidemiology." In Gaussian processes in practice, pp. 73-89. 2007.

Generally, for a multivariate normal with zero mean:

    ∂logp(Y|θ) = 1/2 y' Σ⁻¹ ∂Σ Σ⁻¹ y - 1/2 tr(Σ⁻¹ ∂Σ)
                 ╰─────────────────╯   ╰─────────────╯
                          V                   T

where Σ = Kff + σ²I.

Notation: f denotes the observations and u the inducing points. ∂X stands for ∂X/∂θ, where θ is a kernel hyperparameter.

In the case of the FITC approximation, we have Σ = Λ + Qff. The second component gives ∂(Qff) = ∂(Kfu Kuu⁻¹ Kuf), which is handled by the gradient function for the Subset of Regressors approximation, so it is not repeated here.

For ∂Λ we have (with `i` indexing each diagonal element):
Λi = Kii - Qii + σ²
   = Kii - Kui' Kuu⁻¹ Kui + σ²
∂Λi = ∂Kii - Kui' ∂(Kuu⁻¹) Kui - 2 ∂Kui' Kuu⁻¹ Kui
    = ∂Kii + Kui' Kuu⁻¹ ∂(Kuu) Kuu⁻¹ Kui - 2 ∂Kui' Kuu⁻¹ Kui
GaussianProcesses.dmll_noise — Method
dmll_noise(gp::GPE, precomp::SoRPrecompute, covstrat::FullyIndepStrat)

∂logp(Y|θ) = 1/2 y' Σ⁻¹ ∂Σ Σ⁻¹ y - 1/2 tr(Σ⁻¹ ∂Σ)

∂Σ = I for the derivative with respect to σ², so
    ∂logp(Y|θ) = 1/2 y' Σ⁻¹ Σ⁻¹ y - 1/2 tr(Σ⁻¹)
               = 1/2 [ dot(α,α) - tr(Σ⁻¹) ]

We have
    Σ⁻¹ = Λ⁻¹ - Λ⁻¹ Kuf'( ΣQR )⁻¹ Kuf Λ⁻¹
Using the identity tr(A'A) = dot(A,A) gives
    tr(Σ⁻¹) = tr(Λ⁻¹) - dot(Lk, Lk), where Lk ≡ ΣQR^(-1/2) Kuf Λ⁻¹

GaussianProcesses.predictMVN — Method

predictMVN(xpred::AbstractMatrix, xtrain::AbstractMatrix, ytrain::AbstractVector, kernel::Kernel, meanf::Mean, logNoise::Real, alpha::AbstractVector, covstrat::FullyIndepStrat, Ktrain::FullyIndepPDMat)

See Quiñonero-Candela and Rasmussen 2005, equation 24b. Some derivations can be found below that are not spelled out in the paper.

Notation: Qab = Kau Kuu⁻¹ Kub
          ΣQR = Kuu + Kuf Λ⁻¹ Kuf'

          x: prediction (test) locations
          f: training (observed) locations
          u: inducing point locations

We have
    Σ ≈ Kuf' Kuu⁻¹ Kuf + Λ
By Woodbury
    Σ⁻¹ = Λ⁻¹ - Λ⁻¹ Kuf'(Kuu + Kuf Λ⁻¹ Kuf')⁻¹ Kuf Λ⁻¹
        = Λ⁻¹ - Λ⁻¹ Kuf'(        ΣQR       )⁻¹ Kuf Λ⁻¹

The predictive mean can be derived (assuming zero mean function for simplicity)
μ = Qxf (Qff + Λ)⁻¹ y
  = Kxu Kuu⁻¹ Kuf [Λ⁻¹ - Λ⁻¹ Kuf' ΣQR⁻¹ Kuf Λ⁻¹] y # see Woodbury formula above.
  = Kxu Kuu⁻¹ [ΣQR - Kuf Λ⁻¹ Kfu] ΣQR⁻¹ Kuf Λ⁻¹ y # factoring out common terms
  = Kxu Kuu⁻¹ [Kuu] ΣQR⁻¹ Kuf Λ⁻¹ y               # using definition of ΣQR
  = Kxu ΣQR⁻¹ Kuf Λ⁻¹ y                           # matches equation 24b

Similarly for the posterior predictive covariance:
Σ = Σxx - Qxf (Qff + Λ)⁻¹ Qxf'
  = Σxx - Kxu ΣQR⁻¹ Kuf Λ⁻¹ Qxf'                # substituting result from μ
  = Σxx - Kxu ΣQR⁻¹  Kuf Λ⁻¹ Kfu Kuu⁻¹ Kux      # definition of Qxf
  = Σxx - Kxu ΣQR⁻¹ (ΣQR - Kuu) Kuu⁻¹ Kux       # using definition of ΣQR
  = Σxx - Kxu Kuu⁻¹ Kux + Kxu ΣQR⁻¹ Kux         # expanding
  = Σxx - Qxx           + Kxu ΣQR⁻¹ Kux         # definition of Qxx
GaussianProcesses.trinvAB — Method
trinvAB(A::FullyIndepPDMat, B::Diagonal)

Computes tr(A⁻¹ B) efficiently under the FITC approximation:
    Σ ≈ Kuf' Kuu⁻¹ Kuf + Λ

Derivation:
    tr(Σ⁻¹ B) = tr[ (Λ⁻¹ - Λ⁻¹ Kuf'(Kuu + Kuf Λ⁻¹ Kuf')⁻¹ Kuf Λ⁻¹) B ]
              = tr(Λ⁻¹ B) - tr( Λ⁻¹ Kuf' ΣQR⁻¹ Kuf Λ⁻¹ B )
              = tr(Λ⁻¹ B) - tr( Λ⁻¹ Kuf' ΣQR^(-1/2) ΣQR^(-1/2) Kuf Λ⁻¹ B )
                                                    ╰────────────────╯
                                                        ≡ L
              = tr(Λ⁻¹ B) - tr(L'*L*B)
              = tr(Λ⁻¹ B) - dot(L, L*B)

LinearAlgebra.logdet — Method
The matrix determinant lemma states that
    logdet(A+UWV') = logdet(W⁻¹ + V'A⁻¹U) + logdet(W) + logdet(A)
So for
    Σ ≈ Kuf' Kuu⁻¹ Kuf + Λ
    logdet(Σ) = logdet(Kuu + Kuf Λ⁻¹ Kuf')       + logdet(Kuu⁻¹) + logdet(Λ)
              = logdet(        ΣQR             ) - logdet(Kuu)   + logdet(Λ)
LinearAlgebra.tr — Method
tr(a::FullyIndepPDMat)

Trace of the FITC approximation to the covariance matrix:

tr(Σ) = tr(Kuf' Kuu⁻¹ Kuf + Λ)
      = tr(Kuf' Kuu⁻¹ Kuf) + tr(Λ)
      = tr(Kuf' Kuu^{-1/2} Kuu^{-1/2} Kuf) + tr(Λ)
                          ╰──────────────╯
                             ≡  Lk
      = dot(Lk, Lk) + sum(diag(Λ))
Base.:\ — Method
We have
    Σ ≈ Kuf' Kuu⁻¹ Kuf + σ²I
By Woodbury
    Σ⁻¹ = σ⁻²I - σ⁻⁴ Kuf'(Kuu + σ⁻² Kuf Kuf')⁻¹ Kuf
        = σ⁻²I - σ⁻⁴ Kuf'(       ΣQR        )⁻¹ Kuf
GaussianProcesses.dmll_kern! — Method
dmll_kern!(dmll::AbstractVector, kernel::Kernel, X::AbstractMatrix, cK::SubsetOfRegsPDMat, kerneldata::KernelData, ααinvcKI::AbstractMatrix, covstrat::SubsetOfRegsStrategy)

Derivative of the log likelihood under the Subset of Regressors (SoR) approximation.

Helpful reference: Vanhatalo, Jarno, and Aki Vehtari. "Sparse log Gaussian processes via MCMC for spatial epidemiology." In Gaussian processes in practice, pp. 73-89. 2007.

Generally, for a multivariate normal with zero mean:

    ∂logp(Y|θ) = 1/2 y' Σ⁻¹ ∂Σ Σ⁻¹ y - 1/2 tr(Σ⁻¹ ∂Σ)
                 ╰─────────────────╯   ╰─────────────╯
                          V                   T

where Σ = Kff + σ²I.

Notation: f denotes the observations and u the inducing points. ∂X stands for ∂X/∂θ, where θ is a kernel hyperparameter.

In the SoR approximation, we replace Kff with Qff = Kfu Kuu⁻¹ Kuf

∂Σ = ∂(Qff) = ∂(Kfu Kuu⁻¹ Kuf) = ∂(Kfu) Kuu⁻¹ Kuf + Kfu ∂(Kuu⁻¹) Kuf + Kfu Kuu⁻¹ ∂(Kuf)

∂(Kuu⁻¹) = -Kuu⁻¹ ∂(Kuu) Kuu⁻¹
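
This identity is easy to confirm by finite differences on a toy parameterized SPD matrix (the Kuu(θ) below is hypothetical, not a package kernel):

    using LinearAlgebra

    Kuu(θ)  = [2.0 θ; θ 2.0]
    ∂Kuu(θ) = [0.0 1.0; 1.0 0.0]

    θ, h = 0.3, 1e-6
    fd = (inv(Kuu(θ + h)) - inv(Kuu(θ - h))) / (2h)   # central difference
    @assert fd ≈ -inv(Kuu(θ)) * ∂Kuu(θ) * inv(Kuu(θ))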

We also have the pre-computed α = Σ⁻¹ y, so V can now be computed efficiently (O(nm²)) by careful ordering of the matrix multiplication steps.

For T, we use the identity tr(AB) = dot(A',B):
    tr(Σ⁻¹ ∂Σ) = 2 tr(Σ⁻¹ ∂(Kfu) Kuu⁻¹ Kuf) + tr(Σ⁻¹ Kfu ∂(Kuu⁻¹) Kuf)
               = 2 dot((Σ⁻¹ ∂(Kfu))', Kuu⁻¹ Kuf) - tr(Σ⁻¹ Kfu Kuu⁻¹ ∂Kuu Kuu⁻¹ Kuf)
               = 2 dot((Σ⁻¹ ∂(Kfu))', Kuu⁻¹ Kuf) - dot((Σ⁻¹ Kfu)', Kuu⁻¹ ∂Kuu Kuu⁻¹ Kuf)
which again is computed in O(nm²).
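
Both the identity and the cost-saving ordering are easy to see in a toy sketch (the stand-in matrices below are hypothetical):

    using LinearAlgebra

    A, B = randn(6, 6), randn(6, 6)
    @assert tr(A * B) ≈ dot(A', B)       # the identity: no need to form A*B

    n, m = 1000, 20
    SinvdKfu  = randn(n, m)              # stand-in for Σ⁻¹ ∂(Kfu)
    KuuinvKuf = randn(m, n)              # stand-in for Kuu⁻¹ Kuf
    term = 2 * dot(SinvdKfu', KuuinvKuf) # O(nm) contraction; no n×n product formed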

GaussianProcesses.dmll_noise — Method
dmll_noise(gp::GPE, precomp::SoRPrecompute)

∂logp(Y|θ) = 1/2 y' Σ⁻¹ ∂Σ Σ⁻¹ y - 1/2 tr(Σ⁻¹ ∂Σ)

∂Σ = I for the derivative with respect to σ², so
    ∂logp(Y|θ) = 1/2 y' Σ⁻¹ Σ⁻¹ y - 1/2 tr(Σ⁻¹)
               = 1/2 [ dot(α,α) - tr(Σ⁻¹) ]

Σ⁻¹ = σ⁻²I - σ⁻⁴ Kuf'(Kuu + σ⁻² Kuf Kuf')⁻¹ Kuf
    = σ⁻²I - σ⁻⁴ Kuf'(       ΣQR        )⁻¹ Kuf

GaussianProcesses.predictMVN — Method
See Quiñonero-Candela and Rasmussen 2005, equation 16b.
Some derivations can be found below that are not spelled out in the paper.

Notation: Qab = Kau Kuu⁻¹ Kub
          ΣQR = Kuu + σ⁻² Kuf Kuf'

          x: prediction (test) locations
          f: training (observed) locations
          u: inducing point locations

We have
    Σ ≈ Kuf' Kuu⁻¹ Kuf + σ²I
By Woodbury
    Σ⁻¹ = σ⁻²I - σ⁻⁴ Kuf'(Kuu + σ⁻² Kuf Kuf')⁻¹ Kuf
        = σ⁻²I - σ⁻⁴ Kuf'(       ΣQR        )⁻¹ Kuf

The predictive mean can be derived (assuming zero mean function for simplicity)
μ = Qxf (Qff + σ²I)⁻¹ y
  = Kxu Kuu⁻¹ Kuf [σ⁻²I - σ⁻⁴ Kuf' ΣQR⁻¹ Kuf] y   # see Woodbury formula above.
  = σ⁻² Kxu Kuu⁻¹ [ΣQR - σ⁻² Kuf Kfu] ΣQR⁻¹ Kuf y # factoring out common terms
  = σ⁻² Kxu Kuu⁻¹ [Kuu] ΣQR⁻¹ Kuf y               # using definition of ΣQR
  = σ⁻² Kxu ΣQR⁻¹ Kuf y                           # matches equation 16b

Similarly for the posterior predictive covariance:
Σ = Qxx - Qxf (Qff + σ²I)⁻¹ Qxf'
  = Qxx - σ⁻² Kxu ΣQR⁻¹ Kuf Qxf'                # substituting result from μ
  = Qxx - σ⁻² Kxu ΣQR⁻¹  Kuf Kfu    Kuu⁻¹ Kux   # definition of Qxf
  = Qxx -     Kxu ΣQR⁻¹ (ΣQR - Kuu) Kuu⁻¹ Kux   # using definition of ΣQR
  = Qxx - Kxu Kuu⁻¹ Kux + Kxu ΣQR⁻¹ Kux         # expanding
  = Qxx - Qxx           + Kxu ΣQR⁻¹ Kux         # definition of Qxx
  = Kxu ΣQR⁻¹ Kux                               # simplifying
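
Both the mean and covariance forms can be verified numerically (toy sketch with a hand-rolled SE kernel helper kmat, not package API):

    using LinearAlgebra

    k(a, b) = exp(-0.5 * sum(abs2, a .- b))
    kmat(A, B) = [k(A[:, i], B[:, j]) for i in 1:size(A, 2), j in 1:size(B, 2)]

    u = reshape(collect(-2.0:1.0:2.0), 1, :)   # 5 inducing points
    x, f = randn(1, 3), randn(1, 12)           # test and training locations
    y, σ² = randn(12), 0.1
    Kuu = kmat(u, u) + 1e-8I
    Kuf, Kxu = kmat(u, f), kmat(x, u)
    ΣQR = Kuu + Kuf * Kuf' / σ²

    Qxf = Kxu * (Kuu \ Kuf)
    Qff = Kuf' * (Kuu \ Kuf)

    @assert Qxf * ((Qff + σ² * I) \ y) ≈ Kxu * (ΣQR \ (Kuf * y)) / σ²   # 16b mean
    @assert Qxf * ((Qff + σ² * I) \ Qxf') ≈ Kxu * (Kuu \ Kxu') - Kxu * (ΣQR \ Kxu')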