Kernels¶

George comes equipped with a suite of standard covariance functions or kernels that can be combined to build more complex models. The standard kernels fall into the following categories:

Stationary kernels — functions that depend only on the radial distance between points in some user-defined metric, and
Non-stationary kernels — functions that depend on the value of the input coordinates themselves.

Combining kernels describes how to combine kernels to build more sophisticated models and Implementing new kernels explains how you would go about incorporating a custom kernel.

Common parameters¶

Every kernel accepts the two keyword arguments ndim and axes. By default, kernels are only one dimensional so you must specify the ndim argument if you want the kernel to work with higher dimensional inputs. By default, higher dimensional kernels are applied to every dimension but you can restrict the evaluation to a subspace using the axes argument. For example, if you have a 3 dimensional input space but you want one of the kernels to only act in the first dimension, you would do the following:

from george import kernels
kernel = 10.0 * kernels.Matern32Kernel(1.0, ndim=3, axes=0)

Similarly, if you wanted the kernel to act on only the second and third dimensions, you could do something like:

kernel = 10.0 * kernels.ExpSquaredKernel([1.0, 0.5], ndim=3, axes=[1, 2])

Finally, all of the stationary kernels can be “blocked”. This means that the kernel will only be applied within some parameter range. In practice, the covariance matrix will have a block diagonal structure. To use this feature, you use the block keyword argument:

kernel = 10.0 * kernels.ExpSquaredKernel(1.0, block=(-1.0, 1.0))

# or...
kernel = kernels.ExpSquaredKernel([1.0, 1.5], ndim=2,
                                  block=[(-1.0, 1.0), (0.5, 1.5)])

Implementation details & modeling interface¶

It’s worth understanding how these kernels are implemented. Most of the hard work is done at a low level (in C++) and the Python is only a thin wrapper to this functionality. This makes the code fast and consistent across interfaces but it also means that it isn’t currently possible to implement new kernel functions without recompiling the code. Almost every kernel has hyperparameters that you can set to control its behavior and these are controlled using the modeling.

k = 2.0 * kernels.Matern32Kernel(5.0)

print(k.get_parameter_names())
# ['k1:log_constant', 'k2:metric:log_M_0_0']

print(k.get_parameter_vector())
# [ 0.69314718  1.60943791]

You’ll notice that, in this case, the parameter vector is the logarithm of the parameters given when building the kernel. This will be the case for any strictly positive parameters because it is always better to fit in the logarithm of these types of parameters. You probably also noticed that the parameters have names. This opens up a few interesting features. For example, if you want to change any of the parameters, you can do it as follows:

import numpy as np

k["k1:log_constant"] = np.log(10.0)
print(k.get_parameter_vector())
# [ 2.30258509  1.60943791]

# ... or:
k[0] = np.log(2.0)
print(k.get_parameter_vector())
# [ 0.69314718  1.60943791]

Finally, if you want to update the entire vector, you can use the set_vector() method:

k.set_parameter_vector(k.get_parameter_vector() + np.random.randn(2))

Another feature common to the kernels is that you can “freeze” and “thaw” parameters by name. For example, let’s say that you want to keep the amplitude of your kernel fixed and fit for only the scale length:

k = 2.0 * kernels.Matern32Kernel(5.0)
k.freeze_parameter("k1:log_constant")

print(k.get_parameter_names())
# ['k2:metric:log_M_0_0']

print(k.get_parameter_vector())
# [ 1.60943791]

Bringing a parameter back into the fold is as easy as

k.thaw_parameter("k1:log_constant")

print(k.get_parameter_names())
# ['k1:log_constant', 'k2:log_M_0_0']

print(k.get_vector())
# [ 0.69314718  1.60943791]

Stationary kernels¶

Stationary kernels are a class of functions that depend on the input coordinates \(\mathbf{x}_i\) and \(\mathbf{x}_j\) through their squared distance under some metric \(C\):

\[r^2 = (\mathbf{x}_i - \mathbf{x}_j)^\mathrm{T}\,C^{-1}\, (\mathbf{x}_i - \mathbf{x}_j)\]

The currently supported metrics are:

“isotropic” — the scale length is equal in all dimensions,
“axis-aligned” — there is a different scale length in each dimension, and
“general” — arbitrary covariances between dimensions are allowed.

The “isotropic” and “axis-aligned” metrics are parameterized by the logarithms of their scale lengths. For example:

from george.metrics import Metric
m = Metric(2.0, ndim=2)
print(m.get_parameter_vector())
# [ 0.69314718]

gives a two-dimensional isotropic metric with

\[\begin{split}C = \left(\begin{array}{cc} 2 & 0 \\ 0 & 2 \end{array}\right)\end{split}\]

and

m = Metric([2.0, 4.0], ndim=2)
print(m.get_parameter_vector())
# [ 0.69314718  1.38629436]

specifies the following matrix

\[\begin{split}C = \left(\begin{array}{cc} 2 & 0 \\ 0 & 4 \end{array}\right) \quad.\end{split}\]

Note

Another way to define the isotropic metric is that it scales the square of the distance between points such that the following equality holds for a kernel evaluated at two points a distance \(r\): apart: \(k(r^2;\,\textrm{metric}=\lambda) = k(r^2 / \lambda;\,\mathrm{metric}=1)\).

In the “general” case, the matrix is parameterized by the elements of the Cholesky decomposition \(C = L\,L^\mathrm{T}\) with logarithms along the diagonal. For example:

m = Metric([[2.0, 0.1], [0.1, 4.0]], ndim=2)
print(m.get_parameter_vector())
# [ 0.34657359  0.07071068  0.69252179]

All the stationary kernels take the metric specification as a keyword argument:

k = kernels.ExpSquaredKernel(metric=[[5.0, 0.1], [0.1, 4.0]], ndim=2)
print(k.get_parameter_vector())
# [ 0.80471896  0.04472136  0.69289712]

The currently available stationary kernels are:

class george.kernels.RationalQuadraticKernel(log_alpha=None, metric=None, metric_bounds=None, lower=True, block=None, bounds=None, ndim=1, axes=None)¶

This is equivalent to a “scale mixture” of ExpSquaredKernel kernels with different scale lengths drawn from a gamma distribution. See R&W for more info but here’s the equation:

\[k(r^2) = \left[1 - \frac{r^2}{2\,\alpha} \right]^\alpha\]

Parameters: log_alpha – The Gamma distribution parameter.

class george.kernels.ExpSquaredKernel(metric=None, metric_bounds=None, lower=True, block=None, bounds=None, ndim=1, axes=None)¶: The exponential-squared kernel is a stationary kernel where the value at a given radius \(r^2\) is given by:

\[k(r^2) = \exp \left ( -\frac{r^2}{2} \right )\]

class george.kernels.Matern52Kernel(metric=None, metric_bounds=None, lower=True, block=None, bounds=None, ndim=1, axes=None)¶: The Matern-5/2 kernel is stationary kernel where the value at a given radius \(r^2\) is given by:

\[k(r^2) = \left( 1+\sqrt{5\,r^2}+ \frac{5\,r^2}{3} \right)\, \exp \left (-\sqrt{5\,r^2} \right )\]

class george.kernels.ExpKernel(metric=None, metric_bounds=None, lower=True, block=None, bounds=None, ndim=1, axes=None)¶: The exponential kernel is a stationary kernel where the value at a given radius \(r^2\) is given by:

\[k(r^2) = \exp \left ( -\sqrt{r^2} \right )\]

class george.kernels.Matern32Kernel(metric=None, metric_bounds=None, lower=True, block=None, bounds=None, ndim=1, axes=None)¶: The Matern-3/2 kernel is stationary kernel where the value at a given radius \(r^2\) is given by:

\[k(r^2) = \left( 1+\sqrt{3\,r^2} \right)\, \exp \left (-\sqrt{3\,r^2} \right )\]

Non-stationary kernels¶

Non-stationary kernels are specified by a (symmetric) function of the input coordinates themselves. They are applied identically to every axis so the axes keyword argument will probably come in handy.

For example, to implement a quasi-periodic kernel with a three-dimensional input space where you only want to apply the periodicity along the first (e.g. time) dimension, you would use something like:

k = kernels.ExpSine2Kernel(gamma=0.1, log_period=5.0, ndim=3, axes=0)
k *= 10.0 * kernels.ExpSquaredKernel(metric=5.0, ndim=3, axes=0)
k += 4.0 * kernels.Matern32Kernel(metric=4.0, ndim=3, axes=[1, 2])

The currently available non-stationary kernels are:

class george.kernels.EmptyKernel(bounds=None, ndim=1, axes=None)¶: This kernel is a no-op

class george.kernels.LinearKernel(log_gamma2=None, order=None, bounds=None, ndim=1, axes=None)¶

The linear regression kernel

\[k(\mathbf{x}_i,\,\mathbf{x}_j) = \frac{(\mathbf{x}_i \cdot \mathbf{x}_j)^P}{\gamma^2}\]

Parameters

order – The power \(P\). This parameter is a constant; it is not included in the parameter vector.
log_gamma2 – The scale factor \(\gamma^2\).

class george.kernels.ExpSine2Kernel(gamma=None, log_period=None, bounds=None, ndim=1, axes=None)¶

The exp-sine-squared kernel is a commonly used periodic kernel. Unlike the CosineKernel, this kernel never has negative covariance which might be useful for your problem. Here’s the equation:

\[k(\mathbf{x}_i,\,\mathbf{x}_j) = \exp \left( -\Gamma\,\sin^2\left[ \frac{\pi}{P}\,\left|x_i-x_j\right| \right] \right)\]

Parameters

gamma – The scale \(\Gamma\) of the correlations.
log_period – The log of the period \(P\) of the oscillation (in the same units as \(\mathbf{x}\)).

class george.kernels.CosineKernel(log_period=None, bounds=None, ndim=1, axes=None)¶

The simplest periodic kernel. This

\[k(\mathbf{x}_i,\,\mathbf{x}_j) = \cos\left( \frac{2\,\pi\,|x_i - x_j|}{P} \right)\]

where the parameter \(P\) is the period of the oscillation. This kernel should probably always be multiplied be a stationary kernel (e.g. ExpSquaredKernel) to allow quasi-periodic variations.

Parameters: log_period – The period of the oscillation.

class george.kernels.DotProductKernel(bounds=None, ndim=1, axes=None)¶

The dot product kernel

\[k(\mathbf{x}_i,\,\mathbf{x}_j) = \mathbf{x}_i \cdot \mathbf{x}_j\]

with no parameters.

class george.kernels.LocalGaussianKernel(location=None, log_width=None, bounds=None, ndim=1, axes=None)¶

A local Gaussian kernel.

\[k(\mathbf{x}_i,\,\mathbf{x}_j) = \exp\left( -\frac{(x_i - x_0)^2 + (x_j - x_0)^2}{2\,w} \right))\]

Parameters

location – The location \(x_0\) of the Gaussian.
log_width – The (squared) width \(w\) of the Gaussian.

class george.kernels.ConstantKernel(log_constant=None, bounds=None, ndim=1, axes=None)¶

This kernel returns the constant

\[k(\mathbf{x}_i,\,\mathbf{x}_j) = c\]

where \(c\) is a parameter.

Parameters: log_constant – The log of \(c\) in the above equation.

class george.kernels.PolynomialKernel(log_sigma2=None, order=None, bounds=None, ndim=1, axes=None)¶

A polynomial kernel

\[k(\mathbf{x}_i,\,\mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + \sigma^2)^P\]

Parameters

order – The power \(P\). This parameter is a constant; it is not included in the parameter vector.
log_sigma2 – The variance \(\sigma^2 > 0\).

Combining kernels¶

More complicated kernels can be constructed by algebraically combining the basic kernels listed in the previous sections. In particular, all the kernels support addition and multiplication. For example, an exponential-squared kernel with a non-trivial variance can be constructed as follows:

from george import kernels
kernel = 1e-3 * kernels.ExpSquaredKernel(3.4)

This is equivalent to:

from math import log
kernel = kernels.Product(kernels.ConstantKernel(log_constant=log(1e-3)),
                         kernels.ExpSquaredKernel(3.4))

As demonstrated in Hyperparameter optimization, a mixture of kernels can be implemented with addition:

k1 = 1e-3 * kernels.ExpSquaredKernel(3.4)
k2 = 1e-4 * kernels.Matern32Kernel(14.53)
kernel = k1 + k2

Implementing new kernels¶

As mentioned previously, because of technical limitations, new kernels can only be implemented by re-compiling george. See Implementing new kernels for a detailed example of implementing a new kernel.