Bayesian Modeling 201: Graduating to Gaussian Processes and Mastering Log Marginal Likelihood for Financial Risk Management

Abhijit Gupta
6 min read · Jun 9, 2024

Most machine learning practitioners have, at some point, maximized a log-likelihood function to obtain Maximum Likelihood Estimates (MLE). But have you heard of its close relative, the log marginal likelihood? If you want to deepen your knowledge of Bayesian models, Gaussian Processes (GPs) are a natural step up from simpler models. In this post, we’ll look at the log marginal likelihood, Gaussian Processes, and how they can be used in multi-asset portfolios, such as futures and stocks. Rather than concentrating on a single set of optimal parameter values, as MLE does, the log marginal likelihood integrates over all possible parameter values, weighted by their prior probability. Because it takes into account both prior assumptions about the parameters and the likelihood of the data, this integration offers a more comprehensive evaluation of model fit. Understanding this idea is essential to realizing the full potential of Bayesian models, of which GPs are a particularly elegant example.

Maximizing the log-likelihood function is the traditional method for identifying the parameter values that make the observed data most likely. This process, known as Maximum Likelihood Estimation (MLE), is widely used because of its simplicity and computational efficiency. However, MLE has drawbacks: it returns a single point estimate with no measure of parameter uncertainty, and it is prone to overfitting, particularly when data are scarce or the model is very flexible.

The log marginal likelihood, by contrast, offers a different perspective. It gives a more comprehensive evaluation of model fit by integrating over all possible parameter values and weighting them by their prior probabilities. This formulation considers not only how well the model fits the data, but also prior assumptions about the parameters. By balancing model fit against complexity, the log marginal likelihood helps reduce the danger of overfitting. It also plays an important role in hyperparameter tuning for Gaussian Processes, guiding the selection of hyperparameters that enhance the model’s explanatory power while remaining parsimonious.

The log marginal likelihood can be expressed as:

log p(y | X) = -1/2 * y^T * (K(X,X) + σ_n^2 * I)^-1 * y - 1/2 * log det(K(X,X) + σ_n^2 * I) - n/2 * log(2π)

Here, y is the vector of observed values, X is the set of training inputs, K(X,X) is the covariance matrix computed from the kernel function, σ_n² is the variance of the Gaussian noise, I is the n × n identity matrix, and n is the number of observations.

Components of the Log Marginal Likelihood

The log marginal likelihood is composed of three key terms:

1. Data Fit Term:

-1/2 * y^T * (K(X,X) + σ_n^2 * I)^-1 * y

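This term assesses how well the model fits the observed data. Because the quadratic form is non-negative, the term is never positive; a higher value (closer to zero) indicates a better fit, meaning the observations are well explained by the model’s covariance structure.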

2. Complexity Penalty Term:

-1/2 * log det(K(X,X) + σ_n^2 * I)

This term penalizes model complexity. More flexible models spread their probability mass over a wider range of possible datasets, which tends to produce a larger determinant and therefore a larger penalty. This discourages overly complex models that might overfit the data.

3. Normalization Constant:

-n/2 * log(2π)

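This is the normalization constant of the multivariate Gaussian density, which ensures that the marginal likelihood is a valid probability density over y. It depends only on the number of observations, not on the kernel hyperparameters, so it does not affect which hyperparameters are selected.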
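To make the three terms concrete, here is a minimal NumPy sketch that computes each of them separately. The squared-exponential (RBF) kernel, the synthetic sine data, and the hyperparameter values are all assumptions chosen for illustration; the formula itself works with any valid kernel. The Cholesky factorization is a common, numerically stable way to obtain both the linear solve and the log determinant.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel: s^2 * exp(-|x - x'|^2 / (2 * l^2))."""
    sq_dist = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def log_marginal_likelihood_terms(X, y, length_scale, signal_var, noise_var):
    """Return (data fit, complexity penalty, normalization constant)."""
    n = y.shape[0]
    # K(X,X) + sigma_n^2 * I
    Ky = rbf_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(Ky)                      # Ky = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # Ky^-1 y
    data_fit = -0.5 * y @ alpha
    # log det(Ky) = 2 * sum(log(diag(L))), so -1/2 * log det = -sum(log(diag(L)))
    complexity = -np.sum(np.log(np.diag(L)))
    constant = -0.5 * n * np.log(2.0 * np.pi)
    return data_fit, complexity, constant

# Synthetic data (illustrative only): a noisy sine wave.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 30)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)

data_fit, complexity, constant = log_marginal_likelihood_terms(
    X, y, length_scale=1.0, signal_var=1.0, noise_var=0.01
)
lml = data_fit + complexity + constant
print(f"data fit: {data_fit:.3f}, complexity: {complexity:.3f}, "
      f"constant: {constant:.3f}, total: {lml:.3f}")
```

Printing the terms separately for a few different length scales is a quick way to see the trade-off: very short length scales tend to improve the data fit term while worsening the complexity penalty.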

Gaussian Processes: A Step Up from Simple Bayesian Models

Gaussian Processes (GPs) are a powerful and flexible tool for regression and classification. Rather than placing a distribution over a finite set of parameters, as traditional models do, a GP places a distribution directly over functions. GPs extend the Gaussian distribution from finite-dimensional vectors to infinite-dimensional functions, defining random functions rather than just random variables. A GP is characterized by its mean function, which represents the expected value of the process, and its covariance function (kernel), which governs the similarity of distinct points in the input space. The choice of kernel is crucial because it encodes assumptions, such as smoothness or periodicity, about the function being modeled.

In mathematical terms, a GP can be written as:

f(x) ~ GP(m(x), k(x,x′))

where m(x) is the mean function and k(x,x′) is the covariance function.

One of GPs’ key advantages is their ability to produce not only point estimates but also full predictive distributions with uncertainty measures. This is especially useful in cases where understanding prediction uncertainty is just as important as making predictions. In the context of GPs, the log marginal likelihood is used as the objective function for optimizing the kernel hyperparameters. These hyperparameters shape the GP’s behavior, including both its forecasts and its uncertainty estimates, so optimizing them is critical for accurate and dependable model performance.
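The sketch below illustrates both ideas on synthetic data: it picks a kernel length scale by maximizing the log marginal likelihood, then returns a predictive mean and variance at new inputs. Several things here are assumptions made for brevity: the RBF kernel, a zero mean function, a fixed signal and noise variance, and a coarse grid search in place of the gradient-based optimizers that GP libraries typically use.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale, signal_var):
    """Squared-exponential kernel between two sets of inputs."""
    sq_dist = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def fit_and_predict(X, y, X_new, length_scale, signal_var, noise_var):
    """Return (log marginal likelihood, predictive mean, predictive variance)."""
    n = y.shape[0]
    Ky = rbf_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    lml = (-0.5 * y @ alpha
           - np.sum(np.log(np.diag(L)))
           - 0.5 * n * np.log(2.0 * np.pi))
    K_star = rbf_kernel(X, X_new, length_scale, signal_var)  # shape (n, m)
    mean = K_star.T @ alpha
    v = np.linalg.solve(L, K_star)
    # Variance of the latent function f at X_new (observation noise not added).
    var = signal_var - np.sum(v**2, axis=0)
    return lml, mean, var

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 40)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)
X_new = np.array([[2.5], [8.0]])  # one point inside the data range, one far outside

# Coarse grid search over the length scale, scored by log marginal likelihood.
candidates = [0.1, 0.3, 1.0, 3.0, 10.0]
scores = [fit_and_predict(X, y, X_new, ls, 1.0, 0.01)[0] for ls in candidates]
best_length_scale = candidates[int(np.argmax(scores))]

_, mean, var = fit_and_predict(X, y, X_new, best_length_scale, 1.0, 0.01)
print("selected length scale:", best_length_scale)
print("predictive mean:", mean)
print("predictive variance:", var)
```

The predictive variance at the point far from the training data should come out larger than at the point inside the data range, which is exactly the kind of uncertainty signal a point estimate cannot provide.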

In scenarios involving large datasets, inverting the n × n covariance matrix costs O(n³) time, which quickly becomes computationally demanding. The diagonal plus low-rank approximation offers an efficient alternative: it approximates the covariance matrix as the sum of a diagonal matrix and a low-rank matrix, which makes the inverse far cheaper to compute.
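One standard way to exploit this structure is the Woodbury matrix identity, which replaces the n × n solve with an r × r solve, where r is the rank of the low-rank part. The sketch below assumes the low-rank factor U is already available (the random U here is purely a stand-in; how it is built in practice depends on the approximation method) and checks the result against a direct solve.

```python
import numpy as np

def solve_diag_plus_low_rank(d, U, y):
    """Solve (diag(d) + U @ U.T) x = y via the Woodbury identity.

    d : (n,) positive diagonal entries
    U : (n, r) low-rank factor with r much smaller than n
    Cost is O(n * r^2) instead of O(n^3).
    """
    r = U.shape[1]
    Dinv_y = y / d                      # D^-1 y
    Dinv_U = U / d[:, None]             # D^-1 U
    small = np.eye(r) + U.T @ Dinv_U    # r x r system: I + U^T D^-1 U
    correction = Dinv_U @ np.linalg.solve(small, U.T @ Dinv_y)
    return Dinv_y - correction

# Random stand-in matrices (illustrative only).
rng = np.random.default_rng(1)
n, r = 500, 5
d = 0.1 + rng.random(n)
U = rng.standard_normal((n, r))
y = rng.standard_normal(n)

x_fast = solve_diag_plus_low_rank(d, U, y)
x_direct = np.linalg.solve(np.diag(d) + U @ U.T, y)
max_abs_diff = np.max(np.abs(x_fast - x_direct))
print("max abs difference vs direct solve:", max_abs_diff)
```

The same structure also gives a cheap log determinant through the matrix determinant lemma, so both expensive pieces of the log marginal likelihood benefit.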

Application to Multi-Asset Portfolios

Gaussian Processes find valuable applications in multi-asset portfolios, encompassing both futures and stocks. These portfolios often exhibit complex dependencies that traditional techniques struggle to capture. GPs, with their ability to model intricate relationships and quantify uncertainty, are well suited to this task. One application highlighted in recent research is enhancing the Black-Litterman (BL) portfolio optimization model. The BL model is a popular framework that combines investor views with market equilibrium assumptions to construct optimal portfolios. However, it relies on accurate estimates of expected returns, which can be challenging to obtain.

A study published in the IAENG International Journal of Applied Mathematics (“Black-Litterman Portfolio Optimization Using Gaussian Process Regression”) demonstrates how GPs can be effectively employed within the BL framework to improve return predictions and portfolio performance. By leveraging the flexibility of GPs, the study shows that investor views can be generated and integrated into the BL model more effectively, leading to more informed portfolio decisions.

The research found that portfolios constructed using GP-based predictions outperformed benchmark portfolios in terms of cumulative excess return and Sharpe ratio. This highlights the potential of GPs to enhance the BL model and improve investment outcomes in multi-asset portfolios. The study also explored the impact of confidence levels in investor views, finding that higher confidence in GP-derived views led to further improvements in portfolio performance.
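To show where GP output could plug into the BL framework, here is a minimal sketch of the standard Black-Litterman posterior mean. Everything specific in it is an assumption for illustration and is not taken from the study: the three-asset numbers are invented, each asset gets one absolute view, the hypothetical GP predictive means serve as the view vector q, and the hypothetical GP predictive variances form the diagonal of the view-uncertainty matrix Ω. The study may construct its views differently.

```python
import numpy as np

def black_litterman_mean(pi, Sigma, P, q, Omega, tau=0.05):
    """Standard Black-Litterman posterior mean of expected returns.

    pi    : (n,) equilibrium (prior) expected returns
    Sigma : (n, n) covariance of asset returns
    P     : (k, n) matrix mapping assets to views
    q     : (k,) view returns
    Omega : (k, k) covariance expressing uncertainty in the views
    tau   : scalar scaling the uncertainty of the prior
    """
    tau_Sigma_inv = np.linalg.inv(tau * Sigma)
    Omega_inv = np.linalg.inv(Omega)
    A = tau_Sigma_inv + P.T @ Omega_inv @ P
    b = tau_Sigma_inv @ pi + P.T @ Omega_inv @ q
    return np.linalg.solve(A, b)

# Invented three-asset example (not data from the study).
pi = np.array([0.04, 0.06, 0.05])
Sigma = np.array([
    [0.0400, 0.0100, 0.0000],
    [0.0100, 0.0900, 0.0200],
    [0.0000, 0.0200, 0.0625],
])
P = np.eye(3)  # one absolute view per asset

# Hypothetical GP outputs: predictive means become the views,
# predictive variances become the view uncertainty.
gp_mean = np.array([0.06, 0.03, 0.05])
gp_var = np.array([0.0004, 0.0009, 0.0025])
Omega = np.diag(gp_var)

mu_bl = black_litterman_mean(pi, Sigma, P, gp_mean, Omega)
print("equilibrium returns:", pi)
print("GP views:           ", gp_mean)
print("BL posterior mean:  ", mu_bl)
```

Under this setup, a smaller GP predictive variance means a more confident view, which pulls the posterior mean closer to the GP forecast and away from the equilibrium returns.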

By employing GPs, financial analysts can model the joint distribution of asset returns, quantify prediction uncertainty, and optimize portfolio allocations for enhanced risk management. This can lead to more robust investment strategies and improved risk-adjusted returns.

A second paper, “An Overview of Gaussian Process Regression for Volatility Forecasting,” explores the application of Gaussian Process Regression (GPR) to predicting volatility in foreign exchange (FX) markets. Its authors highlight GPR’s ability to capture complex non-linear relationships and provide probabilistic forecasts, which is particularly valuable for volatility modeling.

The paper concludes that GPR, with appropriate kernel selection and hyperparameter tuning, shows promising potential in forecasting FX volatility. According to the paper, it outperforms traditional models like GARCH in certain scenarios, particularly when capturing long-term dependencies and adapting to changing market conditions. The authors also acknowledge the computational challenges associated with GPR for large datasets and emphasize the need for further research to optimize its implementation in real-world financial applications.

Overall, the paper contributes to the growing body of research on applying machine learning techniques like GPR to financial forecasting, showcasing its potential for improving volatility predictions and risk management in the FX market.


In conclusion, the log marginal likelihood is a potent tool for model selection and hyperparameter optimization within Gaussian Processes, striking a balance between model fit and complexity. Gaussian Processes themselves represent a significant leap from simpler Bayesian models, offering full predictive distributions and incorporating uncertainty. In the realm of financial modeling, particularly for multi-asset portfolios, GPs, coupled with the diagonal plus low-rank approximation, provide a powerful framework for capturing complex dependencies and making informed investment decisions.