Unlocking the Power of Bijectors in TensorFlow Probability: Transforming Distributions and Beyond!

Abhijit Gupta
12 min read · Apr 27, 2023

Dive into the world of bijectors with TensorFlow Probability and unleash their potential to create complex and expressive probability distributions! In this comprehensive guide, we explore the foundations of bijectors, their wide-ranging applications, and how to create custom bijectors for your specific needs. From transforming base distributions to enabling advanced techniques like normalizing flows and variational inference, bijectors are powerful tools that can revolutionize your probabilistic modeling journey. Join us as we unravel the mysteries of bijectors and open up new possibilities in machine learning and data science!

I. Introduction

A. Overview of TensorFlow Probability

TensorFlow Probability (TFP) is a powerful library that extends TensorFlow’s capabilities by adding a suite of tools for probabilistic modelling, inference, and computation. It is designed for both researchers and practitioners, offering a robust platform for developing and deploying probabilistic models in various fields, including finance, healthcare, and natural sciences. TFP integrates seamlessly with TensorFlow, allowing you to utilize TensorFlow’s rich ecosystem and hardware acceleration capabilities. Key features of TFP include a comprehensive collection of probability distributions, bijectors, and advanced algorithms for inference, optimization, and sampling.

B. Importance of Probabilistic Modeling in Machine Learning and Data Science

Probabilistic modelling is a cornerstone of machine learning and data science, as it allows you to represent uncertainty, learn from incomplete or noisy data, and make predictions with quantified confidence. By incorporating uncertainty into the modelling process, probabilistic models offer several advantages over deterministic models, including:

  1. Robustness: Probabilistic models can handle noisy or incomplete data more effectively, leading to more reliable predictions and better generalization to new, unseen data.
  2. Interpretability: By providing a principled framework for quantifying uncertainty, probabilistic models enable more transparent decision-making, which is crucial in domains where understanding the reasoning behind predictions is as important as the predictions themselves.
  3. Flexibility: Probabilistic modelling provides a versatile framework for representing complex relationships between variables and for incorporating prior knowledge or domain expertise into the model.
  4. Decision-making: Probabilistic models naturally support decision-making under uncertainty by providing probability distributions over possible outcomes, allowing you to optimize decisions based on risk tolerance and other criteria.

C. Introduction to Bijectors

Bijectors are invertible, smooth functions that transform one probability distribution into another. They play a central role in probabilistic modelling, as they allow you to express complex distributions in terms of simpler, more familiar ones. This is particularly useful when you need to work with distributions that are difficult to sample from or compute probabilities for directly. Bijectors can be used to:

  1. Transform base distributions: By applying bijectors, you can create more complex distributions from simpler ones, such as transforming a standard Gaussian distribution into a more complicated, multi-modal distribution.
  2. Compose and chain transformations: Bijectors can be combined in various ways to create more complex transformations, allowing you to build sophisticated models and perform advanced operations like normalizing flows.
  3. Improve computational efficiency: Bijectors can help reduce the computational complexity of certain operations, such as computing gradients in optimization problems, by transforming the problem into a more convenient space.
  4. Simplify modelling: By transforming complex distributions into more interpretable, easy-to-understand base distributions, bijectors can make developing and analysing probabilistic models easier.

In the following sections, we will dive deeper into the concept of bijectors, explore built-in and custom bijectors in TensorFlow Probability, and discuss practical examples and use cases to illustrate their importance and versatility.

II. Understanding Bijectors

A. Definition and properties of bijectors

A bijector is a smooth, invertible function that maps one probability distribution onto another. It has several important properties that make it suitable for transforming distributions:

  1. Invertibility: A bijector must be invertible, meaning that it has a well-defined inverse function. This property ensures that the transformation can be reversed, allowing you to go back and forth between the original and transformed distributions.
  2. Smoothness: Bijectors should be smooth (i.e., differentiable) functions, which is essential for efficient computation of gradients during optimization and sampling.
  3. Preserving probability: A key property of bijectors is that they preserve total probability when transforming a distribution. In other words, the transformed distribution’s probability density function (PDF) must integrate to one, just like the original distribution (see the change-of-variables formula below).
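
Concretely, preservation of probability is expressed by the change-of-variables formula: if Y = g(X) for a bijector g, then the density of the transformed variable is

p_Y(y) = p_X(g^{-1}(y)) · |det J_{g^{-1}}(y)|,

where J_{g^{-1}} is the Jacobian of the inverse transformation. The Jacobian factor rescales the density so that it still integrates to one after the transformation.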

B. Types of bijectors

Bijectors can be broadly categorized into several types based on their functional form and properties:

  1. Affine bijectors: These bijectors perform linear transformations, such as scaling, shifting, or rotating the input space. Examples include the Scale bijector, which multiplies the input by a constant, and the Shift bijector, which adds a constant to the input. Affine bijectors are particularly useful for transforming distributions with known location and scale parameters, such as Gaussian or Laplace distributions.
  2. Nonlinear bijectors: These bijectors apply more complex, nonlinear transformations to the input space. Examples include the Sigmoid bijector, which maps the input to the interval (0, 1), and the Exp bijector, which exponentiates the input. Nonlinear bijectors are often used to transform distributions with specific constraints or to model complex relationships between variables.
  3. Structural bijectors: Some bijectors rearrange or reshape the input rather than changing its values, so their inverses are trivial to compute. Examples include the Reshape bijector, which changes the shape of the input without altering its content, and the Permute bijector, which rearranges the input elements according to a specified permutation. Such bijectors are particularly useful for constructing normalizing flows and other advanced probabilistic models (a short sketch of all three categories follows this list).
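
As a quick illustration of these categories, here is a minimal sketch (the input values are arbitrary) that applies one bijector of each kind to a small tensor:

import tensorflow as tf
import tensorflow_probability as tfp

tfb = tfp.bijectors

x = tf.constant([0.5, -1.0, 2.0])

# Affine bijectors: scale then shift, i.e. y = 2 * x + 1
print(tfb.Shift(1.)(tfb.Scale(2.)(x)))

# Nonlinear bijectors: squash onto (0, 1) or map onto (0, inf)
print(tfb.Sigmoid()(x))
print(tfb.Exp()(x))

# Structural bijectors: rearrange elements without changing their values
print(tfb.Permute(permutation=[2, 0, 1])(x))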

C. Importance of bijectors in probabilistic modeling

Bijectors play a crucial role in probabilistic modeling for several reasons:

  1. Expressing complex distributions: By transforming simpler base distributions, bijectors enable you to create and work with more complex distributions that may be difficult to represent or sample from directly.
  2. Simplifying models: Bijectors can help simplify modeling by allowing you to work with more interpretable, easy-to-understand base distributions, which can make the modeling process more transparent and easier to debug.
  3. Enhancing computational efficiency: By transforming the problem space, bijectors can sometimes reduce the computational complexity of certain operations, such as gradient computation in optimization problems, leading to faster convergence and more efficient training.
  4. Facilitating advanced techniques: Bijectors are essential building blocks for advanced probabilistic modeling techniques like variational inference, normalizing flows, and Bayesian neural networks.

D. Bijector operations

Bijectors perform several key operations that are essential for transforming distributions:

  1. Forward transformation: The forward transformation maps the input from the base distribution’s space to the transformed distribution’s space using the bijector’s forward function.
  2. Inverse transformation: The inverse transformation maps the input from the transformed distribution’s space back to the base distribution’s space using the bijector’s inverse function. This operation is crucial for computing probabilities and sampling from the transformed distribution.
  3. Forward and inverse log determinant Jacobians: When transforming a distribution using a bijector, it’s essential to account for the change in the probability density induced by the transformation. This change is captured by the forward and inverse log determinant of the Jacobian matrix of the bijector.

The Jacobian matrix is the matrix of all first-order partial derivatives of a vector-valued function, and its determinant measures the local “stretching” or “shrinking” factor when applying the bijector. The log determinant of the Jacobian is used to adjust the probability density of the transformed distribution.
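
To make these operations concrete, here is a minimal sketch using the Exp bijector; it also verifies the transformed density against the change-of-variables formula:

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

exp = tfb.Exp()
x = tf.constant(0.5)

# Forward and inverse transformations
y = exp.forward(x)        # e^0.5
x_back = exp.inverse(y)   # recovers 0.5

# Log determinant Jacobians (scalar event, so event_ndims=0)
fldj = exp.forward_log_det_jacobian(x, event_ndims=0)   # log|d(e^x)/dx| = x
ildj = exp.inverse_log_det_jacobian(y, event_ndims=0)   # equals -x

# Change of variables: log p_Y(y) = log p_X(x) + inverse log det Jacobian
base = tfd.Normal(loc=0., scale=1.)
log_normal = tfd.TransformedDistribution(base, bijector=exp)
print(log_normal.log_prob(y))
print(base.log_prob(x) + ildj)   # matches the line above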

In summary, bijectors play a crucial role in probabilistic modeling by allowing you to express complex distributions in terms of simpler ones, improving computational efficiency, and facilitating advanced techniques. They perform several key operations, including forward and inverse transformations, as well as computing the forward and inverse log determinant Jacobians, to ensure the transformed distributions preserve the total probability.

III. TensorFlow Probability and Bijectors

A. Overview of TensorFlow Probability bijectors

TensorFlow Probability (TFP) provides a rich set of built-in bijectors that cover a wide range of transformations, making it easy to work with complex distributions and advanced probabilistic models. These bijectors are designed to be modular and composable, allowing you to chain and combine them in various ways to create custom transformations. TFP bijectors also integrate seamlessly with TensorFlow’s computation graph, enabling efficient gradient-based optimization and hardware acceleration.

B. Built-in bijectors in TensorFlow Probability

  1. Shift and Scale: These bijectors implement affine transformations: Scale multiplies the input by a constant and Shift adds a constant to it. Together they can transform distributions with known location and scale parameters; for example, chaining them turns a standard Gaussian distribution into a Gaussian with a specified mean and standard deviation.
  2. Exp: The Exp bijector applies the exponential function to the input, transforming it into a positive-valued output. This bijector is particularly useful for working with log-normal distributions or transforming a Gaussian distribution to model positive-valued data.
  3. Sigmoid: The Sigmoid bijector maps the input to the interval (0, 1) using the sigmoid function. It is commonly used to transform distributions with support on the entire real line, such as the Gaussian distribution, to model probabilities or proportions.
  4. Softplus: The Softplus bijector applies the softplus function to the input, a smooth approximation of the rectifier (ReLU) function. It is useful for transforming distributions to have positive support, similar to the Exp bijector but with a different functional form.
  5. BatchNormalization: The BatchNormalization bijector applies batch normalization to the input, a technique originally developed to improve the training of deep neural networks. As a bijector, it can be used to transform the input distribution in normalizing flows or other probabilistic models.
  6. Chain: The Chain bijector composes multiple bijectors in a sequence, applying the transformations one after the other. It is particularly useful for constructing complex transformations by combining simpler bijectors, as shown in the sketch after this list.
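
As a quick sketch of composition (the numbers here are arbitrary), chaining Shift and Scale turns a standard Gaussian into a Gaussian with mean 2 and standard deviation 3:

import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

# Chain applies its bijectors right-to-left: first Scale, then Shift
affine = tfb.Chain([tfb.Shift(2.), tfb.Scale(3.)])

base = tfd.Normal(loc=0., scale=1.)
shifted_scaled = tfd.TransformedDistribution(base, bijector=affine)

# The transformed distribution matches Normal(loc=2, scale=3)
print(shifted_scaled.log_prob(1.5))
print(tfd.Normal(loc=2., scale=3.).log_prob(1.5))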

C. Creating custom bijectors in TensorFlow Probability

In addition to the built-in bijectors, TFP allows you to create custom bijectors by subclassing the tfp.bijectors.Bijector class and implementing the required methods, such as _forward, _inverse, _forward_log_det_jacobian, and _inverse_log_det_jacobian. This flexibility enables you to create specialized bijectors for specific use cases or implement novel transformations not available in the built-in set.

D. Composing and chaining bijectors

TFP bijectors are designed to be composable, allowing you to create complex transformations by combining simpler ones. To compose bijectors, you can use the Chain bijector, which applies a sequence of bijectors in the specified order. This enables you to build sophisticated models and transformations by chaining multiple bijectors together.

For example, you can chain Scale and Shift bijectors to turn a standard Gaussian into a Gaussian with arbitrary mean and standard deviation, or append a Sigmoid bijector to obtain a logit-normal distribution, which has support on the open interval (0, 1). This can be useful for modeling probabilities or proportions with a more flexible shape than the Beta distribution; a minimal sketch follows.
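
As a minimal sketch (the shift and scale values here are arbitrary), the chain below scales and shifts a standard Gaussian and then squashes it through a Sigmoid, producing a logit-normal distribution on (0, 1):

import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

# Chain applies right-to-left: scale, then shift, then squash through the sigmoid
logit_normal_bijector = tfb.Chain([tfb.Sigmoid(), tfb.Shift(0.5), tfb.Scale(1.5)])

logit_normal = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=logit_normal_bijector
)

print(logit_normal.sample(5))    # samples lie in the open interval (0, 1)
print(logit_normal.prob(0.25))   # densities are available in closed form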

In summary, TensorFlow Probability provides a comprehensive set of built-in bijectors and supports the creation of custom bijectors, allowing you to build complex transformations and probabilistic models with ease. By composing and chaining bijectors, you can create sophisticated and versatile probabilistic models tailored to your specific needs.

IV. Practical Examples of Bijectors in TensorFlow Probability

A. Transforming a base distribution

1. Example: Transforming a Gaussian distribution using the Exp bijector

Suppose you want to model positive-valued data while using a Gaussian distribution as the base distribution. You can transform the Gaussian distribution with the Exp bijector, resulting in a log-normal distribution. Here's an example:


import tensorflow_probability as tfp
import numpy as np
import matplotlib.pyplot as plt

tfd = tfp.distributions
tfb = tfp.bijectors

# Base distribution: Gaussian
base_dist = tfd.Normal(loc=0., scale=1.)

# Bijector: Exp
exp_bijector = tfb.Exp()

# Transformed distribution: log-normal
log_normal = tfd.TransformedDistribution(
    distribution=base_dist,
    bijector=exp_bijector
)

# Plot the PDFs (cast to float32 to match the distributions' dtype)
x = np.linspace(-3, 3, 100).astype(np.float32)
y = np.linspace(0.001, 10, 100).astype(np.float32)
plt.plot(x, base_dist.prob(x), label='Gaussian PDF')
plt.plot(y, log_normal.prob(y), label='Log-normal PDF')
plt.legend()
plt.show()

Another example: we can obtain an inverse-gamma distribution by transforming a Gamma distribution with the Reciprocal bijector, which implements g(X) = 1 / X:

import seaborn as sns

dist = tfd.TransformedDistribution(
    # Gamma base distribution
    tfd.Gamma(concentration=3., rate=.5),
    # bijector for the transformation g(X) = 1 / X
    tfb.Reciprocal()
)

sns.histplot(dist.sample(1000, seed=42).numpy(), kde=True, stat='density')
# label the axes and the plot
plt.xlabel("X")
plt.ylabel("density")
plt.title(rf"density plot of $g(X) \sim$ {dist.name} (alpha=3, beta=0.5)")
plt.show()

2. Example: Transforming a Beta distribution using an inverted Sigmoid bijector

Suppose you want to model data on the entire real line while starting from a Beta distribution, which has support on the interval (0, 1), as the base distribution. You can push the Beta distribution through the inverse of the Sigmoid bijector (the logit function) by wrapping it in tfb.Invert, which maps (0, 1) onto the real line and yields a logit-Beta distribution:


# Base distribution: Beta
base_dist = tfd.Beta(concentration1=2., concentration0=2.)

# Bijector: inverse of the Sigmoid (i.e. the logit function)
logit_bijector = tfb.Invert(tfb.Sigmoid())

# Transformed distribution: logit-Beta, supported on the whole real line
logit_beta = tfd.TransformedDistribution(
    distribution=base_dist,
    bijector=logit_bijector
)

# Plot the PDFs (cast to float32 to match the distributions' dtype)
x = np.linspace(0.001, 0.999, 100).astype(np.float32)
y = np.linspace(-6, 6, 100).astype(np.float32)
plt.plot(x, base_dist.prob(x), label='Beta PDF')
plt.plot(y, logit_beta.prob(y), label='Logit-Beta PDF')
plt.legend()
plt.show()

B. Variational inference with bijectors

1. Background on variational inference

Variational inference is a powerful technique for approximating intractable posterior distributions in Bayesian modeling. It involves finding a tractable family of distributions (known as the variational family) and optimizing its parameters to minimize the divergence between the true posterior and the variational approximation. Bijectors can be used to transform the variational family, enabling more expressive and flexible approximations.
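
Concretely, variational inference maximizes the evidence lower bound (ELBO),

ELBO(q) = E_{z ~ q}[log p(x, z)] - E_{z ~ q}[log q(z)],

which is equivalent to minimizing the KL divergence between the approximation q(z) and the true posterior p(z | x). When the variational family is built by pushing a simple base distribution through a bijector, the bijector's log determinant Jacobian enters the log q(z) term, so the bound remains tractable.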

2. Example: Bayesian linear regression with variational inference

In this example, we’ll perform variational inference on a Bayesian linear regression model with a single weight and a bias. We’ll use a mean-field Gaussian variational family with trainable means and standard deviations, and we’ll employ the Softplus bijector to keep the standard deviations positive.


import tensorflow as tf

# Generate synthetic data
np.random.seed(42)
X = np.random.randn(100, 1).astype(np.float32)
y = (2 * X[:, 0] + 1 + 0.1 * np.random.randn(100)).astype(np.float32)

# Joint model: standard-normal priors on the weight and bias, Gaussian likelihood
joint = tfd.JointDistributionSequential([
    tfd.Normal(loc=0., scale=1.),   # prior on the weight
    tfd.Normal(loc=0., scale=1.),   # prior on the bias
    lambda bias, weight: tfd.Independent(          # likelihood
        tfd.Normal(loc=weight[..., tf.newaxis] * X[:, 0] + bias[..., tf.newaxis],
                   scale=0.1),
        reinterpreted_batch_ndims=1)
])

# Variational family: independent Gaussians with trainable location and scale.
# The Softplus bijector keeps the scale parameters positive; DeferredTensor applies
# it on every read so gradients still flow to the underlying raw variable.
q_loc = tf.Variable(tf.zeros(2))
q_scale_raw = tf.Variable(tf.zeros(2))
softplus_bijector = tfb.Softplus()
q_scale = tfp.util.DeferredTensor(q_scale_raw, softplus_bijector)
q = tfd.Independent(tfd.Normal(loc=q_loc, scale=q_scale),
                    reinterpreted_batch_ndims=1)

# The surrogate posterior emits a 2-vector; split it into (weight, bias) and
# evaluate the joint log density at the observed data.
def target_log_prob_fn(params):
    weight, bias = params[..., 0], params[..., 1]
    return joint.log_prob([weight, bias, y])

# Perform variational inference
losses = tfp.vi.fit_surrogate_posterior(
    target_log_prob_fn=target_log_prob_fn,
    surrogate_posterior=q,
    optimizer=tf.optimizers.Adam(learning_rate=0.01),
    num_steps=1000
)

# Plot the loss
plt.plot(losses)
plt.xlabel('Step')
plt.ylabel('Negative ELBO')
plt.show()

C. Normalizing flows with bijectors

1. Background on normalizing flows

Normalizing flows are a class of generative models that learn complex, high-dimensional probability distributions by transforming a simple base distribution through a sequence of invertible mappings. Bijectors play a crucial role in constructing normalizing flows, as they define the invertible transformations.
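
In equations, if z is drawn from a simple base distribution p_Z and x = f(z) for an invertible mapping f (in practice a chain of bijectors), the density of x follows from the change-of-variables formula:

log p_X(x) = log p_Z(f^{-1}(x)) + log |det J_{f^{-1}}(x)|,

where the log determinant contributions of the individual bijectors in the chain simply add up. Because this log-likelihood can be evaluated exactly, the flow can be trained by straightforward gradient-based maximum likelihood, which is what the training loop below does.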

2. Example: Training a normalizing flow with real-world data

In this example, we’ll train a normalizing flow to model the Old Faithful geyser dataset, which consists of eruption durations and waiting times between eruptions. We’ll use the RealNVP bijector, which is a popular choice for normalizing flows.


import seaborn as sns

# Load the Old Faithful dataset (keep only the numeric columns) and standardize it
data = sns.load_dataset('geyser')[['duration', 'waiting']].values.astype(np.float32)
data = (data - data.mean(axis=0)) / data.std(axis=0)

# Base distribution
base_dist = tfd.MultivariateNormalDiag(loc=tf.zeros(2), scale_diag=tf.ones(2))

# Bijector: RealNVP
num_hidden_units = 32
num_masked = 1
bijector = tfb.RealNVP(
    num_masked=num_masked,
    shift_and_log_scale_fn=tfb.real_nvp_default_template(
        hidden_layers=[num_hidden_units])
)

# Normalizing flow
flow = tfd.TransformedDistribution(
    distribution=base_dist,
    bijector=bijector
)

# Train the flow by maximizing the log-likelihood of the data
optimizer = tf.optimizers.Adam(learning_rate=0.001)
for step in range(1000):
    with tf.GradientTape() as tape:
        loss = -tf.reduce_mean(flow.log_prob(data))
    gradients = tape.gradient(loss, flow.trainable_variables)
    optimizer.apply_gradients(zip(gradients, flow.trainable_variables))
    if step % 100 == 0:
        print(f"Step {step}, Loss: {loss.numpy()}")

# Plot the results
samples = flow.sample(1000).numpy()
plt.scatter(data[:, 0], data[:, 1], label='Data', alpha=0.5)
plt.scatter(samples[:, 0], samples[:, 1], label='Samples', alpha=0.5)
plt.legend()
plt.show()

D. Extending TensorFlow Probability with custom bijector implementations

Creating a custom bijector in TensorFlow Probability involves several key steps, such as inheriting from the Bijector class, implementing forward and inverse transformations, and ensuring chainability with other bijectors. Below, we provide an outline of these steps, along with an example of a custom bijector:

  1. Inheriting from Bijector: When creating a custom bijector, inherit from the tfp.bijectors.Bijector class and implement the required methods, such as _forward, _inverse, and _forward_log_det_jacobian.
  2. Invertible bijectors: If your custom bijector is invertible, ensure that you implement both forward and inverse transformations. This will allow your bijector to be used in a broader range of applications, such as normalizing flows and variational inference.
  3. Chainable bijectors: Consider making your custom bijector chainable with other bijectors by implementing the _forward_event_shape_tensor and _inverse_event_shape_tensor methods. This will enable more complex transformations and compositions of bijectors.
import tensorflow_probability as tfp

tfb = tfp.bijectors


class CustomBijector(tfb.Bijector):
    def __init__(self, validate_args=False, name='custom_bijector'):
        # forward_min_event_ndims=0 declares an elementwise (scalar) bijector
        super(CustomBijector, self).__init__(
            forward_min_event_ndims=0,
            validate_args=validate_args,
            name=name)

    def _forward(self, x):
        # Implement the forward transformation
        return ...

    def _inverse(self, y):
        # Implement the inverse transformation
        return ...

    def _forward_log_det_jacobian(self, x):
        # Implement the log determinant of the Jacobian for the forward transformation
        return ...

If your custom bijector changes the shape of its input, also implement the _forward_event_shape_tensor and _inverse_event_shape_tensor methods so that it chains cleanly with other bijectors. The extended skeleton below adds these two methods:

class CustomBijector(tfb.Bijector):
    def __init__(self, validate_args=False, name='custom_bijector'):
        # forward_min_event_ndims=0 declares an elementwise (scalar) bijector
        super(CustomBijector, self).__init__(
            forward_min_event_ndims=0,
            validate_args=validate_args,
            name=name)

    def _forward(self, x):
        # Implement the forward transformation
        return ...

    def _inverse(self, y):
        # Implement the inverse transformation
        return ...

    def _forward_log_det_jacobian(self, x):
        # Implement the log determinant of the Jacobian for the forward transformation
        return ...

    def _forward_event_shape_tensor(self, input_shape):
        # Implement the event shape transformation for the forward transformation
        return ...

    def _inverse_event_shape_tensor(self, output_shape):
        # Implement the event shape transformation for the inverse transformation
        return ...

By following these steps, you can create custom bijectors tailored to your specific needs, enabling more complex transformations and compositions of bijectors within TensorFlow Probability.

V. Conclusion

A. Recap of the importance and applications of bijectors

Throughout this blog post, we have explored the importance and applications of bijectors in TensorFlow Probability. Bijectors are powerful tools that enable the creation of complex and expressive probability distributions by transforming base distributions. They play a critical role in advanced techniques such as normalizing flows and variational inference, providing essential functionality to a wide range of applications in machine learning and data science.

B. Future directions and research in bijectors and probabilistic modeling

As probabilistic modeling continues to gain traction and influence within the machine learning community, we can expect bijectors and related techniques to become even more important and widespread. Future directions for bijectors in TensorFlow Probability may include the development of new types of bijectors, more efficient implementations, and seamless integration with emerging machine learning libraries and frameworks. Additionally, research in probabilistic modeling may lead to novel applications for bijectors, further expanding their impact and utility within the machine learning and data science domains.

C. Encouragement to explore and experiment with bijectors in TensorFlow Probability

We encourage you to explore and experiment with bijectors in TensorFlow Probability to create sophisticated and versatile probabilistic models for various applications. By understanding and leveraging the power of bijectors, you can enable more robust and accurate predictions, quantify uncertainty, and ultimately push the boundaries of machine learning and data science. The flexibility and expressiveness of bijectors make them an indispensable tool in the world of probabilistic modeling, and their potential for innovation and discovery is vast.
