Deploy W&B Platform on Azure
W&B recommends fully managed deployment options such as W&B Multi-tenant Cloud or W&B Dedicated Cloud deployment types. W&B fully managed services are simple and secure to use, with minimum to no configuration required.
If you've determined to self-managed W&B Server, W&B recommends using the W&B Server Azure Terraform Module to deploy the platform on Azure.
The module documentation is extensive and contains all available options that can be used. We will cover some deployment options in this document.
Before you start, we recommend you choose one of the remote backends available for Terraform to store the State File.
The State File is the necessary resource to roll out upgrades or make changes in your deployment without recreating all components.
The Terraform Module will deploy the following mandatory
components:
- Azure Resource Group
- Azure Virtual Network (VPC)
- Azure MySQL Fliexible Server
- Azure Storage Account & Blob Storage
- Azure Kubernetes Service
- Azure Application Gateway
Other deployment options can also include the following optional components:
- Azure Cache for Redis
- Azure Event Grid
Pre-requisite permissions
The simplest way to get the AzureRM provider configured is via Azure CLI but the incase of automation using Azure Service Principal can also be useful. Regardless the authentication method used, the account that will run the Terraform needs to be able to create all components described in the Introduction.
General steps
The steps on this topic are common for any deployment option covered by this documentation.
- Prepare the development environment.
- Install Terraform
- We recommend creating a Git repository with the code that will be used, but you can keep your files locally.
-
Create the
terraform.tfvars
file Thetvfars
file content can be customized according to the installation type, but the minimum recommended will look like the example below.namespace = "wandb"
wandb_license = "xxxxxxxxxxyyyyyyyyyyyzzzzzzz"
subdomain = "wandb-aws"
domain_name = "wandb.ml"
location = "westeurope"The variables defined here need to be decided before the deployment because. The
namespace
variable will be a string that will prefix all resources created by Terraform.The combination of
subdomain
anddomain
will form the FQDN that W&B will be configured. In the example above, the W&B FQDN will bewandb-aws.wandb.ml
and the DNSzone_id
where the FQDN record will be created. -
Create the file
versions.tf
This file will contain the Terraform and Terraform provider versions required to deploy W&B in AWS
terraform {
required_version = "~> 1.3"
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.17"
}
}
}
Refer to the Terraform Official Documentation to configure the AWS provider.
Optionally, but highly recommended, you can add the remote backend configuration mentioned at the beginning of this documentation.
- Create the file
variables.tf
. For every option configured in theterraform.tfvars
Terraform requires a correspondent variable declaration.
variable "namespace" {
type = string
description = "String used for prefix resources."
}
variable "location" {
type = string
description = "Azure Resource Group location"
}
variable "domain_name" {
type = string
description = "Domain for accessing the Weights & Biases UI."
}
variable "subdomain" {
type = string
default = null
description = "Subdomain for accessing the Weights & Biases UI. Default creates record at Route53 Route."
}
variable "license" {
type = string
description = "Your wandb/local license"
}
Recommended deployment
This is the most straightforward deployment option configuration that will create all Mandatory
components and install in the Kubernetes Cluster
the latest version of W&B
.
- Create the
main.tf
In the same directory where you created the files in theGeneral Steps
, create a filemain.tf
with the following content:
provider "azurerm" {
features {}
}
provider "kubernetes" {
host = module.wandb.cluster_host
cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
client_key = base64decode(module.wandb.cluster_client_key)
client_certificate = base64decode(module.wandb.cluster_client_certificate)
}
provider "helm" {
kubernetes {
host = module.wandb.cluster_host
cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
client_key = base64decode(module.wandb.cluster_client_key)
client_certificate = base64decode(module.wandb.cluster_client_certificate)
}
}
# Spin up all required services
module "wandb" {
source = "wandb/wandb/azurerm"
version = "~> 1.2"
namespace = var.namespace
location = var.location
license = var.license
domain_name = var.domain_name
subdomain = var.subdomain
deletion_protection = false
tags = {
"Example" : "PublicDns"
}
}
output "address" {
value = module.wandb.address
}
output "url" {
value = module.wandb.url
}
-
Deploy to W&B To deploy W&B, execute the following commands:
terraform init
terraform apply -var-file=terraform.tfvars
Deployment with REDIS Cache
Another deployment option uses Redis
to cache the SQL queries and speed up the application response when loading the metrics for the experiments.
You must add the option create_redis = true
to the same main.tf
file that you used in recommended deployment to enable the cache.
# Spin up all required services
module "wandb" {
source = "wandb/wandb/azurerm"
version = "~> 1.2"
namespace = var.namespace
location = var.location
license = var.license
domain_name = var.domain_name
subdomain = var.subdomain
create_redis = true # Create Redis
[...]
Deployment with External Queue
Deployment option 3 consists of enabling the external message broker
. This is optional because the W&B brings embedded a broker. This option doesn't bring a performance improvement.
The Azure resource that provides the message broker is the Azure Event Grid
, and to enable it, you must add the option use_internal_queue = false
to the same main.tf
that you used in the recommended deployment
# Spin up all required services
module "wandb" {
source = "wandb/wandb/azurerm"
version = "~> 1.2"
namespace = var.namespace
location = var.location
license = var.license
domain_name = var.domain_name
subdomain = var.subdomain
use_internal_queue = false # Enable Azure Event Grid
[...]
}
Other deployment options
You can combine all three deployment options adding all configurations to the same file. The Terraform Module provides several options that you can combine along with the standard options and the minimal configuration found in recommended deployment