Production Setup
A W&B Local Server is a Docker container running on your infrastructure that connects to scalable data stores. See the sections below for instructions on provisioning a new instance.
Check out a video tutorial for getting set up using Terraform on AWS!

Amazon Web Services

The simplest way to configure W&B within AWS is to use our official Terraform. Detailed instructions can be found in the README. If you'd rather configure services manually, you can find instructions here.

Microsoft Azure

The simplest way to configure W&B within Azure is to use our official Terraform. Detailed instructions can be found in the README. If you'd rather configure services manually, you can find instructions here.

Google Cloud Platform

These instructions assume you already have a GKE Kubernetes cluster configured, the gcloud command available via the Cloud SDK, and Docker running locally. They will expose the W&B service on the public internet. If you only want the service available on your internal private network, you'll need to modify the instructions to use a Service of type LoadBalancer with the appropriate annotations and SSL configuration as described here.

Credentials

Create a service account in the Cloud Console IAM with the following roles:
    Service Account Token Creator
    Pub/Sub Admin
    Storage Object Admin
    Cloud SQL Client
You can optionally restrict the Pub/Sub and Cloud Storage roles to apply only to the subscription and bucket created in the following steps.
Download a key in JSON format, then create the following secret in your Kubernetes cluster:
```shell
kubectl create secret generic wandb-service-account --from-file=key.json=PATH-TO-KEY-FILE.json
```

Storage

Create a bucket in the same region as your k8s cluster.
Navigate to Pub/Sub > Topics in the GCP Console, and click "Create topic". Choose a name and create a topic.
Navigate to Pub/Sub > Subscriptions in the GCP Console, and click "Create subscription". Choose a name, and make sure Delivery Type is set to "Pull". Click "Create".
Finally, make sure gcloud is configured with the project you're using, then run:
```shell
gsutil notification create -t TOPIC_NAME -f json gs://BUCKET_NAME
```
Write the following to a file named cors.json:
```json
[
  {
    "origin": ["*"],
    "responseHeader": ["Content-Type", "x-goog-acl"],
    "method": ["GET", "HEAD", "PUT"],
    "maxAgeSeconds": 3600
  }
]
```
Then run:
```shell
gsutil cors set cors.json gs://BUCKET_NAME
```

Setup MySQL

Provision a Cloud SQL instance running MySQL 5.7 in the same region as your other resources. Note the connection name once the instance has been provisioned, and replace YOUR_MYSQL_CONNECTION_NAME in the templates below with it.
Connect to your new database:
```shell
gcloud sql connect YOUR_DATABASE_NAME --user=root --quiet
```
Then run the following SQL:
```sql
CREATE USER 'wandb_local'@'%' IDENTIFIED BY 'wandb_local';
CREATE DATABASE wandb_local CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
GRANT ALL ON wandb_local.* TO 'wandb_local'@'%' WITH GRANT OPTION;
```

Create a k8s deployment

This assumes you've created a static IP and are using GKE to manage certificates.

Create global static IP

```shell
gcloud compute addresses create wandb-local-static-ip --global
```
You'll want to configure a DNS entry that points to this IP once it's been created; that hostname is referred to as YOUR_DNS_DOMAIN below.

Create k8s deployment with a Google managed certificate

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wandb
  labels:
    app: wandb
spec:
  strategy:
    type: RollingUpdate
  replicas: 1
  selector:
    matchLabels:
      app: wandb
  template:
    metadata:
      labels:
        app: wandb
    spec:
      containers:
        - name: cloud-sql-proxy
          # It is recommended to use the latest version of the Cloud SQL proxy
          # Make sure to update on a regular schedule!
          image: gcr.io/cloudsql-docker/gce-proxy:1.20.2
          command:
            - "/cloud_sql_proxy"
            # If connecting from a VPC-native GKE cluster, you can use the
            # following flag to have the proxy connect over private IP
            # - "-ip_address_types=PRIVATE"
            - "-instances=YOUR_MYSQL_CONNECTION_NAME=tcp:3306"
            - "-credential_file=/secrets/key.json"
          securityContext:
            # The default Cloud SQL proxy image runs as the
            # "nonroot" user and group (uid: 65532) by default.
            runAsNonRoot: true
          volumeMounts:
            - name: wandb-service-account
              mountPath: /secrets/
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 32Mi
            limits:
              cpu: 1000m
              memory: 512Mi
        - name: wandb
          env:
            - name: BUCKET
              value: gs://YOUR_BUCKET_NAME
            - name: BUCKET_QUEUE
              value: pubsub:/PROJECT_NAME/TOPIC_NAME/SUBSCRIPTION_NAME
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /var/secrets/google/key.json
            # The Cloud SQL proxy sidecar listens on localhost:3306
            - name: MYSQL
              value: mysql://wandb_local:wandb_local@127.0.0.1:3306/wandb_local
            - name: LICENSE
              value: $REPLACE_ME_WITH_YOUR_LICENSE
          imagePullPolicy: Always
          image: wandb/local:latest
          command:
            - sh
            - -c
            - "sleep 10 && /sbin/my_init"
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          volumeMounts:
            - name: wandb-service-account
              mountPath: /var/secrets/google
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
          readinessProbe:
            httpGet:
              path: /ready
              port: http
          resources:
            requests:
              cpu: "1500m"
              memory: 4G
            limits:
              cpu: "4000m"
              memory: 8G
      volumes:
        - name: wandb-service-account
          secret:
            secretName: wandb-service-account
---
apiVersion: v1
kind: Service
metadata:
  name: wandb-service
spec:
  type: NodePort
  selector:
    app: wandb
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
---
# networking.k8s.io/v1 is required for the defaultBackend syntax below
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wandb-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: wandb-local-static-ip
    networking.gke.io/managed-certificates: wandb-local-cert
spec:
  defaultBackend:
    service:
      name: wandb-service
      port:
        number: 80
---
apiVersion: networking.gke.io/v1beta1
kind: ManagedCertificate
metadata:
  name: wandb-local-cert
spec:
  domains:
    - YOUR_DNS_DOMAIN
```

Configure your instance

Go to https://YOUR_DNS_DOMAIN in your browser and create your account. Choose "System Settings" from the menu in the upper right.
Input the LICENSE we've provided and click "Use an external file storage backend". Specify pubsub:/PROJECT/TOPIC/SUBSCRIPTION in the Notification Subscription section.
Input https://YOUR_DNS_DOMAIN in the Frontend Host section, then click "Update settings".

Verify your installation

On a machine with Python installed, run:
```shell
pip install wandb
wandb login --host=https://YOUR_DNS_DOMAIN
wandb verify
```

On Premise / Baremetal

W&B depends on scalable data stores that must be configured and managed by your operations team. The team must provide a MySQL 5.7 database server and an S3-compatible object store for the application to scale properly.

MySQL 5.7

W&B only supports MySQL 5.7; we do not support MySQL 8 or any other SQL engine.
There are a number of enterprise services that make operating a scalable MySQL database simpler; consider a managed database offering from your cloud provider.
The most important things to consider when running your own MySQL 5.7 database are:
    1. Backups. You should periodically back up the database to a separate facility. We suggest daily backups with at least 1 week of retention.
    2. Performance. The disk the server runs on should be fast. We suggest running the database on an SSD or accelerated NAS.
    3. Monitoring. The database should be monitored for load. If CPU usage is sustained above 40% for more than 5 minutes, it's likely a sign the server is resource starved.
    4. Availability. Depending on your availability and durability requirements, you may want to configure a hot standby on a separate machine that streams all updates in realtime from the primary server and can be used for failover in case the primary server crashes or becomes corrupted.
Once you've provisioned a MySQL 5.7 database you can create a database and user using the following SQL (replacing SOME_PASSWORD).
```sql
CREATE USER 'wandb_local'@'%' IDENTIFIED BY 'SOME_PASSWORD';
CREATE DATABASE wandb_local CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
GRANT ALL ON wandb_local.* TO 'wandb_local'@'%' WITH GRANT OPTION;
```
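These credentials are later passed to the server as a URL via the MYSQL setting. Because it is a URL, a password containing reserved characters such as `@` or `:` must be percent-encoded (or, simplest of all, avoid such characters in the password). A minimal sketch, where `mysql_connection_string` is our own illustrative helper and not part of W&B:

```python
from urllib.parse import quote

def mysql_connection_string(user, password, host, database, port=3306):
    """Build a MYSQL connection-string value, percent-encoding the
    credentials so reserved characters don't break URL parsing."""
    return (
        f"mysql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )

# Example with a password containing reserved characters:
print(mysql_connection_string("wandb_local", "p@ss:word", "db.internal", "wandb_local"))
# mysql://wandb_local:p%40ss%3Aword@db.internal:3306/wandb_local
```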

Object Store

The S3-compatible object store can be an externally hosted MinIO cluster; W&B supports any S3-compatible object store that supports signed URLs. To check whether your object store supports signed URLs, consult your vendor's documentation. When connecting to an S3-compatible object store, you can specify your credentials in the connection string, i.e.
```
s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME
```
By default we assume third-party object stores are not running over HTTPS. If you've configured a trusted SSL certificate for your object store, you can tell W&B to connect only over TLS by adding the tls query parameter to the URL, i.e.
```
s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME?tls=true
```
This will only work if the SSL certificate is trusted; we do not support self-signed certificates.
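Because the BUCKET setting is a standard URL, you can sanity-check one before deploying it. A small sketch, assuming standard URL syntax as shown above (`parse_bucket_url` is our own helper, not part of W&B):

```python
from urllib.parse import urlsplit, parse_qs

def parse_bucket_url(url):
    """Split a BUCKET connection string of the form
    s3://ACCESS_KEY:SECRET_KEY@HOST/BUCKET_NAME?tls=true into its parts."""
    parts = urlsplit(url)
    # Preserve an explicit port if one was given
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    return {
        "access_key": parts.username,
        "secret_key": parts.password,
        "host": host,
        "bucket": parts.path.lstrip("/"),
        "tls": parse_qs(parts.query).get("tls", ["false"])[0] == "true",
    }

print(parse_bucket_url("s3://AKIAXXXX:secret@minio.internal:9000/wandb-files?tls=true"))
```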
When using third-party object stores, you'll want to set BUCKET_QUEUE to internal://. This tells the W&B server to manage all object notifications internally instead of depending on SQS.
The most important things to consider when running your own object store are:
    1. Storage capacity and performance. It's fine to use magnetic disks, but you should monitor their capacity. Average W&B usage results in tens to hundreds of gigabytes; heavy usage could consume petabytes of storage.
    2. Fault tolerance. At a minimum, the physical disk storing the objects should be on a RAID array. If you're using MinIO, consider running it in distributed mode.
    3. Availability. Monitoring should be configured to ensure the storage is available.
There are many enterprise alternatives to running your own object storage service.

Minio setup

If you're using MinIO, you can run the following commands to create a bucket:
```shell
mc config host add local http://$MINIO_HOST:$MINIO_PORT "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY" --api s3v4
mc mb --region=us-east1 local/local-files
```

Kubernetes Deployment

The following Kubernetes YAML can be customized, but should serve as a basic foundation for configuring wandb/local in Kubernetes.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wandb
  labels:
    app: wandb
spec:
  strategy:
    type: RollingUpdate
  replicas: 1
  selector:
    matchLabels:
      app: wandb
  template:
    metadata:
      labels:
        app: wandb
    spec:
      containers:
        - name: wandb
          env:
            - name: LICENSE
              value: XXXXXXXXXXXXXXX
            - name: BUCKET
              value: s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME
            - name: BUCKET_QUEUE
              value: internal://
            - name: AWS_REGION
              value: us-east1
            - name: MYSQL
              value: mysql://$USERNAME:$PASSWORD@$HOSTNAME/$DATABASE
          imagePullPolicy: IfNotPresent
          image: wandb/local:latest
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          volumeMounts:
            - name: wandb
              mountPath: /vol
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
          readinessProbe:
            httpGet:
              path: /ready
              port: http
          resources:
            requests:
              cpu: "2000m"
              memory: 4G
            limits:
              cpu: "4000m"
              memory: 8G
      volumes:
        - name: wandb
          # emptyDir is the simplest backing for /vol; consider a
          # PersistentVolumeClaim if you need the data to survive restarts
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: wandb-service
spec:
  type: NodePort
  selector:
    app: wandb
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: wandb-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  backend:
    serviceName: wandb-service
    servicePort: 80
```
The Kubernetes YAML above should work in most on-premise installations. However, the details of your Ingress and optional SSL termination will vary. See Networking below.

Openshift

W&B supports running inside an OpenShift Kubernetes cluster. Simply follow the instructions in the Kubernetes deployment section above.

Docker

You can run wandb/local on any instance that also has Docker installed. We suggest at least 8GB of RAM and 4 vCPUs. Run the following command to launch the container:
```shell
docker run --rm -d \
  -e LICENSE=XXXXX \
  -e BUCKET=s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME \
  -e BUCKET_QUEUE=internal:// \
  -e AWS_REGION=us-east1 \
  -e MYSQL=mysql://$USERNAME:$PASSWORD@$HOSTNAME/$DATABASE \
  -p 8080:8080 --name wandb-local wandb/local
```
You'll want to configure a process manager to ensure this process is restarted if it crashes. A good overview of using systemd to do this can be found here.
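As an illustration, a minimal systemd unit that restarts the container on failure might look like the following sketch. All paths, the unit name, and the environment values are placeholders you'd adapt; note the container runs in the foreground (no -d) so systemd can supervise it, and the $VARS must be replaced with literal values since systemd does not perform shell expansion:

```ini
# /etc/systemd/system/wandb-local.service (illustrative)
[Unit]
Description=W&B Local
After=docker.service
Requires=docker.service

[Service]
Restart=always
# Remove any stale container left over from a previous run
ExecStartPre=-/usr/bin/docker rm -f wandb-local
ExecStart=/usr/bin/docker run --rm \
  -e LICENSE=XXXXX \
  -e BUCKET=s3://ACCESS_KEY:SECRET_KEY@HOST/BUCKET_NAME \
  -e BUCKET_QUEUE=internal:// \
  -e MYSQL=mysql://USERNAME:PASSWORD@HOSTNAME/DATABASE \
  -p 8080:8080 --name wandb-local wandb/local
ExecStop=/usr/bin/docker stop wandb-local

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now wandb-local`.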

Networking

Load Balancer

You'll want to run a load balancer that terminates network requests at the appropriate network boundary. Some customers expose their W&B service on the internet; others expose it only on an internal VPN/VPC. It's important that both the machines executing machine learning payloads and the devices from which users access the service via web browsers can reach this endpoint. Common load balancers include:
    Istio
    Caddy
    Apache
    HAProxy
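As an illustration of the pattern described above, a minimal NGINX reverse-proxy server block that terminates SSL at the load balancer and forwards to wandb/local on port 8080 might look like this. The hostname, certificate paths, and upstream address are placeholders:

```nginx
server {
    listen 443 ssl;
    server_name wandb.example.com;  # placeholder hostname

    # Must be a certificate trusted by clients; self-signed certs are not supported
    ssl_certificate     /etc/ssl/certs/wandb.crt;
    ssl_certificate_key /etc/ssl/private/wandb.key;

    location / {
        proxy_pass http://127.0.0.1:8080;  # the wandb/local container
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        # Don't cap upload sizes; runs can push large files
        client_max_body_size 0;
    }
}
```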

SSL / TLS

The W&B application server does not terminate SSL. If your security policies require SSL communication within your trusted networks, consider using a tool like Istio with sidecar containers. The load balancer itself should terminate SSL with a valid certificate; using self-signed certificates is not supported and will cause a number of problems for users. If possible, a service like Let's Encrypt is a great way to provide trusted certificates to your load balancer. Services like Caddy and Cloudflare can manage SSL for you.

Verifying your installation

Regardless of how your server was installed, it's a good idea to verify that everything is configured properly. W&B makes this easy with our CLI.
```shell
pip install wandb
wandb login --host=https://YOUR_DNS_DOMAIN
wandb verify
```
If you see any errors, contact W&B support. You can also check the logs for any errors the application hit at startup.

Docker

```shell
docker logs wandb-local
```

Kubernetes

```shell
kubectl get pods
kubectl logs wandb-XXXXX-XXXXX
```