Deploy W&B Platform On-premises
For related questions, reach out to the W&B Sales Team at contact@wandb.com.
Infrastructure guidelines
Before you start deploying W&B, refer to the reference architecture, especially the infrastructure requirements.
Application server
W&B recommends deploying W&B Server into its own namespace, on a node group that spans two availability zones, with the following specifications to provide the best performance, reliability, and availability:
Specification | Value |
---|---|
Bandwidth | Dual 10 Gigabit+ Ethernet Network |
Root Disk Bandwidth (Mbps) | 4,750+ |
Root Disk Provision (GB) | 100+ |
Core Count | 4 |
Memory (GiB) | 8 |
This ensures that W&B Server has sufficient disk space to process the application data and store temporary logs before they are externalized.
It also ensures fast and reliable data transfer, the necessary processing power and memory for smooth operation, and that W&B will not be affected by any noisy neighbors.
These specifications are minimum requirements, and actual resource needs vary with the usage and workload of the W&B application. Monitor the resource usage and performance of the application to confirm that it operates optimally, and adjust resources as necessary.
Database server
W&B recommends a MySQL 8 database as a metadata store. The shape of the model parameters and related metadata impacts the performance of the database. The database grows as ML practitioners track more training runs, and it incurs a read-heavy load when queries run against run tables, user workspaces, and reports.
To ensure optimal performance, W&B recommends deploying the W&B database onto a server with the following starting specifications:
Specification | Value |
---|---|
Bandwidth | Dual 10 Gigabit+ Ethernet Network |
Root Disk Bandwidth (Mbps) | 4,750+ |
Root Disk Provision (GB) | 1000+ |
Core Count | 4 |
Memory (GiB) | 32 |
Again, W&B recommends monitoring the resource usage and performance of the database to ensure that it operates optimally and to make adjustments as necessary.
Additionally, W&B recommends a set of parameter overrides to tune the DB for MySQL 8. They are listed under Parameter group configuration.
Object storage
W&B is compatible with any object store that supports the S3 API, signed URLs, and CORS. W&B recommends sizing the storage array for the current needs of your practitioners and revisiting the capacity plan on a regular cadence.
More details on object store configuration can be found in the how-to section.
Some tested and working providers:
Secure Storage Connector
The Secure Storage Connector is not currently available for teams in bare metal deployments.
MySQL database
Use MySQL 8, versions 8.0.28 and above.
There are a number of enterprise services that make operating a scalable MySQL database simpler. W&B recommends looking into one of the following solutions:
https://www.percona.com/software/mysql-database/percona-server
https://github.com/mysql/mysql-operator
Satisfy the conditions below if you run W&B Server with MySQL 8.0 or when you upgrade from MySQL 5.7 to 8.0:
binlog_format = 'ROW'
innodb_online_alter_log_max_size = 268435456
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
binlog_row_image = 'MINIMAL'
Due to some changes in the way that MySQL 8.0 handles sort_buffer_size, you might need to update the sort_buffer_size parameter from its default value of 262144. The recommendation is to set the value to 67108864 (64 MiB) to ensure that MySQL works efficiently with W&B. MySQL supports this configuration starting with v8.0.28.
Database considerations
Create a database and a user with the following SQL query. Replace SOME_PASSWORD with a password of your choice:
CREATE USER 'wandb_local'@'%' IDENTIFIED BY 'SOME_PASSWORD';
CREATE DATABASE wandb_local CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
GRANT ALL ON wandb_local.* TO 'wandb_local'@'%' WITH GRANT OPTION;
Parameter group configuration
Ensure that the following parameter groups are set to tune the database performance:
binlog_format = 'ROW'
innodb_online_alter_log_max_size = 268435456
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
binlog_row_image = 'MINIMAL'
sort_buffer_size = 67108864
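One way to confirm that a running server matches this list is to compare the values MySQL reports against the recommended ones. The following is a minimal Python sketch under stated assumptions: the `find_mismatches` helper and the `reported` dictionary are hypothetical, and the dictionary stands in for the rows you would load from `SHOW GLOBAL VARIABLES` with your own database client.

```python
# Recommended values copied from the parameter list above.
RECOMMENDED = {
    "binlog_format": "ROW",
    "innodb_online_alter_log_max_size": 268435456,  # 256 MiB
    "sync_binlog": 1,
    "innodb_flush_log_at_trx_commit": 1,
    "binlog_row_image": "MINIMAL",
    "sort_buffer_size": 67108864,  # 64 MiB
}


def find_mismatches(current):
    """Return {name: (current_value, recommended_value)} for settings that differ.

    `current` maps variable names to the values the server reports.
    Values are compared as case-insensitive strings because MySQL
    returns every variable as text.
    """
    mismatches = {}
    for name, wanted in RECOMMENDED.items():
        actual = current.get(name)
        if actual is None or str(actual).upper() != str(wanted).upper():
            mismatches[name] = (actual, wanted)
    return mismatches


if __name__ == "__main__":
    # Hypothetical server state: sort_buffer_size is still at the MySQL default.
    reported = {
        "binlog_format": "ROW",
        "innodb_online_alter_log_max_size": "268435456",
        "sync_binlog": "1",
        "innodb_flush_log_at_trx_commit": "1",
        "binlog_row_image": "MINIMAL",
        "sort_buffer_size": "262144",
    }
    for name, (actual, wanted) in find_mismatches(reported).items():
        print(f"{name}: current={actual} recommended={wanted}")
```

A missing variable is reported as a mismatch with a current value of `None`, so an incomplete query result is not silently treated as correct.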
Object storage
The object store can be externally hosted on a MinIO cluster, or on any Amazon S3 compatible object store that supports signed URLs. Run the following script to check if your object store supports signed URLs.
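A signed-URL check can be approximated with the Python standard library alone. The sketch below is not an official W&B script. It builds an AWS Signature Version 4 presigned GET URL using path-style addressing. The function name `presign_get`, the host, the object path, the credentials, and the `us-east-1` region are all illustrative assumptions. The region in particular must match whatever your object store is configured with.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key, msg):
    """HMAC-SHA256 of a text message, returned as raw bytes."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get(host, path, access_key, secret_key,
                region="us-east-1", expires=300, scheme="http", now=None):
    """Build a SigV4 presigned GET URL for `path`, for example /bucket/key."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    scope = f"{datestamp}/{region}/s3/aws4_request"
    canonical_uri = quote(path, safe="/~")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Query parameters are percent-encoded and sorted by name.
    canonical_query = "&".join(
        f"{quote(k, safe='~')}={quote(v, safe='~')}"
        for k, v in sorted(params.items())
    )
    # Only the host header is signed; the payload is left unsigned.
    canonical_request = "\n".join([
        "GET", canonical_uri, canonical_query,
        f"host:{host}\n", "host", "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: date -> region -> service -> "aws4_request".
    key = ("AWS4" + secret_key).encode("utf-8")
    for part in (datestamp, region, "s3", "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return (f"{scheme}://{host}{canonical_uri}"
            f"?{canonical_query}&X-Amz-Signature={signature}")


if __name__ == "__main__":
    # Placeholder values; replace them with your own store and an existing object.
    print(presign_get("minio.example.com:9000", "/local-files/probe.txt",
                      "ACCESS_KEY", "SECRET_KEY"))
```

To use it, upload a small object, generate a URL for it, and fetch that URL from a client that holds no credentials. If the object comes back, the store honors signed URLs. If the store answers with a signature or access error, check the region, the system clock, and the credentials before concluding that signed URLs are unsupported.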
Additionally, the following CORS policy needs to be applied to the object store.
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
<AllowedOrigin>http://YOUR-WANDB-SERVER-IP</AllowedOrigin>
<AllowedMethod>GET</AllowedMethod>
<AllowedMethod>PUT</AllowedMethod>
<AllowedMethod>HEAD</AllowedMethod>
<AllowedHeader>*</AllowedHeader>
</CORSRule>
</CORSConfiguration>
You can specify your credentials in a connection string when you connect to an Amazon S3 compatible object store. For example, you can specify the following:
s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME
If you configure a trusted SSL certificate for your object store, you can optionally tell W&B to connect only over TLS. To do so, add the tls query parameter to the URL. The following example shows the query parameter added to an Amazon S3 URI:
s3://$ACCESS_KEY:$SECRET_KEY@$HOST/$BUCKET_NAME?tls=true
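Secret keys often contain characters such as `/` or `+` that break URI parsing when they are pasted in unchanged. The following Python sketch shows one way to assemble and inspect a connection string of this shape. The helper names `build_s3_uri` and `parse_s3_uri` are hypothetical, and percent-encoding the credentials is an assumption that you should verify against your W&B Server version before relying on it.

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit


def build_s3_uri(access_key, secret_key, host, bucket, tls=False):
    """Assemble s3://KEY:SECRET@HOST/BUCKET, percent-encoding the credentials."""
    uri = (f"s3://{quote(access_key, safe='')}:{quote(secret_key, safe='')}"
           f"@{host}/{bucket}")
    return uri + "?tls=true" if tls else uri


def parse_s3_uri(uri):
    """Split a connection string back into its parts for a quick sanity check."""
    parts = urlsplit(uri)
    if parts.scheme != "s3":
        raise ValueError("expected an s3:// URI")
    query = parse_qs(parts.query)
    return {
        "access_key": unquote(parts.username or ""),
        "secret_key": unquote(parts.password or ""),
        "host": parts.hostname,
        "port": parts.port,  # None when the URI carries no explicit port
        "bucket": parts.path.lstrip("/"),
        "tls": query.get("tls", ["false"])[0].lower() == "true",
    }


if __name__ == "__main__":
    # Placeholder credentials and host, for illustration only.
    uri = build_s3_uri("ACCESS_KEY", "se/cr+et", "minio.example.com:9000",
                       "local-files", tls=True)
    print(uri)
    print(parse_s3_uri(uri))
```

Round-tripping a string through both helpers is a cheap way to catch a credential that would otherwise be misread as part of the host or path.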
Set BUCKET_QUEUE to internal:// if you use third-party object stores. This tells the W&B server to manage all object notifications internally instead of depending on an external SQS queue or equivalent.
The most important things to consider when running your own object store are:
- Storage capacity and performance. It’s fine to use magnetic disks, but monitor the capacity of these disks. Average W&B usage results in tens to hundreds of gigabytes. Heavy usage could result in petabytes of storage consumption.
- Fault tolerance. At a minimum, the physical disk storing the objects should be on a RAID array. If you use MinIO, consider running it in distributed mode.
- Availability. Monitoring should be configured to ensure the storage is available.
There are many enterprise alternatives to running your own object storage service such as:
MinIO set up
If you use MinIO, you can run the following commands to create a bucket.
mc config host add local http://$MINIO_HOST:$MINIO_PORT "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY" --api s3v4
mc mb --region=us-east1 local/local-files
Deploy W&B Server application to Kubernetes
The recommended installation method is with the official W&B Helm chart. Follow this section to deploy the W&B Server application.
OpenShift
W&B supports operating from within an OpenShift Kubernetes cluster.
Run the container as an unprivileged user
By default, containers use a $UID of 999. Specify $UID >= 100000 and a $GID of 0 if your orchestrator requires the container to run as a non-root user.
W&B must start as the root group ($GID=0) for file system permissions to function properly.
An example security context for Kubernetes looks similar to the following:
spec:
  securityContext:
    runAsUser: 100000
    runAsGroup: 0
Networking
Load balancer
Run a load balancer that terminates network requests at the appropriate network boundary.
Common load balancers include:
Ensure that all machines used to execute machine learning payloads, and the devices used to access the service through web browsers, can communicate with this endpoint.
SSL / TLS
W&B Server does not terminate SSL. If your security policies require SSL communication within your trusted networks, consider using a tool like Istio and sidecar containers. The load balancer itself should terminate SSL with a valid certificate. Self-signed certificates are not supported and will cause a number of challenges for users. If possible, use a service like Let’s Encrypt to provide trusted certificates to your load balancer. Services like Caddy and Cloudflare manage SSL for you.
Example nginx configuration
The following is an example configuration using nginx as a reverse proxy.
events {}
http {
    # If we receive X-Forwarded-Proto, pass it through; otherwise, pass along the
    # scheme used to connect to this server
    map $http_x_forwarded_proto $proxy_x_forwarded_proto {
        default $http_x_forwarded_proto;
        '' $scheme;
    }
    # Also, in the above case, force HTTPS
    map $http_x_forwarded_proto $sts {
        default '';
        "https" "max-age=31536000; includeSubDomains";
    }
    # If we receive X-Forwarded-Host, pass it through; otherwise, pass along $http_host
    map $http_x_forwarded_host $proxy_x_forwarded_host {
        default $http_x_forwarded_host;
        '' $http_host;
    }
    # If we receive X-Forwarded-Port, pass it through; otherwise, pass along the
    # server port the client connected to
    map $http_x_forwarded_port $proxy_x_forwarded_port {
        default $http_x_forwarded_port;
        '' $server_port;
    }
    # If we receive Upgrade, set Connection to "upgrade"; otherwise, set
    # Connection to "close" instead of forwarding the client's value
    map $http_upgrade $proxy_connection {
        default upgrade;
        '' close;
    }
    server {
        listen 443 ssl;
        server_name www.example.com;
        ssl_certificate www.example.com.crt;
        ssl_certificate_key www.example.com.key;
        proxy_http_version 1.1;
        proxy_buffering off;
        proxy_set_header Host $http_host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $proxy_connection;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $proxy_x_forwarded_proto;
        proxy_set_header X-Forwarded-Host $proxy_x_forwarded_host;
        location / {
            proxy_pass http://$YOUR_UPSTREAM_SERVER_IP:8080/;
        }
        keepalive_timeout 10;
    }
}
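The fallback behavior of the map blocks can be easier to follow when written as ordinary code. The following Python sketch mirrors that logic for the forwarded headers the server block actually sets. It is an illustration only, not part of the nginx configuration. The function name `upstream_headers` and the sample values are hypothetical.

```python
def upstream_headers(incoming, scheme, http_host):
    """Compute the headers the proxy sends upstream, mirroring the map blocks.

    `incoming` holds the client's request headers. Names are matched exactly
    here, whereas nginx matches header names case-insensitively.
    """
    def passthrough(name, fallback):
        # map: a received value is passed through, an empty one falls back.
        return incoming.get(name) or fallback

    return {
        "Host": http_host,
        "X-Forwarded-Proto": passthrough("X-Forwarded-Proto", scheme),
        "X-Forwarded-Host": passthrough("X-Forwarded-Host", http_host),
        # map $http_upgrade: any Upgrade header gives "upgrade", none gives "close".
        "Connection": "upgrade" if incoming.get("Upgrade") else "close",
    }


if __name__ == "__main__":
    # A client that connects directly over HTTPS, with no forwarded headers.
    print(upstream_headers({}, "https", "wandb.example.com"))
    # A WebSocket request that already passed through another proxy.
    print(upstream_headers(
        {"X-Forwarded-Proto": "https",
         "X-Forwarded-Host": "ml.example.com",
         "Upgrade": "websocket"},
        "http", "10.0.0.5"))
```

The second call shows why the maps exist: when another proxy sits in front of nginx, the original scheme and host survive instead of being overwritten with the values of the internal hop.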
Verify your installation
Verify that your W&B Server is configured properly. Run the following commands in your terminal:
pip install wandb
wandb login --host=https://YOUR_DNS_DOMAIN
wandb verify
Check log files to view any errors the W&B Server hits at startup. Run the following commands:
docker logs wandb-local
kubectl get pods
kubectl logs wandb-XXXXX-XXXXX
Contact W&B Support if you encounter errors.