Advanced agent setup
This guide provides information on how to set up the W&B Launch agent to build container images in different environments.
Build is only required for git and code artifact jobs. Image jobs do not require build.
See Create a launch job for more information on job types.
Builders
The Launch agent can build images using Docker or Kaniko.
- Kaniko: builds a container image in Kubernetes without running the build as a privileged container.
- Docker: builds a container image by executing a docker build command locally.
Set the builder.type key in the launch agent config to docker, kaniko, or noop (noop turns off building). By default, the agent Helm chart sets builder.type to noop. Additional keys in the builder section configure the build process.
If no builder is specified in the agent config and a working docker CLI is found, the agent defaults to Docker. If Docker is not available, the agent defaults to noop.
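The fallback rules above can be summarized in a few lines of Python. This is a minimal sketch, not the agent's actual implementation: the function name choose_builder_type and its docker_available parameter are hypothetical, and treating "the docker executable is on PATH" as "a working docker CLI is found" is a simplifying assumption.

```python
import shutil
from typing import Optional

SUPPORTED_BUILDERS = ("docker", "kaniko", "noop")


def choose_builder_type(config: dict, docker_available: Optional[bool] = None) -> str:
    """Return the builder type implied by an agent config (illustrative only)."""
    builder = config.get("builder") or {}
    explicit = builder.get("type")
    if explicit is not None:
        if explicit not in SUPPORTED_BUILDERS:
            raise ValueError(f"unsupported builder type: {explicit!r}")
        return explicit
    # No builder specified: fall back to Docker when a CLI is present.
    # Assumption: finding the executable on PATH counts as "working".
    if docker_available is None:
        docker_available = shutil.which("docker") is not None
    return "docker" if docker_available else "noop"
```

Passing docker_available explicitly makes the fallback easy to exercise without Docker installed.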
Use Kaniko for building images in a Kubernetes cluster. Use Docker for all other cases.
Pushing to a container registry
The launch agent tags all images it builds with a unique source hash. The agent pushes the image to the registry specified in the builder.destination key.
For example, if the builder.destination key is set to my-registry.example.com/my-repository, the agent tags the image and pushes it to my-registry.example.com/my-repository:<source-hash>. If an image with that tag already exists in the registry, the build is skipped.
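The tag-and-skip behavior can be illustrated with a short Python sketch. The helper names image_uri and needs_build are hypothetical, the set of existing tags stands in for a real registry lookup, and how the agent actually computes the source hash is not covered here.

```python
def image_uri(destination: str, source_hash: str) -> str:
    """Join the builder.destination value and a source hash into an image URI."""
    return f"{destination}:{source_hash}"


def needs_build(source_hash: str, existing_tags: set) -> bool:
    """A build is needed only if the registry has no image with this tag."""
    return source_hash not in existing_tags


destination = "my-registry.example.com/my-repository"
existing_tags = {"abc123"}  # hypothetical tags already present in the registry

for source_hash in ("abc123", "def456"):
    action = "build and push" if needs_build(source_hash, existing_tags) else "skip build"
    print(image_uri(destination, source_hash), "->", action)
```

Because the tag is derived from the job source, unchanged source maps to the same tag and the existing image is reused.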
Agent configuration
If you are deploying the agent via our Helm chart, provide the agent config in the agentConfig key of the values.yaml file.
If you are invoking the agent yourself with wandb launch-agent, you can provide the agent config as a path to a YAML file with the --config flag. By default, the config is loaded from ~/.config/wandb/launch-config.yaml.
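As a rough illustration of the lookup order described above, the following Python sketch returns the explicit path when one is given and otherwise falls back to the default location. The function name resolve_config_path is hypothetical and this is not the agent's own code.

```python
import os
from typing import Optional

DEFAULT_CONFIG_PATH = "~/.config/wandb/launch-config.yaml"


def resolve_config_path(cli_config: Optional[str] = None) -> str:
    """Return the --config value if given, otherwise the default location."""
    # expanduser replaces a leading "~" with the current user's home directory.
    return os.path.expanduser(cli_config or DEFAULT_CONFIG_PATH)
```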
Within your launch agent config (launch-config.yaml), provide the name of the target resource environment and the container registry for the environment and registry keys, respectively.
The following tabs demonstrate how to configure the launch agent based on your environment and registry.
- Amazon Web Services
- Google Cloud
- Azure
The AWS environment configuration requires the region key. The region should be the AWS region that the agent runs in.
environment:
  type: aws
  region: <aws-region>
builder:
  type: <kaniko|docker>
  # URI of the ECR repository where the agent will store images.
  # Make sure the region matches what you have configured in your
  # environment.
  destination: <account-id>.dkr.ecr.<aws-region>.amazonaws.com/<repository-name>
  # If using Kaniko, specify the S3 bucket where the agent will store the
  # build context.
  build-context-store: s3://<bucket-name>/<path>
The agent uses boto3 to load the default AWS credentials. See the boto3 documentation for more information on how to configure default AWS credentials.
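The configuration comments ask you to keep the registry region in line with the environment region. A small check like the following can catch a mismatch before the agent starts. This is a sketch under stated assumptions: check_ecr_region is a hypothetical helper, and the regular expression assumes the conventional ECR form <account-id>.dkr.ecr.<region>.amazonaws.com/<repository-name>, with the "dkr." label treated as optional.

```python
import re

# Assumed shape of an ECR repository URI; "dkr." is optional in this sketch.
ECR_URI = re.compile(
    r"^(?:https://)?(?P<account>\d+)\.(?:dkr\.)?ecr\.(?P<region>[a-z0-9-]+)"
    r"\.amazonaws\.com/(?P<repo>.+)$"
)


def check_ecr_region(config: dict) -> None:
    """Raise ValueError if builder.destination is in a different region."""
    env_region = config["environment"]["region"]
    match = ECR_URI.match(config["builder"]["destination"])
    if match is None:
        raise ValueError("builder.destination does not look like an ECR URI")
    if match.group("region") != env_region:
        raise ValueError(
            f"destination region {match.group('region')!r} "
            f"does not match environment region {env_region!r}"
        )
```

The function returns None when the regions agree, so it can be called as a guard before deploying a config.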
The Google Cloud environment requires the region and project keys. Set region to the region that the agent runs in. Set project to the Google Cloud project that the agent runs in. The agent uses google.auth.default() in Python to load the default credentials.
environment:
  type: gcp
  region: <gcp-region>
  project: <gcp-project-id>
builder:
  type: <kaniko|docker>
  # URI of the Artifact Registry repository and image name where the agent
  # will store images. Make sure the region and project match what you have
  # configured in your environment.
  uri: <region>-docker.pkg.dev/<project-id>/<repository-name>/<image-name>
  # If using Kaniko, specify the GCS bucket where the agent will store the
  # build context.
  build-context-store: gs://<bucket-name>/<path>
See the google-auth documentation for more information on how to configure default GCP credentials so they are available to the agent.
The Azure environment does not require any additional keys. When the agent starts, it uses azure.identity.DefaultAzureCredential() to load the default Azure credentials.
environment:
  type: azure
builder:
  type: <kaniko|docker>
  # URI of the Azure Container Registry repository where the agent will store images.
  destination: https://<registry-name>.azurecr.io/<repository-name>
  # If using Kaniko, specify the Azure Blob Storage container where the agent
  # will store the build context.
  build-context-store: https://<storage-account-name>.blob.core.windows.net/<container-name>
See the azure-identity documentation for more information on how to configure default Azure credentials.
Agent permissions
The agent permissions required vary by use case.
Cloud registry permissions
Below are the permissions that are generally required by launch agents to interact with cloud registries.
- Amazon Web Services
- Google Cloud
- Azure
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:CreateRepository",
        "ecr:UploadLayerPart",
        "ecr:PutImage",
        "ecr:CompleteLayerUpload",
        "ecr:InitiateLayerUpload",
        "ecr:DescribeRepositories",
        "ecr:DescribeImages",
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchDeleteImage"
      ],
      "Resource": "arn:aws:ecr:<region>:<account-id>:repository/<repository>"
    },
    {
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    }
  ]
}
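If you manage several repositories, it can be convenient to generate the ECR policy above from its three placeholders. The sketch below is one possible way to do that with the standard json module; the function name ecr_policy and the example region, account ID, and repository name are hypothetical.

```python
import json

ECR_ACTIONS = [
    "ecr:CreateRepository",
    "ecr:UploadLayerPart",
    "ecr:PutImage",
    "ecr:CompleteLayerUpload",
    "ecr:InitiateLayerUpload",
    "ecr:DescribeRepositories",
    "ecr:DescribeImages",
    "ecr:BatchCheckLayerAvailability",
    "ecr:BatchDeleteImage",
]


def ecr_policy(region: str, account_id: str, repository: str) -> dict:
    """Build the registry policy for a single ECR repository."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ECR_ACTIONS,
                "Resource": f"arn:aws:ecr:{region}:{account_id}:repository/{repository}",
            },
            {
                # Authorization tokens are not scoped to a repository.
                "Effect": "Allow",
                "Action": "ecr:GetAuthorizationToken",
                "Resource": "*",
            },
        ],
    }


# Example values are placeholders, not real identifiers.
print(json.dumps(ecr_policy("us-east-1", "123456789012", "my-repo"), indent=2))
```

Serializing with json.dumps guarantees double-quoted keys and no trailing commas, which IAM policy documents require.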
artifactregistry.dockerimages.list
artifactregistry.repositories.downloadArtifacts
artifactregistry.repositories.list
artifactregistry.repositories.uploadArtifacts
Add the AcrPush role if you use the Kaniko builder.
Storage permissions for Kaniko
The launch agent requires permission to push to cloud storage if the agent uses the Kaniko builder. Kaniko uses a context store outside of the pod running the build job.
- Amazon Web Services
- Google Cloud
- Azure
The recommended context store for the Kaniko builder on AWS is Amazon S3. The following policy can be used to give the agent access to an S3 bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListObjectsInBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::<BUCKET-NAME>"]
    },
    {
      "Sid": "AllObjectActions",
      "Effect": "Allow",
      "Action": "s3:*Object",
      "Resource": ["arn:aws:s3:::<BUCKET-NAME>/*"]
    }
  ]
}
On GCP, the following IAM permissions are required for the agent to upload build contexts to GCS:
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
The Storage Blob Data Contributor role is required in order for the agent to upload build contexts to Azure Blob Storage.
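The three context stores in this section can be told apart by the form of the build-context-store value: s3:// for Amazon S3, gs:// for GCS, and an https:// URL on blob.core.windows.net for Azure Blob Storage. The following Python sketch maps a store URI to its cloud so you know which set of storage permissions applies. The function name context_store_provider and its return labels are hypothetical.

```python
from urllib.parse import urlparse


def context_store_provider(uri: str) -> str:
    """Classify a build-context-store URI as "aws", "gcp", or "azure"."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        return "aws"
    if parsed.scheme == "gs":
        return "gcp"
    if parsed.scheme == "https" and parsed.netloc.endswith(".blob.core.windows.net"):
        return "azure"
    raise ValueError(f"unrecognized build context store: {uri!r}")
```

For example, context_store_provider("gs://my-bucket/contexts") returns "gcp", which points to the GCS permissions listed above.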
Customizing the Kaniko build
Specify the Kubernetes Job spec that the Kaniko job uses in the builder.kaniko-config key of the agent configuration. For example:
builder:
  type: kaniko
  build-context-store: <my-build-context-store>
  destination: <my-image-destination>
  build-job-name: wandb-image-build
  kaniko-config:
    spec:
      template:
        spec:
          containers:
            - args:
                - "--cache=false" # Args must be in the format "key=value"
              env:
                - name: "MY_ENV_VAR"
                  value: "my-env-var-value"
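Because container args must be written as "key=value", a quick check of the kaniko-config section can catch malformed entries before the agent submits a build job. The sketch below assumes the config has already been loaded into a Python dict with the nesting shown above; the function name invalid_kaniko_args is hypothetical and this check is not part of the agent.

```python
def invalid_kaniko_args(kaniko_config: dict) -> list:
    """Return container args that are not in "key=value" form."""
    containers = kaniko_config["spec"]["template"]["spec"]["containers"]
    bad = []
    for container in containers:
        for arg in container.get("args", []):
            # partition splits on the first "=", leaving sep empty if absent.
            key, sep, value = arg.partition("=")
            if not (key and sep and value):
                bad.append(arg)
    return bad
```

An empty list means every argument has a key, an equals sign, and a value.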
Deploy Launch agent into CoreWeave
Optionally deploy the W&B Launch agent to CoreWeave Cloud infrastructure. CoreWeave is a cloud infrastructure provider purpose-built for GPU-accelerated workloads.
For information on how to deploy the Launch agent to CoreWeave, see the CoreWeave documentation.
You need a CoreWeave account to deploy the Launch agent to CoreWeave infrastructure.