Advanced agent setup

How you configure the launch agent depends on several factors, including whether the launch agent builds an image for you.

tip

The W&B launch agent builds an image for you if you provide Git repository-based or artifact-based jobs.

In the simplest use case, you provide an image-based launch job that is executed in a launch queue target environment that has access to your image repository.

The following sections describe the requirements you must satisfy if you use the launch agent to build images for you.

Builders

Launch agents can build images from W&B artifacts and Git repository sourced jobs. This means that ML engineers can rapidly iterate over code without needing to rebuild Docker images themselves. To allow this builder behavior, the launch agent config file (launch-config.yaml) must have a builder option specified. W&B Launch supports two builders, Kaniko and Docker, along with a noop option that will tell the agent to only use prebuilt images.

  • Kaniko: Use Kaniko when the agent polls launch queues in a Kubernetes cluster.
  • Docker: Use Docker for all other cases in which you want to build images automatically.
  • Noop: Use when you only want to use prebuilt images. (Both Kaniko and Docker builders can use prebuilt images or build new ones.)
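For comparison with the Docker and Kaniko examples that follow, a minimal sketch of a config that turns off image building might look like the snippet below. It assumes the noop option is selected with the same builder type key that the other builders use; check this against your agent version before relying on it.

```yaml
# launch-config.yaml (sketch)
# Assumption: "noop" is set through the same builder.type key as the
# docker and kaniko builders. With it, the agent only runs prebuilt images.
builder:
  type: noop
```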

Docker

W&B recommends that you use the Docker builder if you want the agent to build images on a local machine (that has Docker installed). Specify the Docker builder in the launch agent config with the builder key.

For example, the following YAML snippet shows how to specify this in a launch agent config file (launch-config.yaml):

launch-config.yaml
builder:
  type: docker

Kaniko

To use the Kaniko builder, you must specify a container registry and environment option.

For example, the following YAML snippet shows how to specify Kaniko in a launch agent config file (launch-config.yaml):

launch-config.yaml
builder:
  type: kaniko
  build-context-store: s3://my-bucket/build-contexts/
  build-job-name: wandb-image-build # Kubernetes job name prefix for all builds

If you run a Kubernetes cluster other than AKS, EKS, or GKE, you need to create a Kubernetes secret that contains the credentials for your cloud environment.

Within your agent configuration file, and within the builder section, set the secret-name and secret-key keys to let Kaniko use the secrets:

launch-config.yaml
builder:
  type: kaniko
  build-context-store: <my-build-context-store>
  secret-name: <Kubernetes-secret-name>
  secret-key: <secret-file-name>
note

The Kaniko builder requires permissions to put data into cloud storage (such as Amazon S3). See the Agent permissions section for more information.
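The Kubernetes secret referenced by secret-name and secret-key can be sketched as a manifest like the one below. This is an illustration only: the secret name kaniko-cloud-credentials, the key name credentials, and the use of an AWS shared credentials file as the payload are hypothetical choices, not values the agent requires. Kubernetes secrets are namespaced, so create the secret in the namespace where the build jobs run.

```yaml
# kaniko-secret.yaml (sketch; every name and value here is a placeholder)
apiVersion: v1
kind: Secret
metadata:
  name: kaniko-cloud-credentials   # hypothetical; use this as secret-name
type: Opaque
stringData:
  # The key is the file name; use it as secret-key.
  # Assumption: the payload is an AWS shared credentials file.
  credentials: |
    [default]
    aws_access_key_id = <access-key-id>
    aws_secret_access_key = <secret-access-key>
```

With this manifest applied, the builder section would set secret-name to kaniko-cloud-credentials and secret-key to credentials.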

You can specify the Kubernetes Job spec that the Kaniko job uses in the kaniko-config key. For example:

launch-config.yaml
builder:
  type: kaniko
  build-context-store: <my-build-context-store>
  build-job-name: wandb-image-build
  kaniko-config:
    spec:
      template:
        spec:
          containers:
          - args:
            - "--cache=false" # Args must be in the format "key=value"
            env:
            - name: "MY_ENV_VAR"
              value: "my-env-var-value"

Connect an agent to a container registry

You can connect the launch agent to a container registry such as Amazon Elastic Container Registry (Amazon ECR), Google Artifact Registry on GCP, or Azure Container Registry. Common reasons to connect the launch agent to a cloud container registry include:

  • you do not want to store the images you build on your local machine
  • you want to share images across multiple machines
  • the agent builds an image for you and you use a cloud compute resource such as Amazon SageMaker or Vertex AI

To connect the launch agent to a container registry, provide additional information about the environment and registry you want to use in the launch agent config. In addition, grant the agent permissions within the environment to interact with required components based on your use case.

note

Launch agents support pulling from any container registry that the nodes running the job can access, including private Docker Hub, JFrog, and Quay registries. Pushing images is currently supported only for ECR, ACR, and GCR.

Agent configuration

Within your launch agent config (launch-config.yaml), provide the name of the target resource environment and the container registry for the environment and registry keys, respectively.

The following example demonstrates how to configure the launch agent for an AWS environment and an Amazon ECR registry.

The AWS environment configuration requires the region key. The region should be the AWS region that the agent runs in. The agent uses boto3 to load the default AWS credentials.

launch-config.yaml
environment:
  type: aws
  region: <aws-region>
registry:
  type: ecr
  # URI of the ECR repository where the agent will store images.
  # Make sure the region matches what you have configured in your
  # environment.
  uri: <account-id>.ecr.<aws-region>.amazonaws.com/<repository-name>
  # Alternatively, you can simply set the repository name
  # repository: my-repository-name

See the boto3 documentation for more information on how to configure default AWS credentials.
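To show how the pieces fit together, here is a sketch of a single launch-config.yaml that combines the environment, registry, and builder sections described on this page for an agent that builds with Kaniko and pushes to Amazon ECR. It uses only keys that appear in the snippets on this page; the region, repository name, and bucket path are placeholder values.

```yaml
# launch-config.yaml (sketch; all values are placeholders)
environment:
  type: aws
  region: us-east-1              # should match the region the agent runs in
registry:
  type: ecr
  repository: my-repository-name # or set uri to the full repository URI
builder:
  type: kaniko
  build-context-store: s3://my-bucket/build-contexts/
  build-job-name: wandb-image-build
```

For this setup the agent needs both the cloud registry permissions and the Kaniko context store permissions described under Agent permissions.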

Agent permissions

The agent permissions required depend on your use case. The policies outlined below are used by launch agents.

Cloud registry permissions

Below are the permissions that launch agents generally require to interact with cloud registries. The following policy is for Amazon ECR:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:CreateRepository",
        "ecr:UploadLayerPart",
        "ecr:PutImage",
        "ecr:CompleteLayerUpload",
        "ecr:InitiateLayerUpload",
        "ecr:DescribeRepositories",
        "ecr:DescribeImages",
        "ecr:BatchCheckLayerAvailability",
        "ecr:BatchDeleteImage"
      ],
      "Resource": "arn:aws:ecr:<region>:<account-id>:repository/<repository>"
    },
    {
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    }
  ]
}

Kaniko permissions

The launch agent requires permission to push to cloud storage if the agent uses the Kaniko builder. Kaniko uses a context store outside of the pod running the build job.

The recommended context store for the Kaniko builder on AWS is Amazon S3. The following policy can be used to give the agent access to an S3 bucket:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListObjectsInBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::<BUCKET-NAME>"]
    },
    {
      "Sid": "AllObjectActions",
      "Effect": "Allow",
      "Action": "s3:*Object",
      "Resource": ["arn:aws:s3:::<BUCKET-NAME>/*"]
    }
  ]
}

[Optional] Deploy Launch agent into CoreWeave

You can optionally deploy the W&B Launch agent to CoreWeave Cloud infrastructure. CoreWeave is cloud infrastructure that is purpose-built for GPU-accelerated workloads.

For information on how to deploy the Launch agent to CoreWeave, see the CoreWeave documentation.

note

You need a CoreWeave account to deploy the Launch agent to CoreWeave infrastructure.
