This page describes the high-level steps required to set up W&B Launch:
Set up a queue: Queues are FIFO and possess a queue configuration. A queue’s configuration controls where and how jobs are executed on a target resource.
Set up an agent: Agents run on your machine/infrastructure and poll one or more queues for launch jobs. When a job is pulled, the agent ensures that the image is built and available. The agent then submits the job to the target resource.
Set up a queue
Launch queues must be configured to point to a specific target resource, along with any additional configuration specific to that resource. For example, a launch queue that points to a Kubernetes cluster might include environment variables or set a custom namespace in its launch queue configuration. When you create a queue, you specify both the target resource you want to use and the configuration for that resource.
When an agent receives a job from a queue, it also receives the queue configuration. When the agent submits the job to the target resource, it includes the queue configuration along with any overrides from the job itself. For example, you can use a job configuration to specify the Amazon SageMaker instance type for that job instance only. In this case, it is common to use queue config templates as the end user interface.
Click the create queue button on the top right of the screen.
From the Entity dropdown menu, select the entity the queue will belong to.
Provide a name for your queue in the Queue field.
From the Resource dropdown, select the compute resource you want jobs added to this queue to use.
Choose whether to allow Prioritization for this queue. If prioritization is enabled, a user on your team can define a priority for their launch job when they enqueue it. Higher priority jobs are executed before lower priority jobs.
Provide a resource configuration in either JSON or YAML format in the Configuration field. The structure and semantics of your configuration document will depend on the resource type that the queue is pointing to. For more details, see the dedicated set up page for your target resource.
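For example, a sketch of a queue configuration for a Kubernetes target resource that sets a custom namespace and injects an environment variable (the key layout follows a Kubernetes Job spec; the namespace and variable names here are illustrative placeholders, not defaults):

```yaml
# Sketch of a Kubernetes queue config; names are placeholders.
metadata:
  namespace: my-namespace        # custom namespace for launched jobs
spec:
  template:
    spec:
      containers:
        - env:
            - name: MY_ENV_VAR   # environment variable passed to every job
              value: my-value
```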
Set up a launch agent
Launch agents are long running processes that poll one or more launch queues for jobs. Launch agents dequeue jobs in first in, first out (FIFO) order or in priority order depending on the queues they pull from. When an agent dequeues a job from a queue, it optionally builds an image for that job. The agent then submits the job to the target resource along with configuration options specified in the queue configuration.
Agents are highly flexible and can be configured to support a wide variety of use cases. The required configuration for your agent will depend on your specific use case. See the dedicated page for Docker, Amazon SageMaker, Kubernetes, or Vertex AI.
W&B recommends you start agents with a service account’s API key, rather than a specific user’s API key. There are two benefits to using a service account’s API key:
The agent isn’t dependent on an individual user.
The author associated with a run created through Launch is viewed by Launch as the user who submitted the launch job, rather than the user associated with the agent.
Agent configuration
Configure the launch agent with a YAML file named launch-config.yaml. By default, W&B checks for the config file in ~/.config/wandb/launch-config.yaml. You can optionally specify a different directory when you activate the launch agent.
The contents of your launch agent’s configuration file will depend on your launch agent’s environment, the launch queue’s target resource, Docker builder requirements, cloud registry requirements, and so forth.
Independent of your use case, there are core configurable options for the launch agent:
max_jobs: maximum number of jobs the agent can execute in parallel
entity: the entity that the queue belongs to
queues: the name of one or more queues for the agent to watch
You can use the W&B CLI to specify universal configurable options for the launch agent (instead of the config YAML file): maximum number of jobs, W&B entity, and launch queues. See the wandb launch-agent command for more information.
The following YAML snippet shows how to specify core launch agent config keys:
```yaml
# Max number of concurrent runs to perform. -1 = no limit
max_jobs: -1
entity: <entity-name>
# List of queues to poll.
queues:
  - <queue-name>
```
Configure a container builder
The launch agent can be configured to build images. You must configure the agent to use a container builder if you intend to use launch jobs created from git repositories or code artifacts. See Create a launch job for more information on how to create a launch job.
W&B Launch supports three builder options:
Docker: The Docker builder uses a local Docker daemon to build images.
Kaniko: Kaniko is a Google project that enables image building in environments where a Docker daemon is unavailable.
Noop: The agent does not try to build images, and instead only pulls pre-built images.
Use the Kaniko builder if your agent is polling in an environment where a Docker daemon is unavailable (for example, a Kubernetes cluster).
To specify an image builder, include the builder key in your agent configuration. For example, the following code snippet shows a portion of the launch config (launch-config.yaml) that specifies to use Docker or Kaniko:
```yaml
builder:
  type: docker | kaniko | noop
```
Configure a container registry
In some cases, you might want to connect a launch agent to a cloud registry. Common scenarios where you might want to connect a launch agent to a cloud registry include:
You want to run a job in an environment other than where you built its image, such as on a powerful workstation or cluster.
You want to use the agent to build images and run these images on Amazon SageMaker or VertexAI.
You want the launch agent to provide credentials to pull from an image repository.
To learn more about how to configure the agent to interact with a container registry, see the Advanced agent set up page.
Activate the launch agent
Activate the launch agent with the launch-agent W&B CLI command:
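For example, to start an agent that polls a single queue (a sketch; replace the placeholders with your own entity and queue names, and verify flag spellings with `wandb launch-agent --help`):

```bash
wandb launch-agent -e <entity-name> -q <queue-name>
```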
In some use cases, you might want to have a launch agent polling queues from within a Kubernetes cluster. See the Advanced queue set up page for more information.
1 - Configure launch queue
The following page describes how to configure launch queue options.
Set up queue config templates
Administer and manage guardrails on compute consumption with Queue Config Templates. Set default, minimum, and maximum values for fields such as memory consumption, GPU, and runtime duration.
After you configure a queue with config templates, members of your team can change the templated fields only within the range you specify.
Configure queue template
You can configure a queue template on an existing queue or create a new queue.
Select View queue next to the name of the queue you want to add a template to.
Select the Config tab. This will show information about your queue such as when the queue was created, the queue config, and existing launch-time overrides.
Navigate to the Queue config section.
Identify the config key-values you want to create a template for.
Replace the value in the config with a template field. Template fields take the form of {{variable-name}}.
Click on the Parse configuration button. When you parse your configuration, W&B will automatically create tiles below the queue config for each template you created.
For each tile generated, you must first specify the data type (string, integer, or float) the queue config can allow. To do this, select the data type from the Type dropdown menu.
Based on your data type, complete the fields that appear within each tile.
Click on Save config.
For example, suppose you want to create a template that limits which AWS instances your team can use. Before you add a template field, your queue config might look similar to the following:
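A sketch of such a SageMaker queue config (the surrounding keys are illustrative; only the instance type matters for this example):

```yaml
RoleArn: <your-role-arn>
ResourceConfig:
  InstanceType: ml.m4.xlarge   # the value you want to templatize
  InstanceCount: 1
  VolumeSizeInGB: 2
```

After you replace the value with a template field, the key reads `InstanceType: {{aws-instance}}`.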
Next, you click on Parse configuration. A new tile labeled aws-instance will appear underneath the Queue config.
From there, you select String as the datatype from the Type dropdown. This will populate fields where you can specify values a user can choose from. For example, in the following image the admin of the team configured two different AWS instance types that users can choose from (ml.m4.xlarge and ml.p3.xlarge):
Dynamically configure launch jobs
Queue configs can be dynamically configured using macros that are evaluated when the agent dequeues a job from the queue. You can set the following macros:
| Macro | Description |
| --- | --- |
| `${project_name}` | The name of the project the run is being launched to. |
| `${entity_name}` | The owner of the project the run is being launched to. |
| `${run_id}` | The ID of the run being launched. |
| `${run_name}` | The name of the run that is launching. |
| `${image_uri}` | The URI of the container image for this run. |
Any custom macro not listed in the preceding table (for example ${MY_ENV_VAR}), is substituted with an environment variable from the agent’s environment.
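For example, a sketch of a Kubernetes queue config that uses macros to pass the image URI and run ID to the job (the key placement is illustrative):

```yaml
spec:
  template:
    spec:
      containers:
        - image: ${image_uri}       # resolved when the agent dequeues the job
          env:
            - name: WANDB_RUN_ID
              value: ${run_id}
            - name: MY_ENV_VAR      # custom macro: filled from the agent's environment
              value: ${MY_ENV_VAR}
```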
Use the launch agent to build images that execute on accelerators (GPUs)
You might need to specify an accelerator base image if you use launch to build images that are executed in an accelerator environment.
This accelerator base image must satisfy the following requirements:
Debian compatibility (the Launch Dockerfile uses apt-get to fetch Python)
Compatibility with your CPU and GPU hardware instruction sets (make sure your CUDA version is supported by the GPU you intend to use)
Compatibility between the accelerator version you provide and the packages installed in your ML algorithm
Installed packages that require extra steps to set up hardware compatibility
How to use GPUs with TensorFlow
Ensure TensorFlow properly utilizes your GPU. To accomplish this, specify a Docker image and its image tag for the builder.accelerator.base_image key in the queue resource configuration.
For example, the tensorflow/tensorflow:latest-gpu base image ensures TensorFlow properly uses your GPU. This can be configured using the resource configuration in the queue.
The following JSON snippet demonstrates how to specify the TensorFlow base image in your queue config:
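A sketch of what that snippet might look like (the key nesting follows the builder.accelerator.base_image path named above):

```json
{
    "builder": {
        "accelerator": {
            "base_image": "tensorflow/tensorflow:latest-gpu"
        }
    }
}
```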
The Launch agent can build images using Docker or Kaniko.
Kaniko: builds a container image in Kubernetes without running the build as a privileged container.
Docker: builds a container image by executing a docker build command locally.
Control the builder type with the builder.type key in the launch agent config: set it to docker, kaniko, or noop to turn off building. By default, the agent Helm chart sets builder.type to noop. Additional keys in the builder section are used to configure the build process.
If no builder is specified in the agent config and a working Docker CLI is found, the agent defaults to using Docker. If Docker is not available, the agent defaults to noop.
Use Kaniko for building images in a Kubernetes cluster. Use Docker for all other cases.
Pushing to a container registry
The launch agent tags all images it builds with a unique source hash. The agent pushes the image to the registry specified in the builder.destination key.
For example, if the builder.destination key is set to my-registry.example.com/my-repository, the agent will tag and push the image to my-registry.example.com/my-repository:<source-hash>. If the image exists in the registry, the build is skipped.
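A minimal sketch of the corresponding agent config (using the example registry named above):

```yaml
builder:
  type: docker
  destination: my-registry.example.com/my-repository
```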
Agent configuration
If you are deploying the agent via our Helm chart, the agent config should be provided in the agentConfig key in the values.yaml file.
If you are invoking the agent yourself with wandb launch-agent, you can provide the agent config as a path to a YAML file with the --config flag. By default, the config will be loaded from ~/.config/wandb/launch-config.yaml.
Within your launch agent config (launch-config.yaml), provide the name of the target resource environment and the container registry for the environment and registry keys, respectively.
The following tabs demonstrate how to configure the launch agent based on your environment and registry.
The AWS environment configuration requires the region key. The region should be the AWS region that the agent runs in.
```yaml
environment:
  type: aws
  region: <aws-region>
builder:
  type: <kaniko|docker>
  # URI of the ECR repository where the agent will store images.
  # Make sure the region matches what you have configured in your
  # environment.
  destination: <account-id>.ecr.<aws-region>.amazonaws.com/<repository-name>
  # If using Kaniko, specify the S3 bucket where the agent will store the
  # build context.
  build-context-store: s3://<bucket-name>/<path>
```
The agent uses boto3 to load the default AWS credentials. See the boto3 documentation for more information on how to configure default AWS credentials.
The Google Cloud environment requires region and project keys. Set region to the region that the agent runs in. Set project to the Google Cloud project that the agent runs in. The agent uses google.auth.default() in Python to load the default credentials.
```yaml
environment:
  type: gcp
  region: <gcp-region>
  project: <gcp-project-id>
builder:
  type: <kaniko|docker>
  # URI of the Artifact Registry repository and image name where the agent
  # will store images. Make sure the region and project match what you have
  # configured in your environment.
  uri: <region>-docker.pkg.dev/<project-id>/<repository-name>/<image-name>
  # If using Kaniko, specify the GCS bucket where the agent will store the
  # build context.
  build-context-store: gs://<bucket-name>/<path>
```
See the google-auth documentation for more information on how to configure default GCP credentials so they are available to the agent.
The Azure environment does not require any additional keys. When the agent starts, it uses azure.identity.DefaultAzureCredential() to load the default Azure credentials.
```yaml
environment:
  type: azure
builder:
  type: <kaniko|docker>
  # URI of the Azure Container Registry repository where the agent will store images.
  destination: https://<registry-name>.azurecr.io/<repository-name>
  # If using Kaniko, specify the Azure Blob Storage container where the agent
  # will store the build context.
  build-context-store: https://<storage-account-name>.blob.core.windows.net/<container-name>
```
Add the AcrPush role if you use the Kaniko builder.
Storage permissions for Kaniko
The launch agent requires permission to push to cloud storage if the agent uses the Kaniko builder. Kaniko uses a context store outside of the pod running the build job.
The recommended context store for the Kaniko builder on AWS is Amazon S3. The following policy can be used to give the agent access to an S3 bucket:
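A minimal sketch of such a policy (the bucket name is a placeholder, and your setup may require additional actions such as listing the bucket):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::<bucket-name>/*"
    }
  ]
}
```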
The Storage Blob Data Contributor role is required in order for the agent to upload build contexts to Azure Blob Storage.
Customizing the Kaniko build
Specify the Kubernetes Job spec that the Kaniko job uses in the builder.kaniko-config key of the agent configuration. For example:
```yaml
builder:
  type: kaniko
  build-context-store: <my-build-context-store>
  destination: <my-image-destination>
  build-job-name: wandb-image-build
  kaniko-config:
    spec:
      template:
        spec:
          containers:
            - args:
                - "--cache=false" # Args must be in the format "key=value"
              env:
                - name: "MY_ENV_VAR"
                  value: "my-env-var-value"
```
Deploy Launch agent into CoreWeave
Optionally deploy the W&B Launch agent to CoreWeave Cloud infrastructure. CoreWeave is a cloud infrastructure purpose-built for GPU-accelerated workloads.
For information on how to deploy the Launch agent to CoreWeave, see the CoreWeave documentation.
You will need to create a CoreWeave account in order to deploy the Launch agent into a CoreWeave infrastructure.
3 - Tutorial: Set up W&B Launch on Kubernetes
You can use W&B Launch to push ML workloads to a Kubernetes cluster, giving ML engineers a simple interface right in W&B to use the resources you already manage with Kubernetes.
W&B uses the Kaniko builder to enable the Launch agent to build Docker images in a Kubernetes cluster. To learn more on how to set up Kaniko for the Launch agent, or how to turn off job building and only use prebuilt Docker images, see Advanced agent set up.
To install Helm and apply or upgrade W&B’s Launch agent Helm chart, you need kubectl access to the cluster with sufficient permissions to create, update, and delete Kubernetes resources. Typically, a user with cluster-admin or a custom role with equivalent permissions is required.
In some use cases, you might want to use CustomResource definitions. CustomResource definitions are useful if, for example, you want to perform multi-node distributed training. See the tutorial for using Launch with multi-node jobs using Volcano for an example application. Another use case might be that you want to use W&B Launch with Kubeflow.
The following YAML snippet shows a sample Launch queue config that uses Kubeflow:
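A sketch, assuming the Kubeflow training operator's PyTorchJob custom resource is installed in your cluster (replica counts and container names are illustrative):

```yaml
kubernetes:
  kind: PyTorchJob
  spec:
    pytorchReplicaSpecs:
      Master:
        replicas: 1
        template:
          spec:
            containers:
              - name: pytorch
                image: '${image_uri}'
      Worker:
        replicas: 2
        template:
          spec:
            containers:
              - name: pytorch
                image: '${image_uri}'
```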
Select the Entity you would like to create the queue in.
Provide a name for your queue in the Name field.
Select Kubernetes as the Resource.
Within the Configuration field, provide the Kubernetes Job workflow spec or Custom Resource spec you configured in the previous section.
Configure a Launch agent with Helm
Use the Helm chart provided by W&B to deploy the Launch agent into your Kubernetes cluster. Control the behavior of the launch agent with the values.yaml file.
Specify the contents that would normally be defined in your launch agent config file (~/.config/wandb/launch-config.yaml) within the launchConfig key in the values.yaml file.
For example, suppose you have a launch agent config that enables you to run a launch agent in EKS that uses the Kaniko Docker image builder:
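Such a config might look like this (it mirrors the launchConfig contents embedded in the values.yaml below):

```yaml
queues:
  - <queue name>
max_jobs: <n concurrent jobs>
environment:
  type: aws
  region: <aws-region>
registry:
  type: ecr
  uri: <my-registry-uri>
builder:
  type: kaniko
  build-context-store: <s3-bucket-uri>
```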
Within your values.yaml file, this might look like:
```yaml
agent:
  labels: {}
  # W&B API key.
  apiKey: ''
  # Container image to use for the agent.
  image: wandb/launch-agent:latest
  # Image pull policy for agent image.
  imagePullPolicy: Always
  # Resources block for the agent spec.
  resources:
    limits:
      cpu: 1000m
      memory: 1Gi

# Namespace to deploy launch agent into
namespace: wandb

# W&B api url (Set yours here)
baseUrl: https://api.wandb.ai

# Additional target namespaces that the launch agent can deploy into
additionalTargetNamespaces:
  - default
  - wandb

# This should be set to the literal contents of your launch agent config.
launchConfig: |
  queues:
    - <queue name>
  max_jobs: <n concurrent jobs>
  environment:
    type: aws
    region: <aws-region>
  registry:
    type: ecr
    uri: <my-registry-uri>
  builder:
    type: kaniko
    build-context-store: <s3-bucket-uri>

# The contents of a git credentials file. This will be stored in a k8s secret
# and mounted into the agent container. Set this if you want to clone private
# repos.
gitCreds: |

# Annotations for the wandb service account. Useful when setting up workload identity on gcp.
serviceAccount:
  annotations:
    iam.gke.io/gcp-service-account:
    azure.workload.identity/client-id:

# Set to access key for azure storage if using kaniko with azure.
azureStorageAccessKey: ''
```
For more information on registries, environments, and required agent permissions see Advanced agent set up.
4 - Tutorial: Set up W&B Launch on SageMaker
You can use W&B Launch to submit launch jobs to Amazon SageMaker to train machine learning models using provided or custom algorithms on the SageMaker platform. SageMaker takes care of spinning up and releasing compute resources, so it can be a good choice for teams without an EKS cluster.
Launch jobs sent to a W&B Launch queue connected to Amazon SageMaker are executed as SageMaker Training Jobs with the CreateTrainingJob API. Use the launch queue configuration to control arguments sent to the CreateTrainingJob API.
Amazon SageMaker uses Docker images to execute training jobs. Images pulled by SageMaker must be stored in the Amazon Elastic Container Registry (ECR). This means that the image you use for training must be stored on ECR.
This guide shows how to execute SageMaker Training Jobs. For information on how to deploy models for inference on Amazon SageMaker, see this example Launch job.
Prerequisites
Before you get started, ensure you satisfy the following prerequisites:
Decide if you want the Launch agent to build Docker images
Decide if you want the W&B Launch agent to build a Docker image for you. There are two options you can choose from:
Permit the launch agent to build a Docker image, push the image to Amazon ECR, and submit SageMaker Training jobs for you. This option offers simplicity for ML engineers who are rapidly iterating on training code.
The launch agent uses an existing Docker image that contains your training or inference scripts. This option works well with existing CI systems. If you choose this option, you will need to manually upload your Docker image to your container registry on Amazon ECR.
Set up AWS resources
Ensure you have the following AWS resources configured in your preferred AWS region:
If you want the launch agent to build images, see the Advanced agent set up for additional permissions required.
The kms:CreateGrant permission for SageMaker queues is required only if the associated ResourceConfig has a specified VolumeKmsKeyId and the associated role does not have a policy that permits this action.
Configure launch queue for SageMaker
Next, create a queue in the W&B App that uses SageMaker as its compute resource:
Select the Entity you would like to create the queue in.
Provide a name for your queue in the Name field.
Select SageMaker as the Resource.
Within the Configuration field, provide information about your SageMaker job. By default, W&B populates a CreateTrainingJob request body in YAML or JSON:
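A sketch of what the populated request body might look like (all values are placeholders you should adjust):

```yaml
RoleArn: <your-role-arn>
ResourceConfig:
  InstanceType: ml.m4.xlarge
  InstanceCount: 1
  VolumeSizeInGB: 2
OutputDataConfig:
  S3OutputPath: <your-output-s3-uri>
StoppingCondition:
  MaxRuntimeInSeconds: 3600
```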
For production workloads and for customers who already have an EKS cluster, W&B recommends deploying the Launch agent to the EKS cluster using this Helm chart.
For production workloads without a current EKS cluster, an EC2 instance is a good option. Though the launch agent instance runs continuously, the agent doesn't need more than a t2.micro-sized EC2 instance, which is relatively affordable.
For experimental or solo use cases, running the Launch agent on your local machine can be a fast way to get started.
Based on your use case, follow the instructions provided in the following tabs to properly configure your launch agent:
W&B strongly encourages that you use the W&B managed helm chart to install the agent in an EKS cluster.
Navigate to the Amazon EC2 Dashboard and complete the following steps:
Click Launch instance.
Provide a name for the Name field. Optionally add a tag.
From the Instance type dropdown, select an instance type for your EC2 container. You do not need more than 1 vCPU and 1 GiB of memory (for example, a t2.micro).
Create a key pair for your organization within the Key pair (login) field. You will use this key pair to connect to your EC2 instance with an SSH client in a later step.
Within Network settings, select an appropriate security group for your organization.
Expand Advanced details. For IAM instance profile, select the launch agent IAM role you created above.
Review the Summary field. If correct, select Launch instance.
Navigate to Instances within the left panel of the EC2 Dashboard on AWS. Ensure that the EC2 instance you created is running (see the Instance state column). Once you confirm your EC2 instance is running, navigate to your local machine’s terminal and complete the following:
Select Connect.
Select the SSH client tab and follow the instructions outlined there to connect to your EC2 instance.
Within your EC2 instance, install the following packages:
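The exact packages depend on your setup; at a minimum the agent needs Python, pip, and the wandb package with the launch extra, plus Docker if the agent will build images. A sketch for Amazon Linux (package names are assumptions for that distribution):

```bash
# Python, pip, and Docker (Docker only if the agent builds images)
sudo yum install -y python3 python3-pip docker
python3 -m pip install --upgrade pip
# wandb with the launch extra provides the launch agent
python3 -m pip install "wandb[launch]"
```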
Now you can proceed to setting up the Launch agent config.
Use the AWS config files located at ~/.aws/config and ~/.aws/credentials to associate a role with an agent that is polling on a local machine. Provide the IAM role ARN that you created for the launch agent in the previous step.
Note that session tokens have a maximum lifetime of 1 hour or 3 days, depending on the principal they are associated with.
Configure a launch agent
Configure the launch agent with a YAML config file named launch-config.yaml.
By default, W&B will check for the config file in ~/.config/wandb/launch-config.yaml. You can optionally specify a different directory when you activate the launch agent with the -c flag.
The following YAML snippet demonstrates how to specify the core agent config options:
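A sketch of the core options (see the Agent configuration section earlier on this page for what each key means):

```yaml
# launch-config.yaml
max_jobs: -1           # -1 = no limit on concurrent jobs
entity: <entity-name>
queues:
  - <queue-name>
```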
If you use image-based jobs, upload the Docker image that contains your launch job to your Amazon ECR repository before you submit new launch jobs.
5 - Tutorial: Set up W&B Launch on Vertex AI
You can use W&B Launch to submit jobs for execution as Vertex AI training jobs. With Vertex AI training jobs, you can train machine learning models using either provided, or custom algorithms on the Vertex AI platform. Once a launch job is initiated, Vertex AI manages the underlying infrastructure, scaling, and orchestration.
W&B Launch works with Vertex AI through the CustomJob class in the google-cloud-aiplatform SDK. The parameters of a CustomJob can be controlled with the launch queue configuration. Vertex AI cannot be configured to pull images from a private registry outside of GCP. This means that you must store container images in GCP or in a public registry if you want to use Vertex AI with W&B Launch. See the Vertex AI documentation for more information on making container images accessible to Vertex jobs.
Prerequisites
Create or access a GCP project with the Vertex AI API enabled. See the GCP API Console docs for more information on enabling an API.
Create a GCP Artifact Registry repository to store images you want to execute on Vertex. See the GCP Artifact Registry documentation for more information.
Create a staging GCS bucket for Vertex AI to store its metadata. Note that this bucket must be in the same region as your Vertex AI workloads in order to be used as a staging bucket. The same bucket can be used for staging and build contexts.
Create a service account with the necessary permissions to spin up Vertex AI jobs. See the GCP IAM documentation for more information on assigning permissions to service accounts.
Grant your service account permission to manage Vertex jobs
| Permission | Resource Scope | Description |
| --- | --- | --- |
| `aiplatform.customJobs.create` | Specified GCP Project | Allows creation of new machine learning jobs within the project. |
| `aiplatform.customJobs.list` | Specified GCP Project | Allows listing of machine learning jobs within the project. |
| `aiplatform.customJobs.get` | Specified GCP Project | Allows retrieval of information about specific machine learning jobs within the project. |
If you want your Vertex AI workloads to assume the identity of a non-standard service account, refer to the Vertex AI documentation for instructions on service account creation and necessary permissions. The spec.service_account field of the launch queue configuration can be used to select a custom service account for your W&B runs.
Configure a queue for Vertex AI
The queue configuration for Vertex AI resources specifies inputs to the CustomJob constructor in the Vertex AI Python SDK, and the run method of the CustomJob. Resource configurations are stored under the spec and run keys:
The spec key contains values for the named arguments of the CustomJob constructor in the Vertex AI Python SDK.
The run key contains values for the named arguments of the run method of the CustomJob class in the Vertex AI Python SDK.
Customization of the execution environment happens primarily in the spec.worker_pool_specs list. A worker pool spec defines a group of workers that will run your job. The worker spec in the default config asks for a single n1-standard-4 machine with no accelerators. You can change the machine type, accelerator type, and count to suit your needs.
For more information on available machine types and accelerator types, see the Vertex AI documentation.
Create a queue
Create a queue in the W&B App that uses Vertex AI as its compute resource:
Select the Entity you would like to create the queue in.
Provide a name for your queue in the Name field.
Select GCP Vertex as the Resource.
Within the Configuration field, provide information about the Vertex AI CustomJob you defined in the previous section. By default, W&B populates a YAML and JSON request body similar to the following:
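A sketch of a minimal Vertex AI queue config, mirroring the defaults described above (a single n1-standard-4 worker with no accelerators; the staging bucket is a placeholder you must set):

```yaml
spec:
  worker_pool_specs:
    - machine_spec:
        machine_type: n1-standard-4
        accelerator_type: ACCELERATOR_TYPE_UNSPECIFIED
        accelerator_count: 0
      replica_count: 1
      container_spec:
        image_uri: ${image_uri}
  staging_bucket: <your-gcs-bucket>
run:
  restart_job_on_worker_restart: false
```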
After you configure your queue, click on the Create Queue button.
You must at minimum specify:
spec.worker_pool_specs: a non-empty list of worker pool specifications.
spec.staging_bucket: a GCS bucket to be used for staging Vertex AI assets and metadata.
Some of the Vertex AI docs show worker pool specifications with all keys in camel case, for example workerPoolSpecs. The Vertex AI Python SDK uses snake case for these keys, for example worker_pool_specs.
Every key in the launch queue configuration should use snake case.
Configure a launch agent
The launch agent is configurable through a config file that is, by default, located at ~/.config/wandb/launch-config.yaml.
If you want the launch agent to build images for you that are executed in Vertex AI, see Advanced agent set up.
Set up agent permissions
There are multiple methods to authenticate as this service account: Workload Identity, a downloaded service account JSON key file, environment variables, the Google Cloud command-line tool, or a combination of these methods.
6 - Tutorial: Set up W&B Launch with Docker
The following guide describes how to configure W&B Launch to use Docker on a local machine for both the launch agent environment and for the queue’s target resource.
Using Docker both to execute jobs and as the launch agent's environment on the same local machine is particularly useful if your compute is on a machine that does not have a cluster management system (such as Kubernetes).
You can also use Docker queues to run workloads on powerful workstations.
This setup is common for users who perform experiments on their local machine, or who have a remote machine that they SSH into to submit launch jobs.
When you use Docker with W&B Launch, W&B first builds an image, and then runs a container from that image with the docker run <image-uri> command. The queue configuration is interpreted as additional arguments that are passed to the docker run command.
Configure a Docker queue
The launch queue configuration (for a Docker target resource) accepts the same options defined in the docker run CLI command.
The agent receives options defined in the queue configuration. The agent then merges the received options with any overrides from the launch job’s configuration to produce a final docker run command that is executed on the target resource (in this case, a local machine).
There are two syntax transformations that take place:
Repeated options are defined in the queue configuration as a list.
Flag options are defined in the queue configuration as a Boolean with the value true.
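For example, a queue configuration like the following sketch (the keys mirror docker run option names; the values are placeholders):

```yaml
env:
  - MY_ENV_VAR=value
  - MY_EXISTING_ENV_VAR   # repeated option expressed as a list
volume: "/mnt/datasets:/mnt/datasets"
rm: true                  # flag option expressed as a Boolean
gpus: all
```

The agent expands this configuration into a command like: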
```bash
docker run \
  --env MY_ENV_VAR=value \
  --env MY_EXISTING_ENV_VAR \
  --volume "/mnt/datasets:/mnt/datasets" \
  --gpus all \
  --rm <image-uri>
```
Volumes can be specified either as a list of strings, or a single string. Use a list if you specify multiple volumes.
When an environment variable is passed without a value, Docker forwards its value from the launch agent's environment. This means that if the launch agent has an environment variable MY_EXISTING_ENV_VAR set, that environment variable is available in the container. This is useful if you want to use other config keys without publishing them in the queue configuration.
The --gpus flag of the docker run command allows you to specify GPUs that are available to a Docker container. For more information on how to use the --gpus flag, see the Docker documentation.
If you build images from a code or artifact-sourced job, you can override the base image used by the agent to include the NVIDIA Container Toolkit.
For example, within your launch queue, you can override the base image to tensorflow/tensorflow:latest-gpu:
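A sketch, using the builder.accelerator.base_image key described earlier on this page:

```yaml
builder:
  accelerator:
    base_image: tensorflow/tensorflow:latest-gpu
```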
Select the Entity you would like to create the queue in.
Enter a name for your queue in the Name field.
Select Docker as the Resource.
Define your Docker queue configuration in the Configuration field.
Click on the Create Queue button to create the queue.
Configure a launch agent on a local machine
Configure the launch agent with a YAML config file named launch-config.yaml. By default, W&B will check for the config file in ~/.config/wandb/launch-config.yaml. You can optionally specify a different directory when you activate the launch agent.
You can use the W&B CLI to specify core configurable options for the launch agent (instead of the config YAML file): maximum number of jobs, W&B entity, and launch queues. See the wandb launch-agent command for more information.
Core agent config options
The following tabs demonstrate how to specify the core config agent options with the W&B CLI and with a YAML config file:
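Sketches of both approaches (placeholders are yours to fill in; verify flag spellings with `wandb launch-agent --help`):

```bash
wandb launch-agent --queue <queue-name> --entity <entity-name> --max-jobs <n>
```

```yaml
# launch-config.yaml
max_jobs: <n-concurrent-jobs>
entity: <entity-name>
queues:
  - <queue-name>
```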
The launch agent on your machine can be configured to build Docker images. By default, these images are stored on your machine’s local image repository. To enable your launch agent to build Docker images, set the builder key in the launch agent config to docker:
```yaml
builder:
  type: docker
```
If you don't want the agent to build Docker images, and instead want to use prebuilt images from a registry, set the builder key in the launch agent config to noop:
```yaml
builder:
  type: noop
```
Container registries
Launch uses external container registries such as Docker Hub, Google Container Registry, Azure Container Registry, and Amazon ECR.
If you want to run a job on a different environment from where you built it, configure your agent to be able to pull from a container registry.
To learn more about how to connect the launch agent with a cloud registry, see the Advanced agent setup page.