1 - Are there best practices for using Launch effectively?

  1. Create the queue before starting the agent, so you can configure the agent to point at it. If no queue exists, the agent raises errors and does not work until a queue is added.

  2. Start the agent with a W&B service account rather than an account that belongs to an individual user.

  3. Use wandb.config to manage hyperparameters so they can be overwritten when a job is re-run. If your script uses argparse, see the W&B guide on combining argparse with wandb.config.
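The third practice can be sketched as follows, assuming the wandb Python package is installed. argparse supplies defaults, and wandb.init(config=...) records them. The script then reads values from the run's config rather than from args, so any values overridden on a re-run are the ones actually used. The argument names and defaults are illustrative only.

```python
import argparse


def parse_args(argv=None):
    """Parse hyperparameters from the command line (illustrative defaults)."""
    parser = argparse.ArgumentParser(description="Example training script")
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--epochs", type=int, default=10)
    return parser.parse_args(argv)


def main(argv=None):
    # Imported here so parse_args can be used without wandb installed.
    import wandb

    args = parse_args(argv)
    # Seed the run config from the parsed arguments.
    run = wandb.init(config=vars(args))
    # Read hyperparameters back from the run config, not from args,
    # so that values overridden for a re-run take effect.
    lr = run.config.lr
    batch_size = run.config.batch_size
    epochs = run.config.epochs
    print(f"training with lr={lr}, batch_size={batch_size}, epochs={epochs}")
    run.finish()
```

Call main() from the script's entry point, for example under an `if __name__ == "__main__":` guard.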

2 - Can I specify a Dockerfile and let W&B build a Docker image for me?

This feature suits projects with stable requirements but frequently changing codebases.

After writing the Dockerfile, point W&B to it in one of three ways:

  • Use Dockerfile.wandb
  • Use W&B CLI
  • Use W&B App

Dockerfile.wandb: place a file named Dockerfile.wandb in the same directory as the W&B run's entrypoint. W&B uses this file instead of its built-in Dockerfile.

W&B CLI: pass the --dockerfile flag to the wandb launch command when queueing a job:

wandb launch --dockerfile path/to/Dockerfile

W&B App: when adding a job to a queue in the W&B App, provide the Dockerfile path in the Overrides section. Enter it as a key-value pair, with "dockerfile" as the key and the path to the Dockerfile as the value.

The following JSON shows an Overrides value that points to a Dockerfile in the local directory:

{
  "args": [],
  "run_config": {
    "lr": 0,
    "batch_size": 0,
    "epochs": 0
  },
  "entrypoint": [],
  "dockerfile": "./Dockerfile"
}
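If you generate Overrides programmatically, a small helper can produce the same structure as the JSON above. The name dockerfile_overrides is hypothetical. Its keys simply mirror the example.

```python
import json


def dockerfile_overrides(dockerfile_path, run_config=None):
    """Hypothetical helper: build an Overrides value that points at a Dockerfile."""
    return {
        "args": [],
        "run_config": dict(run_config or {}),
        "entrypoint": [],
        "dockerfile": dockerfile_path,
    }


overrides = dockerfile_overrides("./Dockerfile", {"lr": 0, "batch_size": 0, "epochs": 0})
# Paste the printed JSON into the Overrides section of the W&B App.
print(json.dumps(overrides, indent=2))
```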

3 - Can Launch automatically provision (and spin down) compute resources for me in the target environment?

It depends on the environment. In Amazon SageMaker and Vertex AI, resources are provisioned for you. In Kubernetes, autoscalers adjust resources based on demand. W&B Solution Architects can help you configure Kubernetes infrastructure for retries, autoscaling, and spot instance node pools. For support, contact support@wandb.com or use your shared Slack channel.

4 - Can you specify secrets for jobs/automations? For instance, an API key which you do not wish to be directly visible to users?

Yes. Follow these steps:

  1. Create a Kubernetes secret in the namespace where the runs execute:
    kubectl create secret generic <secret_name> -n <namespace> --from-literal=<key>=<secret_value>

  2. After creating the secret, configure the queue to inject the secret when runs start. Only cluster administrators can view the secret; end users cannot see it.
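One way to inject the secret is to expose it to the container as an environment variable through a Kubernetes secretKeyRef. This sketch assumes a Kubernetes queue whose configuration follows the shape of a Kubernetes Job spec. The helper secret_env_queue_config and the secret name, key, and variable name are hypothetical, so adapt the structure to your queue's existing configuration.

```python
import json


def secret_env_queue_config(secret_name, secret_key, env_var):
    """Hypothetical helper: queue config fragment that maps a secret to an env var."""
    env_entry = {
        "name": env_var,
        # Kubernetes resolves this reference when the pod starts,
        # so the secret value never appears in the queue config itself.
        "valueFrom": {"secretKeyRef": {"name": secret_name, "key": secret_key}},
    }
    return {"spec": {"template": {"spec": {"containers": [{"env": [env_entry]}]}}}}


config = secret_env_queue_config("my-api-secret", "api-key", "MY_SERVICE_API_KEY")
print(json.dumps(config, indent=2))
```

The key passed to secretKeyRef must match the key used in --from-literal when the secret was created.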

5 - Does Launch support parallelization? How can I limit the resources consumed by a job?

Launch supports scaling jobs across multiple GPUs and nodes. Refer to this guide for details.

Each launch agent has a max_jobs parameter that sets the maximum number of jobs it runs simultaneously. Multiple agents can point to a single queue, provided each agent is connected to launching infrastructure appropriate for that queue.

You can set limits on CPU, GPU, memory, and other resources at the queue or job run level in the resource configuration. For information on setting up queues with resource limits on Kubernetes, refer to this guide.
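In Kubernetes, resource limits go under the container's resources.limits field. This sketch assumes a Kubernetes queue whose configuration follows a Kubernetes Job spec. The helper name and values are placeholders, and the nvidia.com/gpu resource assumes the NVIDIA device plugin is installed in the cluster.

```python
import json


def resource_limits_queue_config(cpu="4", memory="16Gi", gpus=1):
    """Hypothetical helper: Kubernetes queue config fragment with resource limits."""
    limits = {"cpu": cpu, "memory": memory}
    if gpus:
        # Extended resource exposed by the NVIDIA device plugin (assumed installed).
        limits["nvidia.com/gpu"] = gpus
    container = {"resources": {"limits": limits}}
    return {"spec": {"template": {"spec": {"containers": [container]}}}}


print(json.dumps(resource_limits_queue_config(), indent=2))
```

Only limits are set here, so Kubernetes defaults each container's requests to the same values.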

For sweeps, include the following block in the queue configuration to limit the number of concurrent runs:

  scheduler:
    num_workers: 4

6 - How can admins restrict which users have modify access?

Use queue config templates to control which queue fields users who are not team administrators can change. Team administrators decide which fields non-admin users can view and set limits on the values those users can enter. Only team administrators can create or edit queues.

7 - How do I control who can push to a queue?

Each queue belongs to a team. Set the owning entity when you create the queue. To control who can push to it, change that team's membership.

8 - How do I fix a "permission denied" error in Launch?

If you encounter the error message Launch Error: Permission denied, it indicates insufficient permissions to log to the desired project. Possible causes include:

  1. You are not logged in on this machine. Run wandb login in the command line.
  2. The specified entity does not exist. The entity must be your username or the name of an existing team. If necessary, create a team from the Subscriptions page.
  3. You lack permissions for the project. Ask the project creator to change its privacy setting to Open so that you can log runs to it.
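To rule out the first cause, a small check can confirm that credentials are present on the machine. This sketch assumes credentials come from the WANDB_API_KEY environment variable or from a ~/.netrc entry for api.wandb.ai, which is where wandb login stores them for W&B cloud. A self-hosted server uses its own host name. The function has_wandb_credentials is hypothetical, and it checks only that a key exists, not that the key is valid.

```python
import netrc
import os


def has_wandb_credentials(netrc_path=None, environ=None, host="api.wandb.ai"):
    """Hypothetical check: is a W&B API key available via env var or netrc?"""
    environ = os.environ if environ is None else environ
    if environ.get("WANDB_API_KEY"):
        return True
    path = netrc_path or os.path.expanduser("~/.netrc")
    if not os.path.exists(path):
        return False
    try:
        auth = netrc.netrc(path).authenticators(host)
    except netrc.NetrcParseError:
        return False
    # authenticators() returns (login, account, password) or None.
    return auth is not None and bool(auth[2])


if not has_wandb_credentials():
    print("No W&B credentials found; run `wandb login`.")
```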

9 - How do I make W&B Launch work with Tensorflow on GPU?

For TensorFlow jobs using GPUs, specify a custom base image for the container build. This ensures proper GPU utilization during runs. Add an image tag under the builder.accelerator.base_image key in the resource configuration. For example:

{
    "gpus": "all",
    "builder": {
        "accelerator": {
            "base_image": "tensorflow/tensorflow:latest-gpu"
        }
    }
}

In versions prior to W&B 0.15.6, use cuda instead of accelerator as the parent key for base_image.
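When the same tooling must target both older and newer W&B versions, the parent key can be chosen from the version number. This sketch assumes plain numeric versions such as 0.15.6 and does not handle pre-release suffixes. tf_gpu_resource_config and parse_version are hypothetical helpers.

```python
import json


def parse_version(version):
    """Turn '0.15.6' into (0, 15, 6); assumes purely numeric components."""
    return tuple(int(part) for part in version.split(".")[:3])


def tf_gpu_resource_config(wandb_version, base_image="tensorflow/tensorflow:latest-gpu"):
    """Hypothetical helper: TensorFlow GPU resource config for a given W&B version."""
    # Versions before 0.15.6 use "cuda" instead of "accelerator" as the parent key.
    key = "accelerator" if parse_version(wandb_version) >= (0, 15, 6) else "cuda"
    return {"gpus": "all", "builder": {key: {"base_image": base_image}}}


print(json.dumps(tf_gpu_resource_config("0.16.0"), indent=4))
```

Comparing tuples rather than strings keeps versions such as 0.15.10 ordered correctly after 0.15.6.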

10 - How does W&B Launch build images?

The steps for building an image depend on the job source and on whether the resource configuration specifies an accelerator base image. Depending on those two factors, the build process can include the following actions:

  • Install Python using apt
  • Install Python packages
  • Create a user and workdir
  • Copy code into the image
  • Set the entrypoint

11 - I do not like clicking. Can I use Launch without going through the UI?

Yes. The standard wandb CLI includes a launch subcommand to launch jobs. For more information, run:

wandb launch --help

12 - I do not want W&B to build a container for me, can I still use Launch?

Yes. To launch a pre-built Docker image, run the following command, replacing the values in angle brackets with your own:

wandb launch -d <docker-image-uri> -q <queue-name> -E <entrypoint>

This command creates a job and starts a run.

To create a job from an image, use the following command:

wandb job create image <image-name> -p <project> -e <entity>

13 - Is `wandb launch -d` or `wandb job create image` uploading a whole docker artifact and not pulling from a registry?

No. The wandb launch -d command does not upload the image. It references an image that the agent pulls from a registry, so you must push the image to a registry separately. Follow these steps:

  1. Build an image.
  2. Push the image to a registry.

The workflow is as follows:

docker build -t <repo-url>:<tag> .
docker push <repo-url>:<tag>
wandb launch -d <repo-url>:<tag>

The launch agent then spins up a job pointing to the specified container. See Advanced agent setup for examples on configuring agent access to pull images from a container registry.

For Kubernetes, ensure that the Kubernetes cluster pods have access to the registry where the image is pushed.
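The build, push, and launch workflow above can be scripted. In this sketch, the hypothetical build_push_launch function prints each command and runs it only when dry_run is False. The image reference and queue name in the usage line are placeholders.

```python
import shlex
import subprocess


def build_push_launch(image_ref, queue=None, dry_run=True):
    """Hypothetical helper: build and push an image, then queue it with wandb launch -d."""
    commands = [
        ["docker", "build", "-t", image_ref, "."],
        ["docker", "push", image_ref],
        ["wandb", "launch", "-d", image_ref],
    ]
    if queue:
        commands[-1] += ["-q", queue]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # Stop at the first failing step (e.g. a failed push).
            subprocess.run(cmd, check=True)
    return commands


build_push_launch("registry.example.com/team/train:v1", queue="my-queue")
```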

14 - What permissions does the agent require in Kubernetes?

The Kubernetes manifest that W&B provides for the agent creates a role named wandb-launch-agent in the wandb namespace. This role allows the agent to create pods, configmaps, and secrets, and to access pod logs, in the wandb namespace. A cluster role, wandb-cluster-role, lets the agent create pods, access pod logs, create secrets and jobs, and check job status in any specified namespace.
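The permissions described above can be expressed as RBAC objects, as in the following sketch. It is not W&B's official manifest. The create permissions and pod-log access come from the description. The extra get, list, and watch verbs are an assumption so that the agent can monitor what it creates, so compare the result against the official manifest before using it. agent_rbac_manifest is a hypothetical helper.

```python
import json


def agent_rbac_manifest(namespace="wandb"):
    """Hypothetical sketch of RBAC objects matching the described permissions."""
    # "create" comes from the description; get/list/watch are assumptions.
    verbs = ["create", "get", "list", "watch"]
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "wandb-launch-agent", "namespace": namespace},
        "rules": [
            {"apiGroups": [""], "resources": ["pods", "configmaps", "secrets"], "verbs": verbs},
            {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get"]},
        ],
    }
    cluster_role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": "wandb-cluster-role"},
        "rules": [
            {"apiGroups": [""], "resources": ["pods", "secrets"], "verbs": verbs},
            {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get"]},
            {"apiGroups": ["batch"], "resources": ["jobs"], "verbs": verbs},
            {"apiGroups": ["batch"], "resources": ["jobs/status"], "verbs": ["get"]},
        ],
    }
    # A v1 List can be saved to a file and applied with kubectl apply -f.
    return {"apiVersion": "v1", "kind": "List", "items": [role, cluster_role]}


print(json.dumps(agent_rbac_manifest(), indent=2))
```

These roles still need to be bound to the agent's service account with a RoleBinding and a ClusterRoleBinding, which are not shown.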

15 - What requirements does the accelerator base image have?

For jobs that use an accelerator, provide a base image that includes the necessary accelerator components. The accelerator image must meet the following requirements:

  • Compatibility with Debian (the Launch Dockerfile uses apt-get to install Python)
  • Supported CPU and GPU hardware instruction set (confirm the CUDA version compatibility with the intended GPU)
  • Compatibility between the supplied accelerator version and the packages in the machine learning algorithm
  • Installation of packages that require additional steps for hardware compatibility
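To check the first requirement, you can inspect the image's /etc/os-release file, for example with `docker run --rm <image> cat /etc/os-release`. The sketch below treats an image as Debian-compatible if ID or ID_LIKE contains "debian", which covers Debian and Ubuntu. This heuristic is an assumption and does not replace a test build. The function names are hypothetical.

```python
def parse_os_release(text):
    """Parse KEY=value lines from /etc/os-release content into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip().strip('"').strip("'")
    return info


def is_debian_based(os_release_text):
    """Heuristic: True if ID or ID_LIKE names debian (apt-get should be available)."""
    info = parse_os_release(os_release_text)
    ids = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    return "debian" in ids
```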

16 - When multiple jobs in a Docker queue download the same artifact, is any caching used, or is it re-downloaded every run?

No. Launch does not cache artifacts across jobs. Each launch job runs independently and downloads its artifacts again. To share a cache, configure the queue or agent to mount a shared cache directory using Docker arguments in the queue configuration.

For specific use cases, you can also mount the W&B artifacts cache as a persistent volume.
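The shared-cache approach can be sketched as follows. The sketch makes two assumptions: the Docker queue configuration keys map to docker run arguments (for example volume and env), and the artifacts cache location is controlled by the WANDB_CACHE_DIR environment variable. The paths and helper names are hypothetical. to_docker_run_flags only illustrates the correspondence and is not how the agent builds its command.

```python
import json


def shared_cache_queue_config(host_cache_dir, container_cache_dir="/wandb-cache"):
    """Hypothetical helper: Docker queue config that mounts a shared cache."""
    return {
        # Mount the same host directory into every job container.
        "volume": [f"{host_cache_dir}:{container_cache_dir}"],
        # Point the W&B cache at the mounted directory (assumed env var).
        "env": [f"WANDB_CACHE_DIR={container_cache_dir}"],
    }


def to_docker_run_flags(config):
    """Illustrate the docker run flags such a config roughly corresponds to."""
    flags = []
    for key, value in config.items():
        for item in value if isinstance(value, list) else [value]:
            flags += [f"--{key}", str(item)]
    return flags


cfg = shared_cache_queue_config("/mnt/wandb-cache")
print(json.dumps(cfg, indent=2))
print(" ".join(to_docker_run_flags(cfg)))
```

If the job container runs as a non-root user, make sure that user can write to the host directory.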