# W&B Tutorials & Blog

Source: https://docs.wandb.ai/blog

# W&B Courses

Source: https://docs.wandb.ai/courses

# Build AI agents, applications, and models

Source: https://docs.wandb.ai/index

Develop AI models and ship LLM applications with the W&B platform for experiment tracking, evaluation, and observability.

## Weights & Biases


Manage AI model development with experiment tracking, fine-tuning, reporting, hyperparameter sweeps, and a model registry for versioning and reproducibility. Manage AI models in your code with tracing, output evaluation, cost estimates, and a playground to compare large language models (LLMs) and settings.
## Powered by CoreWeave

Managed services that run on CoreWeave infrastructure.

Access leading open-source foundation models through an OpenAI-compatible API, with usage tracking and Weave integration for tracing and evaluation. Now in public preview. Post-train and fine-tune LLMs with Serverless RL and Serverless SFT, managed GPU infrastructure, ART and RULER integration, and auto-scaling for multi-turn agentic tasks. Now in private preview by invitation. Run code in isolated compute environments with lifecycle management, secrets handling, file access, and a Python SDK.
# W&B Models

Source: https://docs.wandb.ai/models

Use W&B Models for experiment tracking, dataset versioning, model management, and collaborative ML development.

W\&B Models is the system of record for ML practitioners who want to organize their models, boost productivity and collaboration, and deliver production ML at scale.

With W\&B Models, you can:

* Track and visualize all [ML experiments](/models/track/).
* Optimize and fine-tune models at scale with [hyperparameter sweeps](/models/sweeps/).
* [Maintain a centralized hub of all models](/models/registry/), with a seamless handoff point to DevOps and deployment.
* Configure custom automations that trigger key workflows for [model CI/CD](/models/automations/).

Machine learning practitioners rely on W\&B Models as their ML system of record to track and visualize experiments, manage model versions and lineage, and optimize hyperparameters.

# Get Started with W&B Models

Source: https://docs.wandb.ai/models/models_quickstart

Get started with W&B Models by tracking experiments, logging metrics, and visualizing results in a few lines of code.

Learn when and how to use W\&B to track, share, and manage model artifacts in your machine learning workflows. This page covers logging experiments, generating reports, and accessing logged data using the appropriate W\&B API for each task.

This tutorial uses the following:

* [W\&B Python SDK](/models/ref/python) (`wandb.sdk`): to log and monitor experiments during training.
* [W\&B Public API](/models/ref/python/public-api) (`wandb.apis.public`): to query and analyze logged experiment data.
* [W\&B Reports and Workspaces API](/models/ref/wandb_workspaces) (`wandb_workspaces`): to create a report to summarize findings.

## Sign up and create an API key

To authenticate your machine with W\&B, you must first generate an API key at [wandb.ai/settings](https://wandb.ai/settings). Copy the API key and store it securely.
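One way to keep the key out of your source code is to read it from the `WANDB_API_KEY` environment variable, which the `wandb` library also checks when it authenticates. The following is a minimal sketch under that assumption, not part of the W\&B SDK: the helper name `require_api_key` and its error message are hypothetical and exist only for illustration.

```python
import os


def require_api_key(env=None):
    """Return the API key from WANDB_API_KEY, or raise if it is missing.

    `require_api_key` is a hypothetical helper, not a W&B function.
    """
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    key = env.get("WANDB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "WANDB_API_KEY is not set. Create a key at https://wandb.ai/settings "
            "and export it before running this script."
        )
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("API key found in the environment.")
    except RuntimeError as err:
        print(f"Cannot authenticate: {err}")
```

Failing early like this surfaces a missing key before any training work starts, rather than partway through a run.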
## Install and import packages

Install the W\&B library and some other packages you will need for this walkthrough.

```bash theme={null}
pip install wandb
```

Import the W\&B Python SDK:

```python theme={null}
import wandb
```

Specify the entity of your team in the following code block:

```python theme={null}
TEAM_ENTITY = ""  # Replace with your team entity
PROJECT = "my-awesome-project"
```

## Train a model

The following code simulates a basic machine learning workflow: training a model, logging metrics, and saving the model as an artifact.

Use the W\&B Python SDK (`wandb.sdk`) to interact with W\&B during training. Log the loss using [`wandb.Run.log()`](/models/ref/python/experiments/run/#method-runlog). Then create a [`wandb.Artifact`](/models/ref/python/experiments/artifact) for the trained model, add the model file to it with [`Artifact.add_file`](/models/ref/python/experiments/artifact#add_file), and save the artifact.

```python theme={null}
import random  # For simulating data

def model(training_data: float) -> float:
    """Model simulation for demonstration purposes."""
    return training_data * 2 + random.randint(-1, 1)

# Simulate weights and noise
weights = random.random()  # Initialize random weights
noise = random.random() / 5  # Small random noise

# Hyperparameters and configuration
config = {
    "epochs": 10,  # Number of epochs to train
    "learning_rate": 0.01,  # Learning rate for the optimizer
}

# Use context manager to initialize and close W&B runs
with wandb.init(project=PROJECT, entity=TEAM_ENTITY, config=config) as run:
    # Simulate training loop
    for epoch in range(config["epochs"]):
        xb = weights + noise  # Simulated input training data
        yb = weights + noise * 2  # Simulated target output (double the input noise)
        y_pred = model(xb)  # Model prediction
        loss = (yb - y_pred) ** 2  # Squared error loss
        print(f"epoch={epoch}, loss={loss}")
        # Log epoch and loss to W&B
        run.log({
            "epoch": epoch,
            "loss": loss,
        })

    # Name for the model artifact
    model_artifact_name = "model-demo"

    # Local path to save the simulated model file
    PATH = "model.txt"

    # Save model locally
    with open(PATH, "w") as f:
        f.write(str(weights))  # Saving model weights to a file

    # Create an artifact object
    # Add locally saved model to artifact object
    artifact = wandb.Artifact(name=model_artifact_name, type="model", description="My trained model")
    artifact.add_file(local_path=PATH)
    artifact.save()
```

The key takeaways from the previous code block are:

* Use `wandb.Run.log()` to log metrics during training.
* Use `wandb.Artifact` to save models (or datasets, and so forth) as artifacts in your W\&B project.

Now that you have trained a model and saved it as an artifact, you can publish it to a registry in W\&B. Use [`wandb.Run.use_artifact()`](/models/ref/python/experiments/run/#method-runuse_artifact) to retrieve the artifact from your project and prepare it for publication in the Model registry. `wandb.Run.use_artifact()` serves two key purposes:

* Retrieves the artifact object from your project.
* Marks the artifact as an input to the run, ensuring reproducibility and traceability. See [Create and view lineage map](/models/registry/lineage) for details.

## View the training data in the dashboard

Log in to your account at [https://wandb.ai/login](https://wandb.ai/login).

Under **Projects** you should see `my-awesome-project` (or whatever you used as a project name above). Click it to enter the workspace for your project.

From here, you can see details about every run you've done. If you re-run the code several times, the project lists one run per execution, each with a randomly generated name.

## Publish the model to the W\&B Registry

To share the model with others in your organization, publish it to a [collection](/models/registry/create_collection) using `wandb.Run.link_artifact()`.
The following code links the artifact to a [registry](/models/registry), making it accessible to your team.

```python theme={null}
# Artifact name specifies the specific artifact version within our team's project
artifact_name = f'{TEAM_ENTITY}/{PROJECT}/{model_artifact_name}:v0'
print("Artifact name: ", artifact_name)

REGISTRY_NAME = "Model"  # Name of the registry in W&B
COLLECTION_NAME = "DemoModels"  # Name of the collection in the registry

# Create a target path for our artifact in the registry
target_path = f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"
print("Target path: ", target_path)

with wandb.init(entity=TEAM_ENTITY, project=PROJECT) as run:
    model_artifact = run.use_artifact(artifact_or_name=artifact_name, type="model")
    run.link_artifact(artifact=model_artifact, target_path=target_path)
```

After running `wandb.Run.link_artifact()`, the model artifact will be in the `DemoModels` collection in your registry. From there, you can view details such as the version history, [lineage map](/models/registry/lineage), and other [metadata](/models/registry/registry_cards).

For additional information on how to link artifacts to a registry, see [Link artifacts to a registry](/models/registry/link_version).

## Retrieve model artifact from registry for inference

To use a model for inference, use `wandb.Run.use_artifact()` to retrieve the published artifact from the registry. This returns an artifact object. Call [`wandb.Artifact.download()`](/models/ref/python/experiments/artifact/#method-artifactdownload) on that object to download the artifact to your local machine.
```python theme={null}
REGISTRY_NAME = "Model"  # Name of the registry in W&B
COLLECTION_NAME = "DemoModels"  # Name of the collection in the registry
VERSION = 0  # Version of the artifact to retrieve

model_artifact_name = f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}"
print(f"Model artifact name: {model_artifact_name}")

with wandb.init(entity=TEAM_ENTITY, project=PROJECT) as run:
    registry_model = run.use_artifact(artifact_or_name=model_artifact_name)
    local_model_path = registry_model.download()
```

For more information on how to retrieve artifacts from a registry, see [Download an artifact from a registry](/models/registry/download_use_artifact).

Depending on your machine learning framework, you may need to recreate the model architecture before loading the weights. This is left as an exercise for the reader, as it depends on the specific framework and model you are using.

## Share your findings with a report

W\&B Report and Workspace API is in Public Preview.

Create and share a [report](/models/reports) to summarize your work. To create a report programmatically, use the [W\&B Report and Workspace API](/models/ref/wandb_workspaces/reports).

First, install the W\&B Reports API:

```bash theme={null}
pip install wandb wandb-workspaces -qqq
```

The following code block creates a report with multiple blocks, including markdown, panel grids, and more. You can customize the report by adding more blocks or changing the content of existing blocks.

Running the code block prints a link to the created report. Open this link in your browser to view the report.
```python theme={null}
import wandb_workspaces.reports.v2 as wr

experiment_summary = """This is a summary of the experiment conducted to train a simple model using W&B."""
dataset_info = """The dataset used for training consists of synthetic data generated by a simple model."""
model_info = """The model is a simple linear regression model that predicts output based on input data with some noise."""

report = wr.Report(
    project=PROJECT,
    entity=TEAM_ENTITY,
    title="My Awesome Model Training Report",
    description=experiment_summary,
    blocks=[
        wr.TableOfContents(),
        wr.H2("Experiment Summary"),
        wr.MarkdownBlock(text=experiment_summary),
        wr.H2("Dataset Information"),
        wr.MarkdownBlock(text=dataset_info),
        wr.H2("Model Information"),
        wr.MarkdownBlock(text=model_info),
        wr.PanelGrid(
            panels=[
                wr.LinePlot(title="Train Loss", x="Step", y=["loss"], title_x="Step", title_y="Loss")
            ],
        ),
    ],
)

# Save the report to W&B
report.save()
```

For more information on how to create a report programmatically or how to create a report interactively with the W\&B App, see [Create a report](/models/reports/create-a-report) in the W\&B Docs Developer guide.

## Query the registry

Use the [W\&B Public APIs](/models/ref/python/public-api) to query, analyze, and manage historical data from W\&B. This can be useful for tracking the lineage of artifacts, comparing different versions, and analyzing the performance of models over time.

The following code block queries registries whose name contains `model` for artifact versions that have either the tag `text-classification` or the alias `latest`. It then iterates through the matching versions and prints the name, collection, aliases, tags, and creation date of each one.
```python theme={null}
import wandb

# Initialize wandb API
api = wandb.Api()

# Select registries whose name contains the string `model`
registry_filters = {
    "name": {"$regex": "model"}
}

# Use the logical $or operator to select artifact versions that have
# either the tag `text-classification` or a `latest` alias
version_filters = {
    "$or": [
        {"tag": "text-classification"},
        {"alias": "latest"}
    ]
}

# Returns an iterable of all artifact versions that match the filters
artifacts = api.registries(filter=registry_filters).collections().versions(filter=version_filters)

# Print out the name, collection, aliases, tags, and created_at date of each artifact found
for art in artifacts:
    print(f"artifact name: {art.name}")
    print(f"collection artifact belongs to: {art.collection.name}")
    print(f"artifact aliases: {art.aliases}")
    print(f"tags attached to artifact: {art.tags}")
    print(f"artifact created at: {art.created_at}\n")
```

For more information on querying the registry, see [Query registry items](/models/registry/search_registry/#query-registry-items-with-mongodb-style-queries).

# W&B Quickstart

Source: https://docs.wandb.ai/models/quickstart

Install W&B and start tracking, visualizing, and managing machine learning experiments in minutes.

Install W\&B to track, visualize, and manage machine learning experiments of any size.

Are you looking for information on W\&B Weave? See the [Weave Python SDK quickstart](/weave/quickstart) or [Weave TypeScript SDK quickstart](/weave/reference/generated_typescript_docs/intro-notebook).

## Sign up and create an API key

To authenticate your machine with W\&B, you need an API key. You can create a personal API key or a service account API key.

To create a personal API key owned by your user ID:

1. Log in to W\&B, click your user profile icon, then click **User Settings**.
2. Click **Create new API key**.
3. Provide a descriptive name for your API key.
4. Click **Create**.
5. Copy the displayed API key immediately and store it securely.

To create an API key owned by a service account:

1. Navigate to the **Service Accounts** tab in your team or organization settings.
2. Find the service account in the list.
3. Click the **action** menu, then click **Create API key**.
4. Provide a name for the API key, then click **Create**.
5. Copy the displayed API key immediately and store it securely.
6. Click **Done**.

You can create multiple API keys for a single service account to support different environments or workflows.

The full API key is only shown once at creation time. After you close the dialog, you cannot view the full API key again. Only the key ID (first part of the key) is visible in your settings. If you lose the full API key, you must create a new API key.

For secure storage options, see [Store API keys securely](/platform/app/settings-page/user-settings/#store-and-handle-api-keys-securely).
## Install the `wandb` library and log in

1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/).

   ```bash theme={null}
   export WANDB_API_KEY=
   ```

2. Install the `wandb` library and log in.

   From the command line:

   ```shell theme={null}
   pip install wandb
   wandb login
   ```

   In a Python script:

   ```bash theme={null}
   pip install wandb
   ```

   ```python theme={null}
   import wandb

   wandb.login()
   ```

   In a notebook:

   ```notebook theme={null}
   !pip install wandb

   import wandb
   wandb.login()
   ```

## Initialize a run and track hyperparameters

In your Python script or notebook, initialize a W\&B run object with [`wandb.init()`](/models/ref/python/experiments/run/). Use a dictionary for the `config` parameter to specify hyperparameter names and values. Within the `with` statement, you can log metrics and other information to W\&B.

```python theme={null}
import wandb

wandb.login()

# Project that the run is recorded to
project = "my-awesome-project"

# Dictionary with hyperparameters
config = {
    'epochs': 10,
    'lr': 0.01
}

with wandb.init(project=project, config=config) as run:
    # Training code here

    # Log values to W&B with run.log()
    run.log({"accuracy": 0.9, "loss": 0.1})
```

See the next section for a complete example that simulates a training run and logs accuracy and loss metrics to W\&B.

A [run](/models/runs/) is a core element of W\&B. You use runs to [track metrics](/models/track/), [create logs](/models/track/log/), track artifacts, and more.

## Create a machine learning training experiment

This mock training script logs simulated accuracy and loss metrics to W\&B.
Copy and paste the following code into a Python script or notebook cell and run it:

```python theme={null}
import wandb
import random

wandb.login()

# Project that the run is recorded to
project = "my-awesome-project"

# Dictionary with hyperparameters
config = {
    'epochs': 10,
    'lr': 0.01
}

with wandb.init(project=project, config=config) as run:
    offset = random.random() / 5
    print(f"lr: {config['lr']}")

    # Simulate a training run
    for epoch in range(2, config['epochs']):
        acc = 1 - 2**-epoch - random.random() / epoch - offset
        loss = 2**-epoch + random.random() / epoch + offset
        print(f"epoch={epoch}, accuracy={acc}, loss={loss}")
        run.log({"accuracy": acc, "loss": loss})
```

Visit [wandb.ai/home](https://wandb.ai/home) to view recorded metrics such as accuracy and loss and how they changed during each training step. Each run appears in the **Runs** column with a generated name.

## Next steps

Explore more features of the W\&B ecosystem:

1. Read the [W\&B Integration tutorials](/models/integrations) that combine W\&B with frameworks like PyTorch, libraries like Hugging Face, and services like SageMaker.
2. Organize runs, automate visualizations, summarize findings, and share updates with collaborators using [W\&B Reports](/models/reports).
3. Create [W\&B Artifacts](/models/artifacts) to track datasets, models, dependencies, and results throughout your machine learning pipeline.
4. Automate hyperparameter searches and optimize models with [W\&B Sweeps](/models/sweeps).
5. Analyze runs, visualize model predictions, and share insights on a [central dashboard](/models/tables).
6. Visit [W\&B AI Academy](https://wandb.ai/site/courses/) to learn about LLMs, MLOps, and W\&B Models through hands-on courses.
7. Visit [weave-docs.wandb.ai](/weave) to learn how to track, experiment with, evaluate, deploy, and improve your LLM-based applications using Weave.

# Overview

Source: https://docs.wandb.ai/models/runs

Learn about the basic building block of W&B, Runs.

A *run* is a single unit of computation logged by W\&B. You can think of a W\&B Run as an atomic element of your whole project. In other words, each run is a record of a specific computation, such as training a model and logging the results, running a hyperparameter sweep, and so forth.

Common use cases for initializing and logging to a run include:

* Training a model and [recording metrics](/models/ref/python/experiments/run#method-run-log) such as accuracy and loss
* Conducting [hyperparameter tuning](/models/sweeps/) and running new experiments
* Conducting a new machine learning experiment with a different model
* Tracking and saving datasets and models as [W\&B Artifacts](/models/artifacts/)
* [Downloading and using](/models/artifacts/download-and-use-an-artifact/) datasets or models used by other members of your team as W\&B Artifacts

To initialize a W\&B run, call the [`wandb.init()`](/models/ref/python/functions/init) function from the W\&B Python SDK. This starts a new run and returns a `wandb.Run` object that you can use to log metrics, artifacts, and other information to the run. For more information about initializing a run, see [Initialize runs](/models/runs/initialize-run).

Each run object has a [unique identifier known as a *run ID*](/models/runs/run-identifiers#unique-run-identifiers). [You can specify a unique ID](/models/runs/run-identifiers#unique-run-identifiers) or let [W\&B randomly generate one for you](/models/runs/run-identifiers#autogenerated-run-ids).

Each run object also has a human-readable, non-unique [run name](/models/runs/run-identifiers#run-name). You can specify a name for your run or let W\&B randomly generate one for you. You can rename a run after initializing it.
W\&B logs your run to a [*project*](/models/track/project-page/). You specify the project when you initialize the run with `wandb.init(project="")`. W\&B creates a new project if the project does not exist. If the project does exist, W\&B logs the run to the project you specified. If you do not specify a project name, W\&B stores the run in a project called `Uncategorized`.

`wandb.init()` returns a `wandb.Run` object that contains properties of the run, such as its ID, name, configuration, and state. Use the run object to log metrics, artifacts, and other information to the run with methods such as `wandb.Run.log()`, `wandb.Run.log_code()`, and `wandb.Run.use_artifact()`.

Each run has a state that describes the current status of the run. See [Run states](/models/runs/run-states) for a full list of possible run states.

[View runs and their properties](/models/runs/view-logged-runs) within the run's project workspace on the W\&B App. You can also programmatically access run properties with the [`wandb.Api.Run`](/models/ref/python/experiments/run) object.

Pass your W\&B entity to the `entity` variable in the code snippets below if you want to follow along. Your entity is your W\&B username or team name. You can find it in the URL of your W\&B App workspace. For example, if your workspace URL is `https://wandb.ai/nico/awesome-project`, then your entity is `nico`.

As an example, consider the following code snippet that initializes a W\&B run and logs some metrics to it:

```python theme={null}
import wandb

entity = "nico"  # Replace with your W&B entity
project = "awesome-project"

with wandb.init(entity=entity, project=project) as run:
    run.log({"accuracy": 0.9, "loss": 0.1})
```

The snippet imports the W\&B Python SDK, initializes a run in the project `awesome-project` under the entity `nico`, and then logs the accuracy and loss of the model to that run.
Within the terminal, W\&B returns:

```bash theme={null}
wandb: Syncing run earnest-sunset-1
wandb: ⭐️ View project at https://wandb.ai/nico/awesome-project
wandb: 🚀 View run at https://wandb.ai/nico/awesome-project/runs/1jx1ud12
wandb:
wandb:
wandb: Run history:
wandb: accuracy ▁
wandb:     loss ▁
wandb:
wandb: Run summary:
wandb: accuracy 0.9
wandb:     loss 0.1
wandb:
wandb: 🚀 View run earnest-sunset-1 at: https://wandb.ai/nico/awesome-project/runs/1jx1ud12
wandb: ⭐️ View project at: https://wandb.ai/nico/awesome-project
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20241105_111006-1jx1ud12/logs
```

W\&B returns two URLs in the terminal output. The **View run** URL directs you to the [specific run's workspace](/models/runs/view-logged-runs), and the **View project** URL directs you to the [project](/models/track/project-page) page.

Logging a metric at a single point in time might not be that useful. A more realistic example in the case of training discriminative models is to log metrics at regular intervals. For example, consider the following code snippet:

```python theme={null}
import wandb
import random

config = {
    "epochs": 10,
    "learning_rate": 0.01,
}

with wandb.init(project="awesome-project", config=config) as run:
    print(f"lr: {config['learning_rate']}")

    # Simulating a training run
    for epoch in range(config['epochs']):
        offset = random.random() / 5
        acc = 1 - 2**-epoch - random.random() / (epoch + 1) - offset
        loss = 2**-epoch + random.random() / (epoch + 1) + offset
        print(f"epoch={epoch}, accuracy={acc}, loss={loss}")
        run.log({"accuracy": acc, "loss": loss})
```

The training script calls `wandb.Run.log()` 10 times. Each time the script calls `wandb.Run.log()`, W\&B logs the accuracy and loss for that epoch.
Within your terminal, you should see output similar to the following:

```bash theme={null}
wandb: Syncing run jolly-haze-4
wandb: ⭐️ View project at https://wandb.ai/nico/awesome-project
wandb: 🚀 View run at https://wandb.ai/nico/awesome-project/runs/pdo5110r
lr: 0.01
epoch=0, accuracy=-0.10070974957523078, loss=1.985328507123956
epoch=1, accuracy=0.2884687745057535, loss=0.7374362314407752
epoch=2, accuracy=0.7347387967382066, loss=0.4402409835486663
epoch=3, accuracy=0.7667969248039795, loss=0.26176963846423457
epoch=4, accuracy=0.7446848791003173, loss=0.24808611724405083
epoch=5, accuracy=0.8035095836268268, loss=0.16169791827329466
epoch=6, accuracy=0.861349032371624, loss=0.03432578493587426
epoch=7, accuracy=0.8794926436276016, loss=0.10331872172219471
epoch=8, accuracy=0.9424839917077272, loss=0.07767793473500445
epoch=9, accuracy=0.9584880427028566, loss=0.10531971149250456
wandb: 🚀 View run jolly-haze-4 at: https://wandb.ai/nico/awesome-project/runs/pdo5110r
wandb: Find logs at: wandb/run-20241105_111816-pdo5110r/logs
```

W\&B captures the simulated training loop within a single run called `jolly-haze-4` because the script calls `wandb.init()` only once.

Copy and paste the run URL that W\&B prints in the previous output into your browser. The URL directs you to the run's workspace in the W\&B App UI, which shows the metrics logged for `jolly-haze-4`.

# Send an alert

Source: https://docs.wandb.ai/models/runs/alert

Send alerts, triggered from your Python code, to your Slack or email

Send alerts to Slack or email when your run crashes or when a custom trigger fires. For example, you can create an alert if the gradient of your training loop starts to blow up (reports NaN) or a step in your ML pipeline completes. Alerts apply to all projects where you initialize runs, including both personal and team projects.
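A custom trigger such as the NaN case mentioned above can be reduced to a small check that runs on every training step. The sketch below is illustrative only: `check_loss` and the `max_loss` limit are hypothetical names chosen for this example, the loss values are simulated, and the comment marks where a call to `run.alert()` would go inside an active W\&B run.

```python
import math


def check_loss(loss, max_loss=1e4):
    """Return an (alert title, alert text) pair if the loss looks unhealthy, else None.

    `check_loss` and `max_loss` are hypothetical; they are not part of the W&B SDK.
    """
    if math.isnan(loss) or math.isinf(loss):
        return ("Loss is not finite", f"Loss became {loss}; training has likely diverged.")
    if loss > max_loss:
        return ("Loss is exploding", f"Loss {loss:.3g} exceeds the limit {max_loss:.3g}.")
    return None


# Simulated loss values; in a real script these come from the training loop.
for step, loss in enumerate([0.9, 0.5, float("nan")]):
    alert = check_loss(loss)
    if alert is not None:
        title, text = alert
        # Inside an active run, this is where run.alert(title=title, text=text) would go.
        print(f"step {step}: {title}: {text}")
```

Keeping the decision in a plain function makes the trigger easy to test without sending real alerts.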
Alert messages then appear in Slack or in your email.

W\&B Alerts require you to add `run.alert()` to your code. Without modifying your code, [Automations](/models/automations/) provide another way to notify Slack based on an event in W\&B, such as when an [artifact](/models/artifacts/) version is created or when a [run metric](/models/runs/) meets or changes by a threshold.

For example, an automation can notify a Slack channel when a new version is created, run an automated testing webhook when the `production` alias is added to an artifact, or start a validation job only when a run's `loss` is within acceptable bounds.

Read the [Automations overview](/models/automations/) or [create an automation](/models/automations/create-automations/).

The following guide only applies to alerts in Multi-tenant Cloud.

If you're using [W\&B Server](/platform/hosting/) in your Private Cloud or on W\&B Dedicated Cloud, refer to [Configure Slack alerts in W\&B Server](/platform/hosting/monitoring-usage/slack-alerts) to set up Slack alerts.

To set up an alert, take these steps, which are detailed in the following sections:

1. Turn on Alerts in your W\&B [User Settings](https://wandb.ai/settings).
2. Add `run.alert()` to your code.
3. Test the configuration.

### 1. Turn on alerts in your W\&B User Settings

In your [User Settings](https://wandb.ai/settings):

* Scroll to the **Alerts** section.
* Turn on **Scriptable run alerts** to receive alerts from `run.alert()`.
* Use **Connect Slack** to pick a Slack channel to post alerts. We recommend the **Slackbot** channel because it keeps the alerts private.
* **Email** alerts go to the email address you used when you signed up for W\&B. We recommend setting up a filter in your email so all these alerts go into a folder and don't fill up your inbox.

You only have to do this the first time you set up W\&B Alerts, or when you'd like to modify how you receive alerts.

### 2. Add `run.alert()` to your code

Add `run.alert()` to your code (either in a notebook or Python script) wherever you'd like it to be triggered:

```python theme={null}
import wandb

with wandb.init() as run:
    run.alert(title="High Loss", text="Loss is increasing rapidly")
```

### 3. Test the configuration

Check your Slack or email for the alert message. If you didn't receive one, make sure you have email or Slack turned on for **Scriptable run alerts** in your [User Settings](https://wandb.ai/settings).

## Example

This simple alert sends a warning when accuracy falls below a threshold. Setting `wait_duration=300` ensures that alerts are sent at least 5 minutes (300 seconds) apart.

```python theme={null}
import wandb
from wandb import AlertLevel

threshold = 0.3  # Example value; set to your acceptable minimum accuracy
acc = 0.25  # Example value; in practice, compute this during training

with wandb.init() as run:
    if acc < threshold:
        run.alert(
            title="Low accuracy",
            text=f"Accuracy {acc} is below the acceptable threshold {threshold}",
            level=AlertLevel.WARN,
            wait_duration=300,
        )
```

## Tag or mention users

Use the at sign `@` followed by the Slack user ID, wrapped in angle brackets, to tag yourself or your colleagues in either the title or the text of the alert. You can find a Slack user ID on the user's Slack profile page.

```python theme={null}
run.alert(title="Loss is NaN", text="Hey <@U1234ABCD> loss has gone to NaN")
```

## Configure team alerts

Team admins can set up alerts for the team on the team settings page: `wandb.ai/teams/your-team`.

Team alerts apply to everyone on your team. W\&B recommends using the **Slackbot** channel because it keeps alerts private.

## Change Slack channel to send alerts to

To change the channel that alerts are sent to, click **Disconnect Slack** and then reconnect. After you reconnect, pick a different Slack channel.

# Semantic run plot legends

Source: https://docs.wandb.ai/models/runs/color-code-runs

Color-code W&B runs based on metrics or config parameters to create visually meaningful chart legends.

Create visually meaningful line plots and plot legends by color-coding your W\&B runs based on metrics or configuration parameters.
Identify patterns and trends across experiments by coloring runs according to their performance metrics (highest, lowest, or latest values). W\&B automatically groups your runs into color-coded buckets based on the values of your selected parameter.

To color runs by a metric or configuration value, turn on key-based colors and then choose the key, Y value, and number of buckets:

### Turn on key-based colors

1. Navigate to your W\&B project.
2. Select the **Workspace** tab from the project sidebar.
3. Click on the **Settings** icon in the top right corner.
4. From the drawer, select **Runs**.
5. In the **Run colors** section, select **Key-based colors**.
6. Configure the following options:
   * From the **Key** dropdown, select the metric you want to use for assigning colors to runs.
   * From the **Y value** dropdown, select the y value you want to use for assigning colors to runs.
   * Set the number of buckets to a value from 2 to 8.

When you use key-based colors, the option to [customize run colors](/models/runs/run-colors) is not available.

The following sections describe how to set the metric and y value, as well as how to customize the buckets used for assigning colors to runs.

### Example: Key-based coloring with loss metric

In this example, runs are colored with a gradient where darker colors represent higher loss values and lighter colors represent lower loss values. The Y value is set to `latest` to use the most recent loss value for each run.

## Set a metric

The metric options in your **Key** dropdown are derived from the key-value pairs [you log to W\&B](/models/runs/color-code-runs/#custom-metrics) and [default metrics](/models/runs/color-code-runs/#default-metrics) defined by W\&B.

### Default metrics

* `Relative Time (Process)`: The relative time of the run, measured in seconds since the start of the run.
* `Relative Time (Wall)`: The relative time of the run, measured in seconds since the start of the run, adjusted for wall clock time.
* `Wall Time`: The wall clock time of the run, measured in seconds since the epoch.
* `Step`: The step number of the run, which is typically used to track the progress of training or evaluation.

### Custom metrics

Color runs and create meaningful plot legends based on custom metrics logged by your training or evaluation scripts. Custom metrics are logged as key-value pairs, where the key is the name of the metric and the value is the metric value.

For example, the following code snippet logs accuracy (`"acc"` key) and loss (`"loss"` key) during a training loop:

```python theme={null}
import wandb
import random

epochs = 10

with wandb.init(project="basic-intro") as run:
    # Block simulates a training loop logging metrics
    offset = random.random() / 5
    for epoch in range(2, epochs):
        acc = 1 - 2 ** -epoch - random.random() / epoch - offset
        loss = 2 ** -epoch + random.random() / epoch + offset

        # Log metrics from your script to W&B
        run.log({"acc": acc, "loss": loss})
```

Within the **Key** dropdown, both `"acc"` and `"loss"` are available options.

## Set a configuration key

The configuration options in your **Key** dropdown are derived from the key-value pairs you pass to the `config` parameter when you initialize a W\&B run. Configuration keys are typically used to log hyperparameters or other settings used in your training or evaluation scripts.

```python theme={null}
import wandb

config = {
    "learning_rate": 0.01,
    "batch_size": 32,
    "optimizer": "adam"
}

with wandb.init(project="basic-intro", config=config) as run:
    # Your training code here
    pass
```

Within the **Key** dropdown, `"learning_rate"`, `"batch_size"`, and `"optimizer"` are available options.

## Set a y value

You can choose from the following options:

* **Latest**: Color based on the Y value at the last logged step for each line.
* **Max**: Color based on the highest Y value logged against the metric.
* **Min**: Color based on the lowest Y value logged against the metric.

## Customize buckets

Buckets are ranges of values that W\&B uses to categorize runs based on the metric or configuration key you select. Buckets are evenly distributed across the range of values for the specified metric or configuration key, and each bucket is assigned a unique color. Runs that fall within a bucket's range are displayed in that bucket's color.

Consider the following configuration:

* **Key** is set to `"Accuracy"` (abbreviated as `"acc"`).
* **Y value** is set to `"Max"`.

With this configuration, W\&B colors each run based on its maximum accuracy value. The colors range from light to deep shades. Lighter colors represent lower accuracy values, while deeper colors represent higher accuracy values.

Six buckets are defined for the metric, with each bucket representing a range of accuracy values. Within the **Buckets** section, the following ranges are defined:

* Bucket 1: (Min - 0.7629)
* Bucket 2: (0.7629 - 0.7824)
* Bucket 3: (0.7824 - 0.8019)
* Bucket 4: (0.8019 - 0.8214)
* Bucket 5: (0.8214 - 0.8409)
* Bucket 6: (0.8409 - Max)

In the resulting line plot, the run with the highest accuracy (0.8232) is colored in a deep purple (Bucket 5), while the run with the lowest accuracy (0.7684) is colored in a light orange (Bucket 2). The other runs are colored based on their accuracy values, with the color gradient indicating their relative performance.

# Pin and compare runs

Source: https://docs.wandb.ai/models/runs/compare-runs

Learn how to use pinned and baseline runs to keep track of important runs and efficiently evaluate model experiments.

Use the W\&B App to organize, identify, and compare important runs, such as top performers, production models, failed experiments, and reference runs.
To accomplish this, you can organize and compare runs with: * **Pinned runs**: Pin runs from any project in your workspace to keep them visible at the top of the runs list. You can pin up to 20 runs in a workspace, including runs from other projects. * **Baseline run**: Specify a baseline run as your reference point for comparisons. The baseline run is always visible in the workspace and at the top of the runs list. In the runs table, summary metric deltas show how each run compares to the baseline. In line plots, the baseline appears with visually distinct styling to help with comparison. Line plot with baseline and pinned runs These features are particularly useful for: * Comparing new experiments against your production model. * [Comparing runs across projects](/models/runs/compare-runs#compare-runs-across-projects). * [Creating a baseline for your experiments to evaluate how new runs perform against it.](/models/runs/compare-runs#manage-and-compare-a-baseline-run) * Tracking multiple candidate models during experimentation. * Evaluating whether new runs improve on your best results. See [Limitations](#limitations). ## Pin runs Pin runs to keep them easily accessible at the top of your workspace. To hide a pinned or baseline run, click the icon. To show a hidden run, click the icon. Pinned runs appear at the top of the run selector with a circular pin icon, separated from other runs by a visual divider. To pin a run: 1. Navigate to your workspace. 2. In the run selector or runs table, find the run you want to pin. 3. Click the **action ()** menu, then select **Pin run**. You can pin up to 20 runs in a workspace. If you have a baseline run, you can pin up to 19 runs because the baseline is implicitly pinned. Runs table with pinned runs To unpin a run, click the pin icon, or follow the procedure to pin the run, but select **Unpin run** instead. 
Runs that you pin to a project only impact your [personal or saved workspace view](/models/track/workspaces#saved-workspace-views), including [runs pinned from another project](/models/runs/compare-runs#compare-runs-across-projects). ## Compare runs across projects Compare runs from different projects by selecting runs from another project and [pinning](/models/runs/compare-runs#pin-runs) them to your current workspace. 1. Navigate to your project's workspace. 2. Click on the **Select runs from another project** button (small square box with rounded corners and a diagonal arrow) at the top of your workspace's run selector or the runs table. Runs table with pinned runs from another project 3. In the modal, select a project from the **Source project** dropdown. 4. Search the source project’s runs to find the run you want to compare. You can search by run name or unique ID. 5. Select the checkbox next to each run you want to compare, then click **Pin runs**. View runs that you pin from another project in the run selector or runs table. Pinned runs from other projects have an open circle icon next to their name. To see which project a pinned run comes from, hover over the run name. Artifacts, logs, and other details for pinned runs link back to the original run. W\&B does not import this data into the current project or workspace. To view the details for a run pinned from another project, click the run. W\&B opens the original run details in a new browser tab. To compare runs from different projects, pin the runs you want to compare, then use [line plots](/models/app/features/panels/line-plot) to visually compare them. ## Manage and compare a baseline run You can designate one run as the baseline for the workspace to use it as a reference point for evaluating other runs in your workspace. In the runs selector and runs table, the baseline run appears at the top alongside pinned runs, and has a bookmark icon instead of a pin. 
In line plots, lines for the baseline run appear bolder than other lines. When hovering over the plot or legend, the baseline run's line is dashed. Demo of comparing another run with the baseline ### Set a baseline run To set a baseline run: 1. Navigate to your workspace. 2. In the run selector or runs table, find the run you want to use as your baseline. 3. Click the **action ()** menu, then select **Set as baseline**. The baseline run appears at the top of the run selector, separated from other runs by a visual divider. The baseline run has a bookmark icon instead of a circle. Runs table with a baseline run and pinned runs ### Change the baseline run Only one run can be the baseline at a time. To change which run is your baseline: 1. Navigate to your workspace. 2. In the run selector or runs table, find the run you want to use as your new baseline. 3. Click the **action ()** menu, then select **Replace baseline**. If the menu item is inactive, ensure that you have at least one pinning slot available. If necessary, unpin a pinned run by clicking the circular pin icon next to a pinned run. 4. The new run becomes the baseline, and the previous baseline is automatically pinned so you can find it easily. Optionally, unpin it by clicking its pin icon. ### Remove the baseline designation To remove the baseline designation: 1. Navigate to your workspace. 2. In the run selector or runs table, find the current baseline run. 3. Click the **action ()** menu, then select **Remove baseline**. If the menu item is inactive, ensure that you have at least one pinning slot available. If necessary, unpin a pinned run by clicking the circular pin icon next to a pinned run. 4. The previous baseline is automatically pinned so you can find it easily. Optionally, unpin it by clicking its pin icon. ### Compare runs to the baseline The baseline run is always visible in line plots for metrics the run has logged. In line plots, lines for the baseline run appear bolder than other lines. 
* Hover over a part of the plot to display a tooltip with values for all visible runs, including the baseline run and pinned runs.

  Demo showing details for all visible runs at a given point
* Hover over the baseline run's legend label to display the line prominently. It appears as a heavy dashed line. Lines for other visible runs appear with reduced saturation.

  Demo showing details for the baseline run
* Hover over another run's legend label to display that run's line prominently and compare it with the baseline, which appears as a heavy dashed line. Lines for other visible runs appear with reduced saturation.

  Demo of comparing another run with the baseline

### Summary metric deltas

When a run is set as the baseline, by default every other run that logs the same summary metric as the baseline run shows the delta (amount of change) of that metric from the baseline. The delta appears to the right of the metric's value in the run's row in the runs table.

By default, the delta is shown with dark gray text on a dark gray background. To turn on semantic coloring for quick visual reference, you can set the **Metric directionality** for a column. With directionality set:

* If the other run **outperforms** (is directionally better than) the baseline, the delta is shown in dark red text with a light red background.
* If the other run **underperforms** (is directionally worse than) the baseline, the delta is shown in dark teal text with a light teal background.

To set the directionality for a metric:

1. In the runs table, hover over the column heading for the metric.
2. Click the **action ()** menu that appears.
3. Set **Metric directionality** to **Higher values are best** or **Lower values are best**.

The following screenshot shows how the runs `nanochat-train-base` and `nanochat-train-mid` compare with the baseline run `nanochat-train`. Delta metrics are shown for `TOTAL_TRAINING_TIME`, `TRAIN/DT`, and `TRAIN/GRAD_NORM`.
Screenshot comparing summary metric deltas from the baseline run ## Hide summary metric deltas in a workspace By default, a workspace with a baseline run always displays summary metric deltas. To hide them for a workspace: 1. In the workspace, click **Settings**. 2. In the drawer that appears, click **Runs**. 3. In the **Baseline** tab, toggle **Show value deltas in the runs table**. 4. Close the workspace settings drawer. ## Use cases This section describes some scenarios where pinned and baseline runs can help guide your experiments. * **Track production models**: Ensure that new models meet your quality bar before deployment. 1. Set your production model as the baseline. 2. Compare all experiments against your deployed model to identify candidates that outperform production. * **Compare hyperparameter experiments**: Evaluate hyperparameter sweeps or manual experiments against your best-known configuration. 1. Set your best known configuration as the baseline. 2. Pin promising candidates as you discover them. 3. Use the line plots to visually compare runs against the baseline. 4. Continue to update the baseline as you find better configurations. ## Example workflow This section illustrates how pinned and baseline runs can help you to compare runs. 1. Run this example code, which simulates a hyperparameter-tuning scenario with a series of runs. Replace placeholders surrounded with angle brackets (`<>`) with your own values. 
```python theme={null}
import wandb
import random
import math

def train_model(learning_rate, batch_size, run_name, tags=None):
    """Simulate training a model with given hyperparameters."""
    config = {
        "learning_rate": learning_rate,
        "batch_size": batch_size,
        "optimizer": "adam",
        "architecture": "resnet50"
    }

    with wandb.init(
        # Replace with your team and project name
        project="hyperparameter-tuning",
        entity="",
        name=run_name,
        config=config,
        tags=tags or []
    ) as run:
        # Simulate training loop
        for epoch in range(50):
            # Simulated metrics
            accuracy = 0.6 + 0.3 * (1 - math.exp(-learning_rate * epoch / 10))
            loss = 1.0 * math.exp(-learning_rate * epoch / 10)

            run.log({
                "epoch": epoch,
                "accuracy": accuracy,
                "loss": loss
            })

# Create baseline run with standard configuration
train_model(
    learning_rate=0.001,
    batch_size=64,
    run_name="baseline-config",
    tags=["baseline", "production"]
)

# Experiment with different learning rates
train_model(
    learning_rate=0.003,
    batch_size=64,
    run_name="lr-experiment-0.003",
    tags=["experiment"]
)

train_model(
    learning_rate=0.0001,
    batch_size=64,
    run_name="lr-experiment-0.0001",
    tags=["experiment"]
)
```

After running this code, your workspace has three runs.

2. Set `baseline-config` as your baseline run.
3. Pin `baseline-config` to keep it visible.
4. Compare the experiment runs against the baseline.
   * In the runs table, review the summary metric deltas next to each run's values to compare the run to the baseline.
   * In line plots, compare the performance of one or more runs to the baseline, which is always visible.
5. Pin promising experiments for further investigation.

In this example, after 50 epochs, `lr-experiment-0.003` has the highest accuracy (`~0.604`) and the lowest loss (`~0.985`).
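
To see how the final values in this workflow come about, and how a summary metric delta relates to metric directionality, the simulated formulas can be evaluated offline without logging anything to W\&B. The following is a minimal sketch under stated assumptions: the helper names `final_metrics` and `compare_to_baseline` and the `HIGHER_IS_BETTER` map are hypothetical and not part of the W\&B SDK, and the delta is assumed to be the run's value minus the baseline's value.

```python
import math

# Learning rates used by the three simulated runs in the example workflow.
LEARNING_RATES = {
    "baseline-config": 0.001,
    "lr-experiment-0.003": 0.003,
    "lr-experiment-0.0001": 0.0001,
}

# Hypothetical directionality map: True means higher values are best.
HIGHER_IS_BETTER = {"accuracy": True, "loss": False}


def final_metrics(learning_rate, epochs=50):
    """Return the metrics the simulated loop logs at its last epoch."""
    last_epoch = epochs - 1  # range(50) ends at epoch 49
    decay = math.exp(-learning_rate * last_epoch / 10)
    return {"accuracy": 0.6 + 0.3 * (1 - decay), "loss": 1.0 * decay}


def compare_to_baseline(run_metrics, baseline_metrics):
    """Return {metric: (delta, verdict)} relative to the baseline."""
    result = {}
    for name, value in run_metrics.items():
        delta = value - baseline_metrics[name]  # assumed: run minus baseline
        if delta == 0:
            verdict = "same"
        elif (delta > 0) == HIGHER_IS_BETTER[name]:
            verdict = "outperforms"
        else:
            verdict = "underperforms"
        result[name] = (delta, verdict)
    return result


baseline = final_metrics(LEARNING_RATES["baseline-config"])
for run_name, lr in LEARNING_RATES.items():
    if run_name == "baseline-config":
        continue
    comparison = compare_to_baseline(final_metrics(lr), baseline)
    for metric, (delta, verdict) in comparison.items():
        print(f"{run_name} {metric}: {delta:+.4f} ({verdict})")
```

Because the learning rates are small, the exponential term barely decays over 50 epochs, so the three runs end close together. The sign of the delta alone does not say whether a run is better: a negative loss delta counts as an improvement only because lower loss is best.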
## Limitations

The following features are not yet supported for pinned and baseline runs:

* **Grouping**: When [viewing runs](/models/runs#view-logged-runs) in the run selector or runs table, if runs are grouped by a column, pinned and baseline runs are not visually distinct from other runs.
* **Reports**: In a run set in a [W\&B Report](/models/reports), pinned and baseline runs are not visually distinct from other runs.
* **Workspace view only**: The baseline does not appear when viewing a single run's workspace.
* **Line plots only**: Baseline comparison is available only for line plots, and is not yet available for other panels such as bar plots or media panels.

# View runs in a project

Source: https://docs.wandb.ai/models/runs/customize-run-display

Details about customizing how runs are displayed in your project's runs table

View all runs logged to your W\&B project in the **Runs** tab of your project sidebar. Within the Runs tab is the *Runs table*. The Runs table shows details about all of your runs in a project. Use the Runs table to compare runs, sort runs by specific columns, and organize runs into groups.

The following image shows the Runs table for a project named `deep-drive`:

Runs table

## Manage columns

The following sections describe how to customize the Runs table.

### Add columns

Add columns in the Runs table to customize which properties associated with your project are visible.

To add columns in the Runs table:

1. In the project sidebar, select the **Runs** tab.
2. Above the list of runs, click the **Columns** (six horizontal dashes) button.
3. Select the name of a property within the **Hidden** section of the modal.
4. Drag columns to change their order.
5. Click **Close**.

### Remove columns

Remove columns in the Runs table to customize which properties associated with your project are visible.

To remove columns in the Runs table:

1. In the project sidebar, select the **Runs** tab.
2. Above the list of runs, click the **Columns** (six horizontal dashes) button.
3. Select the name of a property within the **Visible & Pinned** section of the modal.
4. Click **Close**.

### Move columns

To move columns in the Runs table:

1. In the project sidebar, select the **Runs** tab.
2. Drag a column to the left or right.

### Pin columns

Pinned columns are shown on the left-hand side. Unpinned columns are shown on the right-hand side of the Runs tab.

If you pin a column in the **Runs** tab, it is also pinned in the **Workspace** tab. Similarly, if you pin a column in the **Workspace** tab, it is also pinned in the **Runs** tab.

To pin a column:

1. In the project sidebar, navigate to the **Runs** tab.
2. Click the **Columns** (six horizontal dashes) button.
3. Within the **Visible & Pinned** section of the modal, click on the pin icon next to the column name you want to pin.

To unpin a column:

1. In the project sidebar, navigate to the **Runs** tab.
2. Either hover over the column name, then click its **action ()** menu, or click the **Columns** (six horizontal dashes) button and click on the pin icon next to the column name you want to unpin.
3. Click **Unpin column**.

W\&B persists columns you pin or unpin in the Runs table in the Runs selector of the Workspace tab.

### Hide columns

To hide columns in the Runs table:

1. In the project sidebar, select the **Runs** tab.
2. Hover over the column name, click the **action ()** menu that appears.
3. Click **Hide column**.

To view all columns that are currently hidden, click **Columns**.

## Sort runs by column

Sort runs by any visible column in the Runs table. This is particularly useful if you want to view the best (or worst) recorded value.

To sort the list of runs by any visible column:

1. Hover over the column name, then click its **action ()** menu.
2. Optionally hover over **Show latest**. From the dropdown, select **latest**, **min**, or **max**.
3. Click **Sort ascending** or **Sort descending**.
The following animation shows sorting runs by the maximum value of a logged metric: Sort by min/max values W\&B persists the sort order you select in the Runs table in the Runs selector of the Workspace tab. ## Export runs table to CSV Export the table of all your runs, hyperparameters, and summary metrics to a CSV with the download button. 1. In the project sidebar, select the **Runs** tab. 2. Above the list of runs, click the **Download** (downward arrow) button. # Delete runs Source: https://docs.wandb.ai/models/runs/delete-runs Delete runs from a W&B project using the W&B App or the Public API, and learn how deleted run data is removed from storage. ## Delete runs Delete runs from a project with the W\&B App or the Python API. 1. Navigate to the project that contains the runs you want to delete. 2. Select the **Runs** tab. 3. Select the checkbox next to the runs you want to delete. 4. Choose the **Delete** button (trash can icon) above the table. 5. From the drawer that appears, choose **Delete**. For projects that contain a large number of runs, you can use either the search bar to filter runs you want to delete using Regex or the filter button to filter runs based on their status, tags, or other properties. You can delete runs programmatically with [`Run.delete()`](/models/ref/python/public-api/run#method-run-delete). Set `delete_artifacts=True` if you also want to remove artifacts associated with the run. ```python theme={null} import wandb api = wandb.Api() runs = api.runs("/") for run in runs: if run.state == "finished": # Replace with your own condition run.delete(delete_artifacts=False) ``` For the full method signature and behavior, see the [`Run.delete` reference](/models/ref/python/public-api/run#method-run-delete). To remove individual files attached to a run, like logged media: 1. Obtain the relevant file handles with [`Run.files()`](/models/ref/python/public-api/run#method-run-files). 2. 
Use [`File.delete()`](/models/ref/python/public-api/file#method-file-delete) to delete individual files.

A run ID cannot be reused, even after the run is deleted. If you initialize a new run with the ID of a deleted run, the new run fails with an error.

When you delete a run and choose to delete associated artifacts, the artifacts are permanently removed and can't be recovered, even if the run is restored later. This includes artifacts linked to the Registry.

## Run deletion flowchart

The following diagram illustrates the complete run deletion process, including the handling of associated artifacts and Registry links:

```mermaid theme={null}
graph TB
    Start([User Initiates<br/>Run Deletion]) --> RunSelect[Select Runs<br/>to Delete]
    RunSelect --> DeletePrompt{Delete Associated<br/>Artifacts?}
    DeletePrompt -->|No| DeleteRunOnly[Delete Run Only<br/><br/>- Run metadata removed<br/>- Artifacts remain available<br/>- Can still access artifacts]
    DeletePrompt -->|Yes| CheckArtifacts[Check for<br/>Associated Artifacts]
    CheckArtifacts --> HasRegistry{Artifacts Linked to<br/>Model Registry?}
    HasRegistry -->|Yes| RegistryWarning[⚠️ Warning<br/><br/>Registry models will be deleted<br/>Production aliases affected]
    HasRegistry -->|No| DirectDelete
    RegistryWarning --> ConfirmRegistry{Confirm Registry<br/>Model Deletion?}
    ConfirmRegistry -->|No| DeleteRunOnly
    ConfirmRegistry -->|Yes| DirectDelete[Delete Run + Artifacts<br/><br/>- Run metadata removed<br/>- Artifacts permanently deleted<br/>- Registry links removed<br/>- Cannot be recovered]
    DeleteRunOnly --> PartialEnd([Run Deleted<br/>Artifacts Preserved])
    DirectDelete --> FullEnd([Run + Artifacts<br/>Permanently Deleted])

    style Start fill:#e1f5fe,stroke:#333,stroke-width:2px,color:#000
    style DeletePrompt fill:#fff3e0,stroke:#333,stroke-width:2px,color:#000
    style RegistryWarning fill:#ffecb3,stroke:#333,stroke-width:2px,color:#000
    style DirectDelete fill:#ffebee,stroke:#333,stroke-width:2px,color:#000
    style DeleteRunOnly fill:#e8f5e9,stroke:#333,stroke-width:2px,color:#000
    style PartialEnd fill:#c8e6c9,stroke:#333,stroke-width:2px,color:#000
    style FullEnd fill:#ffcdd2,stroke:#333,stroke-width:2px,color:#000
```

## When deleted run data is removed from storage

On [W\&B Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) and [W\&B Self-Managed](/platform/hosting/hosting-options/self-managed), the `GORILLA_DATA_RETENTION_PERIOD` environment variable controls how long **deleted run data** is retained before it can be permanently removed from object storage. **Artifacts are not removed by this setting**; they follow the artifact deletion and garbage collection flow described in [Delete an artifact](/models/artifacts/delete-artifacts).

Setting or changing `GORILLA_DATA_RETENTION_PERIOD` is irreversible for data past the retention window. Back up your database and bucket before enabling or tightening retention. See [Configure environment variables](/platform/hosting/env-vars) for the reference table and warnings.

Even after run or file deletion and retention processing, **bucket usage can lag** while background jobs catch up. W\&B does not guarantee immediate reclamation of object storage. For a full overview of artifacts versus run data, timing expectations, and optional operator actions, see [Manage bucket storage and costs](/platform/hosting/managing-bucket-storage).

If deletions do not appear as expected in the W\&B App when using the Public API, upgrade the W\&B Python SDK to a current release and retry.

# Filter runs

Source: https://docs.wandb.ai/models/runs/filter-runs

Learn how to filter runs in the Runs table using the expression editor.
Filter runs based on their name, state, [tags](#filter-runs-with-tags), or other properties with the expression editor in the Runs table.

When you add a filter, you first choose a field (for example, tags, timestamp, or entity). Each field has an underlying type, such as text, time, or ID. The list of operators you see (for example, is, in, ≥, within last) depends on this type. After you choose a field, the UI only shows operators that are valid for that field's type.

## Common operators by type

| Filter type | Example fields      | Common operators       | Example usage                      |
| ----------- | ------------------- | ---------------------- | ---------------------------------- |
| Tags        | `tags`              | is, is not, in, not in | `tags is "baseline"`               |
| Time        | `created timestamp` | ≤, ≥, within last      | `created timestamp` ≥ `01/16/2026` |
| String      | `state`             | =, ≠, IN, NOT IN       | `state = "finished"`               |

The above table shows only a subset of available fields and operators. The expression editor shows all available fields and operators.

## Create a filter expression

1. Navigate to the **Runs** tab from the project sidebar.
2. Select the **Filter** button, which looks like a funnel, above the runs table.
3. From left to right, select a column name, a logical operator, and a filter value to create a filter expression. The filter is applied as soon as you select a filter value.
4. Optionally select **New Filter** to apply additional AND or OR conditions.
5. Optionally select **New group** to group filters together with parentheses. This allows you to create complex filter expressions such as [A AND (B OR C)](#example-combine-filters-with-and-and-or-conditions).
6. Close the filter expression editor by clicking the **x** icon in the top right corner.

The following image filters runs based on loss values less than or equal to `1`:

Incorrect predictions filter

The following sections show some examples of how to filter runs in the Runs table.
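
A grouped expression such as `A AND (B OR C)` can also be written down in code, which makes the nesting explicit. The following is a minimal sketch, not the expression editor's own implementation. It assumes that the Public API's `wandb.Api().runs()` accepts a MongoDB-style `filters` dictionary and that summary metrics are addressed with a `summary_metrics.` prefix. The helpers `and_`, `or_`, and `fetch_runs` are hypothetical names introduced for illustration.

```python
def and_(*conditions):
    """Combine conditions so that every one of them must match."""
    return {"$and": list(conditions)}


def or_(*conditions):
    """Combine conditions so that at least one of them must match."""
    return {"$or": list(conditions)}


# A: loss is at most 1. B: the run finished. C: the run crashed.
filter_a = {"summary_metrics.loss": {"$lte": 1}}
filter_b = {"state": "finished"}
filter_c = {"state": "crashed"}

# A AND (B OR C): the inner or_() call plays the role of a filter group.
combined = and_(filter_a, or_(filter_b, filter_c))

# A tag filter comparable to `tags is "baseline"`.
tag_filter = {"tags": {"$in": ["baseline"]}}


def fetch_runs(path, filters):
    """Query runs at an "entity/project" path (needs network access and an API key)."""
    import wandb  # imported lazily so the filter-building code above runs offline

    api = wandb.Api()
    return list(api.runs(path, filters=filters))
```

Building the dictionary separately from the query keeps the expression easy to inspect or print before you pass it, together with your own entity and project path, to `fetch_runs`.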
### Example: Filter runs with tags

Filter runs based on their tags:

1. Click on the **Runs** tab from the project sidebar.
2. Select the **Filter** button, which looks like a funnel, above the runs table.
3. From the first dropdown menu, select `"Tags"`.
4. Select is, is not, in, or not in from the second dropdown menu.
5. Enter the tag name you want to filter by from the third dropdown menu.

Filter runs by tags

### Example: Combine filters with AND and OR conditions

This section describes how to create a complex filter expression like `A AND (B OR C)`. In this example, A is `loss` ≤ `1`, B is `State` = `finished`, and C is `State` = `crashed`.

1. Click on the **Runs** tab from the project sidebar.
2. Select the **Filter** button, which looks like a funnel, above the runs table.
3. From left to right, select `loss` from the first dropdown menu, select ≤ from the second dropdown menu, and enter `1` in the third dropdown menu. This creates filter A.
4. Click **New Filter** and select **New group** to create a new group for filters B and C.
5. For filter B, select `State` from the first dropdown menu, select `=` from the second dropdown menu, and select `finished` from the third dropdown menu.
6. Click **New Filter** to create filter C, and select `State` from the first dropdown menu, select `=` from the second dropdown menu, and select `crashed` from the third dropdown menu.

The following image demonstrates how the resulting filter expression appears in the filter expression editor:

Combine filters with AND and OR conditions

## Default filters

By default, W\&B provides the following filters:

* **Show only my works**: Shows only runs created by the current user.
* **Hide crashed runs**: Hides runs with the `crashed` state.

Default filters appear as toggles below the **New filter** button in the filter expression editor.

## Remove a filter

To remove a filter from the Runs table:

1.
Click on the **Filter** button, which looks like a funnel, above the runs table. 2. Select the `x` icon next to the filter you want to remove. # Fork a run Source: https://docs.wandb.ai/models/runs/forking Explore different parameters or models from a specific point in an experiment without impacting the original run. The ability to fork a run is in active development. It is in preview for Multi-tenant Cloud and Dedicated Cloud, and not yet available in Self-Managed. You can explore different hyperparameters or models from a specific point in an experiment without impacting the original run. To do this, fork from an existing W\&B run. When you fork from a run, W\&B creates a new run using the source run’s [unique ID](/models/runs/run-identifiers#unique-run-identifiers) and a specified step. Summary metrics from the source run are copied to the forked run. The forked run shares all history and files from the source run up to the specified step. After the fork step, you can log new data to the forked run independently of the original run. View a [live demo](https://wandb.ai/wandb/test-fork-run/workspace?nw=nwuserjuliarose) of a forked run produced by the code below. * Forking a run requires [`wandb`](https://pypi.org/project/wandb/) SDK version >= 0.16.5 * Forking a run requires monotonically increasing steps. You cannot fork from a run that uses non-monotonic steps defined with [`define_metric()`](/models/ref/python/experiments/run#define_metric). Non-monotonic steps break the chronological order of run history and system metrics. Specify the source run's unique `run ID` and the `step` you want to start the forked run from as arguments to `fork_from` in [`wandb.init()`](/models/ref/python/functions/init). ## Fork from a previously logged run The following code snippet shows how to fork from a run that was previously logged to W\&B. First, obtain the run ID of the run you want to fork from. 
Next, specify the run ID and the step you want to fork from as arguments to `fork_from` in [`wandb.init()`](/models/ref/python/functions/init). Copy and paste the following code into a Python script or notebook cell. Replace ``, ``, and `` with your own values: ```python theme={null} import wandb # The unique ID of the source run to fork from source_run_id = "" # Specify the step to fork from fork_step = 200 # Fork the run with wandb.init( project="", entity="", fork_from=f"{source_run_id}?_step={fork_step}", ) as forked_run: pass ``` ## Fork from a run in the same script The following code snippet shows how to create a run and fork from that run within the same script. This might occur if you want to fork from a run that you just created without having to look up the run ID in the W\&B App. First, initialize a run and log some data. Next, use the original run object's `id` property to obtain the run ID of that run. Finally, initialize a new run and pass the original run's ID and the step you want to fork from as arguments to `fork_from` in [`wandb.init()`](/models/ref/python/functions/init). ```python theme={null} import wandb # Initialize a run with wandb.init( project="", entity="" ) as original_run: # ...training logic goes here ... pass # Specify the step to fork from fork_step = int("") # Use the original run's ID and specify the step to fork from with wandb.init( project="", entity="", fork_from=f"{original_run.id}?_step={fork_step}", ) as forked_run: # ...training logic goes here ... pass ``` Use the `original_run.id` property to obtain the unique run ID of the original run. ### Example script For example, the following code example shows how to first fork a run and then how to log metrics to the forked run starting from a training step of 200. Copy and paste the following code into a Python script or notebook cell. Replace `` and `` with your own values. 
```python theme={null} import wandb import math # Initialize the first run and log some metrics with wandb.init( project="", entity="" ) as run1: for i in range(300): run1.log({"metric": i}) # Fork from the first run at a specific step and log the # metric starting from step 200 with wandb.init( project="", entity="", fork_from=f"{run1.id}?_step=200" ) as run2: # Continue logging in the new run # For the first few steps, log the metric as is from run1 # After step 250, start logging the spikey pattern for i in range(200, 300): if i < 250: # Continue logging from run1 without spikes metric_value = i else: # Introduce the spikey behavior starting from step 250 metric_value = i + (2 * math.sin(i / 3.0)) # Apply a subtle spikey pattern # Log both metrics in a single call to ensure they're # logged at the same step run2.log({ "metric": metric_value, "additional_metric": i * 1.1 }) ``` **Rewind and forking compatibility** Forking complements a [`rewind`](/models/runs/rewind/) by providing more flexibility in managing and experimenting with your runs. When you fork from a run, W\&B creates a new branch off a run at a specific point to try different parameters or models. When you rewind a run, W\&B lets you correct or modify the run history itself. # Organize runs Source: https://docs.wandb.ai/models/runs/grouping Organize your runs into groups and other properties. Organize your runs into *groups*. A group is a collection of runs that share a common purpose, such as training runs for a specific model or evaluation runs for a specific dataset. You can also organize runs by other properties such as *job type*. [Job types](/models/runs/grouping#organize-runs-by-job-type) indicate the function of a run, such as `preprocessing`, `training`, or `evaluation`. ## Organize runs into groups You can add runs to a group programmatically using the W\&B Python SDK or interactively in the W\&B App. 
W\&B stores a run's group name in the run's [`wandb.Run.group`](/models/ref/python/experiments/run#property-run-group) property.

Programmatically add one or more runs to a group with the W\&B Python SDK. Pass the name of your group as an argument to the `group` parameter when you initialize a run with `wandb.init(group="<group-name>")`. You can use group names to organize and filter runs in the W\&B App.

The following example creates three groups named `A`, `B`, and `C`. Each group contains three runs.

```python
import wandb

entity = "<entity>"
project = "<project>"

for group in ["A", "B", "C"]:
    for i in range(3):
        with wandb.init(entity=entity, project=project, group=group, name=f"{group}_run_{i}") as run:
            # Simulate some training
            for step in range(100):
                run.log({
                    "acc": 0.5 + (step / 100) * 0.3 + (i * 0.05),
                    "loss": 1.0 - (step / 100) * 0.5
                })
```

In the project's workspace, you can view runs organized by group. The following image illustrates organizing the runs table by group name. Three groups named `A`, `B`, and `C` appear in the runs table, each containing three runs.

Runs table grouped by group name

To add runs to a group in the W\&B App:

1. Navigate to your W\&B project.
2. Select the **Runs** tab from the project sidebar.
3. Above the list of runs, click the **Group** button.
4. Click the checkbox next to one or more runs you want to group.
5. Select **Move to group**.
6. In the drawer, select an existing group or create a new group.

### View groups

View runs organized by group in the W\&B App:

1. In your project sidebar, select the **Runs** tab.
2. Above the list of runs, click the **Group** button.
3. From the dropdown, select a **Group**.

### Move runs between groups

Move runs from one group to another group:

1. Navigate to your W\&B project.
2. Select the **Runs** tab from the project sidebar.
3. Select one or more runs by clicking their checkboxes.
4. Above the table, click **Move to group**.
5. Within the drawer, select the target group or create a new group.
6. Click **Move**.

### Remove runs from a group

1. Navigate to your W\&B project.
2. Select the **Runs** tab from the project sidebar.
3. Above the list of runs, click the **Group** button.
4. From the dropdown, select the **X** next to the name of the group you want to remove.

### Delete a group

To delete a group, remove all runs from it. This automatically deletes the group.

## Organize runs by job type

Organize runs by their *job type*. A job type indicates the function of a run, such as `preprocessing`, `training`, or `evaluation`.

View a run's job type by accessing the run's [`wandb.Run.job_type`](/models/ref/python/experiments/run#property-run-job-type) property.

Add a job type to a run by passing the `job_type` parameter to `wandb.init(job_type="<job-type>")`.

For example, the following code snippet creates runs with job types of either `training` or `evaluation`:

```python
import wandb

entity = "<entity>"
project = "<project>"

for job_type in ["training", "evaluation"]:
    for i in range(2):
        with wandb.init(entity=entity, project=project, job_type=job_type, name=f"{job_type}_run_{i}") as run:
            # Simulate some process
            for step in range(50):
                run.log({
                    "metric1": 0.2 + (step / 50) * 0.4 + (i * 0.03),
                    "metric2": 0.8 - (step / 50) * 0.3
                })
```

The following image shows runs organized by job type:

Ungrouped runs table

### View runs organized by job type

View runs organized by job type in the W\&B App:

1. In your project sidebar, select the **Runs** tab.
2. Above the list of runs, click the **Group** button.
3. From the dropdown, select **Job Type**.

# Visualize CoreWeave infrastructure alerts

Source: https://docs.wandb.ai/models/runs/infrastructure-alerts

View CoreWeave infrastructure alerts such as GPU failures and thermal violations on your W&B experiment run plots.

Observe infrastructure alerts such as GPU failures, thermal violations, and more during machine learning experiments you log to W\&B.
When you run on a supported [CoreWeave Kubernetes Service (CKS)](https://docs.coreweave.com/products/cks) cluster, enable this integration, and satisfy the prerequisites on this page, [CoreWeave Mission Control](https://www.coreweave.com/mission-control) can monitor your compute infrastructure during a [W\&B run](/models/runs).

This feature is in Preview. Contact your W\&B representative for access.

## Prerequisites

The following must be true for this integration to work end-to-end.

| Prerequisite | Details |
| --- | --- |
| **CoreWeave platform** | Available only on [CoreWeave Kubernetes Service (CKS)](https://docs.coreweave.com/products/cks) clusters. Not available on CoreWeave bare metal clusters or CoreWeave Classic. Training jobs running through [SUNK](https://docs.coreweave.com/products/sunk) on CKS also satisfy this requirement. |
| **W\&B Python SDK** | For training jobs, use the `wandb` package version `0.20.1` or later when you log a run. |
| **W\&B Server (Dedicated Cloud or Self-Managed)** | If using a W\&B Dedicated Cloud or W\&B Self-Managed deployment, use W\&B Server version `0.73.0` or later. Set the `SERVER_FLAG_ENABLE_CORE_WEAVE_OBSERVABILITY` environment variable on the W\&B app pod so the server can accept CoreWeave observability data. |

If an error occurs, CoreWeave sends that information to W\&B. W\&B populates infrastructure information onto your run's plots in your project's workspace. CoreWeave attempts to automatically resolve some issues, and W\&B surfaces that information in the run's page.
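The SDK version prerequisite can be checked before a training job starts. The following is a minimal sketch rather than part of the integration: `parse_release()` and `meets_minimum()` are hypothetical helpers, and the comparison assumes plain numeric release versions such as `0.20.1` (any pre-release suffix is ignored).

```python
import re

# Minimum SDK version taken from the prerequisites table
MIN_WANDB_VERSION = "0.20.1"


def parse_release(version: str) -> tuple:
    """Return the leading numeric components of a version string as integers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = MIN_WANDB_VERSION) -> bool:
    """Hypothetical helper: True if `installed` is at least `minimum`."""
    have, need = parse_release(installed), parse_release(minimum)
    # Pad the shorter tuple with zeros so "0.20" compares as "0.20.0"
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

In a training script, you might call `meets_minimum(wandb.__version__)` after `import wandb` and stop early with a clear message if it returns `False`.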
# Initialize runs

Source: https://docs.wandb.ai/models/runs/initialize-run

Initialize W&B runs with wandb.init() to start tracking experiments, including handling concurrent runs in one process.

Initialize a W\&B Run with [`wandb.init()`](/models/ref/python/functions/init).

By default, W\&B assumes each Python process has only one active run at a time when you call `wandb.init()`. If you call `wandb.init()` again, W\&B will either return the same run or finish the old run before starting a new one.

How W\&B handles multiple calls to `wandb.init()` in the same process depends on the environment (notebook vs. non-notebook) and the `reinit` configuration. To manage multiple active runs in the same process, see [Multiple runs in one process](/models/runs/initialize-run#multiple-runs-in-one-process).

W\&B recommends using a `with` block when calling `wandb.init()`. This ensures that W\&B properly finalizes the run and uploads all data when the block ends.

## Single run per process

The following example code snippet shows how to import the W\&B Python SDK and initialize a run.

```python title="basic.py"
import wandb

with wandb.init(entity="nico", project="awesome-project") as run:
    # Your training logic here
    pass
```

The code snippet produces the following output:

```bash
🚀 View run exalted-darkness-6 at: https://wandb.ai/nico/awesome-project/runs/pgbn9y21
Find logs at: wandb/run-20241106_090747-pgbn9y21/logs
```

The output shows that W\&B logs the run `exalted-darkness-6` to the project `awesome-project` under the entity `nico`. `pgbn9y21` is the unique run ID that W\&B generates for this run.

## Multiple runs in one process

Manage multiple runs in a single Python process. This is useful for workflows where you want to keep a primary run active while creating short-lived secondary runs for sub-tasks.
Some use cases include:

* Keeping a single “primary” run active throughout a script while spinning up short-lived “secondary” runs for evaluations or sub-tasks.
* Orchestrating sub-experiments in a single file.
* Logging from one “main” process to several runs that represent different tasks or time periods.

By default, W\&B assumes each Python process has only one active run at a time when you call `wandb.init()`. If you call `wandb.init()` again, W\&B will either return the same run or finish the old run before starting a new one, depending on the configuration. This guide explains how to use `reinit` to modify the `wandb.init()` behavior to enable multiple runs in a single Python process.

**Requirements**

To manage multiple runs in a single Python process, you must have W\&B Python SDK version `v0.19.10` or newer.

### `reinit` options

Use the `reinit` parameter to configure how W\&B handles multiple calls to `wandb.init()`. The following table describes valid arguments and their effects:

| `reinit` value | Description | Creates a run? | Example use case |
| --- | --- | --- | --- |
| `create_new` | Create a new run with `wandb.init()` without finishing existing, active runs. W\&B does not automatically switch the global `wandb.Run` to new runs. You must hold onto each run object yourself. See the [multiple runs in one process example](/models/runs/initialize-run/#example-multiple-runs-in-one-process) below for details. | Yes | Ideal for creating and managing concurrent runs. For example, a “primary” run that remains active while you start or end “secondary” runs. |
| `finish_previous` | Finish all active runs with `run.finish()` before creating a new run with `wandb.init()`. Default behavior for non-notebook environments. | Yes | Ideal when you want to break sequential sub-processes into separate individual runs. |
| `return_previous` | Return the most recent, unfinished run. Default behavior for notebook environments. | No | |

W\&B does not support `create_new` mode for [W\&B Integrations](/models/integrations) that assume a single global run, such as Hugging Face Trainer, Keras callbacks, and PyTorch Lightning. If you use these integrations, you should run each sub-experiment in a separate process.

### Specifying `reinit`

* Use `wandb.init()` with the `reinit` argument directly:

  ```python
  import wandb

  with wandb.init(reinit="<create_new|finish_previous|return_previous>") as run:
      # Your code here
      pass
  ```

* Use `wandb.init()` and pass a `wandb.Settings` object to the `settings` parameter. Specify `reinit` in the `Settings` object:

  ```python
  import wandb

  with wandb.init(settings=wandb.Settings(reinit="<create_new|finish_previous|return_previous>")) as run:
      # Your code here
      pass
  ```

* Use `wandb.setup()` to set the `reinit` option globally for all runs in the current process. This is useful if you want to configure the behavior once and have it apply to all subsequent `wandb.init()` calls in that process.

  ```python
  import wandb

  # Configure the behavior once for the whole process
  wandb.setup(wandb.Settings(reinit="<create_new|finish_previous|return_previous>"))

  with wandb.init() as run:
      # Your code here
      pass
  ```

* Specify the desired value for `reinit` in the environment variable `WANDB_REINIT`. Defining an environment variable applies the `reinit` option to `wandb.init()` calls.
```bash
export WANDB_REINIT="<create_new|finish_previous|return_previous>"
```

The following code snippet shows a high-level overview of how to set up W\&B to create a new run each time you call `wandb.init()`:

```python
import wandb

wandb.setup(wandb.Settings(reinit="create_new"))

with wandb.init() as experiment_results_run:
    # This run will be used to log the results of each experiment.
    # You can think of this as a parent run that collects results
    with wandb.init() as run:
        # The do_experiment() function (a placeholder for your own code)
        # logs fine-grained metrics to the given run and returns result
        # metrics that you want to track separately.
        experiment_results = do_experiment(run)

    # After each experiment, log its results to a parent
    # run. Each point in the parent run's charts corresponds
    # to one experiment's results.
    experiment_results_run.log(experiment_results)
```

### Example: Concurrent processes

Suppose you want to create a primary run that remains open for the script's entire lifespan, while periodically starting short-lived secondary runs without finishing the primary run. For example, this pattern can be useful if you want to train a model in the primary run, but compute evaluations or do other work in separate runs.

To achieve this, use `reinit="create_new"` and initialize multiple runs. For this example, suppose "Run A" is the primary run that remains open throughout the script, while "Run B1" and "Run B2" are short-lived secondary runs for tasks like evaluation.

The high-level workflow might look like this:

1. Initialize the primary run, Run A, with `wandb.init()` and log training metrics.
2. Initialize Run B1 (with `wandb.init()`), log data, then finish it.
3. Log more data to Run A.
4. Initialize Run B2, log data, then finish it.
5. Continue logging to Run A.
6. Finally, finish Run A at the end.

The following Python code example demonstrates this workflow:

```python
import wandb

def train(name: str) -> None:
    """Perform one training iteration in its own W&B run.

    Using a 'with wandb.init()' block with `reinit="create_new"` ensures that
    this training sub-run can be created even if another run (like our primary
    tracking run) is already active.
    """
    with wandb.init(
        project="my_project",
        name=name,
        reinit="create_new"
    ) as run:
        # In a real script, you'd run your training steps inside this block.
        run.log({"train_loss": 0.42})  # Replace with your real metric(s)

def evaluate_loss_accuracy() -> tuple[float, float]:
    """Returns the current model's loss and accuracy.

    Replace this placeholder with your real evaluation logic.
    """
    return 0.27, 0.91  # Example metric values

# Create a 'primary' run that remains active throughout multiple train/eval steps.
with wandb.init(
    project="my_project",
    name="tracking_run",
    reinit="create_new"
) as tracking_run:
    # 1) Train once under a sub-run named 'training_1'
    train("training_1")
    loss, accuracy = evaluate_loss_accuracy()
    tracking_run.log({"eval_loss": loss, "eval_accuracy": accuracy})

    # 2) Train again under a sub-run named 'training_2'
    train("training_2")
    loss, accuracy = evaluate_loss_accuracy()
    tracking_run.log({"eval_loss": loss, "eval_accuracy": accuracy})

    # The 'tracking_run' finishes automatically when this 'with' block ends.
```

Note four key points from the previous example:

1. `reinit="create_new"` creates a new run each time you call `wandb.init()`.
2. You keep references to each run. `wandb.run` does not automatically point to the new run created with `reinit="create_new"`. Store new runs in variables like `run_a`, `run_b1`, etc., and call `.log()` or `.finish()` on those objects as needed.
3. You can finish sub-runs whenever you want while keeping the primary run open until the end of the script.
4. Finish your runs with `run.finish()` when you are done logging to them. This ensures that all data is uploaded and the run is properly closed.

# Move a run to a different project or team

Source: https://docs.wandb.ai/models/runs/manage-runs

Move runs between projects or teams using the W&B App.
Before you begin, ensure you have the necessary permissions to move runs between projects or teams. You must have access to the run at its current and new locations.

To move runs from one project to another or between teams:

1. Navigate to the project that contains the runs you want to move.
2. Select the **Runs** tab from the project sidebar.
3. Select the checkbox next to the runs you want to move.
4. Click the **Move to project** button above the table.
5. Select the destination team and project from the dropdown.

When you move a run, historical artifacts associated with it are not moved. To move an artifact manually, you can use the [`wandb artifact get`](/models/ref/cli/wandb-artifact/wandb-artifact-get/) CLI command or the [`Api.artifact` API](/models/ref/python/public-api/api/#artifact) to download the artifact, then use [`wandb artifact put`](/models/ref/cli/wandb-artifact/wandb-artifact-put/) or the `Api.artifact` API to upload it to the run's new location.

# Resume a run

Source: https://docs.wandb.ai/models/runs/resuming

Resume paused, stopped, or crashed W&B runs using the resume parameter options in wandb.init().

Specify how W\&B should respond if a run stops or crashes by setting the `resume` parameter in `wandb.init()`. When you initialize a run, W\&B checks whether the run ID already exists and applies the behavior defined by the `resume` value.

The following table outlines the behavior of W\&B based on the argument passed to the `resume` parameter and whether the run ID exists or not.

| Argument | Description | Run ID exists | Run ID does not exist | Use case |
| --- | --- | --- | --- | --- |
| `"must"` | W\&B must resume the run specified by the run ID. | W\&B resumes the run with the same run ID. Resumes from the last step. | W\&B raises an error. | Resume a run that must use the same run ID. |
| `"allow"` | Allow W\&B to resume the run if the run ID exists. | W\&B resumes the run with the same run ID. Resumes from the last step. | W\&B initializes a new run with the specified run ID. | Resume a run without overriding an existing run. |
| `"never"` | Never allow W\&B to resume a run specified by the run ID. | W\&B raises an error if a run with the specified ID already exists. | W\&B initializes a new run with the specified run ID. | |
| `"auto"` | Allow W\&B to automatically try to resume the run if the run ID exists. Restart the run from the same directory as the failed process. | W\&B resumes the run with the same run ID. | W\&B initializes a new run with the specified run ID. | Enable runs to automatically resume. |

**When to use `auto` vs `allow`**

W\&B recommends that you use `resume="allow"` and specify the run ID you want to resume. The `resume="auto"` option does not require you to specify a run ID, but it can lead to unexpected behavior if you have multiple runs that fail in the same directory or if the file directory structure changes. You must also ensure that you restart the run from the same directory as the failed process when you use `resume="auto"`.

For all the examples below, replace values enclosed within `<>` with your own.

[View a live demo of a resumed run](https://wandb.ai/wandb/resume-run/workspace?nw=nwuserjuliarose).

## Resume a run that must use the same run ID

If a run is stopped, crashes, or fails, you can resume it using the same run ID.
To do so, initialize a run and specify the following:

* Set the `resume` parameter to `"must"` (`resume="must"`)
* Provide the run ID of the run that stopped or crashed

The following code snippet shows how to accomplish this with the W\&B Python SDK:

```python
import wandb

with wandb.init(entity="<entity>", project="<project>", id="<run-id>", resume="must") as run:
    # Your training code here
    pass
```

Unexpected results will occur if multiple processes use the same `id` concurrently.

For more information on how to manage multiple processes, see [Log distributed training experiments](/models/track/log/distributed-training/).

## Resume a run without overriding the existing run

Resume a run that stopped or crashed without overriding the existing run. This is especially helpful if your process doesn't exit successfully. The next time you start W\&B, W\&B will start logging from the last step.

Set the `resume` parameter to `"allow"` (`resume="allow"`) when you initialize a run with W\&B. Provide the run ID of the run that stopped or crashed. The following code snippet shows how to accomplish this with the W\&B Python SDK:

```python
import wandb

with wandb.init(entity="<entity>", project="<project>", id="<run-id>", resume="allow") as run:
    # Your training code here
    pass
```

## Enable runs to automatically resume

The following code snippet shows how to enable runs to automatically resume with the Python SDK or with environment variables.

Pass `auto` as an argument to the `resume` parameter when you initialize a run. Ensure that you restart the run from the same directory as the failed process. Copy and paste the following code snippet. Replace values enclosed within `<>` with your own:

```python
import wandb

with wandb.init(entity="<entity>", project="<project>", id="<run-id>", resume="auto") as run:
    # Your training code here
    pass
```

The following example shows how to specify the W\&B `WANDB_RUN_ID` variable in a bash script:

```bash title="run_experiment.sh"
RUN_ID="$1"

WANDB_RESUME=auto WANDB_RUN_ID="$RUN_ID" python eval.py
```

Within your terminal, you could run the shell script along with the W\&B run ID. The following code snippet passes the run ID `akj172`:

```bash
sh run_experiment.sh akj172
```

Automatic resuming only works if the process is restarted on top of the same filesystem as the failed process.

For example, suppose you execute a Python script called `train.py` in a directory called `Users/AwesomeEmployee/Desktop/ImageClassify/training/`. Within `train.py`, the script creates a run that enables automatic resuming. Suppose next that the training script is stopped. To resume this run, you would need to restart your `train.py` script within `Users/AwesomeEmployee/Desktop/ImageClassify/training/`.

If you cannot share a filesystem, specify the `WANDB_RUN_ID` environment variable or pass the run ID with the W\&B Python SDK. See the [Custom run IDs](./#custom-run-ids) section in the "What are runs?" page for more information on run IDs.

## Resume preemptible Sweeps runs

When you handle preemption correctly, interrupted [sweep](/models/sweeps/) runs can be requeued automatically so another agent can pick them up. That pattern is especially helpful when the sweep agent runs in a preemptible environment, such as a SLURM job in a preemptible queue, an EC2 Spot instance, or a Google Cloud preemptible VM.

The behavior below applies when you run sweep agents with the [`wandb agent`](/models/ref/cli/wandb-agent) CLI, which starts your training program as a **subprocess**.
It does not fully apply when you use only the Python API [`wandb.agent()`](/models/ref/python/functions/agent), because that path runs your training function in a thread rather than a separate process, so OS signal delivery and forwarding do not match the CLI agent model.

**Recommended pattern:** Register a signal handler for the preemption signal your scheduler or platform uses (for example `SIGUSR1` or `SIGTERM`). In the handler, call [`mark_preempting()`](/models/ref/python/experiments/run#mark_preempting) when a run is active, perform any cleanup (such as saving a checkpoint), then exit with a non-zero code (a common convention is `128 + signum` for signal termination).

Do **not** call `mark_preempting()` unconditionally immediately after `wandb.init()`. Doing so can mark every failure, including code bugs, as preemption and requeue the run repeatedly.

For runnable examples, `--forward-signals` on the CLI agent, and a full reference table for different uses of `mark_preempting()`, see [Signal handling and sweep runs](/models/sweeps/signal-handling-sweep-runs).

When you follow that pattern, W\&B records run state roughly as follows:

| Scenario | Run state |
| --- | --- |
| Run completes normally with exit code 0 | FINISHED |
| Run fails with a non-zero exit code | FAILED |
| Run receives an unhandled signal (for example `SIGKILL`) | CRASHED after about five minutes |
| Run receives a handled preemption signal (for example `SIGTERM` or `SIGUSR1`), the handler calls `mark_preempting()`, and the process exits non-zero | PREEMPTED; the run is queued for the next agent request |

Sweep agents drain the run queue before the sweep generates new hyperparameter combinations from the search algorithm. Once the queue is empty, the sweep resumes normal scheduling.
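The recommended pattern above can be sketched as a small handler factory. This is an illustrative sketch under stated assumptions, not the reference implementation from the signal handling guide: `make_preemption_handler()`, `get_run`, and `save_checkpoint` are hypothetical names, and the only W\&B call the handler makes is `mark_preempting()` on the active run.

```python
import sys
from typing import Any, Callable, Optional


def make_preemption_handler(
    get_run: Callable[[], Optional[Any]],
    save_checkpoint: Callable[[], None],
) -> Callable[[int, Any], None]:
    """Build a signal handler that follows the recommended preemption pattern.

    `get_run` is a hypothetical callable that returns the active run or None.
    `save_checkpoint` is a hypothetical callable that performs your cleanup.
    """

    def handler(signum: int, frame: Any) -> None:
        run = get_run()
        if run is not None:
            # Mark the run as preempting only when a run is active
            run.mark_preempting()
        # Perform cleanup before exiting
        save_checkpoint()
        # Exit non-zero, using the 128 + signum convention
        sys.exit(128 + signum)

    return handler
```

In a training script started by `wandb agent`, you might register the handler with `signal.signal(signal.SIGTERM, make_preemption_handler(lambda: wandb.run, save_checkpoint))`, replacing `SIGTERM` with the signal your scheduler sends and `save_checkpoint` with your own function. Looking up the run inside the handler, rather than capturing it at registration time, keeps `mark_preempting()` conditional on a run being active.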
# Rewind a run

Source: https://docs.wandb.ai/models/runs/rewind

Rewind a run to correct or modify its history without losing original data.

Rewind a run to modify the history of a run. When you rewind a run, W\&B resets the state of the run to the specified step while maintaining the same run ID.

The ability to rewind a run is in active development. It is in preview for Multi-tenant Cloud and Dedicated Cloud, and not yet available in Self-Managed.

Due to known performance limitations with rewind, W\&B typically recommends [Forking](./forking) as an alternative.

### Prerequisites

Before you rewind a run, ensure you meet the following prerequisites:

* To rewind a run, you must have [W\&B Python SDK](https://pypi.org/project/wandb/) version >= `0.17.1`.
* You must use monotonically increasing steps. Rewind does not work with non-monotonic steps defined with [`define_metric()`](/models/ref/python/experiments/run#define_metric) because they disrupt the required chronological order of run history.

### Limitations

Rewind does not support the following:

* **Log rewind**: Logs are reset in the new run segment.
* **System metrics rewind**: W\&B logs only new system metrics after the rewind point.
* **Artifact association**: W\&B associates artifacts with the source run that produces them.

W\&B recomputes the summary metrics for the run you rewind based on the newly logged history, with the following results:

* **History truncation**: W\&B truncates the history to the rewind point, allowing new data logging.
* **Summary metrics**: Recomputed based on the newly logged history.
* **Configuration preservation**: W\&B preserves the original configurations and you can merge new configurations.

### Rewind and forking compatibility

Forking complements a rewind.

When you fork from a run, W\&B creates a new branch off a run at a specific point to try different parameters or models.

When you rewind a run, W\&B lets you correct or modify the run history itself.
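Both operations identify the point to branch or rewind from with a string of the form `<run_id>?_step=<step>`: forking passes it to `fork_from`, and rewinding passes it to `resume_from`. As a minimal sketch, the hypothetical helper `run_step_spec()` below (not part of the W\&B SDK) builds that string. It assumes steps are non-negative integers and rejects anything else.

```python
def run_step_spec(run_id: str, step: int) -> str:
    """Hypothetical helper: build a '<run_id>?_step=<step>' string.

    The result can be passed to `fork_from` (fork) or `resume_from` (rewind)
    in `wandb.init()`.
    """
    if not isinstance(run_id, str) or not run_id:
        raise ValueError("run_id must be a non-empty string")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(step, bool) or not isinstance(step, int) or step < 0:
        raise ValueError("step must be a non-negative integer")
    return f"{run_id}?_step={step}"
```

For example, `wandb.init(fork_from=run_step_spec(source_run_id, 200))` would start a new run that branches at step 200, while `wandb.init(resume_from=run_step_spec(source_run_id, 200))` would rewind the same run to step 200.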
## Rewind a run

Rewind a run from a specific step and log new data from that point in time. Pass both the run ID and the step you want to rewind from as arguments to the `resume_from` parameter in [`wandb.init()`](/models/ref/python/functions/init).

The `resume_from` parameter accepts a string in the format `<run_id>?_step=<step>`, where `<run_id>` is the run ID of the run you want to rewind and `<step>` is the step you want to rewind from.

Suppose you log a linear line for 300 steps:

```python
import wandb

# Initialize the first run and log some metrics
with wandb.init(project="<project>", entity="wandb") as run:
    for i in range(300):
        # Plot a linear line
        run.log({"metric": i, "step": i})
```

Within your project's workspace, you see a line plot from step 0 to step 300:

Line plot of the original run

At a later time, you want to rewind the run from step 200 and you want to log a new metric called `additional_metric` that logs `i*1.1` from step 200 to step 300. From step 250 you want to log a new subtle wavy pattern (`i + 2*sin(i/3)`) instead of a linear line:

```python
import math

import wandb

run_ID = "<run-id>"  # Replace with the run ID of the run you want to rewind

# Rewind from the first run at a specific step and log the metric starting from step 200
with wandb.init(project="<project>", entity="wandb", resume_from=f"{run_ID}?_step=200") as run:
    # For the first few steps, log the metric as is from run
    # After step 250, start logging the wavy pattern
    for i in range(200, 300):
        if i < 250:
            run.log({"metric": i, "step": i})  # Continue logging from run without waves
        else:
            # Introduce the wavy behavior starting from step 250
            subtle_wave = i + (2 * math.sin(i / 3.0))  # Apply a subtle wavy pattern
            run.log({"metric": subtle_wave, "step": i})
        # Additionally log the new metric at all steps
        run.log({"additional_metric": i * 1.1, "step": i})
```

The following image shows the updated project's workspace.

Note the following changes in the plot after the rewind:

* The line plot shows the original linear line from step 0 to step 200 and the new subtle wavy pattern starts from step 250 (left image).
* W\&B created a new plot (right plot) labeled `additional_metric` that starts from step 200.

From left to right: original linear line and additional metric

## View an archived run

After you rewind a run, you can explore the original archived run in the W\&B App. Follow these steps to view an archived run:

1. **Access the Overview Tab:** Navigate to the [**Overview** tab](./#overview-tab) on the run's page. This tab provides a comprehensive view of the run's details and history.
2. **Locate the Forked From field:** Within the **Overview** tab, find the `Forked From` field. This field captures the history of the resumptions. The **Forked From** field includes a link to the source run, allowing you to trace back to the original run and understand the entire rewind history.

Use the `Forked From` field to navigate the tree of archived resumptions and trace the sequence and origin of each rewind.

## Fork from a run that you rewind

To fork from a rewound run, use the [`fork_from`](/models/runs/forking/) argument in `wandb.init()` and specify the source run ID and the step from the source run to fork from:

```python
import wandb

# `rewind_run` refers to the run object for the run you rewound

# Fork the run from a specific step
forked_run = wandb.init(
    project="<project>",
    entity="<entity>",
    fork_from=f"{rewind_run.id}?_step=500",
)

# Continue logging in the new run
for i in range(500, 1000):
    forked_run.log({"metric": i*3})

forked_run.finish()
```

# Customize run colors

Source: https://docs.wandb.ai/models/runs/run-colors

Customize, randomize, and reset the colors assigned to individual runs in your W&B project workspace.

W\&B automatically assigns a color to each run that you create in your project.
You can change the default color of a run to help you visually distinguish it from other runs in the table and graphs. Reset your project workspace to restore the default colors for all runs in the table.

Run colors are locally scoped. On the project page, custom colors apply only to your own workspace. In reports, custom colors for runs apply only at the section level. You can visualize the same run in different sections, which can use different custom colors per section.

## Edit default run colors

1. Click the **Runs** tab from the project sidebar.
2. Click the color dot next to the run name in the **Name** column.
3. Select a color from the color palette or the color picker, or enter a hex code.

Edit default run color in project workspace

## Randomize run colors

To randomize the colors of all runs in the table:

1. Click the **Runs** tab from the project sidebar.
2. Hover over the **Name** column header, click the **action** menu, and select **Randomize run colors** from the dropdown menu.

The option to randomize run colors is available only after you modify the runs table in some way, such as by sorting, filtering, searching, or grouping.

## Reset run colors

To restore the default colors for all runs in the table:

1. Click the **Runs** tab from the project sidebar.
2. Hover over the **Name** column header, click the **action** menu, and select **Reset colors** from the dropdown menu.

Reset run colors in project workspace

# Find and customize a run's ID or name

Source: https://docs.wandb.ai/models/runs/run-identifiers

Learn how to find a run's unique identifier and run name, how to create a custom run ID, and how to customize a run's name.

When you initialize a W\&B Run, W\&B assigns that run a [unique identifier known as a *run ID*](/models/runs/run-identifiers#run-id). Each run also has a human-readable [non-unique *run name*](/models/runs/run-identifiers#run-name) that you can customize.

## Run ID

A run's ID uniquely identifies the run.
By default, W\&B generates a [random and unique run ID](#autogenerated-run-ids) automatically when you initialize a new run, unless you [specify your own unique run ID](#create-a-custom-run-id) when you [initialize the run](/models/runs/initialize-run).

### Find a run's ID

Find a run's unique ID programmatically with the W\&B Python SDK or interactively in the W\&B App.

When you initialize a run, W\&B returns the unique run ID in the terminal. For example, consider the following code snippet that initializes a W\&B run:

```python
import wandb

entity = "nico"  # Replace with your W&B entity
project = "awesome-project"

with wandb.init(entity=entity, project=project) as run:
    # Your code here
    pass
```

Within the terminal, W\&B returns:

```bash
wandb: Syncing run earnest-sunset-1
wandb: ⭐️ View project at https://wandb.ai/nico/awesome-project
wandb: 🚀 View run at https://wandb.ai/nico/awesome-project/runs/1jx1ud12
```

The last part of the run URL (`1jx1ud12`) is the unique run ID.

You can also find a run's unique ID in the W\&B App:

1. Navigate to the [W\&B App](https://wandb.ai/home).
2. Navigate to the W\&B project you specified when you initialized the run.
3. Within your project's workspace, select either the **Workspace** or **Runs** tab.
4. Select the run you want to view.
5. Choose the **Overview** tab.

W\&B displays the run ID in the **Run path** field. The run path consists of the name of your team, the name of the project, and the run ID. The unique ID is the last part of the run path.

For example, in the following image, the unique run ID is `9mxi1arc`:

Run ID location

Use a run's unique ID to directly navigate to that run's overview page in the W\&B App. The following code block shows the format of a URL path for a run:

```text title="W&B App URL for a specific run"
https://wandb.ai/<entity>/<project>/runs/<run-id>
```

Replace values enclosed in angle brackets (`< >`) with the actual values of the entity, project, and run ID.
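The relationship between a run's URL and its ID can be sketched in a few lines of standard-library Python. This is an illustration only: `run_url()` and `run_id_from_url()` are hypothetical helpers, not W\&B SDK functions, and they assume the `https://wandb.ai/<entity>/<project>/runs/<run-id>` layout seen in the terminal output above. Dedicated Cloud and Self-Managed deployments use a different host, which you can pass as `base_url`.

```python
from urllib.parse import quote, urlparse


def run_url(entity: str, project: str, run_id: str,
            base_url: str = "https://wandb.ai") -> str:
    """Hypothetical helper: build the W&B App URL for a run."""
    # Percent-encode each path segment so unusual characters stay valid
    segments = (quote(entity, safe=""), quote(project, safe=""),
                "runs", quote(run_id, safe=""))
    return base_url.rstrip("/") + "/" + "/".join(segments)


def run_id_from_url(url: str) -> str:
    """Hypothetical helper: extract the run ID from a run URL."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    # Expected path layout: <entity>/<project>/runs/<run-id>
    if len(parts) < 4 or parts[2] != "runs":
        raise ValueError(f"Not a run URL: {url!r}")
    return parts[3]
```

With the values from the earlier example, `run_url("nico", "awesome-project", "1jx1ud12")` produces the same URL that W\&B prints in the terminal, and `run_id_from_url()` recovers `1jx1ud12` from it.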
### Create a custom run ID Pass your desired run ID as a string to the `id` parameter when you initialize a run: ```python theme={null} import wandb with wandb.init(entity="", project="", id="") as run: # Your code here ``` ## Run name Each run has a human-readable, non-unique run name. By default, W\&B generates a random run name when you initialize a new run if you do not specify a run name for it. The name of a run appears within your project's workspace and at the top of the [run's **Overview** page](#overview-tab). Continuing from the previous example, the name of the run is `glowing-shadows-8`. Run ID location You can name your run when [you initialize it](/models/runs/run-identifiers#create-a-custom-run-name) or [rename](/models/runs/run-identifiers#rename-a-run) it at a later time. ### Create a custom run name Specify a name for your run by passing the `name` parameter to the [`wandb.init()`](/models/ref/python/functions/init) method. ```python theme={null} import wandb with wandb.init(entity="", project="", name="") as run: # Your code here ``` ### Rename a run Rename a run after initializing it programmatically with the Python SDK or interactively in the W\&B App. Use [`wandb.Api.Run`](/models/ref/python/public-api/api#method-api-run) to access a run logged to W\&B. This method returns a [run object](/models/ref/python/public-api/run#property-run-name) that you can use to update the run name. Call `wandb.Api.Run.update()` method to persist changes. Replace the values enclosed in angle brackets (`< >`) with your own values. ```python theme={null} import wandb api = wandb.Api() # Access run by its path run = api.run(path = "//") # Specify a new run name run.name = "" run.update() ``` 1. Navigate to your W\&B project. 2. Select the **Workspace** or **Runs** tab. 3. Search or scroll to the run you want to rename. 4. Hover over the run name, click the **action ()** menu, then click **Rename run**. 5. To change the run name, update the **Run name** field. 6. 
Click **Save**. ## Run display name Each run also has a *run display name* that you can customize for each workspace. If you change a run's display name in one workspace, the display name changes only for that workspace, not in other workspaces or projects. The display name defaults to the same value as the run name. The display name appears in the run's workspace and runs table. Use the run display name to override the run name displayed in that workspace without renaming the run in the project. ### Rename a run's display name Change a run's display name from the W\&B App: 1. Navigate to your W\&B project. 2. Select the **Workspace** or **Runs** tab. 3. Search or scroll to the run you want to rename. 4. Hover over the run name, click the **action ()** menu, then click **Rename run**. 5. Specify a new value for the **Display name** field. 6. Click **Save**. ## Customize run name truncation By default, long run names are truncated in the middle for readability. To customize the truncation of run names: 1. Click the **action ()** menu at the top of the list of runs. 2. Set **Run name cropping** to crop the end, middle, or beginning. # Run states Source: https://docs.wandb.ai/models/runs/run-states Learn about the different states a W&B run can have. The [Run state](/models/runs/run-states#run-states) indicates the current status of a W\&B run. You can [view the state](/models/runs/run-states#view-the-state-of-a-run) of a run in the W\&B App or programmatically using the W\&B Python SDK. ## Run states The following table describes the possible states a run can be in: | State | Description | | ---------- | ------------------------------------------------------------------------------------------------ | | `Crashed` | Run stopped sending heartbeats in the internal process, which can happen if the machine crashes. | | `Failed` | Run ended with a non-zero exit status. | | `Finished` | Run ended and fully synced data, or called `wandb.Run.finish()`. 
| | `Killed` | Run was forcibly stopped before it could finish. | | `Running` | Run is still running and has recently sent a heartbeat. | | `Pending` | Run is scheduled but not yet started (common in sweeps and Launch jobs). | ### Run states in sweeps When runs are part of a [sweep](/models/sweeps/), their states behave independently from the sweep's status: * **Individual run states** reflect each run's execution status (Running, Finished, Failed, etc.) * **Sweep status** controls whether new runs are created, not how existing runs execute * Pausing or stopping a sweep doesn't affect already-running runs * Only cancelling a sweep forcibly kills running runs (changes their state to `Killed`) For a detailed explanation of how sweep and run statuses interact, see [Understanding sweep and run statuses](/models/sweeps/pause-resume-and-cancel-sweeps#understanding-sweep-and-run-statuses). ## View the state of a run Programmatically or interactively view a run's state with the Python SDK or W\&B App. Use the `state` property of the [`wandb.Api.Run`](/models/ref/python/public-api/runs) object to access the current state of a run. The following code snippet retrieves and prints the state of all runs in a specified project. Copy and paste the following code snippet into your Python environment. Replace the values enclosed in angle brackets (`< >`) with your own values: ```python theme={null} import wandb api = wandb.Api() runs = api.runs(path="/") # Access run object's properties for run in runs: print(f"Run: {run.name}") print(f"Run state: {run.state}") print() ``` You can apply different filters to retrieve runs from your projects based on different criteria. See [Filter runs](/models/runs/filter-runs) to learn more about filtering runs programmatically. View the state of a run from the W\&B App: 1. Navigate to your W\&B project. 2. Select the **Workspace** or **Runs** tab from the project sidebar. 3. Search or scroll to the run you want to view. 4. 
Select the run to open the run overview page. 5. Choose the **Overview** tab. Next to the **State** field, view the current state of the run. # Search runs Source: https://docs.wandb.ai/models/runs/search-runs Learn how to search for specific runs by name or ID in your project's Runs table or Workspace. Use the search box within your project's Runs table or Workspace to find specific [runs by name or ID](/models/runs/run-identifiers). By default, the search box uses regular expressions (RegEx) to match your query against run names or IDs. ## Search for runs by name or ID 1. Click either the **Runs** tab or **Workspace** from the project sidebar. 2. Click on the search box at the top of the runs table. 3. Enter the run name or run ID you want to search for. ## Turn off regular expressions search 1. Click either the **Runs** tab or **Workspace** from the project sidebar. 2. Click on the search box at the top of the runs table. 3. Toggle off the **RegEx** toggle (.\*) so that it is gray. # Stop runs Source: https://docs.wandb.ai/models/runs/stop-runs Stop runs programmatically using the W&B Python SDK or manually from the W&B App. Stop a run programmatically with the W\&B Python SDK or interactively in the W\&B App. 1. Navigate to the terminal or code editor where you initialized the run. 2. Press `Ctrl+D` to stop the run. 1. Navigate to the project that your run is logging to. 2. Select the run you want to stop within the run selector. 3. Choose the **Overview** tab. 4. Select the stop button next to the **State** field. Next to the **State** field, the run's state changes from `running` to `Killed`. See [State fields](/models/runs/run-states) for a full list of possible run states. # Add labels to runs with tags Source: https://docs.wandb.ai/models/runs/tags Add, update, and remove tags on W&B runs using the Python SDK, Public API, or the W&B App UI for organization. 
Add tags to label runs with particular features that might not be obvious from the logged metrics or artifact data. For example, you can add a tag to a run to indicate that the run's model is `in_production`, that the run is `preemptible`, that the run represents the `baseline`, and so forth.

## Add tags to one or more runs

Programmatically or interactively add tags to your runs.

Based on your use case, select a tab below that best fits your needs:

Use `wandb.init()` to add tags when you initialize a run. Pass a list of strings to the `tags` parameter in `wandb.init()` to add tags to a run. For example:

```python theme={null}
import wandb

with wandb.init(
    entity="<entity>",
    project="<project>",
    tags=["<tag1>", "<tag2>"],
) as run:
    # Your training code here
    pass
```

You can also add or update an existing tag during an active run by updating the `tags` attribute of the run object (`wandb.Run.tags`). The `tags` attribute accepts a tuple of strings. Concatenate one or more tags to the existing run tag property to add new tags after you initialize the run:

```python theme={null}
import wandb

with wandb.init(entity="<entity>", project="<project>", tags=["<tag1>"]) as run:
    # Training loop logic here

    # Add a new tag to the run object
    run.tags += ("<tag2>",)
```

Use the [W\&B Public API](/models/ref/python/public-api) to add or update tags on a previously saved run.

To update tags on an existing run, access the `wandb.Run.tags` property. The `wandb.Run.tags` property is a list of strings. Append the new tag or tags to the existing tags and then call `wandb.Run.update()` to update the run with the new tags. For example:

```python theme={null}
import wandb

# Fetch the saved run by its path
run = wandb.Api().run("{entity}/{project}/{run-id}")

# Append the new tag and persist the change
run.tags.append("<tag>")
run.update()
```

This method is best suited to tagging large numbers of runs with the same tag or tags.

1. Navigate to your project workspace.
2. Select **Runs** from the project sidebar.
3. Select one or more runs from the table.
4. Once you select one or more runs, select the **Tag** button above the table.
5.
Type the tag you want to add and select the **Create new tag** checkbox to add the tag.

This method is best suited to applying a tag or tags to a single run manually.

1. Navigate to your project workspace.
2. Click a run to open it. The run page opens with the **Overview** tab shown by default.
3. Select the gray plus icon (**+**) button next to **Tags**.
4. Type a tag you want to add and select **Add** below the text box to add a new tag.

## Remove tags from one or more runs

Follow these steps to remove tags from a run in the W\&B App.

This method is best suited to removing tags from a large number of runs.

1. In the Run sidebar of the project, select the table icon in the upper-right. This expands the sidebar into the full runs table.
2. Hover over a run in the table to see a checkbox on the left or look in the header row for a checkbox to select all runs.
3. Select the checkbox to enable bulk actions.
4. Select the runs from which you want to remove tags.
5. Select the **Tag** button above the rows of runs.
6. Select the checkbox next to a tag to remove it from the run.

1. In the left sidebar of the Run page, select the top **Overview** tab. The tags on the run are visible here.
2. Hover over a tag and select the "x" to remove it from the run.

# View a specific run in a project

Source: https://docs.wandb.ai/models/runs/view-logged-runs

Learn how to view a specific logged run and its properties using the W&B App or the LEET terminal UI.

View information about a specific run, such as its current state, artifacts, metrics, and more. You can view and monitor runs using the W\&B App or the `wandb beta leet` terminal UI.

To view a specific run in the W\&B App:

1. Navigate to the [W\&B App](https://wandb.ai/home).
2. Navigate to the W\&B project you specified when you initialized the run.
3. Within the project sidebar, select the **Workspace** tab.
4. Within the run selector, click the run you want to view, or enter a partial run name to filter for matching runs.
Alternatively, you can directly access a specific run's workspace by entering its URL in your browser. The URL path of a specific run has the following format:

```text theme={null}
https://wandb.ai/<team-name>/<project-name>/runs/<run-ID>
```

Replace values enclosed in angle brackets (`< >`) with the actual values of the team name, project name, and [run ID](/models/runs/run-identifiers#run-id).

Explore the run's properties by navigating through the tabs: [Overview](/models/runs/view-logged-runs#overview), [Logs](/models/runs/view-logged-runs#logs), [Files](/models/runs/view-logged-runs#files), [Code](/models/runs/view-logged-runs#code), and [Artifacts](/models/runs/view-logged-runs#artifacts).

To view a run locally in your terminal using the `wandb beta leet` terminal UI:

1. If you started the run locally from a script, navigate to the directory where you ran your code. It contains a `wandb/` directory with a subdirectory per run and a `latest-run/` symbolic link. Each run directory contains a transaction log named in the format `run-<run-ID>.wandb`.

   If you did not start the run locally but downloaded a `.wandb` transaction log file instead, make a note of its location.
2.
Start `wandb beta leet` using one of these commands: ```bash theme={null} # View the latest run, stored in ./wandb/latest-run/ wandb beta leet # Specify a run directory wandb beta leet ./wandb/run-20250813_124246-n67z9ude # Specify a .wandb file wandb beta leet ./wandb/run-20250813_124246-n67z9ude/run-n67z9ude.wandb ``` LEET displays a three-panel interface: * **Left panel**: Run overview with environment variables, configuration, and summary statistics * **Center panel**: Metrics grid with Braille-style line charts showing your logged metrics * **Right panel**: System metrics including GPU/CPU/RAM utilization Get started with these keyboard shortcuts: * `h` or `?` - View all keyboard shortcuts * `/` - Filter metrics by pattern * `[` / `]` - Toggle left/right panels * `n` / `N` - Navigate between metric pages * `q` / `CMD+C` - Quit See the [`wandb beta leet`](/models/ref/cli/wandb-beta/wandb-beta-leet) reference for more details. ## Overview Use the **Overview** tab to learn about specific run information in a project, such as: * **References**: Dictionary keys from your experiment's [config](/models/track/config) that you want to show prominently in the W\&B App. See [Highlight config values](/models/track/config#highlight-config-values) for more details. * **Notes**: Any notes you added to the run. You can add notes to a run with the W\&B App or programmatically with the Python SDK. * **Tags**: A list of strings. Tags are useful for organizing related runs together or applying temporary labels like `baseline` or `production`. You can add tags to a run with the W\&B App or programmatically with the Python SDK. * **Author**: The W\&B entity that creates the run. * **Command**: The command that initializes the run. * **Description**: A description of the run that you provided. This field is empty if you do not specify a description when you create the run. You can add a description to a run with the W\&B App or programmatically with the Python SDK. 
* **Tracked Hours**: The amount of time the run is actively computing or logging data, excluding any pauses or waiting periods. This metric helps you understand the actual computational time spent on your run. You are not billed for tracked hours, which are unlimited for all plans. * **Runtime**: Measures the total time from the start to the end of the run. It's the wall-clock time for the run, including any time where the run is paused or waiting for resources. This metric provides the complete elapsed time for your run. * **Git repository**: The git repository associated with the run. You must [enable git](/platform/app/settings-page/user-settings/#personal-github-integration) to view this field. * **Host name**: Where W\&B computes the run. W\&B displays the name of your machine if you initialize the run locally on your machine. * **Name**: The name of the run. * **OS**: Operating system that initializes the run. * **Python executable**: The command that starts the run. * **Python version**: Specifies the Python version that creates the run. * **Run path**: Identifies the unique run identifier in the form `entity/project/run-ID`. * **Start time**: The timestamp when you initialize the run. * **State**: The [state of the run](/models/runs/run-states). * **System hardware**: The hardware W\&B uses to compute the run. * **W\&B CLI version**: The W\&B CLI version installed on the machine that hosted the run command. * **Git state**: The most recent git commit SHA of a repository or working directory where the run is initialized. This field is empty if you do not enable Git when you create the run or if the git information is not available. W\&B stores the following information below the overview section: * **Artifact Outputs**: Artifact outputs produced by the run. * **Config**: List of config parameters saved with [`wandb.Run.config`](/models/track/config/). * **Summary**: List of summary parameters saved with [`wandb.Run.log()`](/models/track/log/). 
By default, W\&B sets this value to the last value logged.

You can use the search box above the Config and Summary sections to filter for specific parameters. For example, if you enter `acc` in the search box, W\&B filters for all parameters with `acc` in their name such as `accuracy` and `val_acc`.

W\&B does not search for nested config or summary parameters. For example, if you log a nested config parameter with `wandb.Run.config.update({"model": {"learning_rate": 0.01}})`, W\&B does not return the `learning_rate` parameter if you search for `learning_rate` in the search box. You can only find the `learning_rate` parameter by searching for `model`.

View an example project overview [here](https://wandb.ai/stacey/deep-drive/overview).

## Logs

The **Logs** tab shows output printed on the command line such as the standard output (`stdout`) and standard error (`stderr`).

[Image: Run logs tab]

Click the **Download** button in the upper-right corner to download the log file.

View an example logs tab [here](https://app.wandb.ai/stacey/deep-drive/runs/pr0os44x/logs).

## Files

Use the **Files** tab to view files associated with a specific run such as model checkpoints, validation set examples, and more.

[Image: Run files tab]

View an example files tab [here](https://app.wandb.ai/stacey/deep-drive/runs/pr0os44x/files/media/images).

## Code

The **Code** tab displays the code files associated with a specific run. This includes the main script that was executed as well as any additional code files that were part of the run's environment.

## Artifacts

The **Artifacts** tab lists the input and output [artifacts](/models/artifacts/) for the specified run.

[Image: Run artifacts tab]

# Sweeps overview

Source: https://docs.wandb.ai/models/sweeps

Hyperparameter search and model optimization with W&B Sweeps

Use W\&B Sweeps to automate hyperparameter search and visualize rich, interactive experiment tracking.
Pick from popular search methods such as Bayesian, grid search, and random to search the hyperparameter space. Scale and parallelize sweeps across one or more machines.

[Image: Hyperparameter tuning insights]

## How it works

Create a sweep with two [W\&B CLI](/models/ref/cli/) commands:

1. Initialize a sweep.

   ```bash theme={null}
   wandb sweep --project <project-name> <path-to-config-file>
   ```

2. Start the sweep agent.

   ```bash theme={null}
   wandb agent <sweep-ID>
   ```

The preceding code snippets, and the colab linked on this page, show how to initialize and create a sweep with the W\&B CLI. See the [Sweeps walkthrough](/models/sweeps/walkthrough/) to use the Python SDK to configure, initialize, and run a sweep.

## How to get started

Depending on your use case, explore the following resources to get started with W\&B Sweeps:

* Read through the [sweeps walkthrough](/models/sweeps/walkthrough/) for a step-by-step outline of the W\&B Python SDK commands to use to define a sweep configuration, initialize a sweep, and start a sweep.
* Explore this chapter to learn how to:
  * [Add W\&B to your code](/models/sweeps/add-w-and-b-to-your-code/)
  * [Define sweep configuration](/models/sweeps/define-sweep-configuration/)
  * [Initialize sweeps](/models/sweeps/initialize-sweeps/)
  * [Start sweep agents](/models/sweeps/start-sweep-agents/)
  * [Visualize sweep results](/models/sweeps/visualize-sweep-results/)
* Explore a [curated list of Sweep experiments](/models/sweeps/useful-resources/) that explore hyperparameter optimization with W\&B Sweeps. Results are stored in W\&B Reports.

For a step-by-step video, see: [Tune Hyperparameters Easily with W\&B Sweeps](https://www.youtube.com/watch?v=9zrmUIlScdY\&ab_channel=Weights%26Biases).
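To make the grid and random search methods mentioned in the overview more concrete, the following minimal sketch enumerates candidates from a small search space in plain Python. It is a conceptual illustration only, not how W\&B implements sweeps. The `parameters` dictionary and the helper names `grid_candidates` and `random_candidate` are hypothetical, and Bayesian search is not covered.

```python
import itertools
import random

# Hypothetical search space, shaped like a sweep `parameters` block.
parameters = {
    "batch_size": {"values": [16, 32, 64]},
    "epochs": {"values": [5, 10, 15]},
    "lr": {"min": 0.0001, "max": 0.1},
}


def grid_candidates(params):
    """Yield every combination of the discrete `values` lists.

    Continuous ranges (min/max) are skipped in this sketch because a
    grid needs a finite set of values per parameter.
    """
    discrete = {name: spec["values"] for name, spec in params.items() if "values" in spec}
    names = list(discrete)
    for combo in itertools.product(*(discrete[name] for name in names)):
        yield dict(zip(names, combo))


def random_candidate(params, rng):
    """Draw one candidate: pick from `values`, or sample uniformly in [min, max]."""
    candidate = {}
    for name, spec in params.items():
        if "values" in spec:
            candidate[name] = rng.choice(spec["values"])
        else:
            candidate[name] = rng.uniform(spec["min"], spec["max"])
    return candidate


grid = list(grid_candidates(parameters))
print(len(grid), "grid candidates, first:", grid[0])

rng = random.Random(0)  # Seeded only to make the sketch repeatable
print("random candidate:", random_candidate(parameters, rng))
```

Grid search visits each combination once, so the number of runs grows multiplicatively with each added parameter. Random search draws as many candidates as you choose to run.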
### Notebook examples

The following notebook examples explore how to use W\&B Sweeps for hyperparameter optimization across a variety of frameworks and use cases:

* [Hyperparameter optimization with Sweeps](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/tensorflow/Hyperparameter_Optimization_in_TensorFlow_using_W\&B_Sweeps.ipynb)
* [Using XGBoost with W\&B Sweeps](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/boosting/Using_W\&B_Sweeps_with_XGBoost.ipynb)

# Add W&B (wandb) to your code

Source: https://docs.wandb.ai/models/sweeps/add-w-and-b-to-your-code

Add W&B to your Python code script or Jupyter Notebook.

This guide provides recommendations on how to integrate W\&B into your Python training script or notebook for hyperparameter search optimization.

## Original training script

Suppose you have a Python script that trains a model (see below). Your goal is to find the hyperparameters that maximize the validation accuracy (`val_acc`).

In your Python script, you define two functions: `train_one_epoch` and `evaluate_one_epoch`. The `train_one_epoch` function simulates training for one epoch and returns the training accuracy and loss. The `evaluate_one_epoch` function simulates evaluating the model on the validation data set and returns the validation accuracy and loss.

You define a configuration dictionary (`config`) that contains hyperparameter values such as the learning rate (`lr`), batch size (`batch_size`), and number of epochs (`epochs`). The values in the configuration dictionary control the training process.

Next, you define a function called `main` that mimics a typical training loop. For each epoch, the accuracy and loss are computed on the training and validation data sets.

This code is a mock training script. It does not train a model, but simulates the training process by generating random accuracy and loss values.
The purpose of this code is to demonstrate how to integrate W\&B into your training script.

```python theme={null}
import random
import numpy as np


def train_one_epoch(epoch, lr, batch_size):
    # Simulated training accuracy and loss for one epoch
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss


def evaluate_one_epoch(epoch):
    # Simulated validation accuracy and loss for one epoch
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss


# config variable with hyperparameter values
config = {"lr": 0.0001, "batch_size": 16, "epochs": 5}


def main():
    lr = config["lr"]
    batch_size = config["batch_size"]
    epochs = config["epochs"]

    for epoch in np.arange(1, epochs):
        train_acc, train_loss = train_one_epoch(epoch, lr, batch_size)
        val_acc, val_loss = evaluate_one_epoch(epoch)

        print("epoch: ", epoch)
        print("training accuracy:", train_acc, "training loss:", train_loss)
        print("validation accuracy:", val_acc, "validation loss:", val_loss)


if __name__ == "__main__":
    main()
```

In the next section, you will add W\&B to your Python script to track hyperparameters and metrics during training. You want to use W\&B to find the best hyperparameters that maximize the validation accuracy (`val_acc`).

## Add W\&B to your training script

Update your training script to include W\&B. How you integrate W\&B into your Python script or notebook depends on how you manage sweeps.

To use the W\&B Python SDK to start, stop, and manage sweeps, follow the instructions in the **Python script or notebook** tab. To use the W\&B CLI instead, follow the instructions in the **CLI** tab.

Create a YAML configuration file with your sweep configuration. The configuration file contains the hyperparameters you want the sweep to explore. In the following example, the batch size (`batch_size`), epochs (`epochs`), and the learning rate (`lr`) hyperparameters are varied during each sweep.
```yaml theme={null} # config.yaml program: train.py method: random name: sweep metric: goal: maximize name: val_acc parameters: batch_size: values: [16, 32, 64] lr: min: 0.0001 max: 0.1 epochs: values: [5, 10, 15] ``` For more information on how to create a W\&B Sweep configuration, see [Define sweep configuration](/models/sweeps/define-sweep-configuration/). You must provide the name of your Python script for the `program` key in your YAML file. Next, add the following to the code example: 1. Import the W\&B Python SDK (`wandb`) and PyYAML (`yaml`). PyYAML is used to read in our YAML configuration file. 2. Read in the configuration file. 3. Use [`wandb.init()`](/models/ref/python/functions/init) to start a background process to sync and log data as a [W\&B Run](/models/ref/python/experiments/run). Pass the config object to the config parameter. 4. Define hyperparameter values from `wandb.Run.config` instead of using hard coded values. 5. Log the metric you want to optimize with [`wandb.Run.log()`](/models/ref/python/experiments/run.md/#method-runlog). You must log the metric defined in your configuration. Within the configuration dictionary (`sweep_configuration` in this example) you define the sweep to maximize the `val_acc` value. 
```python theme={null}
import wandb
import yaml
import random
import numpy as np


def train_one_epoch(epoch, lr, batch_size):
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss


def evaluate_one_epoch(epoch):
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss


def main():
    # Set up your default hyperparameters
    with open("./config.yaml") as file:
        config = yaml.load(file, Loader=yaml.FullLoader)

    with wandb.init(config=config) as run:
        for epoch in np.arange(1, run.config['epochs']):
            train_acc, train_loss = train_one_epoch(epoch, run.config['lr'], run.config['batch_size'])
            val_acc, val_loss = evaluate_one_epoch(epoch)
            run.log(
                {
                    "epoch": epoch,
                    "train_acc": train_acc,
                    "train_loss": train_loss,
                    "val_acc": val_acc,
                    "val_loss": val_loss,
                }
            )


# Call the main function.
main()
```

In your CLI, set a maximum number of runs for the sweep agent to try. This step is optional. This example sets the maximum number to 5.

```bash theme={null}
NUM=5
```

Next, initialize the sweep with the [`wandb sweep`](/models/ref/cli/wandb-sweep) command. Provide the name of the YAML file. Optionally provide the name of the project for the project flag (`--project`):

```bash theme={null}
wandb sweep --project sweep-demo-cli config.yaml
```

This returns a sweep ID. For more information on how to initialize sweeps, see [Initialize sweeps](./initialize-sweeps).

Copy the sweep ID and replace `sweepID` in the following code snippet to start the sweep job with the [`wandb agent`](/models/ref/cli/wandb-agent) command:

```bash theme={null}
wandb agent --count $NUM your-entity/sweep-demo-cli/sweepID
```

For more information, see [Start sweep jobs](./start-sweep-agents).

Follow these steps to add W\&B to your Python script:

1. Create a dictionary object where the key-value pairs define a [sweep configuration](/models/sweeps/define-sweep-configuration/).
The sweep configuration defines the hyperparameters you want W\&B to explore on your behalf along with the metric you want to optimize. Continuing from the previous example, the batch size (`batch_size`), epochs (`epochs`), and the learning rate (`lr`) are the hyperparameters to vary during each sweep. You want to maximize the accuracy of the validation score so you set `"goal": "maximize"` and the name of the variable you want to optimize for, in this case `val_acc` (`"name": "val_acc"`). 2. Pass the sweep configuration dictionary to [`wandb.sweep()`](/models/ref/python/functions/sweep). This initializes the sweep and returns a sweep ID (`sweep_id`). For more information, see [Initialize sweeps](./initialize-sweeps). 3. At the top of your script, import the W\&B Python SDK (`wandb`). 4. Within your `main` function, use [`wandb.init()`](/models/ref/python/functions/init) to generate a background process to sync and log data as a [W\&B Run](/models/ref/python/experiments/run). Pass the project name as a parameter to the `wandb.init()` method. If you do not pass a project name, W\&B uses the default project name. 5. Fetch the hyperparameter values from the `wandb.Run.config` object. This allows you to use the hyperparameter values defined in the sweep configuration dictionary instead of hard coded values. 6. Log the metric you are optimizing for to W\&B using [`wandb.Run.log()`](/models/ref/python/experiments/run.md/#method-runlog). You must log the metric defined in your configuration. For example, if you define the metric to optimize as `val_acc`, you must log `val_acc`. If you do not log the metric, W\&B does not know what to optimize for. Within the configuration dictionary (`sweep_configuration` in this example), you define the sweep to maximize the `val_acc` value. 7. Start the sweep with [`wandb.agent()`](/models/ref/python/functions/agent). 
Provide the sweep ID and the name of the function the sweep will execute (`function=main`), and set the maximum number of runs to try to four (`count=4`).

Putting this all together, your script might look similar to the following:

```python theme={null}
import wandb  # Import the W&B Python SDK
import numpy as np
import random
import argparse


def train_one_epoch(epoch, lr, batch_size):
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss


def evaluate_one_epoch(epoch):
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss


def main(args=None):
    # When called by sweep agent, args will be None,
    # so we use the project from sweep config
    project = args.project if args else None

    with wandb.init(project=project) as run:
        # Fetches the hyperparameter values from `wandb.Run.config` object
        lr = run.config["lr"]
        batch_size = run.config["batch_size"]
        epochs = run.config["epochs"]

        # Execute the training loop and log the performance values to W&B
        for epoch in np.arange(1, epochs):
            train_acc, train_loss = train_one_epoch(epoch, lr, batch_size)
            val_acc, val_loss = evaluate_one_epoch(epoch)
            run.log(
                {
                    "epoch": epoch,
                    "train_acc": train_acc,
                    "train_loss": train_loss,
                    "val_acc": val_acc,  # Metric optimized
                    "val_loss": val_loss,
                }
            )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--project", type=str, default="sweep-example", help="W&B project name")
    args = parser.parse_args()

    # Define a sweep config dictionary
    sweep_configuration = {
        "method": "random",
        "name": "sweep",
        # Metric that you want to optimize
        # For example, if you want to maximize validation
        # accuracy set "goal": "maximize" and the name of the variable
        # you want to optimize for, in this case "val_acc"
        "metric": {
            "goal": "maximize",
            "name": "val_acc"
        },
        "parameters": {
            "batch_size": {"values": [16, 32, 64]},
            "epochs": {"values": [5, 10, 15]},
            "lr": {"max": 0.1, "min": 0.0001},
        },
    }

    # Initialize the sweep by passing in the config dictionary
    sweep_id = wandb.sweep(sweep=sweep_configuration, project=args.project)

    # Start the sweep job
    wandb.agent(sweep_id, function=main, count=4)
```

**Logging metrics to W\&B in a sweep**

You must log the metric you define and are optimizing for in both your sweep configuration and with `wandb.Run.log()`. For example, if you define the metric to optimize as `val_acc` within your sweep configuration, you must also log `val_acc` to W\&B. If you do not log the metric, W\&B does not know what to optimize for.

```python theme={null}
with wandb.init() as run:
    val_loss, val_acc = train()
    run.log(
        {
            "val_loss": val_loss,
            "val_acc": val_acc
        }
    )
```

The following is an incorrect example of logging the metric to W\&B. The metric that is optimized for in the sweep configuration is `val_acc`, but the code logs `val_acc` within a nested dictionary under the key `validation`. You must log the metric directly, not within a nested dictionary.

```python theme={null}
with wandb.init() as run:
    val_loss, val_acc = train()
    run.log(
        {
            "validation": {
                "val_loss": val_loss,
                "val_acc": val_acc
            }
        }
    )
```

# Overview

Source: https://docs.wandb.ai/models/sweeps/define-sweep-configuration

Learn how to create configuration files for sweeps.

A W\&B Sweep combines a strategy for exploring hyperparameter values with the code that evaluates them. The strategy can be as simple as trying every option or as complex as Bayesian Optimization and Hyperband ([BOHB](https://arxiv.org/abs/1807.01774)).

Define a sweep configuration either in a [Python dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) or a [YAML](https://yaml.org/) file. How you define your sweep configuration depends on how you want to manage your sweep.

Define your sweep configuration in a YAML file if you want to initialize a sweep and start a sweep agent from the command line.
Define your sweep in a Python dictionary if you initialize a sweep and start a sweep agent entirely within a Python script or notebook.

The following guide describes how to format your sweep configuration. See [Sweep configuration options](./sweep-config-keys) for a comprehensive list of top-level sweep configuration keys.

## Basic structure

Both sweep configuration format options (YAML and Python dictionary) use key-value pairs and nested structures.

Use top-level keys within your sweep configuration to define qualities of your sweep search such as the name of the sweep ([`name`](./sweep-config-keys) key), the parameters to search through ([`parameters`](./sweep-config-keys#parameters) key), the methodology to search the parameter space ([`method`](./sweep-config-keys#method) key), and more.

For example, the following code snippets show the same sweep configuration defined within a YAML file and within a Python dictionary. The YAML file specifies five top-level keys: `program`, `name`, `method`, `metric`, and `parameters`. The Python dictionary specifies the same keys except `program`.

Define a sweep configuration in a YAML file if you want to manage sweeps interactively from the command line (CLI).

```yaml title="config.yaml" theme={null}
program: train.py
name: sweepdemo
method: bayes
metric:
  goal: minimize
  name: validation_loss
parameters:
  learning_rate:
    min: 0.0001
    max: 0.1
  batch_size:
    values: [16, 32, 64]
  epochs:
    values: [5, 10, 15]
  optimizer:
    values: ["adam", "sgd"]
```

Define a sweep in a Python dictionary if you define your training algorithm in a Python script or notebook.

The following code snippet stores a sweep configuration in a variable named `sweep_configuration`:

```python title="train.py" theme={null}
sweep_configuration = {
    "name": "sweepdemo",
    "method": "bayes",
    "metric": {"goal": "minimize", "name": "validation_loss"},
    "parameters": {
        "learning_rate": {"min": 0.0001, "max": 0.1},
        "batch_size": {"values": [16, 32, 64]},
        "epochs": {"values": [5, 10, 15]},
        "optimizer": {"values": ["adam", "sgd"]},
    },
}
```

Within the top-level `parameters` key, the following keys are nested: `learning_rate`, `batch_size`, `epochs`, and `optimizer`. For each of the nested keys you specify, you can provide one or more values, a distribution, a probability, and more. For more information, see the [parameters](./sweep-config-keys#parameters) section in [Sweep configuration options](./sweep-config-keys).

## Double nested parameters

Sweep configurations support nested parameters. To define a nested parameter, include an additional `parameters` key under the top-level parameter name.

The following example shows a sweep configuration with three nested parameters: `nested_category_1`, `nested_category_2`, and `nested_category_3`. Each nested parameter includes two additional parameters: `momentum` and `weight_decay`.

`nested_category_1`, `nested_category_2`, and `nested_category_3` are placeholders. Replace them with names that fit your use case.

The following code snippets show how to define nested parameters in both a YAML file and a Python dictionary.
```yaml theme={null} program: sweep_nest.py name: nested_sweep method: random metric: name: loss goal: minimize parameters: optimizer: values: ['adam', 'sgd'] fc_layer_size: values: [128, 256, 512] dropout: values: [0.3, 0.4, 0.5] epochs: value: 1 learning_rate: distribution: uniform min: 0 max: 0.1 batch_size: distribution: q_log_uniform_values q: 8 min: 32 max: 256 nested_category_1: parameters: momentum: distribution: uniform min: 0.0 max: 0.9 weight_decay: values: [0.0001, 0.0005, 0.001] nested_category_2: parameters: momentum: distribution: uniform min: 0.0 max: 0.9 weight_decay: values: [0.1, 0.2, 0.3] nested_category_3: parameters: momentum: distribution: uniform min: 0.5 max: 0.7 weight_decay: values: [0.2, 0.3, 0.4] ``` ```python theme={null} { "program": "sweep_nest.py", "name": "nested_sweep", "method": "random", "metric": { "name": "loss", "goal": "minimize" }, "parameters": { "optimizer": { "values": ["adam", "sgd"] }, "fc_layer_size": { "values": [128, 256, 512] }, "dropout": { "values": [0.3, 0.4, 0.5] }, "epochs": { "value": 1 }, "learning_rate": { "distribution": "uniform", "min": 0, "max": 0.1 }, "batch_size": { "distribution": "q_log_uniform_values", "q": 8, "min": 32, "max": 256 }, "nested_category_1": { "parameters": { "momentum": { "distribution": "uniform", "min": 0.0, "max": 0.9 }, "weight_decay": { "values": [0.0001, 0.0005, 0.001] } } }, "nested_category_2": { "parameters": { "momentum": { "distribution": "uniform", "min": 0.0, "max": 0.9 }, "weight_decay": { "values": [0.1, 0.2, 0.3] } } }, "nested_category_3": { "parameters": { "momentum": { "distribution": "uniform", "min": 0.5, "max": 0.7 }, "weight_decay": { "values": [0.2, 0.3, 0.4] } } } } } ``` Nested parameters defined in sweep configuration overwrite keys specified in a W\&B run configuration. 
As an example, suppose you have a `train.py` script that initializes a run with a nested default:

```python theme={null}
def main():
    with wandb.init(config={"nested_param": {"manual_key": 1}}) as run:
        # Your training code here
        ...
```

Your sweep configuration defines nested parameters under a top-level `"parameters"` key:

```python theme={null}
sweep_configuration = {
    "method": "grid",
    "metric": {"name": "score", "goal": "minimize"},
    "parameters": {
        "top_level_param": {"value": 0},
        "nested_param": {
            "parameters": {
                "learning_rate": {"value": 0.01},
                "double_nested_param": {
                    "parameters": {"x": {"value": 0.9}, "y": {"value": 0.8}}
                },
            }
        },
    },
}

sweep_id = wandb.sweep(sweep=sweep_configuration, project="<project>")
wandb.agent(sweep_id, function=main, count=4)
```

During a sweep run, `run.config["nested_param"]` reflects the subtree defined by the sweep configuration (`learning_rate`, `double_nested_param`) and does not include the `manual_key` defined in `wandb.init(config=...)`.

## Sweep configuration template

The following template shows how you can configure parameters and specify search constraints. Replace `hyperparameter_name` with the name of your hyperparameter and replace any values enclosed in `<>`.

```yaml title="config.yaml" theme={null}
program: <insert>
method: <insert>
parameters:
  hyperparameter_name0:
    value: 0
  hyperparameter_name1:
    values: [0, 0, 0]
  hyperparameter_name:
    distribution: <insert>
    value: <insert>
  hyperparameter_name2:
    distribution: <insert>
    min: <insert>
    max: <insert>
    q: <insert>
  hyperparameter_name3:
    distribution: <insert>
    values:
      - <insert>
      - <insert>
      - <insert>
early_terminate:
  type: hyperband
  s: 0
  eta: 0
  max_iter: 0
command:
  - ${Command macro}
  - ${Command macro}
  - ${Command macro}
  - ${Command macro}
```

To express a numeric value using scientific notation, add the YAML `!!float` tag, which casts the value to a floating point number. For example, `min: !!float 1e-5`. See [Command example](#command-example).
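As a rough illustration of the structure described so far, the following sketch checks a Python sweep configuration for a few common mistakes, such as a misspelled `parameters` key or a parameter with no search specification. `check_sweep_config` and `check_parameters` are hypothetical helpers written for this example. They are not part of the W\&B SDK and cover only the keys shown on this page.

```python
# Hypothetical helpers for illustration only; not part of the W&B SDK.
VALID_METHODS = {"grid", "random", "bayes"}
SPEC_KEYS = {"value", "values", "distribution", "min", "max"}


def check_parameters(parameters, prefix=""):
    """Return a list of problems found in a `parameters` mapping."""
    problems = []
    for name, spec in parameters.items():
        path = f"{prefix}{name}"
        if not isinstance(spec, dict):
            problems.append(f"{path}: expected a mapping")
        elif "parameters" in spec:
            # Nested parameter: recurse into the inner `parameters` key.
            problems.extend(check_parameters(spec["parameters"], prefix=f"{path}."))
        elif not (SPEC_KEYS & spec.keys()):
            problems.append(f"{path}: needs value, values, or distribution/min/max")
    return problems


def check_sweep_config(config):
    """Return a list of problems found in a sweep configuration dictionary."""
    problems = []
    if config.get("method") not in VALID_METHODS:
        problems.append(f"method: expected one of {sorted(VALID_METHODS)}")
    if "parameters" not in config:
        problems.append("parameters: missing top-level key")
    else:
        problems.extend(check_parameters(config["parameters"]))
    return problems


sweep_configuration = {
    "method": "bayes",
    "metric": {"goal": "minimize", "name": "val_loss"},
    "parameters": {
        # In Python, 1e-5 is already a float, so no cast is needed.
        "learning_rate": {"min": 1e-5, "max": 1e-1},
        "batch_size": {"values": [16, 32, 64]},
        "nested_category_1": {
            "parameters": {
                "momentum": {"distribution": "uniform", "min": 0.0, "max": 0.9}
            }
        },
    },
}

print(check_sweep_config(sweep_configuration))
print(check_sweep_config({"method": "bayes", "parameter": {}}))
```

The first call prints an empty list. The second reports the missing `parameters` key, because the dictionary uses the singular `parameter` by mistake.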
## Sweep configuration examples

```yaml title="config.yaml" theme={null}
program: train.py
method: random
metric:
  goal: minimize
  name: loss
parameters:
  batch_size:
    distribution: q_log_uniform_values
    max: 256
    min: 32
    q: 8
  dropout:
    values: [0.3, 0.4, 0.5]
  epochs:
    value: 1
  fc_layer_size:
    values: [128, 256, 512]
  learning_rate:
    distribution: uniform
    max: 0.1
    min: 0
  optimizer:
    values: ["adam", "sgd"]
```

```python title="train.py" theme={null}
sweep_config = {
    "method": "random",
    "metric": {"goal": "minimize", "name": "loss"},
    "parameters": {
        "batch_size": {
            "distribution": "q_log_uniform_values",
            "max": 256,
            "min": 32,
            "q": 8,
        },
        "dropout": {"values": [0.3, 0.4, 0.5]},
        "epochs": {"value": 1},
        "fc_layer_size": {"values": [128, 256, 512]},
        "learning_rate": {"distribution": "uniform", "max": 0.1, "min": 0},
        "optimizer": {"values": ["adam", "sgd"]},
    },
}
```

### Bayes hyperband example

```yaml theme={null}
program: train.py
method: bayes
metric:
  goal: minimize
  name: val_loss
parameters:
  dropout:
    values: [0.15, 0.2, 0.25, 0.3, 0.4]
  hidden_layer_size:
    values: [96, 128, 148]
  layer_1_size:
    values: [10, 12, 14, 16, 18, 20]
  layer_2_size:
    values: [24, 28, 32, 36, 40, 44]
  learn_rate:
    values: [0.001, 0.01, 0.003]
  decay:
    values: [1e-5, 1e-6, 1e-7]
  momentum:
    values: [0.8, 0.9, 0.95]
  epochs:
    value: 27
early_terminate:
  type: hyperband
  s: 2
  eta: 3
  max_iter: 27
```

The following examples show how to specify either a minimum or a maximum number of iterations for `early_terminate`. Both use `eta` equal to 3.

When you specify a minimum number of iterations, the brackets are `[3, 3*eta, 3*eta*eta, 3*eta*eta*eta]`, which equals `[3, 9, 27, 81]`.

```yaml theme={null}
early_terminate:
  type: hyperband
  min_iter: 3
```

When you specify a maximum number of iterations, the brackets are `[27/eta, 27/eta/eta]`, which equals `[9, 3]`.

```yaml theme={null}
early_terminate:
  type: hyperband
  max_iter: 27
  s: 2
```

### Macro and custom command arguments example

For more complex command line arguments, you can use macros to pass environment variables, the Python interpreter, and additional arguments.
[W\&B supports predefined macros](./sweep-config-keys#command-macros) and custom command line arguments that you can specify in your sweep configuration.

For example, the following sweep configuration (`sweep.yaml`) defines a command that runs a Python script (`run.py`). When the sweep runs, the `${env}`, `${interpreter}`, and `${program}` macros are replaced with the appropriate values.

The `--batch_size=${batch_size}`, `--test=True`, and `--optimizer=${optimizer}` arguments use custom macros to pass the values of the `batch_size`, `test`, and `optimizer` parameters defined in the sweep configuration.

```yaml title="sweep.yaml" theme={null}
program: run.py
method: random
metric:
  name: validation_loss
parameters:
  learning_rate:
    min: 0.0001
    max: 0.1
command:
  - ${env}
  - ${interpreter}
  - ${program}
  - "--batch_size=${batch_size}"
  - "--optimizer=${optimizer}"
  - "--test=True"
```

The associated Python script (`run.py`) can then parse these command line arguments using the `argparse` module.

```python title="run.py" theme={null}
# run.py
import wandb
import argparse


def str2bool(v: str) -> bool:
    """Convert a string such as 'True' to a boolean (see Boolean arguments)."""
    return v.lower() in ('yes', 'true', 't', '1')


parser = argparse.ArgumentParser()
parser.add_argument('--batch_size', type=int)
parser.add_argument('--optimizer', type=str, choices=['adam', 'sgd'], required=True)
parser.add_argument('--test', type=str2bool, default=False)
args = parser.parse_args()

# Initialize a W&B Run
with wandb.init(project='test-project') as run:
    run.log({'validation_loss': 1})
```

See the [Command macros](./sweep-config-keys#command-macros) section in [Sweep configuration options](./sweep-config-keys) for a list of predefined macros you can use in your sweep configuration.

#### Boolean arguments

The `argparse` module does not support boolean arguments by default. To define a boolean argument, you can use the [`action`](https://docs.python.org/3/library/argparse.html#action) parameter or use a custom function to convert the string representation of the boolean value to a boolean type.
As an example, you can use the following code snippet to define a boolean argument. Pass `store_true` or `store_false` as the `action` argument to `ArgumentParser.add_argument()`.

```python theme={null}
import wandb
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--test', action='store_true')
args = parser.parse_args()

args.test  # This will be True if --test is passed, otherwise False
```

You can also define a custom function to convert the string representation of the boolean value to a boolean type. For example, the following code snippet defines the `str2bool` function, which converts a string to a boolean value.

```python theme={null}
def str2bool(v: str) -> bool:
    """Convert a string to a boolean.

    This is required because argparse does not support
    boolean arguments by default.
    """
    if isinstance(v, bool):
        return v
    return v.lower() in ('yes', 'true', 't', '1')
```

# Tutorial: Define, initialize, and run a sweep

Source: https://docs.wandb.ai/models/sweeps/walkthrough

Sweeps quickstart shows how to define, initialize, and run a sweep.

This page shows how to define, initialize, and run a sweep. There are four main steps:

1. [Set up your training code](#set-up-your-training-code)
2. [Define the search space with a sweep configuration](#define-the-search-space-with-a-sweep-configuration)
3. [Initialize the sweep](#initialize-the-sweep)
4. [Start the sweep agent](#start-the-sweep)

Copy and paste the following code into a Jupyter Notebook or Python script:

```python theme={null}
# Import the W&B Python library
import wandb


# 1: Define objective/training function
def objective(config):
    score = config.x**3 + config.y
    return score


def main():
    with wandb.init(project="my-first-sweep") as run:
        score = objective(run.config)
        run.log({"score": score})


# 2: Define the search space
sweep_configuration = {
    "method": "random",
    "metric": {"goal": "minimize", "name": "score"},
    "parameters": {
        "x": {"max": 0.1, "min": 0.01},
        "y": {"values": [1, 3, 7]},
    },
}

# 3: Start the sweep
sweep_id = wandb.sweep(sweep=sweep_configuration, project="my-first-sweep")

wandb.agent(sweep_id, function=main, count=10)
```

The following sections break down and explain each step in the code sample.

## Set up your training code

Define a training function that takes in hyperparameter values from `wandb.Run.config` and uses them to train a model and return metrics.

Optionally provide the name of the project where you want the output of the W\&B Run to be stored (`project` parameter in [`wandb.init()`](/models/ref/python/functions/init)). If the project is not specified, the run is put in an "Uncategorized" project.

Both the sweep and the run must be in the same project. Therefore, the name you provide when you initialize W\&B must match the name of the project you provide when you initialize a sweep.

```python theme={null}
# 1: Define objective/training function
def objective(config):
    score = config.x**3 + config.y
    return score


def main():
    with wandb.init(project="my-first-sweep") as run:
        score = objective(run.config)
        run.log({"score": score})
```

## Define the search space with a sweep configuration

Specify the hyperparameters to sweep in a dictionary. For configuration options, see [Define sweep configuration](/models/sweeps/define-sweep-configuration/).
The following example demonstrates a sweep configuration that uses a random search (`'method':'random'`). The sweep randomly selects values for `x` and `y` from the range and list given in the configuration.

W\&B minimizes the metric specified in the `metric` key when `"goal": "minimize"` is associated with it. In this case, W\&B optimizes for minimizing the metric `score` (`"name": "score"`).

```python theme={null}
# 2: Define the search space
sweep_configuration = {
    "method": "random",
    "metric": {"goal": "minimize", "name": "score"},
    "parameters": {
        "x": {"max": 0.1, "min": 0.01},
        "y": {"values": [1, 3, 7]},
    },
}
```

## Initialize the Sweep

W\&B uses a *Sweep Controller* to manage sweeps in the cloud (standard) or locally (local), across one or more machines. For more information about Sweep Controllers, see [Search and stop algorithms locally](./local-controller).

`wandb.sweep()` returns a sweep ID when you initialize a sweep:

```python theme={null}
sweep_id = wandb.sweep(sweep=sweep_configuration, project="my-first-sweep")
```

For more information about initializing sweeps, see [Initialize sweeps](./initialize-sweeps).

## Start the Sweep

Use the [`wandb.agent()`](/models/ref/python/functions/agent) API call to start a sweep.

```python theme={null}
wandb.agent(sweep_id, function=main, count=10)
```

**Multiprocessing**

You must wrap your `wandb.agent()` and `wandb.sweep()` calls with `if __name__ == '__main__':` if you use the Python standard library's `multiprocessing` package or PyTorch's `torch.multiprocessing` package. For example:

```python theme={null}
if __name__ == '__main__':
    wandb.agent(sweep_id="<sweep_id>", function="<function>", count="<count>")
```

Wrapping your code with this convention ensures that it is only executed when the script is run directly, and not when it is imported as a module in a worker process.
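To see what the guard changes, the following stand-alone sketch writes a tiny script to a temporary file and imports it under a different module name, which is roughly what a worker process does to your main script under the `spawn` start method. It uses only the standard library. The `load_as_module` helper and the `worker_copy` module name are made up for this illustration.

```python
import importlib.util
import pathlib
import tempfile

# A tiny script with one unguarded statement and one guarded statement.
SOURCE = '''
calls = []
calls.append("top-level")        # Runs on every import.
if __name__ == "__main__":
    calls.append("guarded")      # Runs only when executed as a script.
'''


def load_as_module(source, name="worker_copy"):
    """Import `source` from a temporary file under the module name `name`."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / f"{name}.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module


module = load_as_module(SOURCE)
print(module.calls)  # ['top-level']
```

Only the unguarded statement runs on import. In a sweep script, an unguarded `wandb.agent()` call would behave like the `"top-level"` line and could start again inside every worker process.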
See [Python standard library `multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods) or [PyTorch `multiprocessing`](https://docs.pytorch.org/docs/stable/notes/multiprocessing.html#asynchronous-multiprocess-training-e-g-hogwild) for more information about multiprocessing. See [https://realpython.com/if-name-main-python/](https://realpython.com/if-name-main-python/) for information about the `if __name__ == '__main__':` convention.

## Visualize results (optional)

Open your project to see your live results in the W\&B App dashboard. With just a few clicks, construct rich, interactive charts like [parallel coordinates plots](/models/app/features/panels/parallel-coordinates/), [parameter importance analyses](/models/app/features/panels/parameter-importance/), and [additional chart types](/models/app/features/panels/).

For more information about how to visualize results, see [Visualize sweep results](./visualize-sweep-results). For an example dashboard, see this sample [Sweeps Project](https://wandb.ai/anmolmann/pytorch-cnn-fashion/sweeps/pmqye6u3).

## Stop the agent (optional)

In the terminal, press `Ctrl+C` to stop the current run. Press it again to terminate the agent.

# Experiments overview

Source: https://docs.wandb.ai/models/track

Track machine learning experiments with W&B to log metrics, hyperparameters, system metrics, and model artifacts.

Track machine learning experiments with a few lines of code. You can then review the results in an [interactive dashboard](/models/track/workspaces/) or export your data to Python for programmatic access using our [Public API](/models/ref/python/public-api/).

Use W\&B Integrations if you use popular frameworks such as [Keras](/models/integrations/keras). See [W\&B Integrations](/models/integrations) for a full list of integrations and information on how to add W\&B to your code.
The experiments dashboard lets you view and compare metrics across multiple [runs](/models/runs/).

## How it works

Track a machine learning experiment with a few lines of code:

1. Create a [W\&B Run](/models/runs/).
2. Store a dictionary of hyperparameters, such as learning rate or model type, into your configuration ([`wandb.Run.config`](/models/track/config/)).
3. Log metrics ([`wandb.Run.log()`](/models/track/log/)) over time in a training loop, such as accuracy and loss.
4. Save outputs of a run, like the model weights or a table of predictions.

The following code demonstrates a common W\&B experiment tracking workflow:

```python theme={null}
import wandb

num_epochs = 10  # Example value
model_path = "path_to_model.onnx"  # Placeholder: a file saved by your training code

# Start a run.
#
# When this block exits, it waits for logged data to finish uploading.
# If an exception is raised, the run is marked failed.
with wandb.init(entity="", project="my-project-name") as run:
    # Save model inputs and hyperparameters.
    run.config.learning_rate = 0.01

    # Run your experiment code.
    for epoch in range(num_epochs):
        # Do some training...
        loss = 1.0 / (epoch + 1)  # Placeholder for a real training loss

        # Log metrics over time to visualize model performance.
        run.log({"loss": loss})

    # Upload model outputs as artifacts.
    run.log_artifact(model_path, name="trained-model", type="model")
```

## Get started

Depending on your use case, explore the following resources to get started with W\&B Experiments:

* Read the [W\&B Quickstart](/models/quickstart/) for a step-by-step outline of the W\&B Python SDK commands you could use to create, track, and use a dataset artifact.
* Explore this chapter to learn how to:
  * Create an experiment
  * Configure experiments
  * Log data from experiments
  * View results from experiments
* Explore the [W\&B Python Library](/models/ref/python/) within the [W\&B API Reference Guide](/models/ref/python/).
## Best practices and tips

For best practices and tips for experiments and logging, see [Best Practices: Experiments and Logging](https://wandb.ai/wandb/pytorch-lightning-e2e/reports/W-B-Best-Practices-Guide--VmlldzozNTU1ODY1#w\&b-experiments-and-logging).

# Configure experiments

Source: https://docs.wandb.ai/models/track/config

Use a dictionary-like object to save your experiment configuration

Use the `config` property of a run to save your training configuration:

* Hyperparameters
* Input settings such as the dataset name or model type
* Any other independent variables for your experiments

The `wandb.Run.config` property makes it easy to analyze your experiments and reproduce your work in the future. You can group by configuration values in the W\&B App, compare the configurations of different W\&B runs, and evaluate how each training configuration affects the output. The `config` property is a dictionary-like object that can be composed from multiple dictionary-like objects.

To save output metrics or dependent variables like loss and accuracy, use `wandb.Run.log()` instead of `wandb.Run.config`.

## Set up an experiment configuration

Configurations are typically defined at the beginning of a training script. Machine learning workflows vary, however, so you are not required to define a configuration at the beginning of your training script.

Use dashes (`-`) or underscores (`_`) instead of periods (`.`) in your config variable names.

Use the dictionary access syntax `["key"]["value"]` instead of the attribute access syntax `config.key.value` if your script accesses `wandb.Run.config` keys below the root.

The following sections outline common scenarios for defining your experiment configuration.

### Set the configuration at initialization

Pass a dictionary at the beginning of your script when you call the `wandb.init()` API to generate a background process to sync and log data as a W\&B Run.
The following code snippet demonstrates how to define a Python dictionary with configuration values and how to pass that dictionary as an argument when you initialize a W\&B Run. ```python theme={null} import wandb # Define a config dictionary object config = { "hidden_layer_sizes": [32, 64], "kernel_sizes": [3], "activation": "ReLU", "pool_sizes": [2], "dropout": 0.5, "num_classes": 10, } # Pass the config dictionary when you initialize W&B with wandb.init(project="config_example", config=config) as run: ... ``` If you pass a nested dictionary as the `config`, W\&B flattens the names using dots. Access the values from the dictionary similarly to how you access other dictionaries in Python: ```python theme={null} # Access values with the key as the index value hidden_layer_sizes = run.config["hidden_layer_sizes"] kernel_sizes = run.config["kernel_sizes"] activation = run.config["activation"] # Python dictionary get() method hidden_layer_sizes = run.config.get("hidden_layer_sizes") kernel_sizes = run.config.get("kernel_sizes") activation = run.config.get("activation") ``` Throughout the Developer Guide and examples we copy the configuration values into separate variables. This step is optional. It is done for readability. ### Set the configuration with argparse You can set your configuration with an argparse object. [argparse](https://docs.python.org/3/library/argparse.html), short for argument parser, is a standard library module in Python 3.2 and above that makes it easy to write scripts that take advantage of all the flexibility and power of command line arguments. This is useful for tracking results from scripts that are launched from the command line. The following Python script demonstrates how to define a parser object to define and set your experiment config. 
The functions `train_one_epoch` and `evaluate_one_epoch` are provided to simulate a training loop for the purpose of this demonstration:

```python theme={null}
# config_experiment.py
import argparse
import random

import numpy as np
import wandb


# Training and evaluation demo code
def train_one_epoch(epoch, lr, bs):
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss


def evaluate_one_epoch(epoch):
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss


def main(args):
    # Start a W&B Run
    with wandb.init(project="config_example", config=args) as run:
        # Access values from config dictionary and store them
        # into variables for readability
        lr = run.config["learning_rate"]
        bs = run.config["batch_size"]
        epochs = run.config["epochs"]

        # Simulate training and logging values to W&B
        for epoch in np.arange(1, epochs):
            train_acc, train_loss = train_one_epoch(epoch, lr, bs)
            val_acc, val_loss = evaluate_one_epoch(epoch)

            run.log(
                {
                    "epoch": epoch,
                    "train_acc": train_acc,
                    "train_loss": train_loss,
                    "val_acc": val_acc,
                    "val_loss": val_loss,
                }
            )


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument("-b", "--batch_size", type=int, default=32, help="Batch size")
    parser.add_argument(
        "-e", "--epochs", type=int, default=50, help="Number of training epochs"
    )
    # The learning rate is fractional, so parse it as a float.
    parser.add_argument(
        "-lr", "--learning_rate", type=float, default=0.001, help="Learning rate"
    )

    args = parser.parse_args()
    main(args)
```

### Set the configuration throughout your script

You can add more parameters to your config object throughout your script.
The following code snippet demonstrates how to add new key-value pairs to your config object: ```python theme={null} import wandb # Define a config dictionary object config = { "hidden_layer_sizes": [32, 64], "kernel_sizes": [3], "activation": "ReLU", "pool_sizes": [2], "dropout": 0.5, "num_classes": 10, } # Pass the config dictionary when you initialize W&B with wandb.init(project="config_example", config=config) as run: # Update config after you initialize W&B run.config["dropout"] = 0.2 run.config.epochs = 4 run.config["batch_size"] = 32 ``` You can update multiple values at a time: ```python theme={null} run.config.update({"lr": 0.1, "channels": 16}) ``` ### Set the configuration after your run finishes Use the [W\&B Public API](/models/ref/python/public-api/) to update a completed run's config. You must provide the API with your entity, project name and the run's ID. You can find these details in the Run object or in the [W\&B App](/models/track/workspaces/): ```python theme={null} with wandb.init() as run: ... # Find the following values from the Run object if it was initiated from the # current script or notebook, or you can copy them from the W&B App UI. username = run.entity project = run.project run_id = run.id # Note that api.run() returns a different type of object than wandb.init(). api = wandb.Api() api_run = api.run(f"{username}/{project}/{run_id}") api_run.config["bar"] = 32 api_run.update() ``` ## Highlight config values Pin config keys to the **References** section at the top of a run's overview page. Use [`wandb.Run.pin_config_keys`](/models/ref/python/experiments/run#method-run-pin_config_keys) to pin one or more config keys with the Python SDK. 
For example, if you use a Grafana dashboard to monitor training runs, add the dashboard URL to your config and pin the `grafana_url` key:

```python theme={null}
config = {
    "hidden_layer_sizes": [32, 64],
    "kernel_sizes": [3],
    "activation": "ReLU",
    "pool_sizes": [2],
    "dropout": 0.5,
    "num_classes": 10,
    "grafana_url": "[Grafana dashboard](https://my-grafana-instance.com/)"
}

with wandb.init(config=config) as run:
    # Add the "grafana_url" config key to the References section.
    run.pin_config_keys(["grafana_url"])
```

## `absl.FLAGS`

You can also pass in [`absl` flags](https://abseil.io/docs/python/guides/flags).

```python theme={null}
from absl import flags

flags.DEFINE_string("model", None, "model to run")  # name, default, help

run.config.update(flags.FLAGS)  # adds absl flags to config
```

## File-Based Configs

If you place a file named `config-defaults.yaml` in the same directory as your run script, the run automatically picks up the key-value pairs defined in the file and passes them to `wandb.Run.config`.

The following code snippet shows a sample `config-defaults.yaml` YAML file:

```yaml theme={null}
batch_size:
  desc: Size of each mini-batch
  value: 32
```

You can override the default values automatically loaded from `config-defaults.yaml` by setting updated values in the `config` argument of `wandb.init()`. For example:

```python theme={null}
import wandb

# Override config-defaults.yaml by passing custom values
with wandb.init(config={"epochs": 200, "batch_size": 64}) as run:
    ...
```

To load a configuration file other than `config-defaults.yaml`, use the `--configs` command-line argument and specify the path to the file:

```bash theme={null}
python train.py --configs other-config.yaml
```

### Example use case for file-based configs

Suppose you have a YAML file with some metadata for the run, and then a dictionary of hyperparameters in your Python script.
You can save both in the nested `config` object:

```python theme={null}
hyperparameter_defaults = dict(
    dropout=0.5,
    batch_size=100,
    learning_rate=0.001,
)

config_dictionary = dict(
    yaml=my_yaml_file,
    params=hyperparameter_defaults,
)

with wandb.init(config=config_dictionary) as run:
    ...
```

## View config values in the W\&B App

View your config values in the **Config** section of a run's Overview tab. The config values are also available in the **References** section if you pin them.

1. Navigate to your project in the W\&B App.
2. Click the run you want to view the config values for.
3. Select the **Overview** tab.
4. Scroll to the **Config** section.
5. (Optional) Click **View raw data** to view config values in JSON format.

The raw JSON format is useful when you write expressions to create or transform line plots from config values, logged metrics, or summary values. See [Expressions](/models/app/features/panels/line-plot/reference#expressions) for more details.

## TensorFlow v1 flags

You can pass TensorFlow flags into the `wandb.Run.config` object directly.

```python theme={null}
import tensorflow as tf
import wandb

with wandb.init() as run:
    run.config.epochs = 4

    flags = tf.app.flags
    flags.DEFINE_string("data_dir", "/tmp/data", "Data directory.")
    flags.DEFINE_integer("batch_size", 128, "Batch size.")
    run.config.update(flags.FLAGS)  # add tensorflow flags as config
```

# Create an experiment

Source: https://docs.wandb.ai/models/track/create-an-experiment

Create a W&B Experiment using the Python SDK to track run initialization, hyperparameters, and metric logging.

Use the W\&B Python SDK to track machine learning experiments. You can then review the results in an interactive dashboard or export your data to Python for programmatic access with the [W\&B Public API](/models/ref/python/public-api/).

This guide describes how to use W\&B building blocks to create a W\&B Experiment.

## How to create a W\&B Experiment

Create a W\&B Experiment in four steps:

1. [Initialize a W\&B Run](#initialize-a-wb-run)
2. [Capture a dictionary of hyperparameters](#capture-a-dictionary-of-hyperparameters)
3. [Log metrics inside your training loop](#log-metrics-inside-your-training-loop)
4. [Log an artifact to W\&B](#log-an-artifact-to-wb)

### Initialize a W\&B run

Use [`wandb.init()`](/models/ref/python/functions/init) to create a W\&B Run.

The following snippet creates a run in a W\&B project named `"cat-classification"` with the description `"My first experiment"` to help identify this run. The tags `"baseline"` and `"paper1"` mark this run as a baseline experiment intended for a future paper publication.

```python theme={null}
import wandb

with wandb.init(
    project="cat-classification",
    notes="My first experiment",
    tags=["baseline", "paper1"],
) as run:
    ...
```

`wandb.init()` returns a [Run](/models/ref/python/experiments/run) object.

Note: If the project already exists when you call `wandb.init()`, the run is added to it. For example, if you already have a project called `"cat-classification"`, that project continues to exist and is not deleted. Instead, a new run is added to that project.

### Capture a dictionary of hyperparameters

Save a dictionary of hyperparameters such as learning rate or model type. The model settings you capture in config are useful later to organize and query your results.

```python theme={null}
with wandb.init(
    ...,
    config={"epochs": 100, "learning_rate": 0.001, "batch_size": 128},
) as run:
    ...
```

For more information on how to configure an experiment, see [Configure Experiments](./config).

### Log metrics inside your training loop

Call [`run.log()`](/models/ref/python/experiments/run/#method-runlog) to log metrics about each training step such as accuracy and loss.
```python theme={null} model, dataloader = get_model(), get_data() for epoch in range(run.config.epochs): for batch in dataloader: loss, accuracy = model.training_step() run.log({"accuracy": accuracy, "loss": loss}) ``` For more information on different data types you can log with W\&B, see [Log Data During Experiments](/models/track/log/). ### Log an artifact to W\&B Optionally log a W\&B Artifact. Artifacts make it easy to version datasets and models. ```python theme={null} # You can save any file or even a directory. In this example, we pretend # the model has a save() method that outputs an ONNX file. model.save("path_to_model.onnx") run.log_artifact("path_to_model.onnx", name="trained-model", type="model") ``` Learn more about [Artifacts](/models/artifacts/) or about versioning models in [Registry](/models/registry/). ### Putting it all together The full script with the preceding code snippets is found below: ```python theme={null} import wandb with wandb.init( project="cat-classification", notes="", tags=["baseline", "paper1"], # Record the run's hyperparameters. config={"epochs": 100, "learning_rate": 0.001, "batch_size": 128}, ) as run: # Set up model and data. model, dataloader = get_model(), get_data() # Run your training while logging metrics to visualize model performance. for epoch in range(run.config["epochs"]): for batch in dataloader: loss, accuracy = model.training_step() run.log({"accuracy": accuracy, "loss": loss}) # Upload the trained model as an artifact. model.save("path_to_model.onnx") run.log_artifact("path_to_model.onnx", name="trained-model", type="model") ``` ## Next steps: Visualize your experiment Use the W\&B Dashboard as a central place to organize and visualize results from your machine learning models. 
With just a few clicks, construct rich, interactive charts like [parallel coordinates plots](/models/app/features/panels/parallel-coordinates/), [parameter importance analyses](/models/app/features/panels/parameter-importance/), and [additional chart types](/models/app/features/panels/).

(Figure: Quickstart Sweeps Dashboard example)

For more information on how to view experiments and specific runs, see [Visualize results from experiments](/models/track/workspaces/).

## Best practices

The following are some suggested guidelines to consider when you create experiments:

1. **Finish your runs**: Use `wandb.init()` in a `with` statement to automatically mark the run as finished when the code completes or raises an exception.
   * In Jupyter notebooks, it may be more convenient to manage the Run object yourself. In this case, you can explicitly call `finish()` on the Run object to mark it complete:

     ```python theme={null}
     # In a notebook cell:
     run = wandb.init()

     # In a different cell:
     run.finish()
     ```
2. **Config**: Track hyperparameters, architecture, dataset, and anything else you'd like to use to reproduce your model. These values show up as columns in the app, where you can use them to group, sort, and filter runs dynamically.
3. **Project**: A project is a set of experiments you can compare together. Each project gets a dedicated dashboard page, and you can easily turn on and off different groups of runs to compare different model versions.
4. **Notes**: Set a quick commit message directly from your script. Edit and access notes in the Overview section of a run in the W\&B App.
5. **Tags**: Identify baseline runs and favorite runs. You can filter runs using tags. You can edit tags later in the Overview section of your project's dashboard in the W\&B App.
6. **Create multiple run sets to compare experiments**: When comparing experiments, create multiple run sets to make metrics easy to compare. You can toggle run sets on or off on the same chart or group of charts.
The following code snippet demonstrates how to define a W\&B Experiment using the best practices listed above:

```python theme={null}
import wandb

config = {
    "learning_rate": 0.01,
    "momentum": 0.2,
    "architecture": "CNN",
    "dataset_id": "cats-0192",
}

with wandb.init(
    project="detect-cats",
    notes="tweak baseline",
    tags=["baseline", "paper1"],
    config=config,
) as run:
    ...
```

For more information about available parameters when defining a W\&B Experiment, see the [`wandb.init()`](/models/ref/python/functions/init) API docs in the [API Reference Guide](/models/ref/python/).

# Environment variables

Source: https://docs.wandb.ai/models/track/environment-variables

Configure W&B SDK behavior using environment variables for authentication, project settings, logging modes, and more.

When you're running a script in an automated environment, you can control W\&B with environment variables set before the script runs or within the script.

```bash theme={null}
# This is secret and shouldn't be checked into version control
WANDB_API_KEY=$YOUR_API_KEY
# Name and notes optional
WANDB_NAME="My first run"
WANDB_NOTES="Smaller learning rate, more regularization."
```

```bash theme={null}
# Only needed if you don't check in the wandb/settings file
WANDB_ENTITY=$username
WANDB_PROJECT=$project
```

```python theme={null}
import os

# If you don't want your script to sync to the cloud
os.environ["WANDB_MODE"] = "offline"

# Add sweep ID tracking to Run objects and related classes
os.environ["WANDB_SWEEP_ID"] = "b05fq58z"
```

## Optional environment variables

Use these optional environment variables to do things like set up authentication on remote machines.
| Variable name | Usage | | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `WANDB_API_KEY` | Sets the authentication key associated with your account. Create an API key at [User Settings](https://wandb.ai/settings). This must be set if `wandb login` hasn't been run on the remote machine. | | `WANDB_BASE_URL` | If you're using [wandb/local](/platform/hosting/) you should set this environment variable to `http://YOUR_IP:YOUR_PORT` | | `WANDB_CACHE_DIR` | This defaults to \~/.cache/wandb, you can override this location with this environment variable | | `WANDB_CONFIG_DIR` | This defaults to \~/.config/wandb, you can override this location with this environment variable | | `WANDB_CONFIG_PATHS` | Comma separated list of yaml files to load into wandb.config. See [config](./config#file-based-configs). | | `WANDB_CONSOLE` | Set this to "off" to disable stdout / stderr logging. This defaults to "on" in environments that support it. | | `WANDB_DATA_DIR` | Where to upload staging artifacts. The default location depends on your platform, because it uses the value of `user_data_dir` from the `platformdirs` Python package. Make sure this directory exists and the running user has permission to write to it. | | `WANDB_DIR` | Where to store all generated files. If unset, defaults to the `wandb` directory relative to your training script. Make sure this directory exists and the running user has permission to write to it. This does not control the location of downloaded artifacts, which you can set using the `WANDB_ARTIFACT_DIR` environment variable. | | `WANDB_ARTIFACT_DIR` | Where to store all downloaded artifacts. 
If unset, defaults to the `artifacts` directory relative to your training script. Make sure this directory exists and the running user has permission to write to it. This does not control the location of generated metadata files, which you can set using the `WANDB_DIR` environment variable. | | `WANDB_DISABLE_GIT` | Prevent wandb from probing for a git repository and capturing the latest commit / diff. | | `WANDB_DISABLE_CODE` | Set this to true to prevent wandb from saving notebooks or git diffs. We'll still save the current commit if we're in a git repo. | | `WANDB_DOCKER` | Set this to a docker image digest to enable restoring of runs. This is set automatically with the wandb docker command. You can obtain an image digest by running `wandb docker my/image/name:tag --digest` | | `WANDB_ENTITY` | The entity associated with your run. If you have run `wandb init` in the directory of your training script, it will create a directory named *wandb* and will save a default entity which can be checked into source control. If you don't want to create that file or want to override the file you can use the environmental variable. | | `WANDB_ERROR_REPORTING` | Set this to false to prevent wandb from logging fatal errors to its error tracking system. | | `WANDB_HOST` | Set this to the hostname you want to see in the wandb interface if you don't want to use the system provided hostname | | `WANDB_IGNORE_GLOBS` | Set this to a comma separated list of file globs to ignore. These files will not be synced to the cloud. | | `WANDB_JOB_NAME` | Specify a name for any jobs created by `wandb`. | | `WANDB_JOB_TYPE` | Specify the job type, like "training" or "evaluation" to indicate different types of runs. See [grouping](/models/runs/grouping/) for more info. | | `WANDB_MODE` | If you set this to "offline" wandb will save your run metadata locally and not sync to the server. If you set this to `disabled` wandb will turn off completely. 
| | `WANDB_NAME` | The human-readable name of your run. If not set it will be randomly generated for you | | `WANDB_NOTEBOOK_NAME` | If you're running in jupyter you can set the name of the notebook with this variable. We attempt to auto detect this. | | `WANDB_NOTES` | Longer notes about your run. Markdown is allowed and you can edit this later in the UI. | | `WANDB_PROJECT` | The project associated with your run. This can also be set with `wandb init`, but the environmental variable will override the value. | | `WANDB_RESUME` | By default this is set to *never*. If set to *auto* wandb will automatically resume failed runs. If set to *must* forces the run to exist on startup. If you want to always generate your own unique ids, set this to *allow* and always set `WANDB_RUN_ID`. | | `WANDB_RUN_GROUP` | Specify the experiment name to automatically group runs together. See [grouping](/models/runs/grouping/) for more info. | | `WANDB_RUN_ID` | Set this to a globally unique string (per project) corresponding to a single run of your script. It must be no longer than 64 characters. All non-word characters will be converted to dashes. This can be used to resume an existing run in cases of failure. | | `WANDB_QUIET` | Set this to `true` to limit statements logged to standard output to critical statements only. If this is set all logs will be written to `$WANDB_DIR/debug.log`. | | `WANDB_SILENT` | Set this to `true` to silence wandb log statements. This is useful for scripted commands. If this is set all logs will be written to `$WANDB_DIR/debug.log`. | | `WANDB_SHOW_RUN` | Set this to `true` to automatically open a browser with the run url if your operating system supports it. | | `WANDB_SWEEP_ID` | Add sweep ID tracking to `Run` objects and related classes, and display in the UI. | | `WANDB_TAGS` | A comma separated list of tags to be applied to the run. | | `WANDB_USERNAME` | The username of a member of your team associated with the run. 
This can be used along with a service account API key to enable attribution of automated runs to members of your team. | | `WANDB_USER_EMAIL` | The email of a member of your team associated with the run. This can be used along with a service account API key to enable attribution of automated runs to members of your team. | ## Singularity environments If you're running containers in [Singularity](https://singularity.lbl.gov/index.html) you can pass environment variables by pre-pending the above variables with `SINGULARITYENV_`. More details about Singularity environment variables can be found [here](https://singularity.lbl.gov/docs-environment-metadata#environment). ## Running on AWS If you're running batch jobs in AWS, it's easy to authenticate your machines with your W\&B credentials. Create an API key at [User Settings](https://wandb.ai/settings), and set the `WANDB_API_KEY` environment variable in the [AWS batch job spec](https://docs.aws.amazon.com/batch/latest/userguide/job_definition_parameters.html#parameters). # Track Jupyter notebooks Source: https://docs.wandb.ai/models/track/jupyter Use W&B with Jupyter to get interactive visualizations without leaving your notebook. Use W\&B with Jupyter to get interactive visualizations without leaving your notebook. Combine custom analysis, experiments, and prototypes, all fully logged. ## Use cases for W\&B with Jupyter notebooks 1. **Iterative experimentation**: Run and re-run experiments, tweaking parameters, and have all the runs you do saved automatically to W\&B without having to take manual notes along the way. 2. **Code saving**: When reproducing a model, it's hard to know which cells in a notebook ran, and in which order. Turn on code saving on your [settings page](/platform/app/settings-page/) to save a record of cell execution for each experiment. 3. 
**Custom analysis**: Once runs are logged to W\&B, it's easy to get a dataframe from the API and do custom analysis, then log those results to W\&B to save and share in reports.

## Getting started in a notebook

Start your notebook with the following code to install W\&B and link your account:

```notebook theme={null}
!pip install wandb -qqq

import wandb
wandb.login()
```

Next, set up your experiment and save hyperparameters:

```python theme={null}
wandb.init(
    project="jupyter-projo",
    config={
        "batch_size": 128,
        "learning_rate": 0.01,
        "dataset": "CIFAR-100",
    },
)
```

After running `wandb.init()`, start a new cell with `%%wandb` to see live graphs in the notebook. If you run this cell multiple times, data will be appended to the run.

```notebook theme={null}
%%wandb

# Your training loop here
```

Try it for yourself in this [example notebook](https://wandb.me/jupyter-interact-colab).

(Figure: Jupyter W&B widget)

### Rendering live W\&B interfaces directly in your notebooks

You can also display any existing dashboards, sweeps, or reports directly in your notebook using the `%wandb` magic:

```notebook theme={null}
# Display a project workspace
%wandb USERNAME/PROJECT
# Display a single run
%wandb USERNAME/PROJECT/runs/RUN_ID
# Display a sweep
%wandb USERNAME/PROJECT/sweeps/SWEEP_ID
# Display a report
%wandb USERNAME/PROJECT/reports/REPORT_ID
# Specify the height of embedded iframe
%wandb USERNAME/PROJECT -h 2048
```

As an alternative to the `%%wandb` or `%wandb` magics, after running `wandb.init()` you can end any cell with `wandb.Run.finish()` to show in-line graphs, or call `IPython.display.display(...)` on any report, sweep, or run object returned from our APIs.

```python theme={null}
import wandb
from IPython.display import display

# Initialize a run
run = wandb.init()

# If cell outputs run.finish(), you'll see live graphs
run.finish()
```

Want to know more about what you can do with W\&B?
Check out our [guide to logging data and media](/models/track/log/), learn [how to integrate us with your favorite ML toolkits](/models/integrations), or just dive straight into the [reference docs](/models/ref/python/) or our [repo of examples](https://github.com/wandb/examples). ## Additional Jupyter features in W\&B 1. **Easy authentication in Colab**: When you call `wandb.init()` for the first time in a Colab, we automatically authenticate your runtime if you're currently logged in to W\&B in your browser. On the overview tab of your run page, you'll see a link to the Colab. 2. **Jupyter Magic:** Display dashboards, sweeps and reports directly in your notebooks. The `%wandb` magic accepts a path to your project, sweeps or reports and will render the W\&B interface directly in the notebook. 3. **Launch dockerized Jupyter**: Call `wandb docker --jupyter` to launch a docker container, mount your code in it, ensure Jupyter is installed, and launch on port 8888. 4. **Run cells in arbitrary order without fear**: By default, we wait until the next time `wandb.init()` is called to mark a run as `finished`. That allows you to run multiple cells (say, one to set up data, one to train, one to test) in whatever order you like and have them all log to the same run. If you turn on code saving in [User Settings](https://wandb.ai/settings), you'll also log the cells that were executed, in order and in the state in which they were run, enabling you to reproduce even the most non-linear of pipelines. To mark a run as complete manually in a Jupyter notebook, call `wandb.Run.finish()`. ```python theme={null} import wandb run = wandb.init() # training script and logging goes here run.finish() ``` # Logging at scale and performance Source: https://docs.wandb.ai/models/track/limits This page describes how logging patterns impact performance in W&B and provides guidance for scaling experiment tracking in large projects. 
Performance is usually influenced by a combination of: * the number of runs in a project * the number of steps in each run * the number of distinct metrics you log * how often you call `wandb.Run.log()` * how much data you send in each log call * how your workspace is configured In most cases, performance issues are caused more often by logging too many distinct metrics than by logging too many steps. ## Key terms The following terms are used throughout this page. ### Steps A **step** is a single logical row of metrics in a run. A step is finalized when `wandb.Run.log()` is called with `commit=True`, or implicitly when neither `commit` nor `step` is specified. ```python theme={null} import wandb with wandb.init() as run: run.log({"loss": 0.42}, commit=True) ``` ### Metric cardinality **Metric cardinality** is the number of distinct metric keys logged in a project, including keys in nested dictionaries. For example, the following logs 4 distinct metric keys: `a`, `b.c`, `b.d.e`, and `b.d.f`. ```python theme={null} import wandb with wandb.init() as run: run.log( { "a": 1, "b": { "c": 2, "d": { "e": 3, "f": 4, }, }, } ) ``` W\&B flattens nested dictionaries into dot-separated metric names. ### Logged points **Logged points** are the total number of metric values recorded. For example, both of the following code samples produce three logged points: ```python theme={null} import wandb with wandb.init() as run: run.log({"a": 1, "b": 2, "c": 3}) ``` ```python theme={null} import wandb with wandb.init() as run: run.log({"a": 1}) run.log({"a": 2}) run.log({"a": 3}) ``` ### Log frequency **Log frequency** is the number of `wandb.Run.log()` calls per minute. ```text theme={null} log frequency = wandb.Run.log() calls per minute ``` ### Throughput **Throughput** is the total number of logged points recorded per minute. 
You can think of throughput as:

```text theme={null}
throughput = logged points per minute
```

Or, equivalently:

```text theme={null}
throughput = logged points per log call × log frequency
```

## Recommendations at scale

The recommendations described in this section only apply to W\&B Multi-tenant Cloud. If you use a different W\&B deployment type, check with your administrator for deployment-specific guidance or limits.

The following table summarizes recommended operating ranges for large-scale logging.

| Dimension                      | Guidance at scale         |
| ------------------------------ | ------------------------- |
| Runs per project               | 10,000                    |
| Steps per run                  | 500,000                   |
| Metric cardinality per project | 100,000                   |
| Log frequency                  | 1,000 rows per minute     |
| Throughput                     | 100,000 values per minute |
| Video throughput               | 40 MB per minute          |

These values are guidelines for maintaining good performance at scale. W\&B may continue to accept data beyond these recommendations, but pages can become slower to load and use.

## Throughput examples

Different logging patterns can produce the same throughput.

### Scalar logging examples

The values listed in the table only apply to W\&B Multi-tenant Cloud. If you use a different W\&B deployment type, check with your administrator for deployment-specific guidance or limits.

| Metrics per log call | Log frequency (per minute) | Throughput (values per minute) |
| -------------------- | -------------------------- | ------------------------------ |
| 100                  | 1,000                      | 100,000                        |
| 1,000                | 100                        | 100,000                        |
| 10,000               | 10                         | 100,000                        |
| 20,000               | 5                          | 100,000                        |

### Video logging examples

The values listed in the table only apply to W\&B Multi-tenant Cloud. If you use a different W\&B deployment type, check with your administrator for deployment-specific guidance or limits.
| Video size (MB) | Log frequency (per minute) | Video throughput (MB per minute) | | --------------- | -------------------------- | -------------------------------- | | 1 | 46 | 46 | | 5 | 8 | 40 | | 10 | 4 | 40 | | 50 | 1 | 50 | | 100 | 0.3 | 30 | | 250 | 0.1 | 25 | | 500 | 0.07 | 35 | ## Logging considerations Use `wandb.Run.log()` to track experiment metrics. ### Metric cardinality Keep the total metric cardinality (number of distinct metrics) in a project within the recommended range for your workload. High metric cardinality is one of the most common causes of slow workspaces. Performance issues are often caused by logging too many distinct metrics, not by logging too many steps. Because W\&B flattens nested keys into dot-separated metric names, metric cardinality can increase more than you expect. For example, the following logs 3 distinct metric keys: `a`, `b.c`, and `b.d`. ```python theme={null} import wandb with wandb.init() as run: run.log( { "a": 1, "b": { "c": "hello", "d": [1, 2, 3], }, } ) ``` If your workspace suddenly slows down, check whether recent runs introduced a large number of new metric keys. This often appears as many plots with only one or two runs visible. If this happened unintentionally, consider deleting and recreating those runs with a smaller, more stable set of metric names. ### Value size Keep the size of a single logged value under 1 MB and the total size of a single `wandb.Run.log()` call under 25 MB. These recommendations do not apply to `wandb.Media` types such as `wandb.Image` and `wandb.Audio`, which are handled differently. ```python theme={null} import json import wandb with wandb.init(project="wide-values") as run: # Not recommended run.log({"wide_key": list(range(10000000))}) # Not recommended with open("large_file.json", "r") as f: large_data = json.load(f) run.log(large_data) ``` Large values can slow plot loading for the entire run, not just for the metric that contains the large value. 
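The value-size guidance above can be checked before a payload is logged. The following is a minimal sketch, not part of the W\&B SDK: the helper names (`json_size`, `oversized_keys`, `within_guidelines`) are hypothetical, it assumes the byte length of the JSON encoding is a reasonable proxy for logged size, and it assumes 1 MB means 1,000,000 bytes.

```python
import json

# Guideline values from this section; treating 1 MB as 1,000,000 bytes is an assumption.
MAX_VALUE_BYTES = 1_000_000
MAX_PAYLOAD_BYTES = 25_000_000


def json_size(obj) -> int:
    """Approximate an object's size as the byte length of its JSON encoding."""
    return len(json.dumps(obj).encode("utf-8"))


def oversized_keys(payload: dict) -> list:
    """Return the keys whose values exceed the per-value guideline."""
    return [key for key, value in payload.items() if json_size(value) > MAX_VALUE_BYTES]


def within_guidelines(payload: dict) -> bool:
    """True if no single value, and not the payload as a whole, exceeds the guidelines."""
    return not oversized_keys(payload) and json_size(payload) <= MAX_PAYLOAD_BYTES


# Hypothetical payloads: one small, one with a very wide list value.
small = {"loss": 0.1, "acc": 0.9}
big = {"wide_key": list(range(300_000)), "loss": 0.1}
print(oversized_keys(small), oversized_keys(big))
```

A check like this could run before `run.log(payload)`, with oversized values summarized or stored another way instead of being logged directly.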
W\&B still stores logged data that exceeds these recommendations, but pages may load more slowly. ### Log frequency and throughput Choose a logging frequency that matches the value of the data you are collecting. Logging too often can increase SDK overhead and make the app slower, especially when combined with high metric cardinality or large payloads. As a starting point, keep logging within these guidelines: * Log frequency: less than 1,000 `wandb.Run.log()` calls per minute * Throughput: less than 100,000 logged values per minute * Video throughput: less than 40 MB per minute Batch related metrics into the same step when possible. For example, the following code snippet logs three metrics in the same step, which is more efficient than logging them separately. ```python theme={null} import wandb with wandb.init(project="metric-frequency") as run: # Recommended: batch related scalar metrics together run.log( { "loss": 0.12, "accuracy": 0.98, "lr": 1e-4, }, commit=True, ) ``` ### Config size Keep the total size of a run config under 10 MB. Large configs can slow project workspaces and runs table operations. ```python theme={null} import json import wandb # Recommended with wandb.init( project="config-size", config={ "lr": 0.1, "batch_size": 32, "epochs": 4, }, ) as run: pass # Not recommended with wandb.init( project="config-size", config={ "large_list": list(range(10000000)), "large_string": "a" * 10000000, }, ) as run: pass # Not recommended with open("large_config.json", "r") as f: large_config = json.load(f) wandb.init(config=large_config) ``` ## Workspace performance Workspace performance depends on both the underlying project data and workspace configuration. ### Runs per project For large projects, keep the number of runs in a project under 10,000 for best performance. If your team regularly works with only a subset of runs, consider moving older or less frequently used runs into a separate archive project. See [Manage runs](/models/runs/manage-runs/). 
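As a rough illustration of the archive suggestion above, the sketch below picks which runs to keep and which to move once a project exceeds the guideline. It is plain Python under stated assumptions: `split_for_archive` and the sample data are hypothetical, and how you obtain last-activity timestamps or actually move runs is outside its scope.

```python
RUNS_PER_PROJECT_GUIDELINE = 10_000  # guideline from this page


def split_for_archive(last_active_by_run: dict, keep: int = RUNS_PER_PROJECT_GUIDELINE):
    """Split run IDs into (kept, to_archive), keeping the `keep` most recently active runs."""
    newest_first = sorted(last_active_by_run, key=last_active_by_run.get, reverse=True)
    return newest_first[:keep], newest_first[keep:]


# Hypothetical data: run ID -> last-activity timestamp (any sortable value works).
last_active = {"run-a": 1_700_000_300, "run-b": 1_700_000_100, "run-c": 1_700_000_200}

# Use a tiny `keep` value so the example is easy to follow.
kept, to_archive = split_for_archive(last_active, keep=2)
print(kept, to_archive)
```

With the default `keep`, a project under the guideline returns an empty archive list, so the same helper can run routinely.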
### Panel count

By default, a workspace in automatic mode creates standard panels for each logged key. In large projects, this can produce too many panels and slow the workspace.

To improve performance:

1. Reset the workspace to manual mode.
2. Use [Quick add](/models/app/features/panels/#quick-add) to add only the panels you need.

Deleting unused panels one at a time usually has little effect. Reset the workspace and add back only the panels you want.

See [Panels](/models/app/features/panels/) for details.

### Section count

Hundreds of sections in a workspace can hurt performance. Create sections based on high-level metric groupings rather than one section per metric.

If you have too many sections, consider creating sections by prefix rather than suffix so that related metrics are grouped into fewer sections.

(Figure: Toggling section creation)

### Many metrics per run

When logging thousands of metrics per run, use a manual workspace so that you can choose which metrics to visualize. A focused set of panels loads faster. Metrics that are not plotted are still collected and stored.

To reset a workspace to manual mode, click the workspace's **action** menu, then click **Reset workspace**. Resetting a workspace has no impact on stored metrics for runs. See [workspace panel management](/models/app/features/panels/).

### File count

Keep the number of files uploaded for a single run under 1,000. If you need to log a large number of files, use W\&B Artifacts instead. Exceeding 1,000 files in a single run can slow run pages.

### Reports and workspaces

A report is designed for communication and presentation. A workspace is designed for dense, interactive analysis across many runs and metrics.

Use a workspace when you need to compare large numbers of runs or view many plots together. Use a report when you want to present curated results.

## Python script performance

Logging can add overhead to your training script. The main contributors are:

1. Large payloads
2.
Network speed and backend configuration 3. Very frequent calls to `wandb.Run.log()` If you call `wandb.Run.log()` too often, each call can add a small amount of latency to the training loop. Batching multiple metrics into fewer log calls usually improves performance. Is frequent logging slowing your training runs down? See [this Colab](https://wandb.me/log-hf-colab) for strategies to improve performance by adjusting your logging pattern. W\&B does not enforce hard product limits for these recommendations beyond API rate limiting. If you exceed the guidance on this page, W\&B may continue to accept your data, but the app or SDK may become slower. ## Rate limits W\&B Multi-tenant Cloud APIs use rate limits to maintain service reliability and availability. Rate limits are subject to change. If you hit a rate limit, the server returns HTTP `429 Rate limit exceeded` and includes rate-limit headers in the response. ### Rate-limit HTTP headers | Header name | Description | | --------------------- | ----------------------------------------------------------------- | | `RateLimit-Limit` | Quota available in the current time window, scaled from 0 to 1000 | | `RateLimit-Remaining` | Remaining quota in the current window, scaled from 0 to 1000 | | `RateLimit-Reset` | Number of seconds until the current quota resets | ### Metric logging API rate limits `wandb.Run.log()` sends training data to W\&B, either directly online or later through [offline syncing](/models/ref/cli/wandb-sync). Rate limits for metric logging apply at the project level and include both request rate and total request size over a rolling time window. Paid plans have higher limits than free plans. If you exceed a rate limit, the W\&B SDK automatically retries requests with backoff. In some cases, this can delay `run.finish()` until the rate-limit window resets. To reduce the chance of rate limiting: * Use the latest W\&B SDK version. * Reduce logging frequency. * Batch related metrics into fewer log calls. 
* Use offline logging and sync later when appropriate. ```python theme={null} import random import wandb with wandb.init(project="basic-intro") as run: for epoch in range(10): accuracy = 1 - 2 ** -epoch - random.random() / (epoch + 1) loss = 2 ** -epoch + random.random() / (epoch + 1) if epoch % 5 == 0: run.log({"acc": accuracy, "loss": loss}) ``` For manual syncing, use `wandb sync `. See [`wandb sync`](/models/ref/cli/wandb-sync). ### GraphQL API rate limits The W\&B app and the [public API](/models/ref/python/public-api/api) use GraphQL requests to query and modify data. For Multi-tenant Cloud: * unauthorized requests are rate-limited per IP address * authorized requests are rate-limited per user * some SDK requests that specify a project path can also be limited per project based on database query time Teams and Enterprise plans have higher limits than Free plans. If you are making a large number of public API requests, wait at least one second between requests when possible. If you receive HTTP `429 Rate limit exceeded` or see `RateLimit-Remaining=0`, wait for the number of seconds in `RateLimit-Reset` before retrying. ## Troubleshooting slow projects If a project or workspace feels slow, check the following first: 1. Did recent runs introduce a large number of new metric names? 2. Are you logging too frequently? 3. Are individual `run.log()` calls very large? 4. Is the workspace in automatic mode with too many panels or sections? 5. Does the project contain more runs than your team actively uses? In many cases, performance improves after reducing metric cardinality, batching log calls, and switching large workspaces to manual mode. ## Browser considerations The W\&B app can be memory-intensive and performs best in Chrome. Depending on your computer's memory, having W\&B active in 3+ tabs at once can cause performance to degrade. If you encounter unexpectedly slow performance, consider closing other tabs or applications. 
## Reporting performance issues to W\&B

W\&B takes performance seriously and investigates every report of lag. To expedite investigation, when reporting slow loading times consider invoking W\&B's built-in performance logger, which captures key metrics and performance events. Append the URL parameter `&PERF_LOGGING` to a page that is loading slowly, then share the output of your console with your account team or Support.

(Figure: Adding PERF_LOGGING)

# Overview

Source: https://docs.wandb.ai/models/track/log

Keep track of metrics, videos, custom plots, and more.

Log a dictionary of metrics, media, or custom objects to a step with the W\&B Python SDK. W\&B collects the key-value pairs during each step and stores them in one unified dictionary each time you log data with `wandb.Run.log()`. Data logged from your script is saved locally to your machine in a directory called `wandb`, then synced to the W\&B cloud or your [private server](/platform/hosting/).

Key-value pairs are stored in one unified dictionary only if you pass the same value for `step` in each call. If you log a different value for `step`, W\&B writes all of the collected keys and values to memory.

Each call to `wandb.Run.log()` is a new `step` by default. W\&B uses steps as the default x-axis when it creates charts and panels. You can optionally create and use a custom x-axis or capture a custom summary metric. For more information, see [Customize log axes](/models/track/log/customize-logging-axes/).

Use `wandb.Run.log()` to log consecutive values for each `step`: 0, 1, 2, and so on. It is not possible to write to a specific history step. W\&B only writes to the "current" and "next" step.

## Automatically logged data

W\&B automatically logs the following information during a W\&B Experiment:

* **System metrics**: CPU and GPU utilization, network, etc. For the GPU, these are fetched with [`nvidia-smi`](https://developer.nvidia.com/nvidia-system-management-interface).
* **Command line**: The stdout and stderr are picked up and show in the logs tab on the [run page.](/models/runs/) Turn on [Code Saving](https://wandb.me/code-save-colab) in your account's [Settings page](https://wandb.ai/settings) to log: * **Git commit**: Pick up the latest git commit and see it on the overview tab of the run page, as well as a `diff.patch` file if there are any uncommitted changes. * **Dependencies**: The `requirements.txt` file will be uploaded and shown on the files tab of the run page, along with any files you save to the `wandb` directory for the run. ## What data is logged with specific W\&B API calls? With W\&B, you can decide exactly what you want to log. The following lists some commonly logged objects: * **Datasets**: You have to specifically log images or other dataset samples for them to stream to W\&B. * **Plots**: Use `wandb.plot()` with `wandb.Run.log()` to track charts. See [Log Plots](/models/track/log/plots/) for more information. * **Tables**: Use `wandb.Table` to log data to visualize and query with W\&B. See [Log Tables](/models/track/log/log-tables/) for more information. * **PyTorch gradients**: Add `wandb.Run.watch(model)` to see gradients of the weights as histograms in the UI. * **Configuration information**: Log hyperparameters, a link to your dataset, or the name of the architecture you're using as config parameters, passed in like this: `wandb.init(config=your_config_dictionary)`. * **Metrics**: Use `wandb.Run.log()` to see metrics from your model. If you log metrics like accuracy and loss from inside your training loop, you'll get live updating graphs in the UI. 
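Because the keys you pass to `wandb.Run.log()` become metric names, it can help to check them against the pattern given in the Metric naming constraints section that follows. The sketch below is one possible approach, not a W\&B API: `is_valid_metric_name` and `sanitize_metric_name` are hypothetical helpers. It applies the pattern strictly, so it would also rewrite prefixed names such as `train/loss` that appear elsewhere in these docs.

```python
import re

# Pattern quoted in the "Metric naming constraints" section.
METRIC_NAME_PATTERN = re.compile(r"^[_a-zA-Z][_a-zA-Z0-9]*$")


def is_valid_metric_name(name: str) -> bool:
    """Return True if `name` matches the documented pattern."""
    return METRIC_NAME_PATTERN.fullmatch(name) is not None


def sanitize_metric_name(name: str) -> str:
    """Hypothetical helper: map a name onto the documented pattern.

    Disallowed characters become underscores, and an underscore is
    prepended when the result is empty or starts with a digit.
    """
    cleaned = re.sub(r"[^_a-zA-Z0-9]", "_", name)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned
```

For example, `sanitize_metric_name("loss-train")` yields `loss_train`, which you could then use as the key in the dictionary you log.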
## Metric naming constraints Due to GraphQL limitations, metric names in W\&B must follow specific naming rules: * **Allowed characters**: Letters (A-Z, a-z), digits (0-9), and underscores (\_) * **Starting character**: Names must start with a letter or underscore * **Pattern**: Metric names should match `/^[_a-zA-Z][_a-zA-Z0-9]*$/` Avoid naming metrics with invalid characters (such as commas, spaces, or special symbols), which may cause problems with sorting, querying, or display in the W\&B UI. **Valid metric names:** ```python theme={null} with wandb.init() as run: run.log({"accuracy": 0.9, "val_loss": 0.1, "epoch_5": 5}) run.log({"modelAccuracy": 0.95, "learning_rate": 0.001}) ``` **Invalid metric names (avoid these):** ```python theme={null} with wandb.init() as run: run.log({"acc,val": 0.9}) # Contains comma run.log({"loss-train": 0.1}) # Contains hyphen run.log({"test acc": 0.95}) # Contains space run.log({"5_fold_cv": 0.8}) # Starts with number ``` ## Common workflows 1. **Compare the best accuracy**: To compare the best value of a metric across runs, set the summary value for that metric. By default, summary is set to the last value you logged for each key. This is useful in the table in the UI, where you can sort and filter runs based on their summary metrics, to help compare runs in a table or bar chart based on their *best* accuracy, instead of final accuracy. For example: `wandb.run.summary["best_accuracy"] = best_accuracy` 2. **View multiple metrics on one chart**: Log multiple metrics in the same call. For example: ```python theme={null} with wandb.init() as run: run.log({"acc": 0.9, "loss": 0.1}) ``` You can then plot both metrics in the UI. 3. **Customize the x-axis**: Add a custom x-axis to the same log call to visualize your metrics against a different axis in the W\&B dashboard. 
For example: ```python theme={null} with wandb.init() as run: run.log({'acc': 0.9, 'epoch': 3, 'batch': 117}) ``` To set the default x-axis for a given metric use [Run.define\_metric()](/models/ref/python/experiments/run#define_metric). 4. **Log rich media and charts**: `wandb.Run.log()` supports the logging of a wide variety of data types, from [media like images and videos](/models/track/log/media/) to [tables](/models/track/log/log-tables/) and [charts](/models/app/features/custom-charts/). ## Best practices and tips For best practices and tips for Experiments and logging, see [Best Practices: Experiments and Logging](https://wandb.ai/wandb/pytorch-lightning-e2e/reports/W-B-Best-Practices-Guide--VmlldzozNTU1ODY1#w\&b-experiments-and-logging). # Customize log axes Source: https://docs.wandb.ai/models/track/log/customize-logging-axes Use define_metric() to set a custom x-axis for logged metrics instead of the default W&B step counter. Set a custom x-axis when you log metrics to W\&B. By default, W\&B logs metrics as *steps*. Each step corresponds to a `wandb.Run.log()` API call. For example, the following script has a `for` loop that iterates 10 times. In each iteration, the script logs a metric called `validation_loss` and increments the step number by 1. ```python theme={null} import wandb with wandb.init() as run: # range function creates a sequence of numbers from 0 to 9 for i in range(10): log_dict = { "validation_loss": 1/(i+1) } run.log(log_dict) ``` In the project's workspace, the `validation_loss` metric is plotted against the `step` x-axis, which increments by 1 each time `wandb.Run.log()` is called. From the previous code, the x-axis shows the step numbers 0, 1, 2, ..., 9. Line plot panel that uses `step` as the x-axis. In certain situations, it makes more sense to log metrics against a different x-axis such as a logarithmic x-axis. 
Use the [`define_metric()`](/models/ref/python/experiments/run/#define_metric) method to use any metric you log as a custom x-axis. Specify the metric that you want to appear as the y-axis with the `name` parameter. The `step_metric` parameter specifies the metric you want to use as the x-axis. When you log a custom metric, specify a value for both the x-axis and the y-axis as key-value pairs in a dictionary. Copy and paste the following code snippet to set a custom x-axis metric. Replace the values within `<>` with your own values: ```python theme={null} import wandb custom_step = "" # Name of custom x-axis metric_name = "" # Name of y-axis metric with wandb.init() as run: # Specify the step metric (x-axis) and the metric to log against it (y-axis) run.define_metric(step_metric = custom_step, name = metric_name) for i in range(10): log_dict = { custom_step : int, # Value of x-axis metric_name : int, # Value of y-axis } run.log(log_dict) ``` As an example, the following code snippet creates a custom x-axis called `x_axis_squared`. The value of the custom x-axis is the square of the for loop index `i` (`i**2`). The y-axis consists of mock values for validation loss (`"validation_loss"`) using Python's built-in `random` module: ```python theme={null} import wandb import random with wandb.init() as run: run.define_metric(step_metric = "x_axis_squared", name = "validation_loss") for i in range(10): log_dict = { "x_axis_squared": i**2, "validation_loss": random.random(), } run.log(log_dict) ``` The following image shows the resulting plot in the W\&B App UI. The `validation_loss` metric is plotted against the custom x-axis `x_axis_squared`, which is the square of the for loop index `i`. Note that the x-axis values are `0, 1, 4, 9, 16, 25, 36, 49, 64, 81`, which correspond to the squares of `0, 1, 2, ..., 9` respectively. Line plot panel that uses a custom x axis. Values are logged to W&B as the square of the loop number. 
You can set a custom x-axis for multiple metrics using `globs` with string prefixes. As an example, the following code snippet plots logged metrics with the prefix `train/*` to the x-axis `train/step`: ```python theme={null} import wandb with wandb.init() as run: # set all other train/ metrics to use this step run.define_metric("train/*", step_metric="train/step") for i in range(10): log_dict = { "train/step": 2**i, # exponential growth w/ internal W&B step "train/loss": 1 / (i + 1), # x-axis is train/step "train/accuracy": 1 - (1 / (1 + i)), # x-axis is train/step "val/loss": 1 / (1 + i), # x-axis is internal wandb step } run.log(log_dict) ``` # Log distributed training experiments Source: https://docs.wandb.ai/models/track/log/distributed-training Use W&B to log distributed training experiments with multiple GPUs. During a distributed training experiment, you train a model using multiple machines or clients in parallel. W\&B can help you track distributed training experiments. Based on your use case, track distributed training experiments using one of the following approaches: * **Track a single process**: Track a rank 0 process (also known as a "leader" or "coordinator") with W\&B. This is a common solution for logging distributed training experiments with the [PyTorch Distributed Data Parallel](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel) (DDP) Class. * **Track multiple processes**: For multiple processes, you can either: * Track each process separately using one run per process. You can optionally group them together in the W\&B App UI. * Track all processes to a single run. **Concurrent connections** Each concurrent connection takes compute, memory, and network resources. Even empty client connections that don't log metrics regularly push system metric updates, leading to slower performance when loading charts. 
W\&B recommends that you limit the maximum number of concurrent client connections as appropriate for your workload and that you monitor resource usage over time. W\&B has tested with a hard limit of 300 concurrent client connections in **Dedicated Cloud**. In **Multi-tenant Cloud** organizations, client connections for distributed training are subject to the same [rate limits](/models/track/limits#rate-limits) as regular training runs. Users on [Teams and Enterprise plans](https://wandb.ai/site/pricing) receive higher rate limits than those on the Free plan. ## Track a single process This section describes how to track values and metrics available to your rank 0 process. Use this approach to track only metrics that are available from a single process. Typical metrics include GPU/CPU utilization, behavior on a shared validation set, gradients and parameters, and loss values on representative data examples. Within the rank 0 process, initialize a W\&B run with [`wandb.init()`](/models/ref/python/functions/init) and log experiments ([`wandb.Run.log()`](/models/ref/python/experiments/run/#method-runlog)) to that run. The following [sample Python script (`log-ddp.py`)](https://github.com/wandb/examples/blob/master/examples/pytorch/pytorch-ddp/log-ddp.py) demonstrates one way to track metrics on two GPUs on a single machine using PyTorch DDP. [PyTorch DDP](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html) (`DistributedDataParallel` in`torch.nn`) is a popular library for distributed training. The basic principles apply to any distributed training setup, but the implementation may differ. The Python script: 1. Starts multiple processes with `torch.distributed.launch`. 2. Checks the rank with the `--local_rank` command line argument. 3. If the rank is set to 0, sets up `wandb` logging conditionally in the [`train()`](https://github.com/wandb/examples/blob/master/examples/pytorch/pytorch-ddp/log-ddp.py#L24) function. 
```python theme={null}
if __name__ == "__main__":
    # Get args
    args = parse_args()

    if args.local_rank == 0:  # only on main process
        # Initialize wandb run
        run = wandb.init(
            entity=args.entity,
            project=args.project,
        )
        # Train model with DDP
        train(args, run)
    else:
        train(args)
```

Explore an [example dashboard showing metrics tracked from a single process](https://wandb.ai/ayush-thakur/DDP/runs/1s56u3hc/system). The dashboard displays system metrics for both GPUs, such as temperature and utilization.

[Figure: GPU metrics dashboard]

However, the loss values as a function of epoch and batch size were only logged from a single GPU.

[Figure: Loss function plots]

## Track multiple processes

Track multiple processes with W\&B with one of the following approaches:

* [Tracking each process separately](/models/track/log/distributed-training/#track-each-process-separately) by creating a run for each process.
* [Tracking all processes to a single run](/models/track/log/distributed-training/#track-all-processes-to-a-single-run).

### Track each process separately

This section describes how to track each process separately by creating a run for each process. Within each run, you log metrics, artifacts, and so forth to that run. Call `wandb.Run.finish()` at the end of training to mark that the run has completed so that all processes exit properly.

You might find it difficult to keep track of runs across multiple experiments. To mitigate this, provide a value to the `group` parameter when you initialize W\&B (`wandb.init(group='group-name')`) to keep track of which run belongs to a given experiment. For more information about how to keep track of training and evaluation W\&B Runs in experiments, see [Group Runs](/models/runs/grouping/).

**Use this approach if you want to track metrics from individual processes**. Typical examples include the data and predictions on each node (for debugging data distribution) and metrics on individual batches outside of the main node.
This approach is not necessary to get system metrics from all nodes nor to get summary statistics available on the main node.

The following Python code snippet demonstrates how to set the group parameter when you initialize W\&B:

```python theme={null}
if __name__ == "__main__":
    # Get args
    args = parse_args()
    # Initialize run
    run = wandb.init(
        entity=args.entity,
        project=args.project,
        group="DDP",  # all runs for the experiment in one group
    )
    # Train model with DDP
    train(args, run)

    run.finish()  # mark the run as finished
```

Explore the W\&B App UI to view an [example dashboard](https://wandb.ai/ayush-thakur/DDP?workspace=user-noahluna) of metrics tracked from multiple processes. Note that there are two W\&B Runs grouped together in the left sidebar. Click on a group to view the dedicated group page for the experiment. The dedicated group page displays metrics from each process separately.

[Figure: Grouped distributed runs]

The preceding image shows the W\&B App UI dashboard. The sidebar lists two experiments: one labeled 'null' and a second (outlined by a yellow box) called 'DDP'. If you expand the group (select the Group dropdown), you see the W\&B Runs that are associated with that experiment.

### Organize distributed runs

Set the `job_type` parameter when you initialize W\&B (`wandb.init(job_type='type-name')`) to categorize your nodes based on their function. For example, you might have a main coordinating node and several reporting worker nodes.
You can set `job_type` to `main` for the main coordinating node and `worker` for the reporting worker nodes:

```python theme={null}
# Main coordinating node
with wandb.init(project="", job_type="main", group="experiment_1") as run:
    # Training code
    ...

# Reporting worker nodes
with wandb.init(project="", job_type="worker", group="experiment_1") as run:
    # Training code
    ...
```

Once you have set the `job_type` for your nodes, you can create [saved views](/models/track/workspaces/#create-a-new-saved-workspace-view) in your workspace to organize your runs. Click the **action** menu at the top right and click **Save as new view**.

For example, you could create the following saved views:

* **Default view**: Filter out worker nodes to reduce noise
  * Click **Filter**, then set **Job Type** `!=` `worker`.
  * Shows only your reporting nodes
* **Debug view**: Focus on worker nodes for troubleshooting
  * Click **Filter**, then set **Job Type** `==` `worker` and set **State** to `IN` `crashed`.
  * Shows only worker nodes that have crashed or are in error states
* **All nodes view**: See everything together
  * No filter
  * Useful for comprehensive monitoring

To open a saved view, click **Workspaces** in the project sidebar, then click the menu. Workspaces appear at the top of the list and saved views appear at the bottom.

### Track all processes to a single run

Parameters prefixed by `x_` (such as `x_label`) are in public preview. Create a [GitHub issue in the W\&B repository](https://github.com/wandb/wandb) to provide feedback.

**Requirements**

To track multiple processes to a single run, you must have:

* W\&B Python SDK version `v0.19.9` or newer.
* W\&B Server v0.68 or newer.

In this approach you use a primary node and one or more worker nodes. Within the primary node you initialize a W\&B run. For each worker node, initialize a run using the run ID used by the primary node. During training each worker node logs to the same run ID as the primary node.
W\&B aggregates metrics from all nodes and displays them in the W\&B App UI. Within the primary node, initialize a W\&B run with [`wandb.init()`](/models/ref/python/functions/init). Pass in a `wandb.Settings` object to the `settings` parameter (`wandb.init(settings=wandb.Settings()`) with the following: 1. The `mode` parameter set to `"shared"` to enable shared mode. 2. A unique label for [`x_label`](https://github.com/wandb/wandb/blob/main/wandb/sdk/wandb_settings.py#L638). You use the value you specify for `x_label` to identify which node the data is coming from in logs and system metrics in the W\&B App UI. If left unspecified, W\&B creates a label for you using the hostname and a random hash. 3. Set the [`x_primary`](https://github.com/wandb/wandb/blob/main/wandb/sdk/wandb_settings.py#L660) parameter to `True` to indicate that this is the primary node. 4. Optionally provide a list of GPU indexes (\[0,1,2]) to `x_stats_gpu_device_ids` to specify which GPUs W\&B tracks metrics for. If you do not provide a list, W\&B tracks metrics for all GPUs on the machine. Make note of the run ID of the primary node. Each worker node needs the run ID of the primary node. `x_primary=True` distinguishes a primary node from worker nodes. Primary nodes are the only nodes that upload files shared across nodes such as configuration files, telemetry and more. Worker nodes do not upload these files. For each worker node, initialize a W\&B run with [`wandb.init()`](/models/ref/python/functions/init) and provide the following: 1. A `wandb.Settings` object to the `settings` parameter (`wandb.init(settings=wandb.Settings()`) with: * The `mode` parameter set to `"shared"` to enable shared mode. * A unique label for `x_label`. You use the value you specify for `x_label` to identify which node the data is coming from in logs and system metrics in the W\&B App UI. If left unspecified, W\&B creates a label for you using the hostname and a random hash. 
* Set the `x_primary` parameter to `False` to indicate that this is a worker node. 2. Pass the run ID used by the primary node to the `id` parameter. 3. Optionally set [`x_update_finish_state`](https://github.com/wandb/wandb/blob/main/wandb/sdk/wandb_settings.py#L772) to `False`. This prevents non-primary nodes from updating the [run's state](/models/runs/run-states#run-states) to `finished` prematurely, ensuring the run state remains consistent and managed by the primary node. * Use the same entity and project for all nodes. This helps ensure the correct run ID is found. * Consider defining an environment variable on each worker node to set the run ID of the primary node. The following sample code demonstrates the high level requirements for tracking multiple processes to a single run: ```python theme={null} import wandb entity = "" project = "" # Initialize a run in the primary node run = wandb.init( entity=entity, project=project, settings=wandb.Settings( x_label="rank_0", mode="shared", x_primary=True, x_stats_gpu_device_ids=[0, 1], # (Optional) Only track metrics for GPU 0 and 1 ) ) # Note the run ID of the primary node. # Each worker node needs this run ID. run_id = run.id # Initialize a run in a worker node using the run ID of the primary node run = wandb.init( entity=entity, # Use the same entity as the primary node project=project, # Use the same project as the primary node settings=wandb.Settings(x_label="rank_1", mode="shared", x_primary=False), id=run_id, ) # Initialize a run in a worker node using the run ID of the primary node run = wandb.init( entity=entity, # Use the same entity as the primary node project=project, # Use the same project as the primary node settings=wandb.Settings(x_label="rank_2", mode="shared", x_primary=False), id=run_id, ) ``` In a real world example, each worker node might be on a separate machine. 
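When each node runs the same script on a different machine, one way to apply the settings described above is to derive them from the process rank. The following is a minimal sketch under stated assumptions: the helper names `shared_mode_settings` and `init_shared_run` are hypothetical, the `RANK` environment variable is assumed to be set by your launcher, and `PRIMARY_RUN_ID` is a made-up variable name for passing the primary node's run ID to workers. Only the `wandb.Settings` fields shown earlier in this section are used.

```python
import os


def shared_mode_settings(rank: int) -> dict:
    """Build keyword arguments for wandb.Settings for one node in shared mode."""
    is_primary = rank == 0
    fields = {
        "mode": "shared",           # enable shared mode
        "x_label": f"rank_{rank}",  # unique label per node
        "x_primary": is_primary,    # only rank 0 acts as the primary node
    }
    if not is_primary:
        # Leave the run's finished state to the primary node.
        fields["x_update_finish_state"] = False
    return fields


def init_shared_run(entity: str, project: str):
    """Initialize this node's connection to the shared run.

    Requires the wandb package; it is imported lazily so that
    shared_mode_settings() stays dependency-free.
    """
    import wandb

    # RANK is assumed to be provided by the launcher.
    rank = int(os.environ.get("RANK", "0"))
    # PRIMARY_RUN_ID is a hypothetical variable name; set it on each worker
    # to the run ID noted from the primary node.
    run_id = os.environ.get("PRIMARY_RUN_ID")
    return wandb.init(
        entity=entity,
        project=project,
        id=run_id,
        settings=wandb.Settings(**shared_mode_settings(rank)),
    )
```

Keeping the settings in a plain dictionary makes the per-rank differences easy to inspect before any run is created.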
See the [Distributed Training with Shared Mode](https://wandb.ai/dimaduev/simple-cnn-ddp/reports/Distributed-Training-with-Shared-Mode--VmlldzoxMTI0NTE1NA) report for an end-to-end example on how to train a model on a multi-node and multi-GPU Kubernetes cluster in GKE.

View console logs from multi-node processes in the project that the run logs to:

1. Navigate to the project that contains the run.
2. Click on the **Runs** tab in the project sidebar.
3. Click on the run you want to view.
4. Click on the **Logs** tab in the project sidebar.

You can filter console logs based on the labels you provide for `x_label` in the UI search bar located at the top of the console log page. For example, the following image shows which options are available to filter the console log by if values `rank0`, `rank1`, `rank2`, `rank3`, `rank4`, `rank5`, and `rank6` are provided to `x_label`.

[Figure: Multi-node console logs]

See [Console logs](/models/app/console-logs/) for more information.

W\&B aggregates system metrics from all nodes and displays them in the W\&B App UI. For example, the following image shows a sample dashboard with system metrics from multiple nodes. Each node possesses a unique label (`rank_0`, `rank_1`, `rank_2`) that you specify in the `x_label` parameter.

[Figure: Multi-node system metrics]

See [Line plots](/models/app/features/panels/line-plot/) for information on how to customize line plot panels.

## Example use cases

The following code snippets demonstrate common scenarios for advanced distributed use cases.
### Spawn process

Use the `wandb.setup()` method in your main function if you initiate a run in a spawned process:

```python theme={null}
import multiprocessing as mp

import wandb


def do_work(n):
    with wandb.init(config=dict(n=n)) as run:
        run.log(dict(this=n * n))


def main():
    wandb.setup()
    pool = mp.Pool(processes=4)
    pool.map(do_work, range(4))


if __name__ == "__main__":
    main()
```

### Share a run

Pass a run object as an argument to share runs between processes:

```python theme={null}
import multiprocessing as mp

import wandb


def do_work(run):
    # Log to the run object passed in from the parent process
    run.log(dict(this=1))


def main():
    run = wandb.init()
    p = mp.Process(target=do_work, kwargs=dict(run=run))
    p.start()
    p.join()
    run.finish()  # mark the run as finished


if __name__ == "__main__":
    main()
```

W\&B cannot guarantee the logging order. The author of the script is responsible for synchronization.

## Troubleshooting

There are two common issues you might encounter when using W\&B and distributed training:

1. **Hanging at the beginning of training** - A `wandb` process can hang if the `wandb` multiprocessing interferes with the multiprocessing from distributed training.
2. **Hanging at the end of training** - A training job might hang if the `wandb` process does not know when it needs to exit. Call the `wandb.Run.finish()` API at the end of your Python script to tell W\&B that the run finished. The `wandb.Run.finish()` API will finish uploading data and will cause W\&B to exit.

W\&B recommends using the `wandb service` command to improve the reliability of your distributed jobs. Both of the preceding training issues are commonly found in versions of the W\&B SDK where `wandb service` is unavailable.

### Enable W\&B Service

Depending on your version of the W\&B SDK, you might already have W\&B Service enabled by default.

#### W\&B SDK 0.13.0 and above

W\&B Service is enabled by default for versions of the W\&B SDK `0.13.0` and above.

#### W\&B SDK 0.12.5 and above

Modify your Python script to enable W\&B Service for W\&B SDK version 0.12.5 and above.
Use the `wandb.require()` method and pass the string `"service"` within your main function:

```python theme={null}
import wandb


def main():
    wandb.require("service")
    # rest-of-your-script-goes-here


if __name__ == "__main__":
    main()
```

For the best experience, W\&B recommends that you upgrade to the latest version.

**W\&B SDK 0.12.4 and below**

Set the `WANDB_START_METHOD` environment variable to `"thread"` to use multithreading instead if you use W\&B SDK version 0.12.4 or below.

# Log models
Source: https://docs.wandb.ai/models/track/log/log-models

Log model artifacts to a W&B run and retrieve them later using the log_model and use_model SDK methods.

The following guide describes how to log models to a W\&B run and interact with them.

The following APIs are useful for tracking models as a part of your experiment tracking workflow. Use the APIs listed on this page to log models to a run, and to access metrics, tables, media, and other objects.

W\&B suggests that you use [W\&B Artifacts](/models/artifacts/) if you want to:

* Create and keep track of different versions of serialized data besides models, such as datasets, prompts, and more.
* Explore [lineage graphs](/models/artifacts/explore-and-traverse-an-artifact-graph/) of a model or any other objects tracked in W\&B.
* Interact with the model artifacts these methods created, such as [updating properties](/models/artifacts/update-an-artifact/) (metadata, aliases, and descriptions).

For more information on W\&B Artifacts and advanced versioning use cases, see the [Artifacts](/models/artifacts/) documentation.

## Log a model to a run

Use the [`log_model`](/models/ref/python/experiments/run#log_model) method to log a model artifact that contains content within a directory you specify. The [`log_model`](/models/ref/python/experiments/run#log_model) method also marks the resulting model artifact as an output of the W\&B run.
You can track a model's dependencies and the model's associations if you mark the model as the input or output of a W\&B run. View the lineage of the model within the W\&B App UI. See the [Explore and traverse artifact graphs](/models/artifacts/explore-and-traverse-an-artifact-graph/) page within the [Artifacts](/models/artifacts/) chapter for more information. Provide the path where your model files are saved to the `path` parameter. The path can be a local file, directory, or [reference URI](/models/artifacts/track-external-files/#amazon-s3--gcs--azure-blob-storage-references) to an external bucket such as `s3://bucket/path`. Ensure to replace values enclosed in `<>` with your own. ```python theme={null} import wandb # Initialize a W&B run with wandb.init(project="", entity="") as run: # Log the model run.log_model(path="", name="") ``` Optionally provide a name for the model artifact for the `name` parameter. If `name` is not specified, W\&B will use the basename of the input path prepended with the run ID as the name. Keep track of the `name` that you, or W\&B assigns, to the model. You will need the name of the model to retrieve the model path with the [`wandb.Run.use_model()`](/models/ref/python/experiments/run#use_model) method. See [`log_model`](/models/ref/python/experiments/run#log_model) in the API Reference for parameters.
Example: Log a model to a run ```python theme={null} import os import wandb from tensorflow import keras from tensorflow.keras import layers config = {"optimizer": "adam", "loss": "categorical_crossentropy"} # Initialize a W&B run with wandb.init(entity="charlie", project="mnist-experiments", config=config) as run: # Hyperparameters loss = run.config["loss"] optimizer = run.config["optimizer"] metrics = ["accuracy"] num_classes = 10 input_shape = (28, 28, 1) # Training algorithm model = keras.Sequential( [ layers.Input(shape=input_shape), layers.Conv2D(32, kernel_size=(3, 3), activation="relu"), layers.MaxPooling2D(pool_size=(2, 2)), layers.Conv2D(64, kernel_size=(3, 3), activation="relu"), layers.MaxPooling2D(pool_size=(2, 2)), layers.Flatten(), layers.Dropout(0.5), layers.Dense(num_classes, activation="softmax"), ] ) # Configure the model for training model.compile(loss=loss, optimizer=optimizer, metrics=metrics) # Save model model_filename = "model.h5" local_filepath = "./" full_path = os.path.join(local_filepath, model_filename) model.save(filepath=full_path) # Log the model to the W&B run run.log_model(path=full_path, name="MNIST") ``` When you call `wandb.Run.log_model()`, a model artifact named `MNIST` is created and the file `model.h5` is added to the model artifact. Your terminal or notebook will print information of where to find information about the run the model was logged to. ```python theme={null} View run different-surf-5 at: https://wandb.ai/charlie/mnist-experiments/runs/wlby6fuw Synced 5 W&B file(s), 0 media file(s), 1 artifact file(s) and 0 other file(s) Find logs at: ./wandb/run-20231206_103511-wlby6fuw/logs ```
## Download and use a logged model

Use the [`use_model`](/models/ref/python/experiments/run#use_model) function to access and download model files previously logged to a W\&B run.

Provide the name of the model artifact where the model files you want to retrieve are stored. The name you provide must match the name of an existing logged model artifact.

If you did not define `name` when you originally logged the files with `log_model`, the default name assigned is the basename of the input path, prepended with the run ID.

Replace the values enclosed in `<>` with your own:

```python theme={null}
import wandb

# Initialize a run
with wandb.init(project="", entity="") as run:
    # Access and download model. Returns path to downloaded artifact
    downloaded_model_path = run.use_model(name="")
```

The [use\_model](/models/ref/python/experiments/run#use_model) function returns the path of downloaded model files. Keep track of this path if you want to link this model later. In the preceding code snippet, the returned path is stored in a variable called `downloaded_model_path`.
Example: Download and use a logged model For example, in the following code snippet a user called the `use_model` API. They specified the name of the model artifact they want to fetch and they also provided a version/alias. They then stored the path that is returned from the API to the `downloaded_model_path` variable. ```python theme={null} import wandb entity = "luka" project = "NLP_Experiments" alias = "latest" # semantic nickname or identifier for the model version model_artifact_name = "fine-tuned-model" # Initialize a run with wandb.init(project=project, entity=entity) as run: # Access and download model. Returns path to downloaded artifact downloaded_model_path = run.use_model(name = f"{model_artifact_name}:{alias}") ```
See [`use_model`](/models/ref/python/experiments/run#use_model) in the API Reference for parameters and return type.

# Log summary metrics
Source: https://docs.wandb.ai/models/track/log/log-summary

Track and customize single summary metrics like best accuracy or minimum loss on a W&B run using run.summary.

In addition to values that change over time during training, it is often important to track a single value that summarizes a model or a preprocessing step. Log this information in a W\&B Run's `summary` dictionary. A Run's summary dictionary can handle numpy arrays, PyTorch tensors or TensorFlow tensors. When a value is one of these types, W\&B persists the entire tensor in a binary file and stores high level metrics in the summary object, such as min, mean, variance, percentiles, and more.

The last value logged with `wandb.Run.log()` is automatically set as the summary dictionary in a W\&B Run. If a summary metric dictionary is modified, the previous value is lost.

The following code snippet demonstrates how to provide a custom summary metric to W\&B:

```python theme={null}
import argparse

import wandb

# Parse the number of epochs from the command line
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args()

with wandb.init(config=args) as run:
    best_accuracy = 0
    for epoch in range(1, args.epochs + 1):
        # test() stands in for your own evaluation function
        test_loss, test_accuracy = test()
        if test_accuracy > best_accuracy:
            run.summary["best_accuracy"] = test_accuracy
            best_accuracy = test_accuracy
```

You can update the summary attribute of an existing W\&B Run after training has completed. Use the [W\&B Public API](/models/ref/python/public-api/) to update the summary attribute:

```python theme={null}
import numpy as np
import wandb

api = wandb.Api()
run = api.run("username/project/run_id")
run.summary["tensor"] = np.random.random(1000)
run.summary.update()
```

## Customize summary metrics

Custom summary metrics are useful for capturing model performance at the best step of training in your `run.summary`. For example, you might want to capture the maximum accuracy or the minimum loss value, instead of the final value.
By default, the summary uses the final value from history. To customize summary metrics, pass the `summary` argument in `define_metric`. It accepts the following values: * `"min"` * `"max"` * `"mean"` * `"best"` * `"last"` * `"none"` You can use `"best"` only when you also set the optional `objective` argument to `"minimize"` or `"maximize"`. The following example adds the min and max values of loss and accuracy to the summary: ```python theme={null} import wandb import random random.seed(1) with wandb.init() as run: # Min and max summary values for loss run.define_metric("loss", summary="min") run.define_metric("loss", summary="max") # Min and max summary values for accuracy run.define_metric("acc", summary="min") run.define_metric("acc", summary="max") for i in range(10): log_dict = { "loss": random.uniform(0, 1 / (i + 1)), "acc": random.uniform(1 / (i + 1), 1), } run.log(log_dict) ``` ## View summary metrics View summary values in a run's **Overview** page or the project's runs table. 1. Navigate to the W\&B App. 2. Select the **Workspace** tab from the project sidebar. 3. Click the run that logged the summary values. The run page opens with the **Overview** tab shown by default. 4. View the summary values in the **Summary** section. Run overview 1. Navigate to the W\&B App. 2. Select the **Runs** tab. 3. Within the runs table, you can view the summary values within the columns based on the name of the summary value. You can use the W\&B Public API to fetch the summary values of a run. 
The following code example demonstrates one way to retrieve the summary values logged to a specific run using the W\&B Public API and pandas:

```python theme={null}
import wandb
import pandas as pd

entity = ""
project = ""
run_name = ""  # Name of run with summary values

api = wandb.Api()

all_runs = []

for run in api.runs(f"{entity}/{project}"):
    print("Fetching details for run: ", run.id, run.name)
    run_data = {
        "id": run.id,
        "name": run.name,
        "url": run.url,
        "state": run.state,
        "tags": run.tags,
        "config": run.config,
        "created_at": run.created_at,
        "system_metrics": run.system_metrics,
        "summary": run.summary,
        "project": run.project,
        "entity": run.entity,
        "user": run.user,
        "path": run.path,
        "notes": run.notes,
        "read_only": run.read_only,
        "history_keys": run.history_keys,
        "metadata": run.metadata,
    }
    all_runs.append(run_data)

# Convert to DataFrame
df = pd.DataFrame(all_runs)

# Get row based on the column name (run) and convert to dictionary
df[df['name']==run_name].summary.reset_index(drop=True).to_dict()
```

# Log tables

Source: https://docs.wandb.ai/models/track/log/log-tables

Create, populate, and log W&B Tables to visualize and query structured data from your experiment runs.

Use `wandb.Table` to log data to visualize and query with W\&B. In this guide, learn how to:

1. [Create Tables](./log-tables#create-tables)
2. [Add Data](./log-tables#add-data)
3. [Retrieve Data](./log-tables#retrieve-data)
4. [Save Tables](./log-tables#save-tables)

## Create tables

To define a Table, specify the columns you want to see for each row of data. Each row might be a single item in your training dataset, a particular step or epoch during training, a prediction made by your model on a test item, an object generated by your model, etc. Each column has a fixed type: numeric, text, boolean, image, video, audio, etc. You do not need to specify the type in advance. Give each column a name, and make sure to pass only data of that column's type into that column index.
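Because each column is expected to hold one type, it can help to check your rows before building a table. The following is a small sketch in plain Python, not part of the W\&B SDK; `mixed_type_columns` is a hypothetical helper, and treating `None` as a missing value is an assumption made for this example.

```python
def mixed_type_columns(columns, rows):
    """Return the names of columns whose non-None values do not share one type."""
    mixed = []
    for index, name in enumerate(columns):
        types = {type(row[index]) for row in rows if row[index] is not None}
        if len(types) > 1:
            mixed.append(name)
    return mixed


columns = ["id", "prediction", "truth"]
rows = [
    [0, 0, 0],
    [1, "8", 0],  # the prediction was stored as a string by mistake
    [2, 7, 1],
]
print(mixed_type_columns(columns, rows))  # ['prediction']
```

Note that this strict check also reports a column that mixes `int` and `float` values, so you may want to relax it for numeric data.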
For a more detailed example, see the [W\&B Tables guide](https://wandb.ai/stacey/mnist-viz/reports/Guide-to-W-B-Tables--Vmlldzo2NTAzOTk#1.-how-to-log-a-wandb.table). Use the `wandb.Table` constructor in one of two ways: 1. **List of Rows:** Log named columns and rows of data. For example the following code snippet generates a table with two rows and three columns: ```python theme={null} wandb.Table(columns=["a", "b", "c"], data=[["1a", "1b", "1c"], ["2a", "2b", "2c"]]) ``` 2. **Pandas DataFrame:** Log a DataFrame using `wandb.Table(dataframe=my_df)`. Column names will be extracted from the DataFrame. #### From an existing array or dataframe ```python theme={null} # assume a model has returned predictions on four images # with the following fields available: # - the image id # - the image pixels, wrapped in a wandb.Image() # - the model's predicted label # - the ground truth label my_data = [ [0, wandb.Image("img_0.jpg"), 0, 0], [1, wandb.Image("img_1.jpg"), 8, 0], [2, wandb.Image("img_2.jpg"), 7, 1], [3, wandb.Image("img_3.jpg"), 1, 1], ] # create a wandb.Table() with corresponding columns columns = ["id", "image", "prediction", "truth"] test_table = wandb.Table(data=my_data, columns=columns) ``` ## Add data Tables are mutable. As your script executes you can add more data to your table, up to 200,000 rows. There are two ways to add data to a table: 1. **Add a Row**: `table.add_data("3a", "3b", "3c")`. Note that the new row is not represented as a list. If your row is in list format, use the star notation, `*` ,to expand the list to positional arguments: `table.add_data(*my_row_list)`. The row must contain the same number of entries as there are columns in the table. 2. **Add a Column**: `table.add_column(name="col_name", data=col_data)`. Note that the length of `col_data` must be equal to the table's current number of rows. Here, `col_data` can be a list data, or a NumPy NDArray. 
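The two length rules above (a row needs one entry per column, and a new column needs one entry per existing row) can be checked before you call `add_data` or `add_column`. This is a minimal sketch in plain Python; `check_row` and `check_column` are hypothetical helpers, not W\&B functions.

```python
def check_row(columns, row):
    """Return the row as a tuple if it has exactly one entry per column."""
    if len(row) != len(columns):
        raise ValueError(
            f"row has {len(row)} entries but the table has {len(columns)} columns"
        )
    return tuple(row)


def check_column(num_rows, col_data):
    """Return the column data as a list if it has exactly one entry per row."""
    col_data = list(col_data)
    if len(col_data) != num_rows:
        raise ValueError(
            f"column has {len(col_data)} entries but the table has {num_rows} rows"
        )
    return col_data


columns = ["a", "b", "c"]
my_row_list = ["3a", "3b", "3c"]
print(check_row(columns, my_row_list))    # ('3a', '3b', '3c')
print(check_column(2, ["extra_1", "extra_2"]))  # ['extra_1', 'extra_2']
```

With a real table you could then write `table.add_data(*check_row(columns, my_row_list))`, which expands the validated row into positional arguments.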
### Adding data incrementally

This code sample shows how to create and populate a W\&B table incrementally. You define the table with predefined columns, including confidence scores for all possible labels, and add data row by row during inference. You can also [add data to tables incrementally when resuming runs](#adding-data-to-resumed-runs).

```python theme={null}
import numpy as np
import wandb

# Define the columns for the table, including confidence scores for each label
columns = ["id", "image", "guess", "truth"]
for digit in range(10):  # Add confidence score columns for each digit (0-9)
    columns.append(f"score_{digit}")

# Initialize the table with the defined columns
test_table = wandb.Table(columns=columns)

# Iterate through the test dataset and add data to the table row by row
# Each row includes the image ID, image, predicted label, true label, and confidence scores
for img_id, img in enumerate(mnist_test_data):
    true_label = mnist_test_data_labels[img_id]  # Ground truth label
    # Assumption: my_model.predict() returns 10 confidence scores, one per digit
    scores = my_model.predict(img)
    guess_label = int(np.argmax(scores))  # Predicted label
    # Add row data to the table; the row needs one value per column
    test_table.add_data(img_id, wandb.Image(img), guess_label, true_label, *scores)
```

#### Adding data to resumed runs

You can incrementally update a W\&B table in resumed runs by loading an existing table from an artifact, retrieving the last row of data, and adding the updated metrics. Then, reinitialize the table for compatibility and log the updated version back to W\&B.
```python theme={null} import wandb # Initialize a run with wandb.init(project="my_project") as run: # Load the existing table from the artifact best_checkpt_table = run.use_artifact(table_tag).get(table_name) # Get the last row of data from the table for resuming best_iter, best_metric_max, best_metric_min = best_checkpt_table.data[-1] # Update the best metrics as needed # Add the updated data to the table best_checkpt_table.add_data(best_iter, best_metric_max, best_metric_min) # Reinitialize the table with its updated data to ensure compatibility best_checkpt_table = wandb.Table( columns=["col1", "col2", "col3"], data=best_checkpt_table.data ) # Initialize the Run with wandb.init() as run: # Log the updated table to W&B run.log({table_name: best_checkpt_table}) ``` ## Retrieve data Once data is in a Table, access it by column or by row: 1. **Row Iterator**: Use the row iterator of Table such as `for ndx, row in table.iterrows(): ...` to efficiently iterate over the data's rows. 2. **Get a Column**: Retrieve a column of data using `table.get_column("col_name")` . As a convenience, you can pass `convert_to="numpy"` to convert the column to a NumPy NDArray of primitives. This is useful if your column contains media types such as `wandb.Image` so that you can access the underlying data directly. ## Save tables After you generate a table of data in your script, for example a table of model predictions, save it to W\&B to visualize the results live. ### Log a table to a run Use `wandb.Run.log()` to save your table to the run, like so: ```python theme={null} with wandb.init() as run: my_table = wandb.Table(columns=["a", "b"], data=[["1a", "1b"], ["2a", "2b"]]) run.log({"table_key": my_table}) ``` Each time a table is logged to the same key, a new version of the table is created and stored in the backend. 
This means you can log the same table across multiple training steps to see how model predictions improve over time, or compare tables across different runs, as long as they're logged to the same key. You can log up to 200,000 rows. To log more than 200,000 rows, you can override the limit with: `wandb.Table.MAX_ARTIFACT_ROWS = X` However, this would likely cause performance issues, such as slower queries, in the UI. ### Access tables programmatically In the backend, Tables are persisted as Artifacts. If you are interested in accessing a specific version, you can do so with the artifact API: ```python theme={null} with wandb.init() as run: my_table = run.use_artifact("run--:").get("") ``` For more information on Artifacts, see the [Artifacts Chapter](/models/artifacts/) in the Developer Guide. ### Visualize tables Any table logged this way will show up in your Workspace on both the Run Page and the Project Page. For more information, see [Visualize and Analyze Tables](/models/tables/visualize-tables/). ## Artifact tables Use `artifact.add()` to log tables to the Artifacts section of your run instead of the workspace. This could be useful if you have a dataset that you want to log once and then reference for future runs. ```python theme={null} with wandb.init(project="my_project") as run: # create a wandb Artifact for each meaningful step test_predictions = wandb.Artifact("mnist_test_preds", type="predictions") # [build up your predictions data as above] test_table = wandb.Table(data=data, columns=columns) test_predictions.add(test_table, "my_test_key") run.log_artifact(test_predictions) ``` Refer to this Colab for a [detailed example of artifact.add() with image data](https://wandb.me/dsviz-nature-colab) and this Report for an example of how to use Artifacts and Tables to [version control and deduplicate tabular data](https://wandb.me/TBV-Dedup). 
### Join Artifact tables

You can join tables you have locally constructed or tables you have retrieved from other artifacts using `wandb.JoinedTable(table_1, table_2, join_key)`.

| Args      | Description                                                                                                        |
| --------- | ------------------------------------------------------------------------------------------------------------------ |
| table\_1  | (str, `wandb.Table`, ArtifactEntry) the path to a `wandb.Table` in an artifact, the table object, or ArtifactEntry |
| table\_2  | (str, `wandb.Table`, ArtifactEntry) the path to a `wandb.Table` in an artifact, the table object, or ArtifactEntry |
| join\_key | (str, \[str, str]) key or keys on which to perform the join                                                        |

To join two Tables you have logged previously in an artifact context, fetch them from the artifact and join the result into a new Table.

For example, the following code reads one Table of original songs called `'original_songs'` and another Table of synthesized versions of the same songs called `'synth_songs'`. The code joins the two tables on `"song_id"` and uploads the resulting table as a new W\&B Table:

```python theme={null}
import wandb

with wandb.init(project="my_project") as run:
    # fetch original songs table
    orig_songs = run.use_artifact("original_songs:latest")
    orig_table = orig_songs.get("original_samples")

    # fetch synthesized songs table
    synth_songs = run.use_artifact("synth_songs:latest")
    synth_table = synth_songs.get("synth_samples")

    # join tables on "song_id"
    join_table = wandb.JoinedTable(orig_table, synth_table, "song_id")
    join_at = wandb.Artifact("synth_summary", "analysis")

    # add table to artifact and log to W&B
    join_at.add(join_table, "synth_explore")
    run.log_artifact(join_at)
```

[Read this tutorial](https://wandb.ai/stacey/cshanty/reports/Whale2Song-W-B-Tables-for-Audio--Vmlldzo4NDI3NzM) for an example of how to combine two tables previously stored in different Artifact objects.
# Log media and objects Source: https://docs.wandb.ai/models/track/log/media Log rich media, from 3D point clouds and molecules to HTML and histograms We support images, video, audio, and more. Log rich media to explore your results and visually compare your runs, models, and datasets. Read on for examples and how-to guides. For details, see the [Data types reference](/models/ref/python/data-types/). For more details, check out a [demo report about visualize model predictions](https://wandb.ai/lavanyashukla/visualize-predictions/reports/Visualize-Model-Predictions--Vmlldzo1NjM4OA) or watch a [video walkthrough](https://www.youtube.com/watch?v=96MxRvx15Ts). ## Pre-requisites In order to log media objects with the W\&B SDK, you may need to install additional dependencies. You can install these dependencies by running the following command: ```bash theme={null} pip install wandb[media] ``` ## Images Log images to track inputs, outputs, filter weights, activations, and more. Autoencoder inputs and outputs Images can be logged directly from NumPy arrays, as PIL images, or from the filesystem. Each time you log images from a step, they are available in the UI. Expand the image panel, then use the step slider to look at images from different steps. This makes it easy to compare how a model's output changes during training. Click a media panel to view an image in full-screen mode; there you can zoom and pan, including with [keyboard shortcuts](/models/app/keyboard-shortcuts#media-panels). To compare images or videos from different runs, steps, or indices in one view, use [Compare mode](/models/app/features/panels/media#compare-mode) in a media panel. It's recommended to log fewer than 50 images per step to prevent logging from becoming a bottleneck during training and image loading from becoming a bottleneck when viewing results. 
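One way to stay under the recommended 50 images per step is to log an evenly spaced subset of a larger batch. The sketch below is plain Python and makes no W\&B calls; `pick_evenly` is a hypothetical helper, and even spacing is only one possible selection strategy.

```python
def pick_evenly(items, max_items=50):
    """Return at most max_items elements, spread evenly across items."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    count = len(items)
    if count <= max_items:
        return list(items)
    step = count / max_items  # greater than 1, so the chosen indices are distinct
    return [items[int(i * step)] for i in range(max_items)]


# Stand-in for a batch of 200 images; real code would hold arrays or file paths
batch = list(range(200))
subset = pick_evenly(batch)
print(len(subset), subset[:3], subset[-1])  # 50 [0, 4, 8] 196
```

You could then wrap each selected item, for example `[wandb.Image(img) for img in pick_evenly(images)]`, before passing the list to `run.log()`.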
Provide arrays directly when constructing images manually, such as by using [`make_grid` from `torchvision`](https://pytorch.org/vision/stable/utils.html#torchvision.utils.make_grid). Arrays are converted to png using [Pillow](https://pillow.readthedocs.io/en/stable/index.html). ```python theme={null} import wandb with wandb.init(project="image-log-example") as run: images = wandb.Image(image_array, caption="Top: Output, Bottom: Input") run.log({"examples": images}) ``` We assume the image is gray scale if the last dimension is 1, RGB if it's 3, and RGBA if it's 4. If the array contains floats, we convert them to integers between `0` and `255`. If you want to normalize your images differently, you can specify the [`mode`](https://pillow.readthedocs.io/en/stable/handbook/concepts.html#modes) manually or just supply a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html), as described in the "Logging PIL Images" tab of this panel. For full control over the conversion of arrays to images, construct the [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html) yourself and provide it directly. ```python theme={null} from PIL import Image with wandb.init(project="") as run: # Create a PIL image from a NumPy array image = Image.fromarray(image_array) # Optionally, convert to RGB if needed if image.mode != "RGB": image = image.convert("RGB") # Log the image run.log({"example": wandb.Image(image, caption="My Image")}) ``` For even more control, create images however you like, save them to disk, and provide a filepath. ```python theme={null} import wandb from PIL import Image with wandb.init(project="") as run: im = Image.fromarray(...) rgb_im = im.convert("RGB") rgb_im.save("myimage.jpg") run.log({"example": wandb.Image("myimage.jpg")}) ``` ## Image overlays Log semantic segmentation masks and interact with them (altering opacity, viewing changes over time, and more) via the W\&B UI. 
Interactive mask viewing To log an overlay, provide a dictionary with the following keys and values to the `masks` keyword argument of `wandb.Image`: * one of two keys representing the image mask: * `"mask_data"`: a 2D NumPy array containing an integer class label for each pixel * `"path"`: (string) a path to a saved image mask file * `"class_labels"`: (optional) a dictionary mapping the integer class labels in the image mask to their readable class names To log multiple masks, log a mask dictionary with multiple keys, as in the code snippet below. [See a live example](https://app.wandb.ai/stacey/deep-drive/reports/Image-Masks-for-Semantic-Segmentation--Vmlldzo4MTUwMw) [Sample code](https://colab.research.google.com/drive/1SOVl3EvW82Q4QKJXX6JtHye4wFix_P4J) ```python theme={null} mask_data = np.array([[1, 2, 2, ..., 2, 2, 1], ...]) class_labels = {1: "tree", 2: "car", 3: "road"} mask_img = wandb.Image( image, masks={ "predictions": {"mask_data": mask_data, "class_labels": class_labels}, "ground_truth": { # ... }, # ... }, ) ``` Segmentation masks for a key are defined at each step (each call to `run.log()`). * If steps provide different values for the same mask key, only the most recent value for the key is applied to the image. * If steps provide different mask keys, all values for each key are shown, but only those defined in the step being viewed are applied to the image. Toggling the visibility of masks not defined in the step do not change the image. Log bounding boxes with images, and use filters and toggles to dynamically visualize different sets of boxes in the UI. Bounding box example [See a live example](https://app.wandb.ai/stacey/yolo-drive/reports/Bounding-Boxes-for-Object-Detection--Vmlldzo4Nzg4MQ) To log a bounding box, you'll need to provide a dictionary with the following keys and values to the boxes keyword argument of `wandb.Image`: * `box_data`: a list of dictionaries, one for each box. The box dictionary format is described below. 
  * `position`: a dictionary representing the position and size of the box in one of two formats, as described below. Boxes need not all use the same format.
    * *Option 1:* `{"minX", "maxX", "minY", "maxY"}`. Provide a set of coordinates defining the upper and lower bounds of each box dimension.
    * *Option 2:* `{"middle", "width", "height"}`. Provide a set of coordinates specifying the `middle` coordinates as `[x,y]`, and `width` and `height` as scalars.
  * `class_id`: an integer representing the class identity of the box. See `class_labels` key below.
  * `scores`: a dictionary of string labels and numeric values for scores. Can be used for filtering boxes in the UI.
  * `domain`: specify the units/format of the box coordinates. **Set this to "pixel"** if the box coordinates are expressed in pixel space, such as integers within the bounds of the image dimensions. By default, the domain is assumed to be a fraction/percentage of the image, expressed as a floating point number between 0 and 1.
  * `box_caption`: (optional) a string to be displayed as the label text on this box
* `class_labels`: (optional) A dictionary mapping `class_id`s to strings. By default we will generate class labels `class_0`, `class_1`, etc.

Check out this example:

```python theme={null}
import wandb

class_id_to_label = {
    1: "car",
    2: "road",
    3: "building",
    # ...
}

img = wandb.Image(
    image,
    boxes={
        "predictions": {
            "box_data": [
                {
                    # one box expressed in the default relative/fractional domain
                    "position": {"minX": 0.1, "maxX": 0.2, "minY": 0.3, "maxY": 0.4},
                    "class_id": 2,
                    "box_caption": class_id_to_label[2],
                    "scores": {"acc": 0.1, "loss": 1.2},
                },
                {
                    # another box expressed in the pixel domain
                    # (for illustration purposes only, all boxes are likely
                    # to be in the same domain/format)
                    "position": {"middle": [150, 20], "width": 68, "height": 112},
                    "domain": "pixel",
                    "class_id": 3,
                    "box_caption": "a building",
                    "scores": {"acc": 0.5, "loss": 0.7},
                },
                # ...
                # Log as many boxes as needed
            ],
            "class_labels": class_id_to_label,
        },
        # Log each meaningful group of boxes with a unique key name
        "ground_truth": {
            # ...
        },
    },
)

with wandb.init(project="my_project") as run:
    run.log({"driving_scene": img})
```

## Image overlays in Tables

To log segmentation masks in tables, provide a `wandb.Image` object for each row in the table. An example is provided in the code snippet below:

```python theme={null}
table = wandb.Table(columns=["ID", "Image"])

for id, img, label in zip(ids, images, labels):
    mask_img = wandb.Image(
        img,
        masks={
            "prediction": {"mask_data": label, "class_labels": class_labels}
            # ...
        },
    )

    table.add_data(id, mask_img)

with wandb.init(project="my_project") as run:
    run.log({"Table": table})
```

To log images with bounding boxes in tables, provide a `wandb.Image` object for each row in the table. An example is provided in the code snippet below:

```python theme={null}
table = wandb.Table(columns=["ID", "Image"])

for id, img, boxes in zip(ids, images, boxes_set):
    box_img = wandb.Image(
        img,
        boxes={
            "prediction": {
                "box_data": [
                    {
                        "position": {
                            "minX": box["minX"],
                            "minY": box["minY"],
                            "maxX": box["maxX"],
                            "maxY": box["maxY"],
                        },
                        "class_id": box["class_id"],
                        "box_caption": box["caption"],
                        "domain": "pixel",
                    }
                    for box in boxes
                ],
                "class_labels": class_labels,
            }
        },
    )

    # Add the image with its boxes as a new row
    table.add_data(id, box_img)

with wandb.init(project="my_project") as run:
    run.log({"Table": table})
```

## Histograms

If you pass a sequence of numbers, such as a list, array, or tensor, as the first argument to `wandb.Histogram`, W\&B constructs the histogram automatically by calling `np.histogram()`. All arrays/tensors are flattened. You can use the optional `num_bins` keyword argument to override the default of `64` bins. The maximum number of bins supported is `512`.
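To illustrate what flattening and equal-width binning mean here, the following plain-Python sketch bins nested values into a fixed number of bins. It is only an approximation for intuition, not the W\&B or NumPy implementation, and floating-point handling at bin edges may differ; `flatten` and `histogram` are hypothetical names.

```python
def flatten(values):
    """Yield numbers from arbitrarily nested lists or tuples."""
    for value in values:
        if isinstance(value, (list, tuple)):
            yield from flatten(value)
        else:
            yield value


def histogram(values, num_bins=64):
    """Count values in num_bins equal-width bins spanning [min, max]."""
    if not 1 <= num_bins <= 512:
        raise ValueError("num_bins must be between 1 and 512")
    flat = [float(v) for v in flatten(values)]
    if not flat:
        raise ValueError("no values to bin")
    low, high = min(flat), max(flat)
    if low == high:  # widen a degenerate range so the bin width is nonzero
        low, high = low - 0.5, high + 0.5
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in flat:
        # The maximum value falls in the last bin rather than past it
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_bins + 1)]
    return counts, edges


grads = [[0.0, 0.1], [0.5, 1.0]]  # nested input is flattened first
counts, edges = histogram(grads, num_bins=4)
print(counts)  # [2, 0, 1, 1]
print(edges)   # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The returned pair mirrors the `(counts, bin_edges)` shape of an `np.histogram()` result, with one more edge than there are bins.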
In the UI, histograms are plotted with the training step on the x-axis, the metric value on the y-axis, and the count represented by color, to ease comparison of histograms logged throughout training.

```python theme={null}
run.log({"gradients": wandb.Histogram(grads)})
```

If you want more control, call `np.histogram()` and pass the returned tuple to the `np_histogram` keyword argument.

```python theme={null}
np_hist_grads = np.histogram(grads, density=True, range=(0.0, 1.0))
run.log({"gradients": wandb.Histogram(np_histogram=np_hist_grads)})
```

If histograms are in your summary, they appear on the Overview tab of the [Run Page](/models/runs/). If they are in your history, we plot a heatmap of bins over time on the Charts tab.

## 3D visualizations

Log 3D point clouds and Lidar scenes with bounding boxes. Pass in a NumPy array containing coordinates and colors for the points to render.

```python theme={null}
point_cloud = np.array([[0, 0, 0, COLOR]])

run.log({"point_cloud": wandb.Object3D(point_cloud)})
```

The W\&B UI truncates the data at 300,000 points.

#### NumPy array formats

Three different formats of NumPy arrays are supported for flexible color schemes.

* `[[x, y, z], ...]` (`nx3`)
* `[[x, y, z, c], ...]` (`nx4`), where `c` is a category in the range `[1, 14]` (useful for segmentation)
* `[[x, y, z, r, g, b], ...]` (`nx6`), where `r`, `g`, and `b` are values in the range `[0, 255]` for the red, green, and blue color channels

#### Python object

Using this schema, you can define a Python object and pass it in to [the `from_point_cloud` method](/models/ref/python/#from_point_cloud).

* `points` is a NumPy array containing coordinates and colors for the points to render using [the same formats as the simple point cloud renderer shown above](#numpy-array-formats).
* `boxes` is a NumPy array of python dictionaries with three attributes: * `corners`- a list of eight corners * `label`- a string representing the label to be rendered on the box (Optional) * `color`- rgb values representing the color of the box * `score` - a numeric value that will be displayed on the bounding box that can be used to filter the bounding boxes shown (for example, to only show bounding boxes where `score` > `0.75`). (Optional) * `type` is a string representing the scene type to render. Currently the only supported value is `lidar/beta` ```python theme={null} point_list = [ [ 2566.571924017235, # x 746.7817289698219, # y -15.269245470863748,# z 76.5, # red 127.5, # green 89.46617199365393 # blue ], [ 2566.592983606823, 746.6791987335685, -15.275803826279521, 76.5, 127.5, 89.45471117247024 ], [ 2566.616361739416, 746.4903185513501, -15.28628929674075, 76.5, 127.5, 89.41336375503832 ], [ 2561.706014951675, 744.5349468458361, -14.877496818222781, 76.5, 127.5, 82.21868245418283 ], [ 2561.5281847916694, 744.2546118233013, -14.867862032341005, 76.5, 127.5, 81.87824684536432 ], [ 2561.3693562897465, 744.1804761656741, -14.854129178142523, 76.5, 127.5, 81.64137897587152 ], [ 2561.6093071504515, 744.0287526628543, -14.882135189841177, 76.5, 127.5, 81.89871499537098 ], # ... 
and so on ] run.log({"my_first_point_cloud": wandb.Object3D.from_point_cloud( points = point_list, boxes = [{ "corners": [ [ 2601.2765123137915, 767.5669506323393, -17.816764802288663 ], [ 2599.7259021588347, 769.0082337923552, -17.816764802288663 ], [ 2599.7259021588347, 769.0082337923552, -19.66876480228866 ], [ 2601.2765123137915, 767.5669506323393, -19.66876480228866 ], [ 2604.8684867834395, 771.4313904894723, -17.816764802288663 ], [ 2603.3178766284827, 772.8726736494882, -17.816764802288663 ], [ 2603.3178766284827, 772.8726736494882, -19.66876480228866 ], [ 2604.8684867834395, 771.4313904894723, -19.66876480228866 ] ], "color": [0, 0, 255], # color in RGB of the bounding box "label": "car", # string displayed on the bounding box "score": 0.6 # numeric displayed on the bounding box }], vectors = [ {"start": [0, 0, 0], "end": [0.1, 0.2, 0.5], "color": [255, 0, 0]}, # color is optional ], point_cloud_type = "lidar/beta", )}) ``` When viewing a point cloud, you can hold control and use the mouse to move around inside the space. #### Point cloud files You can use [the `from_file` method](/models/ref/python/#from_file) to load in a JSON file full of point cloud data. ```python theme={null} run.log({"my_cloud_from_file": wandb.Object3D.from_file( "./my_point_cloud.pts.json" )}) ``` An example of how to format the point cloud data is shown below. 
```json theme={null} { "boxes": [ { "color": [ 0, 255, 0 ], "score": 0.35, "label": "My label", "corners": [ [ 2589.695869075582, 760.7400443552185, -18.044831294622487 ], [ 2590.719039645323, 762.3871153874499, -18.044831294622487 ], [ 2590.719039645323, 762.3871153874499, -19.54083129462249 ], [ 2589.695869075582, 760.7400443552185, -19.54083129462249 ], [ 2594.9666662674313, 757.4657929961453, -18.044831294622487 ], [ 2595.9898368371723, 759.1128640283766, -18.044831294622487 ], [ 2595.9898368371723, 759.1128640283766, -19.54083129462249 ], [ 2594.9666662674313, 757.4657929961453, -19.54083129462249 ] ] } ], "points": [ [ 2566.571924017235, 746.7817289698219, -15.269245470863748, 76.5, 127.5, 89.46617199365393 ], [ 2566.592983606823, 746.6791987335685, -15.275803826279521, 76.5, 127.5, 89.45471117247024 ], [ 2566.616361739416, 746.4903185513501, -15.28628929674075, 76.5, 127.5, 89.41336375503832 ] ], "type": "lidar/beta" } ``` #### NumPy arrays Using [the same array formats defined above](#numpy-array-formats), you can use `numpy` arrays directly with [the `from_numpy` method](/models/ref/python/#from_numpy) to define a point cloud. 
```python theme={null} run.log({"my_cloud_from_numpy_xyz": wandb.Object3D.from_numpy( np.array( [ [0.4, 1, 1.3], # x, y, z [1, 1, 1], [1.2, 1, 1.2] ] ) )}) ``` ```python theme={null} run.log({"my_cloud_from_numpy_cat": wandb.Object3D.from_numpy( np.array( [ [0.4, 1, 1.3, 1], # x, y, z, category [1, 1, 1, 1], [1.2, 1, 1.2, 12], [1.2, 1, 1.3, 12], [1.2, 1, 1.4, 12], [1.2, 1, 1.5, 12], [1.2, 1, 1.6, 11], [1.2, 1, 1.7, 11], ] ) )}) ``` ```python theme={null} run.log({"my_cloud_from_numpy_rgb": wandb.Object3D.from_numpy( np.array( [ [0.4, 1, 1.3, 255, 0, 0], # x, y, z, r, g, b [1, 1, 1, 0, 255, 0], [1.2, 1, 1.3, 0, 255, 255], [1.2, 1, 1.4, 0, 255, 255], [1.2, 1, 1.5, 0, 0, 255], [1.2, 1, 1.1, 0, 0, 255], [1.2, 1, 0.9, 0, 0, 255], ] ) )}) ``` ```python theme={null} run.log({"protein": wandb.Molecule("6lu7.pdb")}) ``` Log molecular data in any of 10 file types:`pdb`, `pqr`, `mmcif`, `mcif`, `cif`, `sdf`, `sd`, `gro`, `mol2`, or `mmtf.` W\&B also supports logging molecular data from SMILES strings, [`rdkit`](https://www.rdkit.org/docs/index.html) `mol` files, and `rdkit.Chem.rdchem.Mol` objects. ```python theme={null} resveratrol = rdkit.Chem.MolFromSmiles("Oc1ccc(cc1)C=Cc1cc(O)cc(c1)O") run.log( { "resveratrol": wandb.Molecule.from_rdkit(resveratrol), "green fluorescent protein": wandb.Molecule.from_rdkit("2b3p.mol"), "acetaminophen": wandb.Molecule.from_smiles("CC(=O)Nc1ccc(O)cc1"), } ) ``` When your run finishes, you'll be able to interact with 3D visualizations of your molecules in the UI. [See a live example using AlphaFold](https://wandb.me/alphafold-workspace) Molecule structure ### PNG image [`wandb.Image`](/models/ref/python/data-types/image) converts `numpy` arrays or instances of `PILImage` to PNGs by default. ```python theme={null} run.log({"example": wandb.Image(...)}) # Or multiple images run.log({"example": [wandb.Image(...) 
for img in images]}) ``` ### Video Videos are logged using the [`wandb.Video`](/models/ref/python/) data type: ```python theme={null} run.log({"example": wandb.Video("myvideo.mp4")}) ``` Now you can view videos in the media browser. Go to your project workspace, run workspace, or report and click **Add visualization** to add a rich media panel. ## 2D view of a molecule You can log a 2D view of a molecule using the [`wandb.Image`](/models/ref/python/data-types/image) data type and [`rdkit`](https://www.rdkit.org/docs/index.html): ```python theme={null} molecule = rdkit.Chem.MolFromSmiles("CC(=O)O") rdkit.Chem.AllChem.Compute2DCoords(molecule) rdkit.Chem.AllChem.GenerateDepictionMatching2DStructure(molecule, molecule) pil_image = rdkit.Chem.Draw.MolToImage(molecule, size=(300, 300)) run.log({"acetic_acid": wandb.Image(pil_image)}) ``` ## Other media W\&B also supports logging of a variety of other media types. ### Audio ```python theme={null} run.log({"whale songs": wandb.Audio(np_array, caption="OooOoo", sample_rate=32)}) ``` A maximum of 100 audio clips can be logged per step. For more usage information, see [`audio-file`](/models/ref/query-panel/audio-file). ### Video ```python theme={null} run.log({"video": wandb.Video(numpy_array_or_path_to_video, fps=4, format="gif")}) ``` If a numpy array is supplied we assume the dimensions are, in order: time, channels, width, height. By default we create a 4 fps gif image ([`ffmpeg`](https://www.ffmpeg.org) and the [`moviepy`](https://pypi.org/project/moviepy/) python library are required when passing numpy objects). Supported formats are `"gif"`, `"mp4"`, `"webm"`, and `"ogg"`. If you pass a string to `wandb.Video` we assert the file exists and is a supported format before uploading to wandb. Passing a `BytesIO` object will create a temporary file with the specified format as the extension. On the W\&B [Run](/models/runs/) and [Project](/models/track/project-page/) Pages, you will see your videos in the Media section. 
For more usage information, see [`video-file`](/models/ref/query-panel/video-file). ### Text Use `wandb.Table` to log text in tables to show up in the UI. By default, the column headers are `["Input", "Output", "Expected"]`. To ensure optimal UI performance, the default maximum number of rows is set to 10,000. However, you can explicitly override the maximum with `wandb.Table.MAX_ROWS = {DESIRED_MAX}`. ```python theme={null} with wandb.init(project="my_project") as run: columns = ["Text", "Predicted Sentiment", "True Sentiment"] # Method 1 data = [["I love my phone", "1", "1"], ["My phone sucks", "0", "-1"]] table = wandb.Table(data=data, columns=columns) run.log({"examples": table}) # Method 2 table = wandb.Table(columns=columns) table.add_data("I love my phone", "1", "1") table.add_data("My phone sucks", "0", "-1") run.log({"examples": table}) ``` You can also pass a pandas `DataFrame` object. ```python theme={null} table = wandb.Table(dataframe=my_dataframe) ``` For more usage information, see [`string`](/models/ref/query-panel/). ### HTML ```python theme={null} run.log({"custom_file": wandb.Html(open("some.html"))}) run.log({"custom_string": wandb.Html('Link')}) ``` Log custom HTML at any key to expose an HTML panel on the run page. By default, we inject default styles; you can turn off default styles by passing `inject=False`. ```python theme={null} run.log({"custom_file": wandb.Html(open("some.html"), inject=False)}) ``` For more usage information, see [`html-file`](/models/ref/query-panel/html-file). # Create and track plots from experiments Source: https://docs.wandb.ai/models/track/log/plots Create and track plots from machine learning experiments. In W\&B Models, methods in `wandb.plot` let you track charts with `wandb.Run.log()`, including charts that change over time during training. To learn more about the custom charting framework, see the [custom charts walkthrough](/models/app/features/custom-charts/walkthrough/). 
### Basic charts

To create a W\&B chart:

1. Create a `wandb.Table` object and add the data you want to visualize.
2. Generate a plot using one of W\&B's built-in [helper functions](/models/ref/python/custom-charts).
3. Log the plot with `wandb.Run.log()`.

Use the following basic charts to build simple visualizations of metrics and results.

Log a custom line plot, a list of connected and ordered points on arbitrary axes.

```python theme={null}
import wandb

# Example data; replace with your own values.
x_values = [0, 1, 2, 3]
y_values = [1.0, 0.8, 0.6, 0.5]

with wandb.init() as run:
    data = [[x, y] for (x, y) in zip(x_values, y_values)]
    table = wandb.Table(data=data, columns=["x", "y"])
    run.log(
        {
            "my_custom_plot_id": wandb.plot.line(
                table, "x", "y", title="Custom Y versus X line plot"
            )
        }
    )
```

You can use this to log curves on any two dimensions. If you're plotting two lists of values against each other, the number of values in the lists must match exactly. In other words, each point must have both an x and a y.

Custom line plot

For more information, see the [Creating Custom Line Plots With W\&B](https://wandb.ai/wandb/plots/reports/Custom-Line-Plots--VmlldzoyNjk5NTA) report.

[Run the code](https://tiny.cc/custom-charts)

Log a custom scatter plot, a list of points (x, y) on a pair of arbitrary axes x and y.

```python theme={null}
import wandb

# Example data; replace with your own values.
class_x_scores = [0.1, 0.4, 0.7]
class_y_scores = [0.9, 0.5, 0.2]

with wandb.init() as run:
    data = [[x, y] for (x, y) in zip(class_x_scores, class_y_scores)]
    table = wandb.Table(data=data, columns=["class_x", "class_y"])
    run.log({"my_custom_id": wandb.plot.scatter(table, "class_x", "class_y")})
```

You can use this to log scatter points on any two dimensions. If you're plotting two lists of values against each other, the number of values in the lists must match exactly. In other words, each point must have both an x and a y.

Custom scatter plot

For more information, see the [Creating Custom Scatter Plots With W\&B](https://wandb.ai/wandb/plots/reports/Custom-Scatter-Plots--VmlldzoyNjk5NDQ) report.
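The examples above pair values with `zip`, which silently stops at the shorter list, so a length mismatch would drop points instead of raising an error. The following is a minimal sketch of a stricter pairing step. `paired_rows` is a hypothetical helper name, not part of the W\&B API; its result is a plain list of rows that could be passed as `data` to `wandb.Table`.

```python
def paired_rows(xs, ys):
    """Return [[x, y], ...] rows, refusing lists of different lengths."""
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys):
        raise ValueError(
            f"expected the same number of x and y values, got {len(xs)} and {len(ys)}"
        )
    return [[x, y] for x, y in zip(xs, ys)]


# Hypothetical example values.
rows = paired_rows([0, 1, 2], [0.9, 0.7, 0.4])
print(rows)
```

On Python 3.10 or later, `zip(xs, ys, strict=True)` gives a similar guarantee without a helper.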
[Run the code](https://tiny.cc/custom-charts) Log a custom bar chart (a list of labeled values as bars) natively in a few lines: ```python theme={null} import wandb with wandb.init() as run: data = [[label, val] for (label, val) in zip(labels, values)] table = wandb.Table(data=data, columns=["label", "value"]) run.log( { "my_bar_chart_id": wandb.plot.bar( table, "label", "value", title="Custom bar chart" ) } ) ``` You can use this to log arbitrary bar charts. The number of labels and values in the lists must match exactly. Each data point must have both. Custom bar chart For more information, see the [Custom Bar Charts](https://wandb.ai/wandb/plots/reports/Custom-Bar-Charts--VmlldzoyNzExNzk) report. [Run the code](https://tiny.cc/custom-charts) Log a custom histogram (sort a list of values into bins by count or frequency of occurrence) natively in a few lines. If you have a list of prediction confidence scores (`scores`), you can visualize the distribution like this: ```python theme={null} import wandb with wandb.init() as run: data = [[s] for s in scores] table = wandb.Table(data=data, columns=["scores"]) run.log({"my_histogram": wandb.plot.histogram(table, "scores", title="Histogram")}) ``` You can use this to log arbitrary histograms. Note that `data` is a list of lists, intended to support a 2D array of rows and columns. Custom histogram For more information, see the [Creating Custom Histograms With W\&B](https://wandb.ai/wandb/plots/reports/Custom-Histograms--VmlldzoyNzE0NzM) report. [Run the code](https://tiny.cc/custom-charts) Plot multiple lines, or multiple different lists of x-y coordinate pairs, on one shared set of x-y axes: ```python theme={null} import wandb with wandb.init() as run: run.log( { "my_custom_id": wandb.plot.line_series( xs=[0, 1, 2, 3, 4], ys=[[10, 20, 30, 40, 50], [0.5, 11, 72, 3, 41]], keys=["metric Y", "metric Z"], title="Two Random Metrics", xname="x units", ) } ) ``` Note that the number of x and y points must match exactly. 
You can supply one list of x values to match multiple lists of y values, or a separate list of x values for each list of y values. Multi-line plot For more information, see the [Custom Multi-Line Plots](https://wandb.ai/wandb/plots/reports/Custom-Multi-Line-Plots--VmlldzozOTMwMjU) report. ### Model evaluation charts These preset charts have built-in `wandb.plot()` methods that make it quick to log charts directly from your script and see the exact information you're looking for in the UI. Create a [Precision-Recall curve](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_curve.html#sklearn.metrics.precision_recall_curve) in one line: ```python theme={null} import wandb with wandb.init() as run: # ground_truth is a list of true labels, predictions is a list of predicted scores. # For example ground_truth = [0, 1, 1, 0], predictions = [0.1, 0.4, 0.35, 0.8] ground_truth = [0, 1, 1, 0] predictions = [0.1, 0.4, 0.35, 0.8] run.log({"pr": wandb.plot.pr_curve(ground_truth, predictions)}) ``` You can log this whenever your code has access to: * A model's predicted scores (`predictions`) on a set of examples. * The corresponding ground truth labels (`ground_truth`) for those examples. * (Optional) A list of the labels or class names. For example, `labels=["cat", "dog", "bird"]`, if label index 0 means cat, 1 means dog, 2 means bird. * (Optional) A subset (still in list format) of the labels to visualize in the plot. Precision-recall curve For more information, see the [Plot Precision Recall Curves With W\&B](https://wandb.ai/wandb/plots/reports/Plot-Precision-Recall-Curves--VmlldzoyNjk1ODY) report. 
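To make the inputs concrete, the following sketch computes precision and recall by hand at a few score thresholds, using the example values above. It only illustrates what a single point on a precision-recall curve represents. `precision_recall_at` is a hypothetical helper, not a W\&B function, and the sketch makes no claim about how `wandb.plot.pr_curve` chooses thresholds internally.

```python
def precision_recall_at(threshold, ground_truth, predictions):
    """Treat scores >= threshold as positive and return (precision, recall)."""
    predicted = [score >= threshold for score in predictions]
    tp = sum(1 for p, t in zip(predicted, ground_truth) if p and t == 1)
    fp = sum(1 for p, t in zip(predicted, ground_truth) if p and t == 0)
    fn = sum(1 for p, t in zip(predicted, ground_truth) if not p and t == 1)
    # Convention chosen for this sketch: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


ground_truth = [0, 1, 1, 0]
predictions = [0.1, 0.4, 0.35, 0.8]
for threshold in (0.35, 0.4, 0.8):
    print(threshold, precision_recall_at(threshold, ground_truth, predictions))
```

Lowering the threshold marks more examples as positive, which tends to raise recall at the cost of precision; the curve traces that trade-off across thresholds.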
[Run the code](https://colab.research.google.com/drive/1mS8ogA3LcZWOXchfJoMrboW3opY1A8BY?usp=sharing) Create an [ROC curve](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve) in one line: ```python theme={null} import wandb with wandb.init() as run: # ground_truth is a list of true labels, predictions is a list of predicted scores. # For example ground_truth = [0, 1, 1, 0], predictions = [0.1, 0.4, 0.35, 0.8] ground_truth = [0, 1, 1, 0] predictions = [0.1, 0.4, 0.35, 0.8] run.log({"roc": wandb.plot.roc_curve(ground_truth, predictions)}) ``` You can log this whenever your code has access to: * A model's predicted scores (`predictions`) on a set of examples. * The corresponding ground truth labels (`ground_truth`) for those examples. * (Optional) A list of the labels or class names. For example, `labels=["cat", "dog", "bird"]`, if label index 0 means cat, 1 means dog, 2 means bird. * (Optional) A subset (still in list format) of these labels to visualize on the plot. ROC curve For more information, see the [Plot ROC Curves With W\&B](https://wandb.ai/wandb/plots/reports/Plot-ROC-Curves--VmlldzoyNjk3MDE) report. [Run the code](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/wandb-log/Plot_ROC_Curves_with_W%26B.ipynb) Create a multi-class [confusion matrix](https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html) in one line: ```python theme={null} import wandb cm = wandb.plot.confusion_matrix( y_true=ground_truth, preds=predictions, class_names=class_names ) with wandb.init() as run: run.log({"conf_mat": cm}) ``` You can log this wherever your code has access to: * A model's predicted labels on a set of examples (`preds`) or the normalized probability scores (`probs`). The probabilities must have the shape (number of examples, number of classes). You can supply either probabilities or predictions but not both. 
* The corresponding ground truth labels for those examples (`y_true`). * A full list of the labels or class names as strings in `class_names`. For example, `class_names=["cat", "dog", "bird"]`, if index 0 is `cat`, 1 is `dog`, 2 is `bird`. Confusion matrix For more information, see the [Confusion Matrix: Usage and Examples](https://wandb.ai/wandb/plots/reports/Confusion-Matrix--VmlldzozMDg1NTM) report. [Run the code](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/wandb-log/Log_a_Confusion_Matrix_with_W%26B.ipynb) ### Interactive custom charts For full customization, tweak a built-in [Custom Chart preset](/models/app/features/custom-charts/walkthrough/) or create a new preset, then save the chart. Use the chart ID to log data to that custom preset directly from your script. ```python theme={null} import wandb # Create a table with the columns to plot. table = wandb.Table(data=data, columns=["step", "height"]) # Map from the table's columns to the chart's fields. fields = {"x": "step", "value": "height"} # Use the table to populate the new custom chart preset. # To use your own saved chart preset, change the vega_spec_name. # To edit the title, change the string_fields. my_custom_chart = wandb.plot_table( vega_spec_name="carey/new_chart", data_table=table, fields=fields, string_fields={"title": "Height Histogram"}, ) with wandb.init() as run: # Log the custom chart. run.log({"my_custom_chart": my_custom_chart}) ``` [Run the code](https://tiny.cc/custom-charts) ### Matplotlib and Plotly plots Instead of using W\&B [Custom Charts](/models/app/features/custom-charts/walkthrough/) with `wandb.plot()`, you can log charts generated with [matplotlib](https://matplotlib.org/) and [Plotly](https://plotly.com/). ```python theme={null} import wandb import matplotlib.pyplot as plt with wandb.init() as run: # Create a simple matplotlib plot. plt.figure() plt.plot([1, 2, 3, 4]) plt.ylabel("some interesting numbers") # Log the plot to W&B. 
    run.log({"chart": plt})
```

Pass a `matplotlib` plot or figure object to `wandb.Run.log()`. By default, W\&B converts the plot into a [Plotly](https://plot.ly/) plot. If you'd rather log the plot as an image, pass the plot to `wandb.Image`. W\&B also accepts Plotly charts directly.

If you get an error like "You attempted to log an empty plot", store the figure separately from the plot with `fig = plt.figure()` and then log `fig` in your call to `wandb.Run.log()`.

### Log custom HTML to W\&B Tables

W\&B supports logging interactive charts from Plotly and Bokeh as HTML and adding them to Tables.

#### Log Plotly figures to Tables as HTML

You can log interactive Plotly charts to W\&B Tables by converting them to HTML.

```python theme={null}
import wandb
import plotly.express as px

# Initialize a new run.
with wandb.init(project="log-plotly-fig-tables", name="plotly_html") as run:
    # Create a table.
    table = wandb.Table(columns=["plotly_figure"])

    # Create path for Plotly figure.
    path_to_plotly_html = "./plotly_figure.html"

    # Example Plotly figure.
    fig = px.scatter(x=[0, 1, 2, 3, 4], y=[0, 1, 4, 9, 16])

    # Write Plotly figure to HTML.
    # Setting auto_play to False prevents animated Plotly charts
    # from playing in the table automatically.
    fig.write_html(path_to_plotly_html, auto_play=False)

    # Add Plotly figure as HTML file into Table.
    table.add_data(wandb.Html(path_to_plotly_html))

    # Log table.
    run.log({"test_table": table})
```

#### Log Bokeh figures to Tables as HTML

You can log interactive Bokeh charts to W\&B Tables by converting them to HTML.
```python theme={null}
from scipy.signal import spectrogram
import holoviews as hv
import panel as pn
from scipy.io import wavfile
import numpy as np
from bokeh.resources import INLINE

hv.extension("bokeh", logo=False)
import wandb


def save_audio_with_bokeh_plot_to_html(audio_path, html_file_name):
    sr, wav_data = wavfile.read(audio_path)
    duration = len(wav_data) / sr
    f, t, sxx = spectrogram(wav_data, sr)
    spec_gram = hv.Image((t, f, np.log10(sxx)), ["Time (s)", "Frequency (hz)"]).opts(
        width=500, height=150, labelled=[]
    )
    audio = pn.pane.Audio(wav_data, sample_rate=sr, name="Audio", throttle=500)
    slider = pn.widgets.FloatSlider(end=duration, visible=False)
    line = hv.VLine(0).opts(color="white")
    slider.jslink(audio, value="time", bidirectional=True)
    slider.jslink(line, value="glyph.location")
    combined = pn.Row(audio, spec_gram * line, slider).save(html_file_name)


html_file_name = "audio_with_plot.html"
audio_path = "hello.wav"
save_audio_with_bokeh_plot_to_html(audio_path, html_file_name)

wandb_html = wandb.Html(html_file_name)
with wandb.init(project="audio_test") as run:
    my_table = wandb.Table(columns=["audio_with_plot"], data=[[wandb_html]])
    run.log({"audio_table": my_table})
```

# Track CSV files with experiments

Source: https://docs.wandb.ai/models/track/log/working-with-csv

Import CSV files into W&B as Tables and Artifacts for visualization, comparison, and analysis in dashboards.

Use the W\&B Python Library to log a CSV file and visualize it in a [W\&B Dashboard](/models/track/workspaces/). W\&B Dashboards are the central place to organize and visualize results from your machine learning models. This is particularly useful if you have a [CSV file that contains information about previous machine learning experiments](#import-and-log-your-csv-of-experiments) that are not logged in W\&B, or if you have a [CSV file that contains a dataset](#import-and-log-your-dataset-csv-file).
## Import and log your dataset CSV file

We suggest you use W\&B Artifacts to make the contents of the CSV file easier to re-use.

1. To get started, first import your CSV file. In the following code snippet, replace the `iris.csv` filename with the name of your CSV file:

   ```python theme={null}
   import wandb
   import pandas as pd

   # Read our CSV into a new DataFrame
   new_iris_dataframe = pd.read_csv("iris.csv")
   ```

2. Convert the CSV file to a W\&B Table to utilize [W\&B Dashboards](/models/track/workspaces/).

   ```python theme={null}
   # Convert the DataFrame into a W&B Table
   iris_table = wandb.Table(dataframe=new_iris_dataframe)
   ```

3. Next, create a W\&B Artifact and add the table to the Artifact:

   ```python theme={null}
   # Add the table to an Artifact to increase the row
   # limit to 200000 and make it easier to reuse
   iris_table_artifact = wandb.Artifact("iris_artifact", type="dataset")
   iris_table_artifact.add(iris_table, "iris_table")

   # Log the raw csv file within an artifact to preserve our data
   iris_table_artifact.add_file("iris.csv")
   ```

   For more information about W\&B Artifacts, see the [Artifacts chapter](/models/artifacts/).

4. Lastly, start a new W\&B Run to track and log to W\&B with `wandb.init()`:

   ```python theme={null}
   # Start a W&B run to log data
   with wandb.init(project="tables-walkthrough") as run:
       # Log the table to visualize with a run...
       run.log({"iris": iris_table})

       # and Log as an Artifact to increase the available row limit!
       run.log_artifact(iris_table_artifact)
   ```

The `wandb.init()` API spawns a new background process to log data to a Run, and it synchronizes data to wandb.ai (by default). View live visualizations on your W\&B Workspace Dashboard. The following image shows the output of the preceding code snippets.
CSV file imported into W&B Dashboard The full script with the preceding code snippets is found below: ```python theme={null} import wandb import pandas as pd # Read our CSV into a new DataFrame new_iris_dataframe = pd.read_csv("iris.csv") # Convert the DataFrame into a W&B Table iris_table = wandb.Table(dataframe=new_iris_dataframe) # Add the table to an Artifact to increase the row # limit to 200000 and make it easier to reuse iris_table_artifact = wandb.Artifact("iris_artifact", type="dataset") iris_table_artifact.add(iris_table, "iris_table") # log the raw csv file within an artifact to preserve our data iris_table_artifact.add_file("iris.csv") # Start a W&B run to log data with wandb.init(project="tables-walkthrough") as run: # Log the table to visualize with a run... run.log({"iris": iris_table}) # and Log as an Artifact to increase the available row limit! run.log_artifact(iris_table_artifact) ``` ## Import and log your CSV of Experiments In some cases, you might have your experiment details in a CSV file. Common details found in such CSV files include: * A name for the experiment run * Initial [notes](/models/runs/#add-a-note-to-a-run) * [Tags](/models/runs/tags/) to differentiate the experiments * Configurations needed for your experiment (with the added benefit of being able to utilize our [Sweeps Hyperparameter Tuning](/models/sweeps/)). 
| Experiment   | Model Name       | Notes                                            | Tags          | Num Layers | Final Train Acc | Final Val Acc | Training Losses                       |
| ------------ | ---------------- | ------------------------------------------------ | ------------- | ---------- | --------------- | ------------- | ------------------------------------- |
| Experiment 1 | mnist-300-layers | Overfit way too much on training data            | \[latest]     | 300        | 0.99            | 0.90          | \[0.55, 0.45, 0.44, 0.42, 0.40, 0.39] |
| Experiment 2 | mnist-250-layers | Current best model                               | \[prod, best] | 250        | 0.95            | 0.96          | \[0.55, 0.45, 0.44, 0.42, 0.40, 0.39] |
| Experiment 3 | mnist-200-layers | Did worse than the baseline model. Need to debug | \[debug]      | 200        | 0.76            | 0.70          | \[0.55, 0.45, 0.44, 0.42, 0.40, 0.39] |
| ...          | ...              | ...                                              | ...           | ...        | ...             | ...           |                                       |
| Experiment N | mnist-X-layers   | NOTES                                            | ...           | ...        | ...             | ...           | \[..., ...]                           |

W\&B can take a CSV file of experiments and convert each row into a W\&B Experiment Run. The following code snippets and script demonstrate how to import and log your CSV file of experiments:

1. To get started, first read in your CSV file and convert it into a Pandas DataFrame.
Replace `"experiments.csv"` with the name of your CSV file: ```python theme={null} import wandb import pandas as pd FILENAME = "experiments.csv" loaded_experiment_df = pd.read_csv(FILENAME) PROJECT_NAME = "Converted Experiments" EXPERIMENT_NAME_COL = "Experiment" NOTES_COL = "Notes" TAGS_COL = "Tags" CONFIG_COLS = ["Num Layers"] SUMMARY_COLS = ["Final Train Acc", "Final Val Acc"] METRIC_COLS = ["Training Losses"] # Format Pandas DataFrame to make it easier to work with for i, row in loaded_experiment_df.iterrows(): run_name = row[EXPERIMENT_NAME_COL] notes = row[NOTES_COL] tags = row[TAGS_COL] config = {} for config_col in CONFIG_COLS: config[config_col] = row[config_col] metrics = {} for metric_col in METRIC_COLS: metrics[metric_col] = row[metric_col] summaries = {} for summary_col in SUMMARY_COLS: summaries[summary_col] = row[summary_col] ``` 2. Next, start a new W\&B Run to track and log to W\&B with [`wandb.init()`](/models/ref/python/functions/init): ```python theme={null} with wandb.init( project=PROJECT_NAME, name=run_name, tags=tags, notes=notes, config=config ) as run: ``` As an experiment runs, you might want to log every instance of your metrics so they are available to view, query, and analyze with W\&B. Use the [`run.log()`](/models/ref/python/experiments/run/#method-runlog) command to accomplish this: ```python theme={null} run.log({key: val}) ``` You can optionally log a final summary metric to define the outcome of the run using the [`define_metric`](/models/ref/python/experiments/run#define_metric) API. This example adds the summary metrics to our run with `run.summary.update()`: ```python theme={null} run.summary.update(summaries) ``` For more information about summary metrics, see [Log Summary Metrics](./log-summary). 
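One caveat when importing the sample table: `pd.read_csv` typically loads bracketed cells such as `[prod, best]` or `[0.55, 0.45, 0.44]` as plain strings rather than Python lists. The following is a minimal sketch of a parser for such cells, assuming the cell format shown in the sample table. `parse_list_cell` is a hypothetical helper, not part of pandas or W\&B.

```python
def parse_list_cell(cell):
    """Turn a cell like "[0.55, 0.45]" or "[prod, best]" into a Python list.

    Items that look numeric become floats; everything else stays a stripped
    string. Values that are not bracketed strings are returned unchanged.
    """
    if not isinstance(cell, str):
        return cell
    text = cell.strip()
    if not (text.startswith("[") and text.endswith("]")):
        return cell
    items = [item.strip() for item in text[1:-1].split(",") if item.strip()]
    parsed = []
    for item in items:
        try:
            parsed.append(float(item))
        except ValueError:
            parsed.append(item)
    return parsed


print(parse_list_cell("[prod, best]"))
print(parse_list_cell("[0.55, 0.45, 0.44]"))
```

With a helper like this, a tags cell could be converted to a list of strings before it is passed to `wandb.init()`, and a losses cell could be logged one value per `run.log()` call.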
Below is the full example script that converts the above sample table into a [W\&B Dashboard](/models/track/workspaces/):

```python theme={null}
import wandb
import pandas as pd

FILENAME = "experiments.csv"
loaded_experiment_df = pd.read_csv(FILENAME)

PROJECT_NAME = "Converted Experiments"

EXPERIMENT_NAME_COL = "Experiment"
NOTES_COL = "Notes"
TAGS_COL = "Tags"
CONFIG_COLS = ["Num Layers"]
SUMMARY_COLS = ["Final Train Acc", "Final Val Acc"]
METRIC_COLS = ["Training Losses"]

for i, row in loaded_experiment_df.iterrows():
    run_name = row[EXPERIMENT_NAME_COL]
    notes = row[NOTES_COL]
    tags = row[TAGS_COL]

    config = {}
    for config_col in CONFIG_COLS:
        config[config_col] = row[config_col]

    metrics = {}
    for metric_col in METRIC_COLS:
        metrics[metric_col] = row[metric_col]

    summaries = {}
    for summary_col in SUMMARY_COLS:
        summaries[summary_col] = row[summary_col]

    with wandb.init(
        project=PROJECT_NAME, name=run_name, tags=tags, notes=notes, config=config
    ) as run:
        for key, val in metrics.items():
            # pd.read_csv loads list-like cells as strings, so this branch
            # only runs if you parsed the column into a list beforehand.
            if isinstance(val, list):
                for _val in val:
                    run.log({key: _val})
            else:
                run.log({key: val})

        run.summary.update(summaries)
```

# Projects

Source: https://docs.wandb.ai/models/track/project-page

Compare versions of your model, explore results in a scratch workspace, and export findings to a report to save notes and visualizations.

A *project* is a central location where you visualize results, compare experiments, view and download artifacts, create an automation, and more.

Each project has a visibility setting that determines who can access it. For more information about who can access a project, see [Project visibility](/platform/hosting/iam/access-management/restricted-projects).
Each project contains the following tabs: * [Overview](/models/track/project-page/#overview-tab): snapshot of your project * [Workspace](/models/track/project-page/#workspace-tab): personal visualization sandbox * [Runs](#runs-tab): A table that lists all the runs in your project * [Automations](#automations-tab): Automations configured in your project * [Sweeps](/models/sweeps): automated exploration and optimization * [Reports](/models/track/project-page/#reports-tab): saved snapshots of notes, runs, and graphs * [Artifacts](#artifacts-tab): Contains all runs and the artifacts associated with that run ## Overview tab W\&B creates a project for you when you initialize a run with the name you provide for the project field. From the **Overview** tab, you can find the project name and manage the project. * To change the project's name, description, or team, click **Edit** in the upper right corner. * To undelete recently deleted runs, click the **action ()** menu in the upper right corner, then click **Undelete recently deleted runs**. * To delete the project, click the **action ()** menu in the upper right corner, then click **Delete project**. Read the confirmation dialog and follow the instructions. Deleting a project is not reversible. The rest of the **Overview** page is divided into **Details** and **Project roles** tabs. ### Details Details about the project include: * **Project visibility**: The visibility of the project. The visibility setting that determines who can access it. See [Project visibility](/platform/hosting/iam/access-management/restricted-projects) for more information. 
* **Last active**: Timestamp of the last time data was logged to this project.
* **Contributors**: The number of users who contribute to this project.
* **Total runs**: The total number of runs in this project.
* **Total compute**: The sum of all run times in the project.

The **Details** tab also includes instructions for accessing and exporting the project's data and metrics, downloading the best model from a sweep, and more.

[View a live example project](https://wandb.ai/example-team/sweep-demo/overview)

Project overview tab

To change a project's privacy from the **Overview** tab:

1. In the W\&B App, from any page in the project, click **Overview** in the left navigation.
2. At the top right, click **Edit**.
3. Choose a new value for **Project visibility**:
   * **Team** (default): Only your team can view and edit the project.
   * **Restricted**: Only invited members can access the project, and public access is turned off.
   * **Open**: Anyone can submit runs or create reports, but only your team can edit it. Appropriate only for classroom settings, public benchmark competitions, or other non-durable contexts.
   * **Public**: Anyone can view the project, but only your team can edit it.

   If your W\&B admins have turned off **Public** visibility, you cannot choose it. Instead, you can share a view-only [W\&B Report](/models/reports/collaborate-on-reports#share-a-report), or contact your W\&B organization's admins for assistance.
4. Click **Save**.

If you update a project to a stricter privacy setting, you may need to re-invite individual users to restore their ability to access the project.

### Project roles

The **Project roles** tab is visible only to the project owner and those with the **Admin** role. Use it to list and search for users with access to the project, or to change a member's role.

## Workspace tab

A project's *workspace* gives you a personal sandbox to compare experiments.
Use projects to organize comparable models that address the same problem with different architectures, hyperparameters, datasets, preprocessing, and so on.

**Runs sidebar**: list of all the runs in your project.

* **action ()** menu: Hover over a row in the sidebar to see the menu appear on the left side. Use this menu to rename a run, delete a run, or stop an active run.
* **Visibility icon**: Click the eye to show or hide runs on graphs.
* **Color**: Change the run color to another preset or a custom color.
* **Search**: Search runs by name. This also filters visible runs in the plots.
* **Filter**: Use the sidebar filter to narrow down the set of visible runs.
* **Group**: Select a config column to dynamically group your runs, for example by architecture. Grouping makes plots show a line along the mean value and a shaded region for the variance of points on the graph.
* **Sort**: Pick a value to sort your runs by, for example runs with the lowest loss or highest accuracy. Sorting affects which runs show up on the graphs.
* **Expand button**: Expand the sidebar into the full table.
* **Minimize**: Press **Cmd+.** (macOS) or **Ctrl+.** (Windows/Linux) to collapse or restore the Runs selector. See [Keyboard shortcuts](/models/app/keyboard-shortcuts) for details.
* **Run count**: The number in parentheses at the top is the total number of runs in the project. The number (N visualized) is the number of runs that have the eye turned on and are available to be visualized in each plot. In the example below, the graphs are only showing the first 10 of 183 runs. Edit a graph to increase the maximum number of visible runs.

If you pin, hide, or change the order of columns in the [Runs tab](#runs-tab), the Runs sidebar reflects these customizations.
**Panels layout**: use this scratch space to explore results, add and remove charts, and compare versions of your models based on different metrics [View a live example](https://wandb.ai/example-team/sweep-demo) Project workspace ### Add a section of panels Click the section dropdown menu and click **Add section** to create a new section for panels. You can rename sections, drag them to reorganize them, and expand and collapse sections. Each section has options in the upper right corner: * **Add section**: Add a section above or below from the dropdown menu, or click the button at the bottom of the page to add a new section. * **Rename section**: Change the title for your section. * **Export section to report**: Save this section of panels to a new report. * **Delete section**: Remove the whole section and all the charts. This can be undone with the undo button at the bottom of the page in the workspace bar. * **Add panel**: Click the plus button to add a panel to the section. ### Move panels between sections Drag and drop panels to reorder and organize into sections. You can also click the **Move** button in the upper right corner of a panel to select a section to move the panel to. Moving panels between sections ### Resize panels All panels maintain the same size, and there are pages of panels. Resize the section by clicking and dragging the lower right corner of the section, which will display a corner icon when hovering. Resizing panels ### Search for metrics Use the search box in the workspace to filter down the panels. This search matches the panel titles, which are by default the name of the metrics visualized. Workspace search ## Runs tab Use the Runs tab to filter, group, and sort your runs. Runs table The following tabs demonstrate some common actions you can take in the Runs tab. The Runs tab shows details about runs in the project. It shows a large number of columns by default. 
When you customize the Runs tab, the customization is also reflected in the **Runs** selector of the [Workspace tab](#workspace-tab). * To view all visible columns, scroll the page horizontally. * To change the order of the columns, drag a column to the left or right. * To pin a column, hover over the column name, click the **action ()** menu that appears, then click **Pin column**. Pinned columns appear near the left of the page, after the **Name** column. To unpin a pinned column, choose **Unpin column**. * To hide a column, hover over the column name, click the **action ()** menu that appears, then click **Hide column**. To view all columns that are currently hidden, click **Columns**. * To show, hide, pin, and unpin multiple columns at once, click **Columns**. * Click the name of a hidden column to unhide it. * Click the name of a visible column to hide it. * Click the pin icon next to a visible column to pin it. Sort all rows in a Table by the value in a given column using one of these options: 1. Click the **action ()** menu when hovering over any column header to quickly sort the table by that column's values. 2. Click the **Sort** button and select the desired columns from the drop-down. You can use this panel to sort by multiple columns in priority order. Configuring a run table sort The preceding image demonstrates how to view sorting options for a Table column called `val_acc`. Apply filter expressions to narrow down the runs you want to visualize. Configuring a run table filter See [Create a filter expression](/models/runs/filter-runs#create-a-filter-expression) for more information on how to write filter expressions. Click the **Group** button above the table to group all rows by the value in a particular column. For more information on how to group runs, see [Organize runs into groups](/models/runs/grouping). ## Automations tab Automate downstream actions for versioning artifacts. To create an automation, define trigger events and resulting actions. 
Actions include executing a webhook or launching a W\&B job. For more information, see [Automations](/models/automations). ## Sweeps tab Start a new [sweep](/models/sweeps) from your project. Sweeps tab ## Reports tab See all the snapshots of results in one place, and share findings with your team. Reports tab To view a report, select the **Reports** tab from the project sidebar, then select a report from the list. ## Artifacts tab View all [artifacts](/models/artifacts) associated with a project, from training datasets and [fine-tuned models](/models/registry) to [tables of metrics and media](/models/tables/tables-walkthrough). ### Metadata panel Artifact metadata panel The metadata panel provides access to the artifact's metadata. This metadata might include configuration arguments required to reconstruct the artifact, URLs where more information can be found, or metrics produced during the run which logged the artifact. View the configuration for the run that produced the artifact as well as history metrics. To view an artifact's metadata: 1. Select the **Artifacts** tab from the project sidebar. 2. Select an artifact from the list to view the details page for the latest version of that artifact. 3. Select the **Metadata** tab to view the metadata associated with that artifact. ### Usage panel Artifact usage panel The usage panel provides a code snippet for downloading the artifact for use outside of the web app, for example on a local machine. This section also indicates and links to the run which output the artifact and any runs which use the artifact as an input. To view the artifact usage: 1. Select the **Artifacts** tab from the project sidebar. 2. Select an artifact from the list to view the details page for the latest version of that artifact. 3. Select the **Usage** tab to view the code snippet and related runs. ### Files panel Artifact files panel The files panel lists the files and folders associated with the artifact. 
W\&B uploads certain files for a run automatically. For example, `requirements.txt` shows the versions of each library the run used, and `wandb-metadata.json` and `wandb-summary.json` include information about the run. Other files may be uploaded, such as artifacts or media, depending on the run's configuration.

To view files logged to an artifact:

1. Select the **Artifacts** tab from the project sidebar.
2. Select an artifact from the list to view the details page for the latest version of that artifact.
3. Select the **Files** tab to view all files associated with that artifact.

### Lineage panel

Artifact lineage

The lineage panel provides a view of all of the artifacts associated with a project and the runs that connect them to each other. It shows run types as blocks and artifacts as circles, with arrows to indicate when a run of a given type consumes or produces an artifact of a given type. The type of the particular artifact selected in the left-hand column is highlighted.

Click the Explode toggle to view all of the individual artifact versions and the specific runs that connect them.

### Versions tab

Artifact versions tab

The versions tab shows all versions of the artifact. Select an artifact to view the details for that specific version.

1. Select the **Artifacts** tab from the project sidebar.
2. Select an artifact from the list to view the details page for the latest version of that artifact.
3. Select the **Versions** tab to view all versions of that artifact.
4. From the dropdown (next to the artifact name), select **All Versions**.

For example, the previous image shows different model artifact versions for a model artifact named `"zoo-wyhak4p0"`.

## Create a project

You can create a project in the W\&B App or programmatically by specifying a project in a call to `wandb.init()`.

In the W\&B App, you can create a project from the **Projects** page or from a team's landing page.

From the **Projects** page:

1.
Click the global navigation icon in the upper left. The project sidebar opens. 2. In the **Projects** section of the navigation, click **View all** to open the project overview page. 3. Click **Create new project**. 4. Set **Team** to the name of the team that will own the project. 5. Specify a name for your project using the **Name** field. 6. Set **Project visibility**, which defaults to **Team**. 7. Optionally, provide a **Description**. 8. Click **Create project**. From a team's landing page: 1. Click the global navigation icon in the upper left. The project sidebar opens. 2. In the **Teams** section of the navigation, click the name of a team to open its landing page. 3. In the landing page, click **Create new project**. 4. **Team** is automatically set to the team that owns the landing page you were viewing. If necessary, change the team. 5. Specify a name for your project using the **Name** field. 6. Set **Project visibility**, which defaults to **Team**. 7. Optionally, provide a **Description**. 8. Click **Create project**. To create a project programmatically, specify a `project` when calling `wandb.init()`. If the project does not yet exist, it is created automatically, and is owned by the specified entity. For example: ```python theme={null} import wandb with wandb.init(entity="", project="") as run: run.log({"accuracy": .95}) ``` Refer to the [`wandb.init()` API reference](/models/ref/python/functions/init/#examples). ## Organization home and Recent Activity In the W\&B App, the organization home page summarizes recent work. The **Recent Activity** viewer shows recent runs. Choose **Your projects** to see your own runs, or **Organization** to see recent runs by your colleagues. * **Search** to filter the activity list. * **Sortable columns** let you reorder rows by the column you care about. * **Notes** and **tags** columns display automatically if a run includes these details. 
Below the **Recent Activity** viewer, quickly access your recently created reports.

## Star a project

Add a star to a project to mark that project as important. Starred projects are grouped at the top of the **Projects** page. Quickly access up to 10 recent projects in the **Projects** list in the top-left **Global navigation menu**, with starred projects listed first.

There are two ways to mark a project as important: within a project's overview tab or within your team's profile page.

From the project's overview tab:

1. Navigate to your W\&B project on the W\&B App at `https://wandb.ai/<team>/<project>`.
2. Select the **Overview** tab from the project sidebar.
3. Click the star icon in the upper right corner next to the **Edit** button.

From your team's profile page:

1. Navigate to your team's profile page at `https://wandb.ai/<team>/projects`.
2. Select the **Projects** tab.
3. Hover your mouse next to the project you want to star. Click the star icon that appears.

For example, the following image shows the star icon next to the "Compare\_Zoo\_Models" project.

Confirm that your project is available. In the **Projects** section of the left navigation, click **View all** to open the **Projects** page. Search or filter to find your project by its name, team, or other metadata. Starred projects appear at the top of the **Projects** page, and at the top of the **Projects** section of the left navigation.

## Delete a project

You can delete your project using the **action ()** menu on the right of the overview tab.

1. Navigate to your W\&B project.
2. Select the **Overview** tab from the project sidebar.
3. Click the **action ()** menu in the upper right corner.
4. Select **Delete project** from the dropdown menu.

W\&B does not terminate active [sweeps](/models/sweeps) or agents when you delete a project.

## Add notes to a project

Add notes to your project either as a description overview or as a markdown panel within your workspace.
### Add description overview to a project

Descriptions you add appear in the **Overview** tab of your project.

1. Navigate to your W\&B project.
2. Select the **Overview** tab from the project sidebar.
3. Click **Edit** in the upper right corner.
4. Add your notes in the **Description** field.
5. Select the **Save** button.

**Create reports to create descriptive notes comparing runs**

You can also create a W\&B Report to add plots and markdown side by side. Use different sections to show different runs, and tell a story about what you worked on.

### Add notes to run workspace

1. Navigate to your W\&B project.
2. Select the **Workspace** tab from the project sidebar.
3. Click the **Add panels** button from the top right corner.
4. Select the **TEXT AND CODE** dropdown from the modal that appears.
5. Select **Markdown**.
6. Add your notes in the markdown panel that appears in your workspace.

# Import and export data

Source: https://docs.wandb.ai/models/track/public-api-guide

Import data from MLflow, export or update data that you have saved to W&B

Export data or import data with W\&B Public APIs.

This feature requires Python 3.8 or later.

## Import data from MLflow

W\&B supports importing data from MLflow, including experiments, runs, artifacts, metrics, and other metadata.

Install dependencies:

```shell theme={null}
# note: this requires py38+
pip install "wandb[importers]"
```

Log in to W\&B. Follow the prompts if you have not logged in before.

```shell theme={null}
wandb login
```

Import all runs from an existing MLflow server:

```py theme={null}
from wandb.apis.importers.mlflow import MlflowImporter

importer = MlflowImporter(mlflow_tracking_uri="...")

runs = importer.collect_runs()
importer.import_runs(runs)
```

By default, `importer.collect_runs()` collects all runs from the MLflow server. If you prefer to upload a specific subset, you can construct your own runs iterable and pass it to the importer.
```py theme={null}
from typing import List

import mlflow
from wandb.apis.importers.mlflow import MlflowRun

# mlflow_tracking_uri is the same tracking URI you passed to MlflowImporter.
client = mlflow.tracking.MlflowClient(mlflow_tracking_uri)

runs: List[MlflowRun] = []
for run in client.search_runs(...):
    runs.append(MlflowRun(run, client))

importer.import_runs(runs)
```

You might need to [configure the Databricks CLI first](https://docs.databricks.com/dev-tools/cli/index.html) if you import from Databricks MLflow.

Set `mlflow_tracking_uri="databricks"` in the previous step.

To skip importing artifacts, you can pass `artifacts=False`:

```py theme={null}
importer.import_runs(runs, artifacts=False)
```

To import to a specific W\&B entity and project, you can pass a `Namespace`:

```py theme={null}
from wandb.apis.importers import Namespace

importer.import_runs(runs, namespace=Namespace(entity, project))
```

## Export data

Use the Public API to export or update data that you have saved to W\&B. Before using this API, log data from your script. Check the [Quickstart](/models/quickstart/) for more details.

**Use Cases for the Public API**

* **Export Data**: Pull down a dataframe for custom analysis in a Jupyter Notebook. Once you have explored the data, you can sync your findings by creating a new analysis run and logging results, for example: `wandb.init(job_type="analysis")`
* **Update Existing Runs**: You can update the data logged in association with a W\&B run. For example, you might want to update the config of a set of runs to include additional information, like the architecture or a hyperparameter that wasn't originally logged.

See the [Generated Reference Docs](/models/ref/python/public-api/) for details on available functions.

### Create an API key

An API key authenticates your machine to W\&B. You can create a personal API key or an API key owned by a service account.

To create a personal API key owned by your user ID:

1. Log in to W\&B, click your user profile icon, then click **User Settings**.
2. Click **Create new API key**.
3. Provide a descriptive name for your API key.
4. Click **Create**.
5. Copy the displayed API key immediately and store it securely.

To create an API key owned by a service account:

1. Navigate to the **Service Accounts** tab in your team or organization settings.
2. Find the service account in the list.
3. Click the **action ()** menu, then click **Create API key**.
4. Provide a name for the API key, then click **Create**.
5. Copy the displayed API key immediately and store it securely.
6. Click **Done**.

You can create multiple API keys for a single service account to support different environments or workflows.

The full API key is only shown once at creation time. After you close the dialog, you cannot view the full API key again. Only the key ID (first part of the key) is visible in your settings. If you lose the full API key, you must create a new API key.

For secure storage options, see [Store API keys securely](/platform/app/settings-page/user-settings/#store-and-handle-api-keys-securely).

## Store and handle API keys securely

API keys provide access to your W\&B account and should be protected like passwords. Follow these best practices:

### Recommended storage methods

* **Secrets manager**: Use a dedicated secrets management system such as [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/), [HashiCorp Vault](https://developer.hashicorp.com/vault), [Azure Key Vault](https://azure.microsoft.com/en-us/products/key-vault), or [Google Secret Manager](https://cloud.google.com/security/products/secret-manager).
* **Password manager**: Use a reputable password manager application.
* **OS-level keychains**: Store keys in macOS Keychain, Windows Credential Manager, or Linux secret service. Not suggested for production.

### What to avoid

* Never commit API keys to version control systems such as Git.
* Do not store API keys in plain text configuration files.
* Do not pass API keys on the command line, because they will be visible in the output of OS commands like `ps`.
* Avoid sharing API keys through email, chat, or other unencrypted channels.
* Do not hard-code API keys in your source code.

If an API key is exposed, delete the API key from your W\&B account immediately and contact [support](mailto:support@wandb.ai) or your AISE.

### Environment variables

When using API keys in your code, pass them through environment variables:

```bash theme={null}
export WANDB_API_KEY="your-api-key-here"
```

This approach keeps keys out of your source code and makes it easier to rotate them when needed.

Avoid setting the environment variable in line with the command, because it will be visible in the output of OS commands like `ps`:

```bash theme={null}
# Avoid this pattern, which can expose the API key in process managers
WANDB_API_KEY="your-api-key-here" ./my-script.sh
```

### SDK version compatibility

New API keys are longer than legacy keys. When authenticating with older versions of the `wandb` or `weave` SDKs, you may encounter an API key length error.

**Solution**: Update to a newer SDK version:

* `wandb` SDK v0.22.3+

  ```bash theme={null}
  pip install --upgrade "wandb>=0.22.3"
  ```

* `weave` SDK v0.52.17+

  ```bash theme={null}
  pip install --upgrade "weave>=0.52.17"
  ```

If you cannot upgrade the SDK immediately, set the API key using the `WANDB_API_KEY` environment variable as a workaround.

### Find the run path

To use the Public API, you'll often need the run path, which is `<entity>/<project>/<run_id>`. In the app UI, open a run page and click the [Overview tab](/models/track/public-api-guide/#overview-tab) to get the run path.

### Export run data

Download data from a finished or active run. Common usage includes downloading a dataframe for custom analysis in a Jupyter notebook, or using custom logic in an automated environment.
```python theme={null}
import wandb

api = wandb.Api()
# Replace <entity>, <project>, and <run_id> with your own values.
run = api.run("<entity>/<project>/<run_id>")
```

The most commonly used attributes of a run object are:

| Attribute       | Meaning |
| --------------- | ------- |
| `run.config`    | A dictionary of the run's configuration information, such as the hyperparameters for a training run or the preprocessing methods for a run that creates a dataset Artifact. Think of these as the run's inputs. |
| `run.history()` | A list of dictionaries meant to store values that change while the model is training such as loss. The command `run.log()` appends to this object. |
| `run.summary`   | A dictionary of information that summarizes the run's results. This can be scalars like accuracy and loss, or large files. By default, `run.log()` sets the summary to the final value of a logged time series. The contents of the summary can also be set directly. Think of the summary as the run's outputs. |

You can also modify or update the data of past runs. By default, a single instance of an API object caches all network requests. If your use case requires real-time information in a running script, call `api.flush()` to get updated values.
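To make the inputs-versus-outputs distinction concrete, here is a minimal sketch that merges a run's config and summary into one flat dictionary. The helper name `flatten_run` and the `config.` and `summary.` key prefixes are illustrative choices, not part of the W\&B API. The helper takes plain dictionaries so that it runs without a network connection.

```python
def flatten_run(config, summary):
    """Merge a run's inputs (config) and outputs (summary) into one flat dict.

    flatten_run is a hypothetical helper, not part of the W&B API.
    """
    flat = {}
    for prefix, mapping in (("config", config), ("summary", summary)):
        for key, value in mapping.items():
            # Keys that start with "_" (for example _step) are internal bookkeeping.
            if not key.startswith("_"):
                flat[f"{prefix}.{key}"] = value
    return flat


# Sample values shaped like the config and summary of a small training run.
example = flatten_run(
    {"n_epochs": 5},
    {"_step": 4, "_timestamp": 1644345412, "loss": 0.041, "val": 525},
)
print(example)
# → {'config.n_epochs': 5, 'summary.loss': 0.041, 'summary.val': 525}
```

With a run fetched through `api.run()`, you would likely pass `run.config` and `run.summary._json_dict` as the two arguments.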
### Understanding different run attributes

The following code snippet shows how to create a run, log some data, and then access the run's attributes:

```python theme={null}
import random

import wandb

with wandb.init(project="public-api-example") as run:
    n_epochs = 5
    config = {"n_epochs": n_epochs}
    run.config.update(config)
    for n in range(run.config.get("n_epochs")):
        run.log(
            {"val": random.randint(0, 1000), "loss": (random.randint(0, 1000) / 1000.00)}
        )
```

The following sections describe the different outputs for the above run object attributes.

#### `run.config`

```python theme={null}
{"n_epochs": 5}
```

#### `run.summary`

```python theme={null}
{
    "_step": 4,
    "_timestamp": 1644345412,
    "_wandb": {"runtime": 3},
    "loss": 0.041,
    "val": 525,
}
```

### Sampling

The default history method samples the metrics to a fixed number of samples (the default is 500; you can change this with the `samples` argument). If you want to export all of the data on a large run, you can use the `run.scan_history()` method. For more details see the [API Reference](/models/ref/python/public-api).

### Querying multiple runs

This example script finds a project and outputs a CSV of runs with name, configs and summary stats. Replace `<entity>` and `<project>` with your W\&B entity and the name of your project, respectively.

```python theme={null}
import pandas as pd
import wandb

api = wandb.Api()
entity, project = "<entity>", "<project>"
runs = api.runs(entity + "/" + project)

summary_list, config_list, name_list = [], [], []
for run in runs:
    # .summary contains output keys/values for
    # metrics such as accuracy.
    # We call ._json_dict to omit large files
    summary_list.append(run.summary._json_dict)

    # .config contains the hyperparameters.
    # We remove special values that start with _.
    config_list.append({k: v for k, v in run.config.items() if not k.startswith("_")})

    # .name is the human-readable name of the run.
    name_list.append(run.name)

runs_df = pd.DataFrame(
    {"summary": summary_list, "config": config_list, "name": name_list}
)

runs_df.to_csv("project.csv")
```

The W\&B API also provides a way for you to query across runs in a project with `api.runs()`. The most common use case is exporting runs data for custom analysis. The query interface is the same as the one [MongoDB uses](https://www.mongodb.com/docs/manual/reference/mql/query-predicates/).

```python theme={null}
runs = api.runs(
    "username/project",
    {"$or": [{"config.experiment_name": "foo"}, {"config.experiment_name": "bar"}]},
)
print(f"Found {len(runs)} runs")
```

Calling `api.runs` returns a `Runs` object that is iterable and acts like a list. By default the object loads 50 runs at a time in sequence as required, but you can change the number loaded per page with the `per_page` keyword argument.

`api.runs` also accepts an `order` keyword argument. The default order is `-created_at`. To order results ascending, specify `+created_at`. You can also sort by config or summary values. For example, `summary.val_acc` or `config.experiment_name`.

### Error handling

If errors occur while talking to W\&B servers, a `wandb.CommError` is raised. The original exception can be introspected through the `exc` attribute.

### Get the latest git commit through the API

In the UI, click on a run and then click the Overview tab on the run page to see the latest git commit. It's also in the file `wandb-metadata.json`. Using the public API, you can get the git hash with `run.commit`.

### Get a run's name and ID during a run

After calling `wandb.init()` you can access the random run ID or the human-readable run name from your script like this:

* Unique run ID (8 character hash): `run.id`
* Random run name (human readable): `run.name`

If you're thinking about ways to set useful identifiers for your runs, here's what we recommend:

* **Run ID**: leave it as the generated hash.
This needs to be unique across runs in your project.
* **Run name**: This should be something short, readable, and preferably unique so that you can tell the difference between different lines on your charts.
* **Run notes**: This is a great place to put a quick description of what you're doing in your run. You can set this with `wandb.init(notes="your notes here")`
* **Run tags**: Track things dynamically in run tags, and use filters in the UI to filter your table down to just the runs you care about. You can set tags from your script and then edit them in the UI, both in the runs table and the overview tab of the run page. See the [detailed instructions](/models/runs/tags/).

## Public API Examples

### Export data to visualize in matplotlib or seaborn

Check out our [API examples](/models/ref/python/public-api/) for some common export patterns. You can also click the download button on a custom plot or on the expanded runs table to download a CSV from your browser.

### Read metrics from a run

This example outputs timestamp and accuracy saved with `run.log({"accuracy": acc})` for a run saved to `"<entity>/<project>/<run_id>"`.

```python theme={null}
import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
if run.state == "finished":
    for i, row in run.history().iterrows():
        print(row["_timestamp"], row["accuracy"])
```

### Filter runs

You can filter by using the MongoDB Query Language.

#### Date

```python theme={null}
runs = api.runs(
    "<entity>/<project>",
    {"$and": [{"created_at": {"$lt": "YYYY-MM-DDT##", "$gt": "YYYY-MM-DDT##"}}]},
)
```

### Read specific metrics from a run

To pull specific metrics from a run, use the `keys` argument. The default number of samples when using `run.history()` is 500. Logged steps that do not include a specific metric will appear in the output dataframe as `NaN`. The `keys` argument will cause the API to sample steps that include the listed metric keys more frequently.
```python theme={null}
import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
if run.state == "finished":
    for i, row in run.history(keys=["accuracy"]).iterrows():
        print(row["_timestamp"], row["accuracy"])
```

### Compare two runs

This will output the config parameters that are different between `run1` and `run2`.

```python theme={null}
import pandas as pd
import wandb

api = wandb.Api()

# replace with your <entity>, <project>, and <run_id>
run1 = api.run("<entity>/<project>/<run_id>")
run2 = api.run("<entity>/<project>/<run_id>")


df = pd.DataFrame([run1.config, run2.config]).transpose()

df.columns = [run1.name, run2.name]
print(df[df[run1.name] != df[run2.name]])
```

Outputs:

```
              c_10_sgd_0.025_0.01_long_switch base_adam_4_conv_2fc
batch_size                                 32                   16
n_conv_layers                               5                    4
optimizer                             rmsprop                 adam
```

### Update metrics for a run, after the run has finished

This example sets the accuracy of a previous run to `0.9`. It also modifies the accuracy histogram of a previous run to be the histogram of `numpy_array`.

```python theme={null}
import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
run.summary["accuracy"] = 0.9
# numpy_array is a NumPy array that you define beforehand.
run.summary["accuracy_histogram"] = wandb.Histogram(numpy_array)
run.summary.update()
```

### Rename a metric in a completed run

This example renames a summary column in your tables.

```python theme={null}
import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
run.summary["new_name"] = run.summary["old_name"]
del run.summary["old_name"]
run.summary.update()
```

Renaming a column only applies to tables. Charts will still refer to metrics by their original names.

### Update config for an existing run

This example updates one of your configuration settings.

```python theme={null}
import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
# updated_value is the new value that you define beforehand.
run.config["key"] = updated_value
run.update()
```

### Export system resource consumptions to a CSV file

The following snippet fetches a run's system resource consumption metrics and saves them to a CSV file.
```python theme={null}
import wandb

api = wandb.Api()
run = api.run("<entity>/<project>/<run_id>")

system_metrics = run.history(stream="events")
system_metrics.to_csv("sys_metrics.csv")
```

### Get unsampled metric data

When you pull data from history, by default it's sampled to 500 points. Get all the logged data points using `run.scan_history()`. Here's an example downloading all the `loss` data points logged in history.

```python theme={null}
import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
history = run.scan_history()
losses = [row["loss"] for row in history]
```

### Get paginated data from history

If metrics are being fetched slowly on our backend or API requests are timing out, you can try lowering the page size in `scan_history` so that individual requests don't time out. The default page size is 500, so you can experiment with different sizes to see what works best:

```python theme={null}
import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
# cols lists the metric names to fetch; the names here are placeholders.
cols = ["loss", "accuracy"]
history = run.scan_history(keys=sorted(cols), page_size=100)
```

### Export metrics from all runs in a project to a CSV file

This script pulls down the runs in a project and produces a dataframe and a CSV of runs including their names, configs, and summary stats. Replace `<entity>` and `<project>` with your W\&B entity and the name of your project, respectively.

```python theme={null}
import pandas as pd
import wandb

api = wandb.Api()
entity, project = "<entity>", "<project>"
runs = api.runs(entity + "/" + project)

summary_list, config_list, name_list = [], [], []
for run in runs:
    # .summary contains the output keys/values
    # for metrics such as accuracy.
    # We call ._json_dict to omit large files
    summary_list.append(run.summary._json_dict)

    # .config contains the hyperparameters.
    # We remove special values that start with _.
    config_list.append({k: v for k, v in run.config.items() if not k.startswith("_")})

    # .name is the human-readable name of the run.
    name_list.append(run.name)

runs_df = pd.DataFrame(
    {"summary": summary_list, "config": config_list, "name": name_list}
)

runs_df.to_csv("project.csv")
```

### Get the starting time for a run

This code snippet retrieves the time at which the run was created.

```python theme={null}
import wandb

api = wandb.Api()

run = api.run("entity/project/run_id")
start_time = run.created_at
```

### Upload files to a finished run

The code snippet below uploads a selected file to a finished run.

```python theme={null}
import wandb

api = wandb.Api()

run = api.run("entity/project/run_id")
run.upload_file("file_name.extension")
```

### Download a file from a run

This finds the file `model-best.h5` associated with a run and saves it locally. For example, the run path might point to run ID `uxte44z7` in the `cifar` project.

```python theme={null}
import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
run.file("model-best.h5").download()
```

### Download all files from a run

This finds all files associated with a run and saves them locally.

```python theme={null}
import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
for file in run.files():
    file.download()
```

### Get runs from a specific sweep

This snippet downloads all the runs associated with a particular sweep.

```python theme={null}
import wandb

api = wandb.Api()

sweep = api.sweep("<entity>/<project>/<sweep_id>")
sweep_runs = sweep.runs
```

### Get the best run from a sweep

The following snippet gets the best run from a given sweep.

```python theme={null}
import wandb

api = wandb.Api()

sweep = api.sweep("<entity>/<project>/<sweep_id>")
best_run = sweep.best_run()
```

The `best_run` is the run with the best metric as defined by the `metric` parameter in the sweep config.

### Download the best model file from a sweep

This snippet downloads the model file with the highest validation accuracy from a sweep with runs that saved model files to `model.h5`.
```python theme={null}
import wandb

api = wandb.Api()

sweep = api.sweep("<entity>/<project>/<sweep_id>")
# Sort runs by validation accuracy, highest first.
runs = sorted(sweep.runs, key=lambda run: run.summary.get("val_acc", 0), reverse=True)
val_acc = runs[0].summary.get("val_acc", 0)
print(f"Best run {runs[0].name} with {val_acc}% val accuracy")

runs[0].file("model.h5").download(replace=True)
print("Best model saved to model.h5")
```

### Delete all files with a given extension from a run

This snippet deletes files with a given extension from a run.

```python theme={null}
import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")

extension = ".png"
files = run.files()
for file in files:
    if file.name.endswith(extension):
        file.delete()
```

### Download system metrics data

This snippet produces a dataframe with all the system resource consumption metrics for a run and then saves it to a CSV.

```python theme={null}
import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")
system_metrics = run.history(stream="events")
system_metrics.to_csv("sys_metrics.csv")
```

### Update summary metrics

You can pass a dictionary to update summary metrics.

```python theme={null}
run.summary.update({"key": val})
```

### Get the command that ran the run

Each run captures the command that launched it on the run overview page. To pull this command down from the API, you can run:

```python theme={null}
import json

import wandb

api = wandb.Api()

run = api.run("<entity>/<project>/<run_id>")

# download() returns an open file handle that json.load can read.
meta = json.load(run.file("wandb-metadata.json").download())
program = ["python"] + [meta["program"]] + meta["args"]
```

# Reproduce experiments

Source: https://docs.wandb.ai/models/track/reproduce_experiments

Reproduce a teammate's W&B experiment by downloading the associated code, dependencies, and configuration from a run.

Reproduce an experiment that a team member creates to verify and validate their results.

Before you reproduce an experiment, you need to make note of the:

* Name of the project the run was logged to
* Name of the run you want to reproduce

To reproduce an experiment:

1. Navigate to the project where the run is logged.
2. Select the **Workspace** tab from the project sidebar.
3. Click the run that you want to reproduce. The run page opens with the **Overview** tab shown by default.

To continue, download the experiment's code at a given hash or clone the experiment's entire repository.

Download the experiment's Python script or notebook:

1. On the **Overview** tab (shown by default), in the **Command** field, make a note of the name of the script that created the experiment.
2. Select the **Code** tab on the run page.
3. Click **Download** next to the file that corresponds to the script or notebook.

Clone the GitHub repository your teammate used when creating the experiment. To do this:

1. If necessary, gain access to the GitHub repository that your teammate used to create the experiment.
2. Copy the **Git repository** field, which contains the GitHub repository URL.
3. Clone the repository:

   ```bash theme={null}
   git clone https://github.com/your-repo.git && cd your-repo
   ```
4. Copy and paste the **Git state** field into your terminal. The Git state is a set of Git commands that checks out the exact commit that your teammate used to create the experiment. Replace values specified in the following code snippet with your own:

   ```bash theme={null}
   git checkout -b "" 0123456789012345678901234567890123456789
   ```

5. On the run page, select the **Files** tab.
6. Download the `requirements.txt` file and store it in your working directory. This directory should contain either the cloned GitHub repository or the downloaded Python script or notebook.
7. (Recommended) Create a Python virtual environment.
8. Install the requirements specified in the `requirements.txt` file.

   ```bash theme={null}
   pip install -r requirements.txt
   ```
9. Now that you have the code and dependencies, you can run the script or notebook to reproduce the experiment.
If you cloned a repository, you might need to navigate to the directory where the script or notebook is located. Otherwise, you can run the script or notebook from your working directory.

If you downloaded a Python notebook, navigate to the directory where you downloaded the notebook and run the following command in your terminal:

```bash theme={null}
jupyter notebook
```

If you downloaded a Python script, navigate to the directory where you downloaded the script and run the following command in your terminal. Replace values enclosed in `<>` with your own:

```bash theme={null}
python <your-script-name>.py
```

# View experiments results

Source: https://docs.wandb.ai/models/track/workspaces

A playground for exploring run data with interactive visualizations

A W\&B workspace is your personal sandbox to customize charts and explore model results. A W\&B workspace consists of *Tables* and *Panel sections*:

* **Tables**: All runs logged to your project are listed in the project's table. Turn on and off runs, change colors, and expand the table to see notes, config, and summary metrics for each run.
* **Panel sections**: A section that contains one or more [panels](/models/app/features/panels/). Create new panels, organize them, and export to reports to save snapshots of your workspace.

## Workspace types

There are two main workspace categories: **Personal workspaces** and **Saved views**.

* **Personal workspaces:** A customizable workspace for in-depth analysis of models and data visualizations. Only the owner of the workspace can edit and save changes. Teammates can view a personal workspace, but they cannot make changes to someone else's personal workspace.
* **Saved views:** Saved views are collaborative snapshots of a workspace. Anyone on your team can view, edit, and save changes to saved workspace views. Use saved workspace views for reviewing and discussing experiments, runs, and more.
The following image shows multiple personal workspaces created by Cécile-parker's teammates. In this project, there are no saved views.

## Saved workspace views

Improve team collaboration with tailored workspace views. Create saved views to organize your preferred setup of charts and data.

### Create a new saved workspace view

1. Navigate to a personal workspace or a saved view.
2. Make edits to the workspace.
3. Click the **action ()** menu at the top right corner of your workspace. Click **Save as a new view**.

New saved views appear in the workspace navigation menu.

### Update a saved workspace view

Saved changes overwrite the previous state of the saved view. Unsaved changes are not retained. To update a saved workspace view in W\&B:

1. Navigate to a saved view.
2. Make the desired changes to your charts and data within the workspace.
3. Click the **Save** button to confirm your changes.

A confirmation dialog appears when you save your updates to a workspace view. If you prefer not to see this prompt in the future, select the option **Do not show this modal next time** before confirming your save.

### Delete a saved workspace view

Remove saved views that are no longer needed.

1. Navigate to the saved view you want to remove.
2. Select the **menu ()** button at the top right of the view.
3. Choose **Delete view**.
4. Confirm the deletion to remove the view from your workspace menu.

### Share a workspace view

Share your customized workspace with your team by sharing the workspace URL directly. All users with access to the workspace project can see the saved views of that workspace.

## Workspace templates

This feature requires an [Enterprise](https://wandb.ai/site/pricing/) license.

Use *workspace templates* to create workspaces using the same settings as an existing workspace instead of the [default settings for new workspaces](#default-workspace-settings).
### Default workspace settings W\&B uses the following default settings for workspaces: #### Hide empty sections during search By default, W\&B does not show empty sections in the workspace. You can change this setting to show empty sections in the workspace. #### Sort panels alphabetically By default, W\&B does not sort panels by plot title alphabetically. You can change this setting to sort panels alphabetically by their plot title. #### Section organization By default, W\&B organizes panels into sections based on the first prefix of the metric name. For example, if your workspace includes the metrics `a/b/c/d` and `a/e/f`, W\&B organizes these metrics into a section called `a`. If your workspace includes the metrics `a/b/c/d` and `e/f/g`, W\&B organizes these metrics into sections called `a` and `e`. You can change the default section organization to group by the last prefix. For example, if your workspace includes the metrics `a/b/c/d` and `a/e/f`, W\&B organizes these metrics into sections called `d` and `f`. If your workspace includes the metrics `a/b/c/d` and `e/f/g`, W\&B organizes these metrics into sections called `d` and `g`. 
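The default first-prefix grouping described above can be sketched in a few lines of Python. This is an illustration of the rule only, not W\&B's implementation: the function name `group_by_first_prefix` and the `"(no prefix)"` bucket for metric names without a `/` are assumptions made for this example.

```python
from collections import defaultdict


def group_by_first_prefix(metric_names, unprefixed_section="(no prefix)"):
    """Group metric names into sections keyed by their first '/'-separated prefix.

    `unprefixed_section` is a hypothetical bucket for names that contain no '/'.
    """
    sections = defaultdict(list)
    for name in metric_names:
        # partition() splits on the first '/' only; sep is "" when no '/' exists.
        prefix, sep, _ = name.partition("/")
        section = prefix if sep else unprefixed_section
        sections[section].append(name)
    return dict(sections)


print(group_by_first_prefix(["a/b/c/d", "a/e/f"]))
# {'a': ['a/b/c/d', 'a/e/f']}
print(group_by_first_prefix(["a/b/c/d", "e/f/g"]))
# {'a': ['a/b/c/d'], 'e': ['e/f/g']}
```

Both calls reproduce the two first-prefix examples from the text: one section `a` in the first case, and sections `a` and `e` in the second.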
#### Line plot settings By default, new workspaces use these default settings for line plots:

| X axis    | Default |
| --------- | ------- |
| Value     | Step    |
| Log scale | false   |

| Y axis    | Default |
| --------- | ------- |
| Log scale | false   |

| Smoothing                     | Default         |
| ----------------------------- | --------------- |
| Type                          | Time weight EMA |
| Weight                        | 0               |
| Show original after smoothing | Off             |

| Max number of runs | Default |
| ------------------ | ------- |
| Max runs           | 10      |

| Data              | Default       |
| ----------------- | ------------- |
| Point aggregation | Full fidelity |

| Grouping               | Default |
| ---------------------- | ------- |
| Use grouping in charts | On      |
| Group aggregation      | Mean    |

| Display preferences       | Default |
| ------------------------- | ------- |
| Color run names           | On      |
| Display full run name     | Off     |
| Show X range in tooltip   | Off     |
| Tooltip runs              | Default |
| Sync zoom across charts   | Off     |
| Show highlighted run only | Off     |

### Configure your workspace template 1. Open any workspace or create a new one. 2. Click on the **Settings** button at the top right corner of the workspace. 3. Select **Workspace layout** from the panel. 4. Configure the workspace's settings according to your preferences. ### Save your workspace template 1. At the top of the workspace, click the **action ()** menu near the **Undo** and **Redo** arrow icons. 2. Click **Save personal workspace template**. 3. Review the settings for the template, then click **Save**. New workspaces will use these settings instead of the defaults. ### View your workspace template To view your workspace template's current configuration: 1. From any page, select your user icon on the top right corner. From the dropdown, choose **User Settings**. 2. Navigate to the **Personal workspace template** section. If you are using a workspace template, its configuration displays. Otherwise, the section includes no details. ### Update your workspace template To update your workspace template: 1.
Open any workspace. 2. Modify the workspace's settings. For example, set the number of runs to include to `11`. 3. To save the changes to the template, click the **action ()** menu near the **Undo** and **Redo** arrow icons, then click **Update personal workspace template**. 4. Verify the settings, then click **Update**. The template is updated and reapplied to all workspaces that use it. ### Delete your workspace template To delete your workspace template and go back to the default settings: 1. From any page, select your user icon on the top right corner. From the dropdown, choose **User Settings**. 2. Navigate to the **Personal workspace template** section. Your workspace template's configuration displays. 3. Click the trash icon next to **Settings**. For Dedicated Cloud and Self-Managed, deleting your workspace template is supported on v0.70 and above. On older Server versions, update your workspace template to use the [default settings](#default-workspace-settings) instead. ## Programmatically create workspaces [`wandb-workspaces`](https://github.com/wandb/wandb-workspaces/tree/main) is a Python library for programmatically working with [W\&B](https://wandb.ai/) workspaces and reports. For an end-to-end example, see the [Programmatic Workspaces](https://colab.research.google.com/github/wandb/wandb-workspaces/blob/Update-wandb-workspaces-tuturial/Workspace_tutorial.ipynb) notebook. With `wandb-workspaces`, you can: * Set panel layouts, colors, and section orders. * Configure workspace settings like default x-axis, section order, and collapse states. * Add and customize panels within sections to organize workspace views.
* Load and modify existing workspaces using a URL. * Save changes to existing workspaces or save as new views. * Filter, group, and sort runs programmatically using simple expressions. * Customize run appearance with settings like colors and visibility. * Copy views from one workspace to another for integration and reuse.

### Install Workspace API

In addition to `wandb`, ensure that you install `wandb-workspaces`:

```bash theme={null}
pip install wandb wandb-workspaces
```

### Define and save a workspace view programmatically

```python theme={null}
# The Workspace class lives in the `workspaces` module, not `reports.v2`.
import wandb_workspaces.workspaces as ws

# `views=[...]` is a placeholder for your own view definitions.
workspace = ws.Workspace(entity="your-entity", project="your-project", views=[...])
workspace.save()
```

### Edit an existing view

```python theme={null}
import wandb_workspaces.workspaces as ws

# Load the workspace from its URL, replace the first view, then save.
existing_workspace = ws.Workspace.from_url("workspace-url")
existing_workspace.views[0] = ws.View(name="my-new-view", sections=[...])
existing_workspace.save()
```

### Copy a workspace `saved view` to another workspace

```python theme={null}
import wandb_workspaces.workspaces as ws

# Reuse the first view of an existing workspace in a new workspace.
old_workspace = ws.Workspace.from_url("old-workspace-url")
old_workspace_view = old_workspace.views[0]
new_workspace = ws.Workspace(entity="new-entity", project="new-project", views=[old_workspace_view])
new_workspace.save()
```

See [`wandb-workspace examples`](https://github.com/wandb/wandb-workspaces/tree/main/examples/workspaces) for comprehensive workspace API examples.

# Settings Source: https://docs.wandb.ai/platform/app/settings-page Use the W&B Settings page to customize your individual user profile or team settings. Use the W\&B Settings page to customize your individual user profile and manage settings for the teams you belong to. The following sections summarize what you can configure at each level, with links to the detailed pages.
The user settings page supports the following actions: * [Edit your profile](/platform/app/settings-page/user-settings#profile), including your profile picture, display name, geographic location, biography information, and email addresses associated with your account. * [Configure API keys](/platform/app/settings-page/user-settings#api-keys). * [Configure alerts](/platform/app/settings-page/user-settings#alerts) for runs. * [Link your GitHub repository](/platform/app/settings-page/user-settings#personal-github-integration). * [Delete your account](/platform/app/settings-page/user-settings#delete-your-account). For more information, see [User settings](/platform/app/settings-page/user-settings/). If you belong to one or more teams, the team settings page supports the following actions: * [Invite or remove members](/platform/app/settings-page/teams#members) from a team. * [Manage alerts](/platform/app/settings-page/teams#alerts) for team runs. * [Change privacy settings](/platform/app/settings-page/teams#privacy). * [View and manage team storage](/platform/app/settings-page/teams#usage) usage. For more information about team settings, see [Team settings](/platform/app/settings-page/teams/). # Manage billing settings Source: https://docs.wandb.ai/platform/app/settings-page/billing-settings View plan details, monitor usage, and configure usage and spending alerts for your W&B organization's billing. The **Billing** settings page lets organization and billing admins review the current plan, monitor usage, configure usage and spending alerts, manage payment methods, and access invoices. Use this page to keep your organization within its usage limits and to stay informed about upcoming charges. To open billing settings: 1. In the top right corner, click your user icon. 2. From the dropdown, choose **Billing**, or choose **Settings** and then select the **Billing** tab. 
## Plan details The **Plan details** section summarizes your organization's current plan, charges, limits, and usage. From here, you can compare plans or talk to Sales. From the **Plan details** section, you can take the following actions: * For details and a list of users, click **Manage users**. * For details about usage, click **View usage**. * To view the amount of storage your organization uses, both free and paid, see the storage summary. From here, you can purchase additional storage and manage storage that is in use. Learn more about [storage settings](/platform/app/settings-page/storage/). ## Plan usage The **Plan usage** section visually summarizes current usage and displays upcoming usage charges. For detailed insights into usage by month, click **View usage** on an individual tile. To export usage by calendar month, team, or project, click **Export CSV**. For questions about usage or billing, contact your AISE or [Support](mailto:support@wandb.com). ### Usage alerts Usage alerts notify organization and billing admins by email when your organization reaches 85% and 100% of its usage limit for the current billing period in a given category. Configure usage alerts to monitor consumption and avoid unexpected overages. The following categories are tracked:

| Category      | Description |
| ------------- | ----------- |
| Storage       | Total accumulated volume of your organization's stored artifacts and run data, measured in GB. |
| Tracked hours | Wall-clock training time during the billing period, measured in hours. |
| Weave         | Total volume of data imported during the billing period, measured in MB. Shown only if Weave is active in the deployment. |
| Inference     | Total input and output tokens used by Inference. See [Inference Pricing](https://wandb.ai/site/pricing/inference/). Shown only if Inference is active in the deployment. |
| Training      | Token-based inference and compute, measured in GPU/hour. See [Serverless RL Pricing](https://wandb.ai/site/pricing/reinforcement-learning/). Shown only if Serverless RL is active in the deployment. |

For organizations on the **Enterprise** plan, all **Organization admins** and **Billing admins** receive usage alerts. On the [Pro plan](https://wandb.ai/site/pricing/), only the **Billing admin** receives usage alerts. Messages are sent from `support@wandb.com`. The thresholds are not configurable, but you can selectively turn off one or both alerts per category. To configure usage alerts for your organization, you must be an **Organization admin** or **Billing admin**: 1. In the top right corner, click your user profile icon. 2. In the **Account** section, click **Usage & Alerts**. Alternatively, click **Settings**, then click the **Usage & Alerts** tab. 3. From the sidebar, select a category. 4. Click the **Usage** tab to view your organization's usage and alert settings for that category. 5. Turn on or off the **85% usage alert** or **100% usage alert** for that category. If you turn off both usage alerts, no alerts are sent for that category of usage. ### Spending alerts In a given billing period, spending alerts notify organization and billing admins by email when your organization reaches one or more customizable spending thresholds expressed in USD. Use spending alerts to track costs against a budget you define. You can configure up to five unique spending alerts per category:

| Category      | Description |
| ------------- | ----------- |
| Storage       | Charges for your organization's storage [usage](#usage-alerts) during the billing period. See [Pricing](https://wandb.ai/site/pricing/). |
| Tracked hours | Charges for your organization's training [usage](#usage-alerts) during the billing period. See [Pricing](https://wandb.ai/site/pricing/). |
| Weave         | Charges for your organization's Weave [usage](#usage-alerts) during the billing period. See [Pricing](https://wandb.ai/site/pricing/). Shown only if Weave is active in the deployment. |
| Inference     | Charges for your organization's Inference [usage](#usage-alerts) during the billing period. See [Inference Pricing](https://wandb.ai/site/pricing/inference/). Shown only if Inference is active in the deployment. |
| Training      | Charges for your organization's Serverless RL [usage](#usage-alerts) during the billing period. See [Serverless RL Pricing](https://wandb.ai/site/pricing/reinforcement-learning/). Shown only if Serverless RL is active in the deployment. |

Messages are sent from `support@wandb.com`. For organizations on the **Enterprise** plan: * All **Organization admins** and **Billing admins** receive spending alerts. * No spending alerts are configured by default. On the Pro plan: * Only the **Billing admin** receives spending alerts. * By default, spending alerts are configured at $200, $450, $700, and $1,000. You aren't billed for tracked hours, which are unlimited for all plans. Tracked hours are displayed on the **Usage** page for monitoring purposes. #### View spending alerts To view your organization's spending alerts: 1. In the top right corner, click your user profile icon. 2. In the **Account** section, click **Usage & Alerts**. Alternatively, click **Settings**, then click the **Usage & Alerts** tab. 3. From the sidebar, select a category. 4. Initially, the **Usage** tab displays. To view spending alerts, click **Spend alerts**. All configured spending alerts for the category display. #### Add or delete a spending alert To manage spending alerts: 1. In the top right corner, click your user profile icon. 2.
In the **Account** section, click **Usage & Alerts**. Alternatively, click **Settings**, then click the **Usage & Alerts** tab. 3. From the sidebar, select a category, such as **Storage**. 4. To add an alert, click **Add alert**. a. Enter a threshold amount in USD. a. Click **Create alert**. 5. To delete a spending alert, click the trash icon next to it. ## Payment methods The **Payment methods** section shows the payment methods on file for your organization. If you haven't added a payment method, you're prompted to do so when you upgrade your plan or add paid storage. ## Billing admin The **Billing admin** section shows the current billing admin. The billing admin is an organization admin, receives all billing-related emails, and can view and manage payment methods. In W\&B Dedicated Cloud, multiple users can be billing admins. In W\&B Multi-tenant Cloud, only one user at a time can be the billing admin. To change the billing admin or assign the role to additional users: 1. Click **Manage roles**. 2. Search for a user. 3. Click the **Billing admin** field in that user's row. 4. Read the summary, then click **Change billing user**. ## Invoices If you pay using a credit card, the **Invoices** section lets you view monthly invoices. The following exceptions apply: * For Enterprise accounts that pay through wire transfer, this section is blank. For questions, contact your account team. * If your organization incurs no charges, no invoice is generated. # Manage email settings Source: https://docs.wandb.ai/platform/app/settings-page/emails Add, delete, and manage email addresses and login methods in your W&B profile settings page. In your W\&B **Settings** page, the Emails dashboard lets you add, delete, and manage email types and primary email addresses. To open the Emails dashboard, follow these steps: 1. In the W\&B dashboard, select your profile icon in the upper-right corner. 2. From the dropdown, select **Settings**. 3. Scroll down to the Emails dashboard. 
Email management dashboard ## Manage primary email A smiling-face-with-sunglasses icon marks your primary email. Your primary email defaults to the address you provided when you created your W\&B account. To change the primary email associated with your W\&B account, select the **action ()** menu. You can only set verified emails as primary. Primary email dropdown ## Add emails To add an email, select **+ Add Email**. The Auth0 page opens, where you can enter credentials for the new email or connect using single sign-on (SSO). ## Delete emails To delete an email registered to your W\&B account, select the **action ()** menu and choose **Delete Emails**. You can't delete a primary email. Set a different email as primary before deleting. ## Login methods The **Log in Methods** column shows the login methods associated with your account. When you create a W\&B account, W\&B sends a verification email to your address. Your email account stays unverified until you verify the address, and unverified emails appear in red. If you no longer have the original verification email, attempt to log in with your email address again to receive a second one. For account login issues, contact [support@wandb.com](mailto:support@wandb.com). # Manage storage Source: https://docs.wandb.ai/platform/app/settings-page/storage Manage W&B data storage consumption using reference artifacts, external buckets, and TTL deletion policies. This page describes how to manage your W\&B data storage so you can stay within your storage limit. If you're approaching or exceeding your storage limit, you have multiple ways to manage your data. The best option depends on your account type and your current project setup. 
## Manage storage consumption To reduce how much storage your account uses, W\&B offers different methods of optimizing your storage consumption: * Use [reference artifacts](/models/artifacts/track-external-files/) to track files saved outside the W\&B system instead of uploading them to W\&B storage. * Use an [external cloud storage bucket](/platform/app/settings-page/teams/) for storage (Enterprise only). ## Delete data If optimizing consumption isn't sufficient, you can also choose to delete data to remain under your storage limit. You have several ways to do this: * Delete data interactively with the app UI. * [Set a TTL policy](/models/artifacts/ttl/) on artifacts so W\&B deletes them automatically. # Team settings Source: https://docs.wandb.ai/platform/app/settings-page/teams Collaborate with your colleagues, share results, and track all the experiments across your team. Manage team settings including members, alerts, and privacy. Use W\&B Teams as a central workspace for your ML team. This page explains how organization admins and team admins can create a team, configure its members and privacy, and assign roles so that collaborators can share experiments and reports while keeping sensitive work appropriately scoped. With W\&B Teams, you can do the following: * **Track all the experiments** your team has tried so you never duplicate work. * **Save and reproduce** previously trained models. * **Share progress** and results with your boss and collaborators. * **Catch regressions** and immediately get alerted when performance drops. * **Benchmark model performance** and compare model versions. Teams workspace overview Only Administration account types can change team settings or remove a member from a team. ## Create a collaborative team Follow these steps to create a new team and invite your first collaborators. After the team exists, you can refine membership, alerts, and privacy in [Team configuration](#team-configuration). 1. 
[Sign up or log in](https://app.wandb.ai/login?signup=true) to your free W\&B account. 2. Click **Invite Team** in the navigation bar. 3. Create your team and invite collaborators. 4. Configure your team using the settings described in the following sections. Only the admin of an organization can create a new team. ## Team configuration You can configure settings for your team, including its membership, alerts, and privacy settings. The following sections describe each settings category. To manage your team's settings, navigate to the **Teams** section in the left menu and click the team you want to change settings for. ### Members The Members section shows a list of all pending invitations and the members who have accepted the invitation to join the team. The list displays each member's name, username, email, team role, and access privileges to Models and W\&B Weave, which are inherited from the organization. You can choose from the standard team roles **Admin**, **Member**, and **View-Only**. If your organization has created [custom roles](/platform/hosting/iam/access-management/manage-organization#add-and-manage-custom-roles), you can assign a custom role instead. See [Add and Manage teams](/platform/hosting/iam/access-management/manage-organization#add-and-manage-teams) for information on how to create a team, manage teams, and manage team membership and roles. To configure who can invite new members and configure other privacy settings for the team, refer to [Privacy](#privacy). To remove a team member, admins can open the team settings page and click the delete button next to the departing member's name. Any runs logged to the team remain after a user leaves. ### Avatar A team avatar helps members and visitors recognize the team across the W\&B app. To set an avatar for your team: 1. Navigate to the **Teams** section in the left menu and click the team you want to add an avatar for. This opens the team's overview page. 2. 
Hover over the team's default avatar image in the upper-left corner of the page and click the **Upload photo** button. This opens a file prompt. 3. From the file prompt, select the image you want to use and then click **Open**. This uploads the photo to your team and sets it as your team's avatar. ### Alerts Alerts keep your team informed about run status without requiring anyone to watch the dashboard. You can set up alerts to notify your team when runs crash and finish. W\&B sends alerts through email or Slack, and you can customize them to meet your needs. In the **Team alerts** section, toggle the switch next to the event type you want to receive alerts from. W\&B provides the following event type options: * **Runs finished**: A W\&B run successfully finishes. * **Run crashed**: A run fails to finish. * **wandb.alert()**: A custom scriptable alert. See [Send alerts with `wandb.Run.alert()`](/models/runs/alert) for more information. ### Slack notifications Configure Slack destinations where your team's [automations](/models/automations/) can send notifications when an event occurs in a registry or a project, such as when a new artifact is created or when a run metric meets a defined threshold. Refer to [Create a Slack automation](/models/automations/create-automations/slack). This feature is available for all [Enterprise](https://wandb.ai/site/pricing/) licenses. ### Webhooks Configure webhooks that your team's [automations](/models/automations/) can run when an event occurs in a registry or a project, such as when a new artifact is created or when a run metric meets a defined threshold. Refer to [Create a webhook automation](/models/automations/create-automations/webhook). This feature is available for all [Enterprise](https://wandb.ai/site/pricing/) licenses. ### Privacy Team privacy settings control visibility, invitations, report sharing, and defaults such as code saving for runs in the team. 
Team admins can change these options on the team **Settings** page unless an organization admin has [enforced the same category of policy](/platform/hosting/privacy-settings#enforce-privacy-settings-for-all-teams) for all teams. Organization admins can still open the team **Settings** page to review configuration. You can change the following privacy settings: * Hide this team from all non-members. * Make all future team projects private (public sharing not allowed). * Allow any team member to invite other team members (not only admins). * Disable public sharing to outside of team for reports in private projects. This disables existing magic links. * Automatically recommend new users with matching email domains join this team upon signup. * Enable code saving by default. For step-by-step navigation and how organization enforcement interacts with team toggles, see [Configure privacy settings](/platform/hosting/privacy-settings). To change these settings, open the team dashboard at `https://wandb.ai/[TEAM-NAME]`, select **Team settings** in the left navigation, then open the **Privacy** section. Replace `[TEAM-NAME]` with your team name. ### Usage The **Usage** section shows the total storage the team has consumed on the W\&B servers. The default storage plan is 100 GB. For more information about storage and pricing, see the [Pricing](https://wandb.ai/site/pricing) page. ### Storage The **Storage** section describes the cloud storage bucket configuration that the team's data uses. For more information, see [Secure Storage Connector](#secure-storage-connector) or check out our [W\&B Server](/platform/hosting/data-security/secure-storage-connector) docs if you are self-hosting. ## Create a team profile A team profile is a public-facing page that helps your team attract collaborators and highlight its work. You can customize your team's profile page to show an introduction and showcase reports and projects that are visible to the public or team members.
Present reports, projects, and external links. Use a team profile to do the following: * **Highlight your research** to visitors by showcasing your best public reports. * **Showcase the most active projects** to make it easier for teammates to find them. * **Find collaborators** by adding external links to your company or research lab's website and any papers you've published. ## Team roles and permissions Team roles determine what each member can see and do within the team, from viewing reports to managing other members. Select a team role when you invite colleagues to join a team. The following team role options are available: * **Admin**: Team admins can add and remove other admins or team members. They have permissions to modify all projects and full deletion permissions. This includes, but is not limited to, deleting runs, projects, artifacts, and sweeps. * **Member**: A regular member of the team. By default, only an admin can invite a team member. To change this behavior, refer to [Privacy settings](#privacy). * **View-Only (Enterprise-only feature)**: View-Only members can view assets within the team such as runs, reports, and workspaces. They can follow and comment on reports, but they can't create, edit, or delete project overview, reports, or runs. * **Custom roles (Enterprise-only feature)**: Custom roles let organization admins compose new roles based on either of the **View-Only** or **Member** roles, together with additional permissions to achieve fine-grained access control. Team admins can then assign any of those custom roles to users in their respective teams. Refer to [Introducing Custom Roles for W\&B Teams](https://wandb.ai/wandb_fc/announcements/reports/Introducing-Custom-Roles-for-W-B-Teams--Vmlldzo2MTMxMjQ3) for details. A team member can delete only runs they created. Suppose you have two members A and B. Member B moves a run from team B's project to a different project owned by Member A. 
Member A cannot delete the run Member B moved to Member A's project. An admin can manage runs and sweep runs created by any team member. ### Service accounts In addition to user roles, teams can also use **service accounts** for automation. Service accounts aren't users, but rather non-human identities used for automated workflows. Refer to [Use service accounts to automate workflows](/platform/hosting/iam/service-accounts) for detailed information. W\&B recommends assigning more than one admin in a team to ensure that admin operations can continue when the primary admin isn't available. ### Team settings The following table summarizes which roles can manage team membership and team-wide settings. Team settings let you manage the settings for your team and its members. With these privileges, you can effectively oversee and organize your team within W\&B.

| Permissions          | View-Only | Team Member | Team Admin |
| -------------------- | --------- | ----------- | ---------- |
| Add team members     |           |             | X          |
| Remove team members  |           |             | X          |
| Manage team settings |           |             | X          |

### Reports Report permissions grant access to create, view, and edit reports. The following table lists permissions that apply to all reports across a given team.

| Permissions    | View-Only | Team Member                                        | Team Admin |
| -------------- | --------- | -------------------------------------------------- | ---------- |
| View reports   | X         | X                                                  | X          |
| Create reports |           | X                                                  | X          |
| Edit reports   |           | X (team members can only edit their own reports)   | X          |
| Delete reports |           | X (team members can only delete their own reports) | X          |

### Experiments The following table lists permissions that apply to all experiments across a given team.
| Permissions                                                                          | View-Only | Team Member                                               | Team Admin |
| ------------------------------------------------------------------------------------ | --------- | --------------------------------------------------------- | ---------- |
| View experiment metadata (includes history metrics, system metrics, files, and logs) | X         | X                                                         | X          |
| Edit experiment panels and workspaces                                                |           | X                                                         | X          |
| Log experiments                                                                      |           | X                                                         | X          |
| Delete experiments                                                                   |           | X (team members can only delete experiments they created) | X          |
| Stop experiments                                                                     |           | X (team members can only stop experiments they created)   | X          |

### Artifacts The following table lists permissions that apply to all artifacts across a given team.

| Permissions        | View-Only | Team Member | Team Admin |
| ------------------ | --------- | ----------- | ---------- |
| View artifacts     | X         | X           | X          |
| Download artifacts | X         | X           | X          |
| Create artifacts   |           | X           | X          |
| Delete artifacts   |           | X           | X          |
| Edit metadata      |           | X           | X          |
| Edit aliases       |           | X           | X          |
| Delete aliases     |           | X           | X          |

### System settings (W\&B Server only) This section applies only to self-hosted W\&B Server deployments. Use system permissions to create and manage teams and their members and to adjust system settings. These privileges enable you to effectively administer and maintain the W\&B instance.

| Permissions               | View-Only | Team Member | Team Admin | System Admin |
| ------------------------- | --------- | ----------- | ---------- | ------------ |
| Configure system settings |           |             |            | X            |
| Create or delete teams    |           |             |            | X            |

### Team service account behavior * When you configure a team in your training environment, you can use a service account from that team to log runs to either private or public projects within that team. Additionally, you can attribute those runs to a user if the `WANDB_USERNAME` or `WANDB_USER_EMAIL` variable exists in your environment and the referenced user is part of that team.
* When you do not configure a team in your training environment and use a service account, the runs log to the named project within that service account's parent team. In this case as well, you can attribute the runs to a user if `WANDB_USERNAME` or `WANDB_USER_EMAIL` variable exists in your environment and the referenced user is part of the service account's parent team. * A service account can't log runs to a private project in a team different from its parent team. A service account can log runs to a project only if the project visibility is set to `Open`. ## Team trials See the [pricing page](https://wandb.ai/site/pricing) for more information on W\&B plans. You can download all your data at any time, either using the dashboard UI or the [Export API](/models/ref/python/public-api). ## Advanced configuration The following advanced options are available to teams with additional data residency or compliance requirements. ### Secure storage connector The team-level secure storage connector lets teams use their own cloud storage bucket with W\&B. This provides greater data access control and data isolation for teams with highly sensitive data or strict compliance requirements. Refer to [Secure Storage Connector](/platform/hosting/data-security/secure-storage-connector) for more information. # Manage user settings Source: https://docs.wandb.ai/platform/app/settings-page/user-settings Manage your profile information, account defaults, alerts, participation in beta products, GitHub integration, storage usage, account activation, and create teams in your user settings. Your user settings let you manage your profile, default team, API keys, alerts, integrations, and other account-level options. This page describes each section of the user settings and how to use it. To open your user settings, navigate to your user profile page and select your user icon on the top right corner. From the dropdown, choose **Settings**. 
## Profile

In the **Profile** section, you can manage and modify your account name and institution. You can optionally add a biography, a location, and a link to your personal or institution's website, and upload a profile image.

### Edit your intro

Your intro appears at the top of your profile and is a good place to introduce yourself, link to projects, or share related accounts. To edit your intro, click **Edit** at the top of your profile. The WYSIWYG editor that opens supports Markdown.

1. To edit a line, click it. To save time, type `/` and choose Markdown from the list.
2. To move an item, use its drag handles.
3. To delete a block, click the drag handle, then click **Delete**.
4. To save your changes, click **Save**.

Your updated intro is now visible on your profile.

#### Add social badges

To add a follow badge for the `@weights_biases` account on X, add a Markdown-style link around an image (an HTML `<img>` tag or, as shown here, Markdown image syntax) that points to the badge image:

```markdown theme={null}
[![X: @weights_biases](https://img.shields.io/twitter/follow/weights_biases?style=social)](https://x.com/intent/follow?screen_name=weights_biases)
```

In an `<img>` tag, you can specify `width`, `height`, or both. If you specify only one, the image keeps its proportions.

## Default team

If you're a member of more than one team, the **Default team** section lets you configure the default team to use when a run or a Weave trace doesn't specify a team. If you're a member of only one team, that team is the default and this section doesn't appear.

The steps to set your default team differ depending on your deployment. Select the tab that matches your environment.

Next to **Default location to create new projects in**, click the dropdown, then select your default team.

1. Next to **Default location to create new projects in**, click the dropdown, then select your default team or your personal entity.
2.
**Optional:** If an admin has turned on public projects in **Account** > **Settings** > **Privacy**, configure the default visibility for your new projects. Click the button next to **Default project privacy in your personal account**, then select **Private** (the default) or **Public**. 3. **Optional:** If an admin has turned on [default saving and diffing code](/models/app/features/panels/code/) in **Account** > **Settings** > **Privacy**, click **Enable code saving in your personal account** to turn it on for your runs. To specify the default team when you're running a script in an automated environment, specify the default location with the `WANDB_ENTITY` [environment variable](https://docs.wandb.ai/models/track/environment-variables). ## Teams The **Teams** section lists all of your teams. * Click a team name to go to the team page. * If you have permission to join additional teams, click **View teams** next to **We found teams for you to join**. * **Optional:** Turn on **Hide teams in public profile**. To create or manage a team, see [Manage teams](/platform/app/settings-page/teams/). ## API keys The **API Keys** section lets you manage your personal API keys for authenticating with W\&B services. From this section, you can review existing keys, create new ones, and revoke keys you no longer need. ### View your API keys The API keys table displays: * **Key ID**: The first part of each API key, used for identification. * **Name**: A descriptive name you provided when you created the key. * **Created**: When the key was created. * **Last used**: The most recent usage timestamp. For security, the table shows only the key ID (the first part of the key). The full secret API key appears only once, when you create it. To filter the list of API keys, enter a partial key name or ID. ### Create a new API key To create an API key, select the **Personal API key** or **Service Account API key** tab for details. To create a personal API key owned by your user ID: 1. 
Log in to W\&B, click your user profile icon, then click **User Settings**. 2. Click **Create new API key**. 3. Provide a descriptive name for your API key. 4. Click **Create**. 5. Copy the displayed API key immediately and store it securely. To create an API key owned by a service account: 1. Navigate to the **Service Accounts** tab in your team or organization settings. 2. Find the service account in the list. 3. Click the **action ()** menu, then click **Create API key**. 4. Provide a name for the API key, then click **Create**. 5. Copy the displayed API key immediately and store it securely. 6. Click **Done**. You can create multiple API keys for a single service account to support different environments or workflows. The full API key is only shown once at creation time. After you close the dialog, you cannot view the full API key again. Only the key ID (first part of the key) is visible in your settings. If you lose the full API key, you must create a new API key. For secure storage options, see [Store API keys securely](/platform/app/settings-page/user-settings/#store-and-handle-api-keys-securely). ### Delete an API key Delete an API key when you no longer need it or when it may have been exposed. To revoke access by deleting an API key: 1. Find the key you want to delete in the API keys table. 2. Click the delete button next to the key. 3. Confirm the deletion. W\&B removes the key from the table, and you can no longer use it to authenticate. Deleting an API key immediately revokes access for any scripts or services that use that key. Before you delete the old key, make sure you've updated all systems to use a new key. ## Store and handle API keys securely API keys provide access to your W\&B account and should be protected like passwords. 
Follow these best practices:

### Recommended storage methods

* **Secrets manager**: Use a dedicated secrets management system such as [AWS Secrets Manager](https://aws.amazon.com/secrets-manager/), [HashiCorp Vault](https://developer.hashicorp.com/vault), [Azure Key Vault](https://azure.microsoft.com/en-us/products/key-vault), or [Google Secret Manager](https://cloud.google.com/security/products/secret-manager).
* **Password manager**: Use a reputable password manager application.
* **OS-level keychains**: Store keys in macOS Keychain, Windows Credential Manager, or Linux secret service. Not recommended for production use.

### What to avoid

* Never commit API keys to version control systems such as Git.
* Do not store API keys in plain text configuration files.
* Do not pass API keys on the command line, because they will be visible in the output of OS commands like `ps`.
* Avoid sharing API keys through email, chat, or other unencrypted channels.
* Do not hard-code API keys in your source code.

If an API key is exposed, delete the API key from your W\&B account immediately and contact [support](mailto:support@wandb.ai) or your AISE.

### Environment variables

When using API keys in your code, pass them through environment variables:

```bash theme={null}
export WANDB_API_KEY="your-api-key-here"
```

This approach keeps keys out of your source code and makes it easier to rotate them when needed.

Avoid setting the environment variable in line with the command, because it will be visible in the output of OS commands like `ps`:

```bash theme={null}
# Avoid this pattern, which can expose the API key in process managers
WANDB_API_KEY="your-api-key-here" ./my-script.sh
```

### SDK version compatibility

New API keys are longer than legacy keys. When authenticating with older versions of the `wandb` or `weave` SDKs, you may encounter an API key length error.
**Solution**: Update to a newer SDK version:

* `wandb` SDK v0.22.3+

  ```bash theme={null}
  pip install --upgrade "wandb>=0.22.3"
  ```
* `weave` SDK v0.52.17+

  ```bash theme={null}
  pip install --upgrade "weave>=0.52.17"
  ```

If you cannot upgrade the SDK immediately, set the API key using the `WANDB_API_KEY` environment variable as a workaround.

## Beta features

In the **Beta Features** section, you can optionally enable add-ons and previews of products in development. Select the toggle switch next to the beta feature you want to enable.

## Alerts

Get notified when your runs crash or finish, or set custom alerts with [`wandb.Run.alert()`](/models/runs/alert/). You receive notifications through email or Slack. Toggle the switch next to the event type you want to receive alerts for:

* **Runs finished**: Notifies you when a W\&B run successfully finishes.
* **Run crashed**: Notifies you when a run fails to finish.

For more information about how to set up and manage alerts, see [Send alerts with `wandb.Run.alert()`](/models/runs/alert/).

## Personal GitHub integration

Connect a personal GitHub account so that W\&B can access GitHub resources on your behalf. To connect a GitHub account:

1. Select the **Connect Github** button. W\&B redirects you to an open authorization (OAuth) page.
2. In the **Organization access** section, select the organization to grant access.
3. Select **Authorize wandb**.

Your personal GitHub account is now linked to your W\&B account.

## Delete your account

Select the **Delete Account** button to delete your account. You can't reverse account deletion.

## Storage

The **Storage** section shows the total storage your account has consumed on the W\&B servers. The default storage plan is 100 GB. For more information about storage and pricing, see the [Pricing](https://wandb.ai/site/pricing) page.
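
To tie together the API key guidance earlier on this page, the following is a minimal sketch, not an official W\&B utility, of a preflight check that a training script might run at startup. It confirms that the key is supplied through the `WANDB_API_KEY` environment variable rather than source code, and compares any installed SDK against the minimum versions listed under SDK version compatibility. The helper names (`parse_version`, `meets_minimum`, `preflight`) are hypothetical, and the version parsing is deliberately naive: it assumes `X.Y.Z`-style version strings.

```python
import os
import re
from importlib import metadata

# Minimum SDK versions named in the "SDK version compatibility" section.
MIN_VERSIONS = {"wandb": "0.22.3", "weave": "0.52.17"}


def parse_version(version):
    """Naively turn 'X.Y.Z' into a tuple of ints, ignoring suffixes like 'rc1'."""
    parts = []
    for part in version.split(".")[:3]:
        match = re.match(r"\d+", part)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def preflight(environ=None):
    """Return a list of problems found; an empty list means the checks passed."""
    environ = os.environ if environ is None else environ
    problems = []
    # Check only that the key is present. Never print or log its value.
    if not environ.get("WANDB_API_KEY"):
        problems.append("WANDB_API_KEY is not set")
    for package, minimum in MIN_VERSIONS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            continue  # Package is not installed, so there is nothing to compare.
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} is older than {minimum}")
    return problems
```

A script could call `preflight()` before starting a run and exit with the returned messages if the list is not empty.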
# Deployment options overview Source: https://docs.wandb.ai/platform/hosting Compare W&B deployment options including Multi-tenant Cloud, Dedicated Cloud, and Self-Managed installations. You can deploy W\&B in various ways to meet your organization's infrastructure and security requirements. This page helps administrators and platform decision-makers compare the available deployment options, so you can choose the one that best fits your security, scale, and operational needs. To get started quickly, W\&B recommends [Multi-tenant Cloud](https://wandb.ai/home). Some features and functionality require an [Enterprise](https://wandb.ai/site/pricing/) license. To try these features, sign up for an [Enterprise trial](https://wandb.ai/site/enterprise-trial/) and choose your deployment type. A fully managed service deployed in W\&B's cloud infrastructure, where you can access the W\&B products at the desired scale, with cost-efficient pricing options, and with continuous updates for the latest features and functionality. W\&B recommends Multi-tenant Cloud for your product trial, or to manage your production AI workflows if you don't need the security of a private deployment, self-service onboarding is important, and cost efficiency is critical. For more information, see [W\&B Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud). A fully managed service with dedicated, isolated infrastructure in W\&B's cloud. Choose this option if your organization requires conformance to strict governance controls including data residency, needs advanced security capabilities, and wants to optimize AI operating costs by not having to build and manage the required infrastructure with security, scale, and performance characteristics. For more information, see [W\&B Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud). Deploy and manage W\&B Server on your own managed infrastructure. 
W\&B Server is a self-contained packaged mechanism to run the W\&B Platform and its supported W\&B products. W\&B recommends this option if all your existing infrastructure is on-premises, or your organization has strict regulatory needs that W\&B Dedicated Cloud doesn't satisfy. With this option, you're fully responsible for provisioning, continuously maintaining, and upgrading the infrastructure required to support W\&B Server. For more information, see [W\&B Self-Managed](/platform/hosting/hosting-options/self-managed).

## Key differences

After reviewing the preceding deployment options, use the following table to compare how each option handles infrastructure, security, and operational responsibilities.

| | Multi-tenant Cloud | Dedicated Cloud | Self-Managed |
| --- | --- | --- | --- |
| MySQL / DB management | Fully hosted and managed by W\&B | Fully hosted and managed by W\&B on cloud or region of customer choice | Fully hosted and managed by customer |
| Object storage (S3/GCS/Blob storage) | **Option 1**: Fully hosted by W\&B<br>**Option 2**: Customer can configure their own bucket per team, using the [Secure Storage Connector](/platform/hosting/data-security/secure-storage-connector) | **Option 1**: Fully hosted by W\&B<br>**Option 2**: Customer can configure their own bucket per instance or team, using the [Secure Storage Connector](/platform/hosting/data-security/secure-storage-connector) | Fully hosted and managed by customer |
| SSO support | W\&B managed via Auth0 | **Option 1**: Customer managed<br>**Option 2**: Managed by W\&B via Auth0 | Fully managed by customer |
| W\&B Service (App) | Fully managed by W\&B | Fully managed by W\&B | Fully managed by customer |
| App security | Fully managed by W\&B | Shared responsibility of W\&B and customer | Fully managed by customer |
| Maintenance (upgrades, backups, etc.) | Managed by W\&B | Managed by W\&B | Managed by customer |
| Support | Support SLA | Support SLA | Support SLA |
| Supported cloud infrastructure | Google Cloud | AWS, Google Cloud, Azure | AWS, Google Cloud, Azure, on-premises bare-metal |

# Data encryption in Dedicated Cloud
Source: https://docs.wandb.ai/platform/hosting/data-security/data-encryption

Learn how W&B encrypts data in Dedicated Cloud using cloud-native keys and the customer-managed encryption key policy.

This page describes how W\&B encrypts the W\&B-managed database and object storage in [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud), and explains W\&B's policy on customer-managed encryption keys. This page is intended for security and compliance teams evaluating Dedicated Cloud for use with sensitive AI workloads.

W\&B uses a W\&B-managed cloud-native key to encrypt the W\&B-managed database and object storage in every Dedicated Cloud instance, using the customer-managed encryption key (CMEK) capability in each cloud. In this case, W\&B acts as a customer of the cloud provider while providing the W\&B platform as a service to you. Using a W\&B-managed key means that W\&B controls the keys that encrypt the data in each cloud, reinforcing its commitment to provide a secure platform to its customers.

W\&B uses a unique key to encrypt the data in each customer instance, providing another layer of isolation between Dedicated Cloud tenants. The capability is available on AWS, Azure, and Google Cloud.

Dedicated Cloud instances on AWS have used the W\&B-managed cloud-native key for encryption since before August 2024.
On Google Cloud and Azure, Dedicated Cloud instances that W\&B created in August 2024 or later use the W\&B-managed cloud-native key to encrypt the W\&B-managed database and object storage. Instances that W\&B provisioned before August 2024 use the default cloud provider managed key. W\&B doesn't generally allow customers to bring their own cloud-native key to encrypt the W\&B-managed database and object storage in their Dedicated Cloud instance. Multiple teams in an organization often have access to its cloud infrastructure, and some teams might not know that W\&B is a critical component in the organization's technology stack. They might remove the cloud-native key or revoke W\&B's access to it, which could corrupt all data in the organization's W\&B instance and leave it in an unrecoverable state. If your organization needs to use its own cloud-native key to encrypt the W\&B-managed database and object storage as a condition for adopting Dedicated Cloud, W\&B can review the request on an exception basis. If approved, use of your cloud-native key for encryption conforms to the shared responsibility model of W\&B Dedicated Cloud. If any user in your organization removes your key or revokes W\&B's access to it at any point when your Dedicated Cloud instance is live, W\&B isn't liable for any resulting data loss or corruption and isn't responsible for recovery of the data. # Configure IP allowlisting for Dedicated Cloud Source: https://docs.wandb.ai/platform/hosting/data-security/ip-allowlisting Restrict access to a W&B Dedicated Cloud instance to authorized IP addresses using IP allowlisting. This page explains how IP allowlisting works on W\&B [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) and how to request it for your instance. Use IP allowlisting to restrict access to your Dedicated Cloud instance to an authorized list of IP addresses, so that only traffic from approved locations can reach the W\&B APIs and the W\&B app UI. 
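
As a conceptual illustration only, the following sketch shows what an allowlist check against CIDR blocks amounts to. W\&B applies the allowlist on its side, so this is not code you deploy; it can help when you prepare the list of ranges to request. The CIDR ranges are documentation-reserved example addresses, and the names `ALLOWED_CIDRS` and `is_allowed` are hypothetical.

```python
import ipaddress

# Example egress ranges (RFC 5737 documentation addresses, not real gateways).
ALLOWED_CIDRS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
]


def is_allowed(client_ip, allowed=ALLOWED_CIDRS):
    """Return True if client_ip falls inside any allowlisted CIDR block."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowed)
```

One CIDR block covers every address behind an egress gateway, so a check like `is_allowed("203.0.113.42")` passes without listing each address individually.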
IP allowlisting applies to access from your AI workloads to the W\&B APIs and from your user browsers to the W\&B app UI. After you set up IP allowlisting for your Dedicated Cloud instance, W\&B denies any requests from other unauthorized locations. IP allowlisting is available on Dedicated Cloud instances on AWS, Google Cloud, and Azure. To configure IP allowlisting for your Dedicated Cloud instance, contact your W\&B team. You can use IP allowlisting with [secure private connectivity](./private-connectivity). If you use both, W\&B recommends that you use secure private connectivity for all traffic from your AI workloads and most of the traffic from your user browsers, and use IP allowlisting for instance administration from privileged locations. W\&B recommends that you use [CIDR blocks](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing) assigned to your corporate or business egress gateways rather than individual `/32` IP addresses. Using individual IP addresses doesn't scale and has strict limits per cloud. # Access BYOB using pre-signed URLs Source: https://docs.wandb.ai/platform/hosting/data-security/presigned-urls Understand how W&B uses pre-signed URLs for blob storage access, including team-level access control and audit logging. W\&B uses pre-signed URLs to simplify access to blob storage from your AI workloads or user browsers. This page explains how pre-signed URLs work in W\&B. It also outlines the access controls, network restrictions, and audit logging that administrators should configure to secure blob storage access. For background on pre-signed URLs, refer to the cloud provider's documentation: * [Pre-signed URLs for AWS S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html), which also applies to S3-compatible storage like [CoreWeave AI Object Storage](https://docs.coreweave.com/docs/products/storage/object-storage). 
* [Signed URLs for Google Cloud Storage](https://cloud.google.com/storage/docs/access-control/signed-urls). * [Shared Access Signature for Azure Blob Storage](https://learn.microsoft.com/azure/storage/common/storage-sas-overview). Pre-signed URLs work as follows: 1. When needed, AI workloads or user browser clients within your network request pre-signed URLs from W\&B. 2. W\&B responds to the request by accessing the blob storage to generate the pre-signed URL with the required permissions. 3. W\&B returns the pre-signed URL to the client. 4. The client uses the pre-signed URL to read from or write to the blob storage. A pre-signed URL expires after the following durations: * **Read operations**: 1 hour. * **Write operations**: 24 hours, to allow more time to upload large objects in chunks. ## Team-level access control Each pre-signed URL is restricted to specific buckets based on [team-level access control](/platform/hosting/iam/access-management/manage-organization#add-and-manage-teams) in the W\&B platform. Consider a user who belongs to only one team, and that team is mapped to a storage bucket using the [secure storage connector](./secure-storage-connector). In this case, the pre-signed URLs generated for their requests can't access storage buckets mapped to other teams. W\&B recommends adding users only to the teams they need to belong to. ## Network restriction W\&B recommends using IAM policies to restrict the networks that can use pre-signed URLs to access external storage. This helps ensure that only networks running your AI workloads, or gateway IP addresses that map to your user machines, can access your W\&B-specific buckets. Consult your cloud provider's documentation for guidance on configuring these IAM policies: * For CoreWeave AI Object Storage, refer to [Bucket policy reference](https://docs.coreweave.com/docs/products/storage/object-storage/reference/bucket-policy#condition) in the CoreWeave documentation. 
* For AWS S3 or S3-compatible storage like MinIO hosted on your premises, refer to the [Amazon S3 User Guide](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html#PresignedUrlUploadObject-LimitCapabilities), the [MinIO documentation](https://github.com/minio/minio), or the documentation for your S3-compatible storage provider. ## Audit logs W\&B recommends using [W\&B audit logs](../monitoring-usage/audit-logging) together with blob-storage-specific audit logs. For blob storage audit logs, refer to the documentation for each cloud provider: * [CoreWeave audit logs](https://docs.coreweave.com/docs/products/storage/object-storage/concepts/audit-logging#audit-logging-policies). * [AWS S3 access logs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ServerLogs.html). * [Google Cloud Storage audit logs](https://cloud.google.com/storage/docs/audit-logging). * [Monitor Azure Blob Storage](https://learn.microsoft.com/azure/storage/blobs/monitor-blob-storage). Admin and security teams can use audit logs to track what each user does in W\&B and take action if they need to limit certain operations for specific users. Pre-signed URLs are the only supported blob storage access mechanism in W\&B. W\&B recommends configuring some or all of the preceding security controls to fit your organization's needs. ## Determine the user that requested a pre-signed URL To correlate pre-signed URL activity with specific W\&B users when reviewing audit logs, inspect the query parameter that W\&B appends to each URL. 
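
A minimal sketch of that inspection, assuming you have already extracted request URLs from your storage provider's access logs. The parameter names `X-User` and `scid` are the ones this section lists per storage provider; the function name `requester_from_url` and any sample URLs or usernames you pass to it are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Query parameters that carry the requester's username, per storage provider.
USER_PARAMS = ("X-User", "scid")


def requester_from_url(url: str) -> Optional[str]:
    """Return the username embedded in a pre-signed URL, or None if absent."""
    query = parse_qs(urlsplit(url).query)
    for name in USER_PARAMS:
        values = query.get(name)
        if values:
            return values[0]
    return None
```

Parameter names are matched case-sensitively here, so adjust `USER_PARAMS` if your logs normalize the case of query strings.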
When W\&B returns a pre-signed URL, a query parameter in the URL contains the requester's username: | Storage provider | Signed URL query parameter | | --------------------------- | -------------------------- | | CoreWeave AI Object Storage | `X-User` | | AWS S3 | `X-User` | | Google Cloud Storage | `X-User` | | Azure Blob Storage | `scid` | # Configure private connectivity to Dedicated Cloud Source: https://docs.wandb.ai/platform/hosting/data-security/private-connectivity Connect to a W&B Dedicated Cloud instance over a private network using AWS PrivateLink, GCP, or Azure Private Link. This document explains how to connect to a W\&B Dedicated Cloud instance over a cloud provider's secure private network, so that traffic between your environment and W\&B avoids the public internet. It's intended for cloud and security administrators who manage Dedicated Cloud instances and want to harden network access for AI workloads and user traffic. You can connect to your [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) instance over the cloud provider's secure private network. This applies to access from your AI workloads to the W\&B APIs and optionally from your user browsers to the W\&B app UI. When you use private connectivity, the relevant requests and responses don't transit through the public network or internet. This reduces exposure to the public internet and helps satisfy network isolation requirements for sensitive workloads. Secure private connectivity is offered as an advanced security option with Dedicated Cloud. Secure private connectivity is available on Dedicated Cloud instances on AWS, Google Cloud, and Azure: * Use [AWS PrivateLink](https://aws.amazon.com/privatelink/) on AWS. * Use [Google Cloud Private Service Connect](https://cloud.google.com/vpc/docs/private-service-connect) on Google Cloud. * Use [Azure Private Link](https://azure.microsoft.com/products/private-link) on Azure. 
After W\&B enables the feature, W\&B creates a private endpoint service for your instance and provides you the relevant DNS URI to connect to. With that, you can create private endpoints in your cloud accounts that route the relevant traffic to the private endpoint service. You can set up private endpoints for your AI training workloads that run within your cloud VPC or VNet. To use the same mechanism for traffic from your user browsers to the W\&B app UI, you must configure appropriate DNS-based routing from your corporate network to the private endpoints in your cloud accounts. The DNS configuration ensures browser traffic resolves to your private endpoint instead of the public W\&B endpoint. To use this feature, contact your W\&B team. You can use secure private connectivity with [IP allowlisting](./ip-allowlisting) to combine network isolation with location-based access controls. If you combine the two, W\&B recommends that you use secure private connectivity for all traffic from your AI workloads and for browser traffic from your users where possible, and use IP allowlisting for instance administration from privileged locations. # Bring your own bucket (BYOB) Source: https://docs.wandb.ai/platform/hosting/data-security/secure-storage-connector Store W&B artifacts and data in your own cloud storage buckets using the Bring Your Own Bucket (BYOB) feature. **This guide applies to all W\&B deployment types:** * **Multi-tenant Cloud**: Team-level BYOB * **Dedicated Cloud**: Instance and team-level BYOB * **Self-Managed**: Instance and team-level BYOB The bucket provisioning instructions in this guide are the same regardless of your deployment type. ## Overview Bring your own bucket (BYOB) lets you store W\&B artifacts and other sensitive data in your own cloud or on-premises infrastructure. 
For [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) or [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud), W\&B doesn't copy the data you store in your bucket to the W\&B managed infrastructure. This page is for W\&B administrators and platform engineers who need to retain ownership of artifact storage to meet data governance, residency, or compliance requirements. * Communication between W\&B SDK / CLI / UI and your buckets occurs using [pre-signed URLs](./presigned-urls). * W\&B uses garbage collection and related processes to remove deleted **artifacts** and **run data** from your bucket over time. For artifact deletion, see [Delete an artifact](/models/artifacts/delete-artifacts). Deleted run data on Dedicated Cloud and Self-Managed deployments also depends on `GORILLA_DATA_RETENTION_PERIOD` as described in [Configure environment variables](/platform/hosting/env-vars). W\&B doesn't guarantee cleanup timing. For a single overview of bucket usage and costs, see [Manage bucket storage and costs](/platform/hosting/managing-bucket-storage). * You can specify a sub-path when you configure a bucket, to ensure that W\&B doesn't store any files in a folder at the root of the bucket. This helps you better conform to your organization's bucket governance policy. ### Data stored in the central database vs buckets When you use BYOB functionality, W\&B stores certain types of data in the W\&B central database, and other types in your bucket. Use the following lists to understand which data remains in W\&B-managed infrastructure and which data W\&B writes to your own storage. #### Database The W\&B central database stores the following data: * Metadata for users, teams, artifacts, experiments, and projects. * Reports. * Experiment logs. * System metrics. * Console logs. #### Buckets Your storage bucket stores the following data: * Experiment files and metrics. * Artifact files. * Media files. * Run files. 
* Exported history metrics and system events in Parquet format.

### Bucket scopes

You can configure your storage bucket to one of two scopes:

| Scope | Description |
| --- | --- |
| Instance level | In [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) and [Self-Managed](/platform/hosting/hosting-options/self-managed), any user with the required permissions within your organization or instance can access files stored in your instance's storage bucket. Not applicable to [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud). |
| Team level | If you configure a W\&B team to use a team level storage bucket, team members can access files stored in it. Team level storage buckets allow greater data access control and data isolation for teams with sensitive data or strict compliance requirements.<br><br>Team level storage helps different business units or departments that share an instance to efficiently use the infrastructure and administrative resources. It also lets separate project teams manage AI workflows for separate customer engagements. Available for all deployment types. You configure team level BYOB when you set up the team. |

This design supports different storage topologies, depending on your organization's needs. For example:

* The same bucket can serve the instance and one or more teams.
* Each team can use a separate bucket, some teams can choose to write to the instance bucket, or multiple teams can share a bucket by writing to subpaths.
* Buckets for different teams can reside in different cloud infrastructure environments or regions, and different storage admin teams can manage them.

For example, suppose you have a team called Kappa in your organization. Your organization (and team Kappa) uses the instance level storage bucket by default. Next, you create a team called Omega. When you create team Omega, you configure a team level storage bucket for that team. Team Kappa can't access files that team Omega generates. However, team Omega can access files that team Kappa creates. To isolate data for team Kappa, you must also configure a team level storage bucket for them.

### Availability matrix

Before you begin, confirm that BYOB is available for your deployment type and storage provider.

W\&B can connect to the following storage providers:

* [CoreWeave AI Object Storage](https://docs.coreweave.com/products/storage/object-storage): High-performance, S3-compatible object storage service optimized for AI workloads.
* [Amazon S3](https://aws.amazon.com/s3/): Object storage service offering scalability, data availability, security, and performance.
* [Google Cloud Storage](https://cloud.google.com/storage): Managed service for storing unstructured data at scale.
* [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs): Cloud-based object storage solution for storing massive amounts of unstructured data like text, binary data, images, videos, and logs.
* S3-compatible storage such as [MinIO Enterprise (AIStor)](https://min.io/product/aistor) or other enterprise-grade solutions hosted in your cloud or on-premises infrastructure.

The following table shows the availability of BYOB at each scope for each W\&B deployment type.

| W\&B deployment type | Instance level | Team level | Additional information |
| --- | --- | --- | --- |
| Dedicated Cloud | ✓ | ✓ | Instance and team level BYOB are supported for CoreWeave AI Object Storage, Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, and S3-compatible storage such as [MinIO Enterprise (AIStor)](https://www.min.io/product/aistor) hosted in your cloud or on-premises infrastructure. |
| Multi-tenant Cloud | Not applicable | ✓ <sup>1</sup> | Team level BYOB is supported for CoreWeave AI Object Storage, Amazon S3, and Google Cloud Storage. |
| Self-Managed | ✓ | ✓ | Instance and team level BYOB are supported for CoreWeave AI Object Storage, Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, and S3-compatible storage such as [MinIO Enterprise (AIStor)](https://www.min.io/product/aistor) hosted in your cloud or on-premises infrastructure. |

<sup>1</sup> Azure Blob Storage is not supported for team level BYOB on Multi-tenant Cloud.

The following sections guide you through the process of setting up BYOB.

## Provision your bucket

After you [verify availability](#availability-matrix), you're ready to provision your storage bucket, including its access policy and CORS. Provisioning creates the bucket that W\&B writes to and grants the W\&B platform the permissions it needs to generate pre-signed URLs on your behalf. Select a tab to continue.

**Requirements**:

* **Multi-tenant Cloud**, or
* **Dedicated Cloud** v0.73.0 or later, or
* **Self-Managed** v0.73.0 or later deployed with v0.33.14+ of the Helm chart
* A CoreWeave account with AI Object Storage enabled and with permission to create buckets, API access keys, and secret keys.
* Your W\&B instance must be able to connect to CoreWeave network endpoints.

For details, see [Create a CoreWeave AI Object Storage bucket](https://docs.coreweave.com/docs/products/storage/object-storage/buckets/create-bucket) in the CoreWeave documentation.

1. **Multi-tenant Cloud**: Obtain your organization ID, which is required for your bucket policy.
   1. Log in to the [W\&B App](https://wandb.ai/site).
   2. In the left navigation, click **Create a new team**.
   3. In the drawer that opens, copy the W\&B organization ID, which appears above **Invite team members**.
   4. Leave this page open. You use it to [configure W\&B](#configure-byob).
2. **Dedicated Cloud** / **Self-Managed**: Obtain your customer namespace, which is required for your bucket policy.
   1.
In the W\&B App, click your user profile icon, then click **System Console**. 2. Click the **Authentication** tab. 3. At the bottom of the page, copy the value for **Customer Namespace**. Keep this value for configuring the bucket policy. 4. You can close the System Console. 3. In CoreWeave, create the bucket with a name of your choice in your preferred CoreWeave availability zone. Optionally, create a folder for W\&B to use as a sub-path for all W\&B files. Make a note of the bucket name, availability zone, API access key, secret key, and sub-path. 4. Set the following cross-origin resource sharing (CORS) policy for the bucket: ```json theme={null} [ { "AllowedHeaders": [ "*" ], "AllowedMethods": [ "GET", "HEAD", "PUT" ], "AllowedOrigins": [ "*" ], "ExposeHeaders": [ "ETag" ], "MaxAgeSeconds": 3000 } ] ``` CoreWeave storage is S3-compatible. For details about CORS, see [Configuring cross-origin resource sharing (CORS)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enabling-cors-examples.html) in the AWS documentation. 5. Configure a bucket policy that grants the required permissions for your W\&B deployment to access the bucket and generate [pre-signed URLs](./presigned-urls) that AI workloads in your cloud infrastructure or user browsers use to access the bucket. See [Bucket Policy Reference](https://docs.coreweave.com/docs/products/storage/object-storage/auth-access/bucket-access/bucket-policies) in the CoreWeave documentation. 
```json theme={null} { "Version": "2012-10-17", "Statement": [ { "Sid": "AllowWandbUser", "Action": [ "s3:GetObject*", "s3:GetEncryptionConfiguration", "s3:ListBucket", "s3:ListBucketMultipartUploads", "s3:ListBucketVersions", "s3:AbortMultipartUpload", "s3:DeleteObject", "s3:PutObject", "s3:GetBucketCORS", "s3:GetBucketLocation", "s3:GetBucketVersioning" ], "Effect": "Allow", "Resource": [ "arn:aws:s3:::/*", "arn:aws:s3:::" ], "Principal": { "CW": "arn:aws:iam::wandb:static/" }, "Condition": { "StringLike": { "wandb:OrgID": [ "" ] } } }, { "Sid": "AllowUsersInOrg", "Action": "s3:*", "Effect": "Allow", "Resource": [ "arn:aws:s3:::", "arn:aws:s3:::/*" ], "Principal": { "CW": "arn:aws:iam:::*" } }] } ``` The clause beginning with `"Sid": "AllowUsersInOrg"` grants users in your organization direct access to the bucket. If you don't need this ability, you can omit the clause from your policy. 6. In the bucket policy, replace placeholders: * ``: your bucket name. * ``: * **Multi-tenant Cloud**: `arn:aws:iam::wandb:static/wandb-integration-public` * **Dedicated Cloud** or **Self-Managed**: `arn:aws:iam::wandb:static/wandb-integration` * ``: * **Multi-tenant Cloud**: The organization ID from [Provision your bucket](#coreweave-org-id). * **Dedicated Cloud** or **Self-Managed**: The customer namespace from [Provision your bucket](#coreweave-customer-namespace). 7. **Dedicated Cloud**: Contact [support](mailto:support@wandb.ai) to complete additional steps. 8. **Self-Managed**: Update your W\&B deployment to set the environment variable `GORILLA_SUPPORTED_FILE_STORES` to the exact string `cw://` and restart W\&B. Otherwise, CoreWeave doesn't appear as an option when you configure team storage. Next, [configure W\&B](#configure-byob). For details, see [Create an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html) in the AWS documentation. 1. Provision the KMS key. 
W\&B requires you to provision a KMS key to encrypt and decrypt the data on the S3 bucket. The key usage type must be `ENCRYPT_DECRYPT`. Assign the following policy to the key: ```json theme={null} { "Version": "2012-10-17", "Statement": [ { "Sid" : "Internal", "Effect" : "Allow", "Principal" : { "AWS" : "" }, "Action" : "kms:*", "Resource" : "" }, { "Sid" : "External", "Effect" : "Allow", "Principal" : { "AWS" : "" }, "Action" : [ "kms:Decrypt", "kms:Describe*", "kms:Encrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*" ], "Resource" : "" } ] } ``` Replace `` and `` accordingly. If you use [Multi-tenant Cloud](/platform/hosting/hosting-options#w%26b-multi-tenant-cloud) or [Dedicated Cloud](/platform/hosting/hosting-options#w%26b-dedicated-cloud), replace `` with the corresponding value: * **Multi-tenant Cloud**: `arn:aws:iam::725579432336:role/WandbIntegration` * **Dedicated Cloud**: `arn:aws:iam::830241207209:root` This policy grants your AWS account full access to the key and also assigns the required permissions to the AWS account that hosts the W\&B platform. Keep a record of the KMS key ARN. 2. Provision the S3 bucket. Follow these steps to provision the S3 bucket in your AWS account: 1. Create the S3 bucket with a name of your choice. Optionally, create a folder that you can configure as a sub-path to store all W\&B files. 2. Enable server-side encryption, using the KMS key from the previous step. 3. Configure CORS with the following policy: ```json theme={null} [ { "AllowedHeaders": [ "*" ], "AllowedMethods": [ "GET", "HEAD", "PUT" ], "AllowedOrigins": [ "*" ], "ExposeHeaders": [ "ETag" ], "MaxAgeSeconds": 3000 } ] ``` If data in your bucket expires because of an [object lifecycle management policy](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lifecycle-mgmt.html), you may lose the ability to read the history of some runs. 4. 
Grant the required S3 permissions to the AWS account that hosts the W\&B platform, which requires these permissions to generate [pre-signed URLs](./presigned-urls) that AI workloads in your cloud infrastructure or user browsers use to access the bucket. ```json theme={null} { "Version": "2012-10-17", "Id": "WandBAccess", "Statement": [ { "Sid": "WAndBAccountAccess", "Effect": "Allow", "Principal": { "AWS": "" }, "Action" : [ "s3:GetObject*", "s3:GetEncryptionConfiguration", "s3:ListBucket", "s3:ListBucketMultipartUploads", "s3:ListBucketVersions", "s3:AbortMultipartUpload", "s3:DeleteObject", "s3:PutObject", "s3:GetBucketCORS", "s3:GetBucketLocation", "s3:GetBucketVersioning" ], "Resource": [ "arn:aws:s3:::", "arn:aws:s3:::/*" ] } ] } ``` Replace `` accordingly and keep a record of the bucket name. Next, [configure W\&B](#configure-byob). If you use [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud) or [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud), replace `` with the corresponding value. * For [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud): `arn:aws:iam::725579432336:role/WandbIntegration` * For [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud): `arn:aws:iam::830241207209:root` For more details, see the [AWS Self-Managed hosting guide](/platform/hosting/hosting-options). For details, see [Create a bucket](https://docs.cloud.google.com/storage/docs/creating-buckets) in the Google Cloud documentation. 1. Provision the GCS bucket. Follow these steps to provision the GCS bucket in your Google Cloud project: 1. Create the GCS bucket with a name of your choice. Optionally, create a folder that you can configure as a sub-path to store all W\&B files. 2. Set encryption type to `Google-managed`. 3. Turn on soft deletion. See [Edit a bucket's soft delete policy](https://docs.cloud.google.com/storage/docs/use-soft-delete). 4. Set the CORS policy with `gsutil`. 
This is not possible in the UI. 1. Create a file named `cors-policy.json` locally. 2. Copy the following CORS policy into the file and save it. ```json theme={null} [ { "origin": ["*"], "responseHeader": ["Content-Type"], "exposeHeaders": ["ETag"], "method": ["GET", "HEAD", "PUT"], "maxAgeSeconds": 3000 } ] ``` If data in your bucket expires because of an [object lifecycle management policy](https://cloud.google.com/storage/docs/lifecycle), you may lose the ability to read the history of some runs. 5. Replace `` with the correct bucket name and run `gsutil`. ```bash theme={null} gsutil cors set cors-policy.json gs:// ``` 6. Verify the bucket's policy. Replace `` with the correct bucket name. ```bash theme={null} gsutil cors get gs:// ``` 2. If you use [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud) or [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud), grant the `storage.admin` role to the Google Cloud service account linked to the W\&B platform. W\&B requires this role to check the bucket's CORS configuration and attributes, such as whether object versioning is enabled. If the service account doesn't have the `storage.admin` role, these checks result in an HTTP 403 error. * For [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud), the account is: `wandb-integration@wandb-production.iam.gserviceaccount.com` * For [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud), the account is: `deploy@wandb-production.iam.gserviceaccount.com` Keep a record of the bucket name. Next, [configure W\&B for BYOB](#configure-byob). For details, see [Create a blob storage container](https://learn.microsoft.com/en-us/azure/storage/blobs/blob-containers-portal) in the Azure documentation. **Instance level BYOB**: 1. Provision the Azure Blob Storage container. 
For Self-Managed deployments, and for Dedicated Cloud deployments that don't use [this Terraform module](https://github.com/wandb/terraform-azurerm-wandb/tree/main/examples/byob), follow these steps to provision an Azure Blob Storage container in your Azure subscription: 1. Create a container with a name of your choice. Optionally, create a folder that you can configure as a sub-path to store all W\&B files. 2. Configure the CORS policy on the container. To set the CORS policy through the UI, go to the blob storage, scroll down to `Settings/Resource Sharing (CORS)`, and then set the following: | Parameter | Value | | --------------- | -------------------- | | Allowed Origins | `*` | | Allowed Methods | `GET`, `HEAD`, `PUT` | | Allowed Headers | `*` | | Exposed Headers | `*` | | Max Age | `3000` | If data in your bucket expires because of an [object lifecycle management policy](https://learn.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-policy-configure?tabs=azure-portal), you may lose the ability to read the history of some runs. 2. Generate a storage account access key and make a note of its name and the storage account name. If you use [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud), share the storage account name and access key with your W\&B team using a secure sharing mechanism. **Team level BYOB**: For Dedicated Cloud deployments, W\&B recommends that you use [Terraform](https://github.com/wandb/terraform-azurerm-wandb/tree/main/examples/secure-storage-connector) to provision the Azure Blob Storage container with the necessary access mechanism and permissions. For Dedicated Cloud deployments that don't use Terraform, or for Self-Managed deployments, provision the bucket by following the steps for provisioning instance-level storage. Provide the OIDC issuer URL for your instance. 
Make a note of the following details: * Storage account name * Storage container name * Managed identity client ID * Azure tenant ID Create your S3-compatible bucket. Make a note of: * Access key * Secret access key * URL endpoint * Bucket name * Folder path, if applicable * Region Next, [determine the storage address](#determine-the-storage-address). ## Determine the storage address After you provision the bucket, you need a storage address that W\&B uses to locate and authenticate to it. The following sections describe the syntax to use to connect a W\&B team to a BYOB storage bucket. In the examples, replace placeholder values between angle brackets (`<>`) with your bucket's details. Select a tab for detailed instructions. This section is relevant only for team level BYOB on **Dedicated Cloud** or **Self-Managed**. For instance level BYOB or for Multi-tenant Cloud, you're ready to [Configure W\&B](#configure-byob). Determine the full bucket path using the following format. Replace placeholders between angle brackets (`<>`) with the bucket's values. **Bucket format**: ```text theme={null} cw://:@cwobject.com/?tls=true ``` W\&B supports the `cwobject.com` HTTPS endpoint. TLS 1.3 is required. Contact [support](mailto:support@wandb.com) to express interest in other CoreWeave endpoints. **Bucket format**: ```text theme={null} s3://:@/?region= ``` In the address, the `region` parameter is mandatory unless both your W\&B instance and your storage bucket are deployed on AWS, and the W\&B instance's `AWS_REGION` matches the bucket's AWS S3 region. **Bucket format**: ```text theme={null} gs://:@ ``` **Bucket format**: ```text theme={null} az://:@/ ``` **Bucket format**: ```text theme={null} s3://:@/?region=&tls=true ``` In the address, the `region` parameter is mandatory. 
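
The S3-compatible bucket format above always carries a `region` query parameter and `tls=true`, so a quick local check before you paste the address into W\&B can catch an omitted parameter. The following is a minimal sketch and not part of W\&B: the function name `check_s3_compatible_address` is hypothetical, the sample address uses made-up values, and treating a missing `tls=true` as a problem is an assumption based only on the format shown on this page.

```python
from urllib.parse import parse_qs, urlsplit


def check_s3_compatible_address(address: str) -> list[str]:
    """Return a list of problems found in an S3-compatible storage address.

    Hypothetical helper for illustration only. It checks the scheme and the
    query parameters, and does not inspect credentials, host, or bucket.
    """
    problems = []
    parts = urlsplit(address)
    # parse_qs drops parameters with empty values, so "region=" counts as missing.
    query = parse_qs(parts.query)

    if parts.scheme != "s3":
        problems.append(f"expected the s3:// scheme, found {parts.scheme!r}")
    if not query.get("region"):
        problems.append("the region query parameter is missing")
    if query.get("tls") != ["true"]:
        # Assumption: the format on this page includes tls=true.
        problems.append("tls=true is not set")
    return problems


# Made-up example values; substitute your own bucket details.
sample = "s3://EXAMPLEKEY:EXAMPLESECRET@storage.example.internal/wandb-bucket?region=us-east-1&tls=true"
problems = check_s3_compatible_address(sample)
```

An empty list means the address has the `s3://` scheme and both query parameters. The check says nothing about whether the credentials or endpoint are valid, which W\&B verifies when you create the team.
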
This section is for S3-compatible storage buckets that aren't hosted in S3, such as [MinIO Enterprise (AIStor)](https://www.min.io/product/aistor) or other enterprise-grade S3-compatible solutions hosted on your premises. For storage buckets hosted in AWS S3, see the **AWS** tab instead. MinIO Open Source is in [maintenance mode](https://github.com/minio/minio) with no active development or pre-compiled binaries. For production deployments, use enterprise-grade S3-compatible solutions. For cloud-native storage buckets with an optional S3-compatible mode, use the cloud-native protocol specifier when possible. For example, use `cw://` for a CoreWeave bucket, rather than `s3://`. After you determine the storage address, you're ready to [configure team level BYOB](#configure-team-level-byob). ## Configure W\&B After you [provision your bucket](#provision-your-bucket) and [determine its address](#determine-the-storage-address), you're ready to configure BYOB at the [instance level](#instance-level-byob) or [team level](#team-level-byob). This final step tells W\&B to route storage of artifacts, run files, and other large objects to your bucket. Plan your storage bucket layout carefully. After you configure a storage bucket for W\&B, migrating its data to another bucket is complex and requires assistance from W\&B. This applies to storage for Dedicated Cloud and Self-Managed, as well as team-level storage for Multi-tenant Cloud. For questions, contact [support](mailto:support@wandb.com). ### Instance level BYOB For CoreWeave AI Object Storage at the instance level, contact [W\&B support](mailto:support@wandb.com) instead of following these instructions. Self-service configuration isn't yet supported. For **Dedicated Cloud**: Share the bucket details with your W\&B team, who configures your Dedicated Cloud instance. For **Self-Managed**, you can configure instance level BYOB using the W\&B App: 1. Log in to W\&B as a user with the `admin` role. 2. 
Click the user icon at the top, then click **System Console**. 3. Navigate to **Settings** > **System Connections**. 4. In the **Bucket Storage** section, ensure the identity in the **Identity** field has access to the new bucket. 5. Select the **Provider**. 6. Enter the **Bucket Name**. 7. Optional: Enter the **Path** to use in the new bucket. 8. Click **Save**. After you save, W\&B uses the configured bucket as the default storage destination for new artifacts and run files at the instance level. ### Team level BYOB You can configure team level BYOB when you create a team in the W\&B App or using the [SCIM API](/platform/hosting/iam/scim#create-team) (POST Groups with optional `storageBucket`). You have two options: * **Use an existing bucket**: You must [determine the storage location](#determine-the-storage-address) for your bucket first. * **Create a new bucket** (Multi-tenant Cloud only): W\&B can automatically create a bucket in your cloud provider when you create the team. W\&B supports this for CoreWeave, AWS, and Google Cloud. - After you create a team, you can't change its storage. - For instance level BYOB, see [Instance level BYOB](#instance-level-byob) instead. - If you plan to configure CoreWeave storage for the team, review the [CoreWeave requirements](#coreweave-requirements) and contact [support](mailto:support@wandb.com) to verify that your bucket is configured correctly in CoreWeave and to validate your team's configuration, since you can't change the storage details after you create the team. Select your deployment type to continue. 1. **Dedicated Cloud**: You **must** provide the bucket path to your account team so that they can add it to your instance's supported file stores before you follow the rest of these steps to use the storage bucket for a team. 2. 
**Self-Managed**: You **must** add the bucket path to the `GORILLA_SUPPORTED_FILE_STORES` environment variable and then restart W\&B before you follow the rest of these steps to use the storage bucket for a team. 3. Log in to W\&B as a user with the `admin` role, click the icon at the top left to open the left navigation, then click **Create a team to collaborate**. 4. Provide a name for the team. 5. Set **Storage Type** to **External storage**. To use the instance level storage for team storage (regardless of whether it's internal or external), leave **Storage Type** set to **Internal**, even if the instance level bucket is configured for BYOB. To use separate external storage for the team, set **Storage Type** for the team to **External** and configure the bucket details in the next step. 6. Click **Bucket location**. 7. To use an existing bucket, select it from the list. To add a new bucket, click **Add bucket** at the bottom, then provide the bucket's details. Click **Cloud provider** and select **CoreWeave**, **AWS**, **Google Cloud**, or **Azure**. If the cloud provider isn't listed, ensure that you've followed the instructions in [Provision your bucket](#set-environment-variable) to add the bucket path to the supported file stores for your instance. If the storage provider is still not listed, [contact support](mailto:support@wandb.ai) for assistance. 8. Specify the bucket details. * For **CoreWeave**, provide only the bucket name. * For Amazon S3, Google Cloud, or S3-compatible storage, provide the full bucket path you [determined earlier](#determine-the-storage-address). * For Azure on W\&B Dedicated or Self-Managed, set **Account name** to the Azure account and **Container name** to the Azure Blob Storage container. * Optionally, provide additional connection settings: * If applicable, set **Path** to the bucket sub-path. * **CoreWeave**: No additional connection settings required. * **AWS**: Set **KMS key ARN** to the ARN of your KMS encryption key. 
* **Google Cloud**: No additional connection settings required. * **Azure**: Specify values for **Tenant ID** and **Managed Identity Client ID**. These fields are mandatory unless you configured the connection string with `GORILLA_SUPPORTED_FILE_STORES`. 9. Click **Create team**. If W\&B encounters errors accessing the bucket or detects invalid settings, an error or warning displays at the bottom of the page. Otherwise, W\&B creates the team. 1. Switch to the browser window where you previously began to create the new team to find the W\&B organization ID. Otherwise, log in to W\&B as a user with the `admin` role, click the icon at the top left to open the left navigation, then click **Create a team to collaborate**. 2. Provide a name for the team. 3. Set **Storage Type** to **External storage**. 4. Click **Bucket location**. 5. To use an existing bucket, select it from the list. 6. To create a new bucket, click **Add bucket** at the bottom, then: 1. Click **Cloud provider** and select **CoreWeave**, **AWS**, or **Google Cloud**. 2. Enter the bucket details: * **Name**: Enter the bucket name. * **Path** (optional): Enter a sub-path to use within the bucket. 3. Provide additional connection settings for the chosen cloud provider: * CoreWeave: No additional settings required. * AWS: Optionally provide a **KMS key ARN** for encryption. * Google Cloud: No additional settings required. When you click **Create team**, W\&B automatically creates the bucket in your cloud provider with the specified configuration. 7. Invite members to the team. In **Invite team members**, specify a comma-separated list of email addresses. Otherwise, you can invite members to the team after you create it. 8. Click **Create team**. If W\&B encounters errors accessing the bucket or detects invalid settings, an error or warning displays at the bottom of the page. Otherwise, W\&B creates the team. 
## Troubleshooting

If W\&B reports errors when validating or connecting to your bucket, use the following sections to diagnose the most common causes by storage provider.

### CoreWeave

This section helps troubleshoot problems connecting to CoreWeave AI Object Storage.

* **Connection errors**
  * Verify that your W\&B instance can connect to CoreWeave network endpoints.
  * CoreWeave uses virtual-hosted style paths, where the bucket name is a subdomain at the beginning of the path. For example, `cw://bucket-name.cwobject.com` is correct, while `cw://cwobject.com/bucket-name/` isn't.
  * Bucket names must not contain underscores (`_`) or other characters incompatible with DNS rules.
  * Bucket names must be globally unique among CoreWeave locations.
  * Bucket names must not begin with `cw-` or `vip-`, which are reserved prefixes.
* **CORS validation failures**
  * A CORS policy is required. CoreWeave is S3-compatible. For details about CORS, see [Configuring cross-origin resource sharing (CORS)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enabling-cors-examples.html) in the AWS documentation.
  * `AllowedMethods` must include the methods `GET`, `PUT`, and `HEAD`.
  * `ExposeHeaders` must include `ETag`.
  * The CORS policy's `AllowedOrigins` must include W\&B front-end domains. The example CORS policies provided on this page include all domains using `*`.
* **LOTA endpoint issues**
  * W\&B doesn't yet support connections to LOTA endpoints. To express interest, [contact support](mailto:support@wandb.com).
* **Access key and permission errors**
  * Verify that your CoreWeave API access key isn't expired.
  * Verify that your CoreWeave API access key and secret key have sufficient permissions: `GetObject`, `PutObject`, `DeleteObject`, and `ListBucket`. The examples on this page meet this requirement. See [Create and Manage Access Keys](https://docs.coreweave.com/docs/products/storage/object-storage/auth-access/manage-access-keys/about) in the CoreWeave documentation.
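
The naming and CORS rules in the CoreWeave list above can be checked locally before you contact support. The following is a minimal sketch, not a W\&B or CoreWeave tool: the helper names `check_bucket_name` and `check_cors_rule` are hypothetical, and it assumes that "DNS rules" means a single lowercase DNS label (letters, digits, and hyphens, 1 to 63 characters, with no leading or trailing hyphen). It can't check global uniqueness or key permissions.

```python
import re

# Reserved prefixes named in the troubleshooting list.
RESERVED_PREFIXES = ("cw-", "vip-")

# Assumption: "DNS rules" is read as one lowercase DNS label of 1 to 63
# characters that starts and ends with a letter or digit.
DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")

# Methods that AllowedMethods must include, per the list above.
REQUIRED_METHODS = {"GET", "PUT", "HEAD"}


def check_bucket_name(name):
    """Return a list of problems with a candidate CoreWeave bucket name."""
    problems = []
    if not DNS_LABEL.fullmatch(name):
        problems.append("name is not a valid DNS label")
    if name.startswith(RESERVED_PREFIXES):
        problems.append("name starts with a reserved prefix")
    return problems


def check_cors_rule(rule):
    """Return a list of problems with one rule of a parsed JSON CORS policy."""
    problems = []
    missing = REQUIRED_METHODS - set(rule.get("AllowedMethods", []))
    if missing:
        problems.append("AllowedMethods is missing " + ", ".join(sorted(missing)))
    if "ETag" not in rule.get("ExposeHeaders", []):
        problems.append("ExposeHeaders is missing ETag")
    return problems
```

To check a whole policy, parse it with `json.loads` and call `check_cors_rule` on each rule in the resulting list. The example CORS policy on this page passes with no problems reported.
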
### Google Cloud This section helps troubleshoot problems connecting to Google Cloud Storage. * `Bucket does not have soft deletion enabled` Ensure that soft deletion is turned on for your Google Cloud Storage bucket. See [Edit a bucket's soft delete policy](https://docs.cloud.google.com/storage/docs/use-soft-delete). # Configure environment variables Source: https://docs.wandb.ai/platform/hosting/env-vars Configure a self-managed W&B Server installation using environment variables for database, storage, Redis, and IAM settings. In addition to configuring instance-level settings through the System Settings admin UI, W\&B also provides a way to configure these values in code using environment variables. This page lists the environment variables you can set to control database, storage, Redis, identity provider, and other instance-level behavior for a self-managed W\&B Server deployment. You can use these variables to manage configuration as code instead of through the admin UI. For IAM-specific variables, see [advanced configuration for IAM](./iam/advanced_env_vars). ## Environment variable reference The following table describes each environment variable, the behavior it controls, and any constraints on its value. 

| Environment variable | Description |
| -------------------- | ----------- |
| `LICENSE` | Your wandb/local license |
| `MYSQL` | The MySQL connection string |
| `BUCKET` | The S3 / GCS bucket for storing data |
| `BUCKET_QUEUE` | The SQS / Google PubSub queue for object creation events |
| `NOTIFICATIONS_QUEUE` | The SQS queue on which to publish run events |
| `AWS_REGION` | The AWS Region where your bucket lives |
| `HOST` | The FQDN of your instance, for example `https://my.domain.net` |
| `OIDC_ISSUER` | A URL to your OpenID Connect identity provider, for example `https://cognito-idp.us-east-1.amazonaws.com/us-east-1_uiIFNdacd` |
| `OIDC_CLIENT_ID` | The client ID of the application in your identity provider |
| `OIDC_AUTH_METHOD` | Implicit (default) or pkce. For more context, see the following sections. |
| `SLACK_CLIENT_ID` | The client ID of the Slack application you want to use for alerts |
| `SLACK_SECRET` | The secret of the Slack application you want to use for alerts |
| `LOCAL_RESTORE` | If you can't access your instance, you can temporarily set this to true. Check the logs from the container for temporary credentials. |
| `REDIS` | Can be used to set up an external Redis instance with W\&B. |
| `LOGGING_ENABLED` | When set to true, access logs are streamed to stdout. You can also mount a sidecar container and tail `/var/log/gorilla.log` without setting this variable. |
| `GORILLA_ALLOW_USER_TEAM_CREATION` | When set to true, lets non-admin users create a new team. False by default. |
| `GORILLA_CUSTOMER_SECRET_STORE_SOURCE` | Sets the secret manager for storing team secrets used by W\&B Weave. These secret managers are supported:<br />• Internal secret manager (default): k8s-secretmanager://wandb-secret<br />• AWS Secret Manager: aws-secretmanager<br />• Google Cloud Secret Manager: gcp-secretmanager<br />• Azure: az-secretmanger |
| `GORILLA_DATA_RETENTION_PERIOD` | How long to retain deleted run data, in hours. Deleted run data is unrecoverable. Append an `h` to the input value. For example, `"24h"`. |
| `GORILLA_DISABLE_PERSONAL_ENTITY` | When set to true, turns off [personal entities](/support/models/articles/what-is-the-difference-between-team-and-). Prevents users from creating new personal projects in their personal entities and from writing to existing personal projects. |
| `GORILLA_GRAPHQL_DISABLE_INTROSPECTION` | When set to true, disables GraphQL introspection: `__type` and `__schema` queries return no schema data while the request still succeeds. On **Self-Managed**, setting the Gorilla configuration field `graphql-disable-introspection` has the same effect. Set this variable under `spec.values.global.extraEnv` in your `WeightsAndBiases` custom resource (see the [`global.extraEnv` example](/platform/hosting/self-managed/operator#ldap) in the Operator guide). **Client applications require [W\&B SDK v0.26.0](/release-notes/sdk-releases#0-26-0) or later** against deployments with introspection already turned off. |
| `GRAPHQL_REJECT_UNAUTHED_REQUESTS` | When set to `true` on the **API** service, rejects GraphQL requests that don't have an authenticated user. Unauthenticated requests receive HTTP 401. **Self-Managed** and **Dedicated Cloud** v0.80.0+ only; not available on Multi-tenant Cloud. This feature is opt-in: if the environment variable is unset or not `true`, behavior is unchanged. Set on the API component only (for example, `api.env` in Helm values). Before activating, confirm that workflows that rely on anonymous GraphQL access (such as viewing shared reports without signing in, or open projects) still meet your requirements. On **Self-Managed**, setting the Gorilla configuration field `graphql-reject-unauthed-requests` to `true` has the same effect. |
| `GORILLA_ARTIFACT_GC_ENABLED` | When set to true, enables garbage collection for deleted artifacts. Required for self-managed deployments. See [Delete an artifact](/models/artifacts/delete-artifacts) for more information. |
| `WANDB_ARTIFACT_DIR` | Where to store all downloaded artifacts. If unset, defaults to the `artifacts` directory relative to your training script. Make sure this directory exists and the running user has permission to write to it. This does not control the location of generated metadata files, which you can set using the `WANDB_DIR` environment variable. |
| `WANDB_DATA_DIR` | Where to upload staging artifacts. The default location depends on your platform, because it uses the value of `user_data_dir` from the `platformdirs` Python package. Make sure this directory exists and the running user has permission to write to it. |
| `WANDB_DIR` | Where to store all generated files. If unset, defaults to the `wandb` directory relative to your training script. Make sure this directory exists and the running user has permission to write to it. This does not control the location of downloaded artifacts, which you can set using the `WANDB_ARTIFACT_DIR` environment variable. |
| `WANDB_IDENTITY_TOKEN_FILE` | For [identity federation](/platform/hosting/iam/identity_federation/), the absolute path to the local directory where JSON Web Tokens (JWTs) are stored. |

* Use the `GORILLA_DATA_RETENTION_PERIOD` environment variable cautiously. It applies to **deleted run data** (including run-associated files such as media after deletion flows). It does **not** delete artifacts; use artifact deletion and `GORILLA_ARTIFACT_GC_ENABLED` as described in [Delete an artifact](/models/artifacts/delete-artifacts). For how deleting runs and files relates to storage and this setting, see [When deleted run data is removed from storage](/models/runs/delete-runs#when-deleted-run-data-is-removed-from-storage) in **Delete runs**. Data is removed according to the retention window once the variable is set.
Back up both the database and the storage bucket before you enable or change this value. Background removal of objects from your bucket is **approximate** and not guaranteed to finish within a specific time. For expectations, troubleshooting, and how this relates to storage costs, see [Manage bucket storage and costs](/platform/hosting/managing-bucket-storage). * To enable `GRAPHQL_REJECT_UNAUTHED_REQUESTS` with the [Kubernetes Operator](/platform/hosting/self-managed/operator), set it on the API component only: ```yaml theme={null} api: env: GRAPHQL_REJECT_UNAUTHED_REQUESTS: "true" ``` Apply your changes and wait for the API pods to roll out before you verify the setting. You can disable the behavior by removing the variable or setting it to another value. ## Advanced reliability settings The following section describes optional configuration you can apply to improve the reliability and performance of your W\&B Server deployment. ### Redis An external Redis server is optional but recommended for production systems. Redis helps improve the reliability of the service and enables caching to decrease load times, especially in large projects. Use a managed Redis service such as ElastiCache with high availability (HA) and the following specifications: * Minimum 4 GB of memory, suggested 8 GB * Redis version 6.x * In transit encryption * Authentication enabled To configure the Redis instance with W\&B, go to the W\&B settings page at `http(s)://YOUR-W&B-SERVER-HOST/system-admin`. Enable the **Use an external Redis instance** option, and fill in the Redis connection string in the following format: Configuring REDIS in W&B You can also configure Redis using the environment variable `REDIS` on the container or in your Kubernetes deployment. Alternatively, you can set up `REDIS` as a Kubernetes secret. This page assumes the Redis instance is running at the default port of `6379`. 
If you configure a different port, set up authentication, and want TLS enabled on the `redis` instance, the connection string format is: `redis://$USER:$PASSWORD@$HOST:$PORT?tls=true` # Dedicated Cloud Source: https://docs.wandb.ai/platform/hosting/hosting-options/dedicated-cloud Learn about W&B Dedicated Cloud deployment features including compliance, data security, IAM, and maintenance policies. W\&B Dedicated Cloud is a fully managed platform with dedicated, isolated infrastructure, deployed in W\&B's AWS, Google Cloud, or Azure cloud accounts. Each Dedicated Cloud instance has its own isolated network, compute, and storage from other W\&B Dedicated Cloud instances. W\&B stores your W\&B-specific metadata and data in isolated cloud storage and processes it using isolated cloud compute services. Use Dedicated Cloud when you need the benefits of a managed W\&B platform along with isolation that helps you meet security, compliance, and data residency requirements. This page describes the compliance, data security, IAM, monitoring, and maintenance features available with Dedicated Cloud, and links to deeper documentation for each topic. W\&B Dedicated Cloud is available in [multiple global regions for each cloud provider](./dedicated-cloud/regions). ## Rate limits W\&B applies default rate limits on Dedicated Cloud to maintain instance stability. See [Rate limits](/platform/hosting/hosting-options/dedicated-cloud/rate-limits) for default values, how limits are enforced, and how to request higher limits when you scale up training. 
## Compliance W\&B Dedicated Cloud supports the following compliance frameworks: * **SOC 2**: W\&B Dedicated Cloud's hosting platform meets the requirements of the [Service and Organization Controls (SOC) 2 Type 2](https://www.aicpa-cima.com/topic/audit-assurance/audit-and-assurance-greater-than-soc-2), published by the [Auditing Standards Board of the American Institute of Certified Public Accountants (AICPA)](https://www.aicpa-cima.com/home). A SOC 2 report evaluates a service organization's controls for security, availability, processing integrity, confidentiality, and privacy. W\&B Dedicated Cloud is subject to periodic internal and external audits to verify continued compliance. Refer to the [W\&B Security Portal](https://security.wandb.ai/) to request the SOC 2 report and other security and compliance documents. * **HIPAA**: When configured appropriately, W\&B Dedicated Cloud meets the requirements of the [Health Insurance Portability and Accountability Act of 1996 (HIPAA)](https://www.hhs.gov/hipaa/for-professionals/index.html). Compliance with HIPAA is a shared responsibility that involves W\&B, the customer, and any third-party services involved in the deployment. Organizations subject to HIPAA must have a **Business Associate Agreement** on file with W\&B. Refer to the [W\&B Security Portal](https://security.wandb.ai/) to request more information. ## Data security Dedicated Cloud provides several mechanisms for controlling where your data is stored and how it can be accessed. The following options let you bring your own storage, restrict network access, and connect privately to your instance. You can bring your own bucket (BYOB) using the [secure storage connector](/platform/hosting/data-security/secure-storage-connector) at the [instance and team levels](/platform/hosting/data-security/secure-storage-connector#configuration-options) to store your files such as models, datasets, and more. 
Similar to W\&B Multi-tenant Cloud, you can configure a single bucket for multiple teams or you can use separate buckets for different teams. If you don't configure secure storage connector for a team, W\&B stores that data in the instance-level bucket. Dedicated Cloud architecture diagram In addition to BYOB with secure storage connector, you can use [IP allowlisting](/platform/hosting/data-security/ip-allowlisting) to restrict access to your Dedicated Cloud instance from only trusted network locations. You can connect privately to your Dedicated Cloud instance using [cloud provider's secure connectivity solution](/platform/hosting/data-security/private-connectivity). You're responsible for ensuring that your deployment complies with your organization's policies and [Security Technical Implementation Guidelines (STIG)](https://en.wikipedia.org/wiki/Security_Technical_Implementation_Guide), if applicable. ## Identity and access management (IAM) Use the identity and access management capabilities for secure authentication and effective authorization in your W\&B Organization. The following features are available for IAM in Dedicated Cloud instances: * Authenticate with [SSO using OpenID Connect (OIDC)](/platform/hosting/iam/sso) or with [LDAP](/platform/hosting/iam/ldap). * [Configure appropriate user roles](/platform/hosting/iam/access-management/manage-organization#assign-or-update-a-users-role) at the scope of the organization and within a team. * Define the scope of a W\&B project to limit who can view, edit, and submit W\&B runs to it with [restricted projects](/platform/hosting/iam/access-management/restricted-projects). * Use JSON Web Tokens with [identity federation](/platform/hosting/iam/identity_federation) to access W\&B APIs. ## Monitor Use [Audit logs](/platform/hosting/monitoring-usage/audit-logging) to track user activity within your teams and to conform to your enterprise governance requirements. 
Also, you can view organization usage in your Dedicated Cloud instance with [W\&B Organization Dashboard](/platform/hosting/monitoring-usage/org_dashboard). ## Maintenance Similar to W\&B Multi-tenant Cloud, you don't incur the overhead and costs of provisioning and maintaining the W\&B platform with Dedicated Cloud. To understand how W\&B manages updates on Dedicated Cloud, refer to the [server release process](/platform/hosting/server-upgrade-process). ## Data retention policy By default, a Dedicated Cloud instance retains the following items for 7 days after deletion: * Runs and history * Non-artifact run files, such as media, configuration files, and log files * Artifacts and artifact references Until this period elapses, you can restore these items. Contact [support](mailto:support@wandb.ai) or your AISE for assistance. To meet your data retention requirements, you can change the data retention period for your Dedicated Cloud instance. Depending on your use case, select the **Environment variable** or **Helm** tab for details. To change the data retention policy, set the environment variable `GORILLA_DATA_RETENTION_PERIOD` to a value in hours. For example, to retain deleted data for 14 days (336 hours): ```bash theme={null} export GORILLA_DATA_RETENTION_PERIOD="336h" ``` To change the data retention policy, set the Helm value `env.dataRetentionPeriod` to a value in hours. For example, to retain deleted data for 14 days (336 hours): ```yaml theme={null} env: dataRetentionPeriod: "336h" ``` ## Migration options W\&B supports migration to Dedicated Cloud from a [Self-Managed instance](/platform/hosting/hosting-options/self-managed) or [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud), subject to specific limits and migration-related constraints. 
## Next steps If you're interested in using Dedicated Cloud, submit the [Dedicated SaaS trial request form](https://wandb.ai/site/for-enterprise/dedicated-saas-trial) to start the process with W\&B. # Export data from Dedicated Cloud Source: https://docs.wandb.ai/platform/hosting/hosting-options/dedicated-cloud/export-data Export runs, metrics, artifacts, and reports from a W&B Dedicated Cloud instance using the Python SDK API. Export the data managed in your W\&B Dedicated Cloud instance, such as runs, metrics, artifacts, and reports, when you need a portable copy for backup, migration, or external analysis. To extract this data, use the W\&B SDK API with the [Import and Export API](/models/ref/python/public-api/). The following table covers some key export use cases. | Purpose | Documentation | | ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Export project metadata | [Projects API](/models/ref/python/public-api/projects) | | Export runs in a project | [Runs API](/models/ref/python/public-api/runs) | | Export reports | [Report and Workspace API](/models/reports/clone-and-export-reports/) | | Export artifacts | [Explore artifact graphs](/models/artifacts/explore-and-traverse-an-artifact-graph/), [Download and use artifacts](/models/artifacts/download-and-use-an-artifact/#download-and-use-an-artifact-stored-on-wb) | If you manage artifacts stored in the Dedicated Cloud with [Secure Storage Connector](/platform/app/settings-page/teams/#secure-storage-connector), you might not need to export the artifacts using the W\&B SDK API. Using the W\&B SDK API to export all of your data can be slow if you have many runs, artifacts, and similar resources. W\&B recommends running the export process in appropriately sized batches to avoid overwhelming your Dedicated Cloud instance. 
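The batching recommendation above can be sketched as follows. This is a minimal illustration, not an official export tool: the helper names `batched` and `export_runs`, the batch size, and the pause length are assumptions chosen for the example. It relies only on `wandb.Api()` and `Api.runs()` from the Public API, and assumes `WANDB_BASE_URL` points at your Dedicated Cloud instance and `WANDB_API_KEY` holds a valid key.

```python
import time
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def export_runs(
    path: str, batch_size: int = 50, pause_seconds: float = 1.0
) -> Iterator[Dict[str, Any]]:
    """Yield one plain dict per run in `path`, given as "entity/project"."""
    import wandb  # imported lazily so `batched` is usable without the SDK

    # Assumption: the instance URL and API key come from the environment
    # (WANDB_BASE_URL and WANDB_API_KEY).
    api = wandb.Api()
    for batch in batched(api.runs(path), batch_size):
        for run in batch:
            yield {"id": run.id, "name": run.name, "config": dict(run.config)}
        # Pause between batches to limit load on the instance.
        time.sleep(pause_seconds)
```

Tune `batch_size` and `pause_seconds` to your instance; the values shown are placeholders, not W\&B recommendations.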
# Rate limits

Source: https://docs.wandb.ai/platform/hosting/hosting-options/dedicated-cloud/rate-limits

Default rate limits on Dedicated Cloud and how to request changes

W\&B Dedicated Cloud applies rate limits to maintain instance stability. W\&B can adjust limits when you scale up training or need higher throughput. Use this page to understand the default limits that apply to your Dedicated Cloud instance and how to request an increase if your workloads need more capacity.

## Default limits and notification

The following section lists the default rate limits and describes how the W\&B SDK behaves when you exceed a limit.

The following default limits help maintain platform stability:

| Limit                            | Default | Scope   |
| -------------------------------- | ------- | ------- |
| Filestream requests per second   | 500     | Project |
| Filestream ingestion per second  | 120 MB  | Project |
| Filestream requests per second   | 2       | Run     |
| Run creation requests per second | 80      | Project |

When a limit is exceeded, requests receive HTTP response `429`, and the message `HTTP 429: rate limit exceeded` appears in the SDK logs.

* Filestream rate limits never cause logging to crash or fail. When the SDK receives a `429` response on a filestream request, it backs off and retries the rate-limited request as-is, while subsequent updates accumulate.
* Run creation rate limits block further training.

```
HTTP 429: rate limit exceeded
```

W\&B recommends these defaults for typical production workloads. If your workloads consistently exceed these limits, contact [W\&B support](mailto:support@wandb.com) or your AI Solutions Engineer (AISE) to request higher limits. Provide details of your experimental setup and usage patterns so that W\&B can scope an appropriate adjustment.
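The back-off-and-retry behavior described for filestream requests follows a common pattern that can be sketched in a few lines. This is an illustration only and not the W\&B SDK's actual implementation: the function names `backoff_delays` and `send_with_retry`, the exponential schedule, and the delay values are all assumptions made for the example.

```python
import time
from typing import Callable, List


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Return delays of base, 2*base, 4*base, ... seconds, each capped at `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(max_retries)]


def send_with_retry(
    send: Callable[[], int],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a status other than 429 or retries run out.

    `send` is a hypothetical callable that performs one request and returns
    its HTTP status code. The same request is retried unchanged.
    """
    status = send()
    for delay in backoff_delays(max_retries):
        if status != 429:
            break
        sleep(delay)  # wait longer after each consecutive 429
        status = send()
    return status
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting, for example by substituting a function that records the requested delays.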
# Supported Dedicated Cloud regions Source: https://docs.wandb.ai/platform/hosting/hosting-options/dedicated-cloud/regions View all supported AWS, Google Cloud, and Azure regions available for W&B Dedicated Cloud instances. AWS, Google Cloud, and Azure support cloud computing services in multiple locations worldwide. Global regions help ensure that you satisfy requirements such as data residency, compliance, latency, and cost efficiency. W\&B supports many of the available global regions for Dedicated Cloud. The following sections list the supported regions for each cloud provider. Reach out to W\&B Support if your preferred AWS, Google Cloud, or Azure region isn't listed. W\&B can validate whether the relevant region has all the services that Dedicated Cloud needs and prioritize support depending on the outcome of the evaluation. ## Supported AWS regions The following table lists [AWS regions](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html) that W\&B supports for Dedicated Cloud instances. | Region location | Region name | | ------------------------ | ---------------- | | US East (Ohio) | `us-east-2` | | US East (N. Virginia) | `us-east-1` | | US West (N. California) | `us-west-1` | | US West (Oregon) | `us-west-2` | | Canada (Central) | `ca-central-1` | | Europe (Frankfurt) | `eu-central-1` | | Europe (Ireland) | `eu-west-1` | | Europe (London) | `eu-west-2` | | Europe (Milan) | `eu-south-1` | | Europe (Stockholm) | `eu-north-1` | | Asia Pacific (Mumbai) | `ap-south-1` | | Asia Pacific (Singapore) | `ap-southeast-1` | | Asia Pacific (Sydney) | `ap-southeast-2` | | Asia Pacific (Tokyo) | `ap-northeast-1` | | Asia Pacific (Seoul) | `ap-northeast-2` | For more information about AWS regions, see [Regions, Availability Zones, and Local Zones](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html) in the AWS Documentation. 
See [What to Consider when Selecting a Region for your Workloads](https://aws.amazon.com/blogs/architecture/what-to-consider-when-selecting-a-region-for-your-workloads/) for an overview of factors to consider when choosing an AWS region. ## Supported Google Cloud regions The following table lists [Google Cloud regions](https://cloud.google.com/compute/docs/regions-zones) that W\&B supports for Dedicated Cloud instances. | Region location | Region name | | --------------- | ------------------------- | | South Carolina | `us-east1` | | N. Virginia | `us-east4` | | Iowa | `us-central1` | | Oregon | `us-west1` | | Los Angeles | `us-west2` | | Las Vegas | `us-west4` | | Toronto | `northamerica-northeast2` | | Belgium | `europe-west1` | | London | `europe-west2` | | Frankfurt | `europe-west3` | | Netherlands | `europe-west4` | | Sydney | `australia-southeast1` | | Tokyo | `asia-northeast1` | | Seoul | `asia-northeast3` | For more information about Google Cloud regions, see [Regions and zones](https://cloud.google.com/compute/docs/regions-zones) in the Google Cloud Documentation. ## Supported Azure regions The following table lists [Azure regions](https://azure.microsoft.com/explore/global-infrastructure/geographies/#geographies) that W\&B supports for Dedicated Cloud instances. | Region location | Region name | | --------------- | --------------- | | Virginia | `eastus` | | Iowa | `centralus` | | Washington | `westus2` | | California | `westus` | | Canada Central | `canadacentral` | | France Central | `francecentral` | | Netherlands | `westeurope` | | Tokyo, Saitama | `japaneast` | | Seoul | `koreacentral` | For more information about Azure regions, see [Azure geographies](https://azure.microsoft.com/explore/global-infrastructure/geographies/#overview) in the Azure Documentation. 
# Multi-tenant Cloud Source: https://docs.wandb.ai/platform/hosting/hosting-options/multi_tenant_cloud Learn about W&B Multi-tenant Cloud, a fully managed deployment on Google Cloud with built-in compliance and data security. This page describes W\&B Multi-tenant Cloud, including its architecture, compliance posture, data security options, and identity and access management capabilities, so you can decide whether it fits your organization's hosting requirements. W\&B Multi-tenant Cloud is a fully managed platform deployed in W\&B's Google Cloud account in [Google Cloud's North America regions](https://cloud.google.com/compute/docs/regions-zones). W\&B Multi-tenant Cloud uses autoscaling in Google Cloud to scale the platform based on changes in traffic. Multi-tenant Cloud architecture diagram W\&B Multi-tenant Cloud scales to meet your organization's needs and supports logging up to 250,000 metrics per project with up to 1 million data points per metric. For larger deployments, contact [support](mailto:support@wandb.com). ## Compliance W\&B Multi-tenant Cloud's hosting platform meets the requirements of the [Service and Organization Controls (SOC) 2 Type 2](https://www.aicpa-cima.com/topic/audit-assurance/audit-and-assurance-greater-than-soc-2), published by the [Auditing Standards Board of the American Institute of Certified Public Accountants (AICPA)](https://www.aicpa-cima.com/home). A SOC 2 report evaluates a service organization's controls for security, availability, processing integrity, confidentiality, and privacy. W\&B Multi-tenant Cloud is subject to periodic internal and external audits to verify continued compliance. Refer to the [W\&B Security Portal](https://security.wandb.ai/) to request the SOC 2 report and other security and compliance documents. W\&B Multi-tenant Cloud doesn't meet the requirements of the [Health Insurance Portability and Accountability Act of 1996 (HIPAA)](https://www.hhs.gov/hipaa/for-professionals/index.html). 
If your organization is subject to HIPAA, consider [W\&B Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) instead. ## Data security For users on Free or Pro plans, W\&B stores all data only in the shared cloud storage and processes it with shared cloud compute services. Depending on your pricing plan, you may be subject to storage limits. Users on an Enterprise plan can [bring their own bucket (BYOB) using the secure storage connector](/platform/hosting/data-security/secure-storage-connector) at the [team level](/platform/hosting/data-security/secure-storage-connector#configuration-options) to store their files such as models, datasets, and more. You can configure a single bucket for multiple teams or you can use separate buckets for different W\&B Teams. If you don't configure BYOB for a team, W\&B stores the team's data in the shared cloud storage. You are responsible for ensuring that your deployment complies with your organization's policies and [Security Technical Implementation Guidelines (STIG)](https://en.wikipedia.org/wiki/Security_Technical_Implementation_Guide), if applicable. ## Data retention policy Multi-tenant Cloud retains the following items for 7 days after deletion: * Runs and history. * Non-artifact run files, such as media, configuration files, and log files. * Artifacts and artifact references. Until this period elapses, W\&B can restore these items. Contact [support](mailto:support@wandb.ai) or your AISE for assistance. ## Identity and access management (IAM) If you're on an Enterprise plan, identity and access management features support secure authentication and authorization for your W\&B deployment: * Authenticate users with SSO using OIDC or SAML. Contact your W\&B team or support if you'd like to configure SSO for your organization. * [Configure user roles](/platform/hosting/iam/access-management/manage-organization#assign-or-update-a-users-role) at the scope of the organization and within a team. 
* Define the scope of a W\&B project to limit who can view, edit, and submit W\&B runs to it with [restricted projects](/platform/hosting/iam/access-management/restricted-projects). ## Billing and usage Organization admins can manage usage and billing for their account from the **Billing** tab in their account view. If using the shared cloud storage on Multi-tenant Cloud, an admin can optimize storage usage across different teams in their organization. ## Maintenance W\&B Multi-tenant Cloud is a multi-tenant, fully managed platform. Because W\&B manages Multi-tenant Cloud, you don't incur the overhead and costs of provisioning and maintaining the W\&B platform. ## Next steps Access [Multi-tenant Cloud directly](https://wandb.ai) to get started with most features for free. To try data security and IAM features, [request an Enterprise trial](https://wandb.ai/site/for-enterprise/multi-tenant-saas-trial). # W&B Self-Managed deployment overview Source: https://docs.wandb.ai/platform/hosting/hosting-options/self-managed Deploy W&B Self-Managed on cloud or on-premises infrastructure W\&B recommends fully managed deployment options such as [W\&B Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud) or [W\&B Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud). W\&B fully managed services are secure to use, with minimal configuration required. W\&B Self-Managed lets you run W\&B Server in infrastructure you control, so you can meet internal policies for data residency, network isolation, and compliance. This overview is for IT, DevOps, and MLOps teams who plan, deploy, and operate W\&B Server on their own cloud account or on-premises infrastructure. Deploy W\&B Server on your [AWS, Google Cloud, or Azure cloud account](#deploy-wb-server-within-Self-Managed-cloud-accounts) or within your [on-premises infrastructure](#deploy-wb-server-in-on-prem-infrastructure). 
Your IT, DevOps, or MLOps team has the following responsibilities: * Provision your deployment. * Secure your infrastructure in accordance with your organization's policies, [Security Technical Implementation Guidelines (STIG)](https://en.wikipedia.org/wiki/Security_Technical_Implementation_Guide), if applicable. * Maintain compliance with your organization's regulatory requirements, such as [Service and Organization Controls (SOC) 2 Type 2](https://www.aicpa-cima.com/topic/audit-assurance/audit-and-assurance-greater-than-soc-2) and [Health Insurance Portability and Accountability Act of 1996 (HIPAA)](https://www.hhs.gov/hipaa/for-professionals/index.html). * Manage upgrades and apply patches. * Maintain your W\&B Self-Managed deployment on an ongoing basis. If your organization is subject to regulatory requirements, consider deploying on [W\&B Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud), which is maintained by W\&B. * W\&B Dedicated Cloud's hosting platform meets the requirements of SOC 2 Type 2. * When configured appropriately, a W\&B Dedicated Cloud deployment complies with HIPAA. Refer to the [W\&B Security Portal](https://security.wandb.ai/) to request more information. ## About the W\&B Kubernetes Operator W\&B Self-Managed installations are delivered and managed through a Kubernetes operator, which handles the day-to-day complexity of running W\&B Server. Use the W\&B Kubernetes Operator to deploy W\&B Self-Managed. The operator simplifies deploying, administering, troubleshooting, and scaling W\&B. The operator connects to a central [deploy.wandb.ai](https://deploy.wandb.ai) server to request the latest specification changes for a given release channel and apply them. The operator receives updates as long as the license is valid. The operator uses [Helm](https://helm.sh/) both to install itself and to manage the W\&B Kubernetes stack. 
The deployment consists of multiple pods, one per service, and each pod name is prefixed with `wandb-`. Configuration follows a hierarchy: **Release Channel Values** (defaults from W\&B), **User Input Values** (overrides via the System Console), and **Custom Resource Values** (your spec overrides both). Use the following resources based on your deployment target: * To deploy W\&B Self-Managed in public cloud or on-premises infrastructure, see [Deploy W\&B with Kubernetes Operator](/platform/hosting/self-managed/operator). * To deploy W\&B Self-Managed to a custom cloud platform that isn't AWS, the requirements are similar to the requirements to deploy in [on-premises infrastructure](#deploy-wb-server-in-on-prem-infrastructure). * To deploy W\&B Self-Managed in air-gapped environments, see [Deploy on Air-Gapped Kubernetes](/platform/hosting/self-managed/on-premises-deployments/kubernetes-airgapped). You can optionally configure rate limits on your instance to maintain stability. For more information, see [Rate limits](/platform/hosting/self-managed/rate-limits). ## Required infrastructure Before you deploy, make sure the following infrastructure is available. All W\&B Self-Managed deployment options depend on these components: * Kubernetes * MySQL 8 database * Amazon S3-compatible object storage * Redis cache W\&B can provide recommendations for the different components and provide guidance through the installation process. For more information, see [Requirements](/platform/hosting/self-managed/requirements). ## Obtain your W\&B Server license A W\&B Server license authorizes your deployment. You must obtain a license before installation. For step-by-step instructions, see [License](/platform/hosting/self-managed/requirements#license) on the Requirements page. # Access management Source: https://docs.wandb.ai/platform/hosting/iam/access-management-intro Manage users, teams, roles, and project visibility as an organization or team admin in W&B. 
This page introduces how access management works in W\&B, including how to administer users and teams, maintain admin access to your organization, and control who can view individual projects. It's intended for organization admins, team admins, and project owners. ## Manage users and teams within an organization This section explains how admin roles are assigned and what each role can do to manage users and teams. W\&B assigns the *instance admin role* to the first user who signs up to W\&B with a unique organization domain. The organization admin assigns specific users team admin roles. W\&B recommends that you have more than one instance admin in an organization. This best practice ensures that admin operations can continue when the primary admin isn't available. A *team admin* is a user in an organization who has administrative permissions within a team. Organization admins can access and use an organization's account settings at `https://wandb.ai/account-settings/` to invite users, assign or update a user's role, create teams, remove users from your organization, and assign the billing admin. See [Add and manage users](/platform/hosting/iam/access-management/manage-organization/#add-and-manage-users) for more information. After an organization admin creates a team, the instance admin or a team admin can: * Invite users to the team or remove users from the team. By default, only an admin can perform this action. To change this behavior, see [Team settings](/platform/app/settings-page/teams/#privacy). * Assign or update a team member's role. * Automatically add new users to a team when they join your organization. Both the organization admin and the team admin use team dashboards at `https://wandb.ai/[YOUR-TEAM-NAME]` to manage teams. For more information, and to configure a team's default privacy settings, see [Add and manage teams](/platform/hosting/iam/access-management/manage-organization/#add-and-manage-teams). 
## Maintain admin access This section describes why you must always preserve admin access and how automated user deprovisioning can put that access at risk. You must ensure that at least one admin user always exists in your instance or organization. Otherwise, no user can configure or maintain your organization's W\&B account. If you manage users interactively, deleting a user requires admin access, including when deleting another admin user. This helps reduce the risk of removing the sole admin user. However, if an organization uses automated processes to deprovision users from W\&B, a deprovisioning operation could inadvertently remove the last remaining admin from the instance or organization. For assistance with developing operational procedures, or to restore admin access, contact [support](mailto:support@wandb.com). ## Limit visibility to specific projects Define the scope of a W\&B project to limit who can view, edit, and submit W\&B runs to it. Limiting who can view a project is particularly useful if a team works with sensitive or confidential data. An organization admin, team admin, or the owner of a project can set and edit a project's visibility. For more information, see [Project visibility](/platform/hosting/iam/access-management/restricted-projects/). # Manage your organization Source: https://docs.wandb.ai/platform/hosting/iam/access-management/manage-organization Manage users, teams, roles, seats, and billing within a W&B organization as an organization admin. This page describes how to administer a W\&B organization, including how to invite users, manage teams, assign roles and seats, and configure billing. Use these procedures to keep your organization's membership, access, and team structure aligned with how your company uses W\&B. As an admin of an organization, you can [manage individual users](#add-and-manage-users) within your organization and [manage teams](#add-and-manage-teams). 
As a team admin, you can [manage teams](#add-and-manage-teams). The following workflow applies to users with instance admin roles. Contact an admin in your organization if you believe you should have instance admin permissions. To simplify user management in your organization, see [Automate user and team management](../automate_iam). ## Change the name of your organization The following workflow only applies to W\&B Multi-tenant Cloud. 1. Navigate to [https://wandb.ai/home](https://wandb.ai/home). 2. In the upper right corner of the page, select the **User menu** dropdown. Within the **Account** section of the dropdown, select **Settings**. 3. Within the **Settings** tab, select **General**. 4. Select the **Change name** button. 5. Within the modal that appears, provide a new name for your organization and select the **Save name** button. ## Add and manage users As an admin, use your organization's dashboard to: * Invite or remove users. * Assign or update a user's organization role, and create custom roles. * Assign the billing admin. An organization admin can add users to an organization in several ways: * Member-by-invite * Auto provisioning with SSO * Domain capture The following sections describe each method. ### Seats and pricing The following table summarizes how seats work for Models and Weave: | Product | Seats | Cost based on | | ------- | ------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Models | Pay per seat | How many Models paid seats you have and how much usage you've accrued determines your overall subscription cost. You can assign each user one of three available seat types: Full, Viewer, or No-Access. | | Weave | Free | Usage based | ### Invite a user Admins can invite users to their organization, as well as to specific teams within the organization. 1.
Navigate to [https://wandb.ai/home](https://wandb.ai/home). 2. In the upper right corner of the page, select the **User menu** dropdown. Within the **Account** section of the dropdown, select **Users**. 3. Select **Invite new user**. 4. In the modal that appears, provide the email or username of the user in the **Email or username** field. 5. Optional: Add the user to a team from the **Choose teams** dropdown menu. 6. From the **Select role** dropdown, select the role to assign to the user. You can change the user's role later. See the table listed in [Assign a role](#assign-or-update-a-team-members-role) for more information about possible roles. 7. Click the **Send invite** button. After you select the **Send invite** button, W\&B sends an invite link to the user's email using a third-party email server. A user can access your organization once they accept the invite. 1. Navigate to `https://.io/console/settings/`. Replace `` with your organization name. 2. Select the **Add user** button. 3. Within the modal that appears, provide the email of the new user in the **Email** field. 4. Select a role to assign to the user from the **Role** dropdown. You can change the user's role later. See the table listed in [Assign a role](#assign-or-update-a-team-members-role) for more information about possible roles. 5. To have W\&B send an invite link to the user's email using a third-party email server, check the **Send invite email to user** box. 6. Select the **Add new user** button. ### Auto provision users Auto-provisioning streamlines onboarding by letting users join your organization automatically through SSO, so admins don't need to send individual invitations. If you configure SSO and your SSO provider permits it, a W\&B user with a matching email domain can log in to your W\&B organization with Single Sign-On (SSO). SSO is available for all Enterprise licenses. **Enable SSO for authentication** W\&B recommends that users authenticate using Single Sign-On (SSO). 
Contact your W\&B team to enable SSO for your organization. For more information about how to set up SSO with Dedicated Cloud or Self-Managed instances, see [SSO with OIDC](/platform/hosting/iam/sso/) or [SSO with LDAP](/platform/hosting/iam/ldap/).

W\&B assigns auto-provisioned users the "Member" role by default. You can change the role of auto-provisioned users at any time.

Auto provisioning with SSO is on by default for Dedicated Cloud instances and Self-Managed deployments. You can turn off auto provisioning. Turning auto provisioning off lets you selectively add specific users to your W\&B organization.

How you turn off auto provisioning with SSO depends on your deployment type:

* Multi-tenant Cloud: Auto provisioning with SSO isn't configurable for Multi-tenant Cloud. Contact your W\&B team for assistance.
* Dedicated Cloud: Contact your W\&B team if you want to turn off auto provisioning with SSO.
* Self-Managed: Use the W\&B Console to turn off auto provisioning with SSO:

  1. Navigate to `https://.io/console/settings/`. Replace `` with your organization name.
  2. Select **Security**.
  3. Select **Disable SSO Provisioning** to turn off auto provisioning with SSO.

Auto provisioning with SSO is useful for adding users to an organization at scale because organization admins don't need to generate individual user invitations.

### Domain capture

Domain capture helps your employees join your company's organization to ensure new users don't create assets outside your company's jurisdiction.

**Domains must be unique**

Domains are unique identifiers. This means that you can't use a domain that's already in use by another organization.

Domain capture lets you automatically add people with a company email address, such as `@example.com`, to your W\&B Multi-tenant Cloud organization. This helps all your employees join the right organization and ensures that new users don't create assets outside your company's jurisdiction.
This table summarizes the behavior of new and existing users with and without domain capture enabled: | | With domain capture | Without domain capture | | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- | | New users | Users who sign up for W\&B from verified domains are automatically added as members to your organization's default team. They can choose additional teams to join at sign up, if you enable team joining. They can still join other organizations and teams with an invitation. | Users can create W\&B accounts without knowing a centralized organization is available. | | Invited users | Invited users automatically join your organization when accepting your invite. Invited users aren't automatically added as members to your organization's default team. They can still join other organizations and teams with an invitation. | Invited users automatically join your organization when accepting your invite. They can still join other organizations and teams with an invitation. | | Existing users | Existing users with verified email addresses from your domains can join your organization's teams within the W\&B App. All data that existing users create before joining your organization remains. W\&B doesn't migrate the existing user's data. | Existing W\&B users may be spread across multiple organizations and teams. | To automatically assign non-invited new users to a default team when they join your organization: 1. Navigate to [https://wandb.ai/home](https://wandb.ai/home). 2. In the upper right corner of the page, select the **User menu** dropdown. 
From the dropdown, choose **Settings**. 3. Within the **Settings** tab, select **General**. 4. Click the **Claim domain** button within **Domain capture**. 5. Select the team that you want new users to automatically join from the **Default team** dropdown. If no teams are available, update team settings. See the instructions in [Add and manage teams](#add-and-manage-teams). 6. Click the **Claim email domain** button. You must enable domain matching within a team's settings before you can automatically assign non-invited new users to that team. 1. Navigate to the team's dashboard at `https://wandb.ai/`, where `` is the name of the team for which you want to enable domain matching. 2. Select **Team settings** in the global navigation on the left side of the team's dashboard. 3. Within the **Privacy** section, toggle the **Recommend new users with matching email domains join this team upon signing up** option. If you use Dedicated Cloud or Self-Managed deployment type, contact your W\&B Account Team to configure domain capture. Once configured, your W\&B Multi-tenant Cloud instance automatically prompts users who create a W\&B account with your company email address to contact your admin to request access to your Dedicated Cloud or Self-Managed instance. | | With domain capture | Without domain capture | | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------- | | New users | Users who sign up for W\&B on Multi-tenant Cloud from verified domains are automatically prompted to contact an admin with an email address you customize. They can still create an organization on Multi-tenant Cloud to trial the product. 
| Users can create W\&B Multi-tenant Cloud accounts without learning that their company has a centralized dedicated instance. | | Existing users | Existing W\&B users may be spread across multiple organizations and teams. | Existing W\&B users may be spread across multiple organizations and teams. | ### Assign or update a user's role A user's organization role controls what they can do across the organization, such as inviting other users, managing teams, or viewing content. Assign or update roles to give each user the appropriate level of administrative authority. Every member in an organization has an organization role and seat for both W\&B Models and Weave. The type of seat they have determines both their billing status and the actions they can take in each product line. You initially assign an organization role to a user when you invite them to your organization. You can change any user's role later. A user within an organization can have one of the following roles: | Role | Descriptions | | -------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Admin | An instance admin who can add or remove other users to the organization, change user roles, manage custom roles, add teams, and more. W\&B recommends having more than one admin in case your admin is unavailable. | | Member | A regular user of the organization, invited by an instance admin. An organization member can't invite other users or manage existing users in the organization. | | Viewer (Enterprise-only feature) | A view-only user of your organization, invited by an instance admin. 
A viewer only has read access to the organization and the underlying teams that they are a member of. | | Custom Roles (Enterprise-only feature) | Custom roles let organization admins compose new roles by inheriting from the preceding View-Only or Member roles and adding additional permissions to achieve fine-grained access control. Team admins can then assign any of those custom roles to users in their respective teams. For more information, see [Add and manage custom roles](#add-and-manage-custom-roles). |

To change a user's role:

1. Navigate to [https://wandb.ai/home](https://wandb.ai/home).
2. In the upper right corner of the page, select the **User menu** dropdown. From the dropdown, choose **Users**.
3. Provide the name or email of the user in the search bar.
4. Select a role from the **TEAM ROLE** dropdown next to the name of the user.

### Assign or update a user's access

While the organization role controls administrative actions, the seat type controls what a user can do within Models and Weave. Use this procedure when you need to change a user's product-level permissions independent of their organization role.

A user within an organization has one of the following Models seat or Weave access types: full, viewer, or no access.

| Seat type | Description |
| --------- | ----------- |
| Full | Users with this seat type have full permissions to write, read, and export data for Models or Weave. |
| Viewer | A view-only user of your organization. A viewer only has read access to the organization and the underlying teams that they are a part of, and view-only access to Models or Weave. |
| No access | Users with this seat type have no access to the Models or Weave products. |

Models seat type and Weave access type are defined at the organization level and inherited by the team.
To change a user's seat type, navigate to the organization settings and follow these steps: 1. For Multi-tenant Cloud users, navigate to your organization's settings at `https://wandb.ai/account-settings//settings`. Replace the values enclosed in angle brackets (`<>`) with your organization name. For Dedicated Cloud and Self-Managed deployments, navigate to `https://.wandb.io/org/dashboard`. 2. Select the **Users** tab. 3. From the **Role** dropdown, select the seat type you want to assign to the user. The organization role and subscription type determine which seat types are available within your organization. ### Remove a user 1. Navigate to [https://wandb.ai/home](https://wandb.ai/home). 2. In the upper right corner of the page, select the **User menu** dropdown. From the dropdown, choose **Users**. 3. Provide the name or email of the user in the search bar. 4. Select the **action ()** menu when it appears. 5. From the dropdown, choose **Remove member**. ### Assign the billing admin 1. Navigate to [https://wandb.ai/home](https://wandb.ai/home). 2. In the upper right corner of the page, select the **User menu** dropdown. From the dropdown, choose **Users**. 3. Provide the name or email of the user in the search bar. 4. Under the **Billing admin** column, choose the user you want to assign as the billing admin. ## Add and manage teams Teams group related users together so they can collaborate on projects and share resources within the organization. The following sections describe how to create teams, invite users to them, and manage team membership and roles. Use your organization's dashboard to create and manage teams within your organization. An organization admin or a team admin can: * Invite users to a team or remove users from a team. * Manage a team member's roles. * Automate the addition of users to a team when they join your organization. * Manage team storage with the team's dashboard at `https://wandb.ai/`. 
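Team membership tasks like those in the preceding list can also be scripted instead of performed in the dashboard. The following is a minimal sketch, not a definitive implementation. It assumes a team object that exposes an `invite(username_or_email, admin=False)` method, shaped like the `Team` class in the W\&B Python SDK public API, and that the method returns a truthy value on success. The helper name `invite_users` and the `TeamLike` protocol are hypothetical names introduced for illustration.

```python
from typing import Dict, Iterable, Protocol


class TeamLike(Protocol):
    """Hypothetical protocol: any object with an invite method shaped like
    the SDK's Team.invite (an assumption, check the SDK reference)."""

    def invite(self, username_or_email: str, admin: bool = False) -> bool: ...


def invite_users(
    team: TeamLike, users: Iterable[str], admin: bool = False
) -> Dict[str, bool]:
    """Invite each user to the team and record whether the invite succeeded."""
    results: Dict[str, bool] = {}
    for user in users:
        # Each call sends one invite; a falsy result is recorded as False.
        results[user] = bool(team.invite(user, admin=admin))
    return results
```

With the SDK installed and authenticated, you could obtain the team with something like `wandb.Api().team("<team-name>")` and pass it to `invite_users`. Returning a per-user result makes it easier to retry only the invites that failed.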
### Create a team

Use your organization's dashboard to create a team:

1. Navigate to [https://wandb.ai/home](https://wandb.ai/home).
2. Select **Create a team to collaborate** on the left navigation panel underneath **Teams**.
3. Provide a name for your team in the **Team name** field in the modal that appears.
4. Choose a storage type.
5. Select the **Create team** button.

After you select the **Create team** button, W\&B redirects you to a new team page at `https://wandb.ai/`, where `` consists of the name you provide when you create a team.

Once you have a team, you can add users to that team.

### Invite users to a team

Invite users to a team in your organization. Use the team's dashboard to invite users by email address, or by W\&B username if they already have a W\&B account.

1. Navigate to `https://wandb.ai/`.
2. Select **Team settings** in the global navigation on the left side of the dashboard.
3. Select the **Users** tab.
4. Click **Invite a new user**.
5. Within the modal that appears, provide the email or username of the user in the **Email or username** field and select the role to assign to that user from the **Select a team** role dropdown. For more information about roles a user can have in a team, see [Assign or update a team member's role](#assign-or-update-a-team-members-role).
6. Click the **Send invite** button.

By default, only a team or instance admin can invite members to a team. To change this behavior, see [Team settings](/platform/app/settings-page/teams#privacy).

Besides inviting users manually with email invites, you can automatically add new users to a team if the new user's [email matches the domain of your organization](#domain-capture).

### Match members to a team organization during sign up

Allow new users within your organization to discover teams within your organization when they sign up. New users must have a verified email domain that matches your organization's verified email domain.
Verified new users can view a list of verified teams that belong to an organization when they sign up for a W\&B account. An organization admin must enable domain claiming. To enable domain capture, see the steps described in [Domain capture](#domain-capture). ### Assign or update a team member's role A team member's role determines what they can do within the team, such as managing other members, contributing content, or viewing data only. Update a team member's role when their responsibilities within the team change. 1. Select the account type icon next to the name of the team member. 2. From the dropdown, choose the account type you want that team member to possess. This table lists the roles you can assign to a member of a team: | Role | Definition | | -------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Admin | A user who can add and remove other users in the team, change user roles, and configure team settings. | | Member | A regular user of a team, invited by email or their organization-level username by the team admin. A member user can't invite other users to the team. | | View-Only (Enterprise-only feature) | A view-only user of a team, invited by email or their organization-level username by the team admin. A view-only user only has read access to the team and its contents. | | Custom Roles (Enterprise-only feature) | Custom roles let organization admins compose new roles by inheriting from the preceding View-Only or Member roles and adding additional permissions to achieve fine-grained access control. 
Team admins can then assign any of those custom roles to users in their respective teams. For more information, see the [custom roles announcement](https://wandb.ai/wandb_fc/announcements/reports/Introducing-Custom-Roles-for-W-B-Teams--Vmlldzo2MTMxMjQ3). | Service accounts aren't users, but rather non-human identities used for automation. **Service accounts** provide API keys for automated workflows and don't consume user licenses. For more information about creating and managing service accounts, see [Use service accounts to automate workflows](/platform/hosting/iam/service-accounts/). Only enterprise licenses on Dedicated Cloud or Self-Managed deployment can assign custom roles to members in a team. ### Remove users from a team Remove a user from a team using the team's dashboard. W\&B preserves runs created in a team, even if the member who created the runs is no longer on that team. 1. Navigate to `https://wandb.ai/`. 2. Select **Team settings** in the left navigation bar. 3. Select the **Users** tab. 4. Hover your mouse next to the name of the user you want to delete. Select the **action ()** menu when it appears. 5. From the dropdown, select **Remove user**. ## Add and manage custom roles Custom roles let you tailor permissions beyond the built-in roles when the standard View-Only, Member, and Admin roles don't match your organization's access requirements. Use custom roles to grant specific permissions while still building on a familiar base role. An Enterprise license is required to create or assign custom roles on Dedicated Cloud or Self-Managed deployments. Organization admins can compose a new role based on either the View-Only or Member role and add additional permissions to achieve fine-grained access control. Team admins can assign a custom role to a team member. You create custom roles at the organization level but assign them at the team level. To create a custom role: 1. Navigate to [https://wandb.ai/home](https://wandb.ai/home). 2. 
In the upper right corner of the page, select the **User menu** dropdown. Within the **Account** section of the dropdown, select **Settings**. 3. Click **Roles**. 4. In the **Custom roles** section, click **Create a role**. 5. Provide a name for the role. Optionally provide a description. 6. Choose the role to base the custom role on, either **Viewer** or **Member**. 7. To add permissions, click the **Search permissions** field, then select one or more permissions to add. 8. Review the **Custom role permissions** section, which summarizes the permissions the role has. 9. Click **Create Role**. 1. Navigate to `https://.wandb.io/org/settings/`. Replace `` with your organization name. 2. In the **Custom roles** section, click **Create a role**. 3. Provide a name for the role. Optionally provide a description. 4. Choose the role to base the custom role on, either **Viewer** or **Member**. 5. To add permissions, click the **Search permissions** field, then select one or more permissions to add. 6. Review the **Custom role permissions** section, which summarizes the permissions the role has. 7. Click **Create Role**. A team admin can now assign the custom role to members of a team from the [Team settings](#invite-users-to-a-team). # Manage access control for projects Source: https://docs.wandb.ai/platform/hosting/iam/access-management/restricted-projects Manage project access using visibility scopes and project-level roles Define the scope of a W\&B project to limit who can view, edit, and submit W\&B runs to it. This page is for team and organization admins, and project owners, who need to control access to sensitive workflows or limit collaboration to a specific group of users. You can combine two controls to configure the access level for any project within a W\&B team. **Visibility scope** is the higher-level mechanism. Use it to control which groups of users can view or submit runs in a project. 
For a project with *Team* or *Restricted* visibility scope, you can then use **Project level roles** to control the level of access that each user has within the project. The owner of a project, a team admin, or an organization admin can set or edit a project's visibility. ## Visibility scopes Visibility scope determines which users in your organization can see and contribute to a project. You can choose from four project visibility scopes. From most public to most private, they are: | Scope | Description | | ---------- | ---------------------------------------------------------------------------------------------------------------------------------- | | Open | Anyone who knows about the project can view it and submit runs or reports. | | Public | Anyone who knows about the project can view it. Only your team can submit runs or reports. | | Team | Only members of the parent team can view the project and submit runs or reports. Anyone outside the team can't access the project. | | Restricted | Only invited members from the parent team can view the project and submit runs or reports. | Set a project's scope to **Restricted** if you want to collaborate on workflows related to sensitive or confidential data. When you create a restricted project within a team, you can invite or add specific members from the team to collaborate on relevant experiments, artifacts, and reports. Unlike other project scopes, all members of a team don't get implicit access to a restricted project. At the same time, team admins can join restricted projects if needed. ### Set visibility scope on a new or existing project Set a project's visibility scope when you create a project or when you edit it later. The following sections describe both workflows. * Only the owner of the project or a team admin can set or edit its visibility scope. 
* When a team admin enables **Make all future team projects private (public sharing not allowed)** within a team's privacy setting, that turns off **Open** and **Public** project visibility scopes for that team. In this case, your team can only use **Team** and **Restricted** scopes.

#### Set visibility scope when you create a project

To set the visibility scope for a new project:

1. Navigate to your W\&B organization on a W\&B Multi-tenant Cloud, Dedicated Cloud, or Self-Managed instance.
2. Click the **Create a new project** button in the left-hand sidebar's **My projects** section. Alternatively, navigate to the **Projects** tab of your team and click the **Create new project** button in the upper right-hand corner.
3. After you select the parent team and enter the name of the project, select the desired scope from the **Project Visibility** dropdown.

Complete the following step if you select **Restricted** visibility.

4. Provide names of one or more W\&B team members in the **Invite team members** field. Add only those members who are essential to collaborate on the project, since other team members don't get implicit access to a restricted project.

You can add or remove members in a restricted project later, from its **Users** tab.

W\&B creates the project with the selected visibility scope, and only the invited members (for a restricted project) or in-scope users can access it.

#### Edit visibility scope of an existing project

To change the visibility scope of an existing project:

1. Navigate to your W\&B project.
2. Select the **Overview** tab on the left column.
3. Click the **Edit Project Details** button on the upper right corner.
4. From the **Project Visibility** dropdown, select the desired scope.

Complete the following step if you select **Restricted** visibility.

5. 
Navigate to the **Users** tab in the project, and click the **Add user** button to invite specific users to the restricted project. W\&B updates the project's visibility scope, and access reflects the new scope along with any invited members. * All members of a team lose access to a project if you change its visibility scope from **Team** to **Restricted**, unless you invite the required team members to the project. * All members of a team get access to a project if you change its visibility scope from **Restricted** to **Team**. * If you remove a team member from the user list for a restricted project, they lose access to that project. ### Other key things to note for restricted scope Keep the following behaviors in mind when working with restricted projects: * If you want to use a team-level service account in a restricted project, you must invite or add it specifically to the project. Otherwise a team-level service account can't access a restricted project by default. * You can't move runs from a restricted project, but you can move runs from a non-restricted project to a restricted one. * You can convert the visibility of a restricted project to only **Team** scope, irrespective of the team privacy setting **Make all future team projects private (public sharing not allowed)**. * If the owner of a restricted project isn't part of the parent team anymore, the team admin must change the owner to maintain access to the project. ## Project level roles After you set a project's visibility scope, you can further refine each user's permissions inside the project by assigning a project level role. For *Team* or *Restricted* scoped projects in your team, you can assign a specific role to a user, which can differ from that user's team level role. For example, if a user has *Member* role at the team level, you can assign the *View-Only*, *Admin*, or any available custom role to that user within a *Team* or *Restricted* scope project in that team. 
Project level roles are in preview on W\&B Multi-tenant Cloud, Dedicated Cloud, and Self-Managed instances. ### Assign a project level role to a user To assign a project level role: 1. Navigate to your W\&B project. 2. Select the **Overview** tab on the left column. 3. Navigate to the **Users** tab in the project. 4. Click the currently assigned role for the relevant user in the **Project Role** field, which opens a dropdown listing the other available roles. 5. Select another role from the dropdown. The change saves instantly. When you change the project level role for a user to be different from their team level role, the project level role includes a **\*** to indicate the difference. ### Other key things to note for project level roles Keep the following behaviors in mind when assigning or changing project level roles: * By default, project level roles for all users in a *Team* or *Restricted* scoped project **inherit** their respective team level roles. * You **can't** change the project level role of a user who has *View-Only* role at the team level. * If the project level role for a user within a particular project **is the same as** the team level role, and a team admin later changes the team level role, W\&B automatically changes the relevant project role to track the team level role. * If you change the project level role for a user within a particular project such that **it differs from** the team level role, and a team admin later changes the team level role, the relevant project level role remains as is. * If you remove a user from a *Restricted* project when their project level role differed from the team level role, and you then add the user back to the project later, they inherit the team level role due to the default behavior. If needed, change the project level role again to differ from the team level role. 
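The inheritance rules in the preceding list can be easier to follow as a small model. The following is an illustrative sketch only, not how W\&B implements roles. The class `ProjectMembership` and its method names are hypothetical, and the model assumes that setting a project level role equal to the team level role is the same as inheriting it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectMembership:
    """Hypothetical model of one user's role within one project."""

    team_role: str
    # None means the project level role inherits the team level role.
    override: Optional[str] = None

    @property
    def project_role(self) -> str:
        """Effective project level role: the override if set, else the team role."""
        return self.override if self.override is not None else self.team_role

    def set_project_role(self, role: str) -> None:
        # A user who is View-Only at the team level can't get a different project role.
        if self.team_role == "View-Only":
            raise PermissionError("Can't change the project role of a View-Only team member")
        # Assumption: choosing the same role as the team role means "inherit".
        self.override = None if role == self.team_role else role

    def set_team_role(self, role: str) -> None:
        # An inherited project role tracks this change; a differing override stays as is.
        # The source doesn't describe what happens to an existing override if the
        # new team role is View-Only, so this sketch leaves the override untouched.
        self.team_role = role

    def remove_and_re_add(self) -> None:
        # Re-adding a user to a Restricted project resets them to the inherited role.
        self.override = None


user = ProjectMembership(team_role="Member")
user.set_project_role("Admin")   # project role now differs from the team role
user.set_team_role("View-Only")  # the differing project role remains Admin
user.remove_and_re_add()         # back to inheriting the team level role
```

Storing only the difference from the team level role, rather than a copy of the role, is what lets an inherited role follow later team level changes automatically.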
# Advanced IAM configuration
Source: https://docs.wandb.ai/platform/hosting/iam/advanced_env_vars

Configure advanced IAM options for W&B with environment variables for SSO, session length, OIDC, and LDAP settings.

In addition to basic [environment variables](../env-vars), you can use environment variables to configure advanced IAM options for your [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) or [Self-Managed](/platform/hosting/hosting-options/self-managed) instance. Use these variables to customize SSO behavior, session expiration, OIDC and LDAP integration, and other identity-related settings to match your organization's security and access requirements.

Choose any of the following environment variables for your instance depending on your IAM needs.

| Environment variable | Description |
| -------------------- | ----------- |
| `DISABLE_SSO_PROVISIONING` | Set this to `true` to turn off user auto-provisioning in your W\&B instance. |
| `SESSION_LENGTH` | To change the default user session expiry time, set this variable to the desired number of hours. For example, set `SESSION_LENGTH` to `24` to configure session expiry time to 24 hours. The default value is 720 hours. |
| `GORILLA_ENABLE_SSO_GROUPS_CLAIMS` | If you use OIDC-based SSO, set this variable to `true` to automate W\&B team membership in your instance based on your OIDC groups. You must also add a `groups` claim to the user OIDC token, formatted as a string array of all team names the user is part of. |
| `GORILLA_LDAP_GROUP_SYNC` | If you use LDAP-based SSO, set this variable to `true` to automate W\&B team membership in your instance based on your LDAP groups. |
| `GORILLA_OIDC_CUSTOM_SCOPES` | If you use OIDC-based SSO, you can specify additional [scopes](https://auth0.com/docs/get-started/apis/scopes/openid-connect-scopes) that the W\&B instance requests from your identity provider. These custom scopes don't change the SSO functionality. |
| `GORILLA_OIDC_SECRET` | If you use OIDC-based SSO and your IdP requires an OIDC Client Secret, set this variable to the secret. |
| `GORILLA_USE_IDENTIFIER_CLAIMS` | If you use OIDC-based SSO, set this variable to `true` to enforce the username and full name of your users using specific OIDC claims from your identity provider. If set, ensure that you configure the enforced username and full name in the `preferred_username` and `name` OIDC claims respectively. Usernames can only contain alphanumeric characters along with underscores and hyphens as special characters. |
| `GORILLA_DISABLE_PERSONAL_ENTITY` | When set to `true`, turns off [personal entities](/support/models/articles/what-is-the-difference-between-team-and-). Prevents users from creating new personal projects in their personal entities and from writing to existing personal projects. |
| `GORILLA_DISABLE_ADMIN_TEAM_ACCESS` | Set this to `true` to restrict Organization or Instance Admins from self-joining or adding themselves to a W\&B team, ensuring that only Data and AI personas have access to the projects within the teams. |
| `WANDB_IDENTITY_TOKEN_FILE` | For [identity federation](/platform/hosting/iam/identity_federation/), the absolute path to the local directory where JSON Web Tokens (JWTs) are stored. |

W\&B advises caution and understanding all implications before you enable some of these settings, such as `GORILLA_DISABLE_ADMIN_TEAM_ACCESS`. Contact your W\&B team with any questions.
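Before you turn on `GORILLA_USE_IDENTIFIER_CLAIMS` or `GORILLA_ENABLE_SSO_GROUPS_CLAIMS`, it can help to check a sample of decoded OIDC claims against the requirements in the preceding table. The following is a minimal sketch under stated assumptions: the function name `check_identity_claims` is hypothetical, the claims are assumed to be already decoded into a dictionary, and "alphanumeric" is interpreted as ASCII letters and digits.

```python
import re
from typing import Any, Dict, List

# Assumption: "alphanumeric" means ASCII letters and digits; underscores and
# hyphens are the only special characters allowed.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def check_identity_claims(claims: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a decoded OIDC claims dictionary."""
    problems: List[str] = []

    username = claims.get("preferred_username")
    if not isinstance(username, str) or not USERNAME_PATTERN.fullmatch(username):
        problems.append("preferred_username must use only letters, digits, '_' and '-'")

    name = claims.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("name must be a non-empty string")

    # The groups claim only matters if you automate team membership from OIDC groups.
    if "groups" in claims:
        groups = claims["groups"]
        if not (isinstance(groups, list) and all(isinstance(g, str) for g in groups)):
            problems.append("groups must be an array of team name strings")

    return problems


sample = {"preferred_username": "ada_lovelace-1", "name": "Ada Lovelace", "groups": ["ml-team"]}
print(check_identity_claims(sample))
```

An empty list means the sample claims satisfy the checks in this sketch. It doesn't verify token signatures or confirm that the listed team names exist in your instance.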
# Automate user and team management Source: https://docs.wandb.ai/platform/hosting/iam/automate_iam Automate user and team management at scale in W&B using the SCIM API and the Python SDK API. This page describes how to automate user and team management at scale using the [SCIM API](/platform/hosting/iam/scim) and the [Python SDK API](/models/ref/python/public-api). ## SCIM API Use the W\&B SCIM API to manage your W\&B organization's users and teams at scale through an identity provider (IdP) like Okta or Microsoft Entra. For more information, see [Authentication](./scim#authentication) on the SCIM reference page. W\&B's implementation includes endpoints for creating and managing custom roles and for assigning built-in and custom roles. Role endpoints aren't part of the official SCIM schema. W\&B adds role endpoints to support automated management of custom roles. The following sections describe each category of the SCIM API. ### User SCIM API The [User SCIM API](./scim#user-resource) lets you create, deactivate, fetch, and list users in a W\&B organization, and assign predefined or custom roles. For complete request and response examples, see the [SCIM reference](./scim#user-management). To deactivate a user, send `PATCH /scim/Users/{id}` with `{"active": false}`. The hosting option determines the outcome: Dedicated Cloud and Self-Managed deployments retain the user record, while Multi-tenant Cloud removes the user from the organization. Reactivation isn't available in Multi-tenant Cloud. Re-add the user instead. See [Deactivate user](./scim#deactivate-user) and [Reactivate user](./scim#reactivate-user). ### Group SCIM API The [Group SCIM API](./scim#group-resource) lets you manage W\&B teams, including creating or removing teams in an organization. To add or remove users in an existing team, use `PATCH Group`. W\&B has no notion of a "group of users having the same role." 
A W\&B team closely resembles a group and lets users with different roles work collaboratively on a set of related projects. Teams can consist of different groups of users. Assign each user in a team a role: team admin, member, viewer, or a custom role. W\&B maps Group SCIM API endpoints to W\&B teams because of the similarity between groups and W\&B teams. ### Custom role SCIM API The [Custom Role SCIM API](./scim#role-resource) lets you manage custom roles, including creating, listing, or updating custom roles in an organization. Delete a custom role with caution. To delete a custom role within a W\&B organization, use the `DELETE Role` endpoint. W\&B assigns the inherited predefined role to all users who had the custom role before deletion. To update the inherited role for a custom role, use the `PUT Role` endpoint. This operation doesn't affect any existing non-inherited custom permissions in the custom role. ## W\&B Python SDK API Use the [W\&B Python SDK API](/models/ref/python/public-api/api) to manage organization users, teams, and team membership through the following classes: * [`User`](/models/ref/python/public-api/user) * [`Team`](/models/ref/python/public-api/team) * [`Member`](/models/ref/python/public-api/member) # Use federated identities with SDK Source: https://docs.wandb.ai/platform/hosting/iam/identity_federation Use identity federation with JSON Web Tokens (JWTs) to authenticate with the W&B SDK and CLI without API keys. Use identity federation to sign in to the W\&B SDK and CLI with your organizational credentials, instead of using a long-lived API key. If your W\&B organization admin has configured SSO for your organization, you already use those credentials to sign in to the W\&B app UI. Identity federation is like SSO for the W\&B SDK, but uses JSON Web Tokens (JWTs) directly. Use identity federation as an alternative to API keys. This page is for organization admins who configure the JWT issuer for a W\&B organization. 
It's also for users or service accounts that authenticate to W\&B using JWTs. [RFC 7523](https://datatracker.ietf.org/doc/html/rfc7523) forms the underlying basis for identity federation with SDK. Identity federation is available in preview for Multi-tenant Cloud, Dedicated Cloud, and Self-Managed. An [Enterprise license](/platform/hosting/enterprise-licenses) is required. For details or assistance, contact your AISE or [support](mailto:support@wandb.com). This document uses the terms "identity provider" and "JWT issuer" interchangeably. Both refer to the same thing in the context of this capability. ## Set up the JWT issuer Before users can authenticate with JWTs, an organization admin must set up a federation between your W\&B organization and a publicly accessible JWT issuer. 1. Navigate to the **Settings** tab in your organization dashboard. 2. In the **Authentication** option, click **Set up JWT Issuer**. 3. Add the JWT issuer URL in the text box and click **Create**. W\&B automatically looks for an OIDC discovery document at the path `${ISSUER_URL}/.well-known/openid-configuration`. From the discovery document, W\&B locates the JSON Web Key Set (JWKS) at the relevant URL. W\&B uses the JWKS for real-time validation of the JWTs to ensure that the relevant identity provider issued them. After this step, your W\&B organization is federated with the JWT issuer. Users in your organization can then authenticate to W\&B using JWTs issued by that provider. ## Use the JWT to access W\&B After an organization admin sets up a JWT issuer, users can start accessing W\&B projects using JWTs issued by that identity provider. The mechanism for using JWTs is as follows: 1. You must sign in to the identity provider using one of the mechanisms available in your organization. You can access some providers in an automated manner using an API or SDK, while others are only accessible through a relevant UI. 
Contact your W\&B organization admin or the owner of the JWT issuer for details. 2. After you've retrieved the JWT by signing in to your identity provider, store it in a file at a secure location. Configure the absolute file path in an environment variable `WANDB_IDENTITY_TOKEN_FILE`. 3. Access your W\&B project using the W\&B SDK or CLI. The SDK or CLI automatically detects the JWT and exchanges it for a W\&B access token after validating the JWT. The W\&B access token grants access to the relevant APIs for enabling your AI workflows, such as logging runs, metrics, and artifacts. By default, the access token is stored at the path `~/.config/wandb/credentials.json`. You can change that path by specifying the environment variable `WANDB_CREDENTIALS_FILE`. JWTs are short-lived credentials that address the shortcomings of long-lived credentials such as API keys and passwords. The JWT expiry time depends on your identity provider's configuration. Refresh the JWT before it expires, and ensure that it's stored in the file referenced by the environment variable `WANDB_IDENTITY_TOKEN_FILE`. The W\&B access token also has a default expiry duration, after which the SDK or CLI tries to refresh it using your JWT. If the user JWT has also expired by that time and isn't refreshed, authentication fails. If possible, implement the JWT retrieval and post-expiry refresh mechanism as part of the AI workload that uses the W\&B SDK or CLI. ### JWT validation To ensure that only valid tokens grant access, the JWT undergoes the following validations. These validations run when the SDK or CLI exchanges the JWT for a W\&B access token and then accesses a project: * W\&B verifies the JWT signature using the JWKS at the W\&B organization level. This is the first line of defense, and if this fails, that means there's a problem with your JWKS or how your JWT is signed. * The `iss` claim in the JWT must equal the issuer URL configured at the organization level. 
* The `sub` claim in the JWT must equal the user's email address as configured in the W\&B organization. * The `aud` claim in the JWT must equal the name of the W\&B organization that houses the project you're accessing as part of your AI workflow. On [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) or [Self-Managed](/platform/hosting/hosting-options/self-managed) instances: * To skip audience validation, you can set the environment variable `FEDERATED_AUTH_AUDIENCES` to `wandb`. * Some organizations have specific requirements for the audience. To customize the `aud` value, set the environment variable `FEDERATED_AUTH_AUDIENCES` to a string with a comma-separated list of audience values. * W\&B checks the `exp` claim in the JWT to determine whether the token is valid or has expired and needs to be refreshed. ## External service accounts W\&B has long supported built-in service accounts with long-lived API keys. With identity federation for the SDK and CLI, you can also bring external service accounts that use JWTs for authentication. The organization's configured issuer must issue those JWTs. A team admin can configure external service accounts within the scope of a team, like the built-in service accounts. To configure an external service account, a team admin must: 1. Navigate to the **Service Accounts** tab for your team. 2. Click **New service account**. 3. Provide a name for the service account. 4. Select **Federated Identity** as the **Authentication Method**, then provide a **Subject**. See [Determine the Subject value for your identity provider](#determine-the-subject-value-for-your-identity-provider). 5. Click **Create**. After this step, the external service account is registered with the team and can use JWTs issued by the configured identity provider to access W\&B. The `sub` claim in the external service account's JWT must equal the subject configured by the team admin in the team-level **Service Accounts** tab. 
W\&B verifies that claim as part of [JWT validation](#jwt-validation). The `aud` claim requirement is similar to that for human user JWTs. When [using an external service account's JWT to access W\&B](#use-the-jwt-to-access-wb), it's often easier to automate the workflow. Automation generates the initial JWT and refreshes it as needed. To attribute runs logged using an external service account to a human user, configure the environment variables `WANDB_USERNAME` or `WANDB_USER_EMAIL` for your AI workflow, similar to built-in service accounts. W\&B recommends using a mix of built-in and external service accounts across your AI workloads with different levels of data sensitivity. This mix strikes a balance between flexibility and simplicity. ### Determine the Subject value for your identity provider The value you enter for **Subject** must exactly match the `sub` (subject) claim in JWTs issued by your IdP for the service account. W\&B applies the same comparison for every identity provider. The match is exact, case-sensitive, and whitespace-sensitive, so even a trailing space or a difference in capitalization causes authentication to fail. The W\&B App accepts any non-empty **Subject** without validating its value, since the value is entirely dependent on the IdP. W\&B can't detect an incorrect value when you create the service account. Instead, authentication fails later, when the service account presents its JWT. The most reliable way to determine the correct value is to read it from a real token. Obtain a sample JWT issued for the service account, base64url-decode its payload (the middle segment, between the two dots) locally, and copy the `sub` value verbatim into the **Subject** field. Don't paste JWTs into third-party online decoders, because a JWT is a credential. The value differs by provider. 
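The local decoding step described above can be sketched in a few lines of Python using only the standard library. This is an illustration, not a W\&B tool: the function name and the demo token are hypothetical, and the code only reads the payload. It doesn't verify the signature.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload (middle segment) of a JWT without verifying it."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload_b64 = parts[1]
    # base64url segments omit '=' padding, so restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64url(data: bytes) -> str:
    """Encode bytes as unpadded base64url, as JWT segments are."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


# Hypothetical demo token so the sketch runs on its own. In practice, read the
# token from the file referenced by WANDB_IDENTITY_TOKEN_FILE instead.
demo_token = ".".join([
    _b64url(json.dumps({"alg": "none"}).encode()),
    _b64url(json.dumps({"sub": "hypothetical-subject-id"}).encode()),
    "signature-not-checked",
])

# repr() makes leading or trailing whitespace in the value visible.
print(repr(decode_jwt_payload(demo_token)["sub"]))
```

Printing the value with `repr()` is deliberate: because the **Subject** match is exact, you want to see any stray whitespace before copying the value.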
The following are typical, but always confirm them against an actual token:

| Identity provider | Where to find the `sub` value |
| ------------------ | ----------------------------- |
| Microsoft Entra ID | The service principal's **Object ID**, found under **Enterprise Applications** in the [Microsoft Entra admin center](https://entra.microsoft.com). Use the **Enterprise Application** (service principal) Object ID, not the **App registration** Object ID. For app-only (client credentials) tokens, this is typically the value Entra ID places in `sub`. |
| Google Cloud (GCP) | The `sub` value from the ID token that Google issues for the service account. |

# Configure SSO with LDAP

Source: https://docs.wandb.ai/platform/hosting/iam/ldap

Configure LDAP-based SSO authentication for W&B Server including connection parameters and environment variables.

This guide is for W\&B Admins who enable LDAP-based single sign-on (SSO) for W\&B Server, so that users can authenticate against an existing LDAP directory instead of managing separate W\&B credentials. It explains how to configure the LDAP connection from the W\&B App system settings UI or with environment variables. It also describes the required and optional configuration parameters, including the address, base distinguished name, and attributes. You can set up either an anonymous bind, or bind with an administrator DN and password.

Only W\&B Admin roles can enable and configure LDAP authentication.

## Configure LDAP connection

Choose one of the following methods to configure the LDAP connection.
Use the W\&B App to configure LDAP through the system settings UI, or use environment variables to configure LDAP at deployment time.

To configure LDAP through the W\&B App:

1. Go to the W\&B App.
2. Select your profile icon, then select **System Settings**.
3. Toggle **Configure LDAP Client**.
4. Add the details in the form. For more information about each input, see [Configuration parameters](#configuration-parameters).
5. Click **Update Settings** to test your settings. This step establishes a test client or connection with W\&B Server.
6. If your connection is verified, toggle **Enable LDAP Authentication** and select **Update Settings**.

After you complete these steps, W\&B Server authenticates users against your LDAP directory.

To configure LDAP with environment variables, set the following:

| Environment variable | Required | Example |
| ----------------------------- | -------- | ------------------------------- |
| `LOCAL_LDAP_ADDRESS` | Yes | `ldaps://ldap.example.com:636` |
| `LOCAL_LDAP_BASE_DN` | Yes | `dc=example,dc=org` |
| `LOCAL_LDAP_BIND_DN` | No | `cn=admin,dc=example,dc=org` |
| `LOCAL_LDAP_BIND_PW` | No | |
| `LOCAL_LDAP_ATTRIBUTES` | Yes | `email=mail,group=gidNumber` |
| `LOCAL_LDAP_TLS_ENABLE` | No | |
| `LOCAL_LDAP_GROUP_ALLOW_LIST` | No | |
| `LOCAL_LDAP_LOGIN` | No | |

The [Configuration parameters](#configuration-parameters) section defines each environment variable. For clarity, the definition names omit the environment variable prefix `LOCAL_LDAP_`.

## Configuration parameters

The following table lists and describes required and optional LDAP configurations.
| Environment variable | Definition | Required |
| -------------------- | ---------- | -------- |
| `ADDRESS` | The address of your LDAP server within the VPC that hosts W\&B Server. | Yes |
| `BASE_DN` | The root path that searches start from, required for any queries into this directory. | Yes |
| `BIND_DN` | Path of the administrative user registered in the LDAP server. Required if the LDAP server doesn't support unauthenticated binding. If specified, W\&B Server connects to the LDAP server as this user. Otherwise, W\&B Server connects with anonymous binding. | No |
| `BIND_PW` | The password for the administrative user, used to authenticate the binding. If left blank, W\&B Server connects with anonymous binding. | No |
| `ATTRIBUTES` | Provide email and group ID attribute names as comma-separated string values. | Yes |
| `TLS_ENABLE` | Enable TLS. | No |
| `GROUP_ALLOW_LIST` | Group allowlist. | No |
| `LOGIN` | Instructs W\&B Server to use LDAP to authenticate. Set to either `True` or `False`. Optionally set this to `False` to test the LDAP configuration. Set this to `True` to start LDAP authentication. | No |

# Identity and access management (IAM)

Source: https://docs.wandb.ai/platform/hosting/iam/org_team_struct

Understand the three IAM scopes in W&B (organizations, teams, and projects) and how they map to your company.

W\&B Platform has three IAM scopes within W\&B: [Organizations](#organization), [Teams](#team), and [Projects](#project). Understanding how these scopes nest helps you plan how to structure access, group users, and organize AI work across your company. The following sections describe each scope and how it maps to your company's structure, and link to detailed management guides.
## Organization An *Organization* is the root scope in your W\&B account or instance. All actions in your account or instance take place within the context of that root scope, including managing users, managing teams, managing projects within teams, and tracking usage. If you use [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud), you may have more than one organization where each may correspond to a business unit, a personal user, or a joint partnership with another business. If you use [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) or a [Self-Managed instance](/platform/hosting/hosting-options/self-managed), it corresponds to one organization. Your company may have more than one Dedicated Cloud or Self-Managed instance to map to different business units or departments, though that's strictly an optional way to manage AI practitioners across your businesses or departments. For more information, see [Manage organizations](./access-management/manage-organization). ## Team A *Team* is a subscope within an organization that may map to a business unit, function, department, or project team in your company. You may have more than one team in your organization depending on your deployment type and pricing plan. AI projects are organized within the context of a team. Team admins govern access control within a team, and they may or may not be admins at the parent organization level. For more information, see [Add and manage teams](./access-management/manage-organization#add-and-manage-teams). ## Project A *Project* is a subscope within a team that maps to an AI project with specific intended outcomes. You may have more than one project within a team. Each project has a visibility mode that determines who can access it. Within each project, your team's work is organized into the following building blocks. 
Every project comprises [Workspaces](/models/track/workspaces/) and [Reports](/models/reports/), and links to relevant [Artifacts](/models/artifacts/), [Sweeps](/models/sweeps/), and [Automations](/models/automations/). # Manage users, groups, and roles with SCIM Source: https://docs.wandb.ai/platform/hosting/iam/scim Use the SCIM API to manage users, groups, and custom roles in a W&B organization with automated provisioning. Watch a [video demonstrating SCIM in action](https://www.youtube.com/watch?v=Nw3QBqV0I-o) (12 min) ## Overview This page describes how instance and organization admins use the System for Cross-domain Identity Management (SCIM) API to automate identity management in W\&B. With the SCIM API, you can provision and deprovision users, manage team membership, and define custom roles programmatically through an identity provider or CI/CD pipeline instead of clicking through the W\&B App. SCIM groups map to W\&B Teams. W\&B's SCIM API is compatible with identity providers such as Okta. For SSO configuration with Okta and other identity providers, see the [SSO documentation](/platform/hosting/iam/sso/). For practical Python examples that demonstrate how to interact with the SCIM API, visit the [`wandb-scim`](https://github.com/wandb/examples/tree/master/wandb-scim) repository. ### Supported features The SCIM API supports the following features: * **Filtering**: The API supports filtering for `/Users` and `/Groups` endpoints. * **PATCH operations**: Supports `PATCH` for partial resource updates. * **ETag support**: Conditional updates using ETags for conflict detection. * **Service account authentication**: Organization service accounts can access the API. * **Service account lifecycle**: Provision and deprovision [team-scoped and organization-scoped service accounts](/platform/hosting/iam/service-accounts). Supported on **Multi-tenant Cloud** and on **Dedicated Cloud** and **Self-Managed** v0.81.0+. 
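The filtering feature listed above takes a SCIM filter expression in the `filter` query parameter of `/scim/Users` (see [Filter users](#filter-users)). Because the expression contains spaces and double quotes, it usually needs URL-encoding before you send it. The sketch below builds such a URL with the standard library. The function name and the host are hypothetical, and it only covers the two user attributes this page documents.

```python
from urllib.parse import quote

# The two filterable user attributes documented on this page.
_USER_FILTER_ATTRIBUTES = ("userName", "emails.value")


def scim_users_filter_url(host_url: str, attribute: str, value: str) -> str:
    """Build a URL-encoded /scim/Users request URL for an `eq` filter."""
    if attribute not in _USER_FILTER_ATTRIBUTES:
        raise ValueError(f"unsupported filter attribute: {attribute}")
    if '"' in value:
        # Kept simple on purpose: quoting rules for embedded quotes are out
        # of scope for this sketch.
        raise ValueError("value must not contain double quotes")
    filter_expr = f'{attribute} eq "{value}"'
    return f"{host_url.rstrip('/')}/scim/Users?filter={quote(filter_expr, safe='')}"


# Hypothetical host, standing in for the [HOST-URL] placeholder.
print(scim_users_filter_url("https://wandb.example.com", "userName", "dev-user1"))
```

Send the resulting URL with the same `Authorization` header you use for any other SCIM request.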
If you're an admin of multiple Enterprise [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud) organizations, configure the organization that receives SCIM API requests so that requests made with your API key affect the correct organization. Click your profile image, click **User Settings**, then check the **Default API organization** setting.

The chosen hosting option determines the value for the `[HOST-URL]` placeholder used in the examples on this page.

Examples use user IDs such as `abc` and `def`. Real requests and responses use hashed values for user IDs.

## Authentication

Every SCIM request must be authenticated as an admin principal. Organization admins can authenticate with a **Bearer token** or **HTTP Basic** credentials. Either style uses the *same API key string* where a key applies. Choose a user identity or an organization-scoped service account after reviewing the key differences in the following section.

### Key differences

The following list compares user credentials and service account credentials for SCIM authentication:

* Who should use it: Users are best for interactive, one-off admin actions. Service accounts are best for automation and integrations (CI/CD, provisioning tools).
* Credentials: Users send username and API key for Basic auth. Service accounts send only an API key (no username) for Basic auth. For Bearer auth, send only the API key in the header (no username).
* Bearer versus Basic: Bearer uses `Authorization: Bearer [API-KEY]` with the key verbatim. Basic uses `Authorization: Basic ` followed by a Base64-encoded credential string (users encode `username:API-KEY`, and service accounts encode `:API-KEY` with a leading colon and empty username).
* Scope and permissions: Use an API key from an instance or organization admin user, or from an [organization-scoped service account](/platform/hosting/iam/service-accounts/#organization-scoped-service-accounts). Keys from [team-scoped service accounts](/platform/hosting/iam/service-accounts/#team-scoped-service-accounts) can't authenticate to the SCIM API. Service accounts that use SCIM are organization-scoped and headless, which supports clearer audit trails for automation.
* Where to get credentials: Users copy their API key from User Settings. Organization-scoped service account keys are in the organization dashboard under the **Service account** tab.
* Multi-tenant Cloud: If you have access to more than one Multi-tenant Cloud organization, you must set the Default API organization to ensure that SCIM API calls are routed to the intended organization.

### Bearer token

Send the API key as a Bearer token:

```bash theme={null}
Authorization: Bearer [API-KEY]
```

The `[API-KEY]` value is the same string you'd use as the password in HTTP Basic authentication for that principal. Don't Base64-encode the key for Bearer requests.

Bearer authentication for the SCIM API is available in W\&B Multi-tenant Cloud, and in Dedicated Cloud and Self-Managed v0.79.0 and later.

The following examples use `[API-KEY]` as a placeholder. Replace it with a real key from an admin user or an organization-scoped service account.

**List users**

```bash theme={null}
curl -s -S \
  -H "Authorization: Bearer [API-KEY]" \
  -H "Content-Type: application/scim+json" \
  "[HOST-URL]/scim/Users"
```

**Create a user**

```bash theme={null}
curl -s -S -X POST \
  -H "Authorization: Bearer [API-KEY]" \
  -H "Content-Type: application/scim+json" \
  "[HOST-URL]/scim/Users" \
  -d '{
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
    "userName": "dev-user2",
    "emails": [{"primary": true, "value": "dev-user2@example.com"}]
  }'
```

For more information, see [Create user](#create-user).

### Users

Use your personal admin credentials when you perform interactive admin tasks. Construct the HTTP `Authorization` header as `Basic ` followed by the Base64 encoding of `username:API-KEY`. For example, authorize as `demo:p@55w0rd`:

```bash theme={null}
Authorization: Basic ZGVtbzpwQDU1dzByZA==
```

### Service accounts

Use an organization-scoped service account for automation or integrations. Construct the HTTP `Authorization` header as `Basic ` followed by the Base64 encoding of `:API-KEY` (note the leading colon and empty username). Find service account API keys in the organization dashboard under the **Service account** tab. Refer to [Organization-scoped service accounts](/platform/hosting/iam/service-accounts/#organization-scoped-service-accounts).

For example, authorize with API key `sa-p@55w0rd`:

```bash theme={null}
Authorization: Basic OnNhLXBANTV3MHJk
```

## User management

The SCIM user resource maps to W\&B users and service accounts. Use the endpoints in this section to provision, update, and remove users and service accounts in your organization, for example, when you onboard new employees, rotate service credentials, or remove access for departing users. For service account concepts and UI workflows, see [Use service accounts to automate workflows](/platform/hosting/iam/service-accounts).

**Breaking change for integrations that parse SCIM User JSON**

* In Dedicated Cloud and Self-Managed v0.80.1+, and in Multi-tenant Cloud deployments after April 30, 2026, responses from `/scim/Users` (including `GET` user, `GET` users, and `PATCH` responses that return a User) serialize `emails` as a JSON array of objects using lowercase field names (`value`, `primary`, and optional `type` or `display`), matching SCIM 2.0.
* Deployments on older releases return `emails` as a single JSON object with PascalCase keys (`Value`, `Primary`, and similar).

If your code reads `emails` from SCIM *responses*, treat `emails` as an array and read the primary entry (or the first element). Request bodies for creating or updating users already used the array form and are unchanged. The `list-users` filter `emails.value eq "..."` is also unchanged.
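If an integration has to work against both newer and older deployments during a migration, one option is to normalize `emails` at the point where you parse the response. The following is a minimal sketch of that idea. The function name is hypothetical, and the older object shape is inferred from the description above (`Value`, `Primary`), so confirm it against a real response from your deployment.

```python
from typing import Optional


def primary_email(user: dict) -> Optional[str]:
    """Return the primary email from a SCIM User response, or None.

    Handles both shapes described above:
    - newer: a list of objects with lowercase keys ("value", "primary")
    - older: a single object with PascalCase keys ("Value", "Primary")
    """
    emails = user.get("emails")
    if isinstance(emails, dict):
        # Older deployments: one object with PascalCase keys.
        return emails.get("Value")
    if isinstance(emails, list):
        entries = [e for e in emails if isinstance(e, dict)]
        for entry in entries:
            if entry.get("primary"):
                return entry.get("value")
        # No entry is flagged as primary: fall back to the first element.
        return entries[0].get("value") if entries else None
    # Service accounts don't include emails at all.
    return None


# Illustrative responses in both shapes.
new_style = {"emails": [{"primary": True, "value": "dev-user1@example.com"}]}
old_style = {"emails": {"Primary": True, "Value": "dev-user1@example.com"}}
print(primary_email(new_style), primary_email(old_style))
```

Once every deployment you target returns the array form, the `dict` branch can be removed.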
### Get user

Retrieves information for a specific user or service account in your organization by user ID, or for a user by email address. Service account responses include `accountType` (`SERVICE` for team-scoped service accounts, `ORG_SERVICE` for organization-scoped service accounts). Service accounts don't include `emails`.

#### Endpoint

* **URL**: `[HOST-URL]/scim/Users/{id}`
* **Method**: `GET`

#### Parameters

| Parameter | Type | Required | Description |
| --------- | ------ | -------- | ------------------------- |
| `id` | string | Yes | The unique ID of the user |

#### Example

```bash theme={null}
GET /scim/Users/abc
```

```text theme={null}
(Status 200)
```

```json theme={null}
{
  "active": true,
  "daysActive": 42,
  "displayName": "Dev User 1",
  "emails": [
    {
      "primary": true,
      "value": "dev-user1@example.com"
    }
  ],
  "id": "abc",
  "lastActiveAt": "2023-10-15T14:32:10Z",
  "meta": {
    "resourceType": "User",
    "created": "2023-10-01T00:00:00Z",
    "lastModified": "2023-10-01T00:00:00Z",
    "location": "Users/abc"
  },
  "schemas": [
    "urn:ietf:params:scim:schemas:core:2.0:User"
  ],
  "userName": "dev-user1"
}
```

The response includes details about the user's activity in the organization:

* **`daysActive`**: Total number of days the user has been active in the organization.
* **`lastActiveAt`**: ISO 8601 timestamp of the user's most recent activity. Returns `null` if the user hasn't been active.

The definition of "active" differs by deployment type:

* **Dedicated Cloud / Self-Managed**: A user is active if they sign in, open any page in the W\&B App, log runs, use the SDK, or interact with the W\&B server in any way.
* **Multi-tenant Cloud**: A user is active if they perform any auditable action scoped to the organization after May 8, 2025. See [Audit logging actions](/platform/hosting/monitoring-usage/audit-logging#actions) for the full list.

### List users

Retrieves a list of all users and service accounts in your organization. Each resource includes `accountType` (`USER`, `SERVICE`, or `ORG_SERVICE`).

#### Filter users

The `/Users` endpoint supports filtering users by username or email:

* `userName eq "value"`: Filter by username.
* `emails.value eq "value"`: Filter by email address.

##### Example

```bash theme={null}
GET /scim/Users?filter=userName eq "john.doe"
GET /scim/Users?filter=emails.value eq "john@example.com"
```

#### Endpoint

* **URL**: `[HOST-URL]/scim/Users`
* **Method**: `GET`

#### Example

```bash theme={null}
GET /scim/Users
```

```text theme={null}
(Status 200)
```

```json theme={null}
{
  "Resources": [
    {
      "active": true,
      "daysActive": 42,
      "displayName": "Dev User 1",
      "emails": [
        {
          "primary": true,
          "value": "dev-user1@example.com"
        }
      ],
      "id": "abc",
      "lastActiveAt": "2023-10-15T14:32:10Z",
      "meta": {
        "resourceType": "User",
        "created": "2023-10-01T00:00:00Z",
        "lastModified": "2023-10-01T00:00:00Z",
        "location": "Users/abc"
      },
      "schemas": [
        "urn:ietf:params:scim:schemas:core:2.0:User"
      ],
      "userName": "dev-user1"
    }
  ],
  "itemsPerPage": 9999,
  "schemas": [
    "urn:ietf:params:scim:api:messages:2.0:ListResponse"
  ],
  "startIndex": 1,
  "totalResults": 1
}
```

The response includes details about each user's activity in the organization:

* **`daysActive`**: Total number of days the user has been active in the organization.
* **`lastActiveAt`**: ISO 8601 timestamp of the user's most recent activity. Returns `null` if the user hasn't been active.

The definition of "active" differs by deployment type:

* **Dedicated Cloud / Self-Managed**: A user is active if they sign in, open any page in the W\&B App, log runs, use the SDK, or interact with the W\&B server in any way.
* **Multi-tenant Cloud**: A user is active if they perform any auditable action scoped to the organization after May 8, 2025. See [Audit logging actions](/platform/hosting/monitoring-usage/audit-logging#actions) for the full list.

### Create user

Creates a new user in your organization.
#### Endpoint * **URL**: `[HOST-URL]/scim/Users` * **Method**: `POST` #### Parameters | Parameter | Type | Required | Description | | ------------ | ------ | -------- | -------------------------------------------------------------------------- | | `emails` | array | Yes | Array of email objects. Must include a primary email | | `userName` | string | Yes | The username for the new user | | `modelsSeat` | string | No | Models seat level. One of `full`, `viewer`, or `none`. Defaults to `full`. | | `weaveRole` | string | No | Weave role level. One of `full`, `viewer`, or `none`. Defaults to `full`. | #### Example ```bash theme={null} POST /scim/Users ``` ```json theme={null} { "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User" ], "emails": [ { "primary": true, "value": "dev-user2@example.com" } ], "userName": "dev-user2", "modelsSeat": "full", "weaveRole": "full" } ``` ```bash theme={null} POST /scim/Users ``` ```json theme={null} { "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User", "urn:ietf:params:scim:schemas:extension:teams:2.0:User" ], "emails": [ { "primary": true, "value": "dev-user2@example.com" } ], "userName": "dev-user2", "modelsSeat": "full", "weaveRole": "full", "urn:ietf:params:scim:schemas:extension:teams:2.0:User": { "teams": ["my-team"] } } ``` #### Response ```text theme={null} (Status 201) ``` ```json theme={null} { "active": true, "displayName": "Dev User 2", "emails": [ { "primary": true, "value": "dev-user2@example.com" } ], "id": "def", "meta": { "resourceType": "User", "created": "2023-10-01T00:00:00Z", "location": "Users/def" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User" ], "modelsSeat": "full", "weaveRole": "full", "userName": "dev-user2" } ``` ```text theme={null} (Status 201) ``` ```json theme={null} { "active": true, "displayName": "Dev User 2", "emails": [ { "primary": true, "value": "dev-user2@example.com" } ], "id": "def", "meta": { "resourceType": "User", "created": "2023-10-01T00:00:00Z", "location": 
"Users/def" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User", "urn:ietf:params:scim:schemas:extension:teams:2.0:User" ], "userName": "dev-user2", "organizationRole": "member", "modelsSeat": "full", "weaveRole": "full", "teamRoles": [ { "teamName": "my-team", "roleName": "member" } ], "groups": [ { "value": "my-team-id" } ] } ``` ### Provision service account Creates a team-scoped or organization-scoped service account in your organization. Use this endpoint to create headless identities for automation, CI/CD, or integrations that shouldn't be tied to a human user. Omit `accountType` to create a regular user instead. See [Create user](#create-user). Available in **Dedicated Cloud** and **Self-Managed** v0.81.0+ and in **Multi-tenant Cloud**. * Set `userName` to the service account name. The API uses `userName` for the account's display name. The `displayName` field in the request body is ignored. * `emails` aren't required for service accounts. * `modelsSeat` and `weaveRole` aren't supported on create and return `400 Bad Request` if present. * Service accounts can't be updated with `PATCH` or `PUT`, can't be deactivated, and can't be assigned organization, team, or registry roles through SCIM. Create API keys in the W\&B App after provisioning. #### Endpoint * **URL**: `[HOST-URL]/scim/Users` * **Method**: `POST` #### Parameters | Parameter | Type | Required | Description | | ------------------------------------------------------- | ------ | -------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `userName` | string | Yes | Unique name for the service account. 
| | `accountType` | string | Yes | `SERVICE` for a [team-scoped service account](/platform/hosting/iam/service-accounts/#team-scoped-service-accounts), or `ORG_SERVICE` for an [organization-scoped service account](/platform/hosting/iam/service-accounts/#organization-scoped-service-accounts). | | `urn:ietf:params:scim:schemas:extension:teams:2.0:User` | object | Yes | Teams extension object. | | `defaultTeam` | string | Yes | Sub-field of the teams extension. Name of an existing W\&B Team. The service account is created as a member of this team. For team-scoped service accounts, this is the only team they join. Organization-scoped service accounts are also added automatically to teams created later through SCIM. | | `teams` | array | No | **Multi-tenant Cloud only.** Team names to add the account to. Include the same team as `defaultTeam` when you use this field. | #### Example ```bash theme={null} POST /scim/Users ``` ```json theme={null} { "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User", "urn:ietf:params:scim:schemas:extension:teams:2.0:User" ], "userName": "sa-deploy-bot", "accountType": "SERVICE", "urn:ietf:params:scim:schemas:extension:teams:2.0:User": { "defaultTeam": "ml-platform" } } ``` ```bash theme={null} POST /scim/Users ``` ```json theme={null} { "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User", "urn:ietf:params:scim:schemas:extension:teams:2.0:User" ], "userName": "sa-ci-runner", "accountType": "ORG_SERVICE", "urn:ietf:params:scim:schemas:extension:teams:2.0:User": { "defaultTeam": "ml-platform" } } ``` #### Response ```text theme={null} (Status 201) ``` ```json theme={null} { "accountType": "SERVICE", "active": true, "displayName": "sa-deploy-bot", "id": "xyz", "meta": { "resourceType": "User", "created": "2023-10-01T00:00:00Z", "location": "Users/xyz" }, "organizationRole": "member", "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User", "urn:ietf:params:scim:schemas:extension:wandb:2.0:User" ], "teamRoles": [ { "teamName": 
"ml-platform", "roleName": "member" } ], "urn:ietf:params:scim:schemas:extension:wandb:2.0:User": { "organizationRole": "member" }, "userName": "sa-deploy-bot" } ``` ```text theme={null} (Status 201) ``` ```json theme={null} { "accountType": "ORG_SERVICE", "active": true, "displayName": "sa-ci-runner", "id": "xyz", "meta": { "resourceType": "User", "created": "2023-10-01T00:00:00Z", "location": "Users/xyz" }, "organizationRole": "member", "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User", "urn:ietf:params:scim:schemas:extension:wandb:2.0:User" ], "teamRoles": [ { "teamName": "ml-platform", "roleName": "member" } ], "urn:ietf:params:scim:schemas:extension:wandb:2.0:User": { "organizationRole": "member" }, "userName": "sa-ci-runner" } ``` For an organization-scoped service account, `accountType` is `ORG_SERVICE`. In **Self-Managed** deployments, `organizationRole` is `service` or `org_service` instead of `member`, matching the account type. If the response returns one of the following errors, check the request for these common problems: * `409 Conflict`: The request includes duplicate `userName` keys for the same service account. * `400 Bad Request`: The request is missing `defaultTeam` or sets it to an invalid value. ### Deprovision service account Permanently deletes a service account and its organization membership. Use this endpoint when a service account is no longer needed (for example, after you retire an automation pipeline). This is a hard delete, and the account can't be reactivated through SCIM. Available in **Dedicated Cloud** and **Self-Managed** v0.81.0+ and in **Multi-tenant Cloud**. Use the service account's SCIM user `id` from the provision response or from [Get user](#get-user). Deprovisioning doesn't delete API keys that were already issued. Revoke keys separately in the W\&B App if needed. 
#### Endpoint * **URL**: `[HOST-URL]/scim/Users/{id}` * **Method**: `DELETE` #### Parameters | Parameter | Type | Required | Description | | --------- | ------ | -------- | ------------------------------------- | | `id` | string | Yes | The unique ID of the service account. | #### Example ```bash theme={null} DELETE /scim/Users/xyz ``` ```text theme={null} (Status 204) ``` ### Delete user **Maintain admin access** Ensure that at least one admin user always exists in your instance or organization. Otherwise, no user can configure or maintain your organization's W\&B account. If an organization uses SCIM or another automated process to deprovision users from W\&B, a deprovisioning operation could inadvertently remove the last remaining admin from the instance or organization. For assistance with developing operational procedures, or to restore admin access, contact [support](mailto:support@wandb.com). Fully deletes a user from your organization. To delete a service account, see [Deprovision service account](#deprovision-service-account). #### Endpoint * **URL**: `[HOST-URL]/scim/Users/{id}` * **Method**: `DELETE` #### Parameters | Parameter | Type | Required | Description | | --------- | ------ | -------- | ----------------------------------- | | `id` | string | Yes | The unique ID of the user to delete | #### Example ```bash theme={null} DELETE /scim/Users/abc ``` ```text theme={null} (Status 204) ``` To temporarily deactivate the user, refer to the [Deactivate user](#deactivate-user) API, which uses the `PATCH` endpoint. ### Update user email Updates a user's primary email address. **Not supported for Multi-tenant Cloud**, where a user's account isn't managed by the organization. 
#### Endpoint * **URL**: `[HOST-URL]/scim/Users/{id}` * **Method**: `PATCH` #### Parameters | Parameter | Type | Required | Description | | --------- | ------ | -------- | --------------------------- | | `id` | string | Yes | The unique ID of the user | | `op` | string | Yes | `replace` | | `path` | string | Yes | `emails` | | `value` | array | Yes | Array with new email object | #### Example ```bash theme={null} PATCH /scim/Users/abc ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "replace", "path": "emails", "value": [ { "value": "newemail@example.com", "primary": true } ] } ] } ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "active": true, "displayName": "Dev User 1", "emails": [ { "primary": true, "value": "newemail@example.com" } ], "id": "abc", "meta": { "resourceType": "User", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:00:00Z", "location": "Users/abc" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User" ], "userName": "dev-user1" } ``` ### Update user display name Updates a user's display name. **Not supported for Multi-tenant Cloud**, where a user's account isn't managed by the organization. 
#### Endpoint

* **URL**: `[HOST-URL]/scim/Users/{id}`
* **Method**: `PATCH`

#### Parameters

| Parameter | Type   | Required | Description               |
| --------- | ------ | -------- | ------------------------- |
| `id`      | string | Yes      | The unique ID of the user |
| `op`      | string | Yes      | `replace`                 |
| `path`    | string | Yes      | `displayName`             |
| `value`   | string | Yes      | New display name          |

#### Example

```bash theme={null}
PATCH /scim/Users/abc
```

```json theme={null}
{
  "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
  "Operations": [
    {
      "op": "replace",
      "path": "displayName",
      "value": "John Doe"
    }
  ]
}
```

```text theme={null}
(Status 200)
```

```json theme={null}
{
  "active": true,
  "displayName": "John Doe",
  "emails": [
    { "primary": true, "value": "dev-user1@example.com" }
  ],
  "id": "abc",
  "meta": {
    "resourceType": "User",
    "created": "2025-07-01T00:00:00Z",
    "lastModified": "2025-07-01T00:00:00Z",
    "location": "Users/abc"
  },
  "schemas": [
    "urn:ietf:params:scim:schemas:core:2.0:User"
  ],
  "userName": "dev-user1"
}
```

### Deactivate user

Deactivates a user in your organization. The result differs by deployment type:

* **Dedicated Cloud** / **Self-Managed**: Sets the user's `active` field to `false`. To restore a deactivated user's access to your organization, see [Reactivate user](#reactivate-user).
* **Multi-tenant Cloud**: Removes the user from the organization. To restore the user's access, re-add them to your organization. See [Create user](#create-user-request-multi-tenant). In Multi-tenant Cloud, a user's account isn't managed by the organization.

This operation works for users only, not service accounts. Deactivating a service account isn't supported. Manage team service accounts in the settings for the W\&B Team.
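The request body for this operation is a SCIM `PatchOp` that replaces the user's `active` flag. The following is a minimal sketch of building that body with the Python standard library. The helper name `active_patch` is hypothetical and not part of any W\&B SDK, and the sketch only builds the JSON without sending it.

```python
import json

PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def active_patch(active: bool) -> dict:
    """Build a SCIM PatchOp body that sets the user's `active` flag.

    Hypothetical helper for illustration only.
    """
    return {
        "schemas": [PATCH_OP_SCHEMA],
        "Operations": [{"op": "replace", "value": {"active": active}}],
    }


# Body for PATCH [HOST-URL]/scim/Users/{id} when deactivating a user.
deactivate_body = json.dumps(active_patch(False))
print(deactivate_body)
```

Passing `True` produces the body used for reactivation on deployments that support it.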
#### Endpoint * **URL**: `[HOST-URL]/scim/Users/{id}` * **Method**: `PATCH` #### Parameters | Parameter | Type | Required | Description | | --------- | ------ | -------- | --------------------------------------- | | `id` | string | Yes | The unique ID of the user to deactivate | | `op` | string | Yes | `replace` | | `value` | object | Yes | Object with `{"active": false}` | #### Example ```bash theme={null} PATCH /scim/Users/abc ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "replace", "value": {"active": false} } ] } ``` ```bash theme={null} PATCH /scim/Users ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "replace", "value": {"active": false} } ] } ``` #### Response ```text theme={null} (Status 200) ``` ```json theme={null} { "active": false, "displayName": "Dev User 1", "emails": [ { "primary": true, "value": "dev-user1@example.com" } ], "id": "abc", "meta": { "resourceType": "User", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:00:00Z", "location": "Users/abc" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User" ], "userName": "dev-user1" } ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "replace", "value": {"active": true} } ] } ``` ### Reactivate user Reactivates a previously deactivated user in your organization. * User reactivation works for users only, not service accounts. Reactivation isn't supported for service accounts. Manage service accounts in the settings for the W\&B Team. * User reactivation isn't supported in [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud). To restore the user's access, re-add them to your organization. See [Create user](#create-user-request-multi-tenant). In Multi-tenant Cloud, a user's account isn't managed by the organization. 
An attempt to reactivate a user results in an HTTP `400` error. The `detail` field in the response body is returned verbatim from the API and may still use legacy product wording: ```json theme={null} { "schemas": [ "urn:ietf:params:scim:api:messages:2.0:Error" ], "detail": "User reactivation operations are not supported in SaaS Cloud", "status": "400" } ``` #### Endpoint * **URL**: `[HOST-URL]/scim/Users/{id}` * **Method**: `PATCH` #### Parameters | Parameter | Type | Required | Description | | --------- | ------ | -------- | --------------------------------------- | | `id` | string | Yes | The unique ID of the user to reactivate | | `op` | string | Yes | `replace` | | `value` | object | Yes | Object with `{"active": true}` | #### Example ```bash theme={null} PATCH /scim/Users/abc ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "replace", "value": {"active": true} } ] } ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "active": true, "displayName": "Dev User 1", "emails": [ { "primary": true, "value": "dev-user1@example.com" } ], "id": "abc", "meta": { "resourceType": "User", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:00:00Z", "location": "Users/abc" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User" ], "userName": "dev-user1" } ``` ### Assign organization role Assigns an organization-level role to a user. This operation works for users only, not service accounts. Custom roles aren't supported for service accounts. 
#### Endpoint * **URL**: `[HOST-URL]/scim/Users/{id}` * **Method**: `PATCH` #### Parameters | Parameter | Type | Required | Description | | --------- | ------ | -------- | ------------------------------- | | `id` | string | Yes | The unique ID of the user | | `op` | string | Yes | `replace` | | `path` | string | Yes | `organizationRole` | | `value` | string | Yes | Role name (`admin` or `member`) | The organization-scoped `viewer` role is deprecated and can no longer be assigned in the UI. If you use SCIM to assign the `viewer` role to a user: * They're assigned the `member` role in the organization. * Their `modelsSeat` is set to `viewer` instead of `full`. This allows view-only access to Models and full access to Registry. If no Models seats are available, a `Seat limit reached` error is returned. This can be updated later if a seat is available. * Their `weaveRole` is set to `viewer` instead of `full`. This allows view-only access to Weave. * All of their existing team and project roles are set to `viewer`. * They're assigned the Registry `viewer` role in registries that are visible at the organization level. Assigning the `member` or `admin` organization role doesn't change the user's `modelsSeat` or `weaveRole`. 
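The role rules above can be summarized as a small client-side lookup, which may help when reviewing planned changes before sending them. This is only a sketch of the documented behavior, not W\&B code. The function name `effective_assignment` is hypothetical, and `None` is used here to mean "left unchanged".

```python
from typing import Dict, Optional


def effective_assignment(requested_role: str) -> Dict[str, Optional[str]]:
    """Model what a requested organizationRole results in, per the rules above.

    Hypothetical helper; None means the field is not changed by the request.
    """
    if requested_role == "viewer":
        # Deprecated role: the user becomes a member with view-only seats.
        return {
            "organizationRole": "member",
            "modelsSeat": "viewer",
            "weaveRole": "viewer",
        }
    if requested_role in ("admin", "member"):
        # Seats are not touched when assigning admin or member.
        return {
            "organizationRole": requested_role,
            "modelsSeat": None,
            "weaveRole": None,
        }
    raise ValueError(f"unsupported organization role: {requested_role!r}")


print(effective_assignment("viewer"))
```

The sketch leaves out the side effects on team, project, and registry roles, and the `Seat limit reached` error case.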
#### Example ```bash theme={null} PATCH /scim/Users/abc ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "replace", "path": "organizationRole", "value": "admin" } ] } ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "active": true, "displayName": "Dev User 1", "emails": [ { "primary": true, "value": "dev-user1@example.com" } ], "id": "abc", "meta": { "resourceType": "User", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:00:00Z", "location": "Users/abc" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User" ], "userName": "dev-user1", "teamRoles": [ { "teamName": "team1", "roleName": "admin" } ], "organizationRole": "admin" } ``` ### Update Models seat Updates a user's Models seat. #### Endpoint * **URL**: `[HOST-URL]/scim/Users/{id}` * **Method**: `PATCH` #### Parameters | Parameter | Type | Required | Description | | --------- | ------ | -------- | ---------------------------------------- | | `id` | string | Yes | The unique ID of the user | | `op` | string | Yes | `replace` | | `path` | string | Yes | `modelsSeat` | | `value` | string | Yes | Seat level (`full`, `viewer`, or `none`) | #### Example ```bash theme={null} PATCH /scim/Users/abc ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "replace", "path": "modelsSeat", "value": "full" } ] } ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "active": true, "displayName": "Dev User 1", "emails": [ { "primary": true, "value": "dev-user1@example.com" } ], "id": "abc", "meta": { "resourceType": "User", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:00:00Z", "location": "Users/abc" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User" ], "userName": "dev-user1", "organizationRole": "member", "modelsSeat": "full", "weaveRole": "full" } ``` ### Update Weave role Updates a user's Weave role. 
#### Endpoint * **URL**: `[HOST-URL]/scim/Users/{id}` * **Method**: `PATCH` #### Parameters | Parameter | Type | Required | Description | | --------- | ------ | -------- | ---------------------------------------- | | `id` | string | Yes | The unique ID of the user | | `op` | string | Yes | `replace` | | `path` | string | Yes | `weaveRole` | | `value` | string | Yes | Role level (`full`, `viewer`, or `none`) | #### Example ```bash theme={null} PATCH /scim/Users/abc ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "replace", "path": "weaveRole", "value": "full" } ] } ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "active": true, "displayName": "Dev User 1", "emails": [ { "primary": true, "value": "dev-user1@example.com" } ], "id": "abc", "meta": { "resourceType": "User", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:00:00Z", "location": "Users/abc" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User" ], "userName": "dev-user1", "organizationRole": "member", "modelsSeat": "full", "weaveRole": "full" } ``` ### Assign team role Assigns a team-level role to a user. This operation works for users only, not service accounts. Custom roles aren't supported for service accounts. 
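The `teamRoles` value is an array of objects, each with a `teamName` and a `roleName`. If your automation keeps team roles as a mapping of team name to role name, a conversion along these lines may be convenient. The helper name `team_roles_patch` is an assumption made for this sketch.

```python
from typing import Dict

PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def team_roles_patch(roles_by_team: Dict[str, str]) -> dict:
    """Turn {team name: role name} into a SCIM PatchOp body for `teamRoles`.

    Hypothetical helper for illustration only.
    """
    value = [
        {"teamName": team, "roleName": role}
        for team, role in roles_by_team.items()
    ]
    return {
        "schemas": [PATCH_OP_SCHEMA],
        "Operations": [{"op": "replace", "path": "teamRoles", "value": value}],
    }


body = team_roles_patch({"team1": "admin"})
print(body["Operations"][0]["value"])
```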
#### Endpoint * **URL**: `[HOST-URL]/scim/Users/{id}` * **Method**: `PATCH` #### Parameters | Parameter | Type | Required | Description | | --------- | ------ | -------- | ----------------------------------------------- | | `id` | string | Yes | The unique ID of the user | | `op` | string | Yes | `replace` | | `path` | string | Yes | `teamRoles` | | `value` | array | Yes | Array of objects with `teamName` and `roleName` | #### Example ```bash theme={null} PATCH /scim/Users/abc ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "replace", "path": "teamRoles", "value": [ { "roleName": "admin", "teamName": "team1" } ] } ] } ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "active": true, "displayName": "Dev User 1", "emails": [ { "primary": true, "value": "dev-user1@example.com" } ], "id": "abc", "meta": { "resourceType": "User", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:00:00Z", "location": "Users/abc" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:User" ], "userName": "dev-user1", "teamRoles": [ { "teamName": "team1", "roleName": "admin" } ], "organizationRole": "admin" } ``` ### Add to Registry Adds a user to a registry with an assigned registry-level role. This operation works for users only, not service accounts. Custom roles aren't supported for service accounts. 
#### Endpoint

* **URL**: `[HOST-URL]/scim/Users/{id}`
* **Method**: `PATCH`

#### Parameters

| Parameter | Type   | Required | Description                                         |
| --------- | ------ | -------- | --------------------------------------------------- |
| `id`      | string | Yes      | The unique ID of the user                           |
| `op`      | string | Yes      | `add`                                               |
| `path`    | string | Yes      | `registryRoles`                                     |
| `value`   | array  | Yes      | Array of objects with `registryName` and `roleName` |

#### Example

```bash theme={null}
PATCH /scim/Users/abc
```

```json theme={null}
{
  "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
  "Operations": [
    {
      "op": "add",
      "path": "registryRoles",
      "value": [
        {
          "roleName": "admin",
          "registryName": "hello-registry"
        }
      ]
    }
  ]
}
```

```text theme={null}
(Status 200)
```

```json theme={null}
{
  "active": true,
  "displayName": "Dev User 1",
  "emails": [
    { "primary": true, "value": "dev-user1@example.com" }
  ],
  "id": "abc",
  "meta": {
    "resourceType": "User",
    "created": "2023-10-01T00:00:00Z",
    "lastModified": "2023-10-01T00:00:00Z",
    "location": "Users/abc"
  },
  "schemas": [
    "urn:ietf:params:scim:schemas:core:2.0:User"
  ],
  "userName": "dev-user1",
  "registryRoles": [
    { "registryName": "hello-registry", "roleName": "admin" }
  ],
  "organizationRole": "admin"
}
```

### Remove from Registry

Removes a user from a registry.

* The remove operations follow RFC 7644 SCIM protocol specifications. Use the filter syntax `"registryRoles[registryName eq \"{registry_name}\"]"` to remove a user from a specific registry, or `"registryRoles"` to remove the user from all registries.
* This operation works for users only, not service accounts. Remove service accounts from a registry in the settings for the W\&B Team.
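The two `path` forms described above can be produced by one small helper. In this sketch, `registry_remove_patch` is a hypothetical name. Serializing with `json.dumps` escapes the inner quotes of the filter automatically, so the path doesn't need to be escaped by hand.

```python
import json
from typing import Optional

PATCH_OP_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def registry_remove_patch(registry_name: Optional[str] = None) -> dict:
    """Build a SCIM PatchOp body that removes registry roles from a user.

    Hypothetical helper. With a registry name, only that registry is targeted;
    without one, the path covers all registries.
    """
    path = "registryRoles"
    if registry_name is not None:
        path = f'registryRoles[registryName eq "{registry_name}"]'
    return {
        "schemas": [PATCH_OP_SCHEMA],
        "Operations": [{"op": "remove", "path": path}],
    }


print(json.dumps(registry_remove_patch("goodbye-registry"), indent=2))
```

The helper doesn't validate or escape the registry name itself, so a name containing a double quote would need extra handling.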
#### Endpoint

* **URL**: `[HOST-URL]/scim/Users/{id}`
* **Method**: `PATCH`

#### Parameters

| Parameter | Type   | Required | Description                                                                 |
| --------- | ------ | -------- | --------------------------------------------------------------------------- |
| `id`      | string | Yes      | The unique ID of the user                                                   |
| `op`      | string | Yes      | `remove`                                                                    |
| `path`    | string | Yes      | `"registryRoles[registryName eq \"{registry_name}\"]"` or `"registryRoles"` |

#### Example

```bash theme={null}
PATCH /scim/Users/abc
```

```json theme={null}
{
  "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
  "Operations": [
    {
      "op": "remove",
      "path": "registryRoles[registryName eq \"goodbye-registry\"]"
    }
  ]
}
```

```text theme={null}
(Status 200)
```

```json theme={null}
{
  "active": true,
  "displayName": "Dev User 1",
  "emails": [
    { "primary": true, "value": "dev-user1@example.com" }
  ],
  "id": "abc",
  "meta": {
    "resourceType": "User",
    "created": "2023-10-01T00:00:00Z",
    "lastModified": "2023-10-01T00:00:00Z",
    "location": "Users/abc"
  },
  "schemas": [
    "urn:ietf:params:scim:schemas:core:2.0:User"
  ],
  "userName": "dev-user1",
  "registryRoles": [
    { "registryName": "hello-registry", "roleName": "admin" }
  ],
  "organizationRole": "admin"
}
```

```bash theme={null}
PATCH /scim/Users/abc
```

```json theme={null}
{
  "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
  "Operations": [
    {
      "op": "remove",
      "path": "registryRoles"
    }
  ]
}
```

```text theme={null}
(Status 200)
```

```json theme={null}
{
  "active": true,
  "displayName": "Dev User 1",
  "emails": [
    { "primary": true, "value": "dev-user1@example.com" }
  ],
  "id": "abc",
  "meta": {
    "resourceType": "User",
    "created": "2023-10-01T00:00:00Z",
    "lastModified": "2023-10-01T00:00:00Z",
    "location": "Users/abc"
  },
  "schemas": [
    "urn:ietf:params:scim:schemas:core:2.0:User"
  ],
  "userName": "dev-user1",
  "organizationRole": "admin"
}
```

## Group resource

The SCIM group resource maps to a W\&B Team.
Use the endpoints in this section to create teams, manage team membership, and (optionally) configure team-level storage from your identity provider or automation. When you create a SCIM group in your IAM system, W\&B creates a team and maps the group to it. Other SCIM group operations then act on that team. To configure custom storage during team creation, include `storageBucket` in the request.

### Service accounts

When you create a W\&B Team using SCIM, all organization-level service accounts are automatically added to the team, so that each service account keeps its access to team resources.

### Filter groups

The `/Groups` endpoint supports filtering to search for specific teams.

#### Supported filters

The `/Groups` endpoint supports the following filter:

* `displayName eq "value"`: Filter by team display name.

#### Example

```bash theme={null}
GET /scim/Groups?filter=displayName eq "engineering-team"
```

### Get team

Retrieve team information by providing the team's unique ID.

#### Endpoint

* **URL**: `[HOST-URL]/scim/Groups/{id}`
* **Method**: `GET`

#### Example

```bash theme={null}
GET /scim/Groups/ghi
```

```text theme={null}
(Status 200)
```

```json theme={null}
{
  "displayName": "acme-devs",
  "id": "ghi",
  "members": [
    { "Value": "abc", "Ref": "", "Type": "", "Display": "dev-user1" }
  ],
  "meta": {
    "resourceType": "Group",
    "created": "2023-10-01T00:00:00Z",
    "lastModified": "2023-10-01T00:00:00Z",
    "location": "Groups/ghi"
  },
  "schemas": [
    "urn:ietf:params:scim:schemas:core:2.0:Group"
  ]
}
```

### List teams

Retrieve a list of teams.
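When you list teams with a `displayName` filter, the spaces and quotes in the filter expression generally need URL-encoding before the request is sent. The following sketch builds such a URL with the Python standard library. The host `https://wandb.example.com` and the helper name `groups_url` are placeholders chosen for illustration.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def groups_url(host_url: str, display_name: Optional[str] = None) -> str:
    """Build the /scim/Groups URL, optionally filtered by team display name.

    Hypothetical helper for illustration only.
    """
    url = f"{host_url.rstrip('/')}/scim/Groups"
    if display_name is None:
        return url
    # quote_via=quote encodes spaces as %20 instead of "+".
    query = urlencode(
        {"filter": f'displayName eq "{display_name}"'}, quote_via=quote
    )
    return f"{url}?{query}"


print(groups_url("https://wandb.example.com", "engineering-team"))
```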
#### Endpoint * **URL**: `[HOST-URL]/scim/Groups` * **Method**: `GET` #### Example ```bash theme={null} GET /scim/Groups ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "Resources": [ { "displayName": "acme-devs", "id": "ghi", "members": [ { "Value": "abc", "Ref": "", "Type": "", "Display": "dev-user1" } ], "meta": { "resourceType": "Group", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:00:00Z", "location": "Groups/ghi" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:Group" ] } ], "itemsPerPage": 9999, "schemas": [ "urn:ietf:params:scim:api:messages:2.0:ListResponse" ], "startIndex": 1, "totalResults": 1 } ``` ### Create team Creates a new team resource. #### Endpoint * **URL**: `[HOST-URL]/scim/Groups` * **Method**: `POST` #### Supported fields | Field | Type | Required | | --------------- | ------------------ | --------------------------------------------------------- | | `displayName` | String | Yes | | `members` | Multi-Valued Array | Yes (`value` sub-field is required and maps to a user ID) | | `storageBucket` | Object | No | You can configure team-level [Bring your own bucket (BYOB)](/platform/hosting/data-security/secure-storage-connector) during team creation by including a `storageBucket` object. If omitted, the team uses default or instance-level storage. Provision the bucket (policy, CORS, credentials) and determine the storage address format per provider using the BYOB guide. The `storageBucket` object has the following sub-fields: * **Required**: `name` (bucket name), `provider` (one of `COREWEAVE`, `AWS`, `AZURE`, `GCP`, or `MINIO`). The value is case-sensitive. Use uppercase as shown. * **Optional**: `path` (path prefix within the bucket), `kmsKeyId` (KMS key for encryption, for example for AWS), `awsExternalId` (AWS cross-account access), `azureTenantId` (Azure tenant ID), `azureClientId` (Azure managed identity client ID). W\&B validates that the bucket exists and is reachable before creating the team. 
If validation fails, the SCIM request fails and the team isn't created. An invalid `provider` value returns `400 Bad Request` with a SCIM error that lists the allowed values. #### Examples These examples show how to create a team without custom storage and with BYOB storage on a specific provider. Select a tab for the desired storage configuration to see an example request, and select the **Response** tab for an example response. ```bash theme={null} POST /scim/Groups ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"], "displayName": "wandb-support", "members": [ { "value": "def" } ] } ``` ```bash theme={null} POST /scim/Groups Content-Type: application/scim+json ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"], "displayName": "ml-training-team", "members": [ { "value": "user@example.com", "display": "user@example.com" } ], "storageBucket": { "name": "wandb-coreweave-bucket", "provider": "COREWEAVE", "path": "ml-training/experiments" } } ``` ```bash theme={null} POST /scim/Groups Content-Type: application/scim+json ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"], "displayName": "ml-team", "members": [ { "value": "user@example.com", "display": "user@example.com" } ], "storageBucket": { "name": "my-company-wandb-data", "provider": "AWS", "path": "ml-team/experiments", "kmsKeyId": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012", "awsExternalId": "wandb-external-id-abc123" } } ``` ```bash theme={null} POST /scim/Groups Content-Type: application/scim+json ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"], "displayName": "research-team", "members": [], "storageBucket": { "name": "wandbstorage", "provider": "AZURE", "path": "research/artifacts", "azureTenantId": "12345678-1234-1234-1234-123456789012", "azureClientId": "87654321-4321-4321-4321-210987654321" } } ``` ```bash theme={null} POST 
/scim/Groups Content-Type: application/scim+json ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"], "displayName": "data-science-team", "members": [ { "value": "VXNlcjox", "display": "jane.doe@example.com" }, { "value": "VXNlcjoy", "display": "john.smith@example.com" } ], "storageBucket": { "name": "my-gcs-bucket", "provider": "GCP", "path": "data-science/runs" } } ``` ```text theme={null} (Status 201) ``` ```json theme={null} { "displayName": "wandb-support", "id": "jkl", "members": [ { "Value": "def", "Ref": "", "Type": "", "Display": "dev-user2" } ], "meta": { "resourceType": "Group", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:00:00Z", "location": "Groups/jkl" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:Group" ] } ``` ### Update team Updates an existing team's membership list. #### Endpoint * **URL**: `[HOST-URL]/scim/Groups/{id}` * **Method**: `PATCH` * **Supported operations**: `add` member, `remove` member, `replace` members. - The remove operations follow RFC 7644 SCIM protocol specifications. Use the filter syntax `members[value eq "{user_id}"]` to remove a specific user, or `members` to remove all users from the team. **User identification**: The `{user_id}` in member operations can be either of the following: * A W\&B user ID. * An email address (for example, "[user@example.com](mailto:user@example.com)"). - These operations work for users only, not service accounts. Update a team's service accounts in the settings for the W\&B Team. Replace `{team_id}` with the actual team ID and `{user_id}` with the actual user ID or email address in your requests. ### Replace team members Replaces all members of a team with a new list. This operation works for users only, not service accounts. Manage service accounts in the settings for the W\&B Team. 
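Because a replace request overwrites the full member list, it can help to build the body from one authoritative list of user IDs or email addresses. The sketch below does that. The helper name `replace_members_body` is hypothetical, and dropping duplicate entries is a design choice of this example rather than a documented API requirement.

```python
from typing import Iterable

GROUP_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:Group"


def replace_members_body(display_name: str, user_ids: Iterable[str]) -> dict:
    """Build the PUT body that replaces all members of a team.

    Hypothetical helper. Each entry may be a W&B user ID or an email address.
    """
    # dict.fromkeys drops duplicates while keeping the original order.
    unique_ids = list(dict.fromkeys(user_ids))
    return {
        "schemas": [GROUP_SCHEMA],
        "displayName": display_name,
        "members": [{"value": user_id} for user_id in unique_ids],
    }


body = replace_members_body("acme-devs", ["abc", "def", "abc"])
print(body["members"])
```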
#### Endpoint * **URL**: `[HOST-URL]/scim/Groups/{id}` * **Method**: `PUT` ```bash theme={null} PUT /scim/Groups/{team_id} ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Group"], "displayName": "acme-devs", "members": [ { "value": "{user_id_1}" }, { "value": "{user_id_2}" } ] } ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "displayName": "acme-devs", "id": "ghi", "members": [ { "Value": "user_id_1", "Ref": "", "Type": "", "Display": "user1" }, { "Value": "user_id_2", "Ref": "", "Type": "", "Display": "user2" } ], "meta": { "resourceType": "Group", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:01:00Z", "location": "Groups/ghi" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:Group" ] } ``` ### Add a user to a team Adds `dev-user2` to `acme-devs`: This operation works for users only, not service accounts. Manage service accounts in the settings for the W\&B Team. ```bash theme={null} PATCH /scim/Groups/{team_id} ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "add", "path": "members", "value": [ { "value": "{user_id}" } ] } ] } ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "displayName": "acme-devs", "id": "ghi", "members": [ { "Value": "abc", "Ref": "", "Type": "", "Display": "dev-user1" }, { "Value": "def", "Ref": "", "Type": "", "Display": "dev-user2" } ], "meta": { "resourceType": "Group", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:01:00Z", "location": "Groups/ghi" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:Group" ] } ``` ### Remove a specific user from a team Removes `dev-user2` from `acme-devs`: This operation works for users only, not service accounts. Manage service accounts in the settings for the W\&B Team. 
```bash theme={null} PATCH /scim/Groups/{team_id} ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "remove", "path": "members[value eq \"{user_id}\"]" } ] } ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "displayName": "acme-devs", "id": "ghi", "members": [ { "Value": "abc", "Display": "dev-user1" } ], "meta": { "resourceType": "Group", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:01:00Z", "location": "Groups/ghi" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:Group" ] } ``` ### Remove all users from a team Removes all users from `acme-devs`: This operation works for users only, not service accounts. Manage service accounts in the settings for the W\&B Team. ```bash theme={null} PATCH /scim/Groups/{team_id} ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "remove", "path": "members" } ] } ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "displayName": "acme-devs", "id": "ghi", "members": null, "meta": { "resourceType": "Group", "created": "2023-10-01T00:00:00Z", "lastModified": "2023-10-01T00:01:00Z", "location": "Groups/ghi" }, "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:Group" ] } ``` ### Delete team The SCIM API doesn't support deleting teams because additional data is linked to teams. Delete teams from the W\&B App to confirm you want everything deleted. ## Role resource The SCIM role resource maps to W\&B custom roles. Use the endpoints in this section to create and maintain custom roles programmatically (for example, to keep role definitions in sync with your access policies). The `/Roles` endpoints aren't part of the official SCIM schema. W\&B adds `/Roles` endpoints to support automated management of custom roles in W\&B organizations. ### Get custom role Retrieve information for a custom role by providing the role's unique ID. 
#### Endpoint * **URL**: `[HOST-URL]/scim/Roles/{id}` * **Method**: `GET` #### Example ```bash theme={null} GET /scim/Roles/abc ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "description": "A sample custom role for example", "id": "Um9sZTo3", "inheritedFrom": "member", // indicates the predefined role "meta": { "resourceType": "Role", "created": "2023-11-20T23:10:14Z", "lastModified": "2023-11-20T23:31:23Z", "location": "Roles/Um9sZTo3" }, "name": "Sample custom role", "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==", "permissions": [ { "name": "artifact:read", "isInherited": true // inherited from member predefined role }, ... ... { "name": "project:update", "isInherited": false // custom permission added by admin } ], "schemas": [ "" ] } ``` ### List custom roles Retrieve information for all custom roles in the W\&B organization. #### Endpoint * **URL**: `[HOST-URL]/scim/Roles` * **Method**: `GET` #### Example ```bash theme={null} GET /scim/Roles ``` ```text theme={null} (Status 200) ``` ```json theme={null} { "Resources": [ { "description": "A sample custom role for example", "id": "Um9sZTo3", "inheritedFrom": "member", // indicates the predefined role that the custom role inherits from "meta": { "resourceType": "Role", "created": "2023-11-20T23:10:14Z", "lastModified": "2023-11-20T23:31:23Z", "location": "Roles/Um9sZTo3" }, "name": "Sample custom role", "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==", "permissions": [ { "name": "artifact:read", "isInherited": true // inherited from member predefined role }, ... ... 
{ "name": "project:update", "isInherited": false // custom permission added by admin } ], "schemas": [ "" ] }, { "description": "Another sample custom role for example", "id": "Um9sZToxMg==", "inheritedFrom": "viewer", // indicates the predefined role that the custom role inherits from "meta": { "resourceType": "Role", "created": "2023-11-21T01:07:50Z", "location": "Roles/Um9sZToxMg==" }, "name": "Sample custom role 2", "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==", "permissions": [ { "name": "launchagent:read", "isInherited": true // inherited from viewer predefined role }, ... ... { "name": "run:stop", "isInherited": false // custom permission added by admin } ], "schemas": [ "" ] } ], "itemsPerPage": 9999, "schemas": [ "urn:ietf:params:scim:api:messages:2.0:ListResponse" ], "startIndex": 1, "totalResults": 2 } ``` ### Create custom role Creates a new custom role in the W\&B organization. #### Endpoint * **URL**: `[HOST-URL]/scim/Roles` * **Method**: `POST` #### Supported fields | Field | Type | Description | | --------------- | ------------ | ----------- | | `name` | String | Name of the custom role | | `description` | String | Description of the custom role | | `permissions` | Object array | Array of permission objects where each object includes a `name` string field whose value has the form `w&bobject:operation`. For example, a permission object for the delete operation on W\&B runs would have `name` set to `run:delete`. | | `inheritedFrom` | String | The predefined role that the custom role inherits from. It can be either `member` or `viewer`.
| #### Example ```bash theme={null} POST /scim/Roles ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Role"], "name": "Sample custom role", "description": "A sample custom role for example", "permissions": [ { "name": "project:update" } ], "inheritedFrom": "member" } ``` ```text theme={null} (Status 201) ``` ```json theme={null} { "description": "A sample custom role for example", "id": "Um9sZTo3", "inheritedFrom": "member", // indicates the predefined role "meta": { "resourceType": "Role", "created": "2023-11-20T23:10:14Z", "lastModified": "2023-11-20T23:31:23Z", "location": "Roles/Um9sZTo3" }, "name": "Sample custom role", "organizationID": "T3JnYW5pemF0aW9uOjE0ODQ1OA==", "permissions": [ { "name": "artifact:read", "isInherited": true // inherited from member predefined role }, ... ... { "name": "project:update", "isInherited": false // custom permission added by admin } ], "schemas": [ "" ] } ``` ### Update custom role The following sections describe how to add or remove permissions on an existing custom role. #### Add permissions to role Adds permissions to an existing custom role. ##### Endpoint * **URL**: `[HOST-URL]/scim/Roles/{id}` * **Method**: `PATCH` ```bash theme={null} PATCH /scim/Roles/{role_id} ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "add", "path": "permissions", "value": [ { "name": "project:delete" }, { "name": "run:stop" } ] } ] } ``` ```text theme={null} (Status 200) ``` Returns the updated role with new permissions added. #### Remove a permission from a role Removes permissions from an existing custom role. 
##### Endpoint * **URL**: `[HOST-URL]/scim/Roles/{id}` * **Method**: `PATCH` ```bash theme={null} PATCH /scim/Roles/{role_id} ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "remove", "path": "permissions", "value": [ { "name": "project:update" } ] } ] } ``` ```text theme={null} (Status 200) ``` Returns the updated role with specified permissions removed. ### Replace custom role Replaces an entire custom role definition. #### Endpoint * **URL**: `[HOST-URL]/scim/Roles/{id}` * **Method**: `PUT` ```bash theme={null} PUT /scim/Roles/{role_id} ``` ```json theme={null} { "schemas": ["urn:ietf:params:scim:schemas:core:2.0:Role"], "name": "Updated custom role", "description": "Updated description for the custom role", "permissions": [ { "name": "project:read" }, { "name": "run:read" }, { "name": "artifact:read" } ], "inheritedFrom": "viewer" } ``` ```text theme={null} (Status 200) ``` Returns the replaced role definition. ### Delete custom role Delete a custom role in the W\&B organization. **Use this operation with caution**. The predefined role that the custom role inherited from is reassigned to all users who held the custom role before the deletion. #### Endpoint * **URL**: `[HOST-URL]/scim/Roles/{id}` * **Method**: `DELETE` #### Example ```bash theme={null} DELETE /scim/Roles/abc ``` ```text theme={null} (Status 204 No Content) ``` ## Advanced features The following sections describe optional capabilities (ETag-based concurrency control and standard error responses) that help SCIM integrations behave safely in production. ### ETag support The SCIM API supports ETags for conditional updates to prevent concurrent modification conflicts. This matters when multiple admins or automated systems update the same resource, because it ensures that one update doesn't silently overwrite another. ETags are returned in the `ETag` response header and the `meta.version` field. 
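As a minimal sketch of how a client might apply this, the following Python snippet builds a `PATCH` request guarded by an `If-Match` header. It uses only the standard library and doesn't send anything. The host URL, user ID, and ETag value are placeholders, the `build_conditional_patch` helper is hypothetical, and authentication headers are omitted because they depend on your deployment.

```python
import json
import urllib.request


def build_conditional_patch(host_url, user_id, etag, operations):
    """Build (but don't send) a SCIM PATCH request guarded by If-Match."""
    body = {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
        "Operations": operations,
    }
    return urllib.request.Request(
        url=f"{host_url}/scim/Users/{user_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/scim+json",
            # The update only applies if the resource still has this ETag.
            "If-Match": etag,
        },
        method="PATCH",
    )


# Placeholder values for illustration only.
request = build_conditional_patch(
    host_url="https://wandb.example.com",
    user_id="abc",
    etag='W/"xyz123"',
    operations=[{"op": "replace", "path": "organizationRole", "value": "admin"}],
)
print(request.get_method(), request.full_url)
```

To send the request, you could pass it to `urllib.request.urlopen` after adding your credentials. A `urllib.error.HTTPError` with code `412` would then signal that the resource changed and that the client should fetch it again before retrying.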
#### ETags To use ETags, follow these steps: 1. **Get current ETag**: When you GET a resource, note the `ETag` header in the response. 2. **Conditional update**: Include the ETag in the `If-Match` header when updating. #### Example ```text theme={null} # Get user and note ETag GET /scim/Users/abc # Response includes: ETag: W/"xyz123" # Update with ETag PATCH /scim/Users/abc If-Match: W/"xyz123" { "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"], "Operations": [ { "op": "replace", "path": "organizationRole", "value": "admin" } ] } ``` A `412 Precondition Failed` response indicates that the resource was modified after you retrieved it. Fetch the resource again to get its current ETag, then retry the update. ### Error handling The SCIM API returns standard HTTP status codes for both successful and failed requests: | Status Code | Description | | ----------- | ----------------------------------------------- | | `200` | Success | | `201` | Created | | `204` | No Content (successful deletion) | | `400` | Bad Request: invalid parameters or request body | | `401` | Unauthorized: authentication failed | | `403` | Forbidden: insufficient permissions | | `404` | Not Found: resource doesn't exist | | `409` | Conflict: resource already exists | | `412` | Precondition Failed: ETag mismatch | | `500` | Internal Server Error | ### Implementation differences per deployment type W\&B maintains two separate SCIM API implementations, and the features differ between them. Review the following table before you integrate with SCIM to confirm that the operations you rely on are available on your deployment type.
| Feature | Multi-tenant Cloud | Dedicated Cloud and Self-Managed | | --------------------------------- | ------------------ | -------------------------------- | | Update user email | - | ✓ | | Update user display name | - | ✓ | | User deactivation | ✓ | ✓ | | User reactivation | - | ✓ | | Multiple emails per user | ✓ | - | | Set `modelsSeat` on create/update | ✓ | ✓ | | Set `weaveRole` on create/update | ✓ | ✓ | ## Limitations Keep the following constraints in mind when you design SCIM integrations: * **Maximum results**: 9,999 items per request. * **Dedicated Cloud and Self-Managed**: Only support one email per user. * **Team deletion**: Not supported through SCIM (use the W\&B web interface). * **User reactivation**: Not supported in Multi-tenant Cloud environments. * **Seat limits**: Operations may fail if organization seat limits are reached. # Use service accounts to automate workflows Source: https://docs.wandb.ai/platform/hosting/iam/service-accounts Manage automated or non-interactive workflows using org and team scoped service accounts A service account represents a non-human or machine user that can automatically perform common tasks across projects within a team or across teams. Service accounts are ideal for CI/CD pipelines, automated training jobs, and other machine-to-machine workflows. This page explains the scopes available for service accounts, walks through how to create and manage them, and outlines best practices for using them safely in production automation. It's intended for organization and team admins who provision credentials for automated systems. 
## Key benefits Key benefits of service accounts: * **No license consumption**: Service accounts do not consume user seats or licenses * **Dedicated API keys**: Secure credentials for automated workflows * **User attribution**: Optionally attribute automated runs to human users * **Enterprise-ready**: Built for production automation at scale * **Delegated operations**: Service accounts operate on behalf of the user or organization that creates them ## Overview Service accounts provide a secure way to automate W\&B workflows without using personal user credentials or hardcoded credentials. You can create them at two scopes: * **Organization-scoped**: Created by org admins, with access across all teams. * **Team-scoped**: Created by team admins, with access limited to a specific team. A service account's API key lets the caller read from or write to projects within the service account's scope. This enables centralized management of automated workflows for experiment tracking in W\&B Models or logging traces in W\&B Weave. Service accounts are useful for: * **CI/CD pipelines**: Automatically log model training runs from GitHub Actions, GitLab CI, or Jenkins. * **Scheduled jobs**: Nightly model retraining, periodic evaluation runs, or data validation workflows. * **Production monitoring**: Log inference metrics and model performance from production systems. * **Jupyter notebooks**: Shared notebooks in JupyterHub or Google Colab environments. * **Kubernetes jobs**: Automated workflows running in Kubernetes clusters. * **Airflow/Prefect/Dagster**: ML pipeline orchestration tools. Service accounts are available on [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud), [Self-Managed instances](/platform/hosting/hosting-options/self-managed) with an enterprise license, and enterprise accounts in [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud). 
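For example, a CI job might hand a service account's API key to a training process through its environment instead of hardcoding it. The following is a minimal sketch under stated assumptions: `WANDB_API_KEY` is the environment variable that carries the key, while the `build_job_env` helper and the placeholder key value are hypothetical.

```python
def build_job_env(base_env, api_key):
    """Return a copy of base_env that carries the service account API key."""
    if not api_key:
        # Fail fast so a missing secret doesn't surface later as an auth error.
        raise ValueError("Service account API key is not set")
    env = dict(base_env)
    env["WANDB_API_KEY"] = api_key
    return env


# Placeholder values; in practice the key would come from your CI secret store.
job_env = build_job_env({"PATH": "/usr/bin"}, "example-service-account-key")
print(sorted(job_env))
```

You could then pass `job_env` as the `env` argument of `subprocess.run` when launching the training script, so the key never appears in the script itself.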
## Organization-scoped service accounts Use an organization-scoped service account when your automation needs to read or write across projects in multiple teams. Service accounts scoped to an organization have permissions to read and write in all projects in the organization, regardless of the team, with the exception of [restricted projects](/platform/hosting/iam/access-management/restricted-projects/#visibility-scopes). Before an organization-scoped service account can access a restricted project, an admin of that project must explicitly add the service account to the project. ### Create an organization-scoped service account To create a new organization-scoped service account and API key: 1. Log in to W\&B, click your user profile icon, then: * **Dedicated Cloud** or **Self-Managed**: Click **Organization Dashboard**, then click **Service Accounts**. * **Multi-tenant Cloud**: Click **Service Accounts**. 2. Click **Create service account**. 3. Provide a name and select a default team. 4. Click **Create**. 5. Find the service account you just created. 6. Click the **action** menu, then click **Create API key**. 7. Provide a name for the API key, then click **Create**. 8. Copy the API key and store it securely. 9. Click **Done**. The full API key is only shown once at creation time. After you close the dialog, you cannot view the full API key again. Only the key ID (first part of the key) is visible in your settings. If you lose the full API key, you must create a new API key. An organization-scoped service account requires a default team, even though it has access to non-restricted projects owned by all teams within the organization. This helps to prevent a workload from failing if the `WANDB_ENTITY` variable isn't set in the environment for your model training or generative AI app. To use an organization-scoped service account for a project in a different team, you must set the `WANDB_ENTITY` environment variable to that team.
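As an illustration, the following sketch reuses one organization-scoped key for jobs that log to different teams by setting `WANDB_ENTITY` per job. The `env_for_team` helper, the team names, the project name, and the key value are all hypothetical placeholders.

```python
def env_for_team(base_env, team, project):
    """Return a copy of base_env that targets a specific team and project."""
    env = dict(base_env)
    env["WANDB_ENTITY"] = team
    env["WANDB_PROJECT"] = project
    return env


# One organization-scoped key, reused for jobs that log to different teams.
shared = {"WANDB_API_KEY": "example-service-account-key"}
vision_env = env_for_team(shared, "vision-team", "nightly-eval")
nlp_env = env_for_team(shared, "nlp-team", "nightly-eval")
print(vision_env["WANDB_ENTITY"], nlp_env["WANDB_ENTITY"])
```

Building a separate environment per job keeps the shared settings untouched, so a job that forgets to set a team still falls back to the service account's default team.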
## Team-scoped service accounts Use a team-scoped service account when you want to limit automation to a single team's projects, following the principle of least privilege. A team-scoped service account can read and write in all projects within its team, except [restricted projects](/platform/hosting/iam/access-management/restricted-projects/#visibility-scopes) in that team. Before a team-scoped service account can access a restricted project, an admin of that project must explicitly add the service account to the project. ### Create a team-scoped service account To create a new team-scoped service account and API key: 1. In your team's settings, click **Service Accounts**. 2. Click **New Team Service Account**. 3. Provide a name for the service account. 4. Set **Authentication Method** to **Generate API key** (default). If you select **Federated Identity**, the service account cannot own API keys. 5. Click **Create**. 6. Find the service account you just created. 7. Click the **action** menu, then click **Create API key**. 8. Provide a name for the API key, then click **Create**. 9. Copy the API key and store it securely. 10. Click **Done**. The full API key is only shown once at creation time. After you close the dialog, you cannot view the full API key again. Only the key ID (first part of the key) is visible in your settings. If you lose the full API key, you must create a new API key. ### Create additional API keys for a service account To create an API key owned by a service account: 1. Navigate to the **Service Accounts** tab in your team or organization settings. 2. Find the service account in the list. 3. Click the **action** menu, then click **Create API key**. 4. Provide a name for the API key, then click **Create**. 5. Copy the displayed API key immediately and store it securely. 6. Click **Done**. You can create multiple API keys for a single service account to support different environments or workflows.
The full API key is only shown once at creation time. After you close the dialog, you cannot view the full API key again. Only the key ID (first part of the key) is visible in your settings. If you lose the full API key, you must create a new API key. ### Delete a service account API key To delete an API key owned by an organization or team service account: 1. Go to [Organization settings](https://wandb.ai/account-settings/), then click **API Keys**. 2. Find the API key. The list includes all API keys owned by organization and team service accounts. You can search or filter by key name or ID, and you can sort by any column. 3. Click the delete button. If you don't configure a team in your model training or generative AI app environment that uses a team-scoped service account, the model runs or weave traces log to the named project within the service account's parent team. In such a scenario, user attribution using the `WANDB_USERNAME` or `WANDB_USER_EMAIL` variables *doesn't work* unless the referenced user is part of the service account's parent team. A team-scoped service account can't log runs to a [team or restricted-scoped project](/platform/hosting/iam/access-management/restricted-projects/#visibility-scopes) in a team different from its parent team, but it can log runs to an open visibility project within another team. ### External service accounts If you'd prefer to issue credentials through your own identity provider instead of managing W\&B-native API keys, use external service accounts. In addition to built-in service accounts, W\&B also supports team-scoped external service accounts with the W\&B SDK and CLI using [Identity federation](./identity_federation#external-service-accounts) with identity providers (IdPs) that issue JSON Web Tokens (JWTs). 
## Best practices After you create the service accounts you need, follow these recommendations to ensure secure and efficient use of service accounts in your organization: * **Use a secrets manager**: Store service account API keys in a secure secrets management system (for example, AWS Secrets Manager, HashiCorp Vault, Azure Key Vault) rather than in plain text configuration files. * **Principle of least privilege**: Create team-scoped service accounts when possible, rather than organization-scoped accounts, to limit access to only necessary projects. * **Unique service accounts per use case**: Create separate service accounts for different automation workflows (for example, one for CI/CD, another for scheduled retraining) to improve auditability and enable granular access control. * **Regular audits**: Periodically review active service accounts and remove those no longer in use. Check the audit logs to monitor service account activity. * **Secure API key handling**: * Never commit API keys to version control. * Use environment variables to pass keys to applications. * Rotate keys if they're accidentally exposed. 
* **Naming conventions**: Use descriptive names that indicate the service account's purpose: * Good: `ci-model-training`, `nightly-eval-pipeline`, `prod-inference-monitor` * Avoid: `service-account-1`, `test-sa`, `temp` * **User attribution**: When multiple team members use the same automation workflow, set `WANDB_USERNAME` or `WANDB_USER_EMAIL` to track who triggered each run: ```bash theme={null} export WANDB_API_KEY="[SERVICE_ACCOUNT_KEY]" export WANDB_USER_EMAIL="john.doe@company.com" ``` * **Environment configuration**: For team-scoped service accounts, always set `WANDB_ENTITY` to ensure runs log to the correct team: ```bash theme={null} export WANDB_ENTITY="ml-team" export WANDB_PROJECT="production-models" ``` * **Error handling**: Handle authentication failures explicitly and alert on them, so that you can identify issues with service account credentials quickly. * **Documentation**: Maintain documentation of: * Which service accounts exist and their purposes. * Which systems or workflows use each service account. * Contact information for the team responsible for each account. ## Troubleshooting If a service account isn't behaving as expected, check the following common issues and solutions: * **"Unauthorized" errors**: Verify the API key is correctly set and the service account has access to the target project. * **Runs not appearing**: Check that `WANDB_ENTITY` is set to the correct team name. * **User attribution not working**: Ensure the user specified in `WANDB_USERNAME` or `WANDB_USER_EMAIL` is a member of the team. * **Access denied to restricted projects**: Explicitly add the service account to the restricted project's access list. # Configure SSO with OIDC Source: https://docs.wandb.ai/platform/hosting/iam/sso Configure SSO using OpenID Connect with identity providers like Okta, Azure AD, and AWS Cognito for W&B instances.
This guide is for administrators of W\&B Dedicated Cloud or Self-Managed instances who want to enable single sign-on (SSO) using an OpenID Connect (OIDC) compatible identity provider. By the end, you'll have configured your identity provider and connected it to W\&B, so that users can sign in through your organization's existing identity system and you can manage user identities and group memberships through providers like Okta, Keycloak, Auth0, Google, and Entra. ## OpenID Connect W\&B supports the following OIDC authentication flows for integrating with external Identity Providers (IdPs): * Implicit flow with form post. * Authorization code flow with Proof Key for Code Exchange (PKCE). These flows authenticate users and provide W\&B with the identity information (in the form of ID tokens) needed to manage access control. The ID token is a JWT that contains the user's identity information, such as their name, username, email, and group memberships. W\&B uses this token to authenticate the user and map them to appropriate roles or groups in the system. Access tokens authorize API requests on behalf of the user, but because W\&B's primary concern is user authentication and identity, it requires only the ID token. You can use environment variables to [configure IAM options](/platform/hosting/iam/advanced_env_vars) for your [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) or [Self-Managed](/platform/hosting/hosting-options/self-managed) instance. Follow these guidelines to configure an identity provider for [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) or [Self-Managed](/platform/hosting/hosting-options/self-managed) deployments. If you're using W\&B Multi-tenant Cloud, reach out to [support@wandb.com](mailto:support@wandb.com) for assistance. ## Configure your IdP The following sections describe how to configure your identity provider (IdP) for OIDC.
Complete the configuration steps for your IdP first. You use the resulting **Client ID**, **Issuer URL**, and (optionally) **Client Secret** when you set up SSO in W\&B in the next section. Select the tab for your IdP for details. Follow this procedure to set up AWS Cognito as your IdP. At the end, you have a **Client ID** and **OIDC issuer URL** to use when you configure W\&B. 1. Sign in to your AWS account and navigate to the [AWS Cognito](https://aws.amazon.com/cognito/) App. AWS Cognito setup 2. Provide an allowed callback URL to configure the application in your IdP. Add `http(s)://[YOUR-W-AND-B-HOST]/oidc/callback` as the callback URL. Replace `[YOUR-W-AND-B-HOST]` with your W\&B host path. 3. If your IdP supports universal logout, set the Logout URL to `http(s)://[YOUR-W-AND-B-HOST]`. Replace `[YOUR-W-AND-B-HOST]` with your W\&B host path. For example, if your application runs at `https://wandb.mycompany.com`, replace `[YOUR-W-AND-B-HOST]` with `wandb.mycompany.com`. The following image demonstrates how to provide allowed callback and sign-out URLs in AWS Cognito. Host configuration `wandb/local` uses the [`implicit` grant with the `form_post` response type](https://auth0.com/docs/get-started/authentication-and-authorization-flow/implicit-flow-with-form-post) by default. You can also configure `wandb/local` to perform an `authorization_code` grant that uses the [PKCE Code Exchange](https://www.oauth.com/oauth2-servers/pkce/) flow. 4. Select one or more OAuth grant types to configure how AWS Cognito delivers tokens to your app. 5. W\&B requires specific OpenID Connect (OIDC) scopes. Select the following from AWS Cognito App: * `openid` * `profile` * `email` For example, your AWS Cognito App UI should look similar to the following image: Required fields Select the **Auth Method** in the settings page or set the `OIDC_AUTH_METHOD` environment variable to specify which grant `wandb/local` uses. You must set the **Auth Method** to `pkce`. 6. 
You need a **Client ID** and the URL of your OIDC issuer. The OpenID discovery document must be available at `$OIDC_ISSUER/.well-known/openid-configuration`. For example, you can generate your issuer URL by appending your User Pool ID to the Cognito IdP URL from the **App Integration** tab within the **User Pools** section: AWS Cognito issuer URL Don't use the Cognito domain for the IdP URL. Cognito provides its discovery document at `https://cognito-idp.$REGION.amazonaws.com/$USER_POOL_ID`. Next, [Set up SSO in W\&B](#set-up-sso-in-w%26b). Follow this procedure to set up Okta as your IdP. At the end, you have a **Client ID** and **OIDC issuer URL** to use when you configure W\&B. 1. Sign in to the [Okta Portal](https://login.okta.com/). 2. On the left side, select **Applications** and then **Applications** again. Okta Applications menu 3. Click **Create App integration**. Create App integration button 4. On the screen named **Create a new app integration**, select **OIDC - OpenID Connect** and **Single-Page Application**. Then click **Next**. OIDC Single-Page Application selection 5. On the screen named **New Single-Page App Integration**, complete the values as follows and click **Save**: * **App integration name**, for example `W&B`. * **Grant type**: Select both **Authorization Code** and **Implicit (hybrid)**. * **Sign-in redirect URIs**: `https://[YOUR-W-AND-B-URL]/oidc/callback`. * **Sign-out redirect URIs**: `https://[YOUR-W-AND-B-URL]/logout`. * **Assignments**: Select **Skip group assignment for now**. Single-Page App configuration 6. On the overview screen of the Okta application you created, make note of the **Client ID** under **Client Credentials** under the **General** tab: Okta Client ID location 7. To identify the Okta **OIDC Issuer URL**, select **Settings** and then **Account** on the left side. The Okta UI shows the company name under **Organization Contact**. 
Okta organization settings The OIDC issuer URL has the following format: `https://[COMPANY].okta.com`. Replace `[COMPANY]` with the corresponding value. Make note of it. Next, [Set up SSO in W\&B](#set-up-sso-in-w%26b). Azure AD (Entra ID) supports two OIDC configuration modes for W\&B. Choose the configuration that matches your security requirements: * [Public client](#public-client): Uses PKCE without a client secret. Simpler to configure, suitable for most deployments. * [Confidential client](#confidential-client): Uses PKCE with a client secret. Required if you need to set the `GORILLA_OIDC_SECRET` environment variable. Don't mix configurations. If you select **Single-page application** in Azure AD, don't provide a client secret. If you need a client secret, you must select **Web** as the platform type. Use this configuration if you don't need to specify a client secret. It's suitable for deployments without advanced security requirements. 1. Sign in to the [Azure Portal](https://portal.azure.com/). 2. Navigate to **Microsoft Entra ID** service and select **App registrations** from the left sidebar. 3. Click **New registration** at the top of the page. 4. On the **Register an application** screen, configure the following: * **Name**: Enter a descriptive name. * **Supported account types**: Keep the default **Single tenant** or modify as needed. * **Redirect URI**: Select platform type **Single-page application** and enter `https://[YOUR-W-AND-B-URL]/oidc/callback`. * Click **Register**. 5. After registration, note the following values from the Overview page: * **Application (client) ID**: Your OIDC Client ID. * **Directory (tenant) ID**: Your OIDC Issuer URL. Application and Directory IDs 6. Configure authentication settings: * Select **Authentication** from the left sidebar. * Under **Front-channel logout URL**, enter `https://[YOUR-W-AND-B-URL]/logout`. * Click **Save**. 
Make a note of the following details: * **OIDC Client ID**: The Application (client) ID from step 5. * **OIDC Issuer URL**: `https://login.microsoftonline.com/[TENANT-ID]/v2.0` (replace `[TENANT-ID]` with your Directory ID from step 5). When configuring W\&B, use: * **Auth Method**: `pkce`. * **OIDC Client Secret**: Leave empty (don't set `GORILLA_OIDC_SECRET`). Next, [Set up SSO in W\&B](#set-up-sso-in-w%26b). Use this configuration if you need to authenticate using a client secret. 1. Sign in to the [Azure Portal](https://portal.azure.com/). 2. Navigate to **Microsoft Entra ID** service and select **App registrations** from the left sidebar. 3. Click **New registration** at the top of the page. 4. On the **Register an application** screen, configure the following: * **Name**: Enter a descriptive name. * **Supported account types**: Keep the default **Single tenant** or modify as needed. * **Redirect URI**: Select platform type **Web** and enter `https://[YOUR-W-AND-B-URL]/oidc/callback`. * Click **Register**. 5. After registration, note the following values from the Overview page: * **Application (client) ID**: Your OIDC Client ID. * **Directory (tenant) ID**: Your OIDC Issuer URL. Application and Directory IDs 6. Configure authentication settings: * Select **Authentication** from the left sidebar. * Under **Front-channel logout URL**, enter `https://[YOUR-W-AND-B-URL]/logout`. * Click **Save**. 7. Create a client secret: * Select **Certificates & secrets** from the left sidebar. * Click **New client secret**. * Add a description for the secret. * Choose an expiration period. * Click **Add**. Copy and save the secret **Value** immediately (not the Secret ID). Client secret value Make a note of the following details: * **OIDC Client ID**: The Application (client) ID from step 5. * **OIDC Client Secret**: The secret value from step 7. 
* **OIDC Issuer URL**: `https://login.microsoftonline.com/[TENANT-ID]/v2.0` (replace `[TENANT-ID]` with your Directory ID from step 5).

When configuring W\&B, use:

* **Auth Method**: `pkce`.
* **OIDC Client Secret**: Set the `GORILLA_OIDC_SECRET` environment variable to the secret value from step 7.

The v2.0 endpoint supports both personal Microsoft accounts and work/school accounts. If your organization requires the v1.0 endpoint, use `https://login.microsoftonline.com/[TENANT-ID]` instead.

Next, [Set up SSO in W\&B](#set-up-sso-in-w%26b).

## Set up SSO in W\&B

After you finish configuring your IdP, complete the following steps in W\&B to connect the IdP and enable SSO for your instance.

To set up SSO, you must have administrator privileges and the following information:

* **OIDC Client ID**.
* **OIDC Auth method** (`implicit` or `pkce`).
* **OIDC Issuer URL**.
* **OIDC Client Secret** (optional, depends on how you've set up your IdP).

If your IdP requires an OIDC Client Secret, specify it by setting the `GORILLA_OIDC_SECRET` [environment variable](/platform/hosting/env-vars):

* In the W\&B App, go to **System Console** > **Settings** > **Advanced** > **User Spec** and add `GORILLA_OIDC_SECRET` to the `extraENV` section as shown in the following example.
* In Helm, configure `values.global.extraEnv` as shown in the following example.

```yaml theme={null}
values:
  global:
    extraEnv:
      GORILLA_OIDC_SECRET: "[YOUR-SECRET]"
```

If you can't sign in to your instance after configuring SSO, you can restart the instance with the `LOCAL_RESTORE=true` environment variable set. This outputs a temporary password to the container logs and disables SSO. After you resolve any issues with SSO, you must remove that environment variable to enable SSO again.

Use this tab if you deploy W\&B with the W\&B Kubernetes Operator. The System Console is the successor to the System Settings page.
It's available with the [W\&B Kubernetes Operator](/platform/hosting/self-managed/operator) based deployment. 1. Refer to [Access the W\&B Management Console](/platform/hosting/self-managed/operator#access-the-wb-management-console). 2. Navigate to **Settings**, then **Authentication**. Select **OIDC** in the **Type** dropdown. System Console OIDC configuration 3. Enter the values. 4. Click **Save**. 5. Sign out and then sign back in, this time using the IdP sign-in screen. ## Find your customer namespace Before you can configure team-level BYOB with CoreWeave storage on W\&B Dedicated Cloud or Self-Managed, you must obtain your organization's **Customer Namespace**. You can view and copy it from the bottom of the **Authentication** tab. For detailed instructions on configuring CoreWeave storage with your Customer Namespace, see [CoreWeave requirements for Dedicated Cloud or Self-Managed](/platform/hosting/data-security/secure-storage-connector#coreweave-customer-namespace). 1. Sign in to your W\&B instance. 2. Navigate to the W\&B App. W&B App navigation 3. From the dropdown, select **System Settings**: System Settings dropdown 4. Enter your **Issuer**, **Client ID**, and **Authentication Method**. 5. Select **Update settings**. Update settings button If you can't sign in to your instance after configuring SSO, you can restart the instance with the `LOCAL_RESTORE=true` environment variable set. This outputs a temporary password to the containers logs and disables SSO. After you resolve any issues with SSO, you must remove that environment variable to enable SSO again. ## Security Assertion Markup Language (SAML) W\&B doesn't support SAML. # Manage bucket storage and costs Source: https://docs.wandb.ai/platform/hosting/managing-bucket-storage Understand how W&B uses object storage, how deletion maps to bucket bytes, and how to reduce storage usage and costs. 
When you use [Bring your own bucket (BYOB)](/platform/hosting/data-security/secure-storage-connector), [W\&B Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud), or [W\&B Self-Managed](/platform/hosting/hosting-options/self-managed), your team often pays cloud storage providers directly. This page explains what occupies your bucket, how W\&B removes objects after deletion in the app or API, and what you should expect. Use it to understand bucket usage, plan cleanup, and set realistic expectations for when storage is reclaimed. This page is intended for W\&B administrators and operators who manage object storage for Self-Managed, Dedicated Cloud, or BYOB deployments. ## What uses bucket space W\&B stores several categories of data in your configured object storage. The [BYOB overview](/platform/hosting/data-security/secure-storage-connector#data-stored-in-the-central-database-vs-buckets) lists examples, including experiment files and metrics, artifact files, media files, run files, and exported history in Parquet form. Together these drive bucket size and cost. ## How W\&B removes data from storage Deletion in the W\&B App or [Public API](/models/ref/python/public-api/api) updates W\&B metadata first. Removing a run, artifact, or file from the product doesn't guarantee an immediate drop in reported bucket usage. Object storage cleanup runs as background work that can lag, especially on busy instances. ### Artifacts Deleted artifacts are soft-deleted, then processed by artifact garbage collection. Self-managed deployments must set `GORILLA_ARTIFACT_GC_ENABLED` and meet provider requirements such as versioning or soft delete. See [Delete an artifact](/models/artifacts/delete-artifacts) and [Configure environment variables](/platform/hosting/env-vars). ### Run data and run files After runs or run-associated files are deleted, permanent removal of the underlying stored objects is controlled separately from artifacts. 
On Dedicated Cloud and Self-Managed deployments, `GORILLA_DATA_RETENTION_PERIOD` sets how long deleted run data is retained before it can be removed from storage. This setting does not delete artifacts. See [Configure environment variables](/platform/hosting/env-vars), [Data retention policy](/platform/hosting/hosting-options/dedicated-cloud#data-retention-policy) for Dedicated Cloud, and [Delete runs](/models/runs/delete-runs#when-deleted-run-data-is-removed-from-storage) for how run and file deletion relates to storage. ## What to expect from background cleanup Garbage collection and related jobs that free object storage run without timing guarantees. W\&B does not guarantee that a given object disappears from your bucket within a specific time after you delete content in the UI or API. For projects with a large number of files per run, such as when logging many media files per run, expect longer delays before storage usage is released. Monitor your bucket in your cloud provider and contact [W\&B Support](mailto:support@wandb.ai) or your account team if cleanup appears stuck. ## Reduce bucket usage This section describes the recommended order of operations for freeing space in your bucket, starting with safe product flows and ending with direct bucket operations that require more care. Use supported product flows first: * [Delete runs in the W\&B App](/models/runs/delete-runs#ui) or [with Python](/models/runs/delete-runs#python) when you no longer need them. * [Delete artifacts](/models/artifacts/delete-artifacts) you no longer need, and use [Artifact TTL](/models/artifacts/ttl) where it fits your workflow. If you must reclaim space immediately, operators with access to the bucket may delete specific object keys directly in cloud storage. :::caution Deleting object keys directly in your bucket bypasses W\&B and can cause data loss or break access for the app. 
Review the following before you proceed: * Objects you remove are **no longer available to download** through W\&B. * You should delete **only** keys you intend to remove. Incorrect deletes can break access to data the app still references. * If your bucket uses **object versioning** or **provider soft delete** (for example on Google Cloud Storage), storage charges can persist until non-current versions or soft-deleted objects expire under your cloud lifecycle rules. ::: For high-level usage in W\&B Multi-tenant Cloud, organization admins can review storage-related usage from organization settings. See [Billing settings](/platform/app/settings-page/billing-settings). ## Troubleshooting If deletions don't appear correctly in the W\&B App after you use the Public API, upgrade the W\&B Python SDK to a current release and retry. Large per-run file counts can increase how long background cleanup takes across the instance. For scripted cleanup patterns that match your deployment, contact [W\&B Support](mailto:support@wandb.ai) or your account team. ## Related documentation For more information, see the following resources: * [Delete runs](/models/runs/delete-runs#delete-runs) * [Delete an artifact](/models/artifacts/delete-artifacts) * [Configure environment variables](/platform/hosting/env-vars) * [Bring your own bucket (BYOB)](/platform/hosting/data-security/secure-storage-connector) # Track user activity with audit logs Source: https://docs.wandb.ai/platform/hosting/monitoring-usage/audit-logging Access, fetch, and analyze W&B audit logs across deployment types, including the log schema and tracked actions. Use W\&B audit logs to track user activity within your organization and to conform to your enterprise governance requirements. This page is for organization-level admins who need to access, fetch, and analyze audit log data across W\&B deployment types. Audit logs are available in JSON format. Refer to [Audit log schema](#audit-log-schema). 
How you access audit logs depends on your W\&B platform deployment type:

| W\&B Platform deployment type | Audit logs access mechanism |
| ----------------------------- | --------------------------- |
| [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) | • [Instance-level BYOB](/platform/hosting/data-security/secure-storage-connector): Synced to instance-level bucket (BYOB) every 10 minutes. Also available with [the API](#fetch-audit-logs-using-api).<br>• Default instance-level storage: Available only with [the API](#fetch-audit-logs-using-api). |
| [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud) | Available for Enterprise plans only. Available only with [the API](#fetch-audit-logs-using-api). |
| [Self-Managed](/platform/hosting/hosting-options/self-managed) | Synced to instance-level bucket every 10 minutes. Also available with [the API](#fetch-audit-logs-using-api). |

After you fetch audit logs, you can analyze them with tools like [Pandas](https://pandas.pydata.org/docs/index.html), [Amazon Redshift](https://aws.amazon.com/redshift/), [Google BigQuery](https://cloud.google.com/bigquery), or [Microsoft Fabric](https://www.microsoft.com/microsoft-fabric). Some audit log analysis tools don't support JSON. Refer to the documentation for your analysis tool for guidelines and requirements to transform the JSON-formatted audit logs before analysis. For more details about the format of the logs, see [Audit log schema](#audit-log-schema) and [Actions](#actions).

## Audit log retention

The following recommendations help you retain audit logs to meet your organization's governance and compliance obligations:

* If you must retain audit logs for a specific period of time, W\&B recommends periodically transferring logs to long-term storage, either with storage buckets or the Audit Logging API.
* If you are subject to the [Health Insurance Portability and Accountability Act of 1996 (HIPAA)](https://hhs.gov/hipaa/for-professionals/index.html), you must retain audit logs for a minimum of 6 years in an environment where no internal or external actor can delete or modify them before the end of the mandatory retention period. For HIPAA-compliant [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) instances with [BYOB](/platform/hosting/data-security/secure-storage-connector), you must configure guardrails for your managed storage, including any long-term retention storage.

## Audit log schema

Use this schema to interpret the fields returned in each audit log entry.
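As a rough sketch of reading entries in code, assuming the newline-separated JSON layout that the API section of this page describes, each line can be parsed on its own and optional keys read defensively. The sample entries below are made-up illustrative values, and the function names are hypothetical; only the key names come from the schema on this page.

```python
import json
from datetime import datetime

# Made-up sample data: two entries, one JSON object per line.
SAMPLE_RESPONSE = (
    '{"action": "run:delete", "actor_email": "admin@example.com", '
    '"timestamp": "2023-01-23T12:34:56Z"}\n'
    '{"action": "user:login", "timestamp": "2023-01-23T12:40:00Z"}\n'
)


def parse_audit_logs(text: str) -> list:
    """Parse newline-separated JSON objects into a list of dicts."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines between or after entries
            entries.append(json.loads(line))
    return entries


def entry_time(entry: dict) -> datetime:
    """Convert the RFC3339 `timestamp` field to a timezone-aware datetime."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(entry["timestamp"].replace("Z", "+00:00"))


for entry in parse_audit_logs(SAMPLE_RESPONSE):
    # An entry may contain only a subset of keys, so use .get() for optional ones.
    actor = entry.get("actor_email", "(not recorded)")
    print(entry_time(entry).isoformat(), entry["action"], actor)
```

Reading optional keys with `.get()` avoids a `KeyError` when an entry omits a field, which the schema allows.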
The following table shows all keys that can appear in an audit log entry, ordered alphabetically. Depending on the action and the circumstances, a specific log entry may include only a subset of the possible fields.

| Key | Definition |
| ------------------------- | ---------- |
| `action` | The [action](#actions) of the event. |
| `actor_email` | The email address of the user that initiated the action, if applicable. |
| `actor_ip` | The IP address of the user that initiated the action. |
| `actor_user_id` | The ID of the logged-in user who performed the action, if applicable. |
| `artifact_asset` | The artifact ID associated with the action, if applicable. |
| `artifact_digest` | The artifact digest associated with the action, if applicable. |
| `artifact_qualified_name` | The full name of the artifact associated with the action, if applicable. |
| `artifact_sequence_asset` | The artifact sequence ID associated with the action, if applicable. |
| `cli_version` | The version of the Python SDK that initiated the action, if applicable. |
| `entity_asset` | The entity or team ID associated with the action, if applicable. |
| `entity_name` | The entity or team name associated with the action, if applicable. |
| `project_asset` | The project associated with the action, if applicable. |
| `project_name` | The name of the project associated with the action, if applicable. |
| `report_asset` | The report ID associated with the action, if applicable. |
| `report_name` | The name of the report associated with the action, if applicable. |
| `response_code` | The HTTP response code for the action, if applicable. |
| `timestamp` | The time of the event in [RFC3339 format](https://www.rfc-editor.org/rfc/rfc3339). For example, `2023-01-23T12:34:56Z` represents January 23, 2023 at 12:34:56 UTC. |
| `user_asset` | The user asset the action impacts (rather than the user performing the action), if applicable. |
| `user_email` | The email address of the user the action impacts (rather than the email address of the user performing the action), if applicable. |

### Personally identifiable information (PII)

Personally identifiable information (PII), such as email addresses and the names of projects, teams, and reports, is available only with the API endpoint option:

* For [Self-Managed](/platform/hosting/hosting-options/self-managed) and [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud), an organization admin can [exclude PII](#exclude-pii) when fetching audit logs.
* For [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud), the API endpoint always returns relevant fields for audit logs, including PII. This isn't configurable.

## Before you begin

Before you fetch audit logs, confirm that you meet the following prerequisites for your deployment type:

* Organization-level admins can fetch audit logs. If you receive a `403` error, ensure that you or your service account has adequate permission.
* **Multi-tenant Cloud**: If you're a member of multiple Multi-tenant Cloud organizations, you must configure the **Default API organization**, which determines where audit logging API calls are routed. Otherwise, you receive the following error:

  ```text theme={null}
  user is associated with multiple organizations but no valid org ID found in user info
  ```

  To specify your default API organization:

  1. Click your profile image, then click **User Settings**.
  2. For **Default API organization**, select an organization.

  This doesn't apply to a service account, which can be a member of only one Multi-tenant Cloud organization.

## Fetch audit logs

To retrieve audit logs from the W\&B Audit Logging API, follow these steps.
After you complete this procedure, you have a set of newline-separated JSON objects you can analyze using your preferred tooling.

1. Determine the correct API endpoint for your instance:

   * [Self-Managed](/platform/hosting/hosting-options/self-managed): `[WANDB-PLATFORM-URL]/admin/audit_logs`
   * [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud): `[INSTANCE-NAME].wandb.io/admin/audit_logs`
   * [Multi-tenant Cloud (Enterprise required)](/platform/hosting/hosting-options/multi_tenant_cloud): `https://api.wandb.ai/audit_logs`

   In the following steps, replace `[API-ENDPOINT]` with your API endpoint.
2. Optional: Construct query parameters to append to the endpoint. In the following steps, replace `[PARAMETERS]` with the resulting string.

   * `anonymize`: If the URL includes the parameter `anonymize=true`, W\&B removes any PII. Otherwise, PII is included. Refer to [Exclude PII when fetching audit logs](#exclude-pii-when-fetching-audit-logs). Not supported for Multi-tenant Cloud, where all fields are included, including PII.
   * Configure the date window of logs to fetch with a combination of `numDays` and `startDate`. Each parameter is optional, and they interact.
     * If neither parameter is included, only today's logs are fetched.
     * `numDays`: An integer that indicates the number of days backward from `startDate` to fetch logs. If you omit it or set it to `0`, logs are fetched for `startDate` only. On **Multi-tenant Cloud**, you can fetch a maximum of 7 days of audit logs, even if you set `numDays` to a larger value.
     * `startDate`: Optional. Use `startDate=YYYY-MM-DD` to set the newest calendar day in the range. If you omit it or set it to today, the range ends today and `numDays` counts backward from that day.
       * Supported in **Multi-tenant Cloud** only with an Enterprise license.
       * Supported in **Dedicated Cloud** and **Self-Managed** v0.80.0 and above.
3. Construct the fully qualified endpoint URL in the format `[API-ENDPOINT]?[PARAMETERS]`.
4. Execute an HTTP `GET` request on the fully qualified API endpoint with a web browser or a tool like [Postman](https://www.postman.com/downloads/), [HTTPie](https://httpie.io/), or cURL.

The API response contains newline-separated JSON objects. Objects include the fields described in the [schema](#audit-log-schema), just like when audit logs are synced to an instance-level bucket. In those cases, the audit logs are located in the `/wandb-audit-logs` directory in your bucket.

### Use basic authentication

You must authenticate each request to the audit logs API. To use basic authentication with your API key to access the audit logs API, set the HTTP request's `Authorization` header to the string `Basic` followed by a space, then the base64-encoded string in the format `[USERNAME]:[API-KEY]`. In other words, replace the username and API key with your values separated with a `:` character, then base64-encode the result. For example, to authorize as `demo:p@55w0rd`, set the header to `Authorization: Basic ZGVtbzpwQDU1dzByZA==`.

### Exclude PII when fetching audit logs
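The following sketch pulls together the endpoint, query parameters, and basic authentication described earlier on this page to build a request that excludes PII. It uses only the Python standard library. The function names are illustrative, and `mycompany.wandb.io`, `demo`, and `p@55w0rd` are the example values used on this page, not real credentials.

```python
import base64
import urllib.parse
import urllib.request
from typing import Optional


def basic_auth_header(username: str, api_key: str) -> str:
    """Return the Authorization header value for `[USERNAME]:[API-KEY]`."""
    raw = f"{username}:{api_key}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_audit_log_url(
    api_endpoint: str,
    anonymize: bool = False,
    num_days: Optional[int] = None,
    start_date: Optional[str] = None,  # expected as YYYY-MM-DD
) -> str:
    """Append the optional query parameters to the audit log endpoint."""
    params = {}
    if anonymize:
        params["anonymize"] = "true"
    if num_days is not None:
        params["numDays"] = str(num_days)
    if start_date is not None:
        params["startDate"] = start_date
    query = urllib.parse.urlencode(params)
    return f"{api_endpoint}?{query}" if query else api_endpoint


def fetch_audit_logs(url: str, username: str, api_key: str) -> str:
    """Send the GET request and return the newline-separated JSON body."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, api_key)}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


url = build_audit_log_url(
    "https://mycompany.wandb.io/admin/audit_logs", anonymize=True, num_days=7
)
print(url)  # https://mycompany.wandb.io/admin/audit_logs?anonymize=true&numDays=7
print(basic_auth_header("demo", "p@55w0rd"))  # Basic ZGVtbzpwQDU1dzByZA==
```

The sketch defines `fetch_audit_logs` without calling it so that it runs without network access. To use it, call it with your own endpoint, username, and API key.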
For [Self-Managed](/platform/hosting/hosting-options/self-managed) and [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud), a W\&B organization or instance admin can exclude PII when fetching audit logs. For [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud), the API endpoint always returns relevant fields for audit logs, including PII. This isn't configurable.

To exclude PII, pass the `anonymize=true` URL parameter. For example, to get audit logs for user activity within the last week and exclude PII, if your W\&B instance URL is `https://mycompany.wandb.io`, use an API endpoint like:

```text theme={null}
https://mycompany.wandb.io/admin/audit_logs?anonymize=true&[ADDITIONAL-PARAMETERS]
```

## Actions

Each audit log entry records one of the following actions. Use this reference to interpret the `action` field in a log entry.

The following table describes possible actions that W\&B can record, sorted alphabetically.

| Action | Definition |
| ----------------------------- | ---------- |
| `artifact:create` | Artifact is created. |
| `artifact:delete` | Artifact is deleted. |
| `artifact:read` | Artifact is read. |
| `project:delete` | Project is deleted. |
| `project:read` | Project is read. |
| `report:read` | Report is read. <sup>1</sup> |
| `run:delete_many` | Batch of runs is deleted. |
| `run:delete` | Run is deleted. |
| `run:stop` | Run is stopped. |
| `run:undelete_many` | Batch of runs is restored from trash. |
| `run:update_many` | Batch of runs is updated. |
| `run:update` | Run is updated. |
| `sweep:create_agent` | Sweep agent is created. |
| `team:create_service_account` | Service account is created for the team. |
| `team:create` | Team is created. |
| `team:delete` | Team is deleted. |
| `team:invite_user` | User is invited to team. |
| `team:uninvite` | User or service account is uninvited from team. |
| `user:create_api_key` | API key for the user or service account is created. <sup>1</sup> |
| `user:create` | User is created. <sup>1</sup> |
| `user:deactivate` | User is deactivated. <sup>1</sup> |
| `user:delete_api_key` | API key for the user or service account is deleted. <sup>1</sup> |
| `user:initiate_login` | User initiates login. <sup>1</sup> |
| `user:login` | User logs in. <sup>1</sup> |
| `user:logout` | User logs out. <sup>1</sup> |
| `user:permanently_delete` | User is permanently deleted. <sup>1</sup> |
| `user:reactivate` | User is reactivated. <sup>1</sup> |
| `user:read` | User profile is read. <sup>1</sup> |
| `user:update` | User is updated. <sup>1</sup> |

<sup>1</sup> On [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud), audit logs are not collected for:

* Open or Public projects.
* The `report:read` action.
* `User` actions that aren't tied to a specific organization.

# W&B Mobile App (iOS)

Source: https://docs.wandb.ai/platform/hosting/monitoring-usage/mobile-app

Track training runs, review console logs, view line plots and system metrics, and explore W&B Models projects from your iPhone or iPad.

The W\&B Mobile App for iOS keeps you connected to your W\&B Models projects wherever you go. Use it on your iPhone or iPad to track training runs and read run logs in real time, view line plots and system metrics, explore project histories, and follow your team's progress so you can monitor experiments without opening your laptop. This page describes the app's features and shows you how to set up notifications.

The mobile app is only available for [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud) accounts. It isn't available for [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) or [Self-managed](/platform/hosting/hosting-options/self-managed) deployments.

## Download the app

Install the app from the App Store to get started.
## Features The following features are available in the mobile app. * **Explore projects**: Browse and search across your W\&B projects. * **Full project history**: Metric charts on project panels include data from *all runs* in the project, not only the latest run. * **Star projects**: See your most important projects at a glance. Click the star icon next to a project to star it. To filter the list to only starred projects, click the **Starred** tab at the top of the list. * **Track experiments**: View run status, metrics, and line plots for your experiments in real time. * **Live updates**: Charts refresh as new data arrives. The **Runs** tab and project run lists update automatically when new runs start. * **Run overview tab**: On each run, open the **Overview** tab for a dedicated summary of run details alongside metrics and logs. Long run config values wrap to multiple lines instead of truncating. * **View system metrics**: On each run, open the **Metrics** tab to view [system metrics](/models/ref/python/experiments/system-metrics) that W\&B logs automatically during training, such as GPU utilization, CPU usage, memory, disk I/O, and network traffic. System metrics appear in a collapsible **System** section alongside your logged metrics. Each metric displays as a line chart with the same tooltip and search behavior as other metrics on the tab. * **Mobile-optimized panel grouping**: Panels are automatically grouped by name into collapsible sections. You can collapse the ungrouped metrics section to focus on named sections. Sections are a single level (not nested), and grouping follows the same rules as [workspace panels](/models/app/features/panels) in the W\&B web app, so the layout stays consistent when you move between desktop and your phone. * **Star panels**: See your most important panels at a glance. When viewing a run or a project, click the star icon at the top of a panel to star it. 
To filter the list to only starred panels, click the **Starred** tab at the top of the list. * **Search panels**: When viewing a run or a project, use the **panel search** field at the bottom of the screen to filter which runs appear in each chart. You can search with [JavaScript regular expressions](https://www.w3schools.com/js/js_regexp.asp) to match patterns in run names. * **Chart tooltips**: On line charts, tooltips show metric values with up to four decimal places so you can read small changes accurately. * **Stop runs**: When viewing an in-progress run, click the **action ()** menu, then click **Stop run**. * **View console logs**: View [console logs](/models/runs/view-logged-runs#logs) for active or completed runs, or download logs for completed runs. The app polls active runs for new log lines every 5 seconds. The most recent 10,000 lines display by default, and you can scroll backward to view older logs. Log search stays responsive even for large outputs. * **Switch teams**: Use the **team picker** to change teams. * **Stay informed**: Get updates on your experiments without opening your laptop using notifications. **Remove alerts** you no longer need from the **Notifications** tab by swiping them away. ## Set up notifications Configure notifications to receive updates about your experiments on your mobile device. The following sections describe the two kinds of notifications you can configure. ### Metric threshold alerts Use metric threshold alerts to get notified when a metric crosses a value you specify. 1. Navigate to a run. 2. Tap the graph to enter fullscreen view. 3. Tap the bell icon in the top right. 4. Set notifications that trigger when future runs cross the specified metric threshold. ### Run failure alerts Use run failure alerts to get notified when runs in a project fail. 1. Navigate to a project. 2. Tap the **action ()** menu in the top right. 3. Select **Run failed alert** to receive notifications for future run failures in that project. 
The **Manage notifications** screen lists both kinds of settings in one place, with short descriptions that explain what each section controls. # View organization activity Source: https://docs.wandb.ai/platform/hosting/monitoring-usage/org_dashboard View user status, activity, and usage trends in the W&B organization dashboard across deployment types. This page describes how to view activity in your W\&B organization, including user status, activity trends over time, and how to export user details as a CSV. Organization admins use this information to audit access, monitor adoption, and report on usage. Select your deployment type in each section to continue. ## View user status and activity The following procedure shows how to open the dashboard that lists every user in your organization and interpret each user's status. 1. Navigate to the **Organization Dashboard**: * **Dedicated Cloud**: `https://[ORG-NAME].io/org/dashboard/`. Replace `[ORG-NAME]` with your organization name. * **Self-Managed**: `https://[YOUR-W&B-SERVER-IP]/org/dashboard`. Replace `[YOUR-W&B-SERVER-IP]` with your deployment's IP address. The **Users** tab opens by default and lists every user in the organization. 2. To sort the list by user status, click the **Last Active** column header. Each user's status is one of the following: * **Invite pending**: An invitation was sent but not yet accepted. * **A timestamp**: The user accepted the invitation and has signed in at least once. The timestamp indicates the most recent activity. * **Deactivated**: An admin revoked the user's access. * **No status (hyphen)**: The user was previously active but hasn't been active in the last six months. 3. Hover over a user's **Last Active** field to see the date you added the user and their total active days. A user is *active* if they do any of the following: * Log in to W\&B. * Open any page in the W\&B App. * Log runs. * Use the SDK to track an experiment. * Interact with the W\&B server in any way. 1. 
Open the [**Members** page](https://wandb.ai/account-settings/wandb/members/). The table lists every user in your organization. 2. Click the **Last Active** column header to sort by user status. Each user's status is one of the following: * **A timestamp**: The user has signed in at least once. The timestamp indicates the most recent activity. * **No status (hyphen)**: The user hasn't yet been active within the organization. A user is *active* if they perform any auditable action scoped to the organization after May 8, 2025. For a full list, see [Actions](/platform/hosting/monitoring-usage/audit-logging#actions) in **Audit logging**. ## View activity over time After you understand individual user status, use the activity views to track how usage in your organization changes over time. Use the **Activity** tab to see how many users have been active during a given period. 1. Open the **Organization Dashboard**. See [View user status and activity](#view-user-status-and-activity). 2. Click **Activity**. 3. Review the following plots: * **Total active users**: Unique active users during the selected period (defaults to 3 months). * **Users active over time**: Fluctuation of active users over the period (defaults to 6 months). Hover over a point to see the exact count on that date. To change the reporting period, use the drop-down above a plot. Options are **Last 30 days**, **Last 3 months**, **Last 6 months**, **Last 12 months**, and **All time**. Use the **Activity Dashboard** to view aggregate activity. 1. Click your user icon in the upper-right corner of the W\&B App. 2. Under **Account**, click **Users**. 3. Above the table of users, review the Activity Panel: * **Active user count**: Unique active users during the selected period (defaults to 3 months). * **Weekly active users**: Users active per week. * **Most active user**: Top-10 users ranked by active days and last-active date. 4. 
To change the date range, click the date picker in the upper-right corner and choose a new value: **7 days**, **30 days** (the default), **90 days**, **6 months**, or **12 months**. All plots update automatically. ## Export user details When you need to share user details outside the dashboard or analyze them in another tool, export the user list as a CSV. From the **Users** tab you can download a CSV that lists each user's details (username, email address, last-active time, roles, and more). 1. In the **Users** tab, click the **action ()** menu next to **Invite new user**. 2. Click **Export as CSV**. ### Export CSV schema The CSV export uses the comma (`,`) as the separator, encloses strings in double quotes, and includes a header row that defines these columns: * `"Name"` * `"Username"` * `"Org Role"` * `"Models Seat"` * `"Weave Access"` * `"Email"` * `"Teams"` * `"Last Active"` 1. In the **Users** tab, click the **action ()** menu in the upper-right corner. 2. Select **Export as CSV** to download the file. ### Export CSV schema The CSV export uses the comma (`,`) as the separator, encloses strings in double quotes, and includes a header row that defines these columns: * `"Name"` * `"Username"` * `"Last Active"` * `"Role"` * `"Email"` * `"Teams"` * `"Status"` * `"Number of Reports"` * `"Number of Runs"` * `"Number of active days"` * `"Models Seat"` * `"Weave Access"` # Configure Slack alerts Source: https://docs.wandb.ai/platform/hosting/monitoring-usage/slack-alerts Create and configure a Slack application to receive W&B Server alerts, notifications, and monitoring updates. Integrate W\&B Server with [Slack](https://slack.com/) so that your W\&B instance can dispatch alerts and notifications to a Slack workspace your team already uses. This page walks W\&B Server administrators through creating a Slack application, configuring its OAuth scopes and redirect URL, and registering the application with W\&B. 
Watch a [video demonstrating setting up Slack alerts on W\&B Dedicated Cloud deployment](https://www.youtube.com/watch?v=JmvKb-7u-oU) (6 min). ## Create the Slack application W\&B Server uses a custom Slack application as the bridge for delivering alerts. Follow this procedure to create that application in the Slack workspace where you want to receive notifications. 1. Visit [https://api.slack.com/apps](https://api.slack.com/apps) and select **Create an App**. Create an App button 2. Provide a name for your app in the **App Name** field. 3. Select a Slack workspace where you want to develop your app. Ensure that the Slack workspace you use is the same workspace you intend to use for alerts. App name and workspace selection ## Configure the Slack application After you create the Slack application, grant it the permissions to post messages and accept the redirect from W\&B during the OAuth handshake. 1. On the left sidebar, select **OAuth & Permissions**. OAuth & Permissions menu 2. In the **Scopes** section, grant the bot the `incoming_webhook` scope. Scopes give your app permission to perform actions in your development workspace. For more information about OAuth scopes for bots, see [Understanding OAuth scopes for bots](https://api.slack.com/legacy/oauth-scopes) in the Slack API documentation. Bot token scopes 3. Configure the **Redirect URL** to point to your W\&B installation. Use the same URL that you set as your host URL in your local system settings. You can specify multiple URLs if you have different DNS mappings to your instance. Redirect URLs configuration 4. Select **Save URLs**. 5. Optional: Under **Restrict API Token Usage**, specify an IP or IP range to allowlist for your W\&B instances. Limiting the allowed IP address helps secure your Slack application. ## Register your Slack application with W\&B Register the Slack application you configured with your W\&B instance so that W\&B can use it to dispatch alerts. 1. 
Navigate to the **System Settings** or **System Console** page of your W\&B instance, depending on your deployment. 2. Depending on the System page you're on, follow one of the following options: * If you're in the **System Console**: go to **Settings** then to **Notifications**. System Console notifications * If you're in the **System Settings**: toggle the **Enable a custom Slack application to dispatch alerts** to enable a custom Slack application. Enable Slack application toggle 3. Supply your **Slack client ID** and **Slack secret**, then select **Save**. To find your application's client ID and secret, navigate to **Basic Information** in **Settings**. 4. To verify that everything works, set up a Slack integration in the W\&B app. Your W\&B Server is now registered with the Slack application and can dispatch alerts to the configured Slack workspace. # Disable automatic updates for W&B Server Source: https://docs.wandb.ai/platform/hosting/self-managed/disable-automatic-app-version-updates Learn how to disable automatic updates for W&B Server. This page shows administrators of Self-Managed W\&B deployments how to disable automatic version upgrades for W\&B Server and pin its version to a specific release. Pinning a version gives you control over when upgrades happen. This control is useful when you need to coordinate upgrades with internal change management processes or validate a release in a staging environment before you roll it out. These instructions work only for deployments managed by the [W\&B Kubernetes Operator](/platform/hosting/self-managed/operator). W\&B supports a major W\&B Server release for 12 months from its initial release date. Customers with **Self-Managed** instances are responsible for upgrading in time to maintain support. Avoid staying on an unsupported version. 
W\&B recommends that customers with **Self-Managed** instances update their deployments with the latest release at least once per quarter to maintain support and to receive the latest features, performance improvements, and fixes.

## Requirements

* W\&B Kubernetes Operator `v1.13.0` or newer
* System Console `v2.12.2` or newer

To verify that you meet these requirements, refer to the W\&B Custom Resource or Helm chart for your instance. Check the `version` values for the `operator-wandb` and `system-console` components.

## Disable automatic updates

To disable automatic updates, use the System Console to pin W\&B Server to a specific version, then check the Operator reconciliation logs to confirm that version pinning is active.

1. Log in to the W\&B App as a user with the `admin` role.
2. Click the user icon at the top, then click **System Console**.
3. Go to **Settings** > **Advanced**, then select the **Other** tab.
4. In the **Disable Auto Upgrades** section, turn on **Pin specific version**.
5. Click the **Select a version** drop-down list, then select a W\&B Server version.
6. Click **Save**.

   System Console showing Pin specific version turned on with a selected W&B Server version

   Automatic upgrades are turned off and W\&B Server is pinned at the version you selected.
7. Verify that automatic upgrades are turned off. Go to the **Operator** tab and search the reconciliation logs for the string `Version Pinning is enabled`, as it appears in the following sample log.

```text theme={null}
│info 2025-04-17T17:24:16Z wandb default No changes found
│info 2025-04-17T17:24:16Z wandb default Active spec found
│info 2025-04-17T17:24:16Z wandb default Desired spec
│info 2025-04-17T17:24:16Z wandb default License
│info 2025-04-17T17:24:16Z wandb default Version Pinning is enabled
│info 2025-04-17T17:24:16Z wandb default Found Weights & Biases instance, processing the spec...
│info 2025-04-17T17:24:16Z wandb default === Reconciling Weights & Biases instance...
``` # Deploy on Air-Gapped Kubernetes Source: https://docs.wandb.ai/platform/hosting/self-managed/on-premises-deployments/kubernetes-airgapped Deploy W&B Platform in air-gapped and disconnected Kubernetes environments ## Introduction This guide provides step-by-step instructions to deploy the W\&B Platform in air-gapped, disconnected, or restricted network customer-managed environments. By following this guide, you set up an internal container registry and Helm repository to host W\&B images and charts, install the W\&B Kubernetes Operator, and deploy the W\&B Platform without requiring outbound internet connectivity. This guide targets platform administrators and DevOps engineers who manage Kubernetes infrastructure in regulated or isolated networks. Air-gapped deployments are common in the following environments: * Secure government facilities. * Financial institutions with strict network isolation. * Healthcare organizations with compliance requirements. * Industrial control systems (ICS) environments. * Research facilities with classified networks. Run these commands in a shell console with proper access to the Kubernetes cluster. You can adapt these commands to work with any CI/CD tooling you use to deploy Kubernetes applications. For standard on-premises Kubernetes deployments with internet connectivity, see [Deploy W\&B with Kubernetes Operator](/platform/hosting/self-managed/operator). ## Prerequisites Before starting, ensure your air-gapped environment meets the following requirements. ### Version requirements | Software | Minimum version | | ---------- | ------------------------------------------------------------------------------------------------------------------------------- | | Kubernetes | v1.34 or newer ([Supported Kubernetes versions](https://kubernetes.io/releases/patch-releases/)) | | Helm | v3.x | | MySQL | v8.0.x is required, v8.0.32 or newer; v8.0.44 or newer is recommended.
Aurora MySQL 3.x releases, must be v3.05.2 or newer | | Redis | v7.x | ### SSL/TLS requirements W\&B requires a valid signed SSL/TLS certificate for secure communication between clients and the server. SSL/TLS termination must occur on the ingress/load balancer. The W\&B Server application does not terminate SSL or TLS connections. **Important**: W\&B does not support self-signed certificates and custom CAs. Using self-signed certificates will cause challenges for users and is not supported. If possible, using a service like [Let's Encrypt](https://letsencrypt.org) is a great way to provide trusted certificates to your load balancer. Services like Caddy and Cloudflare manage SSL for you. If your security policies require SSL communication within your trusted networks, consider using a tool like Istio and [side car containers](https://istio.io/latest/docs/reference/config/networking/sidecar/). ### Hardware requirements **CPU Architecture**: W\&B runs on Intel (x86) CPU architecture only. ARM is not supported. **Sizing**: For CPU, memory, and disk sizing recommendations for Kubernetes nodes and MySQL, see the [Sizing section](/platform/hosting/self-managed/ref-arch/#sizing) in the reference architecture. Requirements vary based on whether you're running Models, Weave, or both. ### MySQL database W\&B requires an external MySQL database. For production, W\&B strongly recommends using managed database services: * [AWS RDS Aurora MySQL](https://aws.amazon.com/rds/aurora/) * [Google Cloud SQL for MySQL](https://cloud.google.com/sql/mysql) * [Azure Database for MySQL](https://azure.microsoft.com/en-us/products/mysql/) Managed database services provide automated backups, monitoring, high availability, patching, and reduce operational overhead. See the [reference architecture](/platform/hosting/self-managed/ref-arch/#mysql) for complete MySQL requirements, including sizing recommendations and configuration parameters. 
For database creation SQL, see the [bare-metal guide](/platform/hosting/self-managed/operator/#mysql-database). For questions about your deployment's database configuration, contact [support](mailto:support@wandb.com) or your AISE. For MySQL configuration parameters for self-managed instances, see the [reference architecture MySQL configuration section](/platform/hosting/self-managed/ref-arch#mysql-configuration-parameters). ### Redis W\&B depends on a single-node Redis 7.x deployment used by W\&B's components for job queuing and data caching. For convenience during testing and development of proofs of concept, W\&B Self-Managed includes a local Redis deployment that is not appropriate for production deployments. For production deployments, W\&B can connect to a Redis instance in the following environments: * [AWS Elasticache](https://aws.amazon.com/elasticache/) * [Google Cloud Memory Store](https://cloud.google.com/memorystore?hl=en) * [Azure Cache for Redis](https://azure.microsoft.com/en-us/products/cache) * Redis deployment hosted in your cloud or on-premise infrastructure ### Object storage W\&B requires object storage with pre-signed URL and CORS support. **Recommended storage providers:** * [Amazon S3](https://aws.amazon.com/s3/): Object storage service offering industry-leading scalability, data availability, security, and performance. * [Google Cloud Storage](https://cloud.google.com/storage): Managed service for storing unstructured data at scale. * [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs): Cloud-based object storage solution for storing massive amounts of unstructured data. * [CoreWeave AI Object Storage](https://docs.coreweave.com/products/storage/object-storage): High-performance, S3-compatible object storage service optimized for AI workloads. 
* Enterprise S3-compatible storage: [MinIO Enterprise (AIStor)](https://www.min.io/product/aistor), [NetApp StorageGRID](https://www.netapp.com/data-storage/storagegrid/), or other enterprise-grade solutions.

MinIO Open Source is in [maintenance mode](https://github.com/minio/minio) with no active development or pre-compiled binaries. For production deployments, W\&B recommends using managed object storage services or enterprise S3-compatible solutions such as MinIO Enterprise (AIStor).

For detailed bucket provisioning instructions including IAM policies, CORS configuration, and access setup, see the [Bring Your Own Bucket (BYOB) guide](/platform/hosting/data-security/secure-storage-connector). See the [reference architecture object storage section](/platform/hosting/self-managed/ref-arch/#object-storage) for complete requirements.

In air-gapped environments, you'll typically use on-premises S3-compatible storage such as MinIO Enterprise, NetApp StorageGRID, or Dell ECS.

### Air-gapped specific requirements

In addition to the preceding standard requirements, air-gapped deployments require the following:

* **Internal container registry**: Access to a private container registry such as Harbor, JFrog Artifactory, or Nexus, with all required W\&B images.
* **Internal Helm repository**: Access to a private Helm chart repository with W\&B Helm charts.
* **Image transfer capability**: A method to transfer container images from an internet-connected system to your air-gapped registry.
* **License file**: A valid W\&B Enterprise license. To obtain a license (for example, from an internet-connected machine), see the [License](/platform/hosting/self-managed/requirements#license) section on the Requirements page, or contact your W\&B account team.
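
The minimum versions listed in the version requirements table can be compared against what your systems report with a small shell helper. The following is a minimal sketch, not W\&B tooling: `version_ge` is a hypothetical function name, the version values are examples, and the comparison assumes that `sort -V` (version sort, available in GNU coreutils) exists on your workstation.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of W&B tooling): succeed when version $1 >= $2.
# A leading "v" is ignored. Assumes `sort -V` is available.
version_ge() {
  local have="${1#v}" want="${2#v}"
  # The smaller of the two versions sorts first; if that is the minimum,
  # the reported version is at least the minimum.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

# Example values; replace them with the versions reported by your own systems.
MYSQL_VERSION="8.0.44"
MYSQL_MINIMUM="8.0.32"

if version_ge "$MYSQL_VERSION" "$MYSQL_MINIMUM"; then
  echo "MySQL ${MYSQL_VERSION} meets the ${MYSQL_MINIMUM} minimum"
else
  echo "MySQL ${MYSQL_VERSION} is older than ${MYSQL_MINIMUM}"
fi
```

The same function can be reused for the Kubernetes minimum by passing the server version and the minimum from the table.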
For complete infrastructure requirements, including networking and load balancer configuration, see the [reference architecture](/platform/hosting/self-managed/ref-arch#infrastructure-requirements). ## Prepare your air-gapped environment The following steps prepare your air-gapped environment to host the W\&B container images and Helm charts. Complete these steps before installing the operator or deploying the platform. ### Step 1: Set up internal container registry Because the Kubernetes cluster cannot pull images from public registries, all required container images must be available in your internal air-gapped container registry before deployment. You are responsible for tracking the W\&B Operator's requirements and maintaining your container registry with updated images regularly. For the most current list of required container images and versions, refer to the Helm chart, or contact [W\&B Support](mailto:support@wandb.com) or your assigned W\&B support engineer. #### Core W\&B component containers The following core images are required: * [`docker.io/wandb/controller`](https://hub.docker.com/r/wandb/controller): W\&B Kubernetes Operator. * [`docker.io/wandb/local`](https://hub.docker.com/r/wandb/local): W\&B application server. * [`docker.io/wandb/console`](https://hub.docker.com/r/wandb/console): W\&B management console. * [`docker.io/wandb/megabinary`](https://hub.docker.com/r/wandb/megabinary): W\&B microservices (API, executor, glue, parquet). #### Dependency containers The following third-party dependency images are required: * [`docker.io/bitnamilegacy/redis`](https://hub.docker.com/r/bitnamilegacy/redis): Required for local Redis deployment during testing and development. For production Redis requirements, see the [Redis section](#redis) in Prerequisites. * [`docker.io/otel/opentelemetry-collector-contrib`](https://hub.docker.com/r/otel/opentelemetry-collector-contrib): OpenTelemetry agent for collecting metrics and logs. 
* [`quay.io/prometheus/prometheus`](https://quay.io/repository/prometheus/prometheus): Prometheus for metrics collection.
* [`quay.io/prometheus-operator/prometheus-config-reloader`](https://quay.io/repository/prometheus-operator/prometheus-config-reloader): Prometheus dependency.

#### Get the complete image list

To extract the complete list of required images and versions from the Helm chart:

1. On an internet-connected system, download the W\&B Helm charts from the [W\&B Helm charts repository](https://github.com/wandb/helm-charts):

   ```bash theme={null}
   # Clone the helm-charts repository
   git clone https://github.com/wandb/helm-charts.git
   cd helm-charts
   ```

2. Inspect the `values.yaml` files to identify all container images and their versions:

   ```bash theme={null}
   # Extract image references from the operator chart
   helm show values charts/operator | grep -E "repository:|tag:" | grep -v "^#"

   # Extract image references from the platform chart
   helm show values charts/operator-wandb | grep -E "repository:|tag:" | grep -v "^#"
   ```

   Alternatively, use this command to extract only the repository names (without version tags):

   ```bash theme={null}
   helm show values charts/operator-wandb \
     | awk -F': *' '/^[[:space:]]*repository:/{print $2}' \
     | grep -v "^#" \
     | sort -u
   ```

   The list of repositories looks similar to the following:

   ```text theme={null}
   wandb/controller
   wandb/local
   wandb/console
   wandb/megabinary
   wandb/weave-python
   wandb/weave-trace
   otel/opentelemetry-collector-contrib
   prometheus/prometheus
   prometheus-operator/prometheus-config-reloader
   bitnamilegacy/redis
   ```

   To get the specific version tags for each image, use the first command in this step (`grep -E "repository:|tag:"`), which shows both repository names and their corresponding version tags.

#### Transfer images to air-gapped registry

1. On an internet-connected system, pull and save all required images.

   Replace version numbers in the following examples with the actual versions from your Helm chart inspection in the preceding step. The versions shown here are examples and become outdated over time.

   Use shell variables to manage versions consistently:

   ```bash theme={null}
   # Set version variables (update these based on your Helm chart versions)
   CONTROLLER_VERSION="1.13.3"
   APP_VERSION="0.59.2"
   CONSOLE_VERSION="2.12.2"

   # Pull images
   docker pull wandb/controller:${CONTROLLER_VERSION}
   docker pull wandb/local:${APP_VERSION}
   docker pull wandb/console:${CONSOLE_VERSION}
   docker pull wandb/megabinary:${APP_VERSION}
   # ... pull all other required images with their versions

   # Save images to .tar files
   docker save wandb/controller:${CONTROLLER_VERSION} -o wandb-controller-${CONTROLLER_VERSION}.tar
   docker save wandb/local:${APP_VERSION} -o wandb-local-${APP_VERSION}.tar
   docker save wandb/console:${CONSOLE_VERSION} -o wandb-console-${CONSOLE_VERSION}.tar
   docker save wandb/megabinary:${APP_VERSION} -o wandb-megabinary-${APP_VERSION}.tar
   # ... save all other images
   ```

2. Transfer the `.tar` files to your air-gapped environment using your approved method, such as a USB drive or secure file transfer.

3. In your air-gapped environment, load and push images to your internal registry:

   ```bash theme={null}
   # Set the same version variables used above
   CONTROLLER_VERSION="1.13.3"
   APP_VERSION="0.59.2"
   CONSOLE_VERSION="2.12.2"
   INTERNAL_REGISTRY="registry.yourdomain.com"

   # Load images
   docker load -i wandb-controller-${CONTROLLER_VERSION}.tar
   docker load -i wandb-local-${APP_VERSION}.tar
   docker load -i wandb-console-${CONSOLE_VERSION}.tar
   docker load -i wandb-megabinary-${APP_VERSION}.tar
   # ... load all other images

   # Tag for internal registry
   docker tag wandb/controller:${CONTROLLER_VERSION} ${INTERNAL_REGISTRY}/wandb/controller:${CONTROLLER_VERSION}
   docker tag wandb/local:${APP_VERSION} ${INTERNAL_REGISTRY}/wandb/local:${APP_VERSION}
   docker tag wandb/console:${CONSOLE_VERSION} ${INTERNAL_REGISTRY}/wandb/console:${CONSOLE_VERSION}
   docker tag wandb/megabinary:${APP_VERSION} ${INTERNAL_REGISTRY}/wandb/megabinary:${APP_VERSION}
   # ... tag all other images

   # Push to internal registry
   docker push ${INTERNAL_REGISTRY}/wandb/controller:${CONTROLLER_VERSION}
   docker push ${INTERNAL_REGISTRY}/wandb/local:${APP_VERSION}
   docker push ${INTERNAL_REGISTRY}/wandb/console:${CONSOLE_VERSION}
   docker push ${INTERNAL_REGISTRY}/wandb/megabinary:${APP_VERSION}
   # ... push all other images
   ```

### Step 2: Set up internal Helm chart repository

With the container images in place, the Kubernetes Operator also needs access to the W\&B Helm charts. Ensure the following Helm charts are available in your internal Helm repository:

* [W\&B Operator chart](https://github.com/wandb/helm-charts/tree/main/charts/operator)
* [W\&B Platform chart](https://github.com/wandb/helm-charts/tree/main/charts/operator-wandb)

1. On an internet-connected system, download the charts:

   ```bash theme={null}
   # Add W&B Helm repository
   helm repo add wandb https://wandb.github.io/helm-charts
   helm repo update

   # Download the charts
   helm pull wandb/operator --version 1.13.3
   helm pull wandb/operator-wandb --version 0.18.0
   ```

2. Transfer the `.tgz` chart files to your air-gapped environment and upload them to your internal Helm repository according to your repository's procedures.

The `operator` chart deploys the W\&B Kubernetes Operator (Controller Manager). The `operator-wandb` chart deploys the W\&B Platform using the values configured in the Custom Resource (CR).
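
The tag and push commands in Step 1 follow the same pattern for every image, so they can be generated from a list of image references instead of typed by hand. The following is a minimal sketch under stated assumptions: `to_internal_ref` and `print_mirror_commands` are hypothetical helper names, and the sketch assumes your internal registry keeps the public repository path with only the public host (`docker.io/` or `quay.io/`) removed, as in the examples in this guide. The script prints the commands rather than running them, so you can review them first.

```bash
#!/usr/bin/env bash
# Assumption: the placeholder registry host used throughout this guide.
INTERNAL_REGISTRY="${INTERNAL_REGISTRY:-registry.yourdomain.com}"

# Hypothetical helper: map a public image reference to its internal-registry
# equivalent by dropping a leading public registry host, if present.
to_internal_ref() {
  local ref="$1"
  ref="${ref#docker.io/}"
  ref="${ref#quay.io/}"
  printf '%s/%s\n' "${INTERNAL_REGISTRY}" "${ref}"
}

# Hypothetical helper: print (not run) the tag and push commands for each
# image reference passed as an argument.
print_mirror_commands() {
  local src dst
  for src in "$@"; do
    dst="$(to_internal_ref "${src}")"
    printf 'docker tag %s %s\n' "${src}" "${dst}"
    printf 'docker push %s\n' "${dst}"
  done
}

# Example invocation with example versions; use the tags from your own chart inspection.
print_mirror_commands \
  docker.io/wandb/local:0.59.2 \
  quay.io/prometheus/prometheus:v2.47.0
```

After reviewing the printed commands, you can pipe them to `bash` in the air-gapped environment once the images are loaded.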
### Step 3: Configure Helm repository access

Configure your local Helm client in the air-gapped environment to point to your internal repository so that subsequent install commands can locate the charts.

1. In your air-gapped environment, configure Helm to use your internal repository:

   ```bash theme={null}
   helm repo add local-repo https://charts.yourdomain.com
   helm repo update
   ```

2. Verify the charts are available:

   ```bash theme={null}
   helm search repo local-repo/operator
   helm search repo local-repo/operator-wandb
   ```

## Deploy W\&B in air-gapped environment

With your internal registry and Helm repository in place, you can now install the Kubernetes Operator, configure external services, and deploy the W\&B Platform.

### Step 4: Install the Kubernetes Operator

The W\&B Kubernetes Operator (controller manager) manages the W\&B Platform components. To install it in an air-gapped environment, configure it to use your internal container registry.

1. Create a `values.yaml` file with the following content:

   ```yaml theme={null}
   image:
     repository: registry.yourdomain.com/wandb/controller
     tag: 1.13.3
   airgapped: true
   ```

   Replace the repository and tag with the actual versions you transferred to your internal registry in Step 1. The version shown here (`1.13.3`) is an example and becomes outdated over time.

2. Install the operator and Custom Resource Definition (CRD):

   ```bash theme={null}
   helm upgrade --install operator local-repo/operator \
     --namespace wandb \
     --create-namespace \
     --values values.yaml
   ```

3. Verify the operator is running:

   ```bash theme={null}
   kubectl get pods -n wandb
   ```

   You should see the operator pod in a `Running` state.

The W\&B Kubernetes Operator is now installed and ready to deploy the W\&B Platform from your internal chart repository.

For full details about supported values, refer to the [Kubernetes operator GitHub repository values file](https://github.com/wandb/helm-charts/blob/main/charts/operator/values.yaml).
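
Because the operator must pull only from your internal registry, it can be useful to scan the rendered manifests for image references that still point elsewhere. The following is a minimal sketch, not an official check: `find_external_images` is a hypothetical helper name, and the filter assumes that each image appears on a single `image: <reference>` line, as in standard rendered Kubernetes manifests.

```bash
#!/usr/bin/env bash
# Assumption: the placeholder registry host used throughout this guide.
INTERNAL_REGISTRY="${INTERNAL_REGISTRY:-registry.yourdomain.com}"

# Hypothetical helper: read manifests on stdin and print every `image:` value
# that does not start with the internal registry host. Prints nothing when
# all image references are internal.
find_external_images() {
  awk -v reg="${INTERNAL_REGISTRY}/" '
    /^[ \t-]*image:[ \t]/ {
      sub(/^[ \t-]*image:[ \t]*/, "")   # drop indentation, list dash, and key
      gsub(/"/, "")                     # drop optional double quotes
      if (index($0, reg) != 1) print
    }
  ' | sort -u
}
```

One possible use is to render the operator chart locally with `helm template operator local-repo/operator --values values.yaml | find_external_images` and confirm that the command prints nothing before you run the install.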
### Step 5: Set up MySQL database

Before configuring the W\&B Custom Resource, set up an external MySQL database. For production deployments, W\&B strongly recommends using managed database services where available.

If you run your own MySQL instance, create a database and a user with the following SQL commands. Replace `SOME_PASSWORD` with a secure password of your choice:

```sql theme={null}
CREATE USER 'wandb_local'@'%' IDENTIFIED BY 'SOME_PASSWORD';
CREATE DATABASE wandb_local CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
GRANT ALL ON wandb_local.* TO 'wandb_local'@'%' WITH GRANT OPTION;
```

For MySQL configuration parameters, see the [reference architecture MySQL configuration section](/platform/hosting/self-managed/ref-arch#mysql-configuration-parameters).

### Step 6: Configure W\&B Custom Resource

After installing the W\&B Kubernetes Operator, configure the Custom Resource (CR) to point to your internal Helm repository and container registry. This configuration ensures the Kubernetes operator uses your internal registry and repository when deploying the required components of the W\&B Platform, instead of attempting to reach public sources.

The following example configuration includes image version tags that become outdated over time. Replace all `tag:` values with the actual versions you transferred to your internal registry in Step 1.
Create a file named `wandb.yaml` with the following content: ```yaml theme={null} apiVersion: apps.wandb.com/v1 kind: WeightsAndBiases metadata: labels: app.kubernetes.io/instance: wandb app.kubernetes.io/name: weightsandbiases name: wandb namespace: wandb spec: chart: url: https://charts.yourdomain.com name: operator-wandb version: 0.18.0 values: global: host: https://wandb.yourdomain.com license: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx bucket: accessKey: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx secretKey: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx name: s3.yourdomain.com:9000 path: wandb provider: s3 region: us-east-1 mysql: database: wandb host: mysql.yourdomain.com password: [YOUR-MYSQL-PASSWORD] port: 3306 user: wandb redis: host: redis.yourdomain.com port: 6379 password: [YOUR-REDIS-PASSWORD] api: enabled: true glue: enabled: true executor: enabled: true extraEnv: ENABLE_REGISTRY_UI: 'true' # Configure all component images to use internal registry app: image: repository: registry.yourdomain.com/wandb/local tag: 0.59.2 console: image: repository: registry.yourdomain.com/wandb/console tag: 2.12.2 api: image: repository: registry.yourdomain.com/wandb/megabinary tag: 0.59.2 executor: image: repository: registry.yourdomain.com/wandb/megabinary tag: 0.59.2 glue: image: repository: registry.yourdomain.com/wandb/megabinary tag: 0.59.2 parquet: image: repository: registry.yourdomain.com/wandb/megabinary tag: 0.59.2 weave: image: repository: registry.yourdomain.com/wandb/weave-python tag: 0.59.2 otel: image: repository: registry.yourdomain.com/otel/opentelemetry-collector-contrib tag: 0.97.0 prometheus: server: image: repository: registry.yourdomain.com/prometheus/prometheus tag: v2.47.0 configmapReload: prometheus: image: repository: registry.yourdomain.com/prometheus-operator/prometheus-config-reloader tag: v0.67.0 ingress: annotations: nginx.ingress.kubernetes.io/proxy-body-size: "0" class: nginx ``` Replace all placeholder values, such as hostnames, passwords, and tags, with your 
actual configuration values. The preceding example shows the most commonly used components. Depending on your deployment needs, you may also need to configure image repositories for additional components such as the following: * `settingsMigrationJob` * `weave-trace` * `filestream` * `flat-runs-table` Refer to the [W\&B Helm repository values file](https://github.com/wandb/helm-charts/blob/main/charts/operator-wandb/values.yaml) for the complete list of configurable components. ### Step 7: Deploy the W\&B Platform Applying the Custom Resource triggers the operator to install the W\&B Platform components defined in the `operator-wandb` chart, using the configuration and image references from `wandb.yaml`. 1. Apply the W\&B Custom Resource to deploy the platform: ```bash theme={null} kubectl apply -f wandb.yaml ``` 2. Monitor the deployment progress: ```bash theme={null} # Watch pods being created kubectl get pods -n wandb --watch # Check deployment status kubectl get weightsandbiases -n wandb # View operator logs kubectl logs -n wandb deployment/wandb-operator-controller-manager ``` The deployment may take several minutes as the operator creates all necessary components. ## OpenShift configuration W\&B supports deployment on air-gapped OpenShift Kubernetes clusters. OpenShift deployments require additional security context configurations because of OpenShift's stricter security policies. If you're deploying on OpenShift, apply the configurations in this section in addition to the preceding steps. ### OpenShift security context constraints OpenShift uses Security Context Constraints (SCCs) to control pod permissions. By default, OpenShift assigns the `restricted` SCC to pods, which prevents running as root and requires specific user IDs. 
#### Option 1: Use restricted SCC (recommended) Configure W\&B components to run with the restricted SCC by setting appropriate security contexts in your Custom Resource: ```yaml theme={null} spec: values: # Configure security contexts for all pods app: podSecurityContext: fsGroup: 1000 runAsUser: 1000 runAsNonRoot: true securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL runAsNonRoot: true seccompProfile: type: RuntimeDefault console: podSecurityContext: fsGroup: 1000 runAsUser: 1000 runAsNonRoot: true securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL runAsNonRoot: true seccompProfile: type: RuntimeDefault # Repeat for other components: api, executor, glue, parquet, weave ``` #### Option 2: Create custom SCC (if required) If your deployment requires capabilities not available in the `restricted` SCC, create a custom SCC: ```yaml theme={null} apiVersion: security.openshift.io/v1 kind: SecurityContextConstraints metadata: name: wandb-scc allowHostDirVolumePlugin: false allowHostIPC: false allowHostNetwork: false allowHostPID: false allowHostPorts: false allowPrivilegeEscalation: false allowPrivilegedContainer: false allowedCapabilities: [] defaultAddCapabilities: [] fsGroup: type: MustRunAs ranges: - min: 1000 max: 65535 readOnlyRootFilesystem: false requiredDropCapabilities: - ALL runAsUser: type: MustRunAsRange uidRangeMin: 1000 uidRangeMax: 65535 seLinuxContext: type: MustRunAs supplementalGroups: type: RunAsAny volumes: - configMap - downwardAPI - emptyDir - persistentVolumeClaim - projected - secret ``` 1. Apply the SCC: ```bash theme={null} oc apply -f wandb-scc.yaml ``` 2. Bind the SCC to the W\&B service accounts: ```bash theme={null} oc adm policy add-scc-to-user wandb-scc -z wandb-app -n wandb oc adm policy add-scc-to-user wandb-scc -z wandb-console -n wandb ``` ### OpenShift routes OpenShift uses Routes instead of standard Kubernetes Ingress. 
Configure W\&B to use OpenShift Routes: ```yaml theme={null} spec: values: ingress: enabled: false route: enabled: true host: wandb.apps.openshift.yourdomain.com tls: enabled: true termination: edge insecureEdgeTerminationPolicy: Redirect ``` ### OpenShift image pull configuration If your OpenShift cluster uses an internal image registry with authentication: 1. Create an image pull secret: ```bash theme={null} kubectl create secret docker-registry wandb-registry-secret \ --docker-server=registry.yourdomain.com \ --docker-username=[USERNAME] \ --docker-password=[PASSWORD] \ --namespace=wandb ``` 2. Reference the secret in your Custom Resource: ```yaml theme={null} spec: values: imagePullSecrets: - name: wandb-registry-secret ``` ### OpenShift complete example The following example shows a complete CR for OpenShift air-gapped deployment: Replace all `tag:` values in this example with the actual versions you transferred to your internal registry in Step 1. The versions shown are examples and become outdated over time. 
```yaml theme={null} apiVersion: apps.wandb.com/v1 kind: WeightsAndBiases metadata: name: wandb namespace: wandb spec: chart: url: https://charts.yourdomain.com name: operator-wandb version: 0.18.0 values: global: host: https://wandb.apps.openshift.yourdomain.com license: [YOUR-LICENSE] bucket: accessKey: [YOUR-ACCESS-KEY] secretKey: [YOUR-SECRET-KEY] name: s3.yourdomain.com:9000 path: wandb provider: s3 region: us-east-1 mysql: database: wandb host: mysql.yourdomain.com password: [YOUR-MYSQL-PASSWORD] port: 3306 user: wandb redis: host: redis.yourdomain.com port: 6379 password: [YOUR-REDIS-PASSWORD] # OpenShift-specific: Use Routes instead of Ingress ingress: enabled: false route: enabled: true host: wandb.apps.openshift.yourdomain.com tls: enabled: true termination: edge # Image pull secret for internal registry imagePullSecrets: - name: wandb-registry-secret # Security contexts for OpenShift restricted SCC app: image: repository: registry.yourdomain.com/wandb/local tag: 0.59.2 podSecurityContext: fsGroup: 1000 runAsUser: 1000 runAsNonRoot: true securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL runAsNonRoot: true seccompProfile: type: RuntimeDefault console: image: repository: registry.yourdomain.com/wandb/console tag: 2.12.2 podSecurityContext: fsGroup: 1000 runAsUser: 1000 runAsNonRoot: true securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL runAsNonRoot: true seccompProfile: type: RuntimeDefault # Repeat security contexts for: api, executor, glue, parquet, weave # (abbreviated for clarity) ``` Contact [W\&B Support](mailto:support@wandb.com) or your assigned W\&B support engineer for comprehensive OpenShift configuration examples tailored to your security requirements. ## Verify your installation After deploying W\&B, verify the installation is working correctly so that you can confirm the platform is reachable, pods are healthy, and the deployment is using only your internal resources. 
Follow the general verification steps, then complete the additional air-gapped checks in the following section. To verify the installation, W\&B recommends using the [W\&B CLI](/models/ref/cli/). The verify command executes several tests that verify all components and configurations. This step assumes that the first admin user account is created with the browser. Follow these steps to verify the installation: 1. Install the W\&B CLI: ```bash theme={null} pip install wandb ``` 2. Log in to W\&B: ```bash theme={null} wandb login --host=https://YOUR_DNS_DOMAIN ``` For example: ```bash theme={null} wandb login --host=https://wandb.company-name.com ``` 3. Verify the installation: ```bash theme={null} wandb verify ``` A successful installation and fully working W\&B deployment shows the following output: ```console theme={null} Default host selected: https://wandb.company-name.com Find detailed logs for this test at: /var/folders/pn/b3g3gnc11_sbsykqkm3tx5rh0000gp/T/tmpdtdjbxua/wandb Checking if logged in...................................................✅ Checking signed URL upload..............................................✅ Checking ability to send large payloads through proxy...................✅ Checking requests to base url...........................................✅ Checking requests made over signed URLs.................................✅ Checking CORs configuration of the bucket...............................✅ Checking wandb package version is up to date............................✅ Checking logged metrics, saving and downloading a file..................✅ Checking artifact save and download workflows...........................✅ ``` Contact W\&B Support if you encounter errors. ### Additional air-gapped verification For air-gapped deployments, also verify: 1. 
**Image pull**: Confirm all pods successfully pulled images from your internal registry: ```bash theme={null} kubectl get pods -n wandb -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\t"}{.status.containerStatuses[*].image}{"\n"}{end}' ``` All images should point to your internal registry and all pods should be in `Running` state. 2. **External connectivity**: Verify W\&B is not attempting external connections (it shouldn't in air-gapped mode): ```bash theme={null} kubectl logs -n wandb deployment/wandb-app --tail=100 | grep -i "connection" ``` 3. **License validation**: Access the W\&B console and verify your license is active. ## Troubleshooting ### Image pull errors If pods fail to pull images, check the following: * Verify images exist in your internal registry. * Check that the image pull secret is correctly configured. * Verify network connectivity from Kubernetes nodes to the registry. * Check registry authentication credentials. To test an image pull manually: ```bash theme={null} kubectl run test-pull --image=registry.yourdomain.com/wandb/local:0.59.2 --namespace=wandb kubectl logs test-pull -n wandb kubectl delete pod test-pull -n wandb ``` ### OpenShift SCC errors If pods fail with permission errors on OpenShift: ```bash theme={null} # Check which SCC is being used oc get pod [POD-NAME] -n wandb -o yaml | grep scc # Check service account permissions oc describe scc wandb-scc oc get rolebinding -n wandb ``` ### Helm chart not found If the operator can't find the platform chart, check the following: * Verify the chart repository URL in the Custom Resource. * Check that the operator pod can reach your internal Helm repository. * Verify the chart exists in your repository: ```bash theme={null} helm search repo local-repo/operator-wandb ``` ## Frequently asked questions ### Can I use a different ingress class? 
Yes, configure your ingress class by modifying the ingress settings in your Custom Resource: ```yaml theme={null} spec: values: ingress: class: your-ingress-class ``` ### How do I handle certificate bundles with multiple certificates? Split the certificates into multiple entries in the `customCACerts` section: ```yaml theme={null} spec: values: customCACerts: cert1.crt: | -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- cert2.crt: | -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` ### How do I prevent automatic updates? To configure the operator to not automatically update W\&B, do the following: * Set `airgapped: true` in the operator installation (this disables automatic update checks). * Control version updates by manually updating the `spec.chart.version` in your Custom Resource. * Optionally, disable automatic updates from the W\&B System Console. See [Disable automatic app version updates](/platform/hosting/self-managed/disable-automatic-app-version-updates) for more details. W\&B strongly recommends customers with Self-Managed instances update their deployments with the latest release at minimum once per quarter to maintain support and receive the latest features, performance improvements, and fixes. W\&B supports a major release for 12 months from its initial release date. Refer to [Release policies and processes](/release-notes/release-policies). ### Does the deployment work with no connection to public repositories? Yes. When `airgapped: true` is set in the operator configuration, the Kubernetes operator uses only your internal resources and doesn't attempt to connect to public repositories. ### How do I update W\&B in an air-gapped environment? To update W\&B: 1. Pull new container images on an internet-connected system. 2. Transfer images to your air-gapped registry. 3. Upload new Helm charts to your internal repository. 4. Update the `spec.chart.version` and image tags in your Custom Resource. 5. Apply the updated Custom Resource. 
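The image-related steps above can be planned with a small script. The following is a minimal sketch, not an official W\&B tool: the helper name `mirror_commands` is hypothetical, the upstream image names are assumptions inferred from the internal registry paths used in the examples on this page, and it assumes the Docker CLI moves the images. How the images cross the air gap (for example, a registry reachable from both sides or an offline transfer) depends on your environment.

```python
from typing import List


def mirror_commands(images: List[str], registry: str) -> List[str]:
    """Return the docker commands that copy each image into an internal registry.

    `images` holds upstream references such as "wandb/local:0.59.2" (assumed names).
    `registry` is the internal registry host, for example "registry.yourdomain.com".
    """
    registry = registry.rstrip("/")
    commands: List[str] = []
    for image in images:
        # Keep the repository path and tag, and prefix the internal registry host.
        target = f"{registry}/{image}"
        commands.append(f"docker pull {image}")
        commands.append(f"docker tag {image} {target}")
        commands.append(f"docker push {target}")
    return commands


if __name__ == "__main__":
    # Example tags only; use the versions you actually plan to deploy.
    for command in mirror_commands(
        ["wandb/local:0.59.2", "wandb/console:2.12.2"],
        "registry.yourdomain.com",
    ):
        print(command)
```

The script only prints commands, so you can review them before running anything on the internet-connected system.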
The operator performs a rolling update of the W\&B components. ## Next steps After successful deployment, complete the following tasks: * **Configure user authentication**: Set up [SSO](/platform/hosting/iam/sso) or other authentication methods. * **Set up monitoring**: Configure monitoring for your W\&B instance and infrastructure. * **Plan for updates**: Review the [Server upgrade process](/platform/hosting/server-upgrade-process) and establish an update cadence. * **Configure backups**: Establish backup procedures for your MySQL database. * **Document your process**: Create runbooks for your specific air-gapped update procedures. ## Get help If you encounter issues during deployment: * Review the [Reference Architecture](/platform/hosting/self-managed/ref-arch) for infrastructure guidance. * Check the [Operator guide](/platform/hosting/self-managed/operator) for configuration details. * Contact [W\&B Support](mailto:support@wandb.com) or your assigned W\&B support engineer. * For OpenShift-specific issues, reference Red Hat OpenShift documentation. # Deploy W&B with Kubernetes Operator Source: https://docs.wandb.ai/platform/hosting/self-managed/operator Deploy W&B Platform with Kubernetes Operator on cloud or on-premises ## Overview This page shows platform administrators how to deploy and manage W\&B Server on Kubernetes (cloud or on-premises) using the W\&B Kubernetes Operator. By the end, you have a running W\&B Server installation that the operator manages and upgrades automatically. Use this guide if you self-manage a W\&B deployment and need an installation method that works across cloud, on-premises, and air-gapped environments. The W\&B Kubernetes Operator is the recommended way to deploy W\&B Server on Kubernetes (cloud or on-premises). For an overview of the operator, why W\&B uses it, and how configuration hierarchy works, see [Self-Managed](/platform/hosting/hosting-options/self-managed#about-the-wb-kubernetes-operator). 
## Before you begin Before deploying W\&B with the Kubernetes Operator, ensure your infrastructure meets all requirements: 1. **Review infrastructure requirements**: See the [Self-Managed infrastructure requirements](/platform/hosting/self-managed/requirements/) page for details on: * Software version requirements (Kubernetes, MySQL, Redis, Helm) * Hardware requirements (CPU architecture, sizing recommendations) * Kubernetes cluster configuration * Networking, SSL/TLS, and DNS requirements 2. **Obtain a W\&B Server license**: See the [License](/platform/hosting/self-managed/requirements#license) section on the Requirements page. 3. **Provision external services**: Set up MySQL, Redis, and object storage before deployment. For additional context, see the [reference architecture](/platform/hosting/self-managed/ref-arch/) page. ### MySQL database W\&B requires an external MySQL database. For production, W\&B strongly recommends using managed database services: * [AWS RDS Aurora MySQL](https://aws.amazon.com/rds/aurora/) * [Google Cloud SQL for MySQL](https://cloud.google.com/sql/mysql) * [Azure Database for MySQL](https://azure.microsoft.com/en-us/products/mysql/) Managed database services provide automated backups, monitoring, high availability, patching, and reduce operational overhead. See the [reference architecture](/platform/hosting/self-managed/ref-arch/#mysql) for complete MySQL requirements, including sizing recommendations and configuration parameters. For database creation SQL, see the [bare-metal guide](/platform/hosting/self-managed/operator/#mysql-database). For questions about your deployment's database configuration, contact [support](mailto:support@wandb.com) or your AISE. For complete MySQL setup instructions including configuration parameters and database creation, see the [MySQL section in the requirements page](/platform/hosting/self-managed/requirements/#mysql-database). 
### Redis

W\&B depends on a single-node Redis 7.x deployment used by W\&B's components for job queuing and data caching. For convenience during testing and development of proofs of concept, W\&B Self-Managed includes a local Redis deployment that is not appropriate for production deployments.

For production deployments, W\&B can connect to a Redis instance in the following environments:

* [Amazon ElastiCache](https://aws.amazon.com/elasticache/)
* [Google Cloud Memorystore](https://cloud.google.com/memorystore?hl=en)
* [Azure Cache for Redis](https://azure.microsoft.com/en-us/products/cache)
* Redis deployment hosted in your cloud or on-premises infrastructure

See the [External Redis configuration section](#external-redis) for details on how to configure an external Redis instance in Helm values.

### Object storage

W\&B requires object storage with pre-signed URL and CORS support.

**Recommended storage providers:**

* [Amazon S3](https://aws.amazon.com/s3/): Object storage service offering industry-leading scalability, data availability, security, and performance.
* [Google Cloud Storage](https://cloud.google.com/storage): Managed service for storing unstructured data at scale.
* [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs): Cloud-based object storage solution for storing massive amounts of unstructured data.
* [CoreWeave AI Object Storage](https://docs.coreweave.com/products/storage/object-storage): High-performance, S3-compatible object storage service optimized for AI workloads.
* Enterprise S3-compatible storage: [MinIO Enterprise (AIStor)](https://www.min.io/product/aistor), [NetApp StorageGRID](https://www.netapp.com/data-storage/storagegrid/), or other enterprise-grade solutions

MinIO Open Source is in [maintenance mode](https://github.com/minio/minio) with no active development or pre-compiled binaries.
For production deployments, W\&B recommends using managed object storage services or enterprise S3-compatible solutions such as MinIO Enterprise (AIStor). For detailed bucket provisioning instructions including IAM policies, CORS configuration, and access setup, see the [Bring Your Own Bucket (BYOB) guide](/platform/hosting/data-security/secure-storage-connector). See the [reference architecture object storage section](/platform/hosting/self-managed/ref-arch/#object-storage) for complete requirements. ### Provision your storage bucket Before configuring W\&B, provision your object storage bucket with proper IAM policies, CORS configuration, and access credentials. **See the [Bring Your Own Bucket (BYOB) guide](/platform/hosting/data-security/secure-storage-connector) for detailed step-by-step provisioning instructions for:** * Amazon S3 (including IAM policies and bucket policies) * Google Cloud Storage (including PubSub notifications) * Azure Blob Storage (including managed identities) * CoreWeave AI Object Storage * S3-compatible storage (MinIO Enterprise, NetApp StorageGRID, and other enterprise solutions) See the [Object storage configuration section](#object-storage-bucket) for details on how to configure object storage in Helm values. ### OpenShift Kubernetes clusters W\&B supports deployment on [OpenShift Kubernetes clusters](https://www.redhat.com/en/technologies/cloud-computing/openshift) in cloud, on-premises, and air-gapped environments. W\&B recommends you install with the official W\&B Helm chart. #### Run the container as an un-privileged user OpenShift and similar orchestrators often reject containers that run as root, so W\&B containers must be configured to run as a non-root user that still belongs to the root group. By default, containers use a `$UID` of 999. Specify `$UID` >= 100000 and a `$GID` of 0 if your orchestrator requires the container run with a non-root user. 
W\&B must start as the root group (`$GID=0`) for file system permissions to function properly. Configure security contexts for each W\&B component. For example, to configure the API component: ```yaml theme={null} api: install: true image: repository: wandb/megabinary tag: 0.74.1 # Replace with your actual version pod: securityContext: fsGroup: 10001 fsGroupChangePolicy: Always runAsGroup: 0 runAsNonRoot: true runAsUser: 10001 seccompProfile: type: RuntimeDefault container: securityContext: allowPrivilegeEscalation: false capabilities: drop: - ALL privileged: false readOnlyRootFilesystem: false ``` If needed, configure a custom security context for other components like `app` or `console`. For details, see [Custom security context](#custom-security-context). ## Deploy W\&B Server application **The W\&B Kubernetes Operator with Helm is the recommended installation method** for all W\&B self-managed deployments, including cloud, on-premises, and air-gapped environments. Choose your deployment method: W\&B provides a Helm chart to deploy the W\&B Kubernetes Operator to a Kubernetes cluster. This approach lets you deploy W\&B Server with Helm CLI or a continuous delivery tool like ArgoCD. For deployment-specific considerations, see [Environment-specific considerations](#environment-specific-considerations) and [Deploy with Terraform on public cloud](#deploy-with-terraform-on-public-cloud). For disconnected environments, see [Deploy on Air-Gapped Kubernetes](/platform/hosting/self-managed/on-premises-deployments/kubernetes-airgapped/). Follow these steps to install the W\&B Kubernetes Operator with Helm CLI: 1. Add the W\&B Helm repository. The W\&B Helm chart is available in the W\&B Helm repository: ```shell theme={null} helm repo add wandb https://charts.wandb.ai helm repo update ``` 2. Install the Operator on a Kubernetes cluster: ```shell theme={null} helm upgrade --install operator wandb/operator -n wandb-cr --create-namespace ``` 3. 
Configure the W\&B operator custom resource to trigger the W\&B Server installation. Create a file named `operator.yaml` with your W\&B deployment configuration. Refer to [Configuration Reference](#configuration-reference-for-wb-server) for all available options. Here's a minimal example configuration: ```yaml theme={null} apiVersion: apps.wandb.com/v1 kind: WeightsAndBiases metadata: labels: app.kubernetes.io/name: weightsandbiases app.kubernetes.io/instance: wandb name: wandb namespace: default spec: values: global: host: https:// license: eyJhbGnUzaH...j9ZieKQ2x5GGfw bucket:
mysql: ingress: annotations: ``` 4. Start the Operator with your custom configuration so that it can install, configure, and manage the W\&B Server application: ```shell theme={null} kubectl apply -f operator.yaml ``` Wait until the deployment completes. This takes a few minutes. 5. To verify the installation using the web UI, create the first admin user account, then follow the verification steps outlined in [Verify the installation](#verify-the-installation). After these steps complete, you have a W\&B Kubernetes Operator running in the `wandb-cr` namespace and a W\&B Server application that the operator manages from your `operator.yaml` custom resource. Deploy W\&B using Terraform for infrastructure-as-code deployments. Choose between: * **Helm Terraform Module**: Deploys the operator to existing Kubernetes infrastructure. * **Cloud Terraform Modules**: Complete infrastructure plus application deployment for AWS, Google Cloud, and Azure. For deployment-specific considerations, see [Environment-specific considerations](#environment-specific-considerations) and [Deploy with Terraform on public cloud](#deploy-with-terraform-on-public-cloud). For disconnected environments, see [Deploy on Air-Gapped Kubernetes](/platform/hosting/self-managed/on-premises-deployments/kubernetes-airgapped/). #### Helm Terraform Module This method supports customized deployments tailored to specific requirements, using Terraform's infrastructure-as-code approach for consistency and repeatability. The official W\&B [Helm-based Terraform Module](https://registry.terraform.io/modules/wandb/wandb/helm/latest) is available on the Terraform Registry. Use the following code as a starting point. It includes all necessary configuration options for a production grade deployment: ```hcl theme={null} module "wandb" { source = "wandb/wandb/helm" spec = { values = { global = { host = "https://" license = "eyJhbGnUzaH...j9ZieKQ2x5GGfw" bucket = {
} mysql = { } } ingress = { annotations = { "a" = "b" "x" = "y" } } } } } ``` The configuration options are the same as described in [Configuration Reference](#configuration-reference-for-wb-server), but the syntax must follow the HashiCorp Configuration Language (HCL). The Terraform module creates the W\&B custom resource definition (CRD). To see how W\&B themselves use the Helm Terraform module to deploy Dedicated Cloud installations for customers, follow these links: * [AWS](https://github.com/wandb/terraform-aws-wandb/blob/45e1d746f53e78e73e68f911a1f8cad5408e74b6/main.tf#L225) * [Azure](https://github.com/wandb/terraform-azurerm-wandb/blob/170e03136b6b6fc758102d59dacda99768854045/main.tf#L155) * [Google Cloud](https://github.com/wandb/terraform-google-wandb/blob/49ddc3383df4cefc04337a2ae784f57ce2a2c699/main.tf#L189) #### Cloud Terraform Modules W\&B provides a set of Terraform Modules for AWS, Google Cloud, and Azure. These modules deploy the complete infrastructure, including Kubernetes clusters, load balancers, and MySQL databases, along with the W\&B Server application. 
The W\&B Kubernetes Operator is included in these official W\&B cloud-specific Terraform Modules with the following versions: | Terraform Registry | Source Code | Version | | ------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | ------- | | [AWS](https://registry.terraform.io/modules/wandb/wandb/aws/latest) | [https://github.com/wandb/terraform-aws-wandb](https://github.com/wandb/terraform-aws-wandb) | v4.0.0+ | | [Azure](https://github.com/wandb/terraform-azurerm-wandb) | [https://github.com/wandb/terraform-azurerm-wandb](https://github.com/wandb/terraform-azurerm-wandb) | v2.0.0+ | | [Google Cloud](https://github.com/wandb/terraform-google-wandb) | [https://github.com/wandb/terraform-google-wandb](https://github.com/wandb/terraform-google-wandb) | v2.0.0+ | These modules install the W\&B Kubernetes Operator as part of the deployment, so you can use it to manage W\&B Server in your cloud environment without additional setup. For detailed instructions on using these cloud-specific modules, see [Deploy with Terraform on public cloud](#deploy-with-terraform-on-public-cloud). ### Verify the installation To verify the installation, W\&B recommends using the [W\&B CLI](/models/ref/cli/). The verify command executes several tests that verify all components and configurations. This step assumes that the first admin user account is created with the browser. Follow these steps to verify the installation: 1. Install the W\&B CLI: ```bash theme={null} pip install wandb ``` 2. Log in to W\&B: ```bash theme={null} wandb login --host=https://YOUR_DNS_DOMAIN ``` For example: ```bash theme={null} wandb login --host=https://wandb.company-name.com ``` 3. 
Verify the installation: ```bash theme={null} wandb verify ``` A successful installation and fully working W\&B deployment shows the following output: ```console theme={null} Default host selected: https://wandb.company-name.com Find detailed logs for this test at: /var/folders/pn/b3g3gnc11_sbsykqkm3tx5rh0000gp/T/tmpdtdjbxua/wandb Checking if logged in...................................................✅ Checking signed URL upload..............................................✅ Checking ability to send large payloads through proxy...................✅ Checking requests to base url...........................................✅ Checking requests made over signed URLs.................................✅ Checking CORs configuration of the bucket...............................✅ Checking wandb package version is up to date............................✅ Checking logged metrics, saving and downloading a file..................✅ Checking artifact save and download workflows...........................✅ ``` Contact W\&B Support if you encounter errors. ## Enable the MCP server The [W\&B MCP Server](/platform/mcp-server) ships as an optional subchart in `operator-wandb`. When enabled, the operator deploys an in-cluster MCP server exposed through your existing ingress at `/mcp`, so any MCP-compatible client can connect using a W\&B API key. This is the same server W\&B runs as the hosted offering at `https://mcp.withwandb.com/mcp`, but pointed at your deployment's data. For end-user client configuration and the tool catalog, see [Use the W\&B MCP server](/platform/mcp-server). This section only covers the operator-side enablement. ### Prerequisites Make sure your deployment meets the following requirements before you enable the MCP server: * **Chart version**: `operator-wandb` `0.42.3` or later. The `mcp-server` subchart was introduced in `0.42.1`, but the Datadog and privacy fields used in the following example were added later. 
* **Weave Traces enabled**: the MCP server depends on Weave Traces for trace tools and for the `WF_TRACE_SERVER_URL` default. Set `weave-trace.install: true`. If Weave Traces isn't enabled, the Helm render fails with `mcp-server requires weave-trace.install=true`. * **Reachable ingress**: `global.host` must already resolve and route to the W\&B ingress. The MCP pod reads `WANDB_BASE_URL` from `global.host` and is available at `/mcp`. * **Node capacity**: the MCP pod requests `500m` CPU and `1Gi` memory by default (limits `2` CPU and `4Gi` memory). Confirm your node pool has enough headroom before you enable the subchart. ### Enable the subchart Enable the `mcp-server` subchart so that the operator deploys an in-cluster MCP server and extends your existing W\&B ingress with a `/mcp` route. Add the following to the `spec.values` block of your existing `WeightsAndBiases` custom resource (CR), alongside your existing `global`, `ingress`, and other overrides. The Datadog block is optional, but recommended when a Datadog Agent DaemonSet already collects pod logs and traces in your cluster. ```yaml theme={null} spec: values: weave-trace: install: true mcp-server: install: true image: repository: us-docker.pkg.dev/wandb-production/public/wandb/mcp-server tag: "0.3.3" datadog: enabled: true mode: "agent" service: "wandb-mcp-server-" env: "" deploymentType: "self-managed" customer: "" extraTags: - "region:" - "tier:" privacy: logLevel: "standard" ``` Configure each block: * **`weave-trace.install: true`**: required unless you set `mcp-server.env.WF_TRACE_SERVER_URL` yourself. * **`datadog.mode: "agent"`**: use for Kubernetes deployments where the Datadog Agent DaemonSet owns log and trace collection. In agent mode, the MCP pod doesn't need a Datadog API key. * **`datadog.service`, `env`, `deploymentType`, `customer`, `extraTags`**: set these to match your deployment's observability naming conventions. Set `customer` to an empty string if you don't want a customer tag. 
* **`privacy.logLevel`**: use `"standard"` for most self-managed Kubernetes installations. This redacts free-text parameter values in logs while preserving deployment identifiers that operators commonly use for debugging. Use `"strict"` when entity, project, run, or user identifiers should not remain in plaintext logs. Use `"off"` only when you explicitly want plaintext logging for those values. Apply the change to trigger reconciliation: ```bash theme={null} kubectl apply -f operator.yaml ``` The operator creates a `wandb-mcp-server` deployment and service in the release namespace, and extends the W\&B ingress with a `/mcp` path. ### Verify the MCP server Wait for the pod to become `Running`, then check the health endpoint in-cluster and through the ingress: ```bash theme={null} kubectl get pod -l app.kubernetes.io/component=mcp-server kubectl port-forward svc/wandb-mcp-server 8080:8080 curl -s http://localhost:8080/mcp/health curl -s "https:///mcp/health" ``` Both requests should return `200 OK`. The in-cluster check confirms the pod is healthy. The ingress check confirms routing. If the in-cluster check returns `200 OK` but the ingress check returns `404 Not Found`, see [Troubleshooting](#troubleshooting). If you enabled Datadog, MCP server logs should also appear in Datadog with the configured `mcp-server.datadog.service` and `mcp-server.datadog.env` values. ### Connect a client After the MCP server is healthy, configure your MCP client to use `https:///mcp` with a W\&B API key as the bearer token. For IDE and agent configurations, see [Use the W\&B MCP server](/platform/mcp-server). 
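Before you roll the endpoint out to clients, it can help to script the two health checks described under "Verify the MCP server". The following is a minimal sketch that uses only the Python standard library. The function names `fetch_status` and `interpret_health` are hypothetical, and the interpretation rules only restate the guidance on this page: both checks returning `200` means healthy, and an in-cluster `200` with an ingress `404` points at ingress routing.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx and 5xx responses; the status code is still what we want.
        return err.code


def interpret_health(in_cluster_status: int, ingress_status: int) -> str:
    """Map the two health-check status codes to a suggested next step."""
    if in_cluster_status != 200:
        return "pod unhealthy: inspect the wandb-mcp-server pod"
    if ingress_status == 200:
        return "healthy: pod and ingress routing both work"
    if ingress_status == 404:
        return "ingress route missing: see Troubleshooting"
    return f"unexpected ingress status {ingress_status}"
```

With `kubectl port-forward svc/wandb-mcp-server 8080:8080` running, you could pass `fetch_status("http://localhost:8080/mcp/health")` and the status of the same path on your own W\&B host to `interpret_health`.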
### Troubleshooting

| Symptom | Cause and fix |
| ------- | ------------- |
| `helm render` fails with `mcp-server requires weave-trace.install=true` | Add `weave-trace.install: true` to `spec.values`. The MCP server depends on Weave Traces for trace tools. |
| `wandb-mcp-server` pod stuck in `Pending` with `Insufficient cpu` or `Insufficient memory` | Add node capacity, or lower `mcp-server.resources.requests` in your CR. Defaults are `500m` CPU and `1Gi` memory. |
| `curl https:///mcp/health` returns 404 | The chart renders the `/mcp` ingress path only when `mcp-server.install: true`. Reapply the CR and wait for the ingress controller to propagate the new path. |
| MCP logs don't appear in Datadog | Confirm `mcp-server.datadog.enabled: true`, `mcp-server.datadog.mode: "agent"`, and that the Datadog Agent DaemonSet collects pod stdout. Search Datadog with the configured `service` and `env` values. |
| MCP logs include more user-supplied text than expected | Set `mcp-server.privacy.logLevel` to `"standard"` or `"strict"`. Use `"strict"` when identifiers such as entity, project, run, or user names should not remain in plaintext logs. |
| `wandb-mcp-server` pod in `ImagePullBackOff` in an air-gapped or mirrored cluster | Mirror the image to your registry and override `mcp-server.image.repository` in your CR, the same pattern used for other W\&B component images in air-gapped installs. See [Deploy on Air-Gapped Kubernetes](/platform/hosting/self-managed/on-premises-deployments/kubernetes-airgapped/). |

## Environment-specific considerations

Kubernetes is the same whether it runs on-premises or in the cloud.
The main differences are in naming and managed services (for example, MySQL compared to RDS, or S3 compared to on-premises object storage). This section covers considerations that vary by environment. ### On-premises and bare metal When deploying on on-premises or bare-metal Kubernetes, pay attention to the following. #### Load balancer configuration On-premises Kubernetes clusters typically require manual load balancer configuration. Options include: * **External load balancer**: Configure an existing hardware or software load balancer, such as F5 or HAProxy. * **Nginx Ingress Controller**: Deploy nginx-ingress-controller with NodePort or host networking. * **MetalLB**: For bare-metal Kubernetes clusters, MetalLB provides load balancer services. For detailed load balancer configuration examples, see the [Reference Architecture networking section](/platform/hosting/self-managed/ref-arch#networking). #### Persistent storage Ensure your Kubernetes cluster has a StorageClass configured for persistent volumes. W\&B components might require persistent storage for caching and temporary data. Common on-premises storage options include: * NFS-based storage classes * Ceph/Rook storage * Local persistent volumes * Enterprise storage solutions such as NetApp or Pure Storage #### DNS and certificate management For on-premises deployments, complete the following tasks: * Configure internal DNS records to point to your W\&B hostname. * Provision SSL/TLS certificates from your internal Certificate Authority (CA). * If using self-signed certificates, configure the operator to trust your CA certificate. See the [SSL/TLS requirements](/platform/hosting/self-managed/requirements#ssl-tls) for certificate configuration details. #### OpenShift deployments W\&B fully supports deployment on OpenShift Kubernetes clusters. OpenShift deployments require additional security context configurations due to OpenShift's stricter security policies. 
For OpenShift-specific configuration details, see [OpenShift Kubernetes clusters](#openshift-kubernetes-clusters). For OpenShift examples in air-gapped environments, see [Deploy on Air-Gapped Kubernetes](/platform/hosting/self-managed/on-premises-deployments/kubernetes-airgapped#openshift-configuration). #### Object storage for on-premises and S3-compatible After provisioning your object storage bucket (see [Object storage provisioning](/platform/hosting/data-security/secure-storage-connector)), configure it in your W\&B Custom Resource. **AWS S3 (on-premises)** For on-premises AWS S3 (through Outposts or compatible storage): ```yaml theme={null} bucket: kmsKey: # Optional KMS key for encryption name: # Example: wandb path: "" # Keep as empty string provider: s3 region: # Example: us-east-1 ``` **S3-compatible storage such as MinIO, Ceph, or NetApp** For S3-compatible storage systems: ```yaml theme={null} bucket: kmsKey: null name: # Example: s3.example.com:9000 path: # Example: wandb provider: s3 region: # Example: us-east-1 ``` To enable TLS for S3-compatible storage, append `?tls=true` to the bucket path: ```yaml theme={null} bucket: path: "wandb?tls=true" ``` The certificate must be trusted. Self-signed certificates require additional configuration. See the [SSL/TLS requirements](/platform/hosting/self-managed/requirements#ssl-tls) for details. **Important considerations for on-premises object storage** When running your own object storage, consider: 1. **Storage capacity and performance**: Monitor disk capacity carefully. Average W\&B usage results in tens to hundreds of gigabytes. Heavy usage can result in petabytes of storage consumption. 2. **Fault tolerance**: At minimum, use RAID arrays for physical disks. For S3-compatible storage, use distributed or highly available configurations. 3. **Availability**: Configure monitoring to ensure the storage remains available. 
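The `bucket` settings for S3 and S3-compatible storage shown earlier in this section follow a fixed shape, so they can be generated rather than written by hand. The following is a minimal sketch under stated assumptions: the helper name `bucket_values` is hypothetical, the keys mirror the YAML examples above, and the only logic is appending `?tls=true` to the path when TLS is requested.

```python
import json
from typing import Optional


def bucket_values(
    name: str,
    path: str = "",
    region: str = "us-east-1",
    kms_key: Optional[str] = None,
    tls: bool = False,
) -> dict:
    """Build the `bucket` block for S3 or S3-compatible object storage.

    For S3-compatible storage, `name` is the endpoint (for example
    "s3.example.com:9000") and `path` is the bucket name (for example "wandb").
    """
    if tls and not path.endswith("?tls=true"):
        # The TLS flag is expressed as a query suffix on the bucket path.
        path = f"{path}?tls=true"
    return {
        "kmsKey": kms_key,
        "name": name,
        "path": path,
        "provider": "s3",
        "region": region,
    }


if __name__ == "__main__":
    # Example values only; replace them with your own endpoint and bucket.
    print(json.dumps(bucket_values("s3.example.com:9000", "wandb", tls=True), indent=2))
```

The resulting dictionary maps one-to-one onto the `bucket` keys in the Custom Resource, so it can be serialized with whatever YAML tooling you already use.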
**MinIO considerations** MinIO Open Source is in [maintenance mode](https://github.com/minio/minio) with no active development. Pre-compiled binaries are no longer provided, and only critical security fixes are considered case-by-case. For production deployments, W\&B recommends using managed object storage services or [MinIO Enterprise (AIStor)](https://min.io/product/aistor). Enterprise alternatives for on-premises object storage include: * [Amazon S3 on Outposts](https://aws.amazon.com/s3/outposts/) * [NetApp StorageGRID](https://www.netapp.com/data-storage/storagegrid/) * MinIO Enterprise (AIStor) * [Dell ObjectScale](https://www.dell.com/en-us/shop/cty/sf/objectscale) If you are using an existing MinIO deployment or MinIO Enterprise, you can create a bucket using the MinIO client: ```bash theme={null} mc config host add local http://$MINIO_HOST:$MINIO_PORT "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY" --api s3v4 mc mb --region=us-east-1 local/wandb-files ``` ### Public cloud with Terraform For full infrastructure-plus-application deployment on AWS, Google Cloud, or Azure, see [Deploy with Terraform on public cloud](#deploy-with-terraform-on-public-cloud). ## Deploy with Terraform on public cloud W\&B recommends fully managed deployment options such as [W\&B Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud) or [W\&B Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) deployment types. Fully managed services require little or no configuration. W\&B provides Terraform modules for deploying the platform on public cloud providers. These modules automate the provisioning of infrastructure and installation of W\&B Server, so you can stand up a complete environment without manually creating each cloud resource. 
Before you start, W\&B recommends that you choose one of the [remote backends](https://developer.hashicorp.com/terraform/language/backend) available for Terraform to store the [state file](https://developer.hashicorp.com/terraform/language/state). Terraform needs the state file to roll out upgrades or make changes to your deployment without recreating all components.

Select your cloud provider:

W\&B recommends using the [W\&B Server AWS Terraform Module](https://registry.terraform.io/modules/wandb/wandb/aws/latest) to deploy the platform on AWS.

The Terraform Module deploys the following mandatory components:

* Load Balancer
* AWS Identity & Access Management (IAM)
* AWS Key Management Service (KMS)
* Amazon Aurora MySQL
* Amazon VPC
* Amazon S3
* Amazon Route 53
* AWS Certificate Manager (ACM)
* Amazon Elastic Load Balancing (ALB)
* AWS Secrets Manager

Optional components include:

* Amazon ElastiCache for Redis
* Amazon SQS

### Prerequisite permissions

The account that runs Terraform must be able to create all components listed in the preceding section, create **IAM Policies** and **IAM Roles**, and assign roles to resources.

### General steps

The steps in this section are common for any deployment option.

1. Prepare the development environment.
   * Install [Terraform](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli).
   * W\&B recommends creating a Git repository for version control.

2. Create the `terraform.tfvars` file.

   Customize the `tfvars` file content according to the installation type. The minimum recommended content looks like the following example.
   ```hcl theme={null}
   namespace                  = "wandb"
   license                    = "xxxxxxxxxxyyyyyyyyyyyzzzzzzz"
   subdomain                  = "wandb-aws"
   domain_name                = "wandb.ml"
   zone_id                    = "xxxxxxxxxxxxxxxx"
   allowed_inbound_cidr       = ["0.0.0.0/0"]
   allowed_inbound_ipv6_cidr  = ["::/0"]
   eks_cluster_version        = "1.29"
   ```

   Decide the values for these variables before you deploy. The `namespace` variable is a string that prefixes all resources created by Terraform.

   The combination of `subdomain` and `domain_name` forms the FQDN for your W\&B instance. In the preceding example, the W\&B FQDN is `wandb-aws.wandb.ml`, and `zone_id` identifies the DNS zone where Terraform creates the FQDN record.

   You must also set `allowed_inbound_cidr` and `allowed_inbound_ipv6_cidr`, which are mandatory inputs to the module. The preceding example permits access to the W\&B installation from any source.

3. Create the file `versions.tf`.

   This file holds the Terraform provider configuration required to deploy W\&B in AWS:

   ```hcl theme={null}
   provider "aws" {
     region = "eu-central-1"

     default_tags {
       tags = {
         GithubRepo  = "terraform-aws-wandb"
         GithubOrg   = "wandb"
         Environment = "Example"
         Example     = "PublicDnsExternal"
       }
     }
   }
   ```

   Refer to the [Terraform Official Documentation](https://registry.terraform.io/providers/hashicorp/aws/latest/docs#provider-configuration) to configure the AWS provider.

   W\&B recommends that you also add the [remote backend configuration](https://developer.hashicorp.com/terraform/language/backend) mentioned at the beginning of this documentation.

4. Create the file `variables.tf`.

   For every option configured in `terraform.tfvars`, Terraform requires a corresponding variable declaration.

   ```hcl theme={null}
   variable "namespace" {
     type        = string
     description = "Name prefix used for resources"
   }

   variable "domain_name" {
     type        = string
     description = "Domain name used to access instance."
   }

   variable "subdomain" {
     type        = string
     default     = null
     description = "Subdomain for accessing the Weights & Biases UI."
   }

   variable "license" {
     type = string
   }

   variable "zone_id" {
     type        = string
     description = "Domain for creating the Weights & Biases subdomain on."
   }

   variable "allowed_inbound_cidr" {
     description = "CIDRs allowed to access wandb-server."
     nullable    = false
     type        = list(string)
   }

   variable "allowed_inbound_ipv6_cidr" {
     description = "CIDRs allowed to access wandb-server."
     nullable    = false
     type        = list(string)
   }

   variable "eks_cluster_version" {
     description = "EKS cluster kubernetes version"
     nullable    = false
     type        = string
   }
   ```

### Recommended deployment

This is the most straightforward deployment option. It creates all mandatory components and installs the latest version of W\&B in the Kubernetes cluster.

1. Create the `main.tf` file.

   In the same directory where you created the files in the General steps, create a file `main.tf` with the following content:

   ```hcl theme={null}
   module "wandb_infra" {
     source  = "wandb/wandb/aws"
     version = "~>7.0"

     namespace   = var.namespace
     domain_name = var.domain_name
     license     = var.license
     subdomain   = var.subdomain
     zone_id     = var.zone_id

     allowed_inbound_cidr      = var.allowed_inbound_cidr
     allowed_inbound_ipv6_cidr = var.allowed_inbound_ipv6_cidr

     public_access                  = true
     external_dns                   = true
     kubernetes_public_access       = true
     kubernetes_public_access_cidrs = ["0.0.0.0/0"]
     eks_cluster_version            = var.eks_cluster_version
   }

   data "aws_eks_cluster" "eks_cluster_id" {
     name = module.wandb_infra.cluster_name
   }

   data "aws_eks_cluster_auth" "eks_cluster_auth" {
     name = module.wandb_infra.cluster_name
   }

   provider "kubernetes" {
     host                   = data.aws_eks_cluster.eks_cluster_id.endpoint
     cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks_cluster_id.certificate_authority.0.data)
     token                  = data.aws_eks_cluster_auth.eks_cluster_auth.token
   }

   provider "helm" {
     kubernetes {
       host                   = data.aws_eks_cluster.eks_cluster_id.endpoint
       cluster_ca_certificate = base64decode(data.aws_eks_cluster.eks_cluster_id.certificate_authority.0.data)
       token                  = data.aws_eks_cluster_auth.eks_cluster_auth.token
     }
   }

   output "url" {
     value = module.wandb_infra.url
   }

   output "bucket" {
     value = module.wandb_infra.bucket_name
   }
   ```

2. Deploy W\&B.

   To deploy W\&B, execute the following commands:

   ```bash theme={null}
   terraform init
   terraform apply -var-file=terraform.tfvars
   ```

### Enable Redis

To use Redis to cache SQL queries and speed up the application response when loading metrics, add the option `create_elasticache_subnet = true` to the `main.tf` file:

```hcl theme={null}
module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "~>7.0"

  namespace   = var.namespace
  domain_name = var.domain_name
  subdomain   = var.subdomain
  zone_id     = var.zone_id

  create_elasticache_subnet = true
}
[...]
```

### Enable message broker (queue)

To enable an external message broker using SQS, add the option `use_internal_queue = false` to the `main.tf` file. This is optional because W\&B includes an embedded broker, and enabling an external broker doesn't improve performance.

```hcl theme={null}
module "wandb_infra" {
  source  = "wandb/wandb/aws"
  version = "~>7.0"

  namespace   = var.namespace
  domain_name = var.domain_name
  subdomain   = var.subdomain
  zone_id     = var.zone_id

  use_internal_queue = false
  [...]
}
```

### Additional resources

* [AWS Terraform Module documentation](https://registry.terraform.io/modules/wandb/wandb/aws/latest)
* [AWS Terraform Module source code](https://github.com/wandb/terraform-aws-wandb)
* [Migrate to operator-based AWS Terraform modules](#migrate-to-operator-based-aws-terraform-modules)

W\&B recommends using the [W\&B Server Google Cloud Terraform Module](https://registry.terraform.io/modules/wandb/wandb/google/latest) to deploy the platform on Google Cloud. The module documentation lists all available options.
Before you start, W\&B recommends that you choose one of the [remote backends](https://developer.hashicorp.com/terraform/language/backend/remote) available for Terraform to store the [state file](https://developer.hashicorp.com/terraform/language/state). Terraform needs the state file to roll out upgrades or make changes to your deployment without recreating all components.

The Terraform Module deploys the following mandatory components:

* VPC
* Cloud SQL for MySQL
* Cloud Storage Bucket
* Google Kubernetes Engine
* Memorystore for Redis
* KMS Crypto Key
* Load Balancer

Optional components include:

* Pub/Sub messaging system

### Prerequisite permissions

The account that runs Terraform must have the role `roles/owner` in the Google Cloud project used.

### General steps

The steps in this section are common for any deployment option.

1. Prepare the development environment.
   * Install [Terraform](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli).
   * W\&B recommends creating a Git repository for your code, but you can keep your files locally.
   * Create a project in [Google Cloud Console](https://console.cloud.google.com/).
   * [Install gcloud](https://cloud.google.com/sdk/docs/install), then authenticate with Google Cloud using `gcloud auth application-default login`.

2. Create the `terraform.tfvars` file.

   Customize the `tfvars` file content according to the installation type. The minimum recommended content looks like the following example.

   ```hcl theme={null}
   project_id  = "wandb-project"
   region      = "europe-west2"
   zone        = "europe-west2-a"
   namespace   = "wandb"
   license     = "xxxxxxxxxxyyyyyyyyyyyzzzzzzz"
   subdomain   = "wandb-gcp"
   domain_name = "wandb.ml"
   ```

   You must decide the values for these variables before deployment. The `namespace` variable is a string that prefixes all resources created by Terraform.

   The combination of `subdomain` and `domain_name` forms the FQDN for your W\&B instance.
   In the preceding example, the W\&B FQDN is `wandb-gcp.wandb.ml`.

3. Create the file `variables.tf`.

   For every option configured in `terraform.tfvars`, Terraform requires a corresponding variable declaration.

   ```hcl theme={null}
   variable "project_id" {
     type        = string
     description = "Project ID"
   }

   variable "region" {
     type        = string
     description = "Google region"
   }

   variable "zone" {
     type        = string
     description = "Google zone"
   }

   variable "namespace" {
     type        = string
     description = "Namespace prefix used for resources"
   }

   variable "domain_name" {
     type        = string
     description = "Domain name for accessing the Weights & Biases UI."
   }

   variable "subdomain" {
     type        = string
     description = "Subdomain for accessing the Weights & Biases UI."
   }

   variable "license" {
     type        = string
     description = "W&B License"
   }
   ```

### Recommended deployment

This is the most straightforward deployment option. It creates all mandatory components and installs the latest version of W\&B in the Kubernetes cluster.

1. Create the `main.tf` file.

   In the same directory where you created the files in the General steps, create a file `main.tf` with the following content:

   ```hcl theme={null}
   provider "google" {
     project = var.project_id
     region  = var.region
     zone    = var.zone
   }

   provider "google-beta" {
     project = var.project_id
     region  = var.region
     zone    = var.zone
   }

   data "google_client_config" "current" {}

   provider "kubernetes" {
     host                   = "https://${module.wandb.cluster_endpoint}"
     cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
     token                  = data.google_client_config.current.access_token
   }

   provider "helm" {
     kubernetes {
       host                   = "https://${module.wandb.cluster_endpoint}"
       cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
       token                  = data.google_client_config.current.access_token
     }
   }

   # Spin up all required services
   module "wandb" {
     source  = "wandb/wandb/google"
     version = "~> 10.0"

     namespace   = var.namespace
     license     = var.license
     domain_name = var.domain_name
     subdomain   = var.subdomain
   }

   # You'll want to update your DNS with the provisioned IP address
   output "url" {
     value = module.wandb.url
   }

   output "address" {
     value = module.wandb.address
   }

   output "bucket_name" {
     value = module.wandb.bucket_name
   }
   ```

2. Deploy W\&B.

   To deploy W\&B, execute the following commands:

   ```bash theme={null}
   terraform init
   terraform apply -var-file=terraform.tfvars
   ```

### Enable Redis

To use Redis to cache SQL queries and speed up the application response when loading metrics, add the option `create_redis = true` to the `main.tf` file:

```hcl theme={null}
[...]

module "wandb" {
  source  = "wandb/wandb/google"
  version = "~> 10.0"

  namespace   = var.namespace
  license     = var.license
  domain_name = var.domain_name
  subdomain   = var.subdomain

  create_redis = true
}
[...]
```

### Enable message broker (queue)

To enable an external message broker using Pub/Sub, add the option `use_internal_queue = false` to the `main.tf` file. This is optional because W\&B includes an embedded broker, and enabling an external broker doesn't improve performance.

```hcl theme={null}
[...]

module "wandb" {
  source  = "wandb/wandb/google"
  version = "~> 10.0"

  namespace   = var.namespace
  license     = var.license
  domain_name = var.domain_name
  subdomain   = var.subdomain

  use_internal_queue = false
}

[...]
```

### Additional resources

* [Google Cloud Terraform Module documentation](https://registry.terraform.io/modules/wandb/wandb/google/latest)
* [Google Cloud Terraform Module source code](https://github.com/wandb/terraform-google-wandb)

W\&B recommends using the [W\&B Server Azure Terraform Module](https://registry.terraform.io/modules/wandb/wandb/azurerm/latest) to deploy the platform on Azure. The module documentation lists all available options.
The Terraform Module deploys the following mandatory components:

* Azure Resource Group
* Azure Virtual Network (VNet)
* Azure MySQL Flexible Server
* Azure Storage Account and Blob Storage
* Azure Kubernetes Service
* Azure Application Gateway

Optional components include:

* Azure Cache for Redis
* Azure Event Grid

### Prerequisite permissions

The simplest way to configure the AzureRM provider is through the [Azure CLI](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/guides/azure_cli). For automation, you can also use an [Azure Service Principal](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/guides/service_principal_client_secret). Regardless of the authentication method used, the account that runs Terraform must be able to create all components listed in the preceding section.

### General steps

The steps in this section are common for any deployment option.

1. Prepare the development environment.
   * Install [Terraform](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli).
   * W\&B recommends creating a Git repository for your code, but you can keep your files locally.

2. Create the `terraform.tfvars` file.

   Customize the `tfvars` file content according to the installation type. The minimum recommended content looks like the following example. Each name must match a variable declared in `variables.tf`.

   ```hcl theme={null}
   namespace   = "wandb"
   license     = "xxxxxxxxxxyyyyyyyyyyyzzzzzzz"
   subdomain   = "wandb-azure"
   domain_name = "wandb.ml"
   location    = "westeurope"
   ```

   You must decide the values for these variables before deployment. The `namespace` variable is a string that prefixes all resources created by Terraform.

   The combination of `subdomain` and `domain_name` forms the FQDN for your W\&B instance. In the preceding example, the W\&B FQDN is `wandb-azure.wandb.ml`.

3. Create the file `versions.tf`.
   This file contains the Terraform and Terraform provider versions required to deploy W\&B in Azure:

   ```hcl theme={null}
   terraform {
     required_version = "~> 1.3"

     required_providers {
       azurerm = {
         source  = "hashicorp/azurerm"
         version = "~> 3.17"
       }
     }
   }
   ```

   Refer to the [Terraform Official Documentation](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs) to configure the Azure provider.

   W\&B recommends that you also add the [remote backend configuration](https://developer.hashicorp.com/terraform/language/backend) mentioned at the beginning of this documentation.

4. Create the file `variables.tf`.

   For every option configured in `terraform.tfvars`, Terraform requires a corresponding variable declaration.

   ```hcl theme={null}
   variable "namespace" {
     type        = string
     description = "String used for prefix resources."
   }

   variable "location" {
     type        = string
     description = "Azure Resource Group location"
   }

   variable "domain_name" {
     type        = string
     description = "Domain for accessing the Weights & Biases UI."
   }

   variable "subdomain" {
     type        = string
     default     = null
     description = "Subdomain for accessing the Weights & Biases UI. Default creates record at Route53 Route."
   }

   variable "license" {
     type        = string
     description = "Your wandb/local license"
   }
   ```

### Recommended deployment

This is the most straightforward deployment option. It creates all mandatory components and installs the latest version of W\&B in the Kubernetes cluster.

1. Create the `main.tf` file.

   In the same directory where you created the files in the General steps, create a file `main.tf` with the following content:

   ```hcl theme={null}
   provider "azurerm" {
     features {}
   }

   provider "kubernetes" {
     host                   = module.wandb.cluster_host
     cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
     client_key             = base64decode(module.wandb.cluster_client_key)
     client_certificate     = base64decode(module.wandb.cluster_client_certificate)
   }

   provider "helm" {
     kubernetes {
       host                   = module.wandb.cluster_host
       cluster_ca_certificate = base64decode(module.wandb.cluster_ca_certificate)
       client_key             = base64decode(module.wandb.cluster_client_key)
       client_certificate     = base64decode(module.wandb.cluster_client_certificate)
     }
   }

   # Spin up all required services
   module "wandb" {
     source  = "wandb/wandb/azurerm"
     version = "~> 1.2"

     namespace   = var.namespace
     location    = var.location
     license     = var.license
     domain_name = var.domain_name
     subdomain   = var.subdomain

     deletion_protection = false

     tags = {
       "Example" : "PublicDns"
     }
   }

   output "address" {
     value = module.wandb.address
   }

   output "url" {
     value = module.wandb.url
   }
   ```

2. Deploy W\&B.

   To deploy W\&B, execute the following commands:

   ```bash theme={null}
   terraform init
   terraform apply -var-file=terraform.tfvars
   ```

### Enable Redis

To use Redis to cache SQL queries and speed up the application response when loading metrics, add the option `create_redis = true` to the `main.tf` file:

```hcl theme={null}
# Spin up all required services
module "wandb" {
  source  = "wandb/wandb/azurerm"
  version = "~> 1.2"

  namespace   = var.namespace
  location    = var.location
  license     = var.license
  domain_name = var.domain_name
  subdomain   = var.subdomain

  create_redis = true
  [...]
}
```

### Enable message broker (queue)

To enable an external message broker using Azure Event Grid, add the option `use_internal_queue = false` to the `main.tf` file. This is optional because W\&B includes an embedded broker.
This option doesn't bring a performance improvement. ```bash theme={null} # Spin up all required services module "wandb" { source = "wandb/wandb/azurerm" version = "~> 1.2" namespace = var.namespace location = var.location license = var.license domain_name = var.domain_name subdomain = var.subdomain use_internal_queue = false [...] } ``` ### Additional resources * [Azure Terraform Module documentation](https://registry.terraform.io/modules/wandb/wandb/azurerm/latest) * [Azure Terraform Module source code](https://github.com/wandb/terraform-azurerm-wandb) ### Other deployment options You can combine multiple deployment options by adding all configurations to the same file. Each Terraform module provides several options that can be combined with the standard options and the minimal configuration found in the recommended deployment section. Refer to the module documentation for your cloud provider for the full list of available options: * [AWS Module documentation](https://registry.terraform.io/modules/wandb/wandb/aws/latest) * [Google Cloud Module documentation](https://registry.terraform.io/modules/wandb/wandb/google/latest) * [Azure Module documentation](https://registry.terraform.io/modules/wandb/wandb/azurerm/latest) ## Access the W\&B management console The W\&B Kubernetes Operator comes with a management console where you can review deployment status, view component metrics, and adjust operator-level settings. It's available at `${HOST_URI}/console`, for example `https://wandb.company-name.com/console`. You can log in to the management console in two ways: 1. Open the W\&B application in the browser and log in. Log in to the W\&B application with `${HOST_URI}/`, for example `https://wandb.company-name.com/` 2. Access the console. Click the icon in the top right corner and then click **System console**. Only users with admin privileges can see the **System console** entry. 
To access the console directly instead, use the following steps. W\&B recommends this method only if you can't open the console from the W\&B application.

1. Open the console application in your browser.

   Open the URL described in the preceding section, which redirects you to the login screen.

2. Retrieve the password from the Kubernetes secret that the installation generates:

   ```shell theme={null}
   kubectl get secret wandb-password -o jsonpath='{.data.password}' | base64 -d
   ```

   Copy the password.

3. Log in to the console.

   Paste the copied password, then click **Login**.

## Update the W\&B Kubernetes Operator

This section describes how to update the W\&B Kubernetes Operator itself. Update the operator periodically so that you get bug fixes and new reconciliation features.

* Updating the W\&B Kubernetes Operator doesn't update the W\&B Server application.
* If you use a Helm chart that doesn't use the W\&B Kubernetes Operator, see the [migration instructions](#migrate-self-managed-instances-to-wb-operator) before following the steps in this section to update the W\&B Operator.

Copy and paste the following code snippets into your terminal.

1. Update the repo with [`helm repo update`](https://helm.sh/docs/helm/helm_repo_update/):

   ```shell theme={null}
   helm repo update
   ```

2. Update the Helm chart with [`helm upgrade`](https://helm.sh/docs/helm/helm_upgrade/):

   ```shell theme={null}
   helm upgrade operator wandb/operator -n wandb-cr --reuse-values
   ```

## Update the W\&B Server application

You don't need to update the W\&B Server application manually if you use the W\&B Kubernetes Operator. The operator automatically updates your W\&B Server application when W\&B releases a new version.

## Migrate self-managed instances to W\&B Operator

This section describes how to migrate from managing your own W\&B Server installation to letting the W\&B Operator manage it for you.
Migrating lets the operator handle reconciliation and W\&B Server upgrades automatically, so you no longer have to coordinate manifest changes or Helm upgrades for the application. The migration process depends on how you installed W\&B Server: The W\&B Operator is the default and recommended installation method for W\&B Server. Reach out to [Customer Support](mailto:support@wandb.com) or your W\&B team if you have any questions. * If you used the official W\&B Cloud Terraform Modules, navigate to the appropriate documentation and follow the steps there: * [AWS](#migrate-to-operator-based-aws-terraform-modules) * [Google Cloud](#migrate-to-operator-based-google-cloud-terraform-modules) * [Azure](#migrate-to-operator-based-azure-terraform-modules) * If you used the [W\&B Non-Operator Helm chart](https://github.com/wandb/helm-charts/tree/main/charts/wandb), see [Migrate to operator-based Helm chart](#migrate-to-operator-based-helm-chart). * If you used the [W\&B Non-Operator Helm chart with Terraform](https://registry.terraform.io/modules/wandb/wandb/kubernetes/latest), see [Migrate to operator-based Terraform Helm chart](#migrate-to-operator-based-terraform-helm-chart). * If you created the Kubernetes resources with manifests, see [Migrate to operator-based Helm chart](#migrate-to-operator-based-helm-chart). ### Migrate to operator-based AWS Terraform modules For a detailed description of the migration process, see the [operator-wandb chart documentation](https://github.com/wandb/helm-charts/tree/main/charts/operator-wandb). ### Migrate to operator-based Google Cloud Terraform modules Reach out to [Customer Support](mailto:support@wandb.com) or your W\&B team if you have any questions or need assistance. ### Migrate to operator-based Azure Terraform modules Reach out to [Customer Support](mailto:support@wandb.com) or your W\&B team if you have any questions or need assistance. 
### Migrate to operator-based Helm chart Follow these steps to migrate to the operator-based Helm chart: 1. Get the current W\&B configuration. If you deployed W\&B with a non-operator-based version of the Helm chart, export the values like this: ```shell theme={null} helm get values wandb ``` If you deployed W\&B with Kubernetes manifests, export the values like this: ```shell theme={null} kubectl get deployment wandb -o yaml ``` You now have all the configuration values you need for the next step. 2. Create a file called `operator.yaml`. Follow the format described in the [Configuration Reference](#configuration-reference-for-wb-operator). Use the values from step 1. 3. Scale the current deployment to 0 pods. This step stops the current deployment. ```shell theme={null} kubectl scale --replicas=0 deployment wandb ``` 4. Update the Helm chart repo: ```shell theme={null} helm repo update ``` 5. Install the new Helm chart: ```shell theme={null} helm upgrade --install operator wandb/operator -n wandb-cr --create-namespace ``` 6. Configure the new Helm chart and trigger W\&B application deployment. Apply the new configuration. ```shell theme={null} kubectl apply -f operator.yaml ``` The deployment takes a few minutes to complete. 7. Verify the installation. Make sure that everything works by following the steps in [Verify the installation](#verify-the-installation). 8. Remove the old installation. Uninstall the old Helm chart or delete the resources that you created with manifests. ### Migrate to operator-based Terraform Helm chart Follow these steps to migrate to the operator-based Helm chart: 1. Prepare Terraform config. Replace the Terraform code from the old deployment in your Terraform config with the code described in [Deploy W\&B with Helm Terraform module](#deploy-wb-with-helm-terraform-module). Set the same variables as before. Do not change the `.tfvars` file if you have one. 2. Execute Terraform run. 
   Execute `terraform init`, `terraform plan`, and `terraform apply`.

3. Verify the installation.

   Make sure that everything works by following the steps in [Verify the installation](#verify-the-installation).

4. Remove the old installation.

   Uninstall the old Helm chart or delete the resources that you created with manifests.

## Configuration reference for W\&B Server

This section is a reference for the configuration options that you set in your `WeightsAndBiases` custom resource. Use it to look up the YAML schema for a specific subsystem (for example, MySQL, Redis, ingress, or OIDC) as you build or update your `operator.yaml` file.

The W\&B Server application receives its configuration as a custom resource of kind [WeightsAndBiases](#how-it-works). The custom resource exposes some configuration options directly, as described in the following sections. You must set others as environment variables.

The documentation has two lists of environment variables: [basic](/platform/hosting/env-vars/) and [advanced](/platform/hosting/iam/advanced_env_vars/). Only use environment variables if the configuration option that you need isn't exposed through the Helm chart.

### Basic example

This example defines the minimum set of values required for W\&B. For a more realistic production example, see [Complete example](#complete-example).

This YAML file defines the desired state of your W\&B deployment, including the version, environment variables, external resources like databases, and other necessary settings.

```yaml theme={null}
apiVersion: apps.wandb.com/v1
kind: WeightsAndBiases
metadata:
  labels:
    app.kubernetes.io/name: weightsandbiases
    app.kubernetes.io/instance: wandb
  name: wandb
  namespace: default
spec:
  values:
    global:
      host: https://
      license: eyJhbGnUzaH...j9ZieKQ2x5GGfw
      bucket:
mysql: ingress: annotations: ``` Find the full set of values in the [W\&B Helm repository](https://github.com/wandb/helm-charts/blob/main/charts/operator-wandb/values.yaml). **Change only those values you need to override**. ### Complete example This example configuration deploys W\&B to Google Cloud Anthos using Google Cloud Storage: ```yaml theme={null} apiVersion: apps.wandb.com/v1 kind: WeightsAndBiases metadata: labels: app.kubernetes.io/name: weightsandbiases app.kubernetes.io/instance: wandb name: wandb namespace: default spec: values: global: host: https://abc-wandb.sandbox-gcp.wandb.ml bucket: name: abc-wandb-moving-pipefish provider: gcs mysql: database: wandb_local host: 10.218.0.2 name: wandb_local password: 8wtX6cJHizAZvYScjDzZcUarK4zZGjpV port: 3306 user: wandb redis: host: redis.example.com port: 6379 password: password api: enabled: true glue: enabled: true executor: enabled: true license: eyJhbGnUzaHgyQjQyQWhEU3...ZieKQ2x5GGfw ingress: annotations: ingress.gcp.kubernetes.io/pre-shared-cert: abc-wandb-cert-creative-puma kubernetes.io/ingress.class: gce kubernetes.io/ingress.global-static-ip-name: abc-wandb-operator-address ``` ### Host ```yaml theme={null} # Provide the FQDN with protocol global: # example host name, replace with your own host: https://wandb.example.com ``` ### Object storage (bucket) **AWS** ```yaml theme={null} global: bucket: provider: "s3" name: "" kmsKey: "" region: "" ``` **Google Cloud** ```yaml theme={null} global: bucket: provider: "gcs" name: "" ``` **Azure** ```yaml theme={null} global: bucket: provider: "az" name: "" secretKey: "" ``` **Other providers (Minio, Ceph, and other S3-compatible storage)** For other S3 compatible providers, set the bucket configuration as follows: ```yaml theme={null} global: bucket: # Example values, replace with your own provider: s3 name: storage.example.com kmsKey: null path: wandb region: default accessKey: 5WOA500...P5DK7I secretKey: HDKYe4Q...JAp1YyjysnX ``` For S3-compatible storage 
hosted outside of AWS, `kmsKey` must be `null`. To reference `accessKey` and `secretKey` from a secret: ```yaml theme={null} global: bucket: # Example values, replace with your own provider: s3 name: storage.example.com kmsKey: null path: wandb region: default secret: secretName: bucket-secret accessKeyName: ACCESS_KEY secretKeyName: SECRET_KEY ``` ### MySQL ```yaml theme={null} global: mysql: # Example values, replace with your own host: db.example.com port: 3306 database: wandb_local user: wandb password: 8wtX6cJH...ZcUarK4zZGjpV ``` To reference the `password` from a secret: ```yaml theme={null} global: mysql: # Example values, replace with your own host: db.example.com port: 3306 database: wandb_local user: wandb passwordSecret: name: database-secret passwordKey: MYSQL_WANDB_PASSWORD ``` ### License ```yaml theme={null} global: # Example license, replace with your own license: eyJhbGnUzaHgyQjQy...VFnPS_KETXg1hi ``` To reference the `license` from a secret: ```yaml theme={null} global: licenseSecret: name: license-secret key: CUSTOMER_WANDB_LICENSE ``` ### Ingress See [How to identify the Kubernetes ingress class](#how-to-identify-the-kubernetes-ingress-class). 
**Without TLS** ```yaml theme={null} global: # IMPORTANT: Ingress is on the same level in the YAML as 'global' (not a child) ingress: class: "" ``` **With TLS** Create a secret that contains the certificate ```console theme={null} kubectl create secret tls wandb-ingress-tls --key wandb-ingress-tls.key --cert wandb-ingress-tls.crt ``` Reference the secret in the ingress configuration ```yaml theme={null} global: # IMPORTANT: Ingress is on the same level in the YAML as 'global' (not a child) ingress: class: "" annotations: {} # kubernetes.io/ingress.class: nginx # kubernetes.io/tls-acme: "true" tls: - secretName: wandb-ingress-tls hosts: - ``` For Nginx, you might have to add the following annotation: ```yaml theme={null} ingress: annotations: nginx.ingress.kubernetes.io/proxy-body-size: 0 ``` ### Custom Kubernetes service accounts Specify custom Kubernetes service accounts to run the W\&B pods. The following snippet creates a service account as part of the deployment with the specified name: ```yaml theme={null} app: serviceAccount: name: custom-service-account create: true parquet: serviceAccount: name: custom-service-account create: true global: ... ``` The subsystems "app" and "parquet" run under the specified service account. The other subsystems run under the default service account. If the service account already exists on the cluster, set `create: false`: ```yaml theme={null} app: serviceAccount: name: custom-service-account create: false parquet: serviceAccount: name: custom-service-account create: false global: ... ``` You can specify service accounts on different subsystems such as app, parquet, console, and others: ```yaml theme={null} app: serviceAccount: name: custom-service-account create: true console: serviceAccount: name: custom-service-account create: true global: ... 
``` The service accounts can be different between the subsystems: ```yaml theme={null} app: serviceAccount: name: custom-service-account create: false console: serviceAccount: name: another-custom-service-account create: true global: ... ``` ### External Redis ```yaml theme={null} redis: install: false global: redis: host: "" port: 6379 password: "" parameters: {} caCert: "" ``` To reference the `password` from a secret: ```console theme={null} kubectl create secret generic redis-secret --from-literal=redis-password=supersecret ``` Reference it in the following configuration: ```yaml theme={null} redis: install: false global: redis: host: redis.example port: 9001 auth: enabled: true secret: redis-secret key: redis-password ``` ### LDAP LDAP configuration support in the current Helm chart is limited. Contact W\&B Support or your AISE for assistance configuring LDAP. Configure LDAP by setting environment variables in `global.extraEnv`: ```yaml theme={null} global: extraEnv: LDAP_ADDRESS: ldaps://ldap.company.example.com LDAP_BASE_DN: cn=accounts,dc=company,dc=example,dc=com LDAP_USER_BASE_DN: cn=users,cn=accounts,dc=company,dc=example,dc=com LDAP_GROUP_BASE_DN: cn=groups,cn=accounts,dc=company,dc=example,dc=com LDAP_BIND_DN: uid=ldapbind,cn=sysaccounts,cn=etc,dc=company,dc=example,dc=com LDAP_BIND_PW: ******************** LDAP_ATTRIBUTES: email=mail,name=cn LDAP_TLS_ENABLE: "true" LDAP_LOGIN: "true" LDAP_USER_OBJECT_CLASS: user LDAP_GROUP_OBJECT_CLASS: group ``` This legacy approach is no longer recommended. This section is provided for reference. **Without TLS** ```yaml theme={null} global: ldap: enabled: true # LDAP server address including "ldap://" or "ldaps://" host: # LDAP search base to use for finding users baseDN: # LDAP user to bind with (if not using anonymous bind) bindDN: # Secret name and key with LDAP password to bind with (if not using anonymous bind) bindPW: # LDAP attribute for email and group ID attribute names as comma separated string values. 
attributes: # LDAP group allow list groupAllowList: # Enable LDAP TLS tls: false ``` **With TLS** The LDAP TLS cert configuration requires a config map pre-created with the certificate content. To create the config map you can use the following command: ```console theme={null} kubectl create configmap ldap-tls-cert --from-file=certificate.crt ``` And use the config map in the YAML like the following example. ```yaml theme={null} global: ldap: enabled: true # LDAP server address including "ldap://" or "ldaps://" host: # LDAP search base to use for finding users baseDN: # LDAP user to bind with (if not using anonymous bind) bindDN: # Secret name and key with LDAP password to bind with (if not using anonymous bind) bindPW: # LDAP attribute for email and group ID attribute names as comma separated string values. attributes: # LDAP group allow list groupAllowList: # Enable LDAP TLS tls: true # ConfigMap name and key with CA certificate for LDAP server tlsCert: configMap: name: "ldap-tls-cert" key: "certificate.crt" ``` ### OIDC SSO ```yaml theme={null} global: auth: sessionLengthHours: 720 oidc: clientId: "" secret: "" # Only include if your IdP requires it. authMethod: "" issuer: "" ``` `authMethod` is optional. ### SMTP ```yaml theme={null} global: email: smtp: host: "" port: 587 user: "" password: "" ``` ### Environment variables ```yaml theme={null} global: extraEnv: GLOBAL_ENV: "example" ``` ### Custom certificate authority `customCACerts` is a list and can take many certificates. Certificate authorities specified in `customCACerts` only apply to the W\&B Server application. 
```yaml theme={null} global: customCACerts: - | -----BEGIN CERTIFICATE----- MIIBnDCCAUKgAwIBAg.....................fucMwCgYIKoZIzj0EAwIwLDEQ MA4GA1UEChMHSG9tZU.....................tZUxhYiBSb290IENBMB4XDTI0 MDQwMTA4MjgzMFoXDT.....................oNWYggsMo8O+0mWLYMAoGCCqG SM49BAMCA0gAMEUCIQ.....................hwuJgyQRaqMI149div72V2QIg P5GD+5I+02yEp58Cwxd5Bj2CvyQwTjTO4hiVl1Xd0M0= -----END CERTIFICATE----- - | -----BEGIN CERTIFICATE----- MIIBxTCCAWugAwIB.......................qaJcwCgYIKoZIzj0EAwIwLDEQ MA4GA1UEChMHSG9t.......................tZUxhYiBSb290IENBMB4XDTI0 MDQwMTA4MjgzMVoX.......................UK+moK4nZYvpNpqfvz/7m5wKU SAAwRQIhAIzXZMW4.......................E8UFqsCcILdXjAiA7iTluM0IU aIgJYVqKxXt25blH/VyBRzvNhViesfkNUQ== -----END CERTIFICATE----- ``` CA certificates can also be stored in a ConfigMap: ```yaml theme={null} global: caCertsConfigMap: custom-ca-certs ``` The ConfigMap must look like this: ```yaml theme={null} apiVersion: v1 kind: ConfigMap metadata: name: custom-ca-certs data: ca-cert1.crt: | -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ca-cert2.crt: | -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` If using a ConfigMap, each key in the ConfigMap must end with `.crt` (for example, `my-cert.crt` or `ca-cert1.crt`). This naming convention is required for `update-ca-certificates` to parse and add each certificate to the system CA store. ### Custom security context Each W\&B component supports custom security context configurations of the following form: ```yaml theme={null} pod: securityContext: runAsNonRoot: true runAsUser: 1001 runAsGroup: 0 fsGroup: 1001 fsGroupChangePolicy: Always seccompProfile: type: RuntimeDefault container: securityContext: capabilities: drop: - ALL readOnlyRootFilesystem: false allowPrivilegeEscalation: false ``` The only valid value for `runAsGroup:` is `0`. Any other value is an error. 
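
The `runAsGroup` rule above can be checked before you apply a configuration. The following is a minimal sketch, not part of any W\&B tooling: `check_security_context` is a hypothetical helper, and it assumes the values for one component have already been loaded into a Python dictionary shaped like the YAML above. It encodes only the one rule stated in this section.

```python
from typing import Any


def check_security_context(component: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one component's pod security context.

    Hypothetical helper: it checks only the rule stated in this guide,
    namely that `runAsGroup`, when present, must be the integer 0.
    """
    problems: list[str] = []
    pod_ctx = component.get("pod", {}).get("securityContext", {})
    if "runAsGroup" in pod_ctx:
        value = pod_ctx["runAsGroup"]
        # Reject non-integers (including booleans) and any integer other than 0.
        if type(value) is not int or value != 0:
            problems.append(
                f"pod.securityContext.runAsGroup must be 0, got {value!r}"
            )
    return problems


# Example values mirroring the security context shown above.
valid_component = {
    "pod": {
        "securityContext": {
            "runAsNonRoot": True,
            "runAsUser": 1001,
            "runAsGroup": 0,
            "fsGroup": 1001,
        }
    },
    "container": {
        "securityContext": {
            "readOnlyRootFilesystem": False,
            "allowPrivilegeEscalation": False,
        }
    },
}

# The same shape with a non-zero group, which this guide describes as an error.
invalid_component = {"pod": {"securityContext": {"runAsGroup": 1001}}}
```

Calling `check_security_context(valid_component)` returns an empty list, while the second dictionary produces one message. A check like this could run in CI against each component section (`app`, `console`, and so on) before the configuration reaches the cluster.
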
For example, to configure the application pod, add a section `app` to your configuration: ```yaml theme={null} global: ... app: pod: securityContext: runAsNonRoot: true runAsUser: 1001 runAsGroup: 0 fsGroup: 1001 fsGroupChangePolicy: Always seccompProfile: type: RuntimeDefault container: securityContext: capabilities: drop: - ALL readOnlyRootFilesystem: false allowPrivilegeEscalation: false ``` The same concept applies to `console`, `weave`, `weave-trace`, and `parquet`. ## Configuration reference for W\&B Operator This section describes configuration options for W\&B Kubernetes Operator (`wandb-controller-manager`). The operator receives its configuration in the form of a YAML file. By default, the W\&B Kubernetes Operator doesn't need a configuration file. Create a configuration file if required. For example, you might need a configuration file to specify custom certificate authorities, deploy in an air gap environment, and so forth. Find the full list of spec customization [in the Helm repository](https://github.com/wandb/helm-charts/blob/main/charts/operator/values.yaml). ### Custom CA A custom certificate authority (`customCACerts`) is a list and can take many certificates. Those certificate authorities, when added, only apply to the W\&B Kubernetes Operator (`wandb-controller-manager`). 
```yaml theme={null} customCACerts: - | -----BEGIN CERTIFICATE----- MIIBnDCCAUKgAwIBAg.....................fucMwCgYIKoZIzj0EAwIwLDEQ MA4GA1UEChMHSG9tZU.....................tZUxhYiBSb290IENBMB4XDTI0 MDQwMTA4MjgzMFoXDT.....................oNWYggsMo8O+0mWLYMAoGCCqG SM49BAMCA0gAMEUCIQ.....................hwuJgyQRaqMI149div72V2QIg P5GD+5I+02yEp58Cwxd5Bj2CvyQwTjTO4hiVl1Xd0M0= -----END CERTIFICATE----- - | -----BEGIN CERTIFICATE----- MIIBxTCCAWugAwIB.......................qaJcwCgYIKoZIzj0EAwIwLDEQ MA4GA1UEChMHSG9t.......................tZUxhYiBSb290IENBMB4XDTI0 MDQwMTA4MjgzMVoX.......................UK+moK4nZYvpNpqfvz/7m5wKU SAAwRQIhAIzXZMW4.......................E8UFqsCcILdXjAiA7iTluM0IU aIgJYVqKxXt25blH/VyBRzvNhViesfkNUQ== -----END CERTIFICATE----- ``` CA certificates can also be stored in a ConfigMap: ```yaml theme={null} caCertsConfigMap: custom-ca-certs ``` The ConfigMap must look like this: ```yaml theme={null} apiVersion: v1 kind: ConfigMap metadata: name: custom-ca-certs data: ca-cert1.crt: | -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ca-cert2.crt: | -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE----- ``` Each key in the ConfigMap must end with `.crt` (for example, `my-cert.crt` or `ca-cert1.crt`). This naming convention is required for `update-ca-certificates` to parse and add each certificate to the system CA store. ## FAQ ### Purpose and role of each pod A W\&B Server deployment includes the following pods: * **`wandb-app`**: the core of W\&B, including the GraphQL API and frontend application. It powers most of the W\&B platform's functionality. * **`wandb-console`**: the administration console, accessed through `/console`. * **`wandb-otel`**: the OpenTelemetry agent, which collects metrics and logs from resources at the Kubernetes layer for display in the administration console. * **`wandb-prometheus`**: the Prometheus server, which captures metrics from various components for display in the administration console. 
* **`wandb-parquet`**: a backend microservice separate from the `wandb-app` pod that exports database data to object storage in Parquet format.
* **`wandb-weave`**: another backend microservice that loads query tables in the UI and supports various core app features.
* **`wandb-weave-trace`**: a framework for tracking, experimenting with, evaluating, deploying, and improving LLM-based applications. The framework is accessed through the `wandb-app` pod.

### How to get the W\&B Operator Console password

See [Access the W\&B management console](#access-the-wb-management-console).

### How to access the W\&B Operator Console if Ingress doesn't work

Execute the following command on a host that can reach the Kubernetes cluster:

```console theme={null}
kubectl port-forward svc/wandb-console 8082
```

Open `https://localhost:8082/` in a browser to access the console.

To get the password, see Option 2 in [Access the W\&B management console](#access-the-wb-management-console).

### How to view W\&B Server logs

The application pod is named **wandb-app-xxx**. List the pods to find the full name, then fetch its logs:

```console theme={null}
kubectl get pods
kubectl logs wandb-XXXXX-XXXXX
```

### How to identify the Kubernetes ingress class

To list the ingress classes installed in your cluster, run:

```console theme={null}
kubectl get ingressclass
```

# Rate limits

Source: https://docs.wandb.ai/platform/hosting/self-managed/rate-limits

Optional rate limits on Self-Managed instances for stability

You can configure rate limits on your W\&B Self-Managed instance to maintain stability and prevent one user or workload from affecting others. These limits are optional. If you don't set them, the instance doesn't enforce any limits.
## Default limits and notification

The following default limits help maintain platform stability:

| Limit                            | Default | Scope   |
| -------------------------------- | ------- | ------- |
| Filestream requests per second   | 500     | Project |
| Filestream ingestion per second  | 120 MB  | Project |
| Filestream requests per second   | 2       | Run     |
| Run creation requests per second | 80      | Project |

When a limit is exceeded, the server returns HTTP status `429` to the W\&B SDK, and the following message appears in the SDK logs:

```
HTTP 429: rate limit exceeded
```

* Filestream rate limits never cause logging to crash or fail. When the SDK receives a `429` response on a filestream request, it backs off and retries the rate-limited request as-is, while subsequent updates accumulate.
* Run creation rate limits block further training.

W\&B recommends these defaults for typical production workloads. If your workloads consistently exceed these limits, you can adjust them (or leave them unset) according to your instance size and shared usage. For configuration details, see [Deploy W\&B with Kubernetes Operator](/platform/hosting/self-managed/operator) or contact [W\&B support](mailto:support@wandb.com).

# Reference architecture

Source: https://docs.wandb.ai/platform/hosting/self-managed/ref-arch

Review the reference architecture for self-managed W&B deployments covering Kubernetes, MySQL, object storage, and networking.

This page describes a reference architecture for a W\&B deployment and outlines the recommended infrastructure and resources to support a production deployment of the platform. Use it as a planning guide to size, provision, and integrate the components required for a reliable Self-Managed installation. This page is intended for platform engineers, site reliability engineers, and infrastructure administrators who deploy and operate W\&B on their own infrastructure.
Depending on your chosen deployment environment for W\&B, different services can help to enhance the resiliency of your deployment. For instance, major cloud providers offer managed database services that help to reduce the complexity of database configuration, maintenance, high availability, and resilience.

This reference architecture addresses common deployment scenarios and shows how you can integrate your W\&B deployment with cloud vendor services for performance and reliability.

## Before you start

Running any application in production comes with its own set of challenges, and W\&B is no exception. Although W\&B aims to streamline the process, complexities may arise depending on your architecture and design decisions. Typically, managing a production deployment involves overseeing components including hardware, operating systems, networking, storage, security, the W\&B platform itself, and other dependencies. This responsibility extends to both the initial setup of the environment and its ongoing maintenance.

Consider carefully whether a Self-Managed approach with W\&B is suitable for your team and your requirements. A strong understanding of how to run and maintain production-grade applications is an important prerequisite before you deploy Self-Managed W\&B. If your team needs assistance, the W\&B Professional Services team and partners offer support for implementation and optimization.

To learn more about managed solutions for running W\&B instead of managing it yourself, refer to [W\&B Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud) and [W\&B Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud).

## Infrastructure

A W\&B deployment consists of an application layer and a storage layer. The following diagram shows how these layers fit together, and the subsections that follow describe each one.
W&B infrastructure diagram ### Application layer The application layer consists of a multi-node Kubernetes cluster, with resilience against node failures. The Kubernetes cluster runs and maintains the W\&B pods. ### Storage layer The storage layer consists of a MySQL database and object storage. The MySQL database stores metadata and the object storage stores artifacts such as models and datasets. ## Infrastructure requirements The following sections detail requirements for a W\&B deployment, including Kubernetes cluster details, MySQL, Redis, object storage, software versions, networking, DNS, load balancer and ingress, SSL/TLS, and supported CPU architectures. Confirm that your environment meets each of these requirements before you begin a deployment. ### Kubernetes W\&B deploys the W\&B Server application as a [Kubernetes Operator](/platform/hosting/self-managed/operator) that deploys multiple pods. For this reason, W\&B requires a Kubernetes cluster with: * A fully configured and functioning ingress controller. * The capability to provision Persistent Volumes. W\&B supports deployment on [OpenShift Kubernetes clusters](https://www.redhat.com/en/technologies/cloud-computing/openshift) in cloud, on-premises, and air-gapped environments. For specific configuration instructions, see the [OpenShift section](/platform/hosting/self-managed/operator#openshift-kubernetes-clusters) in the Operator guide. ### MySQL W\&B stores metadata in a MySQL database. The database's performance and storage requirements depend on the shapes of the model parameters and related metadata. For example, the database grows in size as you track more training runs, and load on the database increases based on queries in run tables, user workspaces, and reports. **W\&B strongly recommends using managed database services** (such as AWS RDS Aurora MySQL, Google Cloud SQL for MySQL, or Azure Database for MySQL) for production deployments. 
Managed services provide automated backups, monitoring, high availability, and patching, and reduce operational complexity. See the [Cloud provider instance recommendations](#cloud-provider-instance-recommendations) section for specific service recommendations. If you choose to deploy a self-managed MySQL database, consider the following: * **Backups**: Periodically back up the database to a separate facility. W\&B recommends daily backups with at least 1 week of retention. * **Performance**: The database requires fast storage hardware, such as SSD or accelerated NAS. * **Monitoring**: The database requires adequate CPU resources. Monitor the database server's CPU load. If CPU usage is sustained at > 90% of the system for more than 5 minutes, consider adding CPU capacity. * **Availability**: To meet your availability and durability requirements, W\&B recommends configuring a hot standby deployment on a separate machine. The standby streams all updates in real time from the primary deployment and is ready to fail over if the primary server crashes, becomes corrupted, or experiences sustained downtime. #### MySQL topology For production, a managed MySQL service is the simplest path to high availability because the cloud provider handles failover, backups, and patching. Use the provider's high availability option, for example, Aurora Multi-AZ on AWS. If you run self-managed MySQL, use a primary database with a hot standby that receives a real-time replication stream and can take over on failure. W\&B doesn't support a multi-primary topology or read-only replicas for the application database. #### MySQL database creation For instructions to manually create the MySQL database and user, see the [bare-metal guide MySQL database section](/platform/hosting/self-managed/operator#mysql-database). #### MySQL configuration parameters These parameters tune MySQL for the write patterns and schema changes that W\&B performs at scale. 
If you're running your own MySQL instance, configure MySQL with these settings: ```ini theme={null} binlog_format = 'ROW' binlog_row_image = 'MINIMAL' innodb_flush_log_at_trx_commit = 1 innodb_online_alter_log_max_size = 268435456 max_prepared_stmt_count = 1048576 sort_buffer_size = '67108864' sync_binlog = 1 ``` W\&B has validated these settings for performance and reliability. ### Redis W\&B depends on a single-node Redis 7.x deployment that W\&B components use for job queuing and data caching. For convenience during testing and development of proofs of concept, W\&B Self-Managed includes a local Redis deployment that isn't appropriate for production deployments. W\&B can connect to a Redis instance in the following environments: * [AWS Elasticache](https://aws.amazon.com/elasticache/). * [Google Cloud Memory Store](https://cloud.google.com/memorystore?hl=en). * [Azure Cache for Redis](https://azure.microsoft.com/en-us/products/cache). * Redis deployment hosted in your cloud or on-premises infrastructure. ### Object storage W\&B requires object storage with pre-signed URL and CORS support, deployed in one of: * [CoreWeave AI Object Storage](https://docs.coreweave.com/products/storage/object-storage) is an S3-compatible object storage service optimized for AI workloads. * [Amazon S3](https://aws.amazon.com/s3/) is an object storage service that provides scalability, data availability, security, and performance. * [Google Cloud Storage](https://cloud.google.com/storage) is a managed service for storing unstructured data at scale. * [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) is a cloud-based object storage solution for storing unstructured data like text, binary data, images, videos, and logs. 
* S3-compatible storage such as [MinIO Enterprise (AIStor)](https://www.min.io/product/aistor), [NetApp StorageGRID](https://www.netapp.com/data-storage/storagegrid/), or other enterprise-grade solutions hosted in your cloud or on-premises infrastructure.

### Versions

| Software   | Minimum version                                                                                                                      |
| ---------- | ------------------------------------------------------------------------------------------------------------------------------------ |
| Kubernetes | v1.34 or newer ([Supported Kubernetes versions](https://kubernetes.io/releases/patch-releases/))                                     |
| Helm       | v3.x                                                                                                                                 |
| MySQL      | v8.0.x is required (v8.0.32 or newer); v8.0.44 or newer is recommended. Aurora MySQL 3.x releases must be v3.05.2 or newer.          |
| Redis      | v7.x                                                                                                                                 |

### Networking

For a networked deployment, egress to these endpoints is required during *both* installation and runtime:

* [https://deploy.wandb.ai](https://deploy.wandb.ai)
* [https://charts.wandb.ai](https://charts.wandb.ai)
* [https://quay.io](https://quay.io) (used for Prometheus images)

Additional container registries may be required depending on your deployment configuration:

* `https://gcr.io` is needed when deploying Bufstream and etcd for Weave online evaluations.

To learn about air-gapped deployments, refer to [Kubernetes operator for air-gapped instances](/platform/hosting/self-managed/on-premises-deployments/kubernetes-airgapped).

Access to W\&B and to the object storage is required for the training infrastructure and for each system that tracks experiments.

### DNS

The fully qualified domain name (FQDN) of the W\&B deployment must resolve to the IP address of the ingress or load balancer using an `A` record.

### Load balancer and ingress

The W\&B Kubernetes Operator can expose services using a Kubernetes ingress controller, which routes requests to service endpoints on different ports based on URL paths. The ingress controller must be accessible by all machines that execute machine learning payloads or access the service through web browsers.

#### Ingress controller requirements

Your Kubernetes cluster must have an `IngressClass` available. Common ingress controller options include:

* [Nginx Ingress Controller](https://kubernetes.github.io/ingress-nginx/).
* [Istio](https://istio.io).
* [Traefik](https://traefik.io/).
* Cloud provider ingress controllers (AWS ALB, GCP Ingress, and Azure Application Gateway).
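
The minimum versions in the Versions table earlier in this section lend themselves to an automated pre-flight check. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the version strings are assumed to come from commands such as `SELECT VERSION()` or `kubectl version`, and "v8.0.x is required" is interpreted here as excluding other MySQL release series.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the leading dotted version, for example 'v8.0.44-log' -> (8, 0, 44)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def mysql_version_status(text: str) -> str:
    """Classify a community MySQL version against the table's thresholds."""
    version = parse_version(text)
    # Assumption: only the 8.0 series qualifies, starting at 8.0.32.
    if version[:2] != (8, 0) or version < (8, 0, 32):
        return "unsupported"
    if version < (8, 0, 44):
        return "supported"
    return "recommended"


def aurora_version_ok(text: str) -> bool:
    """Aurora MySQL uses its own numbering: 3.x releases from 3.05.2 onward."""
    version = parse_version(text)
    return version[0] == 3 and version >= (3, 5, 2)


def kubernetes_version_ok(text: str) -> bool:
    """Kubernetes must be v1.34 or newer."""
    return parse_version(text) >= (1, 34)
```

For example, `mysql_version_status("8.0.36-log")` falls in the supported range but below the recommended release, and `aurora_version_ok("3.04.0")` fails the check. Treat the result as a planning aid; the table remains the reference.
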
#### W\&B service routing The W\&B Operator routes requests automatically to multiple backend services based on path: | Path | Service | Default port | Purpose | | ----------- | ------------------- | ------------ | ---------------------------------- | | `/` | `wandb-app` | 8080 | Main web application UI | | `/api` | `wandb-api` | 8081 | API service | | `/graphql` | `wandb-api` | 8081 | GraphQL API endpoint | | `/graphql2` | `wandb-api` | 8081 | GraphQL API v2 endpoint | | `/console` | `wandb-console` | 8082 | System Console | | `/traces` | `wandb-weave-trace` | 8722 | Weave tracing service (if enabled) | #### Example ingress configuration The following shows an example ingress resource created by the W\&B Operator: ```yaml theme={null} apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: wandb namespace: wandb annotations: nginx.ingress.kubernetes.io/proxy-body-size: "0" spec: ingressClassName: nginx rules: - host: wandb.example.com http: paths: - path: / pathType: Prefix backend: service: name: wandb-app port: number: 8080 - path: /api pathType: Prefix backend: service: name: wandb-api port: number: 8081 - path: /graphql pathType: Prefix backend: service: name: wandb-api port: number: 8081 - path: /graphql2 pathType: Prefix backend: service: name: wandb-api port: number: 8081 - path: /console pathType: Prefix backend: service: name: wandb-console port: number: 8082 tls: - hosts: - wandb.example.com secretName: wandb-tls ``` The W\&B Operator creates and manages the ingress configuration automatically. You typically don't need to create ingress resources manually. Make sure your cluster has a functioning ingress controller and the appropriate `IngressClass` configured. ### SSL/TLS W\&B requires a valid signed SSL/TLS certificate for secure communication between clients and the server. SSL/TLS termination must occur on the ingress/load balancer. The W\&B Server application does not terminate SSL or TLS connections. 
**Important**: W\&B does not support self-signed certificates and custom CAs. Using self-signed certificates will cause challenges for users and is not supported. If possible, using a service like [Let's Encrypt](https://letsencrypt.org) is a great way to provide trusted certificates to your load balancer. Services like Caddy and Cloudflare manage SSL for you. If your security policies require SSL communication within your trusted networks, consider using a tool like Istio and [side car containers](https://istio.io/latest/docs/reference/config/networking/sidecar/). ### Supported CPU architectures W\&B runs on Intel and AMD 64-bit architecture. ARM isn't supported. ## Deployment method After your infrastructure meets the preceding requirements, choose how to install W\&B and provision the underlying resources. The following sections describe the recommended deployment method and the recommended approach for infrastructure provisioning. ### W\&B Kubernetes Operator with Helm The recommended installation method for W\&B Self-Managed uses the **W\&B Kubernetes Operator**, deployed through Helm. This approach provides: * Automated updates and management of W\&B components. * Simplified configuration and deployment. * Support for all deployment scenarios (cloud, on-premises, and air-gapped). For detailed installation instructions, see: * [Deploy W\&B Platform On-premises](/platform/hosting/self-managed/operator) - Primary installation guide. * [Kubernetes operator for air-gapped instances](/platform/hosting/self-managed/on-premises-deployments/kubernetes-airgapped) - For disconnected environments. ### Infrastructure provisioning Terraform is the recommended way to provision infrastructure for W\&B production deployments. With Terraform, you define the required resources, their references to other resources, and their dependencies. W\&B provides Terraform modules for the major cloud providers. 
For details, refer to [Deploy W\&B Server within Self-Managed cloud accounts](/platform/hosting/hosting-options/self-managed#deploy-wb-server-within-Self-Managed-cloud-accounts).

## Sizing

Use the following guidelines as a starting point when planning a deployment. W\&B recommends that you monitor all components of a deployment closely and that you make adjustments based on observed usage patterns. Continue to monitor production deployments over time and make adjustments as needed to maintain performance.

When you plan capacity, you size two core components: a Kubernetes cluster for the W\&B Operator workload and a MySQL database for metadata. Recommendations vary by **environment** (Test/Dev or Production) and, for Kubernetes only, by **product mix** (Models only, Weave only, or Models and Weave). W\&B recommends starting with a minimum of 3 worker nodes for both Test/Dev and Production, and enabling cluster autoscaling in Production. The following sections give per-node sizing recommendations for the Kubernetes cluster and the MySQL database.

### Kubernetes sizing

**Models only**

| Environment | CPU     | Memory | Disk   |
| ----------- | ------- | ------ | ------ |
| Test/Dev    | 2 cores | 16 GB  | 100 GB |
| Production  | 8 cores | 64 GB  | 100 GB |

Numbers are per Kubernetes worker node.

**Weave only**

| Environment | CPU      | Memory | Disk   |
| ----------- | -------- | ------ | ------ |
| Test/Dev    | 4 cores  | 32 GB  | 100 GB |
| Production  | 12 cores | 96 GB  | 100 GB |

Numbers are per Kubernetes worker node.

**Models and Weave**

| Environment | CPU      | Memory | Disk   |
| ----------- | -------- | ------ | ------ |
| Test/Dev    | 4 cores  | 32 GB  | 100 GB |
| Production  | 16 cores | 128 GB | 100 GB |

Numbers are per Kubernetes worker node.

### MySQL sizing

These recommendations don't vary by product mix. For topology and availability guidance, see [MySQL topology](#mysql-topology) under [MySQL](#mysql).
| Environment | CPU | Memory | Disk | | ----------- | ------- | ------ | ------ | | Test/Dev | 2 cores | 16 GB | 100 GB | | Production | 8 cores | 64 GB | 500 GB | Numbers are per MySQL node. ## Cloud provider instance recommendations After you determine the per-node CPU, memory, and disk requirements from the preceding sizing tables, use the following recommendations to pick specific cloud provider instance types and managed services that meet those requirements. These recommendations apply to each node of a Self-Managed deployment of W\&B in cloud infrastructure. **Recommended managed services** * **Kubernetes**: Amazon EKS * **MySQL**: Amazon RDS Aurora * **Object storage**: Amazon S3 | Environment | K8s (Models only) | K8s (Weave only) | K8s (Models\&Weave) | MySQL | | ----------- | ----------------- | ---------------- | ------------------- | -------------- | | Test/Dev | r6i.large | r6i.xlarge | r6i.xlarge | db.r6g.large | | Production | r6i.2xlarge | r6i.4xlarge | r6i.4xlarge | db.r6g.2xlarge | **Recommended managed services** * **Kubernetes**: Google Kubernetes Engine (GKE) * **MySQL**: Google Cloud SQL for MySQL * **Object storage**: Google Cloud Storage (GCS) | Environment | K8s (Models only) | K8s (Weave only) | K8s (Models\&Weave) | MySQL | | ----------- | ----------------- | ---------------- | ------------------- | --------------- | | Test/Dev | n2-highmem-2 | n2-highmem-4 | n2-highmem-4 | db-n1-highmem-2 | | Production | n2-highmem-8 | n2-highmem-16 | n2-highmem-16 | db-n1-highmem-8 | **Recommended managed services** * **Kubernetes**: Azure Kubernetes Service (AKS) * **MySQL**: Azure Database for MySQL * **Object storage**: Azure Blob Storage | Environment | K8s (Models only) | K8s (Weave only) | K8s (Models\&Weave) | MySQL | | ----------- | ----------------- | ----------------- | ------------------- | ---------------------- | | Test/Dev | Standard\_E2\_v5 | Standard\_E4\_v5 | Standard\_E4\_v5 | MO\_Standard\_E2ds\_v4 | | Production | Standard\_E8\_v5 
| Standard\_E16\_v5 | Standard\_E16\_v5 | MO\_Standard\_E8ds\_v4 | # Self-Managed infrastructure requirements Source: https://docs.wandb.ai/platform/hosting/self-managed/requirements Infrastructure and software requirements for W&B Self-Managed deployments This page describes the infrastructure and software requirements for deploying W\&B Self-Managed. It's intended for platform and infrastructure engineers planning a Self-Managed installation. Review these requirements before you begin your deployment to confirm that your environment can support W\&B Server. W\&B recommends fully managed deployment options such as [W\&B Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud) or [W\&B Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) deployment types. W\&B fully managed services are straightforward and secure to use, and require minimal to no configuration. For complete architectural guidance, see the [reference architecture](/platform/hosting/self-managed/ref-arch/). ## Software version requirements | Software | Minimum version | | ---------- | ------------------------------------------------------------------------------------------------------------------------------- | | Kubernetes | v1.34 or newer ([Supported Kubernetes versions](https://kubernetes.io/releases/patch-releases/)) | | Helm | v3.x | | MySQL | v8.0.x is required, v8.0.32 or newer; v8.0.44 or newer is recommended.
Aurora MySQL 3.x releases, must be v3.05.2 or newer | | Redis | v7.x |

## Hardware requirements

**CPU architecture**: W\&B runs on 64-bit x86 (Intel or AMD) CPUs only. ARM is not supported.

**Sizing**: For CPU, memory, and disk sizing recommendations for Kubernetes nodes and MySQL, see the [Sizing section](/platform/hosting/self-managed/ref-arch/#sizing) in the reference architecture. Requirements vary based on whether you're running Models, Weave, or both.

## Kubernetes

W\&B Server is deployed as a [Kubernetes Operator](/platform/hosting/self-managed/operator/) that manages multiple pods. Your Kubernetes cluster must meet the following requirements:

* **Version**: See the preceding [Software version requirements](#software-version-requirements) section.
* **Ingress controller**: A fully configured and functioning ingress controller (Nginx, Istio, Traefik, or cloud provider ingress).
* **Persistent volumes**: Capability to provision persistent volumes.
* **CPU architecture**: Intel or AMD 64-bit (ARM is not supported).

W\&B supports deployment on [OpenShift Kubernetes clusters](https://www.redhat.com/en/technologies/cloud-computing/openshift) in cloud, on-premises, and air-gapped environments. For specific configuration instructions, see the [OpenShift section](/platform/hosting/self-managed/operator/#openshift-kubernetes-clusters) in the Operator guide.

For complete Kubernetes requirements, including load balancer and ingress configuration, see the [reference architecture Kubernetes section](/platform/hosting/self-managed/ref-arch/#kubernetes).

## MySQL database

W\&B requires an external MySQL database.
For production, W\&B strongly recommends using managed database services:

* [AWS RDS Aurora MySQL](https://aws.amazon.com/rds/aurora/)
* [Google Cloud SQL for MySQL](https://cloud.google.com/sql/mysql)
* [Azure Database for MySQL](https://azure.microsoft.com/en-us/products/mysql/)

Managed database services provide automated backups, monitoring, high availability, and patching, and they reduce operational overhead.

See the [reference architecture](/platform/hosting/self-managed/ref-arch/#mysql) for complete MySQL requirements, including sizing recommendations and configuration parameters. For database creation SQL, see the [bare-metal guide](/platform/hosting/self-managed/operator/#mysql-database). For questions about your deployment's database configuration, contact [support](mailto:support@wandb.com) or your AISE.

### MySQL configuration parameters

If you're running your own MySQL instance, configure MySQL with the following settings for compatibility with W\&B Server:

```ini theme={null}
binlog_format = 'ROW'
binlog_row_image = 'MINIMAL'
innodb_flush_log_at_trx_commit = 1
innodb_online_alter_log_max_size = 268435456
max_prepared_stmt_count = 1048576
sort_buffer_size = '67108864'
sync_binlog = 1
```

W\&B has validated these settings for performance and reliability.

### Database creation

If you aren't using a managed MySQL service that provisions the database automatically, manually create the MySQL database and user that W\&B Server uses. Create a database and a user with the following SQL commands.
Replace `SOME_PASSWORD` with a secure password of your choice:

```sql theme={null}
CREATE USER 'wandb_local'@'%' IDENTIFIED BY 'SOME_PASSWORD';
CREATE DATABASE wandb_local CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
GRANT ALL ON wandb_local.* TO 'wandb_local'@'%' WITH GRANT OPTION;
```

For additional considerations, including backups, performance, monitoring, and availability, see the [reference architecture MySQL section](/platform/hosting/self-managed/ref-arch/#mysql).

## Redis

W\&B Server uses a single-node Redis 7.x deployment for data caching, job queuing, and background job coordination. For convenience during testing and development of proofs of concept, W\&B Self-Managed includes a local Redis deployment that is not appropriate for production deployments.

For production deployments, W\&B can connect to a Redis instance in any of the following environments:

* [AWS ElastiCache](https://aws.amazon.com/elasticache/)
* [Google Cloud Memorystore](https://cloud.google.com/memorystore?hl=en)
* [Azure Cache for Redis](https://azure.microsoft.com/en-us/products/cache)
* Redis deployment hosted in your cloud or on-premises infrastructure

## Object storage

W\&B Server requires an object storage bucket to store artifacts, media, and run data. The object storage must support pre-signed URLs and CORS.

**Recommended storage providers:**

* [Amazon S3](https://aws.amazon.com/s3/): Object storage service offering industry-leading scalability, data availability, security, and performance.
* [Google Cloud Storage](https://cloud.google.com/storage): Managed service for storing unstructured data at scale. * [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs): Cloud-based object storage solution for storing massive amounts of unstructured data. * [CoreWeave AI Object Storage](https://docs.coreweave.com/products/storage/object-storage): High-performance, S3-compatible object storage service optimized for AI workloads. * Enterprise S3-compatible storage: [MinIO Enterprise (AIStor)](https://www.min.io/product/aistor), [NetApp StorageGRID](https://www.netapp.com/data-storage/storagegrid/), or other enterprise-grade solutions MinIO Open Source is in [maintenance mode](https://github.com/minio/minio) with no active development or pre-compiled binaries. For production deployments, W\&B recommends using managed object storage services or enterprise S3-compatible solutions such as MinIO Enterprise (AIStor). For detailed bucket provisioning instructions including IAM policies, CORS configuration, and access setup, see the [Bring Your Own Bucket (BYOB) guide](/platform/hosting/data-security/secure-storage-connector). See the [reference architecture object storage section](/platform/hosting/self-managed/ref-arch/#object-storage) for complete requirements. ### Provision your storage bucket Before configuring W\&B, provision your object storage bucket with proper IAM policies, CORS configuration, and access credentials. 
**See the [Bring Your Own Bucket (BYOB) guide](/platform/hosting/data-security/secure-storage-connector) for detailed step-by-step provisioning instructions for:**

* Amazon S3 (including IAM policies and bucket policies)
* Google Cloud Storage (including Pub/Sub notifications)
* Azure Blob Storage (including managed identities)
* CoreWeave AI Object Storage
* S3-compatible storage (MinIO Enterprise, NetApp StorageGRID, and other enterprise solutions)

### Configure W\&B to use your bucket

After you provision your bucket, configure W\&B to use it through the Operator's Helm values so that W\&B Server can read from and write to the bucket. See the [Operator object storage configuration section](/platform/hosting/self-managed/operator/#object-storage-bucket) for details.

## Networking

Networking configuration exposes W\&B Server to users and machine-learning workloads. The following sections describe egress, DNS, load balancer, and ingress requirements.

For a networked deployment, egress to these endpoints is required during *both* installation and runtime:

* [https://deploy.wandb.ai](https://deploy.wandb.ai)
* [https://charts.wandb.ai](https://charts.wandb.ai)
* [https://quay.io](https://quay.io) (used for Prometheus images)

Additional container registries may be required depending on your deployment configuration:

* `https://gcr.io` is needed when deploying Bufstream and etcd for Weave online evaluations.

To learn about air-gapped deployments, refer to [Kubernetes operator for air-gapped instances](/platform/hosting/self-managed/on-premises-deployments/kubernetes-airgapped).

The training infrastructure, and every system that needs to track experiments, requires access to W\&B and to the object storage.

### DNS

The fully qualified domain name (FQDN) of the W\&B deployment must resolve to the IP address of the ingress or load balancer using an A record.
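One way to sanity-check the DNS requirement before installing is to resolve the FQDN from a machine that will run training jobs and compare the result with the ingress or load balancer address. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the host name and IP address in the usage note are placeholders. Note that `socket.getaddrinfo` uses the system resolver, so it also honors `/etc/hosts` entries and follows CNAME chains. It confirms that the name resolves to the expected IPv4 address, not that the record type is specifically an A record.

```python
import socket


def resolve_ipv4(fqdn: str) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses that `fqdn` resolves to."""
    infos = socket.getaddrinfo(
        fqdn, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (address, port) tuple.
    return sorted({info[4][0] for info in infos})


def points_at(fqdn: str, expected_ip: str) -> bool:
    """Return True if `fqdn` resolves to `expected_ip`, False otherwise."""
    try:
        return expected_ip in resolve_ipv4(fqdn)
    except socket.gaierror:
        # The name did not resolve at all.
        return False
```

For example, `points_at("wandb.example.com", "203.0.113.10")` would return `True` only if the hypothetical host name resolves to that hypothetical load balancer address from the machine where you run the check.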
### Load balancer and ingress

The W\&B Kubernetes Operator exposes services using a Kubernetes ingress controller, which routes to service endpoints based on URL paths. The ingress controller must be accessible by all machines that execute machine-learning workloads or access the service through web browsers.

For detailed load balancer options, ingress controller requirements, and configuration examples, see the [reference architecture load balancer section](/platform/hosting/self-managed/ref-arch/#load-balancer-and-ingress).

## SSL/TLS

W\&B requires a valid signed SSL/TLS certificate for secure communication between clients and the server. SSL/TLS termination must occur on the ingress or load balancer. The W\&B Server application does not terminate SSL or TLS connections.

**Important**: W\&B does not support self-signed certificates or custom CAs. Self-signed certificates cause problems for users.

If possible, use a service like [Let's Encrypt](https://letsencrypt.org) to provide trusted certificates to your load balancer. Services like Caddy and Cloudflare manage SSL for you.

If your security policies require SSL communication within your trusted networks, consider using a tool like Istio and [sidecar containers](https://istio.io/latest/docs/reference/config/networking/sidecar/).

## License

All Self-Managed deployments require a valid W\&B Server license. Without a license, W\&B Server can't start.

1. If you do not already have a W\&B account, create one.
2. If you need an enterprise trial license with support for important security and other enterprise-friendly capabilities, [submit a request](https://wandb.ai/site/for-enterprise/self-hosted-trial) or reach out to your W\&B team.
3. Otherwise, open the [Deploy Manager](https://deploy.wandb.ai/deploy) to generate a free trial license. The URL redirects you to a **Get a License for W\&B Local** form.
Provide the following information: * The owner of the license * The deployment type * A name and optional description for the instance 4. Click **Generate License Key**. A page displays with an overview of your deployment along with the license associated with the instance. ## Next steps After you confirm that your infrastructure meets these requirements, proceed to the deployment guide that matches your environment: * **Cloud and on-premises deployments**: See [Deploy W\&B with Kubernetes Operator](/platform/hosting/self-managed/operator) for Helm and Terraform deployment options. * **Air-gapped deployments**: See [Deploy on Air-Gapped Kubernetes](/platform/hosting/self-managed/on-premises-deployments/kubernetes-airgapped) for disconnected environments. * **All deployment methods**: See [Deploy with Kubernetes Operator](/platform/hosting/self-managed/operator) for the core operator deployment guide. # Update W&B license and version Source: https://docs.wandb.ai/platform/hosting/server-upgrade-process Guide for updating W&B version and license across different installation methods. This guide shows W\&B Server administrators how to update the W\&B Server version and license key for an existing self-managed deployment. Keeping your server and license current ensures access to the latest features, fixes, and continued entitlement to use W\&B Server. Update your W\&B Server version and license with the same method you used to install W\&B Server. The following table lists how to update your license and version based on different deployment methods. 
| Deployment method | Description |
| ----------------- | ----------- |
| [Terraform](#update-with-terraform) | W\&B supports three public Terraform modules for cloud deployment: [AWS](https://registry.terraform.io/modules/wandb/wandb/aws/latest), [Google Cloud](https://registry.terraform.io/modules/wandb/wandb/google/latest), and [Azure](https://registry.terraform.io/modules/wandb/wandb/azurerm/latest). |
| [Helm](#update-with-helm) | Use the [Helm Chart](https://github.com/wandb/helm-charts) to install W\&B into an existing Kubernetes cluster. |

## Update with Terraform

If you deployed W\&B Server with one of the W\&B-maintained Terraform modules, use Terraform to update both your license key and the W\&B version in place.

The following table lists W\&B-managed Terraform modules by cloud platform.

| Cloud provider | Terraform module |
| -------------- | ---------------- |
| AWS | [AWS Terraform module](https://registry.terraform.io/modules/wandb/wandb/aws/latest) |
| Google Cloud | [Google Cloud Terraform module](https://registry.terraform.io/modules/wandb/wandb/google/latest) |
| Azure | [Azure Terraform module](https://registry.terraform.io/modules/wandb/wandb/azurerm/latest) |

1. Navigate to the W\&B-maintained Terraform module for your cloud provider. See the preceding table to find the Terraform module that matches your cloud provider.

2. In your Terraform configuration, update `wandb_version` and `license` in the `wandb_app` module:

   ```hcl theme={null}
   module "wandb_app" {
     source  = "wandb/wandb/[CLOUD-SPECIFIC-MODULE]"
     version = "new_version"

     license       = "new_license_key"   # Your new license key
     wandb_version = "new_wandb_version" # Desired W&B version
     ...
   }
   ```

3. Initialize the working directory, review the planned changes, and apply the Terraform configuration:

   ```bash theme={null}
   terraform init
   terraform plan
   terraform apply
   ```

4. Optional: If you use a `terraform.tfvars` or other `.tfvars` file, update or create one with the new W\&B version and license key, then review the planned changes:

   ```bash theme={null}
   terraform plan -var-file="terraform.tfvars"
   ```

   From your Terraform workspace directory, apply the configuration:

   ```bash theme={null}
   terraform apply -var-file="terraform.tfvars"
   ```

After Terraform applies the change, your deployment runs the specified W\&B version and uses the updated license key.

## Update with Helm

The `wandb` Helm chart is deprecated and no longer supported. It deployed a single pod and has been replaced by the [W\&B Kubernetes Operator](/platform/hosting/self-managed/operator). If you're still using this chart, follow the [migration guide](/platform/hosting/self-managed/operator#migrate-self-managed-instances-to-wb-operator) to move to the operator.

Two Helm-based update paths are available: update from your existing Helm values file, or set the new license and image tag directly on the upgrade command. The following sections describe each approach.

### Update W\&B with spec

Use this approach when you manage your Helm configuration in a tracked `*.yaml` values file.

1. Specify a new version by modifying the `image.tag` or `license` values, or both, in your Helm chart `*.yaml` configuration file:

   ```yaml theme={null}
   license: 'new_license'
   image:
     repository: wandb/local
     tag: 'new_version'
   ```

2. Update the Helm repository and upgrade the W\&B release using your values file. Replace `wandb_install_spec.yaml` with the path to your values file:

   ```bash theme={null}
   helm repo update
   helm upgrade --namespace=wandb --create-namespace \
     --install wandb wandb/wandb --version ${chart_version} \
     -f wandb_install_spec.yaml
   ```

### Update license and version directly

Use this approach to update the license and image tag without editing a values file, and reuse your existing Helm release configuration.

1. Set the new license key and image tag as environment variables:

   ```bash theme={null}
   export LICENSE='new_license'
   export TAG='new_version'
   ```

2. Upgrade your Helm release, merging the new values with the existing configuration:

   ```bash theme={null}
   helm repo update
   helm upgrade --namespace=wandb --create-namespace \
     --install wandb wandb/wandb --version ${chart_version} \
     --reuse-values --set license="$LICENSE" --set image.tag="$TAG"
   ```

For more information, see the [upgrade guide](https://github.com/wandb/helm-charts/blob/main/upgrade) in the public repository.

## Update with admin UI

Use the admin UI to rotate your license key from within the W\&B App, without changing your deployment configuration.

This method only works for updating licenses that aren't set with an environment variable in the W\&B server container, typically in self-managed Docker installations.

This method updates the license only. It doesn't change the running W\&B Server version.

1. Obtain a new license from the [W\&B Deployment Page](https://deploy.wandb.ai/), ensuring it matches the correct organization and deployment ID for the deployment you want to upgrade.
2. Access the **License** page in the W\&B App. Click **Settings** > **License** or browse to `HOST_URL/console/settings/license`, where `HOST_URL` is your W\&B Server host URL.
3. Navigate to the license management section.
4. Enter the new license key and save your changes.
# Manage secrets Source: https://docs.wandb.ai/platform/secrets Overview of W&B secrets, how they work, and how to get started using them. W\&B Secret Manager lets you securely and centrally store, manage, and inject *secrets*, which are sensitive strings such as access tokens, bearer tokens, API keys, or passwords. W\&B features can read team secret values, which removes the need to paste them or store them in code, training scripts, or plain-text automation configuration. This page is for W\&B Admins who need to create, rotate, delete, or manage access to team secrets used by webhook automations, Weave Playground, sandboxes, and LLM evaluation jobs. Secrets live in each team's Secret Manager, in the **Team secrets** section of the [team settings](/platform/app/settings-page/teams/). * Only W\&B Admins can create, edit, or delete a secret. * Secrets are included as a core part of W\&B, including in [W\&B Server deployments](/platform/hosting/) that you host in Azure, Google Cloud, or AWS. Connect with your W\&B account team to discuss how you can use secrets in W\&B if you use a different deployment type. * In W\&B Server, you are responsible for configuring security measures that satisfy your security needs. * W\&B strongly recommends that you store secrets in a W\&B instance of a cloud provider's secrets manager from AWS, Google Cloud, or Azure, which include advanced security capabilities. * W\&B recommends against using a Kubernetes cluster as the backend of your secrets store. Use a cluster only if you can't use a W\&B instance of a cloud secrets manager (AWS, Google Cloud, or Azure) and you understand how to prevent the security vulnerabilities that can occur. ## Where team secrets are used You can use team secrets in W\&B in multiple contexts. After you [add a secret](#add-a-secret), a feature like W\&B Automations can access the secret by name. 
* **Webhook automations**: When an automation sends an HTTP request to a [webhook](/models/automations/create-automations/webhook/), you can attach team secrets for authentication headers and for values referenced in the payload. You can scope automations to a [project](/models/automations/automation-events/#project) or a [Registry](/models/automations/automation-events/#registry). Registry-scoped automations that call a webhook use the same team webhooks and team secrets as project-scoped webhook automations. * **Weave Playground**: Supply provider credentials as named team secrets. See [Add provider credentials and information](/weave/guides/tools/playground#add-provider-credentials-and-information). * **Sandboxes**: Securely provide team secrets to your sandboxes to make them available as environment variables. See [Secrets in sandboxes](/sandboxes/secrets). * **LLM evaluation jobs**: Some benchmarks need API keys or tokens stored as team secrets. See the [Evaluation benchmark catalog](/models/launch/evaluations). ## Add a secret Add a secret when you want to make a sensitive value available to W\&B features without exposing it in code or configuration. After you complete these steps, the secret is available by name to the team features described in [Where team secrets are used](#where-team-secrets-are-used). To add a secret: 1. If an external service gives you a token or API key, obtain that value through that service's normal flow. If necessary, save the sensitive string securely, such as in a password manager, before you paste it into W\&B Secret Manager. Saving a backup matters because, after creation, W\&B no longer reveals the secret's value. 2. Sign in to W\&B and go to the team's **Settings** page. 3. In the **Team Secrets** section, click **New secret**. 4. Provide a name for the secret, using letters, numbers, and underscores (`_`). 5. Paste the sensitive string into the **Secret** field. 6. Click **Add secret**. 
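Because a secret is later referenced by name (for example, as `${SECRET_NAME}` in a webhook automation's payload), it can help to check candidate names in scripts or runbooks before you type them into the form. The following is a minimal sketch under stated assumptions. Both helper functions are hypothetical and are not part of any W\&B API. The check covers only the rule stated in step 4 (ASCII letters, numbers, and underscores). W\&B may enforce additional constraints, such as length limits, that this sketch doesn't model.

```python
import re

# Assumption: a valid name is one or more ASCII letters, digits, or underscores.
_SECRET_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_secret_name(name: str) -> bool:
    """Return True if `name` uses only letters, numbers, and underscores."""
    return _SECRET_NAME_PATTERN.fullmatch(name) is not None


def payload_reference(name: str) -> str:
    """Return the `${SECRET_NAME}` form used to reference a secret in a payload."""
    if not is_valid_secret_name(name):
        raise ValueError(f"Invalid secret name: {name!r}")
    return "${" + name + "}"
```

For example, `payload_reference("SLACK_TOKEN")` produces the string `${SLACK_TOKEN}`, while a name such as `slack-token` is rejected because it contains a hyphen.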
When you configure a webhook for an automation, select which team secrets the webhook may use. For field names, access tokens, and payload variables, see [Create a webhook automation](/models/automations/create-automations/webhook). After you create a secret, you can access that secret in a [webhook automation's payload](/models/automations/create-automations/webhook/) using the format `${SECRET_NAME}`. ## Rotate a secret Rotate a secret when its value changes (for example, when an upstream credential is regenerated or when you suspect the existing value has been compromised). Because W\&B doesn't reveal a secret's current value after creation, rotation is also the way to replace a value you no longer have a copy of. To rotate a secret and update its value: 1. Click the pencil icon in the secret's row to open the secret's details. 2. Set **Secret** to the new value. Optionally, click **Reveal secret** to verify the new value. 3. Click **Add secret**. The secret's value updates and no longer resolves to the previous value. After a secret is created or updated, you can no longer reveal its current value. Instead, rotate the secret to a new value. Rotating or replacing a secret can affect every feature that still expects the old value. Before you rely on the new value everywhere, update [webhook automations](/models/automations/create-automations/webhook), [sandboxes](/sandboxes/secrets) that inject the secret, [evaluation jobs](/models/launch/evaluations), [Weave Playground](/weave/guides/tools/playground#add-provider-credentials-and-information), or other consumers. ## Delete a secret Delete a secret when no team feature uses it. Because deletion is immediate and permanent, confirm that no automations, sandboxes, or other consumers still reference the secret before you proceed (see [Manage access to secrets](#manage-access-to-secrets)). To delete a secret: 1. Click the trash icon in the secret's row. 2. Read the confirmation dialog, then click **Delete**. 
The secret is deleted immediately and permanently. ## Manage access to secrets You can reference a team's secrets by name in [webhook automations](/models/automations/create-automations/webhook), [Weave Playground](/weave/guides/tools/playground#add-provider-credentials-and-information), [sandboxes](/sandboxes/secrets), [LLM evaluation jobs](/models/launch/evaluations), and other team-scoped features that select secrets by name. Before you remove a secret, update or remove every automation, job, sandbox configuration, or Playground flow that uses it so they don't stop working. # W&B Pricing Source: https://docs.wandb.ai/pricing # Serverless Inference Source: https://docs.wandb.ai/product-inference # W&B Models Source: https://docs.wandb.ai/product-models # Serverless Sandboxes Source: https://docs.wandb.ai/product-sandboxes # Serverless Training Source: https://docs.wandb.ai/product-serverless-training # W&B Weave Source: https://docs.wandb.ai/product-weave # W&B Platform Security Source: https://docs.wandb.ai/security # Console logs Source: https://docs.wandb.ai/models/app/console-logs View and debug console log messages including info, warnings, and errors from your W&B experiment runs. When you run an experiment, you might notice messages printed to your console. W\&B captures console logs and displays them in the W\&B App. Use these messages to debug and monitor the behavior of your experiment. The following sections describe how to view, configure, search, filter, download, and copy console logs for your runs. ## View console logs Access console logs for a run in the W\&B App to inspect messages produced during the run. 1. Navigate to your project in the W\&B App. 2. Select a run within the **Runs** table. 3. Click the **Logs** tab in the project sidebar. W\&B stores a maximum of 100,000 lines of your logs for a run. In the W\&B App, a maximum of 10,000 lines of your logs display at once. 
To view all stored lines, scroll through the logs to display older lines.

## Types of console logs

W\&B captures three types of console logs and adds a prefix to indicate each log's severity. The prefix helps you scan logs and identify the messages most relevant to debugging.

The following table summarizes each type, ordered from most to least severe.

| Severity | Prefix | Description | Example |
| -------- | --------- | ----------------------------------------------------------------------- | --------------------------------------------------------------- |
| Error | `ERROR` | Serious issues that might prevent the run from completing successfully. | `ERROR Failed to save notebook.` |
| Warning | `WARNING` | Potential issues that don't stop execution. | `WARNING Found .wandb file, not streaming tensorboard metrics.` |
| Info | `wandb:` | Updates about the run's progress and status. | `wandb: Starting Run: abc123` |

## Console log settings

To control which types of console output W\&B captures and displays, pass a `wandb.Settings` object to `wandb.init()` when you initialize a run. The relevant parameters are `show_errors`, `show_warnings`, `show_info`, and `silent`. For details on each parameter and its default value, see the [`wandb.Settings` reference](/models/ref/python/experiments/settings).

The following example shows how to configure these settings:

```python theme={null}
import wandb

settings = wandb.Settings(
    show_errors=True,    # Show error messages
    silent=False,        # Set to True to disable all W&B console output
    show_warnings=True,  # Show warning messages
)

with wandb.init(settings=settings) as run:
    # Your training code here
    run.log({"accuracy": 0.95})
```

## Custom logging

If you already have your own logging setup, you can continue to use it alongside W\&B. W\&B captures console logs from your application, but it doesn't interfere with your own logging setup. You can use Python's built-in `print()` function or the `logging` module to log messages.

```python theme={null}
import wandb

with wandb.init(project="my-project") as run:
    for i in range(100, 1000, 100):
        # Log metrics to W&B
        run.log({"epoch": i, "loss": 0.1 * i})
        # Print to the console; W&B captures this output
        print(f"epoch: {i} loss: {0.1 * i}")
```

The console logs look similar to the following. The loss values shown are illustrative and differ from the values that the `0.1 * i` placeholder produces:

```text theme={null}
1 epoch: 100 loss: 1.3191105127334595
2 epoch: 200 loss: 0.8664389848709106
3 epoch: 300 loss: 0.6157898902893066
4 epoch: 400 loss: 0.4961796700954437
5 epoch: 500 loss: 0.42592573165893555
6 epoch: 600 loss: 0.3771176040172577
7 epoch: 700 loss: 0.3393910825252533
8 epoch: 800 loss: 0.3082585036754608
9 epoch: 900 loss: 0.28154927492141724
```

## Timestamps

W\&B automatically adds timestamps to each console log entry. This lets you track when each log message was generated.

To show or hide timestamps in the console logs, select the **Timestamp visible** drop-down list on the console logs page.

## Search console logs

To quickly locate relevant entries, use the search bar on the console logs page to filter logs by keywords. You can search for specific terms, labels, or error messages.

## Filter with custom labels

Parameters prefixed by `x_` (such as `x_label`) are in public preview. Create a [GitHub issue in the W\&B repository](https://github.com/wandb/wandb) to provide feedback.

You can filter console logs based on the labels you pass as arguments for `x_label` in `wandb.Settings`. Enter the label in the search bar on the console logs page.

```python theme={null}
import wandb

# Initialize a run in the primary node
with wandb.init(
    entity="[ENTITY-NAME]",
    project="[PROJECT-NAME]",
    settings=wandb.Settings(
        x_label="[CUSTOM-LABEL]"  # (Optional) Custom label for filtering logs
    ),
) as run:
    # Your code here
    pass
```

## Download console logs

To save logs locally for offline analysis or sharing, download console logs for a run in the W\&B App:

1. Navigate to your project in the W\&B App.
2. Select a run within the **Runs** table.
3. Click the **Logs** tab in the project sidebar.
4. Click the download button on the console logs page.

## Copy console logs

To paste logs into another tool or message, copy console logs for a run in the W\&B App:

1. Navigate to your project in the W\&B App.
2. Select a run within the **Runs** table.
3. Click the **Logs** tab in the project sidebar.
4. Click the copy button on the console logs page.

# Manage workspace, section, and panel settings
Source: https://docs.wandb.ai/models/app/features/cascade-settings

Manage workspace, section, and panel settings in the W&B App, including layout and line plot configuration options.

A workspace page has three setting levels: workspaces, sections, and panels. [Workspace settings](#workspace-settings) apply to the entire workspace. [Section settings](#section-settings) apply to all panels within a section. [Panel settings](#panel-settings) apply to individual panels.

This page describes how to configure each level of settings so that you can control your workspace's layout, organize panels, and customize how data is displayed. The following sections describe each setting level in turn.

## Workspace settings

Workspace settings apply to all sections and all panels within those sections. You can edit two types of workspace settings: [Workspace layout](#workspace-layout-options) and [Line plots](#line-plots-options). **Workspace layout** settings determine the structure of the workspace, while **Line plots** settings control the default settings for line plots in the workspace.

To edit settings that apply to the overall structure of this workspace:

1. Navigate to your project workspace.
2. Click the gear icon next to the **New report** button to view the workspace settings.
3. Choose **Workspace layout** to change the workspace's layout, or choose **Line plots** to configure default settings for line plots in the workspace.
Workspace settings gear icon The workspace settings panel opens, where you can apply the changes described in the following sections. After customizing your workspace, you can use *workspace templates* to create new workspaces with the same settings. Refer to [Workspace templates](/models/track/workspaces/#workspace-templates). ### Workspace layout options Configure a workspace's layout to define the overall structure of the workspace. This includes sectioning logic and panel organization. Workspace layout options The workspace layout options page shows whether the workspace generates panels automatically or manually. To adjust a workspace's panel generation mode, refer to [Panels](/models/app/features/panels/). This table describes each workspace layout option. | Workspace setting | Description | | ------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | | **Hide empty sections during search** | Hide sections that don't contain any panels when searching for a panel. | | **Sort panels alphabetically** | Sort panels in your workspaces alphabetically. | | **Section organization** | Remove all existing sections and panels and repopulate them with new section names. Groups the newly populated sections either by first or last prefix. | W\&B suggests that you organize sections by grouping the first prefix rather than grouping by the last prefix. Grouping by the first prefix can result in fewer sections and better performance. ### Line plots options Set global defaults and custom rules for line plots in a workspace by modifying the **Line plots** workspace settings. Line plot settings You can edit two main settings within **Line plots** settings: **Data** and **Display preferences**. 
The **Data** tab contains the following settings: | Line plot setting | Description | | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | | **X axis** | The scale of the x-axis in line plots. The x-axis is set to **Step** by default. See the following table for the list of x-axis options. | | **Range** | Minimum and maximum settings to display for the x-axis. | | **Smoothing** | Change the smoothing on the line plot. For more information about smoothing, see [Smooth line plots](/models/app/features/panels/line-plot/smoothing/). | | **Outliers** | Rescale to exclude outliers from the default plot min and max scale. | | **Point aggregation method** | Improve data visualization accuracy and performance. For more information, see [Point aggregation](/models/app/features/panels/line-plot/sampling/). | | **Max number of runs or groups** | Limit the number of runs or groups displayed on the line plot. | Besides **Step**, the x-axis supports the following options: | X axis option | Description | | --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **Relative Time (Wall)** | Timestamp since the process starts. For example, suppose you start a run and resume that run the next day. If you then log something, the recorded point is 24 hours. | | **Relative Time (Process)** | Timestamp inside the running process. For example, suppose you start a run and let it continue for 10 seconds. The next day you resume that run. The point is recorded as 10 seconds. | | **Wall Time** | Minutes elapsed since the start of the first run on the graph. | | **Step** | Increments each time you call `wandb.Run.log()`. 
| For information about how to edit an individual line plot, see [Edit line panel settings](/models/app/features/panels/line-plot/#edit-line-panel-settings) in Line plots. Within the **Display preferences** tab, you can toggle the following settings: | Display preference | Description | | -------------------------------------------------------- | ------------------------------------------------------ | | **Remove legends from all panels** | Remove the panel's legend. | | **Display colored run names in tooltips** | Show the runs as colored text within the tooltip. | | **Only show highlighted run in companion chart tooltip** | Display only highlighted runs in chart tooltip. | | **Number of runs shown in tooltips** | Display the number of runs in the tooltip. | | **Display full run names on the primary chart tooltip** | Display the full name of the run in the chart tooltip. | ## Section settings Section settings apply to all panels within that section. Within a workspace section you can sort panels, rearrange panels, and rename the section name. To modify section settings, select the **action ()** menu in the upper right corner of a section. Section settings menu From the dropdown, you can edit the following settings that apply to the entire section: | Section setting | Description | | -------------------- | ----------------------------------------------------------------------- | | **Rename a section** | Rename the name of the section. | | **Sort panels A-Z** | Sort panels within a section alphabetically. | | **Rearrange panels** | Select and drag a panel within a section to manually order your panels. | The following animation demonstrates how to rearrange panels within a section: Rearranging panels Besides the settings described in the preceding table, you can also edit how sections appear in your workspaces such as **Add section below**, **Add section above**, **Delete section**, and **Add section to report**. 
## Panel settings Customize an individual panel's settings to compare multiple lines on the same plot, calculate custom axes, rename labels, and more. To edit a panel's settings: 1. Hover over the panel you want to edit. 2. Select the pencil icon that appears. Panel edit icon 3. Within the modal that appears, you can edit settings related to the panel's data, display preferences, and more. Panel settings modal Your changes apply to that panel only and take effect when you save them. For a complete list of settings you can apply to a panel, see [Edit line panel settings](/models/app/features/panels/line-plot/#edit-line-panel-settings). # Custom charts overview Source: https://docs.wandb.ai/models/app/features/custom-charts Create custom charts in W&B projects with Vega visualizations Custom charts let you visualize logged data exactly how you want, beyond the defaults W\&B provides. Use them when you need to plot relationships, distributions, or model evaluation metrics that the built-in panels don't cover, such as precision-recall curves, custom histograms, or overlays of multiple experiments. Create custom charts in your W\&B project. Log arbitrary tables of data and visualize them exactly how you want. Control details of fonts, colors, and tooltips with [Vega](https://vega.github.io/vega/). * Code: Try an example [Colab notebook](https://tiny.cc/custom-charts). * Video: Watch a [walkthrough video](https://www.youtube.com/watch?v=3-N9OV6bkSM). * Example: Quick Keras and Sklearn [demo notebook](https://colab.research.google.com/drive/1g-gNGokPWM2Qbc8p1Gofud0_5AoZdoSD?usp=sharing). Supported charts from vega.github.io/vega ## How it works 1. **Log data**: From your script, log [config](/models/track/config/) and summary data. 2. **Customize the chart**: Pull in logged data with a [GraphQL](https://graphql.org) query. Visualize the results of your query with [Vega](https://vega.github.io/vega/), a visualization grammar. 3. 
**Log the chart**: Call your own preset from your script with `wandb.plot_table()`. PR and ROC curves If you don't see the expected data, the column you're looking for might not be logged in the selected runs. Save your chart, return to the runs table, and verify the selected runs using the **eye** icon. ## Log charts from a script The following sections describe two ways to log charts directly from your training script: built-in chart presets that cover common visualizations, and custom presets that let you reuse your own Vega specifications. ### Built-in presets W\&B has several built-in chart presets that you can log directly from your script. These include line plots, scatter plots, bar charts, histograms, PR curves, and ROC curves.

`wandb.plot.line()` Log a custom line plot, a list of connected and ordered points (x, y) on arbitrary axes x and y.

```python theme={null}
import wandb

# x_values and y_values are placeholders for two equally long lists
with wandb.init() as run:
    data = [[x, y] for (x, y) in zip(x_values, y_values)]
    table = wandb.Table(data=data, columns=["x", "y"])
    run.log(
        {
            "my_custom_plot_id": wandb.plot.line(
                table, "x", "y", title="Custom Y vs X Line Plot"
            )
        }
    )
```

A line plot logs curves on any two dimensions. If you plot two lists of values against each other, the number of values in the lists must match exactly (that is, each point must have both an x and a y). Custom line plot [See an example report](https://wandb.ai/wandb/plots/reports/Custom-Line-Plots--VmlldzoyNjk5NTA) or [try an example Google Colab notebook](https://tiny.cc/custom-charts).

`wandb.plot.scatter()` Log a custom scatter plot, a list of points (x, y) on a pair of arbitrary axes x and y.

```python theme={null}
import wandb

# The two score lists are placeholders for equally long lists of values
with wandb.init() as run:
    data = [[x, y] for (x, y) in zip(class_x_prediction_scores, class_y_prediction_scores)]
    table = wandb.Table(data=data, columns=["class_x", "class_y"])
    run.log({"my_custom_id": wandb.plot.scatter(table, "class_x", "class_y")})
```

You can use this to log scatter points on any two dimensions.
Note that if you're plotting two lists of values against each other, the number of values in the lists must match exactly (that is, each point must have both an x and a y). Scatter plot [See an example report](https://wandb.ai/wandb/plots/reports/Custom-Scatter-Plots--VmlldzoyNjk5NDQ) or [try an example Google Colab notebook](https://tiny.cc/custom-charts).

`wandb.plot.bar()` Log a custom bar chart (a list of labeled values as bars):

```python theme={null}
import wandb

# labels and values are placeholders for two equally long lists
with wandb.init() as run:
    data = [[label, val] for (label, val) in zip(labels, values)]
    table = wandb.Table(data=data, columns=["label", "value"])
    run.log(
        {
            "my_bar_chart_id": wandb.plot.bar(
                table, "label", "value", title="Custom Bar Chart"
            )
        }
    )
```

You can use this to log arbitrary bar charts. Note that the number of labels and values in the lists must match exactly (that is, each data point must have both a label and a value). Demo bar plot [See an example report](https://wandb.ai/wandb/plots/reports/Custom-Bar-Charts--VmlldzoyNzExNzk) or [try an example Google Colab notebook](https://tiny.cc/custom-charts).

`wandb.plot.histogram()` Log a custom histogram (sort a list of values into bins by count or frequency of occurrence). For example, suppose you have a list of prediction confidence scores (`scores`) and want to visualize their distribution:

```python theme={null}
import wandb

# scores is a placeholder for a flat list of numeric values
with wandb.init() as run:
    data = [[s] for s in scores]
    table = wandb.Table(data=data, columns=["scores"])
    run.log({"my_histogram": wandb.plot.histogram(table, "scores", title=None)})
```

You can use this to log arbitrary histograms. Note that `data` is a list of lists, intended to support a 2D array of rows and columns. Custom histogram [See an example report](https://wandb.ai/wandb/plots/reports/Custom-Histograms--VmlldzoyNzE0NzM) or [try an example Google Colab notebook](https://tiny.cc/custom-charts).
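The presets above all take a `wandb.Table` built from rows, and paired lists must have the same length. The following sketch shows one way to check for mismatched lists before logging. `to_rows` and `log_histogram` are hypothetical helper names, not part of the W\&B SDK, and the project name is a placeholder.

```python
def to_rows(*columns):
    """Zip equally long column lists into table rows (a list of lists)."""
    lengths = {len(column) for column in columns}
    if len(lengths) > 1:
        raise ValueError(f"Column lengths differ: {sorted(lengths)}")
    return [list(row) for row in zip(*columns)]


def log_histogram(scores, project="custom-charts-demo"):
    """Log a histogram of scores. Requires the wandb package and a W&B login."""
    import wandb  # imported lazily so to_rows() works without wandb installed

    with wandb.init(project=project) as run:
        # A single column still needs one list per row: [[s1], [s2], ...]
        table = wandb.Table(data=to_rows(scores), columns=["scores"])
        run.log({"my_histogram": wandb.plot.histogram(table, "scores")})
```

Call `to_rows(x_values, y_values)` for two-column presets such as line or scatter plots. A length mismatch raises `ValueError` before anything is logged.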
`wandb.plot.pr_curve()` Create a [Precision-Recall curve](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_curve.html#sklearn.metrics.precision_recall_curve) in one line:

```python theme={null}
import wandb

# ground_truth and predictions are placeholders for your labels and scores
with wandb.init() as run:
    plot = wandb.plot.pr_curve(ground_truth, predictions, labels=None, classes_to_plot=None)
    run.log({"pr": plot})
```

You can log this whenever your code has access to:

* A model's predicted scores (`predictions`) on a set of examples.
* The corresponding ground truth labels (`ground_truth`) for those examples.
* `labels` (Optional): A list of the labels or class names (`labels=["cat", "dog", "bird"...]` if label index 0 means cat, 1 = dog, 2 = bird, and so on).
* `classes_to_plot` (Optional): A subset (still in list format) of the labels to visualize in the plot.

Precision-recall curves [See an example report](https://wandb.ai/wandb/plots/reports/Plot-Precision-Recall-Curves--VmlldzoyNjk1ODY) or [try an example Google Colab notebook](https://colab.research.google.com/drive/1mS8ogA3LcZWOXchfJoMrboW3opY1A8BY?usp=sharing).

`wandb.plot.roc_curve()` Create an [ROC curve](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve) in one line:

```python theme={null}
import wandb

with wandb.init() as run:
    # ground_truth is a list of true labels
    ground_truth = [0, 1, 0, 1, 0, 1]
    # predictions holds one row of per-class scores for each example
    # (shape: number of examples x number of classes)
    predictions = [[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8], [0.3, 0.7], [0.1, 0.9]]

    # Create the ROC curve plot
    # labels is an optional list of class names, classes_to_plot is an optional subset of those labels to visualize
    plot = wandb.plot.roc_curve(
        ground_truth, predictions, labels=None, classes_to_plot=None
    )

    run.log({"roc": plot})
```

You can log this whenever your code has access to:

* A model's predicted scores (`predictions`) on a set of examples.
* The corresponding ground truth labels (`ground_truth`) for those examples.
* `labels` (Optional): A list of the labels or class names (`labels=["cat", "dog", "bird"...]` if label index 0 means cat, 1 = dog, 2 = bird, and so on).
* `classes_to_plot` (Optional): A subset (still in list format) of these labels to visualize on the plot.

ROC curve [See an example report](https://wandb.ai/wandb/plots/reports/Plot-ROC-Curves--VmlldzoyNjk3MDE) or [try an example Google Colab notebook](https://colab.research.google.com/drive/1_RMppCqsA8XInV_jhJz32NCZG6Z5t1RO?usp=sharing). ### Custom presets Tweak a built-in preset, or create a new preset, then save the chart. Use the chart ID to log data to that custom preset directly from your script. [Try an example Google Colab notebook](https://tiny.cc/custom-charts).

```python theme={null}
import wandb

with wandb.init() as run:
    # Create a table with the columns to plot
    # data is a placeholder for your rows of [step, height] values
    table = wandb.Table(data=data, columns=["step", "height"])

    # Map from the table's columns to the chart's fields
    fields = {"x": "step", "value": "height"}

    # Use the table to populate the new custom chart preset
    # To use your own saved chart preset, change the vega_spec_name
    my_custom_chart = wandb.plot_table(
        vega_spec_name="carey/new_chart",
        data_table=table,
        fields=fields,
    )

    # Log the chart so that it appears in the run
    run.log({"custom_chart": my_custom_chart})
```

Custom chart presets ## Log data Before you can visualize anything in a custom chart, your script must log the underlying data in a format the chart editor can query. You can log the following data types from your script and use them in a custom chart:

* **Config**: Initial settings of your experiment (your independent variables). This includes any named fields you've logged as keys to `wandb.Run.config` at the start of your training. For example, `wandb.Run.config.learning_rate = 0.0001`.
* **Summary**: Single values logged during training (your results or dependent variables). For example, `wandb.Run.log({"val_acc" : 0.8})`. If you write to this key multiple times during training through `wandb.Run.log()`, the summary is set to the final value of that key.
* **History**: The full time series of the logged scalar is available to the query through the `history` field.
* **summaryTable**: If you need to log a list of multiple values, use a `wandb.Table()` to save that data, then query it in your custom panel.
* **historyTable**: If you need to see the history data, then query `historyTable` in your custom chart panel. Each time you log a `wandb.Table()` or a custom chart, you create a new table in history for that step.

### Log a custom table Use `wandb.Table()` to log your data as a 2D array. Typically, each row of this table represents one data point, and each column denotes the relevant fields or dimensions for each data point that you want to plot. As you configure a custom panel, the whole table is accessible through the named key passed to `wandb.Run.log()` (`custom_data_table` in the following example). The individual fields are accessible through the column names (`x`, `y`, and `z`). You can log tables at multiple time steps throughout your experiment. The maximum size of each table is 10,000 rows. [Try an example Google Colab notebook](https://tiny.cc/custom-charts).

```python theme={null}
import wandb

with wandb.init() as run:
    # Logging a custom table of data
    # x1, y1, z1, and so on are placeholders for your own values
    my_custom_data = [[x1, y1, z1], [x2, y2, z2]]
    run.log(
        {"custom_data_table": wandb.Table(data=my_custom_data, columns=["x", "y", "z"])}
    )
```

## Customize the chart After you log data, build a chart in the W\&B App by choosing which logged values to pull in and how to render them. Add a new custom chart to get started, then edit the query to select data from your visible runs. The query uses [GraphQL](https://graphql.org) to fetch data from the config, summary, and history fields in your runs. ### Build the GraphQL query The custom chart editor runs a GraphQL query over the runs you selected in the project workspace or report. In the query editor, add the fields you need.
You can pick from `config`, `summary`, `history`, `summaryTable`, and `historyTable` so you don't need to write the query from scratch for most cases. Each source in the query maps to different logged data: * **Config** pulls [run configuration](/models/track/config/) values (hyperparameters and other settings). * **Summary** pulls [summary](/models/track/log/) values. By default, the summary for a key logged with `wandb.Run.log()` holds the last value written for that key. To use a different aggregate, call `wandb.Run.define_metric(..., summary=...)` with `"min"`, `"max"`, `"mean"`, `"best"`, or `"none"`. To set a value directly, assign `wandb.Run.summary["key"] = value`. * **History** pulls scalar time series from run history (for example, `loss` or `accuracy` at each step). Use **history** when you need the full curve, not only the final number. * **`summaryTable`** loads a [`wandb.Table`](/models/ref/python/data-types/table) from the run summary. Use it when the table you care about is stored as a single snapshot on the run (for example, one confusion matrix you log once at the end). * **`historyTable`** loads a [`wandb.Table`](/models/ref/python/data-types/table) from run history. Each time you log a table with `wandb.Run.log()`, you add another step to run history that includes that table. Use **`historyTable`** when the table changes over time or when you want to enable the step selector in the custom chart editor (see [How do you show a step slider in a custom chart?](/support/models/articles/how-do-you-show-a-step-slider-in-a-custo)). For **`summaryTable`** and **`historyTable`**, set **`tableKey`** to the dictionary key you used inside `wandb.Run.log()`, not to a column name inside the [`wandb.Table`](/models/ref/python/data-types/table). 
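To make the distinction between the logged key and the table's columns concrete, the following sketch logs one table per step under a single key. The key (`custom_data_table`, as in the earlier example) is what you would enter as **`tableKey`**, while the column names `x`, `y`, and `z` are what you would expect to map in **Chart fields**. `check_rows`, `log_snapshots`, and the project name are hypothetical names used only for illustration.

```python
LOG_KEY = "custom_data_table"  # dictionary key passed to run.log(); use this as tableKey
COLUMNS = ["x", "y", "z"]      # column names inside the table; these are not the tableKey


def check_rows(rows, columns=COLUMNS):
    """Return rows unchanged after confirming each row has one value per column."""
    for index, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(
                f"Row {index} has {len(row)} values, expected {len(columns)}"
            )
    return rows


def log_snapshots(snapshots, project="custom-charts-demo"):
    """Log one table per snapshot so that each lands on its own step in run history."""
    import wandb  # imported lazily so check_rows() works without wandb installed

    with wandb.init(project=project) as run:
        for rows in snapshots:
            table = wandb.Table(data=check_rows(rows), columns=COLUMNS)
            run.log({LOG_KEY: table})
```

Because each `run.log()` call adds a step, a run logged this way is a natural fit for `historyTable`.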
The following examples cover common cases: * Plot columns from a table you log at each step (for example, a PR curve): add **`historyTable`**, set **`tableKey`** to your logged key (for example, `pr_curve`), then map table columns in **Chart fields**. See the [custom charts tutorial](/models/app/features/custom-charts/walkthrough). * Plot columns from a table that lives in a summary (for example, class scores for a composite histogram): add **`summaryTable`**, set **`tableKey`** to that key (the tutorial uses `class_scores`). See [Bonus: composite histograms](/models/app/features/custom-charts/walkthrough#bonus-composite-histograms). * Plot a scalar metric over training steps: add the metric from **history**. If you only add it from **summary**, the chart shows a single value per run. **Chart field names** After the query runs, **Chart fields** lists columns you can bind into the Vega spec. Names often begin with `runSets_` and reflect the selected query fields. Choose them from the dropdowns next to each `${field:...}` placeholder instead of typing them manually. If a column never appears, confirm the key exists on the selected runs, open the run page to verify how the data was logged, and check whether **`summaryTable`** or **`historyTable`** matches that logging pattern. Custom charts use this GraphQL-based panel query. [Query panels](/models/app/features/panels/query-panels) use a different expression language and are documented separately. Custom chart creation ### Custom visualizations Select a **Chart** in the upper right corner to start with a default preset. Next, select **Chart fields** to map the data you're pulling in from the query to the corresponding fields in your chart. The following image shows how to select a metric and then map it into the bar chart fields. Creating a custom bar chart ### Edit Vega Click **Edit** at the top of the panel to go into [Vega](https://vega.github.io/vega/) edit mode. 
Here you can define a [Vega specification](https://vega.github.io/vega/docs/specification/) that creates an interactive chart in the UI. You can change any aspect of the chart. For example, you can change the title, pick a different color scheme, or show curves as a series of points instead of as connected lines. You can also make changes to the data itself, such as using a Vega transform to bin an array of values into a histogram. The panel preview updates interactively, so you can see the effect of your changes as you edit the Vega spec or query. For more information, see the [Vega documentation and tutorials](https://vega.github.io/vega/). **Field references** To pull data into your chart from W\&B, add template strings of the form `"${field:<field-name>}"` anywhere in your Vega spec. This creates a dropdown in the **Chart fields** area on the right side, which you can use to select a query result column to map into Vega. To set a default value for a field, use this syntax: `"${field:<field-name>:<placeholder text>}"`. ### Save chart presets Save a preset to reuse the same Vega definition across panels and projects instead of recreating it each time. Apply changes to a specific visualization panel with the button at the bottom of the modal. Alternatively, you can save the Vega spec to reuse it elsewhere in your project. To save the reusable chart definition, click **Save as** at the top of the Vega editor and give your preset a name. ## Reports and guides The following reports show end-to-end examples and deeper explorations of custom charts in practice.
* [The W\&B Machine Learning Visualization IDE](https://wandb.ai/wandb/posts/reports/The-W-B-Machine-Learning-Visualization-IDE--VmlldzoyNjk3Nzg) * [Visualizing NLP Attention Based Models](https://wandb.ai/kylegoyette/gradientsandtranslation2/reports/Visualizing-NLP-Attention-Based-Models-Using-Custom-Charts--VmlldzoyNjg2MjM) * [Visualizing The Effect of Attention on Gradient Flow](https://wandb.ai/kylegoyette/gradientsandtranslation/reports/Visualizing-The-Effect-of-Attention-on-Gradient-Flow-Using-Custom-Charts--VmlldzoyNjg1NDg) * [Logging arbitrary curves](https://wandb.ai/stacey/presets/reports/Logging-Arbitrary-Curves--VmlldzoyNzQyMzA) ## Common use cases Custom charts are useful when the default panels can't represent what you need to show. For example: * Customizing bar plots with error bars * Showing model validation metrics that require custom x-y coordinates (like precision-recall curves) * Overlaying data distributions from two different models or experiments as histograms * Showing changes in a metric through snapshots at multiple points during training * Creating a unique visualization not yet available in W\&B and sharing it with others # Tutorial: Use custom charts Source: https://docs.wandb.ai/models/app/features/custom-charts/walkthrough Tutorial of using the custom charts feature in the W&B UI Custom charts let you control both the data loaded into a panel and how it's visualized. This tutorial walks you through logging data, building a query, customizing the chart's Vega specification, and saving the result for reuse. It's intended for users who want to go beyond the default chart types and tailor visualizations to specific data. By the end of this tutorial, you'll have a working custom chart in your project that you can save as a preset and a bonus composite histogram you can adapt to your own data. ## Log data to W\&B Before you can visualize anything in a custom chart, your run needs to log the data you want to display. 
Use [wandb.Run.config](/models/track/config/) for single points set at the beginning of training, like hyperparameters. Use [wandb.Run.log()](/models/track/log/) for multiple points over time, and log custom 2D arrays with `wandb.Table()`. We recommend logging no more than 10,000 data points per logged key.

```python theme={null}
import wandb

with wandb.init() as run:
    # Logging a custom table of data
    # x1, y1, z1, and so on are placeholders for your own values
    my_custom_data = [[x1, y1, z1], [x2, y2, z2]]
    run.log(
        {"custom_data_table": wandb.Table(data=my_custom_data, columns=["x", "y", "z"])}
    )
```

[Try a quick example notebook](https://bit.ly/custom-charts-colab) to log the data tables. In the next step, you set up custom charts. See what the resulting charts look like in the [live report](https://app.wandb.ai/demo-team/custom-charts/reports/Custom-Charts--VmlldzoyMTk5MDc). With your data logged, you're ready to pull it into a custom chart panel. ## Create a query A query tells the custom chart which logged data to load. After you've logged data to visualize, go to your project page and click the **`+`** button to add a new panel, then select **Custom Chart**. You can follow along in the [custom charts demo workspace](https://app.wandb.ai/demo-team/custom-charts). Blank custom chart ### Add a query 1. Click `summary` and select `historyTable` to set up a new query pulling data from the run history. 2. Type in the key where you logged the `wandb.Table()`. In the previous code snippet, it was `custom_data_table`. In the [example notebook](https://bit.ly/custom-charts-colab), the keys are `pr_curve` and `roc_curve`. For more information about **`summaryTable`**, **`historyTable`**, and **`tableKey`**, see [Build the GraphQL query](/models/app/features/custom-charts#build-the-graphql-query).
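For context, here is a rough sketch of how a table behind a key such as `pr_curve` might be produced by hand. It computes precision and recall at a few thresholds for one class and logs the rows under `pr_curve`. The column names `r`, `p`, and `c` are an assumption inferred from the field names this tutorial maps when setting Vega fields. The thresholds, the `pr_rows` and `log_pr_table` helpers, and the project name are hypothetical. In practice, `wandb.plot.pr_curve()` builds the curve for you.

```python
def pr_rows(y_true, y_score, class_label, thresholds=(0.25, 0.5, 0.75)):
    """Return [recall, precision, class_label] rows, one per threshold.

    y_true holds 0/1 labels and y_score holds scores for the positive class.
    """
    rows = []
    for threshold in thresholds:
        predicted = [score >= threshold for score in y_score]
        tp = sum(1 for y, p in zip(y_true, predicted) if p and y == 1)
        fp = sum(1 for y, p in zip(y_true, predicted) if p and y == 0)
        fn = sum(1 for y, p in zip(y_true, predicted) if not p and y == 1)
        # Convention chosen for this sketch: precision is 1.0 when nothing is predicted positive
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        rows.append([recall, precision, class_label])
    return rows


def log_pr_table(y_true, y_score, class_label, project="custom-charts-demo"):
    """Log the rows as a table under the pr_curve key. Requires wandb and a W&B login."""
    import wandb  # imported lazily so pr_rows() works without wandb installed

    with wandb.init(project=project) as run:
        table = wandb.Table(
            data=pr_rows(y_true, y_score, class_label), columns=["r", "p", "c"]
        )
        run.log({"pr_curve": table})
```

With a table logged this way, `pr_curve` is the key you would type into the query, and the columns are what you would look for in the field dropdowns.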
### Set Vega fields With the query in place, you can map your logged columns to the chart's visual encodings using the Vega fields dropdown menus: Pulling in columns from the query results to set Vega fields * **x-axis:** runSets\_historyTable\_r (recall) * **y-axis:** runSets\_historyTable\_p (precision) * **color:** runSets\_historyTable\_c (class label) ## Customize the chart The default visualization is a good starting point, but you can edit the Vega spec to change the chart type, add titles, and refine the appearance. To switch from a scatter plot to a line plot, click **Edit** to change the Vega spec for this built-in chart. Follow along in the [custom charts demo workspace](https://app.wandb.ai/demo-team/custom-charts). Custom chart selection Update the Vega spec to customize the visualization: * Add titles for the plot, legend, x-axis, and y-axis (set "title" for each field). * Change the value of "mark" from "point" to "line". * Remove the unused "size" field. PR curve Vega spec Save a preset to reuse the same chart configuration on other panels in the project. To save this as a preset that you can use elsewhere in this project, click **Save as** at the top of the page. Here's what the result looks like, along with an ROC curve. PR curve chart ## Bonus: composite histograms This section shows how to build a more advanced custom chart, a composite histogram that overlays two distributions in the same view, to demonstrate what's possible after you're comfortable editing the Vega spec. Histograms can visualize numerical distributions to help you understand larger datasets. Composite histograms show multiple distributions across the same bins, letting you compare two or more metrics across different models or across different classes within your model. 
For a semantic segmentation model detecting objects in driving scenes, you might compare the effectiveness of optimizing for accuracy versus Intersection over Union (IoU), or you might want to know how well different models detect cars (large, common regions in the data) versus traffic signs (much smaller, less common regions). In the [demo Colab](https://bit.ly/custom-charts-colab), you can compare the confidence scores for two of the ten classes of living things. Composite histogram To create your own version of the custom composite histogram panel: 1. Create a new custom chart panel in your workspace or report (by adding a **Custom Chart** visualization). Click **Edit** in the top right to modify the Vega spec starting from any built-in panel type. 2. Replace that built-in Vega spec with the [starter code for a composite histogram in Vega](https://gist.github.com/staceysv/9bed36a2c0c2a427365991403611ce21). You can modify the main title, axis titles, input domain, and any other details directly in this Vega spec [using Vega syntax](https://vega.github.io/). For example, you can change the colors or add a third histogram. 3. Modify the query in the right panel to load the correct data from your wandb logs. Add the field `summaryTable` and set the corresponding `tableKey` to `class_scores` to fetch the `wandb.Table` logged by your run. This lets you populate the two histogram bin sets (`red_bins` and `blue_bins`) through the dropdown menus with the columns of the `wandb.Table` logged as `class_scores`. For example, you can choose the `animal` class prediction scores for the red bins and `plant` for the blue bins. 4. You can keep making changes to the Vega spec and query until the plot you see in the preview rendering matches what you want. After you're done, click **Save as** in the top and give your custom plot a name so you can reuse it. Then click **Apply from panel library** to finish your plot. 
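The idea of comparing distributions across the same bins can be sketched in plain Python. In the panel itself, the Vega spec does the binning. The function below only illustrates the concept and is not taken from the starter spec. `shared_bin_counts` and `log_class_scores` are hypothetical helpers, and the bin count and score range are assumptions. The `class_scores` key and the `animal` and `plant` columns follow the example in the steps above.

```python
def shared_bin_counts(scores_a, scores_b, num_bins=10, lo=0.0, hi=1.0):
    """Count two score lists over the same equal-width bins between lo and hi."""

    def counts(scores):
        result = [0] * num_bins
        for score in scores:
            if lo <= score <= hi:
                # Clamp so that a score equal to hi falls in the last bin
                index = min(int((score - lo) * num_bins / (hi - lo)), num_bins - 1)
                result[index] += 1
        return result

    return counts(scores_a), counts(scores_b)


def log_class_scores(animal_scores, plant_scores, project="custom-charts-demo"):
    """Log both score columns as one table. Requires wandb and a W&B login."""
    import wandb  # imported lazily so shared_bin_counts() works without wandb installed

    rows = [[a, p] for a, p in zip(animal_scores, plant_scores)]
    with wandb.init(project=project) as run:
        run.log({"class_scores": wandb.Table(data=rows, columns=["animal", "plant"])})
```

Because both lists are counted against identical bin edges, the two resulting count lists can be overlaid bar for bar, which is what makes the composite view comparable.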
Here's an example result from a brief experiment: training on only 1,000 examples for one epoch yields a model that's confident that most images are not plants and uncertain about which images might be animals. Chart configuration Chart result # Panels Source: https://docs.wandb.ai/models/app/features/panels Use and customize workspace panels to visualize your logged data Use workspace panel visualizations to explore your [logged data](/models/ref/python/experiments/run.md/#method-runlog) by key, visualize the relationships between hyperparameters and output metrics, and more. This page describes how to choose a workspace mode, add and configure panels, organize them into sections, and share them with collaborators. ## Workspace modes W\&B projects support two different workspace modes. The icon next to the workspace name shows its mode.

| Icon | Workspace mode |
| --- | --- |
| automated workspace icon | **Automated workspaces** automatically generate panels for all keys logged in the project. Choose an automated workspace to get started quickly by visualizing all available data for the project, for smaller projects that log fewer keys, or for broader analysis. If you delete a panel from an automated workspace, you can use [Quick add](#quick-add) to recreate it. |
| manual workspace icon | **Manual workspaces** start as blank slates and display only the panels you intentionally add. Choose a manual workspace when you care mainly about a fraction of the keys logged in the project, for more focused analysis, or to improve the performance of a workspace by avoiding loading panels that are less useful to you. Use [Quick add](#quick-add) to quickly populate a manual workspace and its sections with useful visualizations. |

To change how a workspace generates panels, [reset the workspace](#reset-a-workspace). **Undo changes to your workspace** To undo changes to your workspace, click the Undo button (arrow that points left) or press **Cmd+Z** (macOS) or **Ctrl+Z** (Windows or Linux). ## Reset a workspace Reset a workspace to change its mode or clear its panels to start fresh. To reset a workspace: 1. At the top of the workspace, click the **action ()** menu. 2. Click **Reset workspace**. ## Configure the workspace layout Workspace layout settings control how panels and sections are organized and displayed across the whole workspace. To configure the workspace layout, click **Settings** near the top of the workspace, then click **Workspace layout**. The following settings are available: * **Hide empty sections during search** (turned on by default) * **Sort panels alphabetically** (turned off by default) * **Section organization** (grouped by first prefix by default). To modify this setting: 1. Click the padlock icon. 2. Choose how to group panels within a section. To configure defaults for the workspace's line plots, refer to [Line plots](/models/app/features/panels/line-plot/). ### Configure a section's layout To configure the layout of a section, click its gear icon, then click **Display preferences**. The following settings are available: * **Turn on or off colored run names in tooltips** (turned on by default) * **Only show highlighted run in companion chart tooltips** (turned off by default) * **Number of runs shown in tooltips** (a single run, all runs, or **Default**) * **Display full run names on the primary chart tooltip** (turned off by default) ## View a panel in full-screen mode Full-screen mode gives a panel more space, which is useful for closer inspection of your data. In full-screen mode, the run selector displays. To view a panel in full-screen mode: 1.
Hover over the panel. 2. Click the panel's **action ()** menu, then click the full-screen button, which looks like a viewfinder or an outline showing the four corners of a square. Full-screen panel When you [share the panel](#share-a-panel) while viewing it in full-screen mode, the resulting link opens in full-screen mode automatically. When you work in full-screen mode, use the following tips: * To return to a panel's workspace from full-screen mode, click the left-pointing arrow at the top of the page. * To navigate through a section's panels without exiting full-screen mode, use either the **Previous** and **Next** buttons below the panel or the left and right arrow keys. * To reclaim more space for the panel, minimize the run selector with **Cmd+.** (macOS) or **Ctrl+.** (Windows or Linux). * When you view an image from a [media panel](/models/app/features/panels/media) in full-screen mode, keyboard shortcuts can zoom in or out, reset zoom, or zoom to fit. See [Keyboard shortcuts](/models/app/keyboard-shortcuts#media-panels). See [Keyboard shortcuts](/models/app/keyboard-shortcuts) for other full-screen and panel shortcuts. ## Add panels The following sections describe ways to add panels to your workspace. ### Add a panel manually Add panels to your workspace one at a time, either globally or at the section level. 1. To add a panel globally, click **Add panels** in the control bar near the panel search field. 2. To add a panel directly to a section instead, click the section's **action ()** menu, then click **+ Add panels**. 3. Select the type of panel to add, such as a chart. The panel's configuration details appear, with defaults selected. 4. Optional: Customize the panel and its display preferences. Configuration options depend on the type of panel you select. 
To learn more about the options for each type of panel, refer to the relevant section, such as [Line plots](/models/app/features/panels/line-plot/) or [Bar plots](/models/app/features/panels/bar-plot/). 5. Click **Apply**. Demo of adding a panel ### Quick add panels Use **Quick add** to add a panel automatically for each key you select, either globally or at the section level. For an automated workspace with no deleted panels, the **Quick add** option isn't visible because the workspace already includes panels for all logged keys. You can use **Quick add** to re-add a panel that you deleted. To open **Quick add**, do one of the following: * To use **Quick add** to add a panel globally, click **Add panels** in the control bar near the panel search field, then click **Quick add**. * To use **Quick add** to add a panel directly to a section, click the section's **action ()** menu, click **Add panels**, then click **Quick add**. A list of panels appears. Each panel with a checkmark is already included in the workspace. To add panels from the list: * To add all available panels, click the **Add `[N]` panels** button at the top of the list. The **Quick Add** list closes and the new panels display in the workspace. * To add an individual panel from the list, hover over the panel's row, then click **Add**. Repeat this step for each panel you want to add, then click the **X** at the top right to close the **Quick Add** list. The new panels display in the workspace. Optional: Customize the panel's settings. ## Share a panel Share a panel to send collaborators directly to a focused view of a specific visualization without navigating to the full workspace. To share a panel using a link, do one of the following: * While viewing the panel in full-screen mode, copy the URL from the browser. * Click the **action ()** menu, then click **Copy panel URL**. Share the link with the user or team. 
When they access the link, the panel opens in [full-screen mode](#view-a-panel-in-full-screen-mode). To return to a panel's workspace from full-screen mode, click the left-pointing arrow at the top of the page.

### Compose a panel's full-screen link programmatically

When you [create an automation](/models/automations/) or similar workflow, it can be useful to include the panel's full-screen URL. The following format shows a panel's full-screen URL. In the following example, replace the entity, project, panel, and section names in brackets.

```text theme={null}
https://wandb.ai/[ENTITY_NAME]/[PROJECT_NAME]?panelDisplayName=[PANEL_NAME]&panelSectionName=[SECTION_NAME]
```

If multiple panels in the same section have the same name, this URL opens the first panel with the name.

### Embed or share a panel on social media

To embed a panel in a website or share it on social media, the panel must be viewable by anyone with the link. If a project is private, only members of the project can view the panel. If the project is public, anyone with the link can view the panel.

To get the code to embed or share a panel on social media:

1. From the workspace, hover over the panel, then click its **action ()** menu.
2. Click the **Share** tab.
3. Change **Only those who are invited have access** to **Anyone with the link can view**. Otherwise, the choices in the next step aren't available.
4. Choose **Share on Twitter**, **Share on Reddit**, **Share on LinkedIn**, or **Copy embed link**.

### Email a panel report

To email a single panel as a stand-alone report:

1. Hover over the panel, then click the panel's **action ()** menu.
2. Click **Share panel in report**.
3. Select the **Invite** tab.
4. Enter an email address or username.
5. Optional: Change **can view** to **can edit**.
6. Click **Invite**.

W\&B sends an email to the user with a clickable link to the report that contains only the panel you're sharing.
Unlike when you [share a panel](#share-a-panel), the recipient can't get to the workspace from this report. ## Manage panels After you add panels to a workspace, you can edit, move, duplicate, or remove them to keep your visualizations organized and up to date. ### Edit a panel To edit a panel: 1. Click its pencil icon. 2. Modify the panel's settings. 3. Optional: To change the panel to a different type, select the type and then configure the settings. 4. Click **Apply**. ### Move a panel To move a panel to a different section, you can use the drag handle on the panel. To select the new section from a list instead: 1. Optional: Create a new section by clicking **Add section** after the last section. 2. Click the **action ()** menu for the panel. 3. Click **Move**, then select a new section. You can also use the drag handle to rearrange panels within a section. ### Duplicate a panel To duplicate a panel: 1. At the top of the panel, click the **action ()** menu. 2. Click **Duplicate**. If desired, you can [customize](#edit-a-panel) or [move](#move-a-panel) the duplicated panel. ### Remove panels To remove a panel: 1. Hover over the panel. 2. Click the **action ()** menu. 3. Click **Delete**. To remove all panels from a manual workspace, click its **action ()** menu, then click **Clear all panels**. To remove all panels from an automatic or manual workspace, you can [reset the workspace](#reset-a-workspace). Select **Automatic** to start with the default set of panels, or select **Manual** to start with an empty workspace with no panels. ## Manage sections Sections group related panels together, making it easier to scan large workspaces. By default, sections in a workspace reflect the logging hierarchy of your keys. However, in a manual workspace, sections appear only after you start adding panels. ### Add a section To add a section, click **Add section** after the last section. 
To add a new section before or after an existing section, you can instead click the section's **action ()** menu, then click **New section below** or **New section above**. Don't name a section "Section". Because of a known limitation, panels in this section don't render until you rename the section. ### Manage a section's panels Within a section, you can resize panels, control pagination, and delete individual panels. Sections with many panels are paginated by default. The default number of panels on a page depends on the panel's configuration and on the sizes of the panels in the section. To resize a panel, hover over it, click the drag handle, and drag it to adjust the panel's size. Resizing one panel resizes all panels in the section. If a section is paginated, you can customize the number of panels to show on a page: 1. At the top of the section, click **1 to `[X]` of `[Y]`**, where `[X]` is the number of visible panels and `[Y]` is the total number of panels. 2. Choose how many panels to show per page, up to 100. To delete a panel from a section: 1. Hover over the panel, then click its **action ()** menu. 2. Click **Delete**. If you reset a workspace to an automated workspace, all deleted panels appear again. ### Rename a section To rename a section, click its **action ()** menu, then click **Rename section**. Don't name a section "Section". Because of a known limitation, panels in this section don't render until you rename the section. ### Delete a section To delete a section, click the **action ()** menu, then click **Delete section**. This removes the section and its panels. # Bar plots Source: https://docs.wandb.ai/models/app/features/panels/bar-plot Visualize metrics, customize axes, and compare categorical data as bars. A bar plot presents categorical data with rectangular bars that you can plot vertically or horizontally. Use bar plots to visualize metrics, compare categorical data, and customize axes for your runs. 
Bar plots appear by default with `wandb.Run.log()` when all logged values are of length one. Box and horizontal bar plots in W&B Use chart settings to limit the maximum number of runs shown, group runs by any config, and rename labels. Customized bar plot ## Customize bar plots You can also create **Box** or **Violin** plots to combine many summary statistics into one chart type. To create a box or violin plot: 1. Group runs through the runs table. 2. In the workspace, click **Add panel**. 3. Add a standard **Bar Chart** and select the metric to plot. 4. Under the **Grouping** tab, pick **Box** or **Violin** to plot either of these styles. Customizing a bar plot with grouping options # Save and diff code Source: https://docs.wandb.ai/models/app/features/panels/code Enable code saving, compare code across W&B runs with the code comparer, and capture Jupyter session history. This page explains how to enable code saving so you can compare the code used across W\&B runs and review the cells executed in Jupyter sessions. Saving code makes it easier to reproduce experiments and understand how changes to your training code affect results. By default, W\&B only saves the latest Git commit hash. You can turn on more code features to compare the code between your experiments in the UI. Starting with `wandb` version 0.8.28, W\&B can save the code from your main training file where you call `wandb.init()`. ## Save library code When you enable code saving, W\&B saves the code from the file that called `wandb.init()`. To save additional library code, you have three options. 

### Call `log_code` after `wandb.init`

Call `wandb.Run.log_code(".")` after `wandb.init()`:

```python theme={null}
import wandb

with wandb.init() as run:
    # Save the source files under the current directory with this run.
    run.log_code(".")
```

### Pass a settings object with `code_dir`

Pass a settings object to `wandb.init()` with `code_dir` set:

```python theme={null}
import wandb

# Point code saving at the current directory when the run starts.
wandb.init(settings=wandb.Settings(code_dir="."))
```

This captures all Python source code files in the current directory and all subdirectories as an [artifact](/models/ref/python/experiments/artifact). For more control over the types and locations of source code files that W\&B saves, see the [reference docs](/models/ref/python/experiments/run#log_code).

### Set code saving in the UI

In addition to setting code saving programmatically, you can configure defaults in the UI at the team or organization level. The following sections describe the team-level and organization-level settings.

#### Team

By default, W\&B disables code saving for all teams. Before you can turn it on for a team, an organization admin must turn it on for the organization. See the [Organization](#organization) section.

A team admin can open the team **Settings** page, go to the **Privacy** section, and configure **Enable code saving by default** for runs in that team. This option is available only when an organization admin hasn't enforced code saving restrictions for the whole organization. For navigation steps, see [Configure privacy settings for a team](/platform/hosting/privacy-settings#configure-privacy-settings-for-a-team).

#### Organization

An organization admin can open organization **Settings**, go to the **Privacy** section, and activate **Enforce default code saving restrictions** so code saving stays off by default for every team. While this enforcement is active, team admins can't turn on **Enable code saving by default** for a team.
For the full list of organization controls, see [Enforce privacy settings for all teams](/platform/hosting/privacy-settings#enforce-privacy-settings-for-all-teams). ## Code comparer The code comparer panel displays the code from different W\&B runs side by side in the workspace. To compare code used in different W\&B runs: 1. Select the **Add panels** button in the top right corner of the page. 2. Expand the **TEXT AND CODE** dropdown and select **Code**. Code comparer panel ## Jupyter session history W\&B saves the history of code executed in your Jupyter notebook session. When you call `wandb.init()` inside of Jupyter, W\&B adds a hook to automatically save a Jupyter notebook that contains the history of code executed in your current session. To view the saved notebook history for a run: 1. Navigate to the project workspace that contains your code. 2. Select the **Artifacts** tab in the project sidebar. 3. Expand the **code** artifact. 4. Select the **Files** tab. Jupyter session history This displays the cells that ran in your session along with any outputs created by calling IPython's `display` method. This lets you see exactly what code ran within Jupyter in a given run. When possible, W\&B also saves the most recent version of the notebook, which you find in the code directory as well. Jupyter session output # Line plots overview Source: https://docs.wandb.ai/models/app/features/panels/line-plot Visualize metrics, customize axes, and compare multiple lines on a plot Line plots display by default for metrics logged with `wandb.Run.log()` over time. Line plots support plotting multiple metrics, calculating custom axes, and more. This page shows how to create, configure, and manage line plots in a [workspace](/models/track/workspaces). 
Line plot example For [runs](/models/runs) that execute on [CoreWeave Kubernetes Service (CKS)](https://docs.coreweave.com/products/cks) clusters, [CoreWeave Mission Control](https://www.coreweave.com/mission-control) can monitor your compute infrastructure when the integration is enabled. If an error occurs, W\&B populates infrastructure information onto your run's plots in your project's workspace. For prerequisites and details, see [Visualize CoreWeave infrastructure alerts](/models/runs/infrastructure-alerts). ## Add a line plot The following sections describe how to create a line plot for a single metric or multiple metrics. In an [automatic workspace](/models/app/features/panels#workspace-modes), W\&B creates a single-metric line plot automatically for each logged metric. Follow these steps to re-add a line plot that was deleted from an automatic workspace, or to add a line plot to a manual workspace. 1. Navigate to your workspace. 2. To add a line plot globally, click **Add panels** in the control bar near the panel search field. To add a line plot directly to a section instead, click the section's **action ()** menu, then click **+ Add panels**. 3. To add a single-metric plot with default settings, click **Quick panel builder**. 1. In the **Single-key panels** tab, hover over a metric, then click **Add**. Repeat this step for each panel you want to add. 2. Click **Create \[NUMBER] panels**. 4. To add a custom line plot instead, click **Line plot**. 1. Configure the line plot's data, grouping, and display preferences using the corresponding tabs. For details, see [Edit line plot settings](#edit-line-plot-settings). 2. To add calculated expressions to the x-axis or y-axis, click **Expressions**. [JavaScript regular expressions](https://www.w3schools.com/js/js_regexp.asp) are supported. 5. Select the type of panel to add, such as a chart. The panel's configuration details appear with selected defaults. 6. 
Optionally, customize the panel and its display preferences. Configuration options depend on the type of panel you select. For more information about the options for each type of panel, see [Line plots](/models/app/features/panels/line-plot/) or [Bar plots](/models/app/features/panels/bar-plot/). 7. Click **Apply**. This feature is in preview, available by invitation only. To request enrollment, contact [support](mailto:support@wandb.com) or your AISE. In an [automatic workspace](/models/app/features/panels#workspace-modes), W\&B creates a single-metric line plot automatically for each logged metric. This section shows how to create a single line plot that shows multiple metrics together, defined by a JavaScript regular expression. You can optionally consolidate many single-metric plots into a single multi-metric plot. This can improve the performance of a workspace with many logged metrics, and can help you analyze the results of your runs. 1. Navigate to your workspace. 2. To add a line plot globally, click **Add panels** in the control bar near the panel search field. To add a line plot directly to a section instead, click the section's **action ()** menu, then click **+ Add panels**. 3. Click **Quick panel builder**, then click the **Multi-metric panels** tab. 4. In **Regex** enter an expression in [JavaScript regular expression](https://www.w3schools.com/js/js_regexp.asp) format. As you type, the UI updates to show which metrics match the expression. By default, the plot name shows the regular expression used by the plot. The plot includes lines for all metrics that match the expression, including metrics logged in the future. 5. To optionally remove duplicate single-metric panels when the multi-metric plot is created, toggle **Clean up auto-generated panels**. A preview shows which panels are cleaned up. When this option is turned on, W\&B doesn't create a single-metric plot for a newly logged metric that matches the expression. 
Instead, it appears only in this multi-metric plot. 6. Click **Create \[NUMBER] panels**.

### Multi-metric regular expressions

Multi-metric line plots use [JavaScript regular expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions) to match metric names. The following sections describe common use cases and provide more details about how the regular expressions work, such as how capture groups affect the panels that W\&B creates.

#### Common use cases

The following examples show some ways you can use multi-metric panels to help you analyze your experiment results.

**Compare metrics across layers or model components**

Instead of creating separate panels for each layer's metrics, you can view them together in a single panel. For example, if you log metrics with consistent names such as `layer_0_loss`, `layer_1_loss`, and `layer_2_loss`, as in the following Python example, you can use the regex `layer_\d+_loss` to display all layer losses on one plot.

```python theme={null}
import random

import wandb

with wandb.init(project="multi-layer-model") as run:
    for step in range(100):
        # Placeholder values; replace with the per-layer losses from your model.
        loss_0, loss_1, loss_2 = (random.random() for _ in range(3))
        run.log({
            "layer_0_loss": loss_0,
            "layer_1_loss": loss_1,
            "layer_2_loss": loss_2,
            "step": step,
        })
```

**Group related metrics by prefix or suffix**

Match all metrics that share a common naming pattern. For example:

* `train_.*` matches all training metrics like `train_loss`, `train_accuracy`, `train_f1_score`.
* `.*_accuracy` matches accuracy metrics across different datasets like `train_accuracy`, `val_accuracy`, `test_accuracy`.

**Match specific metric variations**

Use alternation to match only the metrics you want. For example, the non-capturing group `(?:layer_0|layer_10)_loss` matches only `layer_0_loss` and `layer_10_loss`, excluding the layers in between.

#### Capture groups

Capture groups in your regular expression control how multi-metric panels are created. This behavior can be confusing if you're not expecting it.
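
To preview how a pattern behaves before you enter it in the **Regex** field, you can test it locally. The following is a minimal sketch, not W\&B's implementation. It uses Python's `re` module instead of a JavaScript engine (the patterns shown in this section behave the same way in both), it assumes unanchored matching, and the metric names and both helper functions are hypothetical. The second helper approximates the panel-splitting behavior described in this section by grouping matches on the value of the first capture group.

```python
import re
from collections import defaultdict

# Hypothetical metric names, chosen to mirror the examples in this section.
METRIC_NAMES = [
    "layer_0_loss", "layer_1_loss", "layer_10_loss",
    "train_loss", "train_accuracy", "val_accuracy",
]


def matching_metrics(pattern, names=METRIC_NAMES):
    """Return the names that the pattern matches anywhere in the string."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


def panels_by_capture_group(pattern, names=METRIC_NAMES):
    """Group matching names by the first capture group, if the pattern has one.

    A pattern without a capture group yields one group keyed by the pattern.
    """
    regex = re.compile(pattern)
    panels = defaultdict(list)
    for name in names:
        match = regex.search(name)
        if match is None:
            continue
        # regex.groups counts capturing groups only; (?:...) does not count.
        key = match.group(1) if regex.groups else pattern
        panels[key].append(name)
    return dict(panels)


print(matching_metrics(r"layer_\d+_loss"))
print(matching_metrics(r".*_accuracy"))
print(panels_by_capture_group(r"(layer_0|layer_10)_loss"))
print(panels_by_capture_group(r"(?:layer_0|layer_10)_loss"))
```

With these example names, the capturing pattern produces two groups, keyed `layer_0` and `layer_10`, while the non-capturing pattern produces a single group that contains both metrics.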
* **Capture groups create multiple panels** When your regular expression includes parentheses that form a capture group, the UI creates a separate panel for each unique value captured by that group. For example, the expression `(layer_0|layer_10)_loss` includes a capture group and will create two separate panels: 1. One panel for metrics matching `layer_0`. 2. One panel for metrics matching `layer_10`. * **Non-capturing groups keep metrics together** To match multiple alternatives without creating separate panels, use a non-capturing group with the `?:` syntax. The expression `(?:layer_0|layer_10)_loss` matches the same metrics as the previous example, but displays them together in a single panel. Here's the difference: * `(layer_0|layer_10)_loss` - Creates two panels, one for each layer. * `(?:layer_0|layer_10)_loss` - Creates one panel showing both layers together. This gives you flexibility to choose the approach that best fits your analysis needs. Use capture groups when you want to separate metrics into distinct panels. Use non-capturing groups when you want to compare metrics together on a single plot. ## Edit line plot settings The following sections describe how to edit the settings for an individual line plot panel, all line plot panels in a section, or all line plot panels in a workspace. For details about line plot settings, see [Line plot reference](/models/app/features/panels/line-plot/reference). ### Individual line plot A line plot's individual settings override the line plot settings for the section or the workspace. To customize a line plot: 1. Navigate to your workspace. 2. Hover your mouse over the panel, then click the gear icon. 3. Within the drawer that appears, select a tab to edit its settings. 4. Click **Apply**. Line plot settings are organized into tabs: * **Data**: Configure x-axis, y-axis, sampling method, smoothing, outliers, and chart type. * **Grouping**: Configure whether and how to group and aggregate runs in the plot. 
* **Chart**: Specify titles for the panel and axes, and configure legend visibility and position. * **Legend**: Customize the appearance and content of the panel's legend. * **Expressions**: Add custom calculated expressions for the axes. For detailed information about each setting, see the [Line plot reference](/models/app/features/panels/line-plot/reference). ### All line plots in a section To customize the default settings for all line plots in a section, overriding workspace settings for line plots: 1. Navigate to your workspace. 2. Click the section's gear icon to open its settings. 3. Within the drawer that appears, select the **Data** or **Display preferences** tabs to configure the default settings for the section. For details about each **Data** setting, see the [Line plot reference](/models/app/features/panels/line-plot/reference). For details about each display preference, see [Configure section layout](../#configure-section-layout). ### All line plots in a workspace To customize the default settings for all line plots in a workspace: 1. Navigate to your workspace. 2. Click the workspace settings icon, which has a gear with the label **Settings**. 3. Click **Line plots**. 4. Within the drawer that appears, select the **Data** or **Display preferences** tabs to configure the default settings for the workspace. * For details about each **Data** setting, see the [Line plot reference](/models/app/features/panels/line-plot/reference). * For details about each **Display preferences** section, see [Workspace display preferences](../#configure-workspace-layout). At the workspace level, you can configure the default **Zooming** behavior for line plots. This setting controls whether to sync zooming across line plots with a matching x-axis key. Deactivated by default. ## Visualize average values on a plot If you have several different experiments and you want to see the average of their values on a plot, you can use the Grouping feature in the table. 
Click **Group** above the run table and select **All** to show averaged values in your graphs.

The following image shows the graph before averaging, with one line per run:

Individual precision lines

The following image shows a graph that represents average values across runs using grouped lines.

Averaged precision lines

## Visualize NaN value on a plot

To track metrics that are sometimes undefined, such as a loss that returns `NaN`, log them as usual and W\&B renders them on the line plot. `wandb.Run.log()` accepts `NaN` values, including `NaN` values in PyTorch tensors. For example:

```python theme={null}
import wandb

with wandb.init() as run:
    # Log a NaN value
    run.log({"test": float("nan")})
```

NaN value handling

## Compare multiple metrics on one chart

To compare multiple metrics from one or more runs side by side, add a **Run comparer** panel to your workspace.

Adding visualization panels

1. Navigate to your workspace.
2. Select the **Add panels** button in the top right corner of the page.
3. From the drawer that appears, expand the **Evaluation** dropdown.
4. Select **Run comparer**.

## Change the colors of the lines

If the default color of runs isn't helpful for comparison, W\&B provides two ways to change the colors: from the run table or from a chart's legend settings.

To change a color from the run table:

Each run is given a random color by default upon initialization.

Random colors given to runs

Upon clicking any of the colors, a color palette appears from which you can choose the color you want.

The color palette

To change colors from a chart's legend settings:

1. Navigate to your workspace.
2. Hover your mouse over the panel whose settings you want to edit.
3. Select the pencil icon that appears.
4. Choose the **Legend** tab.

Line plot legend settings

## Visualize on different x axes

By default, line plots use training steps as the x-axis, but you can switch to a different x-axis to view your data from another perspective.
If you want to see the absolute time that an experiment has taken, or see what day an experiment ran, you can switch the x-axis. The following example shows switching from steps to relative time and then to wall time.

X-axis time options

To use a custom x-axis, log the metric in the same call to `wandb.Run.log()` where you log the y-axis. For example:

```python theme={null}
import wandb

with wandb.init() as run:
    for i in range(100):
        acc = i / 100  # Placeholder; replace with your model's accuracy.
        run.log({"accuracy": acc, "custom_x": i * 10})
```

For more details, see [Customize log axes](/models/track/log/customize-logging-axes#customize-log-axes).

## Zoom

To inspect a specific region of a line plot more closely, click and drag a rectangle over it. This zooms the x-axis and the y-axis at the same time.

Plot zoom functionality

## Hide chart legend

If the chart legend is taking up space you want to use for the plot, you can turn it off. Turn off the legend in the line plot with this toggle:

Hide legend toggle

## Create a run metrics notification

Use [Automations](/models/automations/) to notify your team when a run metric meets a condition you specify. An automation can post to a Slack channel or run a webhook.

From a line plot, you can create a [run metrics notification](/models/automations/automation-events/#run-events) for the metric it shows:

1. Navigate to your workspace.
2. Hover over the panel, then click the bell icon.
3. Configure the automation using the basic or advanced configuration controls. For example, apply a run filter to limit the scope of the automation, or configure an absolute threshold.

Learn more about [Automations](/models/automations/).

# Line plot reference
Source: https://docs.wandb.ai/models/app/features/panels/line-plot/reference

Reference for line plot panel settings including x-axis, y-axis, smoothing, aggregation, and grouping options.

This page provides comprehensive details for line plot settings.
For more details about working with line plots, see the [Line plots overview](/models/app/features/panels/line-plot).

## Data settings

Data settings control which metrics appear on the plot and how data points are sampled, smoothed, and aggregated.

### X-axis

Selecting X-Axis

Set the range of the x-axis to any integer or float value you logged with `wandb.Run.log()`.

Available x-axis options:

* **Step**: Increments each time `wandb.Run.log()` is called. Reflects the number of training steps logged from your model. (Default)
* **Relative Time (Wall)**: Clock time since the process started. If you start a run, pause it for a day, then resume and log, that point appears at 24 hours.
* **Relative Time (Process)**: Time inside the running process. If you start a run, run for 10 seconds, pause for a day, then resume, that point appears at 10 seconds.
* **Wall Time**: Minutes elapsed since the start of the first run on the graph.
* **X range**: From the smallest to largest value of your x-axis by default. You can customize the minimum and maximum values.

### Y-axis

Set the y-axis variables to any integer or float value you logged with `wandb.Run.log()`. Specify a single value, array of values, or histogram of values. If you logged more than 1,500 points for a variable, W\&B samples down to 1,500 points.

Customize the color of your y-axis lines by changing the run color in the runs table.

The **Y range** option defaults to the range from the smallest positive value of your metrics (inclusive of 0) to the largest value of your metrics. You can customize the minimum and maximum values.

### Point aggregation method

Choose the sampling mode for displaying data points:

* **Random sampling** (default): See [Random sampling](/models/app/features/panels/line-plot/sampling/#random-sampling).
* **Full fidelity**: See [Full fidelity](/models/app/features/panels/line-plot/sampling/#full-fidelity).
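
To build intuition for what sampling a long series down to 1,500 points involves, the following sketch picks a uniform random subset of a series and keeps it in x order. It is an illustration only: the `downsample` function, its `seed` parameter, and the example series are assumptions made here, and W\&B's actual sampling behavior is described on the linked sampling pages and may differ.

```python
import random

MAX_POINTS = 1500  # Display limit mentioned in the Y-axis section.


def downsample(points, max_points=MAX_POINTS, seed=0):
    """Return at most max_points (x, y) pairs, sampled uniformly, in x order."""
    if len(points) <= max_points:
        return list(points)
    rng = random.Random(seed)
    # Sample indices rather than points so the original order is easy to restore.
    kept = sorted(rng.sample(range(len(points)), max_points))
    return [points[i] for i in kept]


# Hypothetical series: 10,000 logged steps of a decaying loss.
series = [(step, 1.0 / (step + 1)) for step in range(10_000)]
sampled = downsample(series)
print(len(series), "->", len(sampled))
```

Because a random subset can omit a short spike that spans only a few steps, a sampled plot is best read as an overview of the trend rather than a complete record of every logged value.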

### Smoothing

Set the [smoothing coefficient](/support/models/articles/what-formula-do-you-use-for-your-smoothi) between 0 and 1, where 0 is no smoothing and 1 is maximum smoothing.

Available smoothing methods:

* **Time-weighted EMA** (default): an exponential moving average (EMA) technique for smoothing time series data by exponentially decaying the weight of previous points.
* **Running average**: replaces a point with the average of points in a window before and after the given x value.
* **Gaussian**: computes a weighted average of the points, where the weights correspond to a Gaussian distribution with the standard deviation specified as the smoothing parameter.
* **No smoothing**

For comprehensive details, see [Smooth line plots](/models/app/features/panels/line-plot/smoothing).

### Ignore outliers

Rescale the plot to exclude outliers from the default plot min and max scale. The setting's impact depends on the plot's sampling mode:

* **Random sampling mode**: Ignoring outliers omits points below 5% and above 95% from the plot.
* **Full fidelity mode**: Ignoring outliers shows all points, condensed down to the last value in each bucket, and shades the area below 5% and above 95%.

### Max runs or groups

By default, plots include only the first 10 runs or groups of runs in the run list or run set. Change the sort order to control which runs or groups to display.

A workspace is limited to displaying a maximum of 1,000 runs, regardless of its configuration.

### Chart type

Choose the plot style:

* **Line plot**
* **Area plot**
* **Percentage area plot**

Configure the chart type in the **Data** tab. Refer to [Individual line plot](/models/app/features/panels/line-plot#individual-line-plot).

## Grouping settings

Aggregate all runs by turning on grouping, or group over an individual variable. When you turn on grouping in the runs table, the groups automatically populate into the graph.
* **Group runs**: Turn on run grouping in the plot. Required to configure the shaded range in the plot. * **Group by**: Optionally select a column. All runs with the same value in that column are grouped together. * **Aggregation**: The value of the line on the graph. Options are mean, median, min, and max of the group. * **Range**: Configure the shaded area of a full-fidelity line plot. Options are Min/Max, Std Dev, Std Err, or None. ## Chart settings Configure titles and legend visibility: * **Panel title**: Title displayed at the top of the panel. * **X-axis title**: Label for the x-axis. * **Y-axis title**: Label for the y-axis. * **Legend**: Show or hide the legend, and configure its position. ## Legend settings Customize the legend to show any config value logged and metadata from the runs, such as creation time or the user who created the run. ### Legend template Define a template for the legend name. 1. Click the gear icon to open the plot settings. 2. Go to the **Display preferences** tab. 3. Expand **Advanced legend**, then specify the legend template. 4. Click **Apply**. ### Point-specific values Set values inside `[[ ]]` to display point-specific values in the crosshair when hovering over a chart. 1. Click the gear icon to open the plot settings. 2. Go to the **Display preferences** tab. 3. At the bottom of the tab, configure point-specific values for one or more of the plot's metrics. 4. Click **Apply**. 
Supported values inside `[[ ]]`: | Value | Meaning | | ------------- | ------------------------------------------ | | `${x}` | X value | | `${y}` | Y value (including smoothing adjustment) | | `${original}` | Y value not including smoothing adjustment | | `${mean}` | Mean of grouped runs | | `${stddev}` | Standard deviation of grouped runs | | `${min}` | Min of grouped runs | | `${max}` | Max of grouped runs | | `${percent}` | Percent of total (for stacked area charts) | ## Expressions Formulate math expressions to create or transform line plots from your logged metrics, config values, and summary statistics. For example, you can compute the difference between two metrics, rescale a metric by a config value, or plot the logarithm of a metric. Expressions are evaluated at each step that you log. The following sections show how to reference values in expressions and describe the operators and functions you can use in custom expressions. In a line plot, you can use an expression for the x-axis, y-axis, or both. See [Example expressions](#example-expressions) for common use cases and example expressions. ### Create new line plots with expressions 1. Navigate to your project's workspace. 2. Click the **+ Add panel** button and select **Line plot**. 3. Click the **Data** tab. Select the data you want to plot on the line plot for both the x and y axes. 4. Click the **Expressions** tab. 5. In the **Y-axis** field or **X-axis** field, enter your expression. 6. Click **Apply** to save your settings and view the line plot. ### Transform existing line plots with expressions 1. Open the line plot you want to transform. 2. Click the gear icon in the upper right corner of the plot to edit the panel. 3. Click the **Expressions** tab. 4. In the **Y-axis** field or **X-axis** field, enter your expression. 5. Click **Apply** to save your settings and view the updated line plot. 
### Referencing values The following table describes how to reference logged metrics, config parameters, and summary statistics in line plot expressions. | Type | Syntax | Description | Examples | | ----------------- | ---------------------- | -------------------------------------------------------- | --------------------------------------------------- | | Metric | `${metric_name}` | Reference a logged metric by name. | `${val/accuracy}`, `${"accuracy"}` | | Config parameter | `${config:param_name}` | Reference config values using the `${config:}` prefix. | `${config:lr}`, `${config:batch_size}` | | Summary statistic | `${summary:stat_name}` | Reference summary fields using the `${summary:}` prefix. | `${summary:final_accuracy}`, `${summary:best_loss}` | Use `${...}` to escape any metric name, config parameter, or summary field that includes special characters or spaces such as `/`, `-`, or spaces. For example, if you log a metric named `val/accuracy`, reference it as `${val/accuracy}` to avoid confusion with the division operator. If you log a config parameter named `dropout-rate`, reference it as `${config:dropout-rate}`. If you log a summary field named `best loss`, reference it as `${summary:best loss}`. #### Nested configs Access nested config values using dot notation and the following syntax: ```javascript theme={null} ${config:parent.child.grandchild} ``` Where `parent`, `child`, and `grandchild` are the keys in the nested config dictionary. For example, suppose you log the following config with nested dictionaries: ```python theme={null} config = { "model": { "type": "resnet", "layers": 50 }, "training": { "batch_size": 32, "learning_rate": 0.001 } } with wandb.init(project="my-project", config=config) as run: ... ``` You can reference the model type with: `${config:model.type}`. Reference the batch size with: `${config:training.batch_size}`. 
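To check which dot-notation paths a nested config exposes, a minimal Python sketch such as the following can list them. The helpers `flatten_config` and `config_references` are hypothetical names used for illustration only and are not part of the W\&B SDK. The sketch assumes every nested level is a plain dictionary.

```python
def flatten_config(config, prefix=""):
    """Flatten a nested dict into {"parent.child": value} pairs."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the dotted path
            flat.update(flatten_config(value, path))
        else:
            flat[path] = value
    return flat


def config_references(config):
    """Build the ${config:...} reference string for every leaf value."""
    return ["${config:" + path + "}" for path in flatten_config(config)]


config = {
    "model": {"type": "resnet", "layers": 50},
    "training": {"batch_size": 32, "learning_rate": 0.001},
}

for reference in config_references(config):
    print(reference)
```

Each printed string has the form used in the expression fields, such as `${config:model.type}`.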
As another example, consider the following config with nested dictionaries: ```json Config parameters theme={null} config: optimizer: value: lr: 0.001 weight_decay: 0.01 model: value: hidden_size: 768 ``` Reference the learning rate with `${config:optimizer.value.lr}`, the model's hidden size with `${config:model.value.hidden_size}`, or the weight decay with `${config:optimizer.value.weight_decay}`. ### Available operators W\&B supports the following operators for line plot expressions: | Category | Operators | | ---------- | ------------------------------------------------- | | Arithmetic | `+`, `-`, `*`, `/`, `%` (modulo), `**` (exponent) | | Comparison | `==`, `!=`, `===`, `!==`, `<`, `>`, `<=`, `>=` | | Bitwise | `\|`, `^`, `&`, `<<`, `>>`, `>>>` | | Logical | `\|\|` , `&&` | ### Math constants and functions All JavaScript math functions and constants are supported. For more details, see the [MDN Math documentation](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Math). The following tables summarize the most commonly used functions and constants for line plot expressions. #### Math constants | Constant | Description | | --------- | ----------------------- | | `e` | Euler's number | | `pi` | Pi | | `ln2` | Natural logarithm of 2 | | `ln10` | Natural logarithm of 10 | | `log2e` | Base-2 logarithm of e | | `log10e` | Base-10 logarithm of e | | `sqrt2` | Square root of 2 | | `sqrt1_2` | Square root of 1/2 | #### Arithmetic and statistical functions The following table describes available arithmetic and statistical functions: | Function | Description | | -------------- | ---------------------------------------------- | | abs(x) | Absolute value | | ceil(x) | Ceiling function (round up to nearest integer) | | floor(x) | Floor function (round down to nearest integer) | | round(x) | Round to nearest integer | | min(x, y, ...) | Minimum value | | max(x, y, ...) 
| Maximum value | | sqrt(x) | Square root | #### Logarithmic and exponential functions The following table describes available logarithmic and exponential functions: | Function | Description | | --------- | -------------------------- | | log(x) | Natural logarithm (base e) | | log10(x) | Base-10 logarithm | | log2(x) | Base-2 logarithm | | exp(x) | Exponential function (e^x) | | pow(x, y) | Power function (x^y) | #### Trigonometric functions The following table describes available trigonometric functions: | Function | Description | | ----------- | ----------------------------- | | sin(x) | Sine | | cos(x) | Cosine | | tan(x) | Tangent | | asin(x) | Arc sine (inverse sine) | | acos(x) | Arc cosine (inverse cosine) | | atan(x) | Arc tangent (inverse tangent) | | atan2(y, x) | Two-argument arc tangent | #### Hyperbolic functions The following table describes available hyperbolic functions: | Function | Description | | -------- | ------------------ | | sinh(x) | Hyperbolic sine | | cosh(x) | Hyperbolic cosine | | tanh(x) | Hyperbolic tangent | ### Example expressions The following are example expressions for line plot axes. These examples are for illustrative purposes. You can use any combination of operators and functions described in the previous sections to create complex expressions that suit your needs. For the following examples, suppose your [summary metrics](/models/track/log/log-summary#log-summary-metrics) include `accuracy` and `loss` with the following values: ```json Summary metrics theme={null} { "accuracy": 0.7829240801794489, "loss": 0.2194763318905079 } ``` And suppose your config looks like this: ```json Config parameters theme={null} config = { "epochs": 100, "optimizer": { "value": { "lr": 0.001, "weight_decay": 0.01 } } } ``` Vertically shift the accuracy metric by adding a constant value. 
In the following example, shift the accuracy up by 1:

```javascript theme={null}
accuracy + 1
```

Vertically shift a metric by a config value, in this case the learning rate (`lr`):

```javascript theme={null}
accuracy + ${config:optimizer.value.lr}
```

Compute the sine of a metric. In the following example, compute the sine of the loss metric:

```javascript theme={null}
sin(loss)
```

Phase shift a metric by applying a sine function. For example, apply a sine function and phase shift the loss metric by 2:

```javascript theme={null}
sin(loss - 2)
```

Rescale the accuracy metric by a config parameter named `batch_size`:

```javascript theme={null}
${accuracy} / ${config:batch_size}
```

Compute the minimum of two metrics. In the following example, compute the minimum of loss and accuracy:

```javascript theme={null}
min(loss, accuracy)
```

Compute the square root of the sum of squares of two metrics. In the following example, compute the square root of the sum of squares of loss and accuracy:

```javascript theme={null}
sqrt(loss ** 2 + accuracy ** 2)
```

Combine several function calls in one expression. In the following example, add the square roots of the loss metric scaled by two different constants:

```javascript theme={null}
sqrt(loss*100)+sqrt(loss*100000)
```

You can also refer to summary metric values with `${summary:metric_name}` syntax in expressions. For example:

```javascript theme={null}
sqrt(${summary:loss}*100)+sqrt(${summary:loss}*100000)
```

### Multi-metric panel expressions

Use a regular expression to create a single line plot that shows multiple metrics together (including matching metrics logged in the future). For detailed instructions, see [Add a line plot](/models/app/features/panels/line-plot#multi-metric-line-plot).

For example:

* Instead of creating separate panels for each layer's metrics, you can view them together in a single panel.
For example, if you log metrics with consistent naming, like `layer_0_loss`, `layer_1_loss`, and `layer_2_loss`, you can use a regex like `layer_\d+_loss` to display all layer losses on one plot.
* Match all metrics that share a common naming pattern. For example:
  * `train_.*` matches all training metrics like `train_loss`, `train_accuracy`, `train_f1_score`
  * `.*_accuracy` matches accuracy metrics across different datasets like `train_accuracy`, `val_accuracy`, `test_accuracy`
* Use alternation to match only the metrics you want. For example, the non-capturing group `(?:layer_0|layer_10)_loss` matches only the `layer_0` and `layer_10` losses, excluding the layers in between.

#### Capture groups

Capture groups in your regular expression control how multi-metric panels are created. The behavior can be surprising if you don't expect it.

* **Capture groups create multiple panels**

  When your regular expression includes parentheses that form a capture group, the UI creates a separate panel for each unique value captured by that group. For example, the expression `(layer_0|layer_10)_loss` includes a capture group and creates two separate panels:

  1. One panel for metrics matching `layer_0`.
  2. One panel for metrics matching `layer_10`.
* **Non-capturing groups keep metrics together**

  To match multiple alternatives without creating separate panels, use a non-capturing group with the `?:` syntax. The expression `(?:layer_0|layer_10)_loss` matches the same metrics as the previous example, but displays them together in a single panel.

Here's the difference:

* `(layer_0|layer_10)_loss`: Creates two panels, one for each layer.
* `(?:layer_0|layer_10)_loss`: Creates one panel showing both layers together.

Use capture groups when you want to separate metrics into distinct panels. Use non-capturing groups when you want to compare metrics together on a single plot.
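Before you enter a pattern in the panel, you can try it against your metric names locally. The following Python sketch is an illustration only: it uses Python's `re` module and assumes whole-name matching, which may differ from how the W\&B App evaluates the pattern. The metric names and the helpers `matching_metrics` and `captured_values` are hypothetical.

```python
import re

# Hypothetical metric names, used only for illustration
metric_names = [
    "layer_0_loss",
    "layer_1_loss",
    "layer_10_loss",
    "train_loss",
    "val_accuracy",
]


def matching_metrics(pattern, names):
    """Return the names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


def captured_values(pattern, names):
    """Return the distinct values captured by the first group, in first-seen order."""
    compiled = re.compile(pattern)
    seen = []
    for name in names:
        match = compiled.fullmatch(name)
        # A pattern without capture groups never contributes a value
        if match and compiled.groups >= 1 and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


print(matching_metrics(r"layer_\d+_loss", metric_names))
print(captured_values(r"(layer_0|layer_10)_loss", metric_names))
print(captured_values(r"(?:layer_0|layer_10)_loss", metric_names))
```

In this sketch, the capturing pattern yields two distinct captured values (`layer_0` and `layer_10`), which corresponds to the two panels described above. The non-capturing pattern yields none, which corresponds to a single combined panel.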
# Point aggregation Source: https://docs.wandb.ai/models/app/features/panels/line-plot/sampling Understand the two point aggregation modes for W&B line plots: full fidelity bucketed sampling and random sampling. Use point aggregation methods within your line plots for improved data visualization accuracy and performance. This page explains the two available aggregation modes and how to configure each one, so you can choose the right tradeoff between detail and rendering speed for your workspace. Two types of point aggregation modes are available: [full fidelity](#full-fidelity) and [random sampling](#random-sampling). W\&B uses full fidelity mode by default. ## Full fidelity Full fidelity is the default aggregation method and preserves the extreme values in your data. This section explains how full fidelity works, its main advantages, and how to turn it on for a single chart or an entire workspace. When you use full fidelity mode, W\&B breaks the x-axis into dynamic buckets. The number of points per line adapts to chart size and the number of runs. It calculates the minimum and maximum values within each bucket (used for optional shading) and uses the last value in each bucket (not the average) to draw the primary line. Full fidelity mode for point aggregation has three main advantages: * Preserve extreme values and spikes: retain extreme values and spikes in your data. * Configure how minimum and maximum points render: use the W\&B App to interactively decide whether you want to show extreme (min/max) values as a shaded area. * Explore your data without losing data fidelity: W\&B recalculates x-axis bucket sizes when you zoom into specific data points. This helps ensure that you can explore your data without losing accuracy. W\&B caches previously computed aggregations to help reduce loading times, which is useful when you navigate through large datasets. ### Turn on full fidelity W\&B uses full fidelity mode by default. 
To configure it manually, follow the steps for either all line plots in a workspace or a single line plot.

For all line plots in a workspace:

1. Navigate to your workspace.
2. Select the gear icon in the top right corner of the screen, to the left of the **Add panels** button.
3. From the UI slider that appears, select **Line plots**.
4. Choose **Full fidelity** from the **Point aggregation** section.
5. Configure the **Smoothing** algorithm and settings.
6. Set **Aggregation** to **Mean**, **Min**, or **Max**.
7. Click **Apply**.

For a single line plot:

1. Navigate to your workspace.
2. Select the **Workspace** icon on the left tab.
3. Hover over the line plot panel you want to configure, then click the gear icon.
4. Within the modal that appears, set **Point aggregation method** to **Full fidelity**.
5. Configure the **Smoothing** algorithm and settings.
6. Click **Apply**.

### Configure shading

Shading visualizes the variability within each bucket so you can see how much points spread around the line. The shaded areas of a full-fidelity line plot can show:

* **Min/Max**: For each x-axis point, shade the area between the minimum and maximum values. The shaded area shows all points from the lowest to the highest value in each bucket:

  ```math theme={null}
  \text{Min/Max Range} = [\min(x_1, x_2, \ldots, x_n),\ \max(x_1, x_2, \ldots, x_n)]
  ```

  where $x_1, x_2, \ldots, x_n$ are the values in a given bucket.
* **Standard deviation**: For each x-axis point, calculate the variability of the values using standard deviation, and shade the resulting area.

  ```math theme={null}
  SD = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \overline{x})^2}
  ```

  where $\overline{x}$ is the mean of the $n$ values in the bucket.
* **Standard error**: For each x-axis point, estimate the sampling error by dividing the standard deviation by the square root of the sample size:

  ```math theme={null}
  SE = \frac{SD}{\sqrt{n}}
  ```
* **None**: No shading (the default).

The following image shows a blue line plot. The light blue shaded area represents the minimum and maximum values for each bucket.

Shaded confidence areas

To configure shading:

1. Navigate to your workspace.
2.
Hover over a line plot, then click the gear icon. 3. In the **Data** tab, set **Point aggregation** to **Full fidelity** if necessary, then configure the smoothing algorithm. 4. In the **Grouping** tab, turn on **Group runs**. Optionally, set **Group by** to a run attribute. 5. Set **Agg** to **Mean** (default), **Min**, or **Max**. 6. Set **Range** to **Min/Max**, **Std Dev**, **Std Err**, or **None**. 7. Click **Apply**. ### Explore your data without losing data fidelity Analyze specific regions of the dataset without missing critical points like extreme values or spikes. When you zoom in on a line plot, W\&B adjusts the bucket sizes used to calculate the minimum, maximum, and last value within each bucket. Plot zoom functionality W\&B uses dynamic binning to divide the x-axis into buckets. The number of points shown per line adapts to chart size and the number of runs: smaller chart dimensions or more runs can reduce the number of points per line so that the chart remains responsive and can display more lines. For each bucket, W\&B calculates the following values: * **Minimum**: The lowest value in that bucket (used for shading). * **Maximum**: The highest value in that bucket (used for shading). * **Line value**: The last value in that bucket, used to draw the line. W\&B plots values in buckets in a way that preserves full data representation and includes extreme values in every plot. The line is drawn from the last value in each bucket. When you zoom in far enough, full fidelity mode can render every data point without additional aggregation. The exact threshold depends on the current chart dimensions and number of runs. To zoom in on a line plot, follow these steps: 1. Navigate to your W\&B project. 2. Select the **Workspace** icon on the left tab. 3. Optional: Add a line plot panel to your workspace or navigate to an existing line plot panel. 4. Click and drag to select a specific region to zoom in on. 
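The per-bucket values described above can be sketched in Python as follows. This is a simplified illustration, not W\&B's implementation: it assumes a fixed number of equal-width buckets, whereas W\&B sizes buckets dynamically based on chart size, zoom level, and number of runs. The function name `bucket_series` is hypothetical.

```python
def bucket_series(xs, ys, num_buckets):
    """Return min, max, and last y-value for each non-empty equal-width x bucket."""
    if not xs:
        return []
    x_min, x_max = min(xs), max(xs)
    # Fall back to a width of 1.0 when every x-value is identical
    width = (x_max - x_min) / num_buckets or 1.0
    buckets = {}
    # Sort by x only, so points keep their logged order within equal x-values
    for x, y in sorted(zip(xs, ys), key=lambda point: point[0]):
        index = min(int((x - x_min) / width), num_buckets - 1)
        buckets.setdefault(index, []).append(y)
    return [
        {"min": min(values), "max": max(values), "last": values[-1]}
        for _, values in sorted(buckets.items())
    ]


steps = list(range(8))
loss = [1, 9, 2, 3, 0, 4, 7, 5]
for bucket in bucket_series(steps, loss, num_buckets=2):
    print(bucket)
```

In this example, the spike of 9 in the first bucket does not become the line value, which is the last value (3), but it remains visible through the bucket's maximum when you turn on Min/Max shading.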
**Line plot grouping and expressions**

When you use line plot grouping, W\&B applies the following based on the mode selected:

* **Non-windowed sampling (grouping)**: Aligns points across runs on the x-axis. W\&B takes the average if multiple points share the same x-value. Otherwise, they appear as discrete points.
* **Windowed sampling (grouping and expressions)**: Divides the x-axis into either 250 buckets or the number of points in the longest line, whichever is smaller. W\&B takes an average of points within each bucket.
* **Full fidelity (grouping and expressions)**: Similar to non-windowed sampling, but fetches up to 500 points per run to balance performance and detail.

## Random sampling

Random sampling is an alternative aggregation method that prioritizes rendering speed over data fidelity. Use it when you need faster chart performance and you don't need to preserve every extreme value.

Random sampling uses 1,500 randomly sampled points to render line plots. Because sampling drops points, spotting outliers or spikes becomes more difficult.

Random sampling is non-deterministic. As a result, it sometimes excludes outliers or spikes in the data and therefore reduces data accuracy.

### Enable random sampling

By default, W\&B uses full fidelity mode. To enable random sampling, follow the steps for either all line plots in a workspace or a single line plot.

For all line plots in a workspace:

1. Navigate to your W\&B project.
2. Select the **Workspace** icon on the left tab.
3. Select the gear icon in the top right corner of the screen, to the left of the **Add panels** button.
4. From the UI slider that appears, select **Line plots**.
5. Choose **Random sampling** from the **Point aggregation** section.

For a single line plot:

1. Navigate to your W\&B project.
2. Select the **Workspace** icon on the left tab.
3. Select the line plot panel you want to enable random sampling for.
4. Within the modal that appears, select **Random sampling** from the **Point aggregation method** section.
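To see why random sampling can hide a spike, consider the following Python sketch. It assumes uniform sampling without replacement, which the source doesn't confirm as W\&B's exact method. The function `random_sample_points` is a hypothetical helper. Under this assumption, a single point in a series of 10,000 points has a 1,500 / 10,000 = 15% chance of appearing in the plot.

```python
import random


def random_sample_points(points, max_points=1500, seed=None):
    """Return at most max_points points chosen uniformly at random, in original order."""
    if len(points) <= max_points:
        return list(points)
    rng = random.Random(seed)
    # Sample indices without replacement, then restore x-axis order
    keep = sorted(rng.sample(range(len(points)), max_points))
    return [points[i] for i in keep]


# A flat series of 10,000 points with one spike at step 5000
points = [(step, 0.0) for step in range(10_000)]
points[5000] = (5000, 100.0)

sampled = random_sample_points(points)
spike_kept = (5000, 100.0) in sampled
print(len(sampled), "points kept; spike visible:", spike_kept)
```

Because the sample changes from one call to the next unless you pass a `seed`, the spike appears in some samples and not in others.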
### Access non-sampled data

If random sampling drops points you need, you can still access the complete, unsampled history of metrics logged during a run with the [W\&B Run API](/models/ref/python/public-api/runs). The following example demonstrates how to retrieve and process the loss values from a specific run:

```python theme={null}
import wandb

# Initialize the W&B API
api = wandb.Api()

# Fetch the run by its "entity/project/run_id" path
run = api.run("l2k2/examples-numpy-boston/i0wt6xua")

# Retrieve the full, unsampled history of the 'Loss' metric
history = run.scan_history(keys=["Loss"])

# Extract the loss values from the history
losses = [row["Loss"] for row in history]
```

# Smooth line plots

Source: https://docs.wandb.ai/models/app/features/panels/line-plot/smoothing

In line plots, use smoothing to see trends in noisy data.

Smoothing helps you spot trends in noisy line plots by reducing point-to-point variation, making the underlying signal easier to read. This page describes the smoothing algorithms W\&B supports, when each one is most useful, and how to control whether the original data remains visible.

W\&B supports several types of smoothing:

* [Time weighted exponential moving average (TWEMA) smoothing](#time-weighted-exponential-moving-average-twema-smoothing-default)
* [Gaussian smoothing](#gaussian-smoothing)
* [Running average](#running-average-smoothing)
* [Exponential moving average (EMA) smoothing](#exponential-moving-average-ema-smoothing)

To see these algorithms applied to real data, see this [interactive W\&B report](https://wandb.ai/carey/smoothing-example/reports/W-B-Smoothing-Features--Vmlldzo1MzY3OTc).

Comparison of various smoothing algorithms applied to a noisy line plot

## Time weighted exponential moving average (TWEMA) smoothing (default)

The time-weighted exponential moving average (TWEMA) smoothing algorithm is a technique for smoothing time series data by exponentially decaying the weight of previous points.
For details about the technique, see [Exponential Smoothing](https://www.wikiwand.com/en/Exponential_smoothing). The smoothing parameter ranges from 0 to 1. A debias term is added so that early values in the time series aren't biased towards zero.

The TWEMA algorithm takes the density of points on the line (the number of `y` values per unit of range on the x-axis) into account. This allows consistent smoothing when displaying multiple lines with different characteristics simultaneously.

The following sample code shows how this works under the hood:

```javascript theme={null}
const smoothingWeight = Math.min(Math.sqrt(smoothingParam || 0), 0.999);
let lastY = yValues.length > 0 ? 0 : NaN;
let debiasWeight = 0;

return yValues.map((yPoint, index) => {
  const prevX = index > 0 ? index - 1 : 0;
  // VIEWPORT_SCALE scales the result to the chart's x-axis range
  const changeInX =
    ((xValues[index] - xValues[prevX]) / rangeOfX) * VIEWPORT_SCALE;
  const smoothingWeightAdj = Math.pow(smoothingWeight, changeInX);

  lastY = lastY * smoothingWeightAdj + yPoint;
  debiasWeight = debiasWeight * smoothingWeightAdj + 1;
  return lastY / debiasWeight;
});
```

To see this algorithm applied to live data, see the [TWEMA section of the interactive W\&B report](https://wandb.ai/carey/smoothing-example/reports/W-B-Smoothing-Features--Vmlldzo1MzY3OTc).

Line plot with TWEMA smoothing applied

## Gaussian smoothing

Gaussian smoothing (or Gaussian kernel smoothing) computes a weighted average of the points, where the weights correspond to a Gaussian distribution with the standard deviation specified as the smoothing parameter. W\&B calculates the smoothed value for every input `x` value, based on the points that occur both before and after it.

To see this algorithm applied to live data, see the [Gaussian smoothing section of the interactive W\&B report](https://wandb.ai/carey/smoothing-example/reports/W-B-Smoothing-Features--Vmlldzo1MzY3OTc#3.-gaussian-smoothing).
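The weighted average described above can be sketched in Python as follows. This is a minimal illustration of Gaussian kernel smoothing, not W\&B's implementation. It assumes the standard deviation `sigma` is expressed in x-axis units and that every point contributes to every smoothed value. How W\&B maps the smoothing parameter to a standard deviation and how it handles the edges of the series are not covered here. The function name `gaussian_smooth` is hypothetical.

```python
import math


def gaussian_smooth(xs, ys, sigma):
    """Smooth ys with a Gaussian-weighted average centered on each x-value."""
    if sigma <= 0:
        # No smoothing: return the values unchanged
        return list(ys)
    smoothed = []
    for center in xs:
        # Points closer to the center receive larger weights
        weights = [
            math.exp(-((x - center) ** 2) / (2 * sigma ** 2)) for x in xs
        ]
        weighted_sum = sum(w * y for w, y in zip(weights, ys))
        smoothed.append(weighted_sum / sum(weights))
    return smoothed


steps = [0, 1, 2, 3, 4]
noisy = [0.0, 1.0, 0.0, 1.0, 0.0]
print(gaussian_smooth(steps, noisy, sigma=1.0))
```

Because the weights depend on distance along the x-axis rather than on point count, this approach also handles unevenly spaced points.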
Line plot with Gaussian smoothing applied ## Running average smoothing Running average is a smoothing algorithm that replaces a point with the average of points in a window before and after the given `x` value. See ["Boxcar Filter" on Wikipedia](https://en.wikipedia.org/wiki/Moving_average). The selected parameter for running average specifies the number of points to consider in the moving average. If your points are spaced unevenly on the x-axis, use Gaussian smoothing instead, because a fixed-width window can produce misleading averages when point density varies. To see this algorithm applied to live data, see the [running average section of the interactive W\&B report](https://wandb.ai/carey/smoothing-example/reports/W-B-Smoothing-Features--Vmlldzo1MzY3OTc#4.-running-average). Line plot with running average smoothing applied ## Exponential moving average (EMA) smoothing The exponential moving average (EMA) smoothing algorithm is a heuristic technique for smoothing time series data using the exponential window function. For details about the technique, see [Exponential Smoothing](https://www.wikiwand.com/en/Exponential_smoothing). The range is 0 to 1. A debias term is added so that early values in the time series aren't biased towards zero. In most cases, EMA smoothing applies to a full scan of history, rather than bucketing first before smoothing. This typically produces more accurate smoothing. 
In the following situations, EMA smoothing is applied after bucketing instead:

* Sampling
* Grouping
* Expressions
* Non-monotonic x-axes
* Time-based x-axes

The following sample code shows how this works under the hood:

```javascript theme={null}
// Running state, assumed to start at zero before the first point
let last = 0;
let numAccum = 0;
let debiasWeight = 0;
const smoothedData = [];

data.forEach(d => {
  const nextVal = d;
  last = last * smoothingWeight + (1 - smoothingWeight) * nextVal;
  numAccum++;
  // Correct the bias toward zero introduced by the zero starting value
  debiasWeight = 1.0 - Math.pow(smoothingWeight, numAccum);
  smoothedData.push(last / debiasWeight);
});
```

To see this algorithm applied to live data, see the [EMA section of the interactive W\&B report](https://wandb.ai/carey/smoothing-example/reports/W-B-Smoothing-Features--Vmlldzo1MzY3OTc).

Line plot with EMA smoothing applied

## Hide original data

Compare the smoothed line to the raw data to judge how aggressively smoothing alters the signal. By default, the original unsmoothed data displays in the plot as a faint line in the background. Click **Show Original** to turn this off.

Toggling the display of the original unsmoothed data in a line plot

# Media panels

Source: https://docs.wandb.ai/models/app/features/panels/media

Add and configure media panels for images, video, audio, 3D objects, and point clouds in a W&B workspace.

A media panel visualizes [logged keys for media objects](/models/track/log/media/), including 3D objects, audio, images, video, or point clouds. This page shows how to add and manage media panels in a workspace.

Demo of a media panel

## Add a media panel

You can add a media panel using Quick Add (which applies the default configuration) or by adding it manually so you can configure it as you create it. In either case, you can add the panel globally to the workspace or to a specific section.

To add a media panel for a logged key using the default configuration, use Quick Add:

1. **Global**: Click **Add panels** in the control bar near the panel search field.
2. **Section**: Click the section's **action ()** menu, then click **Add panels**.
3.
In the list of available panels, find the key for the panel, then click **Add**. Repeat this step for each media panel you want to add, then click the **X** at the top right to close the **Quick Add** list. 4. Optionally, [configure the panel](#configure-a-media-panel). To add a media panel manually so that you can configure it during creation: 1. **Global**: Click **Add panels** in the control bar near the panel search field. 2. **Section**: Click the section's **action ()** menu, then click **Add panels**. 3. Click the **Media** section to expand it. 4. Select the type of media the panel visualizes, 3D objects, images, video, or audio. The panel configuration screen displays. Configure the panel, then click **Apply**. Refer to [Configure a media panel](#configure-a-media-panel). ## Configure a media panel Panels for all media types have the same options, so the following steps apply regardless of which media type you're working with. When you add a media panel manually, its configuration page opens after you select the type of media. To update the configuration for an existing panel, hover over the panel, then click the gear icon that appears at the top right. The following sections describe the settings available in each tab. ### Overlays This tab appears for images and point clouds logged with segmentation masks or bounding boxes. Use it to: * Search and filter overlays by name. * Customize overlay colors. ### Sync This tab appears in the workspace and section settings, and provides the following settings: * **Sync slider by key**: Configure whether the step sliders for videos in the section move in sync. * **Autoplay videos**: Configure whether videos start playing when the page loads. * **Loop videos**: Configure whether videos in the section restart automatically and play continuously until stopped. Not customizable at the section level. Appears only if the workspace has video media panels. 
### Display Customize the panel's overall appearance and behavior: * Configure the panel's title. * Select the media keys to visualize. * Customize the panel's slider and playback behavior. * Configure the slider key, which defaults to **Step**. * Set **Stride length** to the number of steps to advance for each click of the slider. * Turn on or off **Snap to existing step**. If it's turned on, the stepper advances to the next existing step after **Stride length**. Otherwise, it advances by **Stride length** even if that doesn't align with an existing step. * **Images**: Turn on or off smoothing. * **Videos**: Configure the video seek step interval, which controls how far forward or backward to skip when using the left and right arrow keys. * **3D objects**: Configure the background color and point color. ### Layout Customize the display of the panel's individual items: * Optionally limit the **Max runs to include** in the panel. * Optionally specify a **Media display limit** to limit the number of media items to include per run. * Set **Panel mode**: * **Gallery** (default): Specify the number of columns and whether the grid shows a single item per run (default), step, or index. * **Grid**: Specify the number of columns, the x-axis (defaults to **Log Step**), and the y-axis (defaults to **Run**). * **Compare**: Compare up to four media items side by side, optionally fanning out by step or index. See [Compare mode](#compare-mode) for details. * **Use original size**: If off (default), W\&B scales images and videos in the panel to appear the same size. No effect in **Compare** mode. * Configure media fit and scaling behavior: * **Images**: Choose one of **Original image**, **Original size**, or **Fit to available space**. With **Fit to available space**, W\&B scales images in the panel to take up as much room as possible. In **Gallery** mode, each image fills the panel width. In **Grid** mode, each image fills the grid width. No effect in **Compare** mode.
* **Videos**: Toggle **Use original size** to control whether to scale media to appear the same size. If inactive (default), W\&B scales videos in the panel to take up as much room as possible. In **Gallery** mode, each video fills the panel width. In **Grid** mode, each video fills the grid width. No effect in **Compare** mode. * **Right-handed system**: If on, point clouds use the right-handed system for plotting points, rather than the default left-handed system. ### All media panels in a section Configure section-level defaults when you want every media panel in a section to share the same settings, overriding any workspace-level defaults. To customize the default settings for all media panels in a section, overriding workspace settings for media panels: 1. Click the section's gear icon to open its settings. 2. Click **Media settings**. 3. Within the drawer that appears, click the **Display**, **Layout**, or **Sync** tab to configure the default media settings for the section. You can configure settings for images, videos, audio, and 3D objects. The settings that appear depend on the section's current media panels. Refer to [Configure a media panel](#configure-a-media-panel) for details about a specific setting for **Display** or **Layout** media setting. The **Sync** tab is available only at the section or workspace level, not for individual media panels. When **Step slider syncing** is turned on, the section's media panels with the same step slider are kept in sync. To turn on step slider syncing: 1. Click the **Sync** tab. 2. Turn on **Sync slider by key (Step)**. ### All media panels in a workspace Configure workspace-level defaults when you want a consistent baseline for media panels across every section in the workspace. To customize the default settings for all media panels in a workspace: 1. Click the workspace's settings, which has a gear with the label **Settings**. 2. Click **Media settings**. 3. 
Within the drawer that appears, click the **Display**, **Layout**, or **Sync** tab to configure the default media settings for the workspace. You can configure settings for images, videos, audio, and 3D objects. The settings that appear depend on the workspace's current media panels. Except for the **Sync** tab, refer to [Configure a media panel](#configure-a-media-panel) for details about a setting. The **Sync** tab is available only at the section or workspace level, not for individual media panels. When **Step slider syncing** is turned on, the workspace's media panels with the same step slider are kept in sync. To turn on step slider syncing: 1. Click the **Sync** tab. 2. Turn on **Sync slider by key (Step)**. ## Compare mode Compare mode lets you select two to four images or videos from any combination of runs, steps, or indices and view them in a single grid. You can compare media without downloading files or switching tabs. Use compare mode to: * Confirm that a new checkpoint or run produces better outputs than a baseline or previous checkpoint. * Catch regressions by comparing the current run to a known good checkpoint at the same step or index. * Validate model outputs against ground truth by viewing reference and generated media side by side. * Review how generated images or videos change across training steps or across different checkpoints or runs. Compare mode is useful for image or video generation workflows like diffusion or generative models, multimodal evaluation, vision experiments, and other workflows where you log media and need to compare it across runs, steps, or indices. To use compare mode: 1. Open a media panel in your workspace (for images or video). 2. Set **Panel mode** to **Compare**. 3. Configure the grid's dimensions and density by specifying the number of columns and the fan-out mode. The best grid geometry depends on how your runs log media. 
* **Number of columns**: How many items to show per row, from `1` to `4`. * **Fan out**: Control the granularity of media to display per grid row. The best setting depends on how many media items you log per step per run, and whether you need a quick spot check or deep analysis. * **None** (default): No fan-out. Each grid column represents a run, and each grid row represents a logged image or video. Media panel in compare mode with two runs and no fan out * **Step**: Fan out along the step axis. Each grid column represents a run, and each grid row shows the first image or video logged at a run:step combination. Media panel in compare mode with two runs and fan out by step * **Index**: Fan out along the index axis, useful when you log several items per step (for example, `run.log({"images": [img0, img1, img2, img3]})`). Each row cell shows the image or video logged at a given run:step:index combination. Media panel in compare mode with two runs and fan out by index 4. Configure whether a given variable affects a single tile (**Individual**) or all tiles (**Linked**). * **Media key**: Defaults to **Linked**. * **Run**: Defaults to **Individual**. * **Step**: Defaults to **Linked**. * **Index**: Defaults to **Linked**. W\&B saves your selection so you can return to the same comparison later without reconfiguring the panel. ## Interact with a media panel After you add and configure a media panel, use the following interactions to step through, compare, and inspect logged media: * To configure a media panel, hover over it and click the gear icon at the top. * To move a media panel's step slider, use **Cmd + left or right arrow key** (macOS) or **Ctrl + left or right arrow key** (Windows or Linux). If **Sync slider by key** is turned on for the section or workspace, moving the step slider in one media panel also moves the step slider in other media panels with the same step slider key. * Use the stepper at the top of a media panel to step through media runs. 
To move the step slider, use the UI controls. * Click a media panel to view it in full-screen mode. Click the arrow button at the top of the panel to exit full-screen mode. * To navigate through a section's panels without exiting full-screen mode, use either the **Previous** and **Next** buttons below the panel or the left and right arrow keys. * When viewing an image in full-screen mode, click the zoom control at the top right to zoom in or out. By default, images zoom to fit. Click and drag to pan a zoomed-in image. You can also zoom in, zoom out, reset to 100% zoom, or zoom to fit using keyboard shortcuts. See [Keyboard shortcuts](/models/app/keyboard-shortcuts#media-panels). * For an image that was logged with segmentation masks, you can customize their appearance or turn each one on or off. Hover over the panel, then click the lower gear icon. * For an image or point cloud that was logged with bounding boxes, you can customize their appearance or turn each one on or off. Hover over the panel, then click the lower gear icon. * Use the media controls to play, pause, or stop video playback. Use the left and right arrow keys to skip forward or backward by the configured **video seek step**. If **Sync video playback** is turned on, all videos in the section play in sync. If **Loop videos** is turned on, videos in the section restart automatically and play continuously until stopped. # Parallel coordinates Source: https://docs.wandb.ai/models/app/features/panels/parallel-coordinates Compare results across machine learning experiments Parallel coordinates charts summarize the relationship between large numbers of hyperparameters and model metrics at a glance. Parallel coordinates plot A parallel coordinates panel has the following components: * **Axes**: Hyperparameters from [`wandb.Run.config`](/models/tables/evaluate-models) and metrics from [`wandb.Run.log()`](/models/tables/evaluate-models). * **Lines**: Each line represents a single run. 
Hover over a line to see a tooltip with details about the run. All lines matching the current filters appear. If you turn off the eye icon for a run, its line is dimmed. ## Create a parallel coordinates panel To add a parallel coordinates panel to your workspace: 1. Navigate to the landing page for your workspace. 2. Click **Add Panels**. 3. Select **Parallel coordinates**. ## Panel settings To configure the panel, click the edit button in the upper-right corner of the panel. You can adjust the following settings: * **Tooltip**: On hover, a legend appears with information about each run. * **Titles**: Edit the axis titles to make them more readable. * **Gradient**: Customize the gradient to any color range. * **Log scale**: Set each axis to use a log scale independently. * **Flip axis**: Switch the axis direction. This is useful when you have both accuracy and loss as columns. For an interactive example, see [a live parallel coordinates panel](https://app.wandb.ai/example-team/sweep-demo/reports/Zoom-in-on-Parallel-Coordinates-Charts--Vmlldzo5MTQ4Nw). # Parameter importance Source: https://docs.wandb.ai/models/app/features/panels/parameter-importance Visualize the relationships between your model's hyperparameters and output metrics The parameter importance panel helps you discover which of your hyperparameters are the best predictors of, and highly correlated to, desirable values of your metrics. Use this panel to focus future hyperparameter searches on the parameters that matter most for model performance. Parameter importance panel **Correlation** is the linear correlation between the hyperparameter and the chosen metric (in this case, `val_loss`). A high correlation means that when the hyperparameter has a higher value, the metric also has higher values, and vice versa. Correlation is a useful metric, but it can't capture second-order interactions between inputs, and it can get messy when you compare inputs with different ranges. 
W\&B also calculates an **importance** metric. W\&B trains a random forest with the hyperparameters as inputs and the metric as the target output, then reports the feature importance values for the random forest. A conversation with [Jeremy Howard](https://twitter.com/jeremyphoward) inspired this technique. Jeremy pioneered the use of random forest feature importances to explore hyperparameter spaces at [Fast.ai](https://fast.ai). For more information about the motivation behind this analysis, see the [Fast.ai lesson 4 lecture](https://course18.fast.ai/lessonsml1/lesson4.html) and the [Fast.ai lesson 4 forum notes](https://forums.fast.ai/t/wiki-lesson-thread-lesson-4/7540). The hyperparameter importance panel untangles the complicated interactions between highly correlated hyperparameters. In doing so, it helps you fine-tune your hyperparameter searches by showing you which of your hyperparameters matter the most for predicting model performance. ## Create a hyperparameter importance panel To add a hyperparameter importance panel to your workspace: 1. Navigate to your W\&B project. 2. Select the **Add panels** button. 3. Expand the **CHARTS** dropdown, then choose **Parameter importance**. If an empty panel appears, make sure that your runs are ungrouped. Automatic parameter visualization With the parameter manager, you can manually set the visible and hidden parameters. Manually setting the visible and hidden fields ## Interpret a hyperparameter importance panel The following sections describe how to read the importance and correlation values shown in the panel so you can act on them in your next sweep. Feature importance analysis This panel shows you all the parameters passed to the [`wandb.Run.config`](/models/track/config/) object in your training script. It also shows the feature importances and correlations of these config parameters with respect to the model metric you select (`val_loss` in this case). 
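
To make the importance calculation more concrete, the following sketch trains a random forest on a small, made-up sweep and reads back its feature importances. It assumes scikit-learn's `RandomForestRegressor`. This page says only that W\&B trains "a random forest", so the estimator, its settings, and the data below are illustrative assumptions, not W\&B's actual implementation.

```python
from sklearn.ensemble import RandomForestRegressor

# Hypothetical sweep: every combination of 5 learning rates and 4 batch sizes.
learning_rates = [0.001, 0.003, 0.01, 0.03, 0.1]
batch_sizes = [16, 32, 64, 128]
X = [[lr, bs] for lr in learning_rates for bs in batch_sizes]

# Made-up val_loss that depends mostly on learning rate and only slightly on batch size.
y = [10 * lr + 0.0001 * bs for lr, bs in X]

# Hyperparameters are the inputs; the metric is the target output.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)

# Pair each hyperparameter name with its importance score.
importances = dict(zip(["learning_rate", "batch_size"], forest.feature_importances_))
for name, value in sorted(importances.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

Because the made-up metric is driven almost entirely by the learning rate, that hyperparameter should receive nearly all of the importance. In a real sweep, the ranking suggests which hyperparameters to keep in the next search.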
### Importance The importance column shows you the degree to which each hyperparameter was useful in predicting the chosen metric. Imagine a scenario where you start tuning many hyperparameters and use this plot to focus on which ones merit further exploration. You can then limit subsequent sweeps to the most important hyperparameters, which helps you find a better model faster and at lower cost. W\&B calculates importances using a tree-based model rather than a linear model, because tree-based models are more tolerant of both categorical data and data that isn't normalized. In the preceding image, you can see that `epochs`, `learning_rate`, `batch_size`, and `weight_decay` were important. ### Correlations Correlations capture linear relationships between individual hyperparameters and metric values. They answer the question of whether a relationship exists between using a hyperparameter, such as the SGD optimizer, and the `val_loss` (in this case, the answer is yes). Correlation values range from -1 to 1, where positive values represent positive linear correlation, negative values represent negative linear correlation, and a value of 0 represents no correlation. Generally, a value greater than 0.7 in either direction represents strong correlation. You might use this graph to further explore the values that have a higher correlation to your metric (in this case you might pick stochastic gradient descent or `adam` over `rmsprop` or `nadam`) or train for more epochs. * Correlations show evidence of association, not necessarily causation. * Correlations are sensitive to outliers, which might turn a strong relationship into a moderate one, especially when the sample size of hyperparameters tried is small. * Correlations only capture linear relationships between hyperparameters and metrics. Correlations don't capture strong polynomial relationships. 
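
The last limitation in the preceding list can be checked with a few lines of arithmetic. The following sketch uses a hand-written Pearson correlation helper and hypothetical hyperparameter values (neither comes from W\&B) to compare a straight-line relationship with a U-shaped one.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical hyperparameter values, centered on zero.
values = [-3, -2, -1, 0, 1, 2, 3]

linear_metric = [2 * v + 1 for v in values]  # straight-line relationship
quadratic_metric = [v ** 2 for v in values]  # U-shaped relationship

linear_r = pearson(values, linear_metric)
quadratic_r = pearson(values, quadratic_metric)

print(f"linear: {linear_r:.2f}, quadratic: {quadratic_r:.2f}")
```

The quadratic metric is fully determined by the hyperparameter, yet its correlation is zero because the relationship is symmetric rather than linear. A tree-based importance score can still pick up this kind of dependence, which is one reason the two columns in the panel can disagree.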
Importance and correlation values can differ because importance accounts for interactions between hyperparameters, whereas correlation only measures the effect of each hyperparameter on the metric individually. Correlations also capture only linear relationships, whereas importances can capture more complex ones. Together, importance and correlations help you understand how your hyperparameters influence model performance. # Query panels overview Source: https://docs.wandb.ai/models/app/features/panels/query-panels Some features on this page are in beta, hidden behind a feature flag. Add `weave-plot` to your bio on your profile page to unlock all related features. Looking for W\&B Weave, W\&B's suite of tools for generative AI application building? See the [Weave documentation](https://wandb.github.io/weave/?utm_source=wandb_docs\&utm_medium=docs\&utm_campaign=weave-nudge). Use query panels to query and interactively visualize your data. With a query panel, you can pull specific runs, artifacts, tables, and other W\&B objects into a single view and explore them as tables or plots without leaving your workspace or report. This page is for users who want to compose ad hoc queries against logged W\&B data and surface the results inside a workspace or report. A query panel combines three pieces: * **[Expression](#expressions)**: The data you select. * **[Configuration](#configurations)**: Optional settings for the panel, such as the panel type and options from the gear menu. * **[Result panel](#result-panels)**: How to show the results, such as in a table or plot. For a set of interactive examples you can try, see a public [Query panel examples report](https://wandb.ai/luis_team_test/weave_example_queries/reports/Query-Panel-Examples---Vmlldzo1NzIxOTY2). For a guided walkthrough of query syntax, see the [Query panel tutorial report](https://wandb.ai/luis_team_test/weave_example_queries/reports/Weave-queries---Vmlldzo1NzIxOTY2). 
Generated types and ops are listed in the [query expression language overview](/models/ref/query-panel). Query panel ## Create a query panel Add a query panel so you have a surface to write expressions against and visualize the results. You can add one to a project workspace or within a report. To add a query panel to a project workspace: 1. Navigate to your project's workspace. 2. In the upper-right corner, click **Add panel**. 3. From the dropdown menu, select **Query panel**. To add a query panel within a report, type and select **/Query panel**. Query panel option Alternatively, you can associate a query with a set of runs: 1. Within your report, type and select **/Panel grid**. 2. Click the **Add panel** button. 3. From the dropdown menu, select **Query panel**. ## Query components The following sections describe the three pieces that make up a query panel: the expression that selects data, the configuration that controls how the panel behaves, and the result panel that renders the output. ### Expressions Use query expressions to query your data stored in W\&B such as runs, artifacts, models, tables, and more. #### Example: Query a table Suppose you want to query a W\&B Table. In your training code, you log a table called `"cifar10_sample_table"`:

```python theme={null}
import wandb

with wandb.init() as run:
    # [MY-TABLE] is a placeholder for the table you want to log.
    run.log({"cifar10_sample_table": [MY-TABLE]})
```

Within the query panel, you can query your table with:

```python theme={null}
runs.summary["cifar10_sample_table"]
```

Table query expression Breaking this down: * `runs` is a variable automatically injected in query panel expressions when the query panel is in a workspace. Its value is the list of runs visible for that workspace. For details about the different attributes available within a run, see [Understanding the different attributes](/models/track/public-api-guide/#understanding-the-different-attributes). * `summary` is an op that returns the Summary object for a run. Ops are *mapped*, meaning this op is applied to each run in the list, resulting in a list of Summary objects. 
* `["cifar10_sample_table"]` is a Pick op (denoted with brackets) with a key of `cifar10_sample_table`. Because Summary objects act like dictionaries or maps, this operation picks that field from each Summary object. ### Configurations In the upper-left corner of the panel, click the gear icon to expand the query configuration. The configuration lets you set the type of panel and the parameters for the result panel. Panel configuration menu #### Panel options The configuration menu can include options that change how the panel combines or loads table-style results. Exact labels and availability depend on your expression and panel type. For concrete setups, see the [Query panel examples report](https://wandb.ai/luis_team_test/weave_example_queries/reports/Query-Panel-Examples---Vmlldzo1NzIxOTY2). **Concat** Use **Concat** in the configuration when you want the panel to merge compatible table-style results and treat them as a single table for viewing and downstream operations. Expression-level row merging (for example, `concat` and `join` in the query) is separate from this setting. For more information, see [Combine tables in expressions](#combine-tables-in-expressions). **Paginate** Use **Paginate** when a table result might be too large to render at once. Pagination loads rows in chunks so the panel stays responsive. Pair this option with expressions that return large row lists. For patterns that work well with pagination, see the [Query panel examples report](https://wandb.ai/luis_team_test/weave_example_queries/reports/Query-Panel-Examples---Vmlldzo1NzIxOTY2). ### Result panels The query result panel renders the result of the query expression, using the selected query panel, configured to display the data in an interactive form. The following images show a Table and a Plot of the same data. 
Table result panel Plot result panel ### Step through run history In tables and plots built from `runs` or `runs.history`, the app can show a **step** control (for example, a slider) so you can move through logged steps and inspect metrics, text, or media over the course of your runs. After you change the expression, edit the query panel's configuration and change **Render As** to **Stepper**. The control can follow a different metric instead of `_step` if it better matches how you logged data. For sample expressions, see the [Query panel examples report](https://wandb.ai/luis_team_test/weave_example_queries/reports/Query-Panel-Examples---Vmlldzo1NzIxOTY2). ## Basic operations After a query panel renders results, you can refine what you see by sorting, filtering, mapping, or grouping the rows. The following are common operations you can perform within your query panels. ### Sort Sort from the column options: Column sort options ### Filter You can either filter directly in the query or use the filter button in the upper-left corner of the panel. Query filter syntax Filter button ### Map Map operations iterate over lists and apply a function to each element in the data. You can do this directly with a panel query or by inserting a new column from the column options. Map operation query Map column insertion ### Group by You can group by using a query or from the column options. Group by query Group by column options ### Combine tables in expressions Use `concat`, `join`, and related ops in your expression when you need to stack or merge row lists from tables. See [Join](#join) for a full example. The **Concat** and **Paginate** items in [Panel options](#panel-options) are separate controls for how the UI merges and loads table results. ### Join You can also join tables directly in the query. 
Consider the following query expression:

```python theme={null}
project("luis_team_test", "weave_example_queries").runs.summary["short_table_0"].table.rows.concat.join(\
project("luis_team_test", "weave_example_queries").runs.summary["short_table_1"].table.rows.concat,\
(row) => row["Label"],(row) => row["Label"], "Table1", "Table2",\
"false", "false")
```

Table join operation The table on the left is generated from:

```python theme={null}
project("luis_team_test", "weave_example_queries").\
runs.summary["short_table_0"].table.rows.concat
```

The table on the right is generated from:

```python theme={null}
project("luis_team_test", "weave_example_queries").\
runs.summary["short_table_1"].table.rows.concat
```

Where: * `(row) => row["Label"]` are selectors for each table, determining which column to join on. * `"Table1"` and `"Table2"` are the names of each table when joined. * The last two arguments, `"false"` and `"false"` in this example, are the left and right inner/outer join settings. ## Runs object Use query panels to access the `runs` object. Run objects store records of your experiments. For more details, see [Accessing runs object](https://wandb.ai/luis_team_test/weave_example_queries/reports/Weave-queries---Vmlldzo1NzIxOTY2#3.-accessing-runs-object). As a quick overview, the `runs` object has the following available: * `summary`: A dictionary of information that summarizes the run's results. The summary can contain scalars like accuracy and loss, or large files. By default, `wandb.Run.log()` sets the summary to the final value of a logged time series. You can set the contents of the summary directly. Think of the summary as the run's outputs. * `history`: A list of dictionaries meant to store values that change while the model is training, such as loss. The command `wandb.Run.log()` appends to this object. * `config`: A dictionary of the run's configuration information, such as the hyperparameters for a training run or the preprocessing methods for a run that creates a dataset artifact. 
Think of these as the run's inputs. Runs object structure ## Access artifacts Artifacts are a core concept in W\&B. They are a versioned, named collection of files and directories. Use artifacts to track model weights, datasets, and any other file or directory. W\&B stores artifacts, and you can download them or use them in other runs. For more details and examples, see [Accessing artifacts](https://wandb.ai/luis_team_test/weave_example_queries/reports/Weave-queries---Vmlldzo1NzIxOTY2#4.-accessing-artifacts). You normally access artifacts from the `project` object: * `project.artifactVersion()`: Returns the specific artifact version for a given name and version within a project. * `project.artifact("")`: Returns the artifact for a given name within a project. You can then use `.versions` to get a list of all versions of this artifact. * `project.artifactType()`: Returns the `artifactType` for a given name within a project. You can then use `.artifacts` to get a list of all artifacts with this type. * `project.artifactTypes`: Returns a list of all artifact types under the project. Artifact access methods # Embed objects Source: https://docs.wandb.ai/models/app/features/panels/query-panels/embedding-projector W&B's Embedding Projector lets you plot multi-dimensional embeddings on a 2D plane using common dimension reduction algorithms like PCA, UMAP, and t-SNE. Embedding projector [Embeddings](https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture) represent objects such as people, images, posts, or words with a list of numbers, sometimes referred to as a *vector*. In machine learning and data science use cases, you can generate embeddings using a variety of approaches across a range of applications. This page assumes you're familiar with embeddings and want to visually analyze them inside W\&B. 
This guide shows you how to log embeddings to W\&B and use the Embedding Projector to plot them on a 2D plane with dimension reduction algorithms such as PCA, UMAP, and t-SNE. Visualizing embeddings this way helps you explore clusters, inspect relationships between data points, and validate that your embeddings capture the structure you expect. ## Embedding examples The following resources demonstrate the Embedding Projector in action before you try it yourself: * [Live interactive demo report](https://wandb.ai/timssweeney/toy_datasets/reports/Feature-Report-W-B-Embeddings-Projector--VmlldzoxMjg2MjY4?accessToken=bo36zrgl0gref1th5nj59nrft9rc4r71s53zr2qvqlz68jwn8d8yyjdz73cqfyhq) * [Example Colab](https://colab.research.google.com/drive/1DaKL4lZVh3ETyYEM1oJ46ffjpGs8glXA#scrollTo=D--9i6-gXBm_) ### Hello world This minimal example shows the smallest amount of code needed to log embeddings and view them in the projector. W\&B lets you log embeddings using the `wandb.Table` class. Consider the following example of three embeddings, each consisting of five dimensions:

```python theme={null}
import wandb

# The context manager finishes the run when the block exits.
with wandb.init(project="embedding_tutorial") as run:
    embeddings = [
        # D1   D2   D3   D4   D5
        [0.2, 0.4, 0.1, 0.7, 0.5],  # embedding 1
        [0.3, 0.1, 0.9, 0.2, 0.7],  # embedding 2
        [0.4, 0.5, 0.2, 0.2, 0.1],  # embedding 3
    ]
    # Log one table column per embedding dimension.
    run.log(
        {"embeddings": wandb.Table(columns=["D1", "D2", "D3", "D4", "D5"], data=embeddings)}
    )
```

After you run the preceding code, the W\&B dashboard contains a new Table with your data. Select **2D Projection** from the upper-right panel selector to plot the embeddings in two dimensions. W\&B automatically selects smart defaults, which you can override in the configuration menu by clicking the gear icon. In this example, W\&B uses all five available numeric dimensions. 2D projection example ### Digits MNIST The next example demonstrates a more realistic workflow with higher-dimensional data and richer overlays. 
While the preceding example shows the basic mechanics of logging embeddings, you typically work with many more dimensions and samples. Consider the MNIST Digits dataset ([UCI ML hand-written digits dataset](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits)) made available through [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html). This dataset has 1,797 records, each with 64 dimensions. The problem is a 10-class classification use case. You can also convert the input data to an image for visualization.

```python theme={null}
import wandb
from sklearn.datasets import load_digits

with wandb.init(project="embedding_tutorial") as run:
    # Load the dataset as a DataFrame with one column per pixel
    ds = load_digits(as_frame=True)
    df = ds.data

    # Create a "target" column and move it to the front
    df["target"] = ds.target.astype(str)
    cols = df.columns.tolist()
    df = df[cols[-1:] + cols[:-1]]

    # Create an "image" column from the 64 pixel values and move it to the front
    df["image"] = df.apply(
        lambda row: wandb.Image(row[1:].values.reshape(8, 8) / 16.0), axis=1
    )
    cols = df.columns.tolist()
    df = df[cols[-1:] + cols[:-1]]

    run.log({"digits": df})
```

After you run the preceding code, the UI again presents a Table. Select **2D Projection** to configure the embedding definition, coloring, algorithm (PCA, UMAP, t-SNE), algorithm parameters, and overlay. In this case, W\&B shows the image when you hover over a point. These are all smart defaults, and you should see something similar with a single click of **2D Projection**. [Interact with this embedding tutorial example](https://wandb.ai/timssweeney/embedding_tutorial/runs/k6guxhum?workspace=user-timssweeney). MNIST digits projection ## Logging options The following sections describe the supported ways to structure embedding data when you log it to W\&B. You can log embeddings in several formats: * **Single embedding column:** Often your data is already in a matrix-like format. 
In this case, you can create a single embedding column, where the data type of the cell values can be `list[int]`, `list[float]`, or `np.ndarray`. * **Multiple numeric columns:** The preceding two examples use this approach and create a column for each dimension. W\&B accepts Python `int` or `float` for the cells. Single embedding column Multiple numeric columns Just like all tables, you have several options for how to construct the table: * Directly from a **dataframe** using `wandb.Table(dataframe=df)`. * Directly from a **list of data** using `wandb.Table(data=[...], columns=[...])`. * Build the table **incrementally row by row** (great if you have a loop in your code). Add rows to your table using `table.add_data(...)`. * Add an **embedding column** to your table (great if you have a list of predictions in the form of embeddings): `table.add_column("col_name", ...)`. * Add a **computed column** (great if you have a function or model you want to map over your table): `table.add_computed_columns(lambda row, ndx: {"embedding": model.predict(row)})`. ## Plotting options After you log your embeddings, you can adjust how they are projected and rendered. After you select **2D Projection**, click the gear icon to edit the rendering settings. Besides selecting the intended columns (see preceding sections), you can select an algorithm of interest along with the desired parameters. The following images show the parameters for UMAP and t-SNE. UMAP parameters t-SNE parameters W\&B downsamples to a random subset of 1,000 rows and 50 dimensions for all three algorithms. # Compare run metrics Source: https://docs.wandb.ai/models/app/features/panels/run-comparer Use the Run Comparer panel to view and compare configuration and metric differences across W&B experiment runs. Use the Run Comparer to see differences and similarities across runs in your project. 
This helps you quickly identify how configuration changes affect metrics so you can decide which experiments to pursue further. ## Add a Run Comparer panel Add the Run Comparer panel to your workspace before you can use it to compare runs. 1. Select the **Add panels** button in the top right corner of the page. 2. From the **Evaluation** section, select **Run comparer**. ## Use Run Comparer Run Comparer shows the configuration and logged metrics for the first 10 visible runs in the project, one column per run. After you add the panel, use the following options to explore and refine your comparison: * To change the runs to compare, search, filter, group, or sort the list of runs on the left. The Run Comparer updates automatically. * To filter or search for a configuration key or a metadata key such as the Python version or the run's creation time, use the search field at the top of the Run Comparer. * To see differences and hide identical values, toggle **Diff only** at the top of the panel. * To adjust the column width or row height, use the formatting buttons at the top of the panel. * To copy any configuration or metric's value, hover over the value, then click the copy button. The entire value is copied, even if it's too long to display on the screen. By default, Run Comparer doesn't differentiate runs with different values for [`job_type`](/models/ref/python/functions/init). This means you might end up comparing runs in a project that aren't directly comparable. For example, you could compare a training run to a model evaluation run. A training run could contain run logs, hyperparameters, training loss metrics, and the model itself. An evaluation run could use the model to check the model's performance on new data. When you search, filter, group, or sort the list of runs in the Runs Table, the Run Comparer automatically updates to compare the first 10 runs. 
Filter or search within the Runs Table to compare similar runs, such as by filtering or sorting the list by `job_type`. Learn more about [filtering runs](/models/runs/filter-runs/). # Scatter plots Source: https://docs.wandb.ai/models/app/features/panels/scatter-plot Create and customize scatter plots in W&B to compare runs and visualize relationships between experiment metrics. Use scatter plots in W\&B to compare runs and visualize relationships between experiment metrics. Scatter plots help you spot trends, outliers, and trade-offs across many runs, which is useful when tuning hyperparameters or comparing model variants. ## Use case Use scatter plots to compare multiple runs and visualize the performance of an experiment. With a scatter plot, you can: * Plot lines for minimum, maximum, and average values. * Customize metadata tooltips. * Control point colors. * Adjust axis ranges. * Use a log scale for the axes. * Configure how the **frontier line** and related points are labeled and emphasized. See [Frontier line display options](#frontier-line-display-options). ## Example The following example shows a scatter plot displaying validation accuracy for different models over several weeks of experimentation. The tooltip includes batch size, dropout, and axis values. A line also shows the running average of validation accuracy. [See a live example →](https://app.wandb.ai/l2k2/l2k/reports?view=carey%2FScatter%20Plot) Validation accuracy scatter plot ## Create a scatter plot To create a scatter plot in the W\&B UI: 1. Navigate to the **Workspaces** tab. 2. In the **Charts** panel, click the **action** menu. 3. From the pop-up menu, select **Add panels**. 4. In the **Add panels** menu, select **Scatter plot**. 5. Set the `x` and `y` axes to plot the data you want to view. Optionally, set maximum and minimum ranges for your axes or add a `z` axis. 6. Click **Apply** to create the scatter plot. 
The new scatter plot appears in the **Charts** panel, displaying the data you configured.

## Frontier line display options

The *frontier line* on a scatter plot connects the highest and lowest y-axis values observed so far for the data in the plot. You can tune how the line and the surrounding points appear from the plot settings to emphasize the frontier.

To configure frontier line display options:

1. Hover over the scatter plot panel, then click the gear icon to open the panel settings drawer.
2. Select the **Annotations** tab.
3. Set the options:
   * **Show run labels**: Toggle whether to show a label with the run name or group name.
   * **Dim non-frontier points**: Toggle whether to reduce the visual weight of points that aren't on the frontier line so the frontier stands out.
4. Click **Apply** to save your changes.

The scatter plot updates to reflect your annotation choices.

# Keyboard shortcuts

Source: https://docs.wandb.ai/models/app/keyboard-shortcuts

Learn about the keyboard shortcuts available in W&B.

W\&B supports keyboard shortcuts to help you navigate and interact with experiments, workspaces, and data more efficiently. This guide covers keyboard shortcuts for the W\&B App and the W\&B LEET (Lightweight Experiment Exploration Tool) terminal UI.

The following tables list keyboard shortcuts available in the W\&B App, grouped by where they apply.

## Workspace management

| Shortcut | Description |
| --- | --- |
| **Cmd+Z** (macOS) / **Ctrl+Z** (Windows/Linux) | Undo a change you've made in the UI, such as a modification to the workspace or a panel. |
| **Cmd+Shift+Z** (macOS) / **Ctrl+Y** (Windows/Linux) | Redo a change you undid in the workspace. |

## Navigation

| Shortcut | Description |
| --- | --- |
| **Tab** | Navigate between interactive elements. |
| **Cmd+J** (macOS) / **Ctrl+J** (Windows/Linux) | Switch between the Workspaces and Runs tabs in the project sidebar. |
| **Cmd+.** (macOS) / **Ctrl+.** (Windows/Linux) | Minimize or restore the Runs selector sidebar to reclaim screen space. |
| **Cmd+K** (macOS) / **Ctrl+K** (Windows/Linux) | Open the quick search dialog to search across projects, runs, and other resources. |
| **Esc** | Throughout the W\&B App, exit full-screen panel views, close settings drawers, dismiss the quick search dialog, close an editor, or dismiss other overlays. |

## Panel navigation

| Shortcut | Description |
| --- | --- |
| **Left Arrow / Right Arrow** | When in full-screen mode, step through panels in a section. |
| **Esc** | Exit full-screen panel view and return to the workspace. |
| **Cmd+Left Arrow / Cmd+Right Arrow** (macOS) / **Ctrl+Left Arrow / Ctrl+Right Arrow** (Windows/Linux) | When viewing a media panel in full-screen mode, move the step slider. |

## Media panels

| Shortcut | Description |
| --- | --- |
| **Cmd + +/-** (macOS) / **Ctrl + +/-** (Windows/Linux) | When viewing an image in full-screen, zoom in or out. When zoomed in, click and drag to pan the image. |
| **Cmd + 0** (macOS) / **Ctrl + 0** (Windows/Linux) | When viewing an image in full-screen, reset to 100% zoom. |
| **Shift + L** | When viewing an image in full-screen, zoom to fit. |
| **Click** | In a video panel, select a video, before skipping forward or backward using the keyboard. |
| **Left Arrow / Right Arrow** | In a video panel, skip forward or backward by the configured **video seek step**. |

## Reports

| Shortcut | Description |
| --- | --- |
| **Delete / Backspace** | Remove the selected panel grid from the report. |
| **Enter** | Insert a Markdown block after typing `/mark` in a report. |
| **Esc** | Exit the report editor. |
| **Tab** | Navigate between interactive elements in a report. |

## Notes

Keep the following points in mind when using keyboard shortcuts in the W\&B App:

* Most keyboard shortcuts use **Cmd** on macOS and **Ctrl** on Windows/Linux.
* The W\&B App implements custom handling for some browser default shortcuts.
* Some shortcuts are context-sensitive and only work in specific areas of the app.
The following keyboard shortcuts work in the W\&B LEET (Lightweight Experiment Exploration Tool) terminal UI, which lets you explore experiments without leaving the command line. To launch LEET, run `wandb beta leet` in your terminal. For more information, see [`wandb beta leet`](/models/ref/cli/wandb-beta/wandb-beta-leet/).
# Artifacts overview Source: https://docs.wandb.ai/models/artifacts Overview of W&B Artifacts, how they work, and how to get started using them. Use W\&B Artifacts to track and version data as the inputs and outputs of your [W\&B Runs](/models/runs). For example, a model training run might take in a dataset as input and produce a trained model as output. You can log hyperparameters, metadata, and metrics to a run, and you can use an artifact to log, track, and version the dataset used to train the model as input and another artifact for the resulting model checkpoints as output. ## Use cases You can use artifacts throughout your entire ML workflow as inputs and outputs of [runs](/models/runs). You can use datasets, models, or even other artifacts as inputs for processing. Artifacts workflow diagram with inputs and outputs for model training, data processing, and model evaluation | Use Case | Input | Output | | ---------------------- | -------------------------------------- | ---------------------------- | | Model Training | Dataset (training and validation data) | Trained Model | | Dataset Pre-Processing | Dataset (raw data) | Dataset (pre-processed data) | | Model Evaluation | Model + Dataset (test data) | [W\&B Table](/models/tables) | | Model Optimization | Model | Optimized Model | The following code snippets are meant to be run in order. ## Create an artifact Create an artifact with four lines of code: 1. Create a [W\&B run](/models/runs). 2. Create an artifact object with [`wandb.Artifact`](/models/ref/python/experiments/artifact). 3. Add one or more files, such as a model file or dataset, to the artifact object with `wandb.Artifact.add_file()`. 4. Log your artifact to W\&B with `wandb.Run.log_artifact()`. 
For example, the following code snippet shows how to log a file called `dataset.h5` to an artifact called `example_artifact`:

```python theme={null}
import wandb

with wandb.init(project="artifacts-example", job_type="add-dataset") as run:
    artifact = wandb.Artifact(name="example_artifact", type="dataset")
    # `name` sets the file's path inside the artifact, not the artifact's name
    artifact.add_file(local_path="./dataset.h5", name="training_dataset")
    run.log_artifact(artifact)
```

* The `type` of the artifact affects how it appears in the W\&B platform. If you do not specify a `type`, it defaults to `unspecified`.
* Each label of the dropdown represents a different `type` parameter value. In the above code snippet, the artifact's `type` is `dataset`.

See the [track external files](/models/artifacts/track-external-files) page for information on how to add references to files or directories stored in external object storage, like an Amazon S3 bucket.

## Download an artifact

Indicate the artifact you want to mark as input to your run with the [`wandb.Run.use_artifact()`](/models/ref/python/experiments/run#use_artifact) method.

Continuing from the previous code snippet, the following code example shows how to use the artifact called `example_artifact` that was created earlier:

```python theme={null}
with wandb.init(project="artifacts-example", job_type="add-dataset") as run:
    # Marks `example_artifact` as an input to this run and returns an artifact object
    artifact = run.use_artifact("example_artifact:latest")
```

This returns an artifact object. Next, use the returned object to download all contents of the artifact:

```python theme={null}
datadir = artifact.download()  # downloads the full `example_artifact` artifact to the default directory
```

You can pass a custom path into the `root` [parameter](/models/ref/python/experiments/artifact) to download an artifact to a specific directory.

For alternate ways to download artifacts and to see additional parameters, see the guide on [downloading and using artifacts](/models/artifacts/download-and-use-an-artifact).
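The string passed to `use_artifact()` pairs an artifact name with a version or alias, separated by a colon, as in `example_artifact:latest`. The following minimal sketch splits such a reference into its two parts. `split_artifact_ref` is a hypothetical helper written for illustration only; it is not part of the W\&B SDK, and it assumes the reference always includes a colon-separated version or alias.

```python
def split_artifact_ref(ref: str) -> tuple[str, str]:
    """Split '<name>:<version-or-alias>' into its two parts.

    Hypothetical helper for illustration; not a wandb API.
    """
    # Split on the last colon so everything before it is kept as the name.
    name, sep, alias = ref.rpartition(":")
    if not sep or not name or not alias:
        raise ValueError(f"expected '<name>:<version-or-alias>', got {ref!r}")
    return name, alias


print(split_artifact_ref("example_artifact:latest"))
print(split_artifact_ref("example_artifact:v0"))
```

Keeping the name and the alias separate makes it easier to see that `training_dataset` in the earlier snippet is a file path inside the artifact, while `example_artifact` is the artifact name that `use_artifact()` expects.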
## Next steps

* Learn how to [version](/models/artifacts/create-a-new-artifact-version) and [update](/models/artifacts/update-an-artifact) artifacts.
* Learn how to trigger downstream workflows or notify a Slack channel in response to changes to your artifacts with [automations](/models/automations).
* Learn about the [registry](/models/registry), a space that houses trained models.
* Explore the [Python SDK](/models/ref/python/experiments/artifact) and [CLI](/models/ref/cli/wandb-artifact) reference guides.

# Tutorial: Create, track, and use a dataset artifact

Source: https://docs.wandb.ai/models/artifacts/artifacts-walkthrough

Create, track, and use a dataset artifact with W&B.

This walkthrough demonstrates how to create, track, and use a dataset artifact.

## 1. Log in to W\&B

Import the W\&B library and log in to W\&B. You will need to sign up for a free W\&B account if you have not done so already.

```python theme={null}
import wandb

wandb.login()
```

## 2. Initialize a run

Use [`wandb.init()`](/models/ref/python/functions/init) to initialize a run. This generates a background process to sync and log data. Provide a project name and a job type:

```python theme={null}
# Create a W&B Run. Here we specify 'upload-dataset' as the job type since this example
# shows how to create a dataset artifact.
with wandb.init(project="artifacts-example", job_type="upload-dataset") as run:
    # Your code here
    pass
```

## 3. Create an artifact object

Create an artifact object with [`wandb.Artifact()`](/models/ref/python/experiments/artifact). Provide a name for the artifact and a description of the file type for the `name` and `type` parameters, respectively.

For example, the following code snippet demonstrates how to create an artifact called `bicycle-dataset` with a `dataset` label:

```python theme={null}
artifact = wandb.Artifact(name="bicycle-dataset", type="dataset")
```

For more information about how to construct an artifact, see [Construct artifacts](./construct-an-artifact).
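The next step adds a local file named `dataset.h5` to the artifact. If you don't have a dataset on hand and only want to try the walkthrough end to end, the following sketch writes a small placeholder file with that name. The contents are arbitrary zero bytes rather than a real HDF5 dataset, and `make_placeholder_file` is a hypothetical helper, not part of W\&B.

```python
from pathlib import Path


def make_placeholder_file(path: str = "dataset.h5", size_bytes: int = 1024) -> Path:
    """Write `size_bytes` zero bytes to `path` and return the path.

    Hypothetical helper for trying the walkthrough; the file is not valid HDF5.
    """
    target = Path(path)
    target.write_bytes(b"\0" * size_bytes)
    return target


placeholder = make_placeholder_file()
print(f"Wrote {placeholder} ({placeholder.stat().st_size} bytes)")
```

Replace the placeholder with your real dataset before you log anything you intend to keep.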
## 4. Add the dataset to the artifact

Add a file to the artifact. Common file types include models and datasets. The following example adds a dataset named `dataset.h5` that is saved locally on our machine to the artifact:

```python theme={null}
# Add a file to the artifact's contents
artifact.add_file(local_path="dataset.h5")
```

Replace the filename `dataset.h5` in the previous code snippet with the path to the file you want to add to the artifact.

## 5. Log the dataset

Use the W\&B run object's `wandb.Run.log_artifact()` method to both save your artifact version and declare the artifact as an [output of the run](/models/artifacts/explore-and-traverse-an-artifact-graph).

```python theme={null}
# Save the artifact version to W&B and mark it
# as the output of this run
run.log_artifact(artifact)
```

A `'latest'` [alias](/models/artifacts/create-a-custom-alias) is created by default when you log an artifact. For more information about artifact aliases and versions, see [Create a custom alias](./create-a-custom-alias) and [Create new artifact versions](./create-a-new-artifact-version), respectively.

Putting this together, your script so far should look like this:

```python theme={null}
import wandb

wandb.login()

with wandb.init(project="artifacts-example", job_type="upload-dataset") as run:
    artifact = wandb.Artifact(name="bicycle-dataset", type="dataset")
    artifact.add_file(local_path="dataset.h5")
    run.log_artifact(artifact)
```

## 6. Download and use the artifact

The following code example demonstrates the steps you can take to use an artifact you have logged and saved to the W\&B servers.

1. First, initialize a new run object with `wandb.init()`.
2. Second, use the run object's [`wandb.Run.use_artifact()`](/models/ref/python/experiments/run#use_artifact) method to tell W\&B what artifact to use. This returns an artifact object.
3. Third, use the artifact's [`wandb.Artifact.download()`](/models/ref/python/experiments/artifact#download) method to download the contents of the artifact.

```python theme={null}
# Create a W&B Run. Here we specify 'training' for `job_type`
# because we will use this run to track training.
with wandb.init(project="artifacts-example", job_type="training") as run:
    # Query W&B for an artifact and mark it as input to this run
    artifact = run.use_artifact("bicycle-dataset:latest")

    # Download the artifact's contents
    artifact_dir = artifact.download()
```

Alternatively, you can use the Public API (`wandb.Api`) to export (or update) data already saved in W\&B outside of a run. See [Track external files](./track-external-files) for more information.

# Create an artifact

Source: https://docs.wandb.ai/models/artifacts/construct-an-artifact

Create and log a W&B Artifact. Learn how to add one or more files or a URI reference to an Artifact.

Use the W\&B Python SDK to construct artifacts from [W\&B Runs](/models/ref/python/experiments/run). You can add [files, directories, URIs, and files from parallel runs to artifacts](#add-files-to-an-artifact). After you add a file to an artifact, save the artifact to the W\&B Server or [your own private server](/platform/hosting/hosting-options/self-managed). Each artifact is associated with a run.

For information on how to track external files, such as files stored in Amazon S3, see the [Track external files](./track-external-files) page.

## Construct an artifact

Construct a [W\&B Artifact](/models/ref/python/experiments/artifact) in three steps:

1. [Create an artifact Python object with `wandb.Artifact()`](/models/artifacts/construct-an-artifact#create-an-artifact-python-object-with-wandb-artifact)
2. [Add one or more files to the artifact](/models/artifacts/construct-an-artifact#add-one-or-more-files-to-the-artifact)
3. 
[Save your artifact to the W\&B server](/models/artifacts/construct-an-artifact#save-your-artifact-to-the-w\&b-server) ### Create an artifact Python object with `wandb.Artifact()` Initialize the [`wandb.Artifact()`](/models/ref/python/experiments/artifact) class to create an artifact object. Specify the following parameters: * **Name**: The name of your artifact. The name should be unique, descriptive, and memorable. * **Type**: The type of artifact. The type should be simple, descriptive, and correspond to a single step of your machine learning pipeline. Common artifact types include `'dataset'` or `'model'`. W\&B uses the "name" and "type" you provide to create a directed acyclic graph in the W\&B App. See the [Explore and traverse artifact graphs](./explore-and-traverse-an-artifact-graph) for more information. Artifacts can not have the same name, regardless of type. In other words, you can not create an artifact named `cats` of type `dataset` and another artifact with the same name of type `model`. You can optionally provide a description and metadata when you initialize an artifact object. For more information on available attributes and parameters, see the [`wandb.Artifact`](/models/ref/python/experiments/artifact) Class definition in the Python SDK Reference Guide. Copy and paste the following code snippet to create an artifact object. Replace the `` and `` placeholders with your own values: ```python theme={null} import wandb # Create an artifact object artifact = wandb.Artifact(name="", type="") ``` ### Add one or more files to the artifact [Add files, directories, external URI references (such as Amazon S3) and more](/models/artifacts/construct-an-artifact#add-files-to-an-artifact) to your artifact object. 
To add a single file, use the artifact object's [`Artifact.add_file()`](/models/ref/python/experiments/artifact#add_file) method:

```python theme={null}
artifact.add_file(local_path="path/to/file.txt", name="")
```

To add a directory, use the [`Artifact.add_dir()`](/models/ref/python/experiments/artifact#add_dir) method:

```python theme={null}
artifact.add_dir(local_path="path/to/directory", name="")
```

See the next section, [Add files to an artifact](/models/artifacts/construct-an-artifact#add-files-to-an-artifact), for more information on how to add different file types to an artifact.

### Save your artifact to the W\&B server

Save your artifact to the W\&B server. Use the run object's [`wandb.Run.log_artifact()`](/models/ref/python/experiments/run#log_artifact) method to save the artifact.

```python theme={null}
with wandb.init(project="", job_type="") as run:
    run.log_artifact(artifact)
```

**When to use `wandb.Run.log_artifact()` or `Artifact.save()`**

* Use `wandb.Run.log_artifact()` to create a new artifact and associate it with a specific run.
* Use `Artifact.save()` to update an existing artifact without creating a new run.

Putting this all together, the following code snippet demonstrates how to create a dataset artifact, add a file to the artifact, and save the artifact to W\&B:

```python theme={null}
import wandb

artifact = wandb.Artifact(name="", type="")
artifact.add_file(local_path="path/to/file.txt", name="")
artifact.add_dir(local_path="path/to/directory", name="")

with wandb.init(project="", job_type="") as run:
    run.log_artifact(artifact)
```

Each time you log an artifact with the same name and type, W\&B creates a new version of that artifact. For more information, see [Create a new artifact version](/models/artifacts/create-a-new-artifact-version).

W\&B performs `wandb.Run.log_artifact()` calls asynchronously for performant uploads. This can cause surprising behavior when logging artifacts in a loop.
For example: ```python theme={null} with wandb.init() as run: for i in range(10): a = wandb.Artifact(name = "race", type="dataset", metadata={ "index": i, }, ) # ... add files to artifact a ... run.log_artifact(a) ``` The artifact version **v0** might not have an index of 0 in its metadata because artifacts may be logged in an arbitrary order. ## Add files to an artifact The following sections demonstrate how to add different types of objects to an artifact. Assume you have a directory with the following structure as you read through the examples: ```text theme={null} root-directory | - hello.txt | - images/ | -- | cat.png | -- | dog.png | - checkpoints/ | -- | model.h5 | - models/ | -- | model.h5 ``` ### Add a single file Use [`wandb.Artifact.add_file()`](/models/ref/python/experiments/artifact#method-artifact-add-file) to add a single local file to an artifact. Provide the local path to the file as the `local_path` parameter: ```python theme={null} import wandb # Initialize an artifact object artifact = wandb.Artifact(name="", type="") # Add a single file artifact.add_file(local_path="path/file.format") ``` For example, suppose you had a file called `'hello.txt'` in your working local directory. ```python theme={null} artifact.add_file("hello.txt") ``` The artifact now has the following content: ```text theme={null} hello.txt ``` Optionally, pass a different name to the `name` parameter to rename the file within the artifact object itself. 
Continuing the previous example:

```python theme={null}
artifact.add_file(
    local_path="hello.txt",
    name="new/path/hello_world.txt"
)
```

The artifact is stored as:

```text theme={null}
new/path/hello_world.txt
```

The following table shows how different API calls produce different artifact contents:

| API Call | Resulting artifact |
| --- | --- |
| `artifact.new_file('hello.txt')` | `hello.txt` |
| `artifact.add_file('model.h5')` | `model.h5` |
| `artifact.add_file('checkpoints/model.h5')` | `model.h5` |
| `artifact.add_file('model.h5', name='models/mymodel.h5')` | `models/mymodel.h5` |

### Add multiple files

Use the [`wandb.Artifact.add_dir()`](/models/ref/python/experiments/artifact#method-artifact-add-dir) method to add multiple files from a local directory to an artifact. Provide the local path to the directory as the `local_path` parameter.

```python theme={null}
import wandb

# Initialize an artifact object
artifact = wandb.Artifact(name="", type="")

# Add a local directory to the artifact
artifact.add_dir(local_path="path/to/directory", name="optional-prefix")
```

The following table shows how different API calls produce different artifact contents:

| API Call | Resulting artifact |
| --- | --- |
| `artifact.add_dir('images')` | `cat.png`, `dog.png` |
| `artifact.add_dir('images', name='images')` | `images/cat.png`, `images/dog.png` |

### Add a URI reference

Artifacts track checksums and other information for reproducibility if the URI has a scheme that the W\&B library supports.

Add an external URI reference to an artifact with the [`wandb.Artifact.add_reference()`](/models/ref/python/experiments/artifact#method-artifact-add-reference) method. Replace the `'uri'` string with your own URI. Optionally pass the desired path within the artifact for the `name` parameter.

```python theme={null}
# Add a URI reference
artifact.add_reference(uri="uri", name="optional-name")
```

Artifacts support the following URI schemes:

* `http(s)://`: A path to a file accessible over HTTP. The artifact will track checksums in the form of etags and size metadata if the HTTP server supports the `ETag` and `Content-Length` response headers.
* `s3://`: A path to an object or object prefix in S3. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.
* `gs://`: A path to an object or object prefix in GCS. The artifact will track checksums and versioning information (if the bucket has object versioning enabled) for the referenced objects. Object prefixes are expanded to include the objects under the prefix, up to a maximum of 10,000 objects.

The following table shows how different API calls produce different artifact contents:

| API call | Resulting artifact contents |
| --- | --- |
| `artifact.add_reference('s3://my-bucket/model.h5')` | `model.h5` |
| `artifact.add_reference('s3://my-bucket/checkpoints/model.h5')` | `model.h5` |
| `artifact.add_reference('s3://my-bucket/model.h5', name='models/mymodel.h5')` | `models/mymodel.h5` |
| `artifact.add_reference('s3://my-bucket/images')` | `cat.png`, `dog.png` |
| `artifact.add_reference('s3://my-bucket/images', name='images')` | `images/cat.png`, `images/dog.png` |

### Add files to artifacts from parallel runs

For large datasets or distributed training, multiple parallel runs might need to contribute to a single artifact.

```python theme={null}
import wandb
import time

# This example uses Ray to launch runs in parallel
# for demonstration purposes.
import ray

ray.init()

artifact_type = "dataset"
artifact_name = "parallel-artifact"
table_name = "distributed_table"
parts_path = "parts"
num_parallel = 5

# Each batch of parallel writers should have its own
# unique group name.
group_name = "writer-group-{}".format(round(time.time()))


@ray.remote
def train(i):
    """
    Our writer job. Each writer will add one table to the artifact.
    """
    with wandb.init(group=group_name) as run:
        artifact = wandb.Artifact(name=artifact_name, type=artifact_type)

        # Add data to a wandb table.
        table = wandb.Table(columns=["a", "b", "c"], data=[[i, i * 2, 2**i]])

        # Add the table to a folder in the artifact
        artifact.add(table, "{}/table_{}".format(parts_path, i))

        # Upserting the artifact creates or appends data to the artifact
        run.upsert_artifact(artifact)


# Launch your runs in parallel
result_ids = [train.remote(i) for i in range(num_parallel)]

# Join on all the writers to make sure their files have
# been added before finishing the artifact.
ray.get(result_ids)

# Once all the writers are finished, finish the artifact
# to mark it ready.
with wandb.init(group=group_name) as run:
    artifact = wandb.Artifact(artifact_name, type=artifact_type)

    # Create a "PartitionedTable" pointing to the folder of tables
    # and add it to the artifact.
    artifact.add(wandb.data_types.PartitionedTable(parts_path), table_name)

    # Finish artifact finalizes the artifact, disallowing future "upserts"
    # to this version.
    run.finish_artifact(artifact)
```

## Find path for logged artifacts and other metadata

The following code snippet shows how to use the [W\&B Public API](/models/ref/python/public-api/) to list the files in a run, including their names and URLs.
Replace the `` placeholder with your own values:

```python theme={null}
from wandb.apis.public.files import Files
from wandb.apis.public.api import Api

# Create the API client and fetch an example run object
api = Api()
run = api.run("")

# Create a Files object to iterate over files in the run
files = Files(api.client, run)

# Iterate over files
for file in files:
    print(f"File Name: {file.name}")
    print(f"File URL: {file.url}")
    print(f"Path to file in the bucket: {file.direct_url}")
```

See the [File](/models/ref/python/public-api/file) class for more information on available attributes and methods.

# Create an artifact alias

Source: https://docs.wandb.ai/models/artifacts/create-a-custom-alias

Create and manage custom aliases to reference specific W&B artifact versions by meaningful names like best or production.

Use aliases as pointers to specific versions. By default, `wandb.Run.log_artifact()` adds the `latest` alias to the logged version.

W\&B creates an artifact version `v0` and attaches it to your artifact when you log that artifact for the first time. W\&B checksums the contents when you log again to the same artifact. If the artifact changed, W\&B saves a new version `v1`.

For example, if you want your training script to pull the most recent version of a dataset, specify `latest` when you use that artifact. The following code example demonstrates how to download a recent dataset artifact named `bike-dataset` that has an alias, `latest`:

```python theme={null}
import wandb

with wandb.init(project="") as run:
    artifact = run.use_artifact("bike-dataset:latest")
    artifact.download()
```

You can also apply a custom alias to an artifact version. For example, if you want to mark a model checkpoint as the best on the AP-50 metric, you could add the string `'best-ap50'` as an alias when you log the model artifact.
```python theme={null} with wandb.init(project="") as run: artifact = wandb.Artifact("run-3nq3ctyy-bike-model", type="model") artifact.add_file("model.h5") run.log_artifact(artifact, aliases=["latest", "best-ap50"]) ``` # Create an artifact version Source: https://docs.wandb.ai/models/artifacts/create-a-new-artifact-version Create a new artifact version from a single run or from a distributed process. Create a new artifact version with a single [run](/models/runs/) or collaboratively with distributed runs. You can optionally create a new artifact version from a previous version known as an [incremental artifact](#create-a-new-artifact-version-from-an-existing-version). We recommend that you create an incremental artifact when you need to apply changes to a subset of files in an artifact, where the size of the original artifact is significantly larger. ## Create new artifact versions from scratch There are two ways to create a new artifact version: from a single run and from distributed runs. They are defined as follows: * **Single run**: A single run provides all the data for a new version. This is the most common case and is best suited when the run fully recreates the needed data. For example: outputting saved models or model predictions in a table for analysis. * **Distributed runs**: A set of runs collectively provides all the data for a new version. This is best suited for distributed jobs which have multiple runs generating data, often in parallel. For example: evaluating a model in a distributed manner, and outputting the predictions. W\&B will create a new artifact and assign it a `v0` alias if you pass a name to the `wandb.Artifact` API that does not exist in your project. W\&B checksums the contents when you log again to the same artifact. If the artifact changed, W\&B saves a new version `v1`. W\&B will retrieve an existing artifact if you pass a name and artifact type to the `wandb.Artifact` API that matches an existing artifact in your project. 
The retrieved artifact will have a version greater than 1.

### Single run

Log a new version of an artifact with a single run that produces all the files in the artifact.

You can create a new artifact version inside or outside of a run.

Create an artifact version within a W\&B run:

1. Create a run with `wandb.init()`.
2. Create a new artifact or retrieve an existing one with `wandb.Artifact`.
3. Add files to the artifact with `.add_file`.
4. Log the artifact to the run with `.log_artifact`.

```python theme={null}
with wandb.init() as run:
    artifact = wandb.Artifact("artifact_name", "artifact_type")

    # Add Files and Assets to the artifact using
    # `.add`, `.add_file`, `.add_dir`, and `.add_reference`
    artifact.add_file("image1.png")
    run.log_artifact(artifact)
```

Create an artifact version outside of a W\&B run:

1. Create a new artifact or retrieve an existing one with `wandb.Artifact`.
2. Add files to the artifact with `.add_file`.
3. Save the artifact with `.save`.

```python theme={null}
artifact = wandb.Artifact("artifact_name", "artifact_type")

# Add Files and Assets to the artifact using
# `.add`, `.add_file`, `.add_dir`, and `.add_reference`
artifact.add_file("image1.png")
artifact.save()
```

### Distributed runs

Allow a collection of runs to collaborate on a version before committing it. This is in contrast to the single run mode described above, where one run provides all the data for a new version.

1. Each run in the collection needs to be aware of the same unique ID (called `distributed_id`) in order to collaborate on the same version. By default, if present, W\&B uses the run's `group` as set by `wandb.init(group=GROUP)` as the `distributed_id`.
2. There must be a final run that "commits" the version, permanently locking its state.
3. 
Use `upsert_artifact` to add to the collaborative artifact and `finish_artifact` to finalize the commit. Consider the following example. Different runs (labelled below as **Run 1**, **Run 2**, and **Run 3**) add a different image file to the same artifact with `upsert_artifact`. #### Run 1 ```python theme={null} with wandb.init() as run: artifact = wandb.Artifact("artifact_name", "artifact_type") # Add Files and Assets to the artifact using # `.add`, `.add_file`, `.add_dir`, and `.add_reference` artifact.add_file("image1.png") run.upsert_artifact(artifact, distributed_id="my_dist_artifact") ``` #### Run 2 ```python theme={null} with wandb.init() as run: artifact = wandb.Artifact("artifact_name", "artifact_type") # Add Files and Assets to the artifact using # `.add`, `.add_file`, `.add_dir`, and `.add_reference` artifact.add_file("image2.png") run.upsert_artifact(artifact, distributed_id="my_dist_artifact") ``` #### Run 3 Must run after Run 1 and Run 2 complete. The Run that calls `wandb.Run.finish_artifact()` can include files in the artifact, but does not need to. ```python theme={null} with wandb.init() as run: artifact = wandb.Artifact("artifact_name", "artifact_type") # Add Files and Assets to the artifact # `.add`, `.add_file`, `.add_dir`, and `.add_reference` artifact.add_file("image3.png") run.finish_artifact(artifact, distributed_id="my_dist_artifact") ``` ## Create a new artifact version from an existing version Add, modify, or remove a subset of files from a previous artifact version without the need to re-index the files that didn't change. Adding, modifying, or removing a subset of files from a previous artifact version creates a new artifact version known as an *incremental artifact*. Incremental artifact versioning Here are some scenarios for each type of incremental change you might encounter: * add: you periodically add a new subset of files to a dataset after collecting a new batch. 
* remove: you discovered several duplicate files and want to remove them from your artifact. * update: you corrected annotations for a subset of files and want to replace the old files with the correct ones. You could create an artifact from scratch to perform the same function as an incremental artifact. However, when you create an artifact from scratch, you will need to have all the contents of your artifact on your local disk. When making an incremental change, you can add, remove, or modify a single file without changing the files from a previous artifact version. You can create an incremental artifact within a single run or with a set of runs (distributed mode). Follow the procedure below to incrementally change an artifact: 1. Obtain the artifact version you want to perform an incremental change on: ```python theme={null} saved_artifact = run.use_artifact("my_artifact:latest") ``` ```python theme={null} client = wandb.Api() saved_artifact = client.artifact("my_artifact:latest") ``` 2. Create a draft with: ```python theme={null} draft_artifact = saved_artifact.new_draft() ``` 3. Perform any incremental changes you want to see in the next version. You can either add, remove, or modify an existing entry. Select one of the tabs for an example on how to perform each of these changes: Add a file to an existing artifact version with the `add_file` method: ```python theme={null} draft_artifact.add_file("file_to_add.txt") ``` You can also add multiple files by adding a directory with the `add_dir` method. Remove a file from an existing artifact version with the `remove` method: ```python theme={null} draft_artifact.remove("file_to_remove.txt") ``` You can also remove multiple files with the `remove` method by passing in a directory path. Modify or replace contents by removing the old contents from the draft and adding the new contents back in: ```python theme={null} draft_artifact.remove("modified_file.txt") draft_artifact.add_file("modified_file.txt") ``` 4. 
Lastly, log or save your changes. The following tabs show you how to save your changes inside and outside of a W\&B run. Select the tab that is appropriate for your use case: ```python theme={null} run.log_artifact(draft_artifact) ``` ```python theme={null} draft_artifact.save() ``` Putting it all together, the code examples above look like: ```python theme={null} with wandb.init(job_type="modify dataset") as run: saved_artifact = run.use_artifact( "my_artifact:latest" ) # fetch artifact and input it into your run draft_artifact = saved_artifact.new_draft() # create a draft version # modify a subset of files in the draft version draft_artifact.add_file("file_to_add.txt") draft_artifact.remove("dir_to_remove/") run.log_artifact( draft_artifact ) # log your changes to create a new version and mark it as output to your run ``` ```python theme={null} client = wandb.Api() saved_artifact = client.artifact("my_artifact:latest") # load your artifact draft_artifact = saved_artifact.new_draft() # create a draft version # modify a subset of files in the draft version draft_artifact.remove("deleted_file.txt") draft_artifact.add_file("modified_file.txt") draft_artifact.save() # commit changes to the draft ``` # Artifact data privacy and compliance Source: https://docs.wandb.ai/models/artifacts/data-privacy-and-compliance Learn where W&B files are stored by default. Explore how to save, store sensitive information. Files are uploaded to a Google Cloud bucket managed by W\&B when you log artifacts. The contents of the bucket are encrypted both at rest and in transit. Artifact files are only visible to users who have access to the corresponding project. GCS W&B Client Server diagram When you delete a version of an artifact, it is marked for soft deletion in our database and removed from your storage cost. When you delete an entire artifact, it is queued for permanent deletion and all of its contents are removed from the W\&B bucket. 
If you have specific needs around file deletion, reach out to [Customer Support](mailto:support@wandb.com). By default, deleted artifacts are retained for 7 days and can be restored during this period, which is configurable for Dedicated Cloud. Learn more about data retention in [Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud#data-retention-policy) or [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud#data-retention-policy). For sensitive datasets that cannot reside in a multi-tenant environment, you can use [W\&B Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) or [reference artifacts](/models/artifacts/track-external-files). Reference artifacts track references to private buckets without sending file contents to W\&B. Reference artifacts maintain links to files on your buckets or servers. W\&B only keeps track of the metadata associated with the files, not the files themselves. W&B Client Server Cloud diagram For more information about deployment options, see [Dedicated Cloud](/platform/hosting/hosting-options/dedicated-cloud) or [Self-Managed](/platform/hosting/hosting-options/self-managed). To discuss your specific requirements, contact [contact@wandb.com](mailto:contact@wandb.com). # Delete an artifact Source: https://docs.wandb.ai/models/artifacts/delete-artifacts Delete artifacts interactively with the App UI or programmatically with the W&B Python SDK. Delete artifacts interactively with the W\&B App or programmatically with the W\&B Python SDK. When you delete an artifact, W\&B marks that artifact as a *soft-delete*. In other words, the artifact is marked for deletion but files are not immediately deleted from storage. The contents of the artifact remain as a soft-delete, or pending deletion state, until a regularly run garbage collection process reviews all artifacts marked for deletion. 
The garbage collection process deletes associated files from storage if the artifact and its associated files are not used by previous or subsequent artifact versions.

Garbage collection is **best-effort**. W\&B does not guarantee how quickly freed space appears in your object storage after you delete an artifact. Large deployments or backlogs can take longer than expected. For how this fits with run data, retention settings, and optional operator actions, see [Manage bucket storage and costs](/platform/hosting/managing-bucket-storage).

## Artifact garbage collection workflow

The following diagram illustrates the complete artifact garbage collection process:

```mermaid theme={null}
graph TB
    Start([Artifact Deletion Initiated]) --> DeleteMethod{Deletion Method}

    DeleteMethod -->|UI| UIDelete[Delete via W&B App]
    DeleteMethod -->|SDK| SDKDelete[Delete via W&B Python SDK]
    DeleteMethod -->|TTL| TTLDelete[TTL Policy Expires]

    UIDelete --> SoftDelete[Artifact Marked as<br/>'Soft Delete']
    SDKDelete --> SoftDelete
    TTLDelete --> SoftDelete

    SoftDelete --> GCWait[(Wait for<br/>best-effort<br/>Garbage Collection)]
    GCWait --> GCRun[Garbage Collection<br/>Process Runs<br/><br/>- Reviews all soft-deleted artifacts<br/>- Checks file dependencies]

    GCRun --> CheckUsage{Are files used by<br/>other artifact versions?}

    CheckUsage -->|Yes| KeepFiles[Files Kept in Storage<br/><br/>- Artifact marked deleted<br/>- Files remain for other versions]
    CheckUsage -->|No| DeleteFiles[Files Deleted from Storage<br/><br/>- Artifact fully removed<br/>- Storage space reclaimed]

    KeepFiles --> End([End])
    DeleteFiles --> End

    style Start fill:#e1f5fe,stroke:#333,stroke-width:2px,color:#000
    style SoftDelete fill:#fff3e0,stroke:#333,stroke-width:2px,color:#000
    style GCRun fill:#f3e5f5,stroke:#333,stroke-width:2px,color:#000
    style KeepFiles fill:#e8f5e9,stroke:#333,stroke-width:2px,color:#000
    style DeleteFiles fill:#ffebee,stroke:#333,stroke-width:2px,color:#000
    style End fill:#e0e0e0,stroke:#333,stroke-width:2px,color:#000
```

You can schedule when artifacts are deleted from W\&B with TTL policies. For more information, see [Manage data retention with Artifact TTL policy](./ttl).

Artifacts deleted by a TTL policy, the W\&B Python SDK, or the W\&B App are first soft-deleted. Soft-deleted artifacts are then garbage-collected before they are permanently deleted.

Deleting an entity, project, or artifact collection triggers the artifact deletion process described on this page. When you delete a run and choose to delete its associated artifacts, those artifacts follow the same soft-delete and garbage collection workflow.

## Delete an artifact version

Delete an artifact version interactively with the W\&B App or programmatically with the W\&B Python SDK.

To delete an artifact version:

1. Navigate to the project that contains the artifact version you want to delete.
2. Select the **Artifacts** tab.
3. From the list of artifact types, select the type of artifact that contains the version you want to delete.
4. Click the **action ()** menu next to the artifact version you want to delete.
5. From the dropdown, choose **Delete Version**.

Delete an artifact version programmatically with the [wandb.Artifact.delete()](/models/ref/python/experiments/artifact#delete) method. Provide the full name of the artifact. The full name consists of `//:`. Set the `delete_aliases` parameter to `True` to delete the artifact even if it has one or more aliases associated with it.
```python theme={null}
import wandb

api = wandb.Api()

# Get the artifact by its path
artifact = api.artifact("//:")

# Delete the artifact version along with any aliases
artifact.delete(delete_aliases=True)
```

## Delete multiple artifact versions

The following code example shows how to delete multiple artifact versions. Provide the entity, project name, and run ID that created the artifact as arguments to `wandb.Api.run()`. This returns a run object that you can use to access all artifact versions created by that run. Next, iterate through the artifact versions and delete the ones that match your criteria.

Set the `delete_aliases` parameter to `True` (`wandb.Artifact.delete(delete_aliases=True)`) to delete an artifact version and any aliases associated with it.

Replace the ``, ``, ``, and `` placeholders with your own values:

```python theme={null}
import wandb

# Initialize W&B API
api = wandb.Api()

# Get the run by its path. Consists of //
run = api.run("//")

# Specify the artifact name to delete versions for
artifact_name = ""

# Search and delete artifact versions with the specified name
for artifact in run.logged_artifacts():
    print(f"Found artifact: {artifact.name}")  # Example name run_4dfbufgq_model:v0

    # Grab only the artifact name without the version with split()
    if artifact.name.split(":")[0] == artifact_name:
        print(f"Deleting artifact version: {artifact.name}")
        artifact.delete(delete_aliases=True)
```

## Delete multiple artifact versions with a specific alias

The following code demonstrates how to delete multiple artifact versions that have a specific alias. Replace the ``, ``, ``, ``, and `` placeholders with your own values:

```python theme={null}
import wandb

# Initialize W&B API
api = wandb.Api()

# Get the run by its path. Consists of //
run = api.run("//")

# Specify the artifact name to delete versions for
artifact_name = ""

# Specify the alias to filter artifact versions for deletion
desired_alias = ""

# Delete artifact versions logged by the run that match the name
# and carry the desired alias
for artifact in run.logged_artifacts():
    print(f"Found artifact: {artifact.name}")

    if (artifact.name.split(":")[0] == artifact_name) and (desired_alias in artifact.aliases):
        artifact.delete(delete_aliases=True)
```

## Delete an artifact collection

To delete an artifact collection:

1. Navigate to the artifact collection you want to delete.
2. Select the **action ()** menu next to the artifact collection name.
3. From the dropdown menu, select **Delete**.

Delete an artifact collection programmatically by calling `delete()` on the collection object, in the same way that [wandb.Artifact.delete()](/models/ref/python/experiments/artifact#delete) deletes a single version. Provide the full path of the artifact collection to `wandb.Api.artifact_collection(name="")`. The full path consists of `//`.

```python theme={null}
import wandb

# Initialize W&B API
api = wandb.Api()

# Get the artifact collection by its path. Consists of
# //
collection = api.artifact_collection(
    type_name="",
    name="//"
)
collection.delete()
```

## Protected aliases and deletion permissions

Artifacts with protected aliases have special deletion restrictions. [Protected aliases](/models/registry/aliases#protected-aliases) are aliases in the W\&B Registry that registry admins can set to prevent unauthorized deletion.

**Important considerations for protected aliases:**

* Artifacts with protected aliases cannot be deleted by non-registry admins.
* Within a registry, registry admins can unlink protected artifact versions and delete collections or registries that contain protected aliases.
* For source artifacts: if a source artifact is linked to a registry with a protected alias, it cannot be deleted by any user.
* Registry admins can remove the protected aliases from source artifacts and then delete them.
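Because aliases influence whether a version can or should be deleted, it can help to preview which versions a name-and-alias filter matches before calling `delete()`. The following is a minimal sketch under stated assumptions: `select_versions_to_delete` and its `keep_aliases` parameter are hypothetical helpers, not part of the W\&B SDK, and the function only reads the `name` and `aliases` attributes used in the examples on this page. It does not query W\&B to find out which aliases are protected; you supply the aliases to leave untouched.

```python
from types import SimpleNamespace


def select_versions_to_delete(artifacts, artifact_name, desired_alias, keep_aliases=()):
    """Return the artifact versions that match the name and alias filter.

    Hypothetical helper, not part of the W&B SDK. It only reads the
    `name` and `aliases` attributes and never calls `delete()` itself.
    """
    keep = set(keep_aliases)
    selected = []
    for artifact in artifacts:
        # Strip the version suffix, for example "model:v0" -> "model"
        base_name = artifact.name.split(":")[0]
        if base_name != artifact_name:
            continue
        if desired_alias not in artifact.aliases:
            continue
        if keep.intersection(artifact.aliases):
            continue  # carries an alias the caller wants to leave alone
        selected.append(artifact)
    return selected


# Stand-in objects so the sketch runs without a W&B connection.
versions = [
    SimpleNamespace(name="model:v0", aliases=["staging"]),
    SimpleNamespace(name="model:v1", aliases=["staging", "production"]),
    SimpleNamespace(name="dataset:v0", aliases=["staging"]),
]

to_delete = select_versions_to_delete(
    versions, "model", "staging", keep_aliases={"production"}
)
print([a.name for a in to_delete])  # prints ['model:v0']
```

With a real run, you could pass `run.logged_artifacts()` as the first argument, review the printed names, and then call `artifact.delete(delete_aliases=True)` on each selected version.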
## Enable garbage collection based on how W\&B is hosted Garbage collection timing is not guaranteed. See [Manage bucket storage and costs](/platform/hosting/managing-bucket-storage) for details. Garbage collection is active by default if you use W\&B Multi-tenant Cloud. In W\&B Dedicated and Self-Managed, you might need to take these additional steps to activate garbage collection. 1. **W\&B Self-Managed**: Set `GORILLA_ARTIFACT_GC_ENABLED=true`. 2. **Dedicated Cloud**: Contact support to verify that garbage collection is active. 3. Enable bucket versioning if you use [AWS](https://docs.aws.amazon.com/AmazonS3/latest/userguide/manage-versioning-examples.html), [Google Cloud](https://cloud.google.com/storage/docs/object-versioning) or any other storage provider such as [Minio](https://min.io/docs/minio/linux/administration/object-management/object-versioning.html#enable-bucket-versioning). If you use Azure, [enable soft deletion](https://learn.microsoft.com/azure/storage/blobs/soft-delete-blob-overview), which is equivalent to bucket versioning. The following table describes how to satisfy requirements to enable garbage collection based on your deployment type. The `X` indicates you must satisfy the requirement: | | Environment variable | Enable versioning | | ------------------------------------------------------------------------------------------------ | -------------------- | ----------------- | | Multi-tenant Cloud | | | | Multi-tenant Cloud with [BYOB storage](/platform/hosting/data-security/secure-storage-connector) | | X | | Dedicated Cloud | | | | Dedicated Cloud with [BYOB storage](/platform/hosting/data-security/secure-storage-connector) | | X | | Self-Managed | X | X | note Secure storage connector is currently only available for Google Cloud Platform and Amazon Web Services. # Download and use artifacts Source: https://docs.wandb.ai/models/artifacts/download-and-use-an-artifact Download and use Artifacts from multiple projects. 
Download and use an artifact that is already stored on the W\&B server, or construct an artifact object and pass it to `use_artifact` for de-duplication as necessary.

Team members with view-only seats cannot download artifacts.

### Download and use an artifact stored on W\&B

Download and use an artifact stored in W\&B either inside or outside of a W\&B Run. Use the Public API ([`wandb.Api`](/models/ref/python/public-api/api)) to export (or update) data already saved in W\&B.

First, import the W\&B Python SDK. Next, create a W\&B [Run](/models/ref/python/experiments/run):

```python theme={null}
import wandb

with wandb.init(project="", job_type="") as run:
    ...  # See next step
```

Indicate the artifact you want to use with the [`wandb.Run.use_artifact()`](/models/ref/python/experiments/run#use_artifact) method. This returns an artifact object. The following code snippet specifies an artifact called `'bike-dataset'` with the alias `'latest'`:

```python theme={null}
# Indicate the artifact to use. Format is "name:alias"
artifact = run.use_artifact("bike-dataset:latest")
```

Use the object returned to download all the contents of the artifact:

```python theme={null}
# Download the entire artifact
datadir = artifact.download()
```

You can optionally pass a path to the `root` parameter to download the contents of the artifact to a specific directory.

Use the [`wandb.Artifact.get_entry()`](/models/ref/python/experiments/artifact#get_entry) method to download only a subset of files:

```python theme={null}
# Download a specific file
entry = artifact.get_entry(name)
```

Putting this together, the complete code example looks like this:

```python theme={null}
import wandb

with wandb.init(project="", job_type="") as run:
    # Indicate the artifact to use. Format is "name:alias"
    artifact = run.use_artifact("bike-dataset:latest")

    # Download the entire artifact
    datadir = artifact.download()

    # Download a specific file
    entry = artifact.get_entry("bike.png")
```

This fetches only the file at the path `name`.
It returns an `Entry` object with the following methods:

* `Entry.download`: Downloads file from the artifact at path `name`
* `Entry.ref`: If `add_reference` stored the entry as a reference, returns the URI

To download an artifact outside of a run, first import the W\&B SDK. Next, create an artifact object from the Public API class. Provide the entity, project, artifact, and alias associated with that artifact:

```python theme={null}
import wandb

api = wandb.Api()
artifact = api.artifact("entity/project/artifact:alias")
```

Use the object returned to download the contents of the artifact:

```python theme={null}
artifact.download()
```

You can optionally pass a path to the `root` parameter to download the contents of the artifact to a specific directory. For more information, see the [Python SDK Reference Guide](/models/ref/python/experiments/artifact#download).

Use the `wandb artifact get` command to download an artifact from the W\&B server.

```
$ wandb artifact get project/artifact:alias --root mnist/
```

### Partially download an artifact

You can optionally download part of an artifact based on a prefix. Use the `path_prefix=` (`wandb.Artifact.download(path_prefix=)`) parameter to download a single file or the content of a sub-folder.

```python theme={null}
import wandb

with wandb.init(project="", job_type="") as run:
    # Indicate the artifact to use. Format is "name:alias"
    artifact = run.use_artifact("bike-dataset:latest")

    # Download a specific file or sub-folder
    artifact.download(path_prefix="bike.png")  # downloads only bike.png
```

Alternatively, you can download files from a certain directory. To do so, specify the directory within the `path_prefix=` parameter. Continuing from the previous code snippet:

```python theme={null}
# downloads files in the images/bikes directory
artifact.download(path_prefix="images/bikes/")
```

### Use an artifact from a different project

Specify the name of the artifact along with its project name to reference an artifact.
You can also reference artifacts across entities by specifying the name of the artifact with its entity name.

The following code example demonstrates how to query an artifact from another project as input to the current W\&B run.

```python theme={null}
import wandb

with wandb.init(project="", job_type="") as run:
    # Query W&B for an artifact from another project and mark it
    # as an input to this run.
    artifact = run.use_artifact("my-project/artifact:alias")

    # Use an artifact from another entity and mark it as an input
    # to this run.
    artifact = run.use_artifact("my-entity/my-project/artifact:alias")
```

### Construct and use an artifact simultaneously

Simultaneously construct and use an artifact. Create an artifact object and pass it to `use_artifact`. This creates an artifact in W\&B if it does not exist yet. The [`wandb.Run.use_artifact()`](/models/ref/python/experiments/run#use_artifact) API is idempotent, so you can call it as many times as you like.

```python theme={null}
import wandb

with wandb.init(project="", job_type="") as run:
    # `wandb.Artifact` requires a type, and names cannot contain spaces.
    # The "model" type here is an illustrative choice.
    artifact = wandb.Artifact("reference-model", type="model")
    artifact.add_file("model.h5")
    run.use_artifact(artifact)
```

For more information about constructing an artifact, see [Construct an artifact](/models/artifacts/construct-an-artifact/).

# Explore artifact lineage graphs

Source: https://docs.wandb.ai/models/artifacts/explore-and-traverse-an-artifact-graph

View and traverse artifact lineage graphs to track the inputs and outputs of W&B runs as a directed acyclic graph.

W\&B tracks the inputs and outputs of runs using directed acyclic graphs (DAGs) called *lineage graphs*. Lineage graphs are visual representations of the relationships between artifacts and runs in an ML experiment. They show how data and models flow through different stages of the ML lifecycle, from raw data ingestion to model training and evaluation.
Tracking artifact lineage provides several key advantages:

* **Reproducibility**: Enables teams to reproduce experiments, models, and results for debugging, experimentation, and validation.
* **Version control**: Tracks changes to artifacts over time, allowing teams to revert to previous data or model versions when needed.
* **Auditing**: Maintains a detailed record of artifacts and transformations to support compliance and governance.
* **Collaboration**: Helps to improve teamwork by making experiment history transparent, reducing duplicated effort, and accelerating development.

## View an artifact's lineage graph

To view an artifact's lineage graph:

1. Navigate to your project's workspace in the W\&B App.
2. Click on the **Artifacts** tab in the project sidebar.
3. Select an artifact, then click the **Lineage** tab.

## Navigate lineage graphs

The lineage graph is a visual representation of the relationships between artifacts and runs in an ML experiment. Use the W\&B App UI or the Python SDK to explore and traverse an artifact's lineage graph.

Nodes with green icons represent runs. Nodes with blue icons represent artifacts. Arrows between nodes indicate the input and output of a run or artifact.

Artifact nodes display the artifact's name along with the version of the artifact in the form `:`. An artifact's type is displayed above the name of the artifact. You can view the type and the name of the artifact in both the left sidebar and in the lineage graph node. Run nodes display the run's name.

Click any individual run to get more information about that run, such as its start time, duration, author, job type, and more. Click any individual artifact to get more information about that artifact, such as its aliases, creation time, type, version, description, the run that logged it, file size, and more.

Runs that create multiple versions of the same artifact are grouped together in a cluster.
Click on a specific artifact version listed within the cluster to view specific information about that artifact version. Cluster of artifact versions in a lineage graph Click and drag a node to rearrange the graph to customize the layout. You can also zoom in and out of the graph to get a better view of the nodes and their relationships. Rearranging nodes in a lineage graph Hover your mouse over a node and click on the eye icon to hide or show a node in the graph. This is useful for decluttering the graph to focus on specific nodes and their relationships. Programmatically navigate a graph using the W\&B Python SDK. Use an artifact object's [`logged_by()`](/models/ref/python/experiments/artifact#method-artifact-logged-by) and [`used_by()`](/models/ref/python/experiments/artifact#method-artifact-used-by) methods to walk the graph: ```python theme={null} with wandb.init() as run: artifact = run.use_artifact("artifact_name:latest") # Walk up and down the graph from an artifact: producer_run = artifact.logged_by() consumer_runs = artifact.used_by() ``` ## Enable lineage graph tracking To enable lineage graph tracking, you need to mark artifacts as [inputs](/models/artifacts/explore-and-traverse-an-artifact-graph) or [outputs](/models/artifacts/explore-and-traverse-an-artifact-graph#track-the-output-of-a-run) of a run using the W\&B Python SDK. ### Track the input of a run Mark an artifact as the input (or dependency) of a run with the [`wandb.Run.use_artifact()`](/models/ref/python/experiments/run#method-runuse_artifact) method. Specify the name of the artifact and an optional alias to reference a specific version of that artifact. The name of the artifact is in the format `:` or `:`. 
Replace values enclosed in angle brackets (`< >`) with your values: ```python theme={null} import wandb # Initialize a run with wandb.init(entity="", project="") as run: # Get artifact, mark it as a dependency artifact = run.use_artifact(artifact_or_name="", aliases="") ``` ### Track the output of a run Use [`wandb.Run.log_artifact()`](/models/ref/python/experiments/run#log_artifact) to declare an artifact as an output of a run. First, create an artifact with the [`wandb.Artifact()`](/models/ref/python/experiments/artifact#wandb.Artifact) constructor. Then, log the artifact as an output of the run with `wandb.Run.log_artifact()`. Replace values enclosed in angle brackets (`< >`) with your values: ```python theme={null} import wandb # Initialize a run with wandb.init(entity="", project="") as run: # Create an artifact artifact = wandb.Artifact(name = "", type = "") artifact.add_file(local_path = "", name="") # Log the artifact as an output of the run run.log_artifact(artifact_or_path = artifact) ``` ## Artifact clusters When a level of the graph has five or more runs or artifacts, it creates a cluster. A cluster has a search bar to find specific versions of runs or artifacts and pulls an individual node from a cluster to continue investigating the lineage of a node inside a cluster. Clicking on a node opens a preview with an overview of the node. Clicking on the arrow extracts the individual run or artifact so you can examine the lineage of the extracted node. Searching a run cluster # Manage artifact storage and memory allocation Source: https://docs.wandb.ai/models/artifacts/storage Manage storage, memory allocation of W&B Artifacts. W\&B stores artifact files in a private Google Cloud Storage bucket located in the United States by default. All files are encrypted at rest and in transit. For sensitive files, we recommend you set up [Private Hosting](/platform/hosting/) or use [reference artifacts](/models/artifacts/track-external-files/). 
During training, W\&B locally saves logs, artifacts, and configuration files in the following local directories:

| File                         | Default location       | To change default location set:                                     |
| ---------------------------- | ---------------------- | ------------------------------------------------------------------- |
| logs                         | `./wandb`              | `dir` in `wandb.init()` or set the `WANDB_DIR` environment variable |
| artifacts                    | `~/.cache/wandb`       | the `WANDB_CACHE_DIR` environment variable                          |
| configs                      | `~/.config/wandb`      | the `WANDB_CONFIG_DIR` environment variable                         |
| staging artifacts for upload | `~/.cache/wandb-data/` | the `WANDB_DATA_DIR` environment variable                           |
| downloaded artifacts         | `./artifacts`          | the `WANDB_ARTIFACT_DIR` environment variable                       |

For a complete guide to using environment variables to configure W\&B, see the [environment variables reference](/models/track/environment-variables/).

Depending on the machine `wandb` is initialized on, these default folders may not be located in a writeable part of the file system. This might trigger an error.

### Clean up local artifact cache

W\&B caches artifact files to speed up downloads across versions that share files in common. Over time this cache directory can become large. Run the [`wandb artifact cache cleanup`](/models/ref/cli/wandb-artifact/wandb-artifact-cache/) command to prune the cache and to remove any files that have not been used recently.

The following code snippet demonstrates how to limit the size of the cache to 1GB. Copy and paste the code snippet into your terminal:

```bash theme={null}
$ wandb artifact cache cleanup 1GB
```

# Track external files

Source: https://docs.wandb.ai/models/artifacts/track-external-files

Track files saved in an external bucket, HTTP file server, or an NFS share.

Use *reference artifacts* to track and use files saved outside of W\&B servers.
Common external storage solutions include: CoreWeave AI Object Storage, an Amazon Simple Storage Service (Amazon S3) bucket, GCS bucket, Azure blob, HTTP file server, or NFS share.

Reference artifacts behave similarly to non-reference artifacts, and you interact with them in the same way. The key difference is that a reference artifact only consists of metadata about the files, such as their sizes and MD5 checksums. The files themselves never leave your system.

In the W\&B App, you can browse the contents of the reference artifact using the file browser, explore the full dependency graph, and scan through the versioned history of your artifact. However, the UI cannot render rich media such as images or audio, because the data itself is not contained within the artifact.

If you log an artifact that does not track external files, W\&B saves the artifact's files to W\&B servers. This is the default behavior when you log artifacts with the W\&B Python SDK.

If you log an artifact that tracks external files, W\&B logs metadata about the object, such as the object's ETag and size. If object versioning is enabled on the bucket, the version ID is also logged.

The following sections describe how to track external reference artifacts.

## Track an artifact in an external bucket

Use the W\&B Python SDK to track references to files stored outside of W\&B.

1. Initialize a run with `wandb.init()`.
2. Create an artifact object with `wandb.Artifact()`.
3. Specify the reference to the bucket path with the artifact object's `wandb.Artifact.add_reference()` method.
4. Log the artifact's metadata with `run.log_artifact()`.
```python theme={null}
import wandb

# Initialize a W&B run
with wandb.init(project="my-project") as run:
    # Create an artifact object
    artifact = wandb.Artifact(name="name", type="type")

    # Add a reference to the bucket path
    artifact.add_reference(uri="uri/to/your/bucket/path")

    # Log the artifact's metadata
    run.log_artifact(artifact)
```

As an example, suppose your bucket has the following directory structure:

```text theme={null}
s3://my-bucket
|datasets/
  |-- mnist/
|models/
  |-- cnn/
```

The `datasets/mnist/` directory contains a collection of images. To track the `datasets/mnist/` directory as a dataset artifact:

1. Provide a name for the artifact, such as `"mnist"`.
2. Set the `type` parameter to `"dataset"` when you construct the artifact object (`wandb.Artifact(type="dataset")`).
3. Provide the path to the `datasets/mnist/` directory as an Amazon S3 URI (`s3://my-bucket/datasets/mnist/`) when you call `wandb.Artifact.add_reference()`.
4. Log the artifact with `run.log_artifact()`.

The following code sample creates a reference artifact `mnist:latest`:

```python theme={null}
import wandb

with wandb.init(project="my-project") as run:
    artifact = wandb.Artifact(name="mnist", type="dataset")
    artifact.add_reference(uri="s3://my-bucket/datasets/mnist")
    run.log_artifact(artifact)
```

Within the W\&B App, you can look through the contents of the reference artifact using the file browser, [explore the full dependency graph](/models/artifacts/explore-and-traverse-an-artifact-graph/), and scan through the versioned history of your artifact. The W\&B App does not render rich media such as images or audio because the data itself is not contained within the artifact.

W\&B Artifacts support any Amazon S3 compatible interface, including CoreWeave Storage and MinIO. The scripts described below work as-is with both providers when you set the `AWS_S3_ENDPOINT_URL` environment variable to point at your CoreWeave Storage or MinIO server.
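One way to set the endpoint is from the training script itself, before any reference is added or downloaded. The following is a minimal sketch under stated assumptions: the endpoint URL is a made-up placeholder, and `log_mnist_reference` is a hypothetical wrapper around the `mnist` example above, not a W\&B API. Exporting the variable in your shell before launching the script works as well.

```python
import os

# Assumption: a MinIO server reachable at this hypothetical address.
S3_ENDPOINT = "https://minio.example.com:9000"

# Set the endpoint before any reference is added or downloaded.
os.environ["AWS_S3_ENDPOINT_URL"] = S3_ENDPOINT


def log_mnist_reference(project="my-project"):
    """Log a reference artifact that points at the S3-compatible bucket."""
    # Imported here so that setting the variable above has no other dependencies.
    import wandb

    with wandb.init(project=project) as run:
        artifact = wandb.Artifact(name="mnist", type="dataset")
        artifact.add_reference(uri="s3://my-bucket/datasets/mnist")
        run.log_artifact(artifact)
```

Call `log_mnist_reference()` once your credentials for the storage server are configured.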
By default, W\&B imposes a 10,000 object limit when adding an object prefix. You can adjust this limit by specifying `max_objects=` when you call `wandb.Artifact.add_reference()`. ## Download an artifact from an external bucket W\&B retrieves the files from the underlying bucket when it downloads a reference artifact using the metadata recorded when the artifact is logged. If your bucket has object versioning enabled, W\&B retrieves the object version that corresponds to the state of the file at the time an artifact was logged. As you evolve the contents of your bucket, you can always point to the exact version of your data a given model was trained on, because the artifact serves as a snapshot of your bucket during the training run. The following code sample shows how to download a reference artifact. The APIs for downloading artifacts are the same for both reference and non-reference artifacts: ```python theme={null} import wandb with wandb.init(project="my-project") as run: artifact = run.use_artifact("mnist:latest", type="dataset") artifact_dir = artifact.download() ``` W\&B recommends that you enable 'Object Versioning' on your storage buckets if you overwrite files as part of your workflow. If versioning is enabled, W\&B can always retrieve the correct version of the file when you download an artifact, even if the file has been overwritten since the artifact was logged. Based on your use case, read the instructions to enable object versioning: [AWS](https://docs.aws.amazon.com/AmazonS3/latest/userguide/manage-versioning-examples.html), [Google Cloud](https://cloud.google.com/storage/docs/using-object-versioning#set), [Azure](https://learn.microsoft.com/azure/storage/blobs/versioning-enable). ## Add and download an external from a bucket The following code sample uploads a dataset to an Amazon S3 bucket, tracks it with a reference artifact, then downloads it: ```python theme={null} import boto3 import wandb with wandb.init() as run: # Training here... 
s3_client = boto3.client("s3") s3_client.upload_file(Filename="my_model.h5", Bucket="my-bucket", Key="models/cnn/my_model.h5") # Log the model artifact model_artifact = wandb.Artifact("cnn", type="model") model_artifact.add_reference("s3://my-bucket/models/cnn/") run.log_artifact(model_artifact) ``` At a later point, you can download the model artifact. Specify the name of the artifact and its type: ```python theme={null} import wandb with wandb.init() as run: artifact = run.use_artifact(artifact_or_name="cnn", type="model") datadir = artifact.download() ``` See the following reports for an end-to-end walkthrough on how to track artifacts by reference for Google Cloud or Azure: * [Guide to Tracking Artifacts by Reference with Google Cloud](https://wandb.ai/stacey/artifacts/reports/Tracking-Artifacts-by-Reference--Vmlldzo1NDMwOTE) * [Working with Reference Artifacts in Microsoft Azure](https://wandb.ai/andrea0/azure-2023/reports/Efficiently-Harnessing-Microsoft-Azure-Blob-Storage-with-Weights-Biases--Vmlldzo0NDA2NDgw) ## Cloud storage credentials W\&B uses the default mechanism to look for credentials based on the cloud provider you use.
Read the documentation from your cloud provider to learn more about the credentials used: | Cloud provider | Credentials documentation | | --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | | CoreWeave AI Object Storage | [CoreWeave AI Object Storage documentation](https://docs.coreweave.com/docs/products/storage/object-storage/how-to/manage-access-keys/cloud-console-tokens) | | AWS | [Boto3 documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials) | | Google Cloud | [Google Cloud documentation](https://cloud.google.com/docs/authentication/provide-credentials-adc) | | Azure | [Azure documentation](https://learn.microsoft.com/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) | For AWS, if the bucket is not located in the configured user's default region, you must set the `AWS_REGION` environment variable to match the bucket region. Rich media such as images, audio, video, and point clouds may fail to render in the App UI depending on the CORS configuration of your bucket. If such media does not render, ensure that `app.wandb.ai` is allowlisted in your bucket's CORS policy. ## Track an artifact in a filesystem A common pattern for accessing datasets is to expose an NFS mount point to a remote filesystem on all machines running training jobs. This can be an alternative to a cloud storage bucket because, from the perspective of the training script, the files appear local to your filesystem. To track an artifact in a filesystem: 1. Initialize a run with `wandb.init()`. 2. Create an artifact object with `wandb.Artifact()`. 3.
Specify the reference to the filesystem path with the artifact object's `wandb.Artifact.add_reference()` method. 4. Log the artifact's metadata with `run.log_artifact()`. Copy and paste the following code snippet to track files in a mounted filesystem. Replace the values enclosed in angle brackets (`< >`) with your own values. ```python theme={null} import wandb # Initialize a run with wandb.init(entity="<entity>", project="<project>") as run: # Create an artifact object artifact = wandb.Artifact(name="<name>", type="<type>") # Add a reference to the filesystem path artifact.add_reference("file:///path/to/dataset/") # Log the artifact (metadata only) run.log_artifact(artifact) ``` Note the triple slash in the URL. The first component is the `file://` prefix that denotes the use of filesystem references. The second component is the root `/` of the filesystem. The remaining components are the path to the directory or file you want to track. As an example, suppose you have a filesystem mounted at `/mount` with the following structure: ```text theme={null} mount |datasets/ |-- mnist/ |models/ |-- cnn/ ``` To track the `datasets/mnist/` directory as a dataset artifact, you could use the following code snippet: ```python theme={null} import wandb with wandb.init() as run: artifact = wandb.Artifact("mnist", type="dataset") artifact.add_reference("file:///mount/datasets/mnist/") run.log_artifact(artifact) ``` This creates a reference artifact `mnist:latest` that points to the files stored under `/mount/datasets/mnist/`. By default, W\&B imposes a 10,000 file limit when adding a reference to a directory. You can adjust this limit by specifying `max_objects=` when you call `wandb.Artifact.add_reference()`. Similarly, to track a model stored at `models/cnn/my_model.h5`, you could use the following code snippet: ```python theme={null} import wandb with wandb.init() as run: # Training here...
# Write model to disk # Create an artifact object model_artifact = wandb.Artifact("cnn", type="model") # Add a reference to the model file path model_artifact.add_reference("file:///mount/models/cnn/my_model.h5") # Log the artifact to W&B run.log_artifact(model_artifact) ``` ## Download an artifact from an external filesystem Download files from a referenced filesystem using the same APIs as non-reference artifacts: 1. Initialize a run with `wandb.init()`. 2. Use the `wandb.Run.use_artifact()` method to indicate the artifact you want to download. 3. Call the artifact's `wandb.Artifact.download()` method to download the files from the referenced filesystem. ```python theme={null} import wandb with wandb.init() as run: artifact = run.use_artifact("entity/project/mnist:latest", type="dataset") artifact_dir = artifact.download() ``` W\&B copies the contents of `/mount/datasets/mnist` to the `artifacts/mnist:v0/` directory. `Artifact.download()` raises an error if it cannot reconstruct the artifact. For example, if an artifact contains a reference to a file that was overwritten, the download fails because W\&B can no longer reconstruct the artifact. # Manage artifact data retention Source: https://docs.wandb.ai/models/artifacts/ttl Set time-to-live (TTL) policies on W&B artifacts to schedule automatic deletion and manage storage consumption. Schedule when artifacts are deleted from W\&B with a W\&B Artifact time-to-live (TTL) policy. When you delete an artifact, W\&B marks that artifact as a *soft-delete*. In other words, the artifact is marked for deletion but files are not immediately deleted from storage. For more information on how W\&B deletes artifacts, see the [Delete artifacts](./delete-artifacts) page. Watch the [Managing data retention with Artifacts TTL](https://www.youtube.com/watch?v=hQ9J6BoVmnc) video tutorial to learn how to manage data retention with Artifacts TTL in the W\&B App.
W\&B deactivates the option to set a TTL policy for artifacts linked to the Registry. This helps ensure that linked artifacts do not accidentally expire if used in production workflows. * Only team admins can view a [team's settings](/platform/app/settings-page/teams) and access team-level TTL settings such as (1) permitting who can set or edit a TTL policy or (2) setting a team default TTL. * If you do not see the option to set or edit a TTL policy in an artifact's details in the W\&B App UI, or if setting a TTL programmatically does not change an artifact's TTL property, your team admin has not given you permission to do so. ## Auto-generated Artifacts Only user-generated artifacts can use TTL policies. Artifacts auto-generated by W\&B cannot have TTL policies set for them. The following Artifact types indicate an auto-generated Artifact: * `run_table` * `code` * `job` * Any Artifact type starting with: `wandb-*` You can check an Artifact's type on the [W\&B platform](/models/artifacts/explore-and-traverse-an-artifact-graph/) or programmatically: ```python theme={null} import wandb with wandb.init(project="<project>") as run: artifact = run.use_artifact(artifact_or_name="<artifact-name>") print(artifact.type) ``` Replace the values enclosed in `<>` with your own. ## Define who can edit and set TTL policies Define who can set and edit TTL policies within a team. You can either grant TTL permissions only to team admins, or you can grant both team admins and team members TTL permissions. Only team admins can define who can set or edit a TTL policy. 1. Navigate to your team’s profile page. 2. Select the **Settings** tab. 3. Navigate to the **Artifacts time-to-live (TTL) section**. 4. From the **TTL permissions dropdown**, select who can set and edit TTL policies. 5. Click on **Review and save settings**. 6. Confirm the changes and select **Save settings**.
Setting TTL permissions ## Create a TTL policy Set a TTL policy for an artifact either when you create the artifact or retroactively after the artifact is created. For all the code snippets below, replace the content wrapped in `<>` with your information to use the code snippet. ### Set a TTL policy when you create an artifact Use the W\&B Python SDK to define a TTL policy when you create an artifact. TTL policies are typically defined in days. Defining a TTL policy when you create an artifact is similar to how you normally [create an artifact](/models/artifacts/construct-an-artifact/), except that you assign a time delta to the artifact's `ttl` attribute. The steps are as follows: 1. [Create an artifact](/models/artifacts/construct-an-artifact/). 2. [Add content to the artifact](/models/artifacts/construct-an-artifact/#add-files-to-an-artifact) such as files, a directory, or a reference. 3. Define a TTL time limit with the [`datetime.timedelta`](https://docs.python.org/3/library/datetime.html) data type that is part of Python's standard library. 4. [Log the artifact](/models/artifacts/construct-an-artifact/#3-save-your-artifact-to-the-wb-server). The following code snippet demonstrates how to create an artifact and set a TTL policy. ```python theme={null} import wandb from datetime import timedelta with wandb.init(project="<project>", entity="<entity>") as run: artifact = wandb.Artifact(name="<artifact-name>", type="<type>") artifact.add_file("<file-path>") artifact.ttl = timedelta(days=30) # Set TTL policy run.log_artifact(artifact) ``` The preceding code snippet sets the TTL policy for the artifact to 30 days. In other words, W\&B deletes the artifact after 30 days. ### Set or edit a TTL policy after you create an artifact Use the W\&B App UI or the W\&B Python SDK to define a TTL policy for an artifact that already exists. When you modify an artifact's TTL, the time the artifact takes to expire is still calculated using the artifact's `createdAt` timestamp. 1.
[Fetch your artifact](/models/artifacts/download-and-use-an-artifact/). 2. Pass in a time delta to the artifact's `ttl` attribute. 3. Update the artifact with the [`save`](/models/ref/python/experiments/run#save) method. The following code snippet shows how to set a TTL policy for an artifact: ```python theme={null} import wandb from datetime import timedelta with wandb.init(project="<project>") as run: artifact = run.use_artifact("<artifact-name>:<alias>") artifact.ttl = timedelta(days=365 * 2) # Delete in two years artifact.save() ``` The preceding code example sets the TTL policy to two years. To use the W\&B App UI instead: 1. Navigate to your W\&B project in the W\&B App UI. 2. Select the artifact icon in the project sidebar. 3. From the list of artifacts, expand the artifact type you 4. Select the artifact version you want to edit the TTL policy for. 5. Click the **Version** tab. 6. From the dropdown, select **Edit TTL policy**. 7. Within the modal that appears, select **Custom** from the TTL policy dropdown. 8. Within the **TTL duration** field, set the TTL policy in units of days. 9. Select the **Update TTL** button to save your changes. Editing TTL policy ### Set default TTL policies for a team Only team admins can set a default TTL policy for a team. Default TTL policies apply to all existing and future artifacts based on their respective creation dates. Artifacts with existing version-level TTL policies are not affected by the team's default TTL. 1. Navigate to your team’s profile page. 2. Select the **Settings** tab. 3. Navigate to the **Artifacts time-to-live (TTL) section**. 4. Click **Set team's default TTL policy**. 5. Within the **Duration** field, set the TTL policy in units of days. 6. Click on **Review and save settings**. 7. Confirm the changes and then select **Save settings**. Setting default TTL policy ### Set a TTL policy outside of a run Use the public API to retrieve an artifact without starting a run, and set the TTL policy. TTL policies are typically defined in days.
The following code sample shows how to fetch an artifact using the public API and set the TTL policy. ```python theme={null} import wandb from datetime import timedelta api = wandb.Api() artifact = api.artifact("entity/project/artifact:alias") artifact.ttl = timedelta(days=365) # Delete in one year artifact.save() ``` ## Deactivate a TTL policy Use the W\&B Python SDK or W\&B App UI to deactivate a TTL policy for a specific artifact version. 1. [Fetch your artifact](/models/artifacts/download-and-use-an-artifact/). 2. Set the artifact's `ttl` attribute to `None`. 3. Update the artifact with the [`save`](/models/ref/python/experiments/run#save) method. The following code snippet shows how to turn off a TTL policy for an artifact: ```python theme={null} import wandb with wandb.init(project="<project>") as run: artifact = run.use_artifact("<artifact-name>:<alias>") artifact.ttl = None artifact.save() ``` To use the W\&B App UI instead: 1. Navigate to your W\&B project in the W\&B App UI. 2. Select the artifact icon in the project sidebar. 3. From the list of artifacts, expand the artifact type you 4. Select the artifact version you want to edit the TTL policy for. 5. Click the **Version** tab. 6. Click the **action** menu next to the **Link to registry** button. 7. From the dropdown, select **Edit TTL policy**. 8. Within the modal that appears, select **Deactivate** from the TTL policy dropdown. 9. Select the **Update TTL** button to save your changes. Removing TTL policy ## View TTL policies View TTL policies for artifacts with the Python SDK or with the W\&B App UI. Use a print statement to view an artifact's TTL policy. The following example shows how to retrieve an artifact and view its TTL policy: ```python theme={null} import wandb with wandb.init(project="<project>") as run: artifact = run.use_artifact("<artifact-name>:<alias>") print(artifact.ttl) ``` View a TTL policy for an artifact with the W\&B App UI. 1. Navigate to the [W\&B App](https://wandb.ai). 2. Navigate to your W\&B Project. 3. Within your project, select the **Artifacts** tab in the project sidebar. 4. Click on a collection. Within the collection view you can see all of the artifacts in the selected collection.
Within the `Time to Live` column you will see the TTL policy assigned to that artifact. TTL collection view # Update an artifact Source: https://docs.wandb.ai/models/artifacts/update-an-artifact Update an existing artifact while a run is active or using only the Public API. Pass the desired values to update the `description`, `metadata`, and `alias` of an artifact. To update an artifact that was previously logged to W\&B, use the W\&B Public API ([`wandb.Api`](/models/ref/python/public-api/api)). To update an artifact while the run is still active, call `wandb.Artifact.save()`. **When to use wandb.Artifact.save() or wandb.Run.log\_artifact()** * Use `Artifact.save()` to update an existing artifact without starting a new run. * Use `wandb.Run.log_artifact()` to create a new artifact and associate it with a specific run. Use the W\&B Public API ([`wandb.Api`](/models/ref/python/public-api/api)) to update an artifact outside of a run. Use the [`wandb.Artifact`](/models/ref/python/experiments/artifact) class while a run is active. You cannot update the alias of an artifact linked to a model in Model Registry.
The following code example demonstrates how to update the description of an artifact using the [`wandb.Artifact`](/models/ref/python/experiments/artifact) API: ```python theme={null} import wandb with wandb.init(project="<project>") as run: artifact = run.use_artifact("<artifact-name>:<alias>") artifact.description = "<description>" artifact.save() ``` The following example updates an artifact with [`wandb.Api`](/models/ref/python/public-api/api): ```python theme={null} import wandb api = wandb.Api() artifact = api.artifact("entity/project/artifact:alias") # Update the description artifact.description = "My new description" # Selectively update metadata keys artifact.metadata["oldKey"] = "new value" # Replace the metadata entirely artifact.metadata = {"newKey": "new value"} # Add an alias artifact.aliases.append("best") # Remove an alias artifact.aliases.remove("latest") # Completely replace the aliases artifact.aliases = ["replaced"] # Persist all artifact modifications artifact.save() ``` For more information, see the W\&B [Artifact API](/models/ref/python/experiments/artifact). You can also update an artifact collection in the same way as a single artifact: ```python theme={null} import wandb with wandb.init(project="<project>") as run: api = wandb.Api() collection = api.artifact_collection(type_name="<type>", name="<collection>") collection.name = "<new-collection-name>" collection.description = "<description>" collection.save() ``` For more information, see the [Artifacts Collection](/models/ref/python/public-api/api) reference. # Automations overview Source: https://docs.wandb.ai/models/automations Use W&B Automations for triggering workflows based on events in W&B. Automations exist for both **projects** and **registries**. Where you create an automation, which events you can use, and how scope works all differ. For event types by scope, see [Automation events and scopes](/models/automations/automation-events). W\&B Automations follow this pattern: when an **event** occurs and optional **conditions** are met, an **action** runs automatically.
For example: * When a run fails (event), notify a Slack channel (action). * When the `production` alias is added to an artifact (event), call a webhook to trigger deployment (action). Events and available conditions differ for automations scoped to a [project](/models/automations/automation-events#project) or a [registry](/models/automations/automation-events#registry). See [Automation events and scopes](/models/automations/automation-events). ```mermaid theme={null} flowchart LR Event[Event] --> Condition[Condition] Condition --> Action[Action] ``` **Example:** Run fails (event) and optional run name filter (condition) then Slack notification (action). Or: alias `production` added (event) then webhook (action). ## Where to create automations * **In a project**: Open the project, then select the **Automations** tab in the project sidebar. * **In a registry**: Open the registry, then select the **Automations** tab. ## Use cases * **Run monitoring and alerting**: Notify the team when a run fails or when a metric crosses a threshold (for example, loss goes to NaN or accuracy drops). * **Registry CI/CD**: When a new model version is linked or an alias (such as `staging` or `production`) is added, trigger a webhook to run tests or deploy. * **Project artifact workflows**: When a new artifact version is created or an alias is added in a project, run a downstream job or post to Slack. For full event and scope details, see [Automation events and scopes](/models/automations/automation-events). ## Automation actions When an event triggers an automation, it can perform one of these actions: * **Slack notification**: Send a message to a Slack channel with details about the triggering event. The message summarizes the event, with a link to view more details in W\&B. * **Webhook**: Call a webhook URL with a JSON payload containing information about the triggering event. 
This enables integration with external systems such as CI/CD pipelines, model deployment services, or custom workflows. For implementation details, see [Create a Slack automation](/models/automations/create-automations/slack) and [Create a webhook automation](/models/automations/create-automations/webhook). ## How automations work To [create an automation](/models/automations/create-automations), you: 1. If required, configure [secrets](/platform/secrets) for sensitive strings the automation requires, such as access tokens, passwords, or sensitive configuration details. Secrets are defined in your **Team Settings**. Secrets are most commonly used in webhook automations to securely pass credentials or tokens to the webhook's external service without exposing it in plain text or hard-coding it in the webhook's payload. 2. Configure team-level webhook or Slack integrations to authorize W\&B to post to Slack or run the webhook on your behalf. A single automation action (webhook or Slack notification) can be used by multiple automations. These actions are defined in your **Team Settings**. 3. In the project or registry, create the automation: 1. Define the [event](/models/automations/automation-events) to watch for, such as when a new artifact version is added. 2. Define the action to take when the event occurs (posting to a Slack channel or running a webhook). For a webhook, specify a secret to use for the access token and/or a secret to send with the payload, if required. ## Recommendations * **Start small**: Begin with one or two automations for high-value events (for example, run failures or production alias changes). Validate that they work as expected before adding more. * **Test before production**: Create automations in a test project or with a test webhook or Slack channel first. Manually trigger the event and confirm the action runs and the payload or message looks correct. 
* **Avoid alert fatigue**: Use run filters, metric thresholds, or alias patterns to limit how often an automation fires. If you have multiple severities, route them to different channels. ## Limitations [Run metric automations](/models/automations/automation-events/#run-metrics-events) and [run metrics z-score change automations](/models/automations/automation-events/#run-metrics-z-score-change-automations) are currently supported only in [W\&B Multi-tenant Cloud](/platform/hosting/#wb-multi-tenant-cloud). ## Next steps * [Automations tutorial](/models/automations/tutorial): Walks you through creating a project automation to alert on run failures and a Registry automation to run a webhook when an alias is added. The tutorial uses the W\&B App. * [Create an automation](/models/automations/create-automations). * [Automation events and scopes](/models/automations/automation-events). * [Create a secret](/platform/secrets). Looking for companion tutorials for automations? * [Learn to automatically trigger a GitHub Action for model evaluation and deployment](https://wandb.ai/wandb/wandb-model-cicd/reports/Model-CI-CD-with-W-B--Vmlldzo0OTcwNDQw). * [Watch a video demonstrating automatically deploying a model to a SageMaker endpoint](https://www.youtube.com/watch?v=s5CMj_w3DaQ). * [Watch a video series introducing automations](https://youtube.com/playlist?list=PLD80i8An1OEGECFPgY-HPCNjXgGu-qGO6\&feature=shared). # Manage automations with the API Source: https://docs.wandb.ai/models/automations/api Programmatic automation management with the Python API. Create and update may be affected on some client versions. Prefer the W&B App until the SDK fix ships. Manage W\&B automations programmatically with the `wandb` Python API when you script automation workflows instead of using the W\&B App.
Programmatic **create** and **update** for automations through `wandb.Api().create_automation()` and related helpers can fail on some `wandb` Python client versions because of a client-side feature check (for example, run-state automations). Until a fixed SDK ships, use the [W\&B App](/models/automations/create-automations) to create and edit automations. The `list`, `get`, and `delete` methods aren't affected. Internal tracking: WB-34263. ## Next steps Refer to the following resources to manage automations and learn more about how they work: * [Automations overview](/models/automations) * [Create an automation](/models/automations/create-automations) (W\&B App) * [Automation events and scopes](/models/automations/automation-events) * [Automations API reference](/models/ref/python/public-api/automations) (Python types and `Api` methods) # Automation events and scopes Source: https://docs.wandb.ai/models/automations/automation-events Learn about events and scopes that trigger W&B Automations, including artifact changes, run status, and metric conditions. An automation can start when a specific event occurs within a project or registry. This page describes the events that can trigger an automation within each scope, so you can choose the right trigger when you configure an automation. Learn more about automations in the [Automations overview](/models/automations) or [Create an automation](/models/automations/create-automations). ## Registry The following sections describe the scopes and events for an automation in a [Registry](/models/registry). ### Scopes A [Registry](/models/registry) automation watches for the event taking place on any collection within a specific registry, including collections added in the future. ### Events A Registry automation can watch for these events: * **A new version is linked to a collection**: Test and validate new models or datasets when they're added to a registry. 
* **An artifact alias is added**: Trigger a specific step of your workflow when a new artifact version has a specific alias applied. For example, deploy a model when it has the `production` alias applied. When the automation calls a webhook, it can access the same team-level webhook configurations and [team secrets](/platform/secrets) as project-scoped automations. ## Project The following sections describe the scopes and events for an automation in a [project](/models/track/project-page). ### Scopes A project-level automation watches for the event taking place on any collection in the project. Depending on the event you specify, you can further limit the scope of the automation. ### Artifact events This section describes the events related to an artifact that can trigger an automation. * **A new version is added to an artifact**: Apply recurring actions to each version of an artifact. For example, start a training job when a new dataset artifact version is created. * **An artifact alias is added**: Trigger a specific step of your workflow when a new artifact version in a project has an alias applied that matches the **Alias regex** you specify. For example, run a series of downstream processing steps when an artifact has the `test-set-quality-check` alias applied, or run a workflow each time a new artifact version has the `latest` alias. Only one artifact version can have a given alias at a time. * **An artifact tag is added**: Trigger a specific step of your workflow when an artifact version in a project has a tag applied that matches the **Tag regex** you specify. For example, specify `^europe.*` to trigger a geo-specific workflow when a tag beginning with the string `europe` is added to an artifact version. Use artifact tags for grouping and filtering. You can assign the same tag to multiple artifact versions. 
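To preview which tags a **Tag regex** such as `^europe.*` would select, you can try the pattern locally before you save the automation. The sketch below uses Python's `re` module; the tag names are hypothetical, and it assumes W\&B's regex matching behaves like Python's for this simple pattern, which the text above does not confirm.

```python
import re

# The `^europe.*` pattern is the example from the text; the tags are made up.
TAG_REGEX = re.compile(r"^europe.*")


def matching_tags(tags):
    """Return the tags that the pattern matches, in their original order."""
    return [tag for tag in tags if TAG_REGEX.search(tag)]


tags = ["europe-west", "europe", "asia-europe", "us-east"]
print(matching_tags(tags))  # ['europe-west', 'europe']
```

Because the pattern is anchored with `^`, a tag such as `asia-europe` does not match even though it contains the string `europe`. The same check works for an **Alias regex**.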
### Run events The following sections describe how to configure an automation that starts based on a change in a [run's status](/models/runs/run-states) or a change in a [run's metric value](/models/track/log#what-data-is-logged-with-specific-wb-api-calls). #### Run status change * Currently available only in [W\&B Multi-tenant Cloud](/platform/hosting#wb-multi-tenant-cloud) and [Dedicated Cloud](/platform/hosting#wb-dedicated-cloud). * A run with **Killed** status can't trigger an automation. This status indicates that an admin forcibly stopped the run. Trigger a workflow when a run changes its [status](/models/runs/run-states) to **Running**, **Finished**, or **Failed**. Optionally, you can further limit the runs that can trigger an automation by specifying a user or run name filter. Screenshot showing a run status change automation Because run status is a property of the entire run, you can create a run status automation only from the **Automations** page, not from a workspace. #### Run metrics change Currently available only in [W\&B Multi-tenant Cloud](/platform/hosting#wb-multi-tenant-cloud) and [Dedicated Cloud](/platform/hosting#wb-dedicated-cloud). Trigger a workflow based on a logged value for a metric, either a metric in a run's history or a [system metric](/models/ref/python/experiments/system-metrics) such as `cpu`, which tracks the percentage of CPU utilization. W\&B logs system metrics automatically every 15 seconds. You can create a run metrics automation from the project's **Automations** tab or directly from a line plot panel in a workspace. To set up a run metric automation, configure how to compare the metric's value with the threshold you specify. Your choices depend on the event type and on any filters you specify. Optionally, you can further limit the runs that can trigger an automation by specifying a user or run name filter. ##### Threshold Use a threshold event to start an automation when a metric crosses a fixed value. 
For **Run metrics threshold met** events, configure: 1. The window of most recently logged values to consider (defaults to 5). 2. Whether to evaluate the **Average**, **Min**, or **Max** value within the window. 3. The comparison to make: * Above * Above or equal to * Below * Below or equal to * Not equal to * Equal to For example, trigger an automation when average `accuracy` is above `0.6`. Screenshot showing a run metrics threshold automation ##### Change threshold Use a change threshold event to start an automation when a metric shifts between two recent windows of values. For **Run metrics change threshold met** events, the automation uses two "windows" of values to check whether to start: * The *current window* of recently logged values to consider (defaults to 10). * The *prior window* of recently logged values to consider (defaults to 50). The current and prior windows are consecutive and don't overlap. To create the automation, configure: 1. The current window of logged values (defaults to 10). 2. The prior window of logged values (defaults to 50). 3. Whether to evaluate the values as relative or absolute (defaults to **Relative**). 4. The comparison to make: * Increases by at least * Decreases by at least * Increases or decreases by at least For example, trigger an automation when average `loss` decreases by at least `0.25`. Screenshot showing a run metrics change threshold automation #### Run metrics z-score change Currently available only in [W\&B Multi-tenant Cloud](/platform/hosting#wb-multi-tenant-cloud) and [Dedicated Cloud](/platform/hosting#wb-dedicated-cloud). W\&B can trigger an automation when a metric's z-score (standard score) exceeds a specified threshold. A z-score measures how many standard deviations the value is from the mean for that metric across a configurable window of runs in the project (defaults to 30 runs). To use a z-score as an event trigger, select the **Run metrics z-score threshold met** event. 
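As a worked illustration of the z-score definition above, the following sketch computes how far a new value lies from the mean of a window of earlier values. The metric values are hypothetical, and the use of the population standard deviation and a strict comparison are assumptions; W\&B's exact calculation may differ.

```python
from statistics import mean, pstdev


def z_score(value: float, window: list[float]) -> float:
    """Return how many standard deviations `value` lies from the window mean."""
    mu = mean(window)
    sigma = pstdev(window)  # population standard deviation (an assumption)
    if sigma == 0:
        return 0.0  # all values are identical, so there is no spread to measure
    return (value - mu) / sigma


def exceeds(value: float, window: list[float], threshold: float = 2.0) -> bool:
    """Check whether the z-score exceeds the threshold in either direction."""
    return abs(z_score(value, window)) > threshold


# Hypothetical metric values from earlier runs: mean 5, standard deviation 2.
window = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(10, window))  # 2.5
print(exceeds(10, window))  # True
print(exceeds(6, window))   # False
```

A value of 10 is 2.5 standard deviations above the window mean of 5, so it exceeds a threshold of 2.0, while a value of 6 (z-score 0.5) does not.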
Automations based on z-score keep your team informed about unusual performance without checking absolute thresholds, which may vary as your model or training process evolves. You can create a run metrics z-score automation from the project's **Automations** tab or directly from a line plot panel in a workspace. To create a z-score automation, configure: 1. The target z-score threshold, expressed as a positive float value (for example, 2.0). 2. The window of logged values that determine the mean value (defaults to 30). 3. The comparison to make: * Above (triggers when performance is unusually high). * Below (triggers when performance is unusually low). * Either above or below. For example, trigger an automation when `accuracy` has a z-score above 2, which means the run is performing well above other runs in the project. Z-score values have the following meanings: * A z-score of 0 means the metric is at the average. * A z-score of +2.0 means the metric is 2 standard deviations above average. * A z-score of -2.0 means the metric is 2 standard deviations below average. * Values beyond ±2 are often considered statistically significant outliers. #### Run filters This section describes how the automation selects runs to evaluate. By default, any run in the project triggers the automation when the event occurs. You can limit which runs trigger an automation by configuring one of the following filters: | Filter | Description | | ----------------------------- | ----------------------------------------------------------------- | | **Filter to one user's runs** | Include only runs created by the specified user. | | **Filter on run name** | Include only runs whose names match the given regular expression. | The automation evaluates each run as follows: * Each run is considered individually and can potentially trigger the automation. * Each run's values are put into a separate window and compared to the threshold separately. 
* In a 24-hour period, a particular automation can fire at most once per run. For details, see [Create automations](/models/automations/create-automations). ## Next steps * [Create a Slack automation](/models/automations/create-automations/slack) * [Create a webhook automation](/models/automations/create-automations/webhook) # Overview Source: https://docs.wandb.ai/models/automations/create-automations Create and manage W&B automations to streamline your ML workflows This page shows how to create and manage W\&B [automations](/models/automations), which let you trigger Slack notifications or webhooks in response to project and registry events. Use this overview to understand the prerequisites and high-level workflow, then follow the detailed instructions in [Create a Slack automation](/models/automations/create-automations/slack) or [Create a webhook automation](/models/automations/create-automations/webhook). Looking for companion tutorials for automations? * [Learn to automatically trigger a GitHub Action for model evaluation and deployment](https://wandb.ai/wandb/wandb-model-cicd/reports/Model-CI-CD-with-W-B--Vmlldzo0OTcwNDQw). * [Watch a video demonstrating automatically deploying a model to a SageMaker endpoint](https://www.youtube.com/watch?v=s5CMj_w3DaQ). * [Watch a video series introducing automations](https://youtube.com/playlist?list=PLD80i8An1OEGECFPgY-HPCNjXgGu-qGO6\&feature=shared). ## Requirements When creating automations: * A team admin can create and manage automations for the team's projects, as well as components of their automations, such as webhooks, secrets, and Slack integrations. Refer to [Team settings](/platform/app/settings-page/teams). * To create a registry automation, you must have access to the registry. Refer to [Configure Registry access](/models/registry/configure_registry/#registry-roles). * To create a Slack automation, you must have permission to post to the Slack instance and channel you select. 
## Create an automation You can create an automation from the project or registry's **Automations** tab, or directly from a line plot panel in the workspace. ### Create from the Automations tab At a high level, to create an automation from the **Automations** tab, follow these steps: 1. If necessary, [create a W\&B secret](/platform/secrets) for each sensitive string required by the automation, such as an access token, password, or SSH key. Secrets let you reuse sensitive strings safely and keep them out of automation configurations. Define secrets in your **Team Settings**. Webhook automations most commonly use secrets. 2. Configure team-level webhook or Slack integrations to authorize W\&B to post to Slack or run the webhook on your behalf. Multiple automations can use a single webhook or Slack integration. Define these actions in your **Team Settings**. 3. In the project or registry, create the automation, which specifies the event to watch for and the action to take (such as posting to Slack or running a webhook). When you create a webhook automation, you configure the payload it sends. After you complete these steps, the automation runs whenever the specified event occurs in the project or registry. ### Create a run metric automation from a workspace panel From a line plot in the workspace, you can also create a [run metric automation](/models/automations/automation-events/#run-events) for the metric it shows: 1. Hover over the panel, then click the bell icon at the top of the panel. Automation bell icon location 2. Configure the automation using the basic or advanced configuration controls. For example, apply a run filter to limit the scope of the automation, or configure an absolute threshold. 
For details, refer to: * [Create a Slack automation](/models/automations/create-automations/slack) * [Create a webhook automation](/models/automations/create-automations/webhook) ## View and manage automations View and manage automations from a project or registry's **Automations** tab: * To view an automation's details, click its name. * To view an automation's execution history (which actions ran and whether they succeeded), click its name and select the **History** tab. For more information, see [View an automation's history](/models/automations/view-automation-history). * To edit an automation, click its **action ()** menu, then click **Edit automation**. * To delete an automation, click its **action ()** menu, then click **Delete automation**. ## Go further For more information, see the following resources: * [Automations tutorials](/models/automations/tutorial): Build a [project run failure alert](/models/automations/project-automation-tutorial) or a [registry alias-to-webhook automation](/models/automations/registry-automation-tutorial) in the W\&B App. * [Automation events and scopes](/models/automations/automation-events). * [Create a Slack automation](/models/automations/create-automations/slack). * [Create a webhook automation](/models/automations/create-automations/webhook). * [Create a secret](/platform/secrets). # Create a Slack automation Source: https://docs.wandb.ai/models/automations/create-automations/slack Set up a Slack integration and create a W&B Automation that sends notifications to a Slack channel on specific events. This page shows how to create a Slack [automation](/models/automations) so that your team receives notifications in a Slack channel when specific events occur in W\&B. Example events include a new model version or a run reaching a metric threshold. To create a webhook automation instead, refer to [Create a webhook automation](/models/automations/create-automations/webhook). 
At a high level, to create a Slack automation, you take these steps: 1. [Add a Slack integration](#add-a-slack-integration), which authorizes W\&B to post to the Slack instance and channel. 2. [Create the automation](#create-an-automation), which defines the [event](/models/automations/automation-events) to watch for and the channel to notify. ## Add a Slack integration A team admin can add a Slack integration to the team. 1. Log in to W\&B and go to **Team Settings**. 2. In the **Slack channel integrations** section, click **Connect Slack** to add a new Slack instance. To add a channel for an existing Slack instance, click **New integration**. Two Slack integrations in a Team 3. If necessary, sign in to Slack in your browser. When prompted, grant W\&B permission to post to the Slack channel you select. Read the page, then click **Search for a channel** and begin typing the channel name. Select the channel from the list, then click **Allow**. 4. In Slack, go to the channel you selected. If you see a post like `[YOUR-SLACK-HANDLE] added an integration to this channel: Weights & Biases`, where `[YOUR-SLACK-HANDLE]` is your Slack username, the integration is configured correctly. Now you can [create an automation](#create-an-automation) that notifies the Slack channel you configured. ## View and manage Slack integrations A team admin can view and manage the team's Slack instances and channels. 1. Log in to W\&B and go to **Team Settings**. 2. View each Slack destination in the **Slack channel integrations** section. 3. Delete a destination by clicking its trash icon. ## Create an automation After you [add a Slack integration](#add-a-slack-integration), you can create an automation that uses it to send notifications. Select **Registry** or **Project** based on the scope you want the automation to apply to. Then follow these steps to create an automation that notifies the Slack channel. A Registry admin can create automations in that registry. 1. Log in to W\&B. 2. 
Click the name of a registry to view its details. 3. To create an automation scoped to the registry, click the **Automations** tab, then click **Create automation**. An automation that is scoped to a registry is automatically applied to all of its collections (including those created in the future). 4. Choose the [event](/models/automations/automation-events/#registry-events) to watch for. Fill in any additional fields that appear, which depend upon the event. For example, if you select **An artifact alias is added**, you must specify the **Alias regex**. Click **Next step**. 5. Select the team that owns the [Slack integration](#add-a-slack-integration). 6. Set **Action type** to **Slack notification**. Select the Slack channel, then click **Next step**. 7. Provide a name for the automation. Optionally, provide a description. 8. Click **Create automation**. The automation is now active and notifies the selected Slack channel whenever the chosen event occurs in the registry. A W\&B admin can create automations in a project. 1. Log in to W\&B. 2. Go to the project page and click the **Automations** tab, then click **Create automation**. Or, directly from a line plot in the workspace, you can create a [run metric automation](/models/automations/automation-events/#run-events) for the metric it shows. Hover over the panel, then click the bell icon at the top of the panel. Automation bell icon location 3. Choose the [event](/models/automations/automation-events/#project) to watch for. 1. Fill in any additional fields that appear. For example, if you select **An artifact alias is added**, you must specify the **Alias regex**. 1. For automations triggered by a run, optionally specify one or more run filters: * **Filter to one user's runs**: Include only runs created by the specified user. Click the toggle to turn on the filter, then specify a username. * **Filter on run name**: Include only runs whose names match the given regular expression. 
Click the toggle to turn on the filter, then specify a regular expression. 2. Click **Next step**. 4. Select the team that owns the [Slack integration](#add-a-slack-integration). 5. Set **Action type** to **Slack notification**. Select the Slack channel, then click **Next step**. 6. Provide a name for the automation. Optionally, provide a description. 7. Click **Create automation**. The automation is now active and notifies the selected Slack channel whenever the chosen event occurs in the project. ## View and manage automations Manage the registry's automations from the registry's **Automations** tab: * To view an automation's details, click its name. * To edit an automation, click its **action ()** menu, then click **Edit automation**. * To delete an automation, click its **action ()** menu, then click **Delete automation**. Confirmation is required. A W\&B admin can view and manage a project's automations from the project's **Automations** tab. * To view an automation's details, click its name. * To edit an automation, click its **action ()** menu, then click **Edit automation**. * To delete an automation, click its **action ()** menu, then click **Delete automation**. Confirmation is required. # Create a webhook automation Source: https://docs.wandb.ai/models/automations/create-automations/webhook Create a webhook automation in W&B to send HTTP requests to external services when specific events occur. This page shows how to create a webhook [automation](/models/automations), which sends an HTTP request to an external service when a specific event occurs in W\&B. Use a webhook automation to integrate W\&B with external systems such as CI/CD pipelines, notification services, or custom tooling. To create a Slack automation, refer to [Create a Slack automation](/models/automations/create-automations/slack) instead. At a high level, to create a webhook automation, you take these steps: 1. 
If necessary, [create a W\&B secret](/platform/secrets) for each sensitive string required by the automation, such as an access token, password, or SSH key. Secrets are defined in your **Team Settings**. 2. [Create a webhook](#create-a-webhook) to define the endpoint and authorization details and grant the integration access to any secrets it needs. 3. [Create the automation](#create-an-automation) to define the [event](/models/automations/automation-events) to watch for and the payload W\&B sends. Grant the automation access to any secrets it needs for the payload. ## Create a webhook A team admin can add a webhook for the team. The webhook defines the endpoint W\&B sends requests to and any credentials required to authenticate with it. If the webhook requires a Bearer token or its payload requires a sensitive string, [create a secret that contains it](/platform/secrets/#add-a-secret) before creating the webhook. You can configure at most one access token and one other secret for a webhook. The webhook's service determines your webhook's authentication and authorization requirements. 1. Log in to W\&B and go to the **Team Settings** page. 2. In the **Webhooks** section, click **New webhook**. 3. Provide a name for the webhook. 4. Provide the endpoint URL of the webhook. 5. If the webhook requires a Bearer token, set **Access token** to the [secret](/platform/secrets) that contains it. When using the webhook automation, W\&B sets the `Authorization: Bearer` HTTP header to the access token, and you can access the token in the `${ACCESS_TOKEN}` [payload variable](#payload-variables). For more information about the structure of the `POST` request W\&B sends to the webhook service, see [Troubleshoot your webhook](#troubleshoot-your-webhook). 6. If the webhook requires a password or other sensitive string in its payload, set **Secret** to the secret that contains it. 
When you configure the automation that uses the webhook, you can access the secret as a [payload variable](#payload-variables) by prefixing its name with `$`. If the webhook's access token is stored in a secret, you must *also* complete the previous step to specify the secret as the access token.
7. To verify that W\&B can connect and authenticate to the endpoint:
   1. Optionally, provide a payload to test. To refer to a secret the webhook has access to in the payload, prefix its name with `$`. W\&B uses this payload only for testing and doesn't save it. You configure an automation's payload when you [create the automation](#create-an-automation). For more information about where the secret and access token appear in the `POST` request, see [Troubleshoot your webhook](#troubleshoot-your-webhook).
   2. Click **Test**. W\&B attempts to connect to the webhook's endpoint using the credentials you configured. If you provided a payload, W\&B sends it.

   If the test doesn't succeed, verify the webhook's configuration and try again. If necessary, refer to [Troubleshoot your webhook](#troubleshoot-your-webhook).

Now you can [create an automation](#create-an-automation) that uses the webhook.

## Create an automation

After you [configure a webhook](#create-a-webhook), create an automation that defines which W\&B event triggers the webhook and what payload to send. Select **Registry** or **Project** depending on the scope you want, then follow these steps to create an automation that triggers the webhook.

A Registry admin can create automations in that registry. Registry automations apply to all collections in the registry, including those added in the future.

1. Log in to W\&B.
2. Click the name of a registry to view its details.
3. To create an automation scoped to the registry, click the **Automations** tab, then click **Create automation**.
4. Choose the [event](/models/automations/automation-events/#registry-events) to watch for. Fill in any additional fields that appear. For example, if you select **An artifact alias is added**, you must specify the **Alias regex**. Click **Next step**.
5. Select the team that owns the [webhook](#create-a-webhook).
6. Set **Action type** to **Webhooks**. Then select the [webhook](#create-a-webhook) to use.
7. If your webhook requires a payload, construct it and paste it into the **Payload** field. If you configured an access token for the webhook, you can access the token in the `${ACCESS_TOKEN}` [payload variable](#payload-variables). If you configured a secret for the webhook, you can access it in the payload by prefixing its name with `$`. The webhook's service determines your webhook's requirements.
8. Click **Next step**.
9. Provide a name for the automation. Optionally, provide a description. Click **Create automation**.

A W\&B admin can create automations in a project.

1. Log in to W\&B and go to the project page.
2. In the project sidebar, click **Automations**, then click **Create automation**. Or, from a line plot in the workspace, you can quickly create a [run metric automation](/models/automations/automation-events/#run-events) for the metric it shows. Hover over the panel, then click the bell icon at the top of the panel.
3. Choose the [event](/models/automations/automation-events/#project) to watch for, such as when an artifact alias is added or when a run metric meets a given threshold.
   1. Fill in any additional fields that appear, which depend upon the event. For example, if you select **An artifact alias is added**, you must specify the **Alias regex**.
   1. For automations triggered by a run, optionally specify one or more run filters.
      * **Filter to one user's runs**: Include only runs created by the specified user. Click the toggle to turn on the filter, then specify a username.
      * **Filter on run name**: Include only runs whose names match the given regular expression. Click the toggle to turn on the filter, then specify a regular expression.
The automation applies to all collections in the project, including those added in the future. 2. Click **Next step**. 4. Select the team that owns the [webhook](#create-a-webhook). 5. Set **Action type** to **Webhooks**. Then select the [webhook](#create-a-webhook) to use. 6. If your webhook requires a payload, construct it and paste it into the **Payload** field. If you configured an access token for the webhook, you can access the token in the `${ACCESS_TOKEN}` [payload variable](#payload-variables). If you configured a secret for the webhook, you can access it in the payload by prefixing its name with `$`. The webhook's service determines your webhook's requirements. 7. Click **Next step**. 8. Provide a name for the automation. Optionally, provide a description. Click **Create automation**. After you complete these steps, the automation is active and runs the webhook whenever the specified event occurs in the registry or project. ## View and manage automations Review, edit, or delete an automation from the **Automations** tab. Manage a registry's automations from the registry's **Automations** tab. * To view an automation's details, click its name. * To edit an automation, click its **action ()** menu, then click **Edit automation**. * To delete an automation, click its **action ()** menu, then click **Delete automation**. W\&B prompts you to confirm. A W\&B admin can view and manage a project's automations from the project's **Automations** tab. * To view an automation's details, click its name. * To edit an automation, click its **action ()** menu, then click **Edit automation**. * To delete an automation, click its **action ()** menu, then click **Delete automation**. W\&B prompts you to confirm. ## Payload reference Use these sections to construct your webhook's payload. The following sections describe the variables available to your payload and show example payloads for common services. 
For details about testing your webhook and its payload, refer to [Troubleshoot your webhook](#troubleshoot-your-webhook).

### Payload variables

The following table describes the variables you can use to construct your webhook's payload.

| Variable | Details |
| ----------------------------- | ----------------------------------------------------------------- |
| `${project_name}` | The name of the project that owns the mutation that triggered the action. |
| `${entity_name}` | The name of the entity or team that owns the mutation that triggered the action. |
| `${event_type}` | The type of event that triggered the action. |
| `${event_author}` | The user that triggered the action. |
| `${alias}` | Contains an artifact's alias if the automation is triggered by the **An artifact alias is added** event. For other automations, this variable is blank. |
| `${tag}` | Contains an artifact's tags if the automation is triggered by the **An artifact tag is added** event. For other automations, this variable is blank. |
| `${artifact_collection_name}` | The name of the artifact collection that the artifact version is linked to. |
| `${artifact_metadata.<KEY>}` | The value of an arbitrary top-level metadata key from the artifact version that triggered the action. Replace `<KEY>` with the name of a top-level metadata key. Only top-level metadata keys are available in the webhook's payload. |
| `${artifact_version}` | The [`wandb.Artifact`](/models/ref/python/experiments/artifact) representation of the artifact version that triggered the action. |
| `${artifact_version_string}` | The `string` representation of the artifact version that triggered the action. |
| `${ACCESS_TOKEN}` | The value of the access token configured in the [webhook](#create-a-webhook), if you configured an access token. W\&B automatically passes it in the `Authorization: Bearer` HTTP header. |
| `${SECRET_NAME}` | If configured, the value of a secret configured in the [webhook](#create-a-webhook). Replace `SECRET_NAME` with the name of the secret. |

### Payload examples

The following examples show webhook payloads for common use cases. The examples demonstrate how to use [payload variables](#payload-variables).

Verify that your access tokens have the required set of permissions to trigger your GitHub Actions workflow. For more information, see the [GitHub repository dispatch event documentation](https://docs.github.com/en/rest/repos/repos?#create-a-repository-dispatch-event).

Send a repository dispatch from W\&B to trigger a GitHub action. For example, suppose you have a GitHub workflow file that accepts a repository dispatch as a trigger for the `on` key:

```yaml theme={null}
on:
  repository_dispatch:
    types: BUILD_AND_DEPLOY
```

The payload for the repository might look something like:

```json theme={null}
{
  "event_type": "BUILD_AND_DEPLOY",
  "client_payload": {
    "event_author": "${event_author}",
    "artifact_version": "${artifact_version}",
    "artifact_version_string": "${artifact_version_string}",
    "artifact_collection_name": "${artifact_collection_name}",
    "project_name": "${project_name}",
    "entity_name": "${entity_name}"
  }
}
```

The `event_type` key in the webhook payload must match the `types` field in the GitHub workflow YAML file.

The contents and positioning of rendered template strings depend on the event or model version you configured the automation for. `${event_type}` renders as either `LINK_ARTIFACT` or `ADD_ARTIFACT_ALIAS`.
The following shows an example mapping:

```text theme={null}
${event_type} --> "LINK_ARTIFACT" or "ADD_ARTIFACT_ALIAS"
${event_author} --> ""
${artifact_version} --> "wandb-artifact://_id/QXJ0aWZhY3Q6NTE3ODg5ODg3"
${artifact_version_string} --> "/model-registry/:"
${artifact_collection_name} --> ""
${project_name} --> "model-registry"
${entity_name} --> ""
```

Use template strings to dynamically pass context from W\&B to GitHub Actions and other tools. If those tools can call Python scripts, they can consume the registered model artifacts through the [W\&B API](/models/artifacts/download-and-use-an-artifact).

For more information, see the following resources:

* For more information about repository dispatch, see the [official documentation on the GitHub Marketplace](https://github.com/marketplace/actions/repository-dispatch).
* Watch the videos [Webhook Automations for Model Evaluation](https://www.youtube.com/watch?v=7j-Mtbo-E74\&ab_channel=Weights%26Biases) and [Webhook Automations for Model Deployment](https://www.youtube.com/watch?v=g5UiAFjM2nA\&ab_channel=Weights%26Biases), which guide you to create automations for model evaluation and deployment.
* Review the W\&B report [Model CI/CD with W\&B](https://wandb.ai/wandb/wandb-model-cicd/reports/Model-CI-CD-with-W-B--Vmlldzo0OTcwNDQw), which illustrates how to use a GitHub Actions webhook automation for model CI.
* For an example of model CI with a Modal Labs webhook, see the [wandb-modal-webhook GitHub repository](https://github.com/hamelsmu/wandb-modal-webhook).
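As a rough illustration of how the template strings in the mapping above resolve to concrete values, the following sketch renders a trimmed version of the GitHub payload locally with Python's `string.Template`, which happens to use the same `${name}` syntax. This is not how W\&B renders payloads; the `render_payload` helper and every sample value are hypothetical.

```python
import json
from string import Template

# A trimmed version of the repository dispatch payload shown earlier.
PAYLOAD_TEMPLATE = """
{
  "event_type": "BUILD_AND_DEPLOY",
  "client_payload": {
    "event_author": "${event_author}",
    "artifact_version_string": "${artifact_version_string}",
    "project_name": "${project_name}"
  }
}
"""


def render_payload(template: str, variables: dict[str, str]) -> dict:
    """Substitute ${name} placeholders, then parse the result as JSON."""
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    rendered = Template(template).safe_substitute(variables)
    return json.loads(rendered)


# Hypothetical values, for illustration only.
sample = {
    "event_author": "example-user",
    "artifact_version_string": "example-team/model-registry/example-model:v3",
    "project_name": "model-registry",
}

payload = render_payload(PAYLOAD_TEMPLATE, sample)
print(json.dumps(payload, indent=2))
```

Previewing a payload this way can help you catch malformed JSON before you paste it into the **Payload** field. Note that this sketch does no escaping, so sample values that contain quotes or backslashes would produce invalid JSON.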
This example payload shows how to notify your Teams channel using a webhook: ```json theme={null} { "@type": "MessageCard", "@context": "http://schema.org/extensions", "summary": "New Notification", "sections": [ { "activityTitle": "Notification from WANDB", "text": "This is an example message sent via Teams webhook.", "facts": [ { "name": "Author", "value": "${event_author}" }, { "name": "Event Type", "value": "${event_type}" } ], "markdown": true } ] } ``` You can use template strings to inject W\&B data into your payload at the time of execution. See the [Teams example](#microsoft-teams-notification). This section is provided for historical purposes. If you use a webhook to integrate with Slack, W\&B recommends that you update your configuration to use the [Slack integration](/models/automations/create-automations/slack) instead. Set up your Slack app and add an incoming webhook integration by following the instructions in the [Slack API documentation](https://api.slack.com/messaging/webhooks). Ensure that you have the secret specified under `Bot User OAuth Token` as your W\&B webhook's access token. The following is an example payload: ```json theme={null} { "text": "New alert from WANDB!", "blocks": [ { "type": "section", "text": { "type": "mrkdwn", "text": "Registry event: ${event_type}" } }, { "type":"section", "text": { "type": "mrkdwn", "text": "New version: ${artifact_version_string}" } }, { "type": "divider" }, { "type": "section", "text": { "type": "mrkdwn", "text": "Author: ${event_author}" } } ] } ``` ## Troubleshoot your webhook If a webhook isn't working as expected, you can troubleshoot it interactively with the W\&B App UI or programmatically with a shell script. You can troubleshoot a webhook during creation or afterward. For details about the format W\&B uses for the `POST` request, refer to the **Shell script** tab. A team admin can test a webhook interactively with the W\&B App UI. 1. Go to your team page, then click **Settings**. 2. 
Scroll to the **Webhooks** section.
3. Click the **action ()** menu next to the name of your webhook.
4. Select **Test**.
5. In the panel that appears, paste your `POST` request into the field.
6. Click **Test webhook**. Within the W\&B App UI, W\&B posts the response from your endpoint.

Watch the video [Testing Webhooks in W\&B](https://www.youtube.com/watch?v=bl44fDpMGJw\&ab_channel=Weights%26Biases) for a demonstration.

This shell script shows one method to generate a `POST` request similar to the request W\&B sends to your webhook automation when it's triggered. Copy and paste the following code into a shell script to troubleshoot your webhook. Specify your own values for:

* `ACCESS_TOKEN`
* `SECRET`
* `PAYLOAD`
* `API_ENDPOINT`

```bash webhook_test.sh theme={null}
#!/bin/bash

# Your access token and secret
ACCESS_TOKEN="your_api_key"
SECRET="your_api_secret"

# URL of your webhook endpoint
API_ENDPOINT="https://your.webhook.endpoint/path"

# The data you want to send (for example, in JSON format)
PAYLOAD='{"key1": "value1", "key2": "value2"}'

# Generate the HMAC signature. For security, W&B includes the
# X-Wandb-Signature header computed from the payload and the shared secret
# associated with the webhook using HMAC with SHA-256.
SIGNATURE=$(echo -n "$PAYLOAD" | openssl dgst -sha256 -hmac "$SECRET" | awk '{print $2}')

# Make the cURL request
curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "X-Wandb-Signature: $SIGNATURE" \
  -d "$PAYLOAD" "$API_ENDPOINT"
```

# Tutorial: Project run-failure alert automation
Source: https://docs.wandb.ai/models/automations/project-automation-tutorial

Build a run-failure alert that sends a Slack notification when a run in your project fails.
This tutorial walks you through building a **project** automation triggered by run status: when a run in your project transitions to **Failed**, W\&B sends a Slack notification. This automation lets your team learn about failed runs in real time, so you can investigate and remediate them instead of discovering failures later.

```mermaid theme={null}
%%{init: {'flowchart': {'rankSpacing': 200;}}}%%
flowchart LR
    Event[Run state change to Failed]
    Action[Slack notification]
    Event --> Action
```
For guidance on creating a Registry automation, see [Tutorial: Registry artifact alias automation](/models/automations/registry-automation-tutorial).

## Prerequisites

* A W\&B project.
* A [Slack integration](/models/automations/create-automations/slack#add-a-slack-integration) configured in **Team Settings**.

## Create a project automation

Set up a project-scoped automation so that when a run in the project transitions to **Failed**, W\&B sends a Slack notification.

1. Open the project and click the **Automations** tab in the sidebar, then click **Create automation**.
2. Choose the event **Run state change**. Set the state to **Failed**. Optionally add a run name or user filter to limit which runs trigger the automation.
3. Click **Next step**. Set **Action type** to **Slack notification** and select the Slack channel.
4. Click **Next step**. Give the automation a name (for example, "Run failure alert") and an optional description, then click **Create automation**.

Your project now has an active automation that posts to the Slack channel you selected when a run fails. For more detail, see [Create a Slack automation](/models/automations/create-automations/slack#create-an-automation) (Project tab).

## Test the automation

To confirm the automation is configured correctly, trigger it with a deliberately failed run. Create a run and log it to the project, explicitly marking it as failed:

```python theme={null}
import wandb

with wandb.init(project="my-project") as run:
    run.log({"loss": 1.23})
    # A non-zero exit code marks the run as failed.
    run.finish(exit_code=1)
```

Shortly after, you see a Slack message with the run link and status.

## Go further

For more information, see the following resources:

* [Automation events and scopes](/models/automations/automation-events) for all project and registry event types.
* [Create a Slack automation](/models/automations/create-automations/slack) and [Create a webhook automation](/models/automations/create-automations/webhook) for full UI and payload details.

# Tutorial: Registry artifact alias automation
Source: https://docs.wandb.ai/models/automations/registry-automation-tutorial

Build an automation that runs a webhook when a Registry artifact gets a specific alias like "production".

This tutorial walks you through building a **registry** automation triggered by an artifact alias: when an artifact in your registry gets a specific alias (for example, **production**), W\&B sends a `POST` request to your webhook. Use this pattern to notify downstream systems, such as deployment pipelines, paging services, or notification channels, whenever you promote a model to a known stage. This tutorial is intended for ML engineers and MLOps practitioners who manage model lifecycles in W\&B Registry.

```mermaid theme={null}
flowchart LR
    Event[Artifact alias added]
    Action[Webhook]
    Event --> Action
```
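
On the receiving side, your webhook service may want to confirm that a request really came from W\&B. The shell script in [Troubleshoot your webhook](/models/automations/create-automations/webhook#troubleshoot-your-webhook) describes an `X-Wandb-Signature` header computed from the payload and the webhook's secret using HMAC with SHA-256. The following is a minimal sketch of that check, assuming the header carries the hex digest as in that script. The function names and sample values are hypothetical.

```python
import hashlib
import hmac


def sign(payload: bytes, secret: str) -> str:
    """Return the hex-encoded HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()


def is_valid_signature(payload: bytes, secret: str, header_value: str) -> bool:
    """Compare the expected signature to the received header value."""
    expected = sign(payload, secret)
    # compare_digest runs in constant time, which avoids leaking timing details.
    return hmac.compare_digest(expected.encode(), header_value.encode())


# Hypothetical request body and secret, matching the shell script's placeholders.
body = b'{"key1": "value1", "key2": "value2"}'
secret = "your_api_secret"
header = sign(body, secret)

print("untampered body accepted:", is_valid_signature(body, secret, header))
print("tampered body accepted:", is_valid_signature(b'{"tampered": true}', secret, header))
```

Compute the signature over the raw request bytes, before any JSON parsing, because re-serializing the body can change whitespace and invalidate the comparison.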
For guidance on creating a project automation, see [Tutorial: Project run-failure alert automation](/models/automations/project-automation-tutorial).

## Prerequisites

* A [webhook](/models/automations/create-automations/webhook#create-a-webhook) configured in **Team Settings**.
* A W\&B registry with at least one collection. You can [create a registry](/models/registry/create_registry) or reuse an existing one.

## Create a registry automation

Set up a registry-scoped automation so that when an artifact in any collection in the registry gets a specific alias (for example, **production**), W\&B sends a `POST` request to your webhook.

1. Open the registry and click the **Automations** tab, then click **Create automation**.
2. Choose the event **An artifact alias is added**. Enter an **Alias regex** that matches the alias you care about (for example, **production** or **staging**).
3. Click **Next step**. Set **Action type** to **Webhooks** and select your webhook. If the webhook expects a payload, paste a JSON body and use [payload variables](/models/automations/create-automations/webhook#payload-variables) such as `${artifact_collection_name}` and `${artifact_version_string}`.
4. Click **Next step**. Give the automation a name and optional description, then click **Create automation**.

For more information, see [Create a webhook automation](/models/automations/create-automations/webhook#create-an-automation) (Registry tab).

## Test the automation

To confirm that the automation fires end-to-end, trigger the configured event by adding the alias to an artifact version. Add the alias (for example, **production**) to an artifact version in the registry, using the W\&B App or the public API. For example:

```python theme={null}
import wandb

with wandb.init(project="my-project") as run:
    artifact = wandb.Artifact("my-model", type="model")
    # ... log files or metadata to artifact as needed ...
    run.log_artifact(artifact)
    artifact.wait()  # Ensure the artifact is logged before proceeding

# Add an alias to the latest version in the collection
api = wandb.Api()
collection = api.artifact_collection(name="my-model", type_name="model")
version = next(collection.versions())  # Get the latest version
version.aliases.append("production")
version.save()
print("Added alias 'production' to", version.name)
```

Within a short time, your webhook endpoint receives a `POST` with the payload you configured. You now have a working registry automation that fires whenever a matching alias is applied to an artifact in this registry.

## Go further

For more information, see the following resources:

* [Automation events and scopes](/models/automations/automation-events) for all project and registry event types.
* [Create a Slack automation](/models/automations/create-automations/slack) and [Create a webhook automation](/models/automations/create-automations/webhook) for full UI and payload details.

# Automation tutorial overview
Source: https://docs.wandb.ai/models/automations/tutorial

Learn to build a project run-failure alert or a registry alias automation.

W\&B Automations follow this pattern: when an **event** occurs and optional **conditions** are met, an **action** runs automatically. For example:

* When a run fails (event), notify a Slack channel (action).

```mermaid theme={null} %%{init: {'flowchart': {'rankSpacing': 200;}}}%% flowchart LR Event[Run state change to Failed] Action[Slack notification] Event --> Action ```
* When the `production` alias is added to an artifact (event), call a webhook to trigger deployment (action).

```mermaid theme={null} flowchart LR Event[Artifact alias added] Action[Webhook] Event --> Action ```
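On the receiving side, the service behind the webhook has to parse the `POST` body and decide whether to act. The following is a minimal sketch of that decision and is not W\&B code. The payload keys (`alias`, `collection`, `version`) and the function name `handle_alias_event` are hypothetical: the real keys are whatever you put in the JSON body you configure for the webhook.

```python
import json


def handle_alias_event(body: bytes, deploy_alias: str = "production") -> str:
    """Decide what to do with a webhook POST body.

    The keys read here are hypothetical. They must match the JSON
    payload configured on the automation.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return "rejected: body is not valid JSON"
    if not isinstance(payload, dict):
        return "rejected: expected a JSON object"

    if payload.get("alias") != deploy_alias:
        return "ignored: alias does not match"

    collection = payload.get("collection", "<unknown>")
    version = payload.get("version", "<unknown>")
    # A real service would start its deployment pipeline here.
    return f"deploy: {collection}:{version}"


example = json.dumps(
    {"alias": "production", "collection": "my-model", "version": "v3"}
).encode()
print(handle_alias_event(example))
```

Keeping the decision in a plain function, separate from any HTTP framework, makes it easy to test before you point a live automation at the endpoint.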
Events and available conditions differ for automations scoped to a [project](/models/automations/automation-events#project) or a [registry](/models/automations/automation-events#registry). See [Automation events and scopes](/models/automations/automation-events). This page links to two end-to-end tutorials that show you how to create an automation in W\&B. Each tutorial walks through a different scenario, so you can follow the one that matches what you want to build. For detailed guidance, select a tutorial. * **[Project automation tutorial](/models/automations/project-automation-tutorial)**: Alert when a run fails (Slack notification). * **[Registry automation tutorial](/models/automations/registry-automation-tutorial)**: Trigger a webhook when an alias (for example, `production`) is added to an artifact. # View an automation's history Source: https://docs.wandb.ai/models/automations/view-automation-history View the execution history of your W&B Automations to check status, triggering events, and action results. Automation execution history is available on W\&B Multi-tenant Cloud, W\&B Dedicated Cloud, and W\&B Self-Managed v0.75.0 and later. This page describes how to view and understand the execution history of your W\&B [automations](/models/automations). Execution history shows what triggered an automation, what actions ran, and whether they succeeded or failed. Reviewing automation history helps you confirm that automations run as expected, troubleshoot failures, and audit which events triggered downstream actions. Each executed automation generates a record that includes: * **Execution timestamp**: When the automation triggered. * **Triggering event**: The specific event that triggered the automation. * **Status**: The execution's status. See [Execution status](#execution-status). * **Action details**: Information about which action ran, such as notifying a Slack channel or running a webhook. 
* **Result details**: Additional information, if any, about the final outcome of the automation, including the error for a failed execution. Automation history is available for both registry-scoped and project-scoped automations. Select the **Registry** or **Project** tab for detailed instructions. 1. Click **Registry** in the project sidebar. 2. Select your registry from the list. 3. Navigate to the **Automations** tab, which lists the registry's automations. Click the **Last execution** timestamp to view the execution history details. Use the search bar to filter by automation name, and sort by the last triggered date to find recently executed automations. 4. View the registry's automation executions in reverse chronological order in the **Automations history** tab, including the event, action, and status. Click an execution timestamp to view details about a particular execution. If a collection has associated automation executions, the icon Collection automation execution symbol, a circle with a right-pointing arrow displays, along with the number of associated executions. 1. Navigate to your project. 2. Click the **Automations** tab in the project sidebar, which lists the project's automations. Click the **Last execution** timestamp to view the execution history details. Use the search bar to filter by automation name, and sort by the last triggered date to find recently executed automations. 3. In the **History** tab, view all executions of the project's automations in reverse chronological order. Each execution's metadata displays, including the event, action, and status. Click an execution timestamp to view details about a particular execution. ## Execution details After locating an execution in the history, use the following information to interpret its status and outcome. ### Execution status Each automation execution has one of the following statuses: * **Finished**: The automation completed all actions successfully. 
* **Failed**: The automation encountered an error and didn't complete. * **Pending**: The automation is queued for execution. ### Execution metadata Click any execution in the history to view the following details: * **Event details**: The specific event that triggered the automation, including: * Event type (for example, "New artifact version", "Run completed"). * Entity information, such as run ID or artifact name. * User who triggered the event (if applicable). * **Action details**: Information about what the automation attempted: * Action type (Slack notification or webhook). * Target (Slack channel or webhook URL). * Payload sent (for webhooks). * **Result information**: * Response status (for webhooks). * Any error messages or stack traces (for failed executions). ## Next steps * [Create an automation](/models/automations/create-automations) * [Automation events and scopes](/models/automations/automation-events) * [Create a secret](/platform/secrets) # Integrations overview Source: https://docs.wandb.ai/models/integrations Explore W&B integrations with ML frameworks, cloud platforms, and workflow orchestration tools W\&B integrates with popular machine learning frameworks, cloud platforms, and workflow orchestration tools to help you track experiments, log metrics, and manage models seamlessly. ## Popular ML integrations: Integrate W\&B with your PyTorch Lightning code to add experiment tracking to your pipeline. Optimize HuggingFace Transformer models with W\&B for experiment tracking and model management. Use W\&B and Keras for machine learning experiment tracking, dataset versioning, and project collaboration. Use the "You Only Look Once" (aka YOLOv5) real-time object detection framework and W\&B to track model metrics, inspect model outputs, and restart interrupted runs. If the library you use is not supported natively, you can still integrate W\&B using the W\&B [Python SDK](/models/ref/python). 
See [Add W\&B to any library](/models/integrations/add-wandb-to-any-library) for best practices and implementation guidance.

# Hugging Face Accelerate
Source: https://docs.wandb.ai/models/integrations/accelerate

Training and inference at scale made simple, efficient and adaptable

Hugging Face Accelerate is a library that enables the same PyTorch code to run across any distributed configuration, to simplify model training and inference at scale. Accelerate includes a W\&B Tracker, which this page shows how to use. You can also read more about [Accelerate Trackers in Hugging Face](https://huggingface.co/docs/accelerate/main/en/usage_guides/tracking).

## Start logging with Accelerate

To get started with Accelerate and W\&B you can follow the pseudocode below:

```python theme={null}
from accelerate import Accelerator

# Tell the Accelerator object to log with wandb
accelerator = Accelerator(log_with="wandb")

# Initialise your wandb run, passing wandb parameters and any config information
accelerator.init_trackers(
    project_name="my_project",
    config={"dropout": 0.1, "learning_rate": 1e-2},
    init_kwargs={"wandb": {"entity": "my-wandb-team"}},
)

...

# Log to wandb by calling accelerator.log(); step is optional
accelerator.log({"train_loss": 1.12, "valid_loss": 0.8}, step=global_step)

# Make sure that the wandb tracker finishes correctly
accelerator.end_training()
```

In more detail, you need to:

1. Pass `log_with="wandb"` when initialising the `Accelerator` class.
2. Call the [`init_trackers`](https://huggingface.co/docs/accelerate/main/en/package_reference/accelerator#accelerate.Accelerator.init_trackers) method and pass it:
   * a project name via `project_name`
   * any parameters you want to pass to [`wandb.init()`](/models/ref/python/functions/init) via a nested dict to `init_kwargs`
   * any other experiment config information you want to log to your wandb run, via `config`
3. Use the `accelerator.log()` method to log to Weights & Biases; the `step` argument is optional.
4. Call `.end_training()` when finished training.

## Access the W\&B tracker

To access the W\&B tracker, use the `Accelerator.get_tracker()` method. Pass in the string corresponding to a tracker’s `.name` attribute, which returns the tracker on the `main` process.

```python theme={null}
wandb_tracker = accelerator.get_tracker("wandb")
```

From there you can interact with wandb’s run object like normal:

```python theme={null}
wandb_tracker.log_artifact(some_artifact_to_log)
```

Trackers built in Accelerate will automatically execute on the correct process, so if a tracker is only meant to be run on the main process it will do so automatically.

If you want to truly remove Accelerate’s wrapping entirely, you can achieve the same outcome with:

```python theme={null}
wandb_tracker = accelerator.get_tracker("wandb", unwrap=True)
# `on_main_process` is a decorator, so guard the call with the
# `is_main_process` property instead of a `with` statement.
if accelerator.is_main_process:
    wandb_tracker.log_artifact(some_artifact_to_log)
```

## Accelerate articles

Below is an Accelerate article you may enjoy
* **HuggingFace Accelerate Super Charged With W\&B**: In this article, we'll look at what HuggingFace Accelerate has to offer and how simple it is to perform distributed training and evaluation, while logging results to W\&B. Read the [Hugging Face Accelerate Super Charged with W\&B report](https://wandb.ai/gladiator/HF%20Accelerate%20+%20W\&B/reports/Hugging-Face-Accelerate-Super-Charged-with-Weights-Biases--VmlldzoyNzk3MDUx?utm_source=docs\&utm_medium=docs\&utm_campaign=accelerate-docs).


# Add W&B to a Python library
Source: https://docs.wandb.ai/models/integrations/add-wandb-to-any-library

Best practices for integrating Weights & Biases into your Python library for experiment tracking, system monitoring, and model management.

This guide explains how to integrate Weights & Biases (W\&B) into a Python library. Follow these recommendations if you are integrating W\&B into a complex codebase, such as a training framework, SDK, or reusable library. If you are new to W\&B, review the core guides (for example, [Experiment Tracking](/models/track/)) before continuing.

The sections below cover tips and best practices for codebases that are more complicated than a single Python training script or Jupyter notebook.

## Decide how users install W\&B

Before you start, decide whether W\&B should be a required dependency or an optional feature of your library.

### Require W\&B as a dependency

If W\&B is central to your library’s functionality, add the W\&B Python SDK (`wandb`) to your dependencies:

```txt theme={null}
torch==1.8.0
...
wandb==0.13.*
```

### Make W\&B optional on installation

If W\&B is an optional feature, allow your library to run without it installed. You can either import `wandb` conditionally in Python or declare it as an optional dependency in `pyproject.toml`.

Detect whether `wandb` is available and raise a clear error if a user enables W\&B features without installing it:

```python theme={null}
try:
    import wandb

    _WANDB_AVAILABLE = True
except ImportError:
    _WANDB_AVAILABLE = False
```

Declare `wandb` as an optional dependency in your `pyproject.toml` file:

```toml theme={null}
[project]
name = "my_awesome_lib"
version = "0.1.0"
dependencies = [
    "torch",
    "scikit-learn"
]

[project.optional-dependencies]
dev = [
    "wandb"
]
```

## Authenticate users

W\&B uses API keys to authenticate users and machines.

### Create an API key

An API key authenticates a client or machine to W\&B. You can generate an API key from your user profile.
For a more streamlined approach, create an API key by going directly to [User Settings](https://wandb.ai/settings). Copy the newly created API key immediately and save it in a secure location such as a password manager. 1. Click your user profile icon in the upper right corner. 2. Select **User Settings**, then scroll to the **API Keys** section. ### Install and log in to W\&B To install the `wandb` library locally and log in: 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key: ```bash theme={null} export WANDB_API_KEY= ``` 2. Install the `wandb` library and log in: ```bash theme={null} pip install wandb wandb login ``` 1. Navigate to your terminal and install the Python SDK. ```bash theme={null} pip install wandb ``` 2. Log in to W\&B from your Python script or notebook. This will prompt you to enter your API key. ```python theme={null} import wandb wandb.login() ``` Copy and paste the following code snippet into a cell in your Jupyter notebook and run it. This will prompt you to enter your API key. ```notebook theme={null} !pip install wandb import wandb wandb.login() ``` ## Start a run A *run* represents a single unit of computation, such as a training experiment. Most libraries create one run per training job. For more information about runs, see [W\&B Runs](/models/runs/). Initialize a run with [`wandb.init()`](/models/ref/python/functions/init) and specify a name for your project and your team entity (team name). If you do not specify a project, W\&B stores your run in a default project called "uncategorized".: ```python theme={null} with wandb.init(project="", entity="") as run: ... ``` W\&B recommends that you use a context manager to ensure that your run is properly closed, even if an error occurs. If you do not use a context manager, you must call `run.finish()` to close the run and log all the data to W\&B. **When to call `wandb.init()`** Call `wandb.init()` as early as possible. 
W\&B captures stdout, stderr, and error messages, which makes debugging easier. Wrap your entire training loop in a `wandb.init()` context manager to ensure that all relevant information, including any error messages, is captured in the run.

### Set `wandb` as an optional dependency

If you want to make `wandb` optional when your users use your library, you can either:

* Define a `wandb` flag such as:

  ```python theme={null}
  trainer = my_trainer(..., use_wandb=True)
  ```

  ```bash theme={null}
  python train.py ... --use-wandb
  ```

* Or, set `wandb` to be `disabled` in `wandb.init()`:

  ```python theme={null}
  wandb.init(mode="disabled")
  ```

  ```bash theme={null}
  export WANDB_MODE=disabled
  ```

  or

  ```bash theme={null}
  wandb disabled
  ```

* Or, set `wandb` to be offline. Note that this still runs `wandb`; it just won't try to communicate back to W\&B over the internet:

  ```bash theme={null}
  export WANDB_MODE=offline
  ```

  or

  ```python theme={null}
  import os

  os.environ['WANDB_MODE'] = 'offline'
  ```

  ```bash theme={null}
  wandb offline
  ```

## Define a run config

Provide a configuration dictionary when you initialize your run to log hyperparameters and other metadata to W\&B. Use the W\&B App to compare runs based on their config parameters and filter them in the Runs table. You can also use these parameters to group runs together in the W\&B App.

For example, in the following image, the batch size (`batch_size`) was defined as a config parameter and is visible (see the first column) in the Runs table. This allows users to filter and compare runs based on their batch size:

W&B Runs table

Typical config parameter values include:

* Model name, version, architecture parameters, and hyperparameters.
* Dataset name, version, number of training or validation examples.
* Training parameters such as learning rate, batch size, and optimizer.
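A library that already keeps its settings in a structured object can flatten that object into a plain dictionary before passing it as the run config. The sketch below assumes a hypothetical `TrainingConfig` dataclass and a hypothetical helper `to_wandb_config`; the field names are illustrative and are not part of W\&B.

```python
from dataclasses import asdict, dataclass


@dataclass
class TrainingConfig:
    """Hypothetical settings object owned by your library."""

    model_name: str = "resnet18"
    dataset_name: str = "flowers"
    learning_rate: float = 1e-3
    batch_size: int = 32
    optimizer: str = "adam"


def to_wandb_config(cfg: TrainingConfig) -> dict:
    """Return a plain dict that can be passed as the `config` argument of `wandb.init()`."""
    return asdict(cfg)


config = to_wandb_config(TrainingConfig(batch_size=64))
print(config["batch_size"])
```

Deriving the dictionary from one source of truth keeps the values shown in the Runs table in step with the values the training code actually used.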
The following code snippet shows how to log a config:

```python theme={null}
config = {"batch_size": 32}  # add any other parameters you want to track

with wandb.init(..., config=config) as run:
    ...
```

### Update the run config

If values are not available at initialization time, update the config later with `wandb.Run.config.update`. For example, you might want to add a model’s parameters after the model is instantiated:

```python theme={null}
with wandb.init(...) as run:
    model = MyModel(...)
    run.config.update({"model_parameters": 3500})
```

For details, see [Configure experiments](/models/track/config/).

## Log metrics and data

### Log metrics

Create a dictionary in which each key is the name of a metric and each value is the metric's current value. Pass this dictionary to [`wandb.Run.log()`](/models/ref/python/experiments/run#method-run-log) to log it to W\&B:

```python theme={null}
NUM_EPOCHS = 10

with wandb.init(...) as run:
    for epoch in range(NUM_EPOCHS):
        for input, ground_truth in data:
            prediction = model(input)
            loss = loss_fn(prediction, ground_truth)
            metrics = {"loss": loss}
            run.log(metrics)
```

Use metric name prefixes to group related metrics in the W\&B App. Common prefixes include `train/` and `val/` for training and validation metrics, respectively, but you can use any prefix that makes sense for your use case. Each prefix creates a separate section in your project's workspace:

```python theme={null}
with wandb.init(...) as run:
    metrics = {
        "train/loss": 0.4,
        "train/learning_rate": 0.4,
        "val/loss": 0.5,
        "val/accuracy": 0.7
    }
    run.log(metrics)
```

W&B Workspace

See [`wandb.Run.log()`](/models/ref/python/experiments/run#method-run-log) for more details.

### Control the x-axis

If you perform multiple calls to `wandb.Run.log()` for the same training step, the wandb SDK increments an internal step counter for each call to `wandb.Run.log()`. This counter may not align with the training step in your training loop.
To avoid this situation, define your x-axis step explicitly with `wandb.Run.define_metric()`, one time, immediately after you call `wandb.init()`: ```python theme={null} with wandb.init(...) as run: run.define_metric("*", step_metric="global_step") ``` The glob pattern, `*`, means that every metric will use `global_step` as the x-axis in your charts. If you only want certain metrics to be logged against `global_step`, you can specify them instead: ```python theme={null} run.define_metric("train/loss", step_metric="global_step") ``` Now, log your metrics, your `step` metric, and your `global_step` each time you call `wandb.Run.log()`: ```python theme={null} for step, (input, ground_truth) in enumerate(data): ... run.log({"global_step": step, "train/loss": 0.1}) run.log({"global_step": step, "eval/loss": 0.2}) ``` If you do not have access to the independent step variable, for example "global\_step" is not available during your validation loop, the previously logged value for "global\_step" is automatically used by wandb. In this case, ensure you log an initial value for the metric so it has been defined when it’s needed. ### Log media and structured data In addition to scalars, you can log images, tables, text, audio, video, and more. Some considerations when logging data include: * How often should the metric be logged? Should it be optional? * What type of data could be helpful in visualizing? * For images, you can log sample predictions, segmentation masks, etc., to see the evolution over time. * For text, you can log tables of sample predictions for later exploration. See the [Log objects and media](/models/track/log) for examples. ## Support distributed training For frameworks supporting distributed environments, you can adapt any of the following workflows: * Log only from the main process (recommended). * Log from every process and group runs using a shared `group` name. 
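The two workflows above can be sketched as a small helper that chooses the arguments for `wandb.init()` per process. This is only an illustration under stated assumptions: it assumes a launcher such as `torchrun` that sets the `RANK` environment variable, and the helper names and the group name `ddp-experiment` are hypothetical.

```python
import os


def is_main_process() -> bool:
    """Return True on rank 0, or when RANK is unset (single-process run)."""
    return int(os.environ.get("RANK", "0")) == 0


def wandb_init_kwargs(project: str, log_all_ranks: bool = False) -> dict:
    """Build keyword arguments for `wandb.init()` for the current process."""
    rank = int(os.environ.get("RANK", "0"))
    if log_all_ranks:
        # One run per process, tied together by a shared group name.
        return {"project": project, "group": "ddp-experiment", "name": f"rank-{rank}"}
    if is_main_process():
        return {"project": project}
    # Non-main processes get a disabled run, so their logging calls do nothing.
    return {"project": project, "mode": "disabled"}


print(wandb_init_kwargs("my-project"))
```

A library would then call `wandb.init(**wandb_init_kwargs(...))` once per process, so the rest of the training code can log unconditionally.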
See [Log Distributed Training Experiments](/models/track/log/distributed-training/) for more details. ## Track models and datasets with artifacts Use [W\&B Artifacts](/models/artifacts/) to track and version models and datasets. Artifacts provide storage and versioning for machine learning assets, and they automatically track lineage to show how data and models are related. Stored Datasets and Model Checkpoints in W&B Consider the following when integrating artifacts into your library: * Whether to log model checkpoints or datasets as artifacts (in case you want to make it optional). * Artifact input references (for example, `entity/project/artifact`). * Logging frequency of model checkpoints or datasets. For example, every epoch, every 500 steps, and so on. ### Log model checkpoints Log model checkpoints to W\&B. A common approach is to log checkpoints as artifacts using the unique run ID generated by W\&B as part of the artifact name. ```python theme={null} metadata = {"eval/accuracy": 0.8, "train/steps": 800} artifact = wandb.Artifact( name=f"model-{run.id}", metadata=metadata, type="model" ) artifact.add_dir("output_model") # local directory where the model weights are stored aliases = ["best", "epoch_10"] run.log_artifact(artifact, aliases=aliases) ``` The previous code snippet demonstrates how to log a model checkpoint as an artifact and add metadata such as evaluation accuracy and training steps. The artifact is given a name that includes the unique run ID, and it is tagged with [custom aliases](/models/artifacts/create-a-custom-alias/) for easy reference. ### Log input artifacts Log datasets or pretrained models used as inputs: ```python theme={null} dataset = wandb.Artifact(name="flowers", type="dataset") dataset.add_file("flowers.npy") run.use_artifact(dataset) ``` The previous code snippet creates an artifact for a dataset called "flowers" and adds a file to it. 
The artifact is then associated with the current run using `run.use_artifact()`, which allows W\&B to track the lineage of the dataset used in the run. ### Download artifacts Download previously logged artifacts from W\&B to use in your training or inference code. If you have a run context, use [`wandb.Run.use_artifact()`](/models/ref/python/experiments/run) to reference an artifact in W\&B and then call [`wandb.Artifact.download()`](/models/ref/python/experiments/artifact) to download it to a local directory. ```python theme={null} with wandb.init(...) as run: artifact = run.use_artifact("user/project/artifact:latest") local_path = artifact.download() ``` Use the [W\&B Public API](/models/ref/python/public-api/) to reference and download an artifact without initializing a run. This is useful in scenarios such as distributed environments or when performing inference, where you may not want to create a new run. ```python theme={null} import wandb artifact = wandb.Api().artifact("user/project/artifact:latest") local_path = artifact.download() ``` See [Download and use artifacts](/models/artifacts/download-and-use-an-artifact/) for more information. ## Tune hyper-parameters If your library supports hyperparameter tuning, you can integrate [W\&B Sweeps](/models/sweeps/) to manage and visualize experiments. # Azure OpenAI Fine-Tuning Source: https://docs.wandb.ai/models/integrations/azure-openai-fine-tuning Fine-tune Azure OpenAI models with W&B experiment tracking to log metrics, hyperparameters, and training progress. ## Introduction Fine-tuning GPT-3.5 or GPT-4 models on Microsoft Azure using W\&B tracks, analyzes, and improves model performance by automatically capturing metrics and facilitating systematic evaluation through W\&B's experiment tracking and evaluation tools. Azure OpenAI fine-tuning metrics ## Prerequisites * Set up Azure OpenAI service according to [official Azure documentation](https://wandb.me/aoai-wb-int). 
* Configure a W\&B account with an API key. ## Workflow overview ### 1. Fine-tuning setup * Prepare training data according to Azure OpenAI requirements. * Configure the fine-tuning job in Azure OpenAI. * W\&B automatically tracks the fine-tuning process, logging metrics and hyperparameters. ### 2. Experiment tracking During fine-tuning, W\&B captures: * Training and validation metrics * Model hyperparameters * Resource utilization * Training artifacts ### 3. Model evaluation After fine-tuning, use [W\&B Weave](https://weave-docs.wandb.ai) to: * Evaluate model outputs against reference datasets * Compare performance across different fine-tuning runs * Analyze model behavior on specific test cases * Make data-driven decisions for model selection ## Real-world example * Explore the [medical note generation demo](https://wandb.me/aoai-ft-colab) to see how this integration facilitates: * Systematic tracking of fine-tuning experiments * Model evaluation using domain-specific metrics * Go through an [interactive demo of fine-tuning a notebook](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/azure/azure_gpt_medical_notes.ipynb) ## Additional resources * [Azure OpenAI W\&B Integration Guide](https://wandb.me/aoai-wb-int) * [Azure OpenAI Fine-tuning Documentation](https://learn.microsoft.com/azure/ai-services/openai/how-to/fine-tuning?tabs=turbo%2Cpython\&pivots=programming-language-python) # Hugging Face Diffusers Source: https://docs.wandb.ai/models/integrations/diffusers Use W&B autolog with Hugging Face Diffusers to track prompts, generated media, configs, and pipeline architecture. [Hugging Face Diffusers](https://huggingface.co/docs/diffusers/index) is the go-to library for state-of-the-art pre-trained diffusion models for generating images, audio, and even 3D structures of molecules. 
The W\&B integration adds rich, flexible experiment tracking, media visualization, pipeline architecture, and configuration management to interactive centralized dashboards without compromising the library's ease of use.

## Next-level logging in just two lines

Log all the prompts, negative prompts, generated media, and configs associated with your experiment by adding two lines of code:

```python theme={null}
# import the autolog function
from wandb.integration.diffusers import autolog

# call the autolog before calling the pipeline
autolog(init=dict(project="diffusers_logging"))
```

Experiment results logging

## Get started

1. Install `diffusers`, `transformers`, `accelerate`, and `wandb`.

   * Command line:

     ```shell theme={null}
     pip install --upgrade diffusers transformers accelerate wandb
     ```

   * Notebook:

     ```bash theme={null}
     !pip install --upgrade diffusers transformers accelerate wandb
     ```

2. Use `autolog` to initialize a W\&B Run and automatically track the inputs and the outputs from [all supported pipeline calls](https://github.com/wandb/wandb/blob/main/wandb/integration/diffusers/autologger.py#L12-L72). You can call the `autolog()` function with the `init` parameter, which accepts a dictionary of parameters required by [`wandb.init()`](/models/ref/python/functions/init).

* Each pipeline call is tracked in its own [table](/models/tables/) in the workspace, and the configs associated with the pipeline call are appended to the list of workflows in the configs for that run.
* The prompts, negative prompts, and the generated media are logged in a [`wandb.Table`](/models/tables/).
* All other configs associated with the experiment including seed and the pipeline architecture are stored in the config section for the run. * The generated media for each pipeline call are also logged in [media panels](/models/track/log/media/) in the run. You can find a [list of supported pipeline calls](https://github.com/wandb/wandb/blob/main/wandb/integration/diffusers/autologger.py#L12-L72). In case, you want to request a new feature of this integration or report a bug associated with it, open an issue on the [W\&B GitHub issues page](https://github.com/wandb/wandb/issues). ## Examples ### Autologging Here is a brief end-to-end example of the autolog in action: ```python theme={null} import torch from diffusers import DiffusionPipeline # import the autolog function from wandb.integration.diffusers import autolog # call the autolog before calling the pipeline autolog(init=dict(project="diffusers_logging")) # Initialize the diffusion pipeline pipeline = DiffusionPipeline.from_pretrained( "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16 ).to("cuda") # Define the prompts, negative prompts, and seed. prompt = ["a photograph of an astronaut riding a horse", "a photograph of a dragon"] negative_prompt = ["ugly, deformed", "ugly, deformed"] generator = torch.Generator(device="cpu").manual_seed(10) # call the pipeline to generate the images images = pipeline( prompt, negative_prompt=negative_prompt, num_images_per_prompt=2, generator=generator, ) ``` ```python theme={null} import torch from diffusers import DiffusionPipeline import wandb # import the autolog function from wandb.integration.diffusers import autolog run = wandb.init() # call the autolog before calling the pipeline autolog(init=dict(project="diffusers_logging")) # Initialize the diffusion pipeline pipeline = DiffusionPipeline.from_pretrained( "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16 ).to("cuda") # Define the prompts, negative prompts, and seed. 
prompt = ["a photograph of an astronaut riding a horse", "a photograph of a dragon"] negative_prompt = ["ugly, deformed", "ugly, deformed"] generator = torch.Generator(device="cpu").manual_seed(10) # call the pipeline to generate the images images = pipeline( prompt, negative_prompt=negative_prompt, num_images_per_prompt=2, generator=generator, ) # Finish the experiment run.finish() ``` * The results of a single experiment: Experiment results logging * The results of multiple experiments: Experiment results logging * The config of an experiment: Experiment config logging You need to explicitly call [`wandb.Run.finish()`](/models/ref/python/functions/finish) after calling the pipeline when you execute the code in IPython notebook environments. This is not necessary when executing Python scripts. ### Tracking multi-pipeline workflows This section demonstrates the autolog with a typical [Stable Diffusion XL + Refiner](https://huggingface.co/docs/diffusers/using-diffusers/sdxl#base-to-refiner-model) workflow, in which the latents generated by the [`StableDiffusionXLPipeline`](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl) are refined by the corresponding refiner.
```python theme={null} import torch from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline from wandb.integration.diffusers import autolog # initialize the SDXL base pipeline base_pipeline = StableDiffusionXLPipeline.from_pretrained( "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True, ) base_pipeline.enable_model_cpu_offload() # initialize the SDXL refiner pipeline refiner_pipeline = StableDiffusionXLImg2ImgPipeline.from_pretrained( "stabilityai/stable-diffusion-xl-refiner-1.0", text_encoder_2=base_pipeline.text_encoder_2, vae=base_pipeline.vae, torch_dtype=torch.float16, use_safetensors=True, variant="fp16", ) refiner_pipeline.enable_model_cpu_offload() prompt = "a photo of an astronaut riding a horse on mars" negative_prompt = "static, frame, painting, illustration, sd character, low quality, low resolution, greyscale, monochrome, nose, cropped, lowres, jpeg artifacts, deformed iris, deformed pupils, bad eyes, semi-realistic worst quality, bad lips, deformed mouth, deformed face, deformed fingers, deformed toes standing still, posing" # Make the experiment reproducible by controlling randomness. # The seed would be automatically logged to WandB. seed = 42 generator_base = torch.Generator(device="cuda").manual_seed(seed) generator_refiner = torch.Generator(device="cuda").manual_seed(seed) # Call WandB Autolog for Diffusers. This would automatically log # the prompts, generated images, pipeline architecture and all # associated experiment configs to W&B, thus making your # image generation experiments easy to reproduce, share and analyze. 
autolog(init=dict(project="sdxl")) # Call the base pipeline to generate the latents image = base_pipeline( prompt=prompt, negative_prompt=negative_prompt, output_type="latent", generator=generator_base, ).images[0] # Call the refiner pipeline to generate the refined image image = refiner_pipeline( prompt=prompt, negative_prompt=negative_prompt, image=image[None, :], generator=generator_refiner, ).images[0] ``` ```python theme={null} import torch from diffusers import StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline import wandb from wandb.integration.diffusers import autolog run = wandb.init() # initialize the SDXL base pipeline base_pipeline = StableDiffusionXLPipeline.from_pretrained( "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True, ) base_pipeline.enable_model_cpu_offload() # initialize the SDXL refiner pipeline refiner_pipeline = StableDiffusionXLImg2ImgPipeline.from_pretrained( "stabilityai/stable-diffusion-xl-refiner-1.0", text_encoder_2=base_pipeline.text_encoder_2, vae=base_pipeline.vae, torch_dtype=torch.float16, use_safetensors=True, variant="fp16", ) refiner_pipeline.enable_model_cpu_offload() prompt = "a photo of an astronaut riding a horse on mars" negative_prompt = "static, frame, painting, illustration, sd character, low quality, low resolution, greyscale, monochrome, nose, cropped, lowres, jpeg artifacts, deformed iris, deformed pupils, bad eyes, semi-realistic worst quality, bad lips, deformed mouth, deformed face, deformed fingers, deformed toes standing still, posing" # Make the experiment reproducible by controlling randomness. # The seed would be automatically logged to WandB. seed = 42 generator_base = torch.Generator(device="cuda").manual_seed(seed) generator_refiner = torch.Generator(device="cuda").manual_seed(seed) # Call WandB Autolog for Diffusers. 
# This would automatically log
# the prompts, generated images, pipeline architecture and all
# associated experiment configs to W&B, thus making your
# image generation experiments easy to reproduce, share and analyze.
autolog(init=dict(project="sdxl"))

# Call the base pipeline to generate the latents
image = base_pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    output_type="latent",
    generator=generator_base,
).images[0]

# Call the refiner pipeline to generate the refined image
image = refiner_pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=image[None, :],
    generator=generator_refiner,
).images[0]

# Finish the experiment
run.finish()
```

* Example of a Stable Diffusion XL + Refiner experiment: Stable Diffusion XL experiment tracking

## More resources

* [A Guide to Prompt Engineering for Stable Diffusion](https://wandb.ai/geekyrakshit/diffusers-prompt-engineering/reports/A-Guide-to-Prompt-Engineering-for-Stable-Diffusion--Vmlldzo1NzY4NzQ3)
* [PIXART-α: A Diffusion Transformer Model for Text-to-Image Generation](https://wandb.ai/geekyrakshit/pixart-alpha/reports/PIXART-A-Diffusion-Transformer-Model-for-Text-to-Image-Generation--Vmlldzo2MTE1NzM3)

# Hugging Face Source: https://docs.wandb.ai/models/integrations/huggingface Visualize and track Hugging Face model performance with W&B, logging hyperparameters, metrics, and GPU utilization.

Visualize your [Hugging Face](https://github.com/huggingface/transformers) model's performance quickly with a seamless [W\&B](https://wandb.ai/site) integration. Compare hyperparameters, output metrics, and system stats like GPU utilization across your models.

## Why should I use W\&B?
Benefits of using W&B

* **Unified dashboard**: Central repository for all your model metrics and predictions
* **Lightweight**: No code changes required to integrate with Hugging Face
* **Accessible**: Free for individuals and academic teams
* **Secure**: All projects are private by default
* **Trusted**: Used by machine learning teams at OpenAI, Toyota, Lyft and more

Think of W\&B like GitHub for machine learning models: save machine learning experiments to your private, hosted dashboard. Experiment quickly with the confidence that all the versions of your models are saved for you, no matter where you're running your scripts.

W\&B's lightweight integrations work with any Python script, and all you need to do is sign up for a free W\&B account to start tracking and visualizing your models.

In the Hugging Face Transformers repo, we've instrumented the Trainer to automatically log training and evaluation metrics to W\&B at each logging step.

Here's an in-depth look at how the integration works: [Hugging Face + W\&B Report](https://app.wandb.ai/jxmorris12/huggingface-demo/reports/Train-a-model-with-Hugging-Face-and-Weights-%26-Biases--VmlldzoxMDE2MTU).

## Install, import, and log in

Install the Hugging Face and W\&B libraries, and the GLUE dataset and training script for this tutorial.
* [Hugging Face Transformers](https://github.com/huggingface/transformers): Natural language models and datasets * [W\&B](/): Experiment tracking and visualization * [GLUE dataset](https://gluebenchmark.com/): A language understanding benchmark dataset * [GLUE script](https://raw.githubusercontent.com/huggingface/transformers/refs/heads/main/examples/pytorch/text-classification/run_glue.py): Model training script for sequence classification ```notebook theme={null} !pip install datasets wandb evaluate accelerate -qU !wget https://raw.githubusercontent.com/huggingface/transformers/refs/heads/main/examples/pytorch/text-classification/run_glue.py ``` ```notebook theme={null} # the run_glue.py script requires transformers dev !pip install -q git+https://github.com/huggingface/transformers ``` Before continuing, [sign up for a free account](https://app.wandb.ai/login?signup=true). ## Put in your API key Once you've signed up, run the next cell and click on the link to get your API key and authenticate this notebook. ```python theme={null} import wandb wandb.login() ``` Optionally, we can set environment variables to customize W\&B logging. See the [Hugging Face integration guide](/models/integrations/huggingface/). ```python theme={null} # Optional: log both gradients and parameters %env WANDB_WATCH=all ``` ## Train the model Next, call the downloaded training script [run\_glue.py](https://huggingface.co/transformers/examples.html#glue) and see training automatically get tracked to the W\&B dashboard. This script fine-tunes BERT on the Microsoft Research Paraphrase Corpus— pairs of sentences with human annotations indicating whether they are semantically equivalent. 
```python theme={null}
%env WANDB_PROJECT=huggingface-demo
%env TASK_NAME=MRPC

!python run_glue.py \
  --model_name_or_path bert-base-uncased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 256 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-4 \
  --num_train_epochs 3 \
  --output_dir /tmp/$TASK_NAME/ \
  --overwrite_output_dir \
  --logging_steps 50
```

## Visualize results in dashboard

Click the link printed out above, or go to [wandb.ai](https://app.wandb.ai) to see your results stream in live. The link to see your run in the browser will appear after all the dependencies are loaded. Look for the following output: "**wandb**: View run at \[URL to your unique run]"

**Visualize Model Performance**

It's easy to look across dozens of experiments, zoom in on interesting findings, and visualize high-dimensional data.

Model metrics dashboard

**Compare Architectures**

Here's an example comparing [BERT vs DistilBERT](https://app.wandb.ai/jack-morris/david-vs-goliath/reports/Does-model-size-matter%3F-Comparing-BERT-and-DistilBERT-using-Sweeps--VmlldzoxMDUxNzU). It's easy to see how different architectures affect the evaluation accuracy throughout training with automatic line plot visualizations.

BERT vs DistilBERT comparison

## Track key information effortlessly by default

W\&B saves a new run for each experiment. Here's the information that gets saved by default:

* **Hyperparameters**: Settings for your model are saved in Config
* **Model Metrics**: Time series data of metrics streaming in are saved in Log
* **Terminal Logs**: Command line outputs are saved and available in a tab
* **System Metrics**: GPU and CPU utilization, memory, temperature etc.

## Learn more

* [Video walkthroughs on YouTube](http://wandb.me/youtube)

# Hugging Face Transformers Source: https://docs.wandb.ai/models/integrations/huggingface_transformers Use W&B with Hugging Face Transformers Trainer for experiment tracking, model checkpointing, and dataset versioning.
The [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) library makes state-of-the-art NLP models like BERT and training techniques like mixed precision and gradient checkpointing easy to use. The [W\&B integration](https://huggingface.co/transformers/main_classes/callback.html#transformers.integrations.WandbCallback) adds rich, flexible experiment tracking and model versioning to interactive centralized dashboards without compromising that ease of use.

## Next-level logging in a few lines

```python theme={null}
import os

os.environ["WANDB_PROJECT"] = ""  # name your W&B project
os.environ["WANDB_LOG_MODEL"] = "checkpoint"  # log all model checkpoints

from transformers import TrainingArguments, Trainer

args = TrainingArguments(..., report_to="wandb")  # turn on W&B logging
trainer = Trainer(..., args=args)
```

HuggingFace dashboard

If you'd rather dive straight into working code, check out this [Google Colab](https://wandb.me/hf).

## Get started: track experiments

### Sign up and create an API key

An API key authenticates your machine to W\&B. You can generate an API key from your user profile.

For a more streamlined approach, create an API key by going directly to [User Settings](https://wandb.ai/settings). Copy the newly created API key immediately and save it in a secure location such as a password manager.

1. Click your user profile icon in the upper right corner.
2. Select **User Settings**, then scroll to the **API Keys** section.

### Install the `wandb` library and log in

To install the `wandb` library locally and log in:

1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key.

```bash theme={null}
export WANDB_API_KEY=
```

2. Install the `wandb` library and log in.
```shell theme={null}
pip install wandb
wandb login
```

```bash theme={null}
pip install wandb
```

```python theme={null}
import wandb

wandb.login()
```

```notebook theme={null}
!pip install wandb

import wandb
wandb.login()
```

If you are using W\&B for the first time, you might want to check out our [quickstart](/models/quickstart/).

### Name the project

A W\&B Project is where all of the charts, data, and models logged from related runs are stored. Naming your project helps you organize your work and keep all the information about a single project in one place.

To add a run to a project, set the `WANDB_PROJECT` environment variable to the name of your project. The `WandbCallback` picks up this environment variable and uses it when setting up your run.

```bash theme={null}
WANDB_PROJECT=amazon_sentiment_analysis
```

```python theme={null}
import os

os.environ["WANDB_PROJECT"]="amazon_sentiment_analysis"
```

```notebook theme={null}
%env WANDB_PROJECT=amazon_sentiment_analysis
```

Make sure you set the project name *before* you initialize the `Trainer`.

If a project name is not specified, the project name defaults to `huggingface`.

### Log your training runs to W\&B

**The most important step** when defining your `Trainer` training arguments, either inside your code or from the command line, is to set `report_to` to `"wandb"` to enable logging with W\&B.

The `logging_steps` argument in `TrainingArguments` controls how often training metrics are pushed to W\&B during training. You can also give a name to the training run in W\&B using the `run_name` argument.

That's it. Now your models will log losses, evaluation metrics, model topology, and gradients to W\&B while they train.
```bash theme={null}
# Run your Python script with logging to W&B enabled (--report_to)
# and an optional name for the W&B run (--run_name).
# Add your other command line arguments after these.
python run_glue.py \
  --report_to wandb \
  --run_name bert-base-high-lr
```

```python theme={null}
from transformers import TrainingArguments, Trainer

args = TrainingArguments(
    # other args and kwargs here
    report_to="wandb",  # enable logging to W&B
    run_name="bert-base-high-lr",  # name of the W&B run (optional)
    logging_steps=1,  # how often to log to W&B
)

trainer = Trainer(
    # other args and kwargs here
    args=args,  # your training args
)

trainer.train()  # start training and logging to W&B
```

Using TensorFlow? Just swap the PyTorch `Trainer` for the TensorFlow `TFTrainer`.

### Turn on model checkpointing

Using [Artifacts](/models/artifacts/), you can store up to 100GB of models and datasets for free and then use the W\&B [Registry](/models/registry/). Using Registry, you can register models to explore and evaluate them, prepare them for staging, or deploy them in your production environment.

To log your Hugging Face model checkpoints to Artifacts, set the `WANDB_LOG_MODEL` environment variable to *one* of:

* **`checkpoint`**: Upload a checkpoint every `args.save_steps` from the [`TrainingArguments`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments).
* **`end`**: Upload the model at the end of training, if `load_best_model_at_end` is also set.
* **`false`**: Do not upload the model.

```bash theme={null}
WANDB_LOG_MODEL="checkpoint"
```

```python theme={null}
import os

os.environ["WANDB_LOG_MODEL"] = "checkpoint"
```

```notebook theme={null}
%env WANDB_LOG_MODEL=checkpoint
```

Any Transformers `Trainer` you initialize from now on will upload models to your W\&B project.
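As a rough illustration of the three `WANDB_LOG_MODEL` values listed above, the following sketch reads the variable and describes what each setting means. The helper name `describe_log_model_setting` is hypothetical and is not part of `wandb` or Transformers. It only mirrors the behavior described in this section, and treating an unset variable as `false` is an assumption based on the documented default.

```python
import os

# Values described in this section; anything else is rejected by this sketch.
UPLOAD_BEHAVIOR = {
    "checkpoint": "upload a checkpoint every args.save_steps",
    "end": "upload the model at the end of training",
    "false": "do not upload the model",
}


def describe_log_model_setting(environ=None):
    """Return (value, description) for the WANDB_LOG_MODEL setting.

    Hypothetical helper for illustration only. `environ` defaults to
    os.environ; pass a dict to try out values without touching the
    real environment.
    """
    environ = os.environ if environ is None else environ
    # Assumption: an unset variable behaves like "false".
    value = environ.get("WANDB_LOG_MODEL", "false").strip().lower()
    if value not in UPLOAD_BEHAVIOR:
        raise ValueError(
            f"WANDB_LOG_MODEL should be one of {sorted(UPLOAD_BEHAVIOR)}, got {value!r}"
        )
    return value, UPLOAD_BEHAVIOR[value]


if __name__ == "__main__":
    print(describe_log_model_setting({"WANDB_LOG_MODEL": "checkpoint"}))
```

Calling a check like this before you construct the `Trainer` can surface a mistyped value early, since the variable must be set before the `Trainer` is initialized.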
The model checkpoints you log will be viewable through the [Artifacts](/models/artifacts/) UI, and include the full model lineage (see an example model checkpoint in the UI [here](https://wandb.ai/wandb/arttest/artifacts/model/iv3_trained/5334ab69740f9dda4fed/lineage?_gl=1*yyql5q*_ga*MTQxOTYyNzExOS4xNjg0NDYyNzk1*_ga_JH1SJHJQXJ*MTY5MjMwNzI2Mi4yNjkuMS4xNjkyMzA5NjM2LjM3LjAuMA..)).

By default, your model will be saved to W\&B Artifacts as `model-{run_id}` when `WANDB_LOG_MODEL` is set to `end` or `checkpoint-{run_id}` when `WANDB_LOG_MODEL` is set to `checkpoint`. However, if you pass a [`run_name`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments.run_name) in your `TrainingArguments`, the model will be saved as `model-{run_name}` or `checkpoint-{run_name}`.

#### W\&B Registry

Once you have logged your checkpoints to Artifacts, you can then register your best model checkpoints and centralize them across your team with [Registry](/models/registry/). Using Registry, you can organize your best models by task, manage the lifecycles of models, track and audit the entire ML lifecycle, and [automate](/models/automations/) downstream actions.

To link a model Artifact, refer to [Registry](/models/registry/).

### Visualize evaluation outputs during training

Visualizing your model outputs during training or evaluation is often essential to understanding how your model is training.

By using the callbacks system in the Transformers Trainer, you can log additional helpful data, such as your model's text generation outputs or other predictions, to W\&B Tables.
See the [Custom logging section](#custom-logging-log-and-view-evaluation-samples-during-training) below for a full guide on how to log evaluation outputs while training to log to a W\&B Table like this: Shows a W&B Table with evaluation outputs ### Finish your W\&B Run (Notebook only) If your training is encapsulated in a Python script, the W\&B run will end when your script finishes. If you are using a Jupyter or Google Colab notebook, you'll need to tell us when you're done with training by calling `run.finish()`. ```python theme={null} run = wandb.init() trainer.train() # start training and logging to W&B # post-training analysis, testing, other logged code run.finish() ``` ### Visualize your results Once you have logged your training results you can explore your results dynamically in the [W\&B Dashboard](/models/track/workspaces/). It's easy to compare across dozens of runs at once, zoom in on interesting findings, and coax insights out of complex data with flexible, interactive visualizations. ## Advanced features and FAQs ### How do I save the best model? If you pass `TrainingArguments` with `load_best_model_at_end=True` to your `Trainer`, W\&B saves the best performing model checkpoint to Artifacts. If you save your model checkpoints as Artifacts, you can promote them to the [Registry](/models/registry/). In Registry, you can: * Organize your best model versions by ML task. * Centralize models and share them with your team. * Stage models for production or bookmark them for further evaluation. * Trigger downstream CI/CD processes. ### How do I load a saved model? If you saved your model to W\&B Artifacts with `WANDB_LOG_MODEL`, you can download your model weights for additional training or to run inference. You just load them back into the same Hugging Face architecture that you used before. 
```python theme={null}
import wandb
from transformers import AutoModelForSequenceClassification

# Create a new run
with wandb.init(project="amazon_sentiment_analysis") as run:
    # Pass the name and version of Artifact
    my_model_name = "model-bert-base-high-lr:latest"
    my_model_artifact = run.use_artifact(my_model_name)

    # Download model weights to a folder and return the path
    model_dir = my_model_artifact.download()

    # Load your Hugging Face model from that folder
    # using the same model class (num_labels comes from your task)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_dir, num_labels=num_labels
    )

    # Do additional training, or run inference
```

### How do I resume training from a checkpoint?

If you set `WANDB_LOG_MODEL='checkpoint'`, you can also resume training: download the checkpoint Artifact and pass its directory as `resume_from_checkpoint` to `trainer.train()`.

```python theme={null}
import os

import wandb
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

last_run_id = "xxxxxxxx"  # fetch the run_id from your wandb workspace

# resume the wandb run from the run_id
with wandb.init(
    project=os.environ["WANDB_PROJECT"],
    id=last_run_id,
    resume="must",
) as run:
    # Connect an Artifact to the run
    my_checkpoint_name = f"checkpoint-{last_run_id}:latest"
    my_checkpoint_artifact = run.use_artifact(my_checkpoint_name)

    # Download checkpoint to a folder and return the path
    checkpoint_dir = my_checkpoint_artifact.download()

    # reinitialize your model and trainer
    model = AutoModelForSequenceClassification.from_pretrained(
        "", num_labels=num_labels
    )
    # your awesome training arguments here.
    training_args = TrainingArguments()

    trainer = Trainer(model=model, args=training_args)

    # make sure to use the checkpoint dir to resume training from the checkpoint
    trainer.train(resume_from_checkpoint=checkpoint_dir)
```

### How do I log and view evaluation samples during training?

Logging to W\&B via the Transformers `Trainer` is taken care of by the [`WandbCallback`](https://huggingface.co/transformers/main_classes/callback.html#transformers.integrations.WandbCallback) in the Transformers library.
If you need to customize your Hugging Face logging, you can modify this callback by subclassing `WandbCallback` and adding functionality that uses additional methods from the Trainer class.

Below is the general pattern to add this new callback to the HF Trainer, and further down is a code-complete example to log evaluation outputs to a W\&B Table:

```python theme={null}
# Instantiate the Trainer as normal
trainer = Trainer()

# Instantiate the new logging callback, passing it the Trainer object
evals_callback = WandbEvalsCallback(trainer, tokenizer, ...)

# Add the callback to the Trainer
trainer.add_callback(evals_callback)

# Begin Trainer training as normal
trainer.train()
```

#### View evaluation samples during training

The following section shows how to customize the `WandbCallback` to run model predictions and log evaluation samples to a W\&B Table during training. We log predictions every `eval_steps` using the `on_evaluate` method of the Trainer callback. Here, we wrote a `decode_predictions` function to decode the predictions and labels from the model output using the tokenizer.

Then, we create a pandas DataFrame from the predictions and labels and add an `epoch` column to the DataFrame.

Finally, we create a `wandb.Table` from the DataFrame and log it to wandb. Additionally, we can control the frequency of logging by logging the predictions every `freq` epochs.

**Note**: Unlike the regular `WandbCallback`, this custom callback needs to be added to the trainer **after** the `Trainer` is instantiated and not during initialization of the `Trainer`. This is because the `Trainer` instance is passed to the callback during initialization.
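The frequency check can be looked at in isolation before reading the full callback. The sketch below assumes evaluation runs once per epoch, so the epoch value is a whole number when the check happens. `should_log_predictions` is a hypothetical helper used only for illustration; it is not part of Transformers or `wandb`.

```python
def should_log_predictions(epoch, freq=2):
    """Return True when predictions should be logged for this epoch.

    Mirrors an `epoch % freq == 0` check. `epoch` may be a float
    such as 2.0.
    """
    return epoch % freq == 0


# With freq=2, only every second epoch triggers logging.
logged_epochs = [
    epoch for epoch in (1.0, 2.0, 3.0, 4.0) if should_log_predictions(epoch)
]
print(logged_epochs)  # [2.0, 4.0]
```

If evaluation runs partway through an epoch, the epoch value is fractional (for example `0.5`) and a check like this does not pass, so a step-based condition may be a better fit in that setup.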
```python theme={null}
from transformers import Trainer
from transformers.integrations import WandbCallback
import pandas as pd


def decode_predictions(tokenizer, predictions):
    labels = tokenizer.batch_decode(predictions.label_ids)
    logits = predictions.predictions.argmax(axis=-1)
    prediction_text = tokenizer.batch_decode(logits)
    return {"labels": labels, "predictions": prediction_text}


class WandbPredictionProgressCallback(WandbCallback):
    """Custom WandbCallback to log model predictions during training.

    This callback logs model predictions and labels to a wandb.Table at each
    logging step during training. It allows to visualize the
    model predictions as the training progresses.

    Attributes:
        trainer (Trainer): The Hugging Face Trainer instance.
        tokenizer (AutoTokenizer): The tokenizer associated with the model.
        sample_dataset (Dataset): A subset of the validation dataset
          for generating predictions.
        num_samples (int, optional): Number of samples to select from
          the validation dataset for generating predictions. Defaults to 100.
        freq (int, optional): Frequency of logging. Defaults to 2.
    """

    def __init__(self, trainer, tokenizer, val_dataset, num_samples=100, freq=2):
        """Initializes the WandbPredictionProgressCallback instance.

        Args:
            trainer (Trainer): The Hugging Face Trainer instance.
            tokenizer (AutoTokenizer): The tokenizer associated
              with the model.
            val_dataset (Dataset): The validation dataset.
            num_samples (int, optional): Number of samples to select from
              the validation dataset for generating predictions.
              Defaults to 100.
            freq (int, optional): Frequency of logging. Defaults to 2.
        """
        super().__init__()
        self.trainer = trainer
        self.tokenizer = tokenizer
        self.sample_dataset = val_dataset.select(range(num_samples))
        self.freq = freq

    def on_evaluate(self, args, state, control, **kwargs):
        super().on_evaluate(args, state, control, **kwargs)
        # control the frequency of logging by logging the predictions
        # every `freq` epochs
        if state.epoch % self.freq == 0:
            # generate predictions
            predictions = self.trainer.predict(self.sample_dataset)
            # decode predictions and labels
            predictions = decode_predictions(self.tokenizer, predictions)
            # add predictions to a wandb.Table
            predictions_df = pd.DataFrame(predictions)
            predictions_df["epoch"] = state.epoch
            records_table = self._wandb.Table(dataframe=predictions_df)
            # log the table to wandb
            self._wandb.log({"sample_predictions": records_table})


# First, instantiate the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_datasets["train"],
    eval_dataset=lm_datasets["validation"],
)

# Instantiate the WandbPredictionProgressCallback
progress_callback = WandbPredictionProgressCallback(
    trainer=trainer,
    tokenizer=tokenizer,
    val_dataset=lm_datasets["validation"],
    num_samples=10,
    freq=2,
)

# Add the callback to the trainer
trainer.add_callback(progress_callback)
```

For a more detailed example, refer to this [colab](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/huggingface/Custom_Progress_Callback.ipynb).

### What additional W\&B settings are available?

Further configuration of what is logged with `Trainer` is possible by setting environment variables. A full list of W\&B environment variables [can be found here](/platform/hosting/env-vars).
| Environment Variable | Usage |
| -------------------- | ----- |
| `WANDB_PROJECT` | Give your project a name (`huggingface` by default) |
| `WANDB_LOG_MODEL` | Log the model checkpoint as a W\&B Artifact (`false` by default). `false` (default): No model checkpointing. `checkpoint`: A checkpoint will be uploaded every `args.save_steps` (set in the Trainer's `TrainingArguments`). `end`: The final model checkpoint will be uploaded at the end of training. |
| `WANDB_WATCH` | Set whether you'd like to log your model's gradients, parameters, or neither. `false` (default): No gradient or parameter logging. `gradients`: Log histograms of the gradients. `all`: Log histograms of gradients and parameters. |
| `WANDB_DISABLED` | Set to `true` to turn off logging entirely (`false` by default) |
| `WANDB_QUIET` | Set to `true` to limit statements logged to standard output to critical statements only (`false` by default) |
| `WANDB_SILENT` | Set to `true` to silence the output printed by wandb (`false` by default) |

```bash theme={null}
WANDB_WATCH=all
WANDB_SILENT=true
```

```notebook theme={null}
%env WANDB_WATCH=all
%env WANDB_SILENT=true
```

### How do I customize `wandb.init()`?

The `WandbCallback` that `Trainer` uses will call `wandb.init()` under the hood when `Trainer` is initialized. You can alternatively set up your runs manually by calling `wandb.init()` before the `Trainer` is initialized. This gives you full control over your W\&B run configuration.

An example of what you might want to pass to `init` is below. For `wandb.init()` details, see the [`wandb.init()` reference](/models/ref/python/functions/init).

```python theme={null}
wandb.init(
    project="amazon_sentiment_analysis",
    name="bert-base-high-lr",
    tags=["baseline", "high-lr"],
    group="bert",
)
```

## Additional resources

Below are 6 Transformers and W\&B related articles you might enjoy:
**Hyperparameter Optimization for Hugging Face Transformers**

* Three strategies for hyperparameter optimization for Hugging Face Transformers are compared: Grid Search, Bayesian Optimization, and Population Based Training.
* We use a standard uncased BERT model from Hugging Face transformers, and we want to fine-tune on the RTE dataset from the SuperGLUE benchmark.
* Results show that Population Based Training is the most effective approach to hyperparameter optimization of our Hugging Face transformer model.

Read the [Hyperparameter Optimization for Hugging Face Transformers report](https://wandb.ai/amogkam/transformers/reports/Hyperparameter-Optimization-for-Hugging-Face-Transformers--VmlldzoyMTc2ODI).

**Hugging Tweets: Train a Model to Generate Tweets**

* In the article, the author demonstrates how to fine-tune a pre-trained GPT2 HuggingFace Transformer model on anyone's Tweets in five minutes.
* The model uses the following pipeline: Downloading Tweets, Optimizing the Dataset, Initial Experiments, Comparing Losses Between Users, Fine-Tuning the Model.

Read the full report [here](https://wandb.ai/wandb/huggingtweets/reports/HuggingTweets-Train-a-Model-to-Generate-Tweets--VmlldzoxMTY5MjI).

**Sentence Classification With Hugging Face BERT and W\&B**

* In this article, we'll build a sentence classifier leveraging the power of recent breakthroughs in Natural Language Processing, focusing on an application of transfer learning to NLP.
* We'll be using The Corpus of Linguistic Acceptability (CoLA) dataset for single sentence classification, which is a set of sentences labeled as grammatically correct or incorrect that was first published in May 2018.
* We'll use Google's BERT to create high performance models with minimal effort on a range of NLP tasks.

Read the full report [here](https://wandb.ai/cayush/bert-finetuning/reports/Sentence-Classification-With-Huggingface-BERT-and-W-B--Vmlldzo4MDMwNA).

**A Step by Step Guide to Tracking Hugging Face Model Performance**

* We use W\&B and Hugging Face transformers to train DistilBERT, a Transformer that's 40% smaller than BERT but retains 97% of BERT's accuracy, on the GLUE benchmark.
* The GLUE benchmark is a collection of nine datasets and tasks for training NLP models.

Read the full report [here](https://wandb.ai/jxmorris12/huggingface-demo/reports/A-Step-by-Step-Guide-to-Tracking-HuggingFace-Model-Performance--VmlldzoxMDE2MTU).

**Examples of Early Stopping in HuggingFace**

* Fine-tuning a Hugging Face Transformer using Early Stopping regularization can be done natively in PyTorch or TensorFlow.
* Using the EarlyStopping callback in TensorFlow is straightforward with the `tf.keras.callbacks.EarlyStopping` callback.
* In PyTorch, there is not an off-the-shelf early stopping method, but there is a working early stopping hook available on GitHub Gist.

Read the full report [here](https://wandb.ai/ayush-thakur/huggingface/reports/Early-Stopping-in-HuggingFace-Examples--Vmlldzo0MzE2MTM).

**How to Fine-Tune Hugging Face Transformers on a Custom Dataset**

We fine-tune a DistilBERT transformer for sentiment analysis (binary classification) on a custom IMDB dataset.

Read the full report [here](https://wandb.ai/ayush-thakur/huggingface/reports/How-to-Fine-Tune-HuggingFace-Transformers-on-a-Custom-Dataset--Vmlldzo0MzQ2MDc).
## Get help or request features

For any issues, questions, or feature requests for the Hugging Face W\&B integration, feel free to post in [this thread on the Hugging Face forums](https://discuss.huggingface.co/t/logging-experiment-tracking-with-w-b/498) or open an issue on the Hugging Face [Transformers GitHub repo](https://github.com/huggingface/transformers).

# PyTorch Ignite Source: https://docs.wandb.ai/models/integrations/ignite Integrate W&B with PyTorch Ignite to automatically log training metrics, model parameters, and experiment configs.

* See the resulting visualizations in this [example W\&B report →](https://app.wandb.ai/example-team/pytorch-ignite-example/reports/PyTorch-Ignite-with-W%26B--Vmlldzo0NzkwMg)
* Try running the code yourself in this [example hosted notebook →](https://colab.research.google.com/drive/15e-yGOvboTzXU4pe91Jg-Yr7sae3zBOJ#scrollTo=ztVifsYAmnRr)

Ignite provides a W\&B handler, `WandBLogger`, to log metrics, model and optimizer parameters, and gradients during training and validation. It can also log model checkpoints to the W\&B cloud. The class is also a wrapper for the `wandb` module, so you can call any `wandb` function through it. See the examples below for how to save model parameters and gradients.
## Basic setup ```python theme={null} from argparse import ArgumentParser import wandb import torch from torch import nn from torch.optim import SGD from torch.utils.data import DataLoader import torch.nn.functional as F from torchvision.transforms import Compose, ToTensor, Normalize from torchvision.datasets import MNIST from ignite.engine import Events, create_supervised_trainer, create_supervised_evaluator from ignite.metrics import Accuracy, Loss from tqdm import tqdm class Net(nn.Module): def __init__(self): super(Net, self).__init__() self.conv1 = nn.Conv2d(1, 10, kernel_size=5) self.conv2 = nn.Conv2d(10, 20, kernel_size=5) self.conv2_drop = nn.Dropout2d() self.fc1 = nn.Linear(320, 50) self.fc2 = nn.Linear(50, 10) def forward(self, x): x = F.relu(F.max_pool2d(self.conv1(x), 2)) x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2)) x = x.view(-1, 320) x = F.relu(self.fc1(x)) x = F.dropout(x, training=self.training) x = self.fc2(x) return F.log_softmax(x, dim=-1) def get_data_loaders(train_batch_size, val_batch_size): data_transform = Compose([ToTensor(), Normalize((0.1307,), (0.3081,))]) train_loader = DataLoader(MNIST(download=True, root=".", transform=data_transform, train=True), batch_size=train_batch_size, shuffle=True) val_loader = DataLoader(MNIST(download=False, root=".", transform=data_transform, train=False), batch_size=val_batch_size, shuffle=False) return train_loader, val_loader ``` Using `WandBLogger` in ignite is a modular process. First, you create a `WandBLogger` object. Next, you attach it to a trainer or evaluator to automatically log the metrics. This example shows: * Logs training loss, attached to the trainer object. * Logs validation loss, attached to the evaluator. * Logs optional Parameters, such as learning rate. * Watches the model. 
```python theme={null} from ignite.contrib.handlers.wandb_logger import * def run(train_batch_size, val_batch_size, epochs, lr, momentum, log_interval): train_loader, val_loader = get_data_loaders(train_batch_size, val_batch_size) model = Net() device = 'cpu' if torch.cuda.is_available(): device = 'cuda' optimizer = SGD(model.parameters(), lr=lr, momentum=momentum) trainer = create_supervised_trainer(model, optimizer, F.nll_loss, device=device) evaluator = create_supervised_evaluator(model, metrics={'accuracy': Accuracy(), 'nll': Loss(F.nll_loss)}, device=device) desc = "ITERATION - loss: {:.2f}" pbar = tqdm( initial=0, leave=False, total=len(train_loader), desc=desc.format(0) ) # Create the WandBLogger object wandb_logger = WandBLogger( project="pytorch-ignite-integration", name="cnn-mnist", config={"max_epochs": epochs,"batch_size":train_batch_size}, tags=["pytorch-ignite", "mnist"] ) wandb_logger.attach_output_handler( trainer, event_name=Events.ITERATION_COMPLETED, tag="training", output_transform=lambda loss: {"loss": loss} ) wandb_logger.attach_output_handler( evaluator, event_name=Events.EPOCH_COMPLETED, tag="validation", metric_names=["nll", "accuracy"], global_step_transform=lambda *_: trainer.state.iteration, ) wandb_logger.attach_opt_params_handler( trainer, event_name=Events.ITERATION_STARTED, optimizer=optimizer, param_name='lr' # optional ) wandb_logger.watch(model) ``` You can optionally use Ignite `Events` to log the metrics directly to the terminal: ```python theme={null} @trainer.on(Events.ITERATION_COMPLETED(every=log_interval)) def log_training_loss(engine): pbar.desc = desc.format(engine.state.output) pbar.update(log_interval) @trainer.on(Events.EPOCH_COMPLETED) def log_training_results(engine): pbar.refresh() evaluator.run(train_loader) metrics = evaluator.state.metrics avg_accuracy = metrics['accuracy'] avg_nll = metrics['nll'] tqdm.write( "Training Results - Epoch: {} Avg accuracy: {:.2f} Avg loss: {:.2f}" .format(engine.state.epoch, 
avg_accuracy, avg_nll) ) @trainer.on(Events.EPOCH_COMPLETED) def log_validation_results(engine): evaluator.run(val_loader) metrics = evaluator.state.metrics avg_accuracy = metrics['accuracy'] avg_nll = metrics['nll'] tqdm.write( "Validation Results - Epoch: {} Avg accuracy: {:.2f} Avg loss: {:.2f}" .format(engine.state.epoch, avg_accuracy, avg_nll)) pbar.n = pbar.last_print_n = 0 trainer.run(train_loader, max_epochs=epochs) pbar.close() if __name__ == "__main__": parser = ArgumentParser() parser.add_argument('--batch_size', type=int, default=64, help='input batch size for training (default: 64)') parser.add_argument('--val_batch_size', type=int, default=1000, help='input batch size for validation (default: 1000)') parser.add_argument('--epochs', type=int, default=10, help='number of epochs to train (default: 10)') parser.add_argument('--lr', type=float, default=0.01, help='learning rate (default: 0.01)') parser.add_argument('--momentum', type=float, default=0.5, help='SGD momentum (default: 0.5)') parser.add_argument('--log_interval', type=int, default=10, help='how many batches to wait before logging training status') args = parser.parse_args() run(args.batch_size, args.val_batch_size, args.epochs, args.lr, args.momentum, args.log_interval) ``` This code generates visualizations of training progress, performance, hyperparameter tuning results, and model comparisons. Refer to the [Ignite Docs](https://pytorch.org/ignite/contrib/handlers.html#module-ignite.contrib.handlers.wandb_logger) for more details. # Keras Source: https://docs.wandb.ai/models/integrations/keras Use W&B Keras callbacks to track experiments, checkpoint models, and visualize predictions during training. Use Keras callbacks to track experiments, log model checkpoints, and visualize model predictions. Keras callbacks are available in the `wandb.integration.keras` module with W\&B Python SDK versions `0.13.4` and above. 
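If you are unsure whether your installed SDK is new enough for these callbacks, a quick version comparison can help. This is a minimal sketch: `meets_minimum` is a hypothetical helper, not part of the W\&B SDK, and it assumes a version string that starts with dot-separated integers.

```python
import re


def meets_minimum(version, minimum=(0, 13, 4)):
    """Return True if `version` (for example "0.16.2") is at least `minimum`."""
    parts = []
    for piece in version.split(".")[: len(minimum)]:
        match = re.match(r"\d+", piece)
        # Treat a non-numeric piece as 0 rather than guessing its meaning.
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as "1.0" so the tuples compare element-wise.
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts) >= minimum


print(meets_minimum("0.13.3"))  # False
print(meets_minimum("0.13.4"))  # True
print(meets_minimum("0.16.2"))  # True
```

In practice you would pass `wandb.__version__` to the helper.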
W\&B Keras integration provides the following callbacks: * **`WandbMetricsLogger`** : Use this callback for [Experiment Tracking](/models/track/). It logs your training and validation metrics along with system metrics to W\&B. * **`WandbModelCheckpoint`** : Use this callback to log your model checkpoints to W\&B [Artifacts](/models/artifacts/). * **`WandbEvalCallback`**: This base callback logs model predictions to W\&B [Tables](/models/tables/) for interactive visualization. ## Install and import Keras integration Install the latest version of W\&B. ```bash theme={null} pip install -U wandb ``` To use the Keras integration, import required classes from `wandb.integration.keras`: ```python theme={null} import wandb from wandb.integration.keras import WandbMetricsLogger, WandbModelCheckpoint, WandbEvalCallback ``` The following sections describe each callback in detail with code examples. ## Track experiments with `WandbMetricsLogger` `wandb.integration.keras.WandbMetricsLogger()` automatically logs Keras' `logs` dictionary that callback methods such as `on_epoch_end`, `on_batch_end` etc, take as an argument. The partial example below shows how to use `WandbMetricsLogger()` in a Keras workflow. First, compile the model with desired optimizer, loss function, and metrics. Then, initialize a W\&B run using `wandb.init()`. Finally, pass the `WandbMetricsLogger()` callback to `model.fit()`. 
```python theme={null} import wandb from wandb.integration.keras import WandbMetricsLogger import tensorflow as tf model.compile( optimizer = "adam", loss = "categorical_crossentropy", metrics = ["accuracy", tf.keras.metrics.TopKCategoricalAccuracy(k=5, name='top@5_accuracy')] ) # Initialize a new W&B Run with wandb.init(config={"batch_size": 64}) as run: # Pass the WandbMetricsLogger to model.fit model.fit( X_train, y_train, validation_data=(X_test, y_test), callbacks=[WandbMetricsLogger()] ) ``` The previous example logs training and validation metrics such as `loss`, `accuracy`, and `top@5_accuracy` to W\&B at the end of each epoch. It also logs: ### `WandbMetricsLogger` reference | Parameter | Description | | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `log_freq` | (`epoch`, `batch`, or an `int`): if `epoch`, logs metrics at the end of each epoch. If `batch`, logs metrics at the end of each batch. If an `int`, logs metrics at the end of that many batches. Defaults to `epoch`. | | `initial_global_step` | (int): Use this argument to correctly log the learning rate when you resume training from some initial\_epoch, and a learning rate scheduler is used. This can be computed as step\_size \* initial\_step. Defaults to 0. | ## Checkpoint a model using `WandbModelCheckpoint` Use `WandbModelCheckpoint` callback to save the Keras model (`SavedModel` format) or model weights periodically and uploads them to W\&B as a `wandb.Artifact` for model versioning. This callback is subclassed from [`tf.keras.callbacks.ModelCheckpoint()`](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint) ,thus the checkpointing logic is taken care of by the parent callback. This callback saves: * The model that has achieved best performance based on the monitor. 
* The model at the end of every epoch regardless of the performance. * The model at the end of the epoch or after a fixed number of training batches. * Only model weights or the whole model. * The model either in `SavedModel` format or in `.h5` format. Use this callback in conjunction with `WandbMetricsLogger()`. ```python theme={null} import wandb from wandb.integration.keras import WandbMetricsLogger, WandbModelCheckpoint # Initialize a new W&B Run with wandb.init(config={"bs": 12}) as run: # Pass the WandbModelCheckpoint to model.fit model.fit( X_train, y_train, validation_data=(X_test, y_test), callbacks=[ WandbMetricsLogger(), WandbModelCheckpoint("models"), ], ) ``` ### `WandbModelCheckpoint` reference | Parameter | Description | | | ------------------------- | --------------------------------------------------- | - | | `filepath` | (str): Path to save the model file. | | | `monitor` | (str): The metric name to monitor. | | | `verbose` | (int): Verbosity mode, 0 or 1. Mode 0 is silent, and mode 1 displays messages when the callback takes an action. | | | `save_best_only` | (Boolean): If `save_best_only=True`, saves only the latest model or the model it considers the best, as defined by the `monitor` and `mode` attributes. | | | `save_weights_only` | (Boolean): If True, saves only the model's weights. | | | `mode` | (`auto`, `min`, or `max`): For `val_acc`, set it to `max`; for `val_loss`, set it to `min`; and so on. | | | `save_freq` | ("epoch" or int): When using "epoch", the callback saves the model after each epoch. When using an integer, the callback saves the model at the end of this many batches. 
Note that when monitoring validation metrics such as `val_acc` or `val_loss`, `save_freq` must be set to "epoch" as those metrics are only available at the end of an epoch. | | | `options` | (str): Optional `tf.train.CheckpointOptions` object if `save_weights_only` is true or optional `tf.saved_model.SaveOptions` object if `save_weights_only` is false. | | | `initial_value_threshold` | (float): Floating point initial "best" value of the metric to be monitored. | | ### Log checkpoints after N epochs By default (`save_freq="epoch"`), the callback creates a checkpoint and uploads it as an artifact after each epoch. To create a checkpoint after a specific number of batches, set `save_freq` to an integer. To checkpoint after `N` epochs, compute the cardinality of the `train` dataloader, multiply it by `N`, and pass the result to `save_freq`: ```python theme={null} WandbModelCheckpoint( filepath="models/", save_freq=int((trainloader.cardinality()*N).numpy()) ) ``` ### Efficiently log checkpoints on a TPU architecture While checkpointing on TPUs, you might encounter the error message `UnimplementedError: File system scheme '[local]' not implemented`. This happens because the model directory (`filepath`) must use a cloud storage bucket path (`gs://bucket-name/...`), and this bucket must be accessible from the TPU server. To work around this, checkpoint to a local path by setting `experimental_io_device`; W\&B then uploads the local checkpoint as an artifact. ```python theme={null} checkpoint_options = tf.saved_model.SaveOptions(experimental_io_device="/job:localhost") WandbModelCheckpoint( filepath="models/", options=checkpoint_options, ) ``` ## Visualize model predictions using `WandbEvalCallback` The `WandbEvalCallback()` is an abstract base class to build Keras callbacks primarily for model prediction and, secondarily, dataset visualization. This abstract callback is agnostic with respect to the dataset and the task. 
To use this, inherit from this base `WandbEvalCallback()` callback class and implement the `add_ground_truth` and `add_model_predictions` methods. The `WandbEvalCallback()` is a utility class that provides methods to: * Create data and prediction `wandb.Table()` instances. * Log data and prediction Tables as `wandb.Artifact()`. * Log the data table `on_train_begin`. * Log the prediction table `on_epoch_end`. The following example uses `WandbClfEvalCallback` for an image classification task. This example callback logs the validation data (`data_table`) to W\&B, performs inference, and logs the prediction (`pred_table`) to W\&B at the end of every epoch. ```python theme={null} import tensorflow as tf import wandb from wandb.integration.keras import WandbMetricsLogger, WandbEvalCallback # Implement your model prediction visualization callback class WandbClfEvalCallback(WandbEvalCallback): def __init__( self, validation_data, data_table_columns, pred_table_columns, num_samples=100 ): super().__init__(data_table_columns, pred_table_columns) self.x = validation_data[0] self.y = validation_data[1] def add_ground_truth(self, logs=None): for idx, (image, label) in enumerate(zip(self.x, self.y)): self.data_table.add_data(idx, wandb.Image(image), label) def add_model_predictions(self, epoch, logs=None): preds = self.model.predict(self.x, verbose=0) preds = tf.argmax(preds, axis=-1) table_idxs = self.data_table_ref.get_index() for idx in table_idxs: pred = preds[idx] self.pred_table.add_data( epoch, self.data_table_ref.data[idx][0], self.data_table_ref.data[idx][1], self.data_table_ref.data[idx][2], pred, ) # ... 
# Initialize a new W&B Run with wandb.init(config={"hyper": "parameter"}) as run: # Add the Callbacks to Model.fit model.fit( X_train, y_train, validation_data=(X_test, y_test), callbacks=[ WandbMetricsLogger(), WandbClfEvalCallback( validation_data=(X_test, y_test), data_table_columns=["idx", "image", "label"], pred_table_columns=["epoch", "idx", "image", "label", "pred"], ), ], ) ``` ### `WandbEvalCallback` reference | Parameter | Description | | -------------------- | ------------------------------------------------ | | `data_table_columns` | (list) List of column names for the `data_table` | | `pred_table_columns` | (list) List of column names for the `pred_table` | ### Memory footprint details The callback logs the `data_table` to W\&B when the `on_train_begin` method is invoked. Once the table is uploaded as a W\&B Artifact, the callback holds a reference to it, which you can access using the `data_table_ref` class variable. The `data_table_ref` is a 2D list that can be indexed like `self.data_table_ref[idx][n]`, where `idx` is the row number and `n` is the column number. The `WandbClfEvalCallback` example above shows this usage. ### Customize the callback You can override the `on_train_begin` or `on_epoch_end` methods to have more fine-grained control. If you want to log the samples after `N` batches, you can implement the `on_train_batch_end` method. If you are implementing a callback for model prediction visualization by inheriting `WandbEvalCallback` and something needs to be clarified or fixed, let us know by opening an [issue](https://github.com/wandb/wandb/issues). ## `WandbCallback` \[legacy] Use the W\&B library `WandbCallback()` class to automatically save all the metrics and the loss values tracked in `model.fit()`. 
```python theme={null} import wandb from wandb.integration.keras import WandbCallback with wandb.init(config={"hyper": "parameter"}) as run: # code to set up your model in Keras # Pass the callback to model.fit model.fit( X_train, y_train, validation_data=(X_test, y_test), callbacks=[WandbCallback()] ) ``` You can watch the short video [Get Started with Keras and W\&B in Less Than a Minute](https://www.youtube.com/watch?ab_channel=Weights\&Biases\&v=4FjDIJ-vO_M). For a more detailed video, watch [Integrate W\&B with Keras](https://www.youtube.com/watch?v=Bsudo7jbMow\&ab_channel=Weights%26Biases). You can review the [Colab Jupyter Notebook](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/keras/Keras_pipeline_with_Weights_and_Biases.ipynb). See our [example repo](https://github.com/wandb/examples) for scripts, including a [Fashion MNIST example](https://github.com/wandb/examples/blob/master/examples/keras/keras-cnn-fashion/train.py) and the [W\&B Dashboard](https://wandb.ai/wandb/keras-fashion-mnist/runs/5z1d85qs) it generates. The `WandbCallback` class supports a wide variety of logging configuration options: specifying a metric to monitor, tracking of weights and gradients, logging of predictions on training\_data and validation\_data, and more. Check out the reference documentation for the `keras.WandbCallback` for full details. The `WandbCallback` * Automatically logs history data from any metrics collected by Keras: loss and anything passed into `keras_model.compile()`. * Sets summary metrics for the run associated with the "best" training step, as defined by the `monitor` and `mode` attributes. This defaults to the epoch with the minimum `val_loss`. `WandbCallback` by default saves the model associated with the best `epoch`. * Optionally logs gradient and parameter histogram. * Optionally saves training and validation data for wandb to visualize. 
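To make the "best" training step concrete, the sketch below picks the best epoch from a metric history according to `monitor` and `mode`. It is an illustration under stated assumptions, not the callback's actual code: `best_epoch` is a hypothetical helper, and the `auto` rule used here (maximize when the metric name contains `acc`, otherwise minimize) is an assumed heuristic.

```python
def best_epoch(history, monitor="val_loss", mode="auto"):
    """Return (epoch_index, value) of the best entry in history[monitor]."""
    values = history[monitor]
    if mode == "auto":
        # Assumed heuristic: accuracy-like metrics are maximized,
        # everything else (for example losses) is minimized.
        mode = "max" if "acc" in monitor else "min"
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'auto', 'min', or 'max', got {mode!r}")
    pick = max if mode == "max" else min
    # Ties resolve to the earliest epoch because min/max keep the first match.
    epoch = pick(range(len(values)), key=values.__getitem__)
    return epoch, values[epoch]


history = {
    "val_loss": [0.90, 0.55, 0.61, 0.58],
    "val_accuracy": [0.62, 0.81, 0.79, 0.83],
}
print(best_epoch(history))                          # (1, 0.55)
print(best_epoch(history, monitor="val_accuracy"))  # (3, 0.83)
```

With the default `monitor="val_loss"`, the second epoch (index 1) wins because it has the lowest validation loss, which matches the default behavior described in the list above.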
### `WandbCallback` reference | Arguments | | | -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `monitor` | (str) name of metric to monitor. Defaults to `val_loss`. | | `mode` | (str) one of `{`auto`, `min`, `max`}`. `min` - save model when monitor is minimized `max` - save model when monitor is maximized `auto` - try to guess when to save the model (default). | | `save_model` | True - save a model when monitor beats all previous epochs False - don't save models | | `save_graph` | (boolean) if True save model graph to wandb (default to True). | | `save_weights_only` | (boolean) if True, saves only the model's weights(`model.save_weights(filepath)`). Otherwise, saves the full model). | | `log_weights` | (boolean) if True save histograms of the model's layer's weights. | | `log_gradients` | (boolean) if True log histograms of the training gradients | | `training_data` | (tuple) Same format `(X,y)` as passed to `model.fit`. This is needed for calculating gradients - this is mandatory if `log_gradients` is `True`. | | `validation_data` | (tuple) Same format `(X,y)` as passed to `model.fit`. A set of data for wandb to visualize. If you set this field, every epoch, wandb makes a small number of predictions and saves the results for later visualization. 
| | `generator` | (generator) a generator that returns validation data for wandb to visualize. This generator should return tuples `(X,y)`. Either `validation_data` or `generator` should be set for wandb to visualize specific data examples. | | `validation_steps` | (int) if `validation_data` is a generator, how many steps to run the generator for the full validation set. | | `labels` | (list) If you are visualizing your data with wandb, this list of labels converts numeric output to understandable strings if you are building a classifier with multiple classes. For a binary classifier, you can pass in a list of two labels \[`label for false`, `label for true`]. If `validation_data` and `generator` are both unset, this does nothing. | | `predictions` | (int) the number of predictions to make for visualization each epoch, max is 100. | | `input_type` | (string) type of the model input to help visualization. Can be one of: (`image`, `images`, `segmentation_mask`). | | `output_type` | (string) type of the model output to help visualization. Can be one of: (`image`, `images`, `segmentation_mask`). | | `log_evaluation` | (boolean) if True, save a Table containing validation data and the model's predictions at each epoch. See `validation_indexes`, `validation_row_processor`, and `output_row_processor` for additional details. | | `class_colors` | (\[float, float, float]) if the input or output is a segmentation mask, an array containing an rgb tuple (range 0-1) for each class. | | `log_batch_frequency` | (integer) if None, callback logs every epoch. If set to integer, callback logs training metrics every `log_batch_frequency` batches. | | `log_best_prefix` | (string) if None, saves no extra summary metrics. If set to a string, prepends the monitored metric and epoch with the prefix and saves the results as summary metrics. | | `validation_indexes` | (\[wandb.data\_types.\_TableLinkMixin]) an ordered list of index keys to associate with each validation example. 
If `log_evaluation` is True and you provide `validation_indexes`, does not create a Table of validation data. Instead, associates each prediction with the row represented by the `TableLinkMixin`. To obtain a list of row keys, use `Table.get_index() `. | | `validation_row_processor` | (Callable) a function to apply to the validation data, commonly used to visualize the data. The function receives an `ndx` (int) and a `row` (dict). If your model has a single input, then `row["input"]` contains the input data for the row. Otherwise, it contains the names of the input slots. If your fit function takes a single target, then `row["target"]` contains the target data for the row. Otherwise, it contains the names of the output slots. For example, if your input data is a single array, to visualize the data as an Image, provide `lambda ndx, row: {"img": wandb.Image(row["input"])}` as the processor. Ignored if `log_evaluation` is False or `validation_indexes` are present. | | `output_row_processor` | (Callable) same as `validation_row_processor`, but applied to the model's output. `row["output"]` contains the results of the model output. | | `infer_missing_processors` | (Boolean) Determines whether to infer `validation_row_processor` and `output_row_processor` if they are missing. Defaults to True. If you provide `labels`, W\&B attempts to infer classification-type processors where appropriate. | | `log_evaluation_frequency` | (int) Determines how often to log evaluation results. Defaults to `0` to log only at the end of training. Set to 1 to log every epoch, 2 to log every other epoch, and so on. Has no effect when `log_evaluation` is False. | ## Frequently asked questions ### How do I use `Keras` multiprocessing with `wandb`? When setting `use_multiprocessing=True`, this error may occur: ```python theme={null} Error("You must call wandb.init() before wandb.config.batch_size") ``` To work around it: 1. In the `Sequence` class construction, add: `wandb.init(group='...')`. 2. 
In `main`, make sure you're using `if __name__ == "__main__":` and put the rest of your script logic inside it. # PyTorch Lightning Source: https://docs.wandb.ai/models/integrations/lightning Use W&B with PyTorch Lightning through the built-in WandbLogger for experiment tracking and model checkpointing. PyTorch Lightning provides a lightweight wrapper for organizing your PyTorch code and easily adding advanced features such as distributed training and 16-bit precision. W\&B provides a lightweight wrapper for logging your ML experiments. But you don't need to combine the two yourself: W\&B is incorporated directly into the PyTorch Lightning library via the [`WandbLogger`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#module-lightning.pytorch.loggers.wandb). ## Integrate with Lightning ```python theme={null} from lightning.pytorch.loggers import WandbLogger from lightning.pytorch import Trainer wandb_logger = WandbLogger(log_model="all") trainer = Trainer(logger=wandb_logger) ``` **Using wandb.log():** The `WandbLogger` logs to W\&B using the Trainer's `global_step`. If you make additional calls to `wandb.log()` directly in your code, **do not** use the `step` argument in `wandb.log()`. Instead, log the Trainer's `global_step` like your other metrics: ```python theme={null} wandb.log({"accuracy":0.99, "trainer/global_step": step}) ``` ```python theme={null} import lightning as L from wandb.integration.lightning.fabric import WandbLogger wandb_logger = WandbLogger(log_model="all") fabric = L.Fabric(loggers=[wandb_logger]) fabric.launch() fabric.log_dict({"important_metric": important_metric}) ``` Interactive dashboards ### Sign up and create an API key An API key authenticates your machine to W\&B. You can generate an API key from your user profile. For a more streamlined approach, create an API key by going directly to [User Settings](https://wandb.ai/settings). 
Copy the newly created API key immediately and save it in a secure location such as a password manager. 1. Click your user profile icon in the upper right corner. 2. Select **User Settings**, then scroll to the **API Keys** section. ### Install the `wandb` library and log in To install the `wandb` library locally and log in: 1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key. ```bash theme={null} export WANDB_API_KEY= ``` 2. Install the `wandb` library and log in. ```shell theme={null} pip install wandb wandb login ``` ```bash theme={null} pip install wandb ``` ```python theme={null} import wandb wandb.login() ``` ```notebook theme={null} !pip install wandb import wandb wandb.login() ``` ## Use PyTorch Lightning's `WandbLogger` PyTorch Lightning has multiple `WandbLogger` classes to log metrics and model weights, media, and more. * [`PyTorch`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#module-lightning.pytorch.loggers.wandb) * [`Fabric`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#module-lightning.pytorch.loggers.wandb) To integrate with Lightning, instantiate the `WandbLogger` and pass it to Lightning's `Trainer` or `Fabric`. ```python theme={null} trainer = Trainer(logger=wandb_logger) ``` ```python theme={null} fabric = L.Fabric(loggers=[wandb_logger]) fabric.launch() fabric.log_dict({ "important_metric": important_metric }) ``` ### Common logger arguments Below are some of the most used parameters in `WandbLogger`. Review the PyTorch Lightning documentation for details about all logger arguments. 
* [`PyTorch`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#module-lightning.pytorch.loggers.wandb) * [`Fabric`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.loggers.wandb.html#module-lightning.pytorch.loggers.wandb) | Parameter | Description | | ----------- | ----------------------------------------------------------------------------- | | `project` | Define what wandb Project to log to | | `name` | Give a name to your wandb run | | `log_model` | Log all models if `log_model="all"` or at end of training if `log_model=True` | | `save_dir` | Path where data is saved | ## Log your hyperparameters ```python theme={null} class LitModule(LightningModule): def __init__(self, *args, **kwargs): super().__init__() self.save_hyperparameters() ``` ```python theme={null} wandb_logger.log_hyperparams( { "hyperparameter_1": hyperparameter_1, "hyperparameter_2": hyperparameter_2, } ) ``` ## Log additional config parameters ```python theme={null} # add one parameter wandb_logger.experiment.config["key"] = value # add multiple parameters wandb_logger.experiment.config.update({key1: val1, key2: val2}) # use directly wandb module wandb.config["key"] = value wandb.config.update({key1: val1, key2: val2}) ``` ## Log gradients, parameter histogram and model topology You can pass your model object to `wandb_logger.watch()` to monitor your model's gradients and parameters as you train. See the PyTorch Lightning `WandbLogger` documentation for details. ## Log metrics You can log your metrics to W\&B when using the `WandbLogger` by calling `self.log('my_metric_name', metric_value)` within your `LightningModule`, such as in your `training_step` or `validation_step` methods. The code snippet below shows how to define your `LightningModule` to log your metrics and your `LightningModule` hyperparameters. 
This example uses the [`torchmetrics`](https://github.com/Lightning-AI/torchmetrics) library to calculate your metrics ```python theme={null} import torch from torch.nn import Linear, CrossEntropyLoss, functional as F from torch.optim import Adam from torchmetrics.functional import accuracy from lightning.pytorch import LightningModule class My_LitModule(LightningModule): def __init__(self, n_classes=10, n_layer_1=128, n_layer_2=256, lr=1e-3): """method used to define the model parameters""" super().__init__() # mnist images are (1, 28, 28) (channels, width, height) self.layer_1 = Linear(28 * 28, n_layer_1) self.layer_2 = Linear(n_layer_1, n_layer_2) self.layer_3 = Linear(n_layer_2, n_classes) self.loss = CrossEntropyLoss() self.lr = lr # save hyper-parameters to self.hparams (auto-logged by W&B) self.save_hyperparameters() def forward(self, x): """method used for inference input -> output""" # (b, 1, 28, 28) -> (b, 1*28*28) batch_size, channels, width, height = x.size() x = x.view(batch_size, -1) # let's do 3 x (linear + relu) x = F.relu(self.layer_1(x)) x = F.relu(self.layer_2(x)) x = self.layer_3(x) return x def training_step(self, batch, batch_idx): """needs to return a loss from a single batch""" _, loss, acc = self._get_preds_loss_accuracy(batch) # Log loss and metric self.log("train_loss", loss) self.log("train_accuracy", acc) return loss def validation_step(self, batch, batch_idx): """used for logging metrics""" preds, loss, acc = self._get_preds_loss_accuracy(batch) # Log loss and metric self.log("val_loss", loss) self.log("val_accuracy", acc) return preds def configure_optimizers(self): """defines model optimizer""" return Adam(self.parameters(), lr=self.lr) def _get_preds_loss_accuracy(self, batch): """convenience function since train/valid/test steps are similar""" x, y = batch logits = self(x) preds = torch.argmax(logits, dim=1) loss = self.loss(logits, y) acc = accuracy(preds, y) return preds, loss, acc ``` ```python theme={null} import lightning as L 
import torch import torchvision as tv from wandb.integration.lightning.fabric import WandbLogger import wandb wandb_logger = WandbLogger(log_model="all") fabric = L.Fabric(loggers=[wandb_logger]) fabric.launch() model = tv.models.resnet18() optimizer = torch.optim.SGD(model.parameters(), lr=lr) model, optimizer = fabric.setup(model, optimizer) train_dataloader = fabric.setup_dataloaders( torch.utils.data.DataLoader(train_dataset, batch_size=batch_size) ) model.train() for epoch in range(num_epochs): for batch in train_dataloader: optimizer.zero_grad() loss = model(batch) fabric.backward(loss) optimizer.step() fabric.log_dict({"loss": loss}) ``` ## Log the min/max of a metric Using wandb's [`define_metric`](/models/ref/python/experiments/run#define_metric) function, you can define whether you'd like your W\&B summary metric to display the min, max, mean or best value for that metric. If `define_metric` isn't used, then the last value logged will appear in your summary metrics. See the `define_metric` [reference docs here](/models/ref/python/experiments/run#define_metric) and the [guide here](/models/track/log/customize-logging-axes/) for more. To tell W\&B to keep track of the max validation accuracy in the W\&B summary metric, call `wandb.define_metric()` only once, at the beginning of training: ```python theme={null} class My_LitModule(LightningModule): ... 
def validation_step(self, batch, batch_idx): if self.trainer.global_step == 0: wandb.define_metric("val_accuracy", summary="max") preds, loss, acc = self._get_preds_loss_accuracy(batch) # Log loss and metric self.log("val_loss", loss) self.log("val_accuracy", acc) return preds ``` ```python theme={null} wandb.define_metric("val_accuracy", summary="max") fabric = L.Fabric(loggers=[wandb_logger]) fabric.launch() fabric.log_dict({"val_accuracy": val_accuracy}) ``` ## Checkpoint a model To save model checkpoints as W\&B [Artifacts](/models/artifacts/), use the Lightning [`ModelCheckpoint`](https://lightning.ai/docs/pytorch/stable/api/lightning.pytorch.callbacks.ModelCheckpoint.html) callback and set the `log_model` argument in the `WandbLogger`. ```python theme={null} trainer = Trainer(logger=wandb_logger, callbacks=[checkpoint_callback]) ``` ```python theme={null} fabric = L.Fabric(loggers=[wandb_logger], callbacks=[checkpoint_callback]) ``` The *latest* and *best* aliases are automatically set to easily retrieve a model checkpoint from a W\&B [Artifact](/models/artifacts/): ```python theme={null} # reference can be retrieved in artifacts panel # "VERSION" can be a version (ex: "v2") or an alias ("latest" or "best") checkpoint_reference = "USER/PROJECT/MODEL-RUN_ID:VERSION" ``` ```python theme={null} # download checkpoint locally (if not already cached) wandb_logger.download_artifact(checkpoint_reference, artifact_type="model") ``` ```python theme={null} # download checkpoint locally (if not already cached) run = wandb.init(project="MNIST") artifact = run.use_artifact(checkpoint_reference, type="model") artifact_dir = artifact.download() ``` ```python theme={null} # load checkpoint model = LitModule.load_from_checkpoint(Path(artifact_dir) / "model.ckpt") ``` ```python theme={null} # Request the raw checkpoint full_checkpoint = fabric.load(Path(artifact_dir) / "model.ckpt") model.load_state_dict(full_checkpoint["model"]) 
optimizer.load_state_dict(full_checkpoint["optimizer"])
```

The model checkpoints you log are viewable through the [W\&B Artifacts](/models/artifacts/) UI, and include the full model lineage (see an example model checkpoint in the UI [here](https://wandb.ai/wandb/arttest/artifacts/model/iv3_trained/5334ab69740f9dda4fed/lineage)).

To bookmark your best model checkpoints and centralize them across your team, you can link them to the [W\&B Model Registry](/models).

Here you can organize your best models by task, manage model lifecycle, facilitate easy tracking and auditing throughout the ML lifecycle, and [automate](/models/automations/) downstream actions with webhooks or jobs.

## Log images, text, and more

The `WandbLogger` has `log_image`, `log_text` and `log_table` methods for logging media.

You can also directly call `wandb.log()` or `trainer.logger.experiment.log()` to log other media types such as Audio, Molecules, Point Clouds, 3D Objects and more.
```python theme={null}
# using tensors, numpy arrays or PIL images
wandb_logger.log_image(key="samples", images=[img1, img2])

# adding captions
wandb_logger.log_image(key="samples", images=[img1, img2], caption=["tree", "person"])

# using file path
wandb_logger.log_image(key="samples", images=["img_1.jpg", "img_2.jpg"])

# using .log in the trainer
trainer.logger.experiment.log(
    {"samples": [wandb.Image(img, caption=caption) for (img, caption) in my_images]},
    step=current_trainer_global_step,
)
```

```python theme={null}
# data should be a list of lists
columns = ["input", "label", "prediction"]
my_data = [["cheese", "english", "english"], ["fromage", "french", "spanish"]]

# using columns and data
wandb_logger.log_text(key="my_samples", columns=columns, data=my_data)

# using a pandas DataFrame
wandb_logger.log_text(key="my_samples", dataframe=my_dataframe)
```

```python theme={null}
# log a W&B Table that has a text caption, an image and audio
columns = ["caption", "image", "sound"]

# data should be a list of lists
my_data = [
    ["cheese", wandb.Image(img_1), wandb.Audio(snd_1)],
    ["wine", wandb.Image(img_2), wandb.Audio(snd_2)],
]

# log the Table
wandb_logger.log_table(key="my_samples", columns=columns, data=my_data)
```

You can use Lightning's Callbacks system to control when you log to W\&B via the `WandbLogger`. This example logs a sample of the validation images and predictions:

```python theme={null}
import torch
import wandb
import lightning.pytorch as pl
from lightning.pytorch.callbacks import Callback
from lightning.pytorch.loggers import WandbLogger

# or
# from wandb.integration.lightning.fabric import WandbLogger


class LogPredictionSamplesCallback(Callback):
    def on_validation_batch_end(
        self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx
    ):
        """Called when the validation batch ends."""

        # `outputs` comes from `LightningModule.validation_step`
        # which corresponds to our model predictions in this case

        # Let's log 20 sample image predictions from the first batch
        if batch_idx == 0:
            n = 20
            x, y = batch
            images = [img for img in x[:n]]
            captions = [
                f"Ground Truth: {y_i} - Prediction: {y_pred}"
                for y_i, y_pred in zip(y[:n], outputs[:n])
            ]

            # Option 1: log images with `WandbLogger.log_image`
            wandb_logger.log_image(key="sample_images", images=images, caption=captions)

            # Option 2: log images and predictions as a W&B Table
            columns = ["image", "ground truth", "prediction"]
            data = [
                [wandb.Image(x_i), y_i, y_pred]
                for x_i, y_i, y_pred in list(zip(x[:n], y[:n], outputs[:n]))
            ]
            wandb_logger.log_table(key="sample_table", columns=columns, data=data)


trainer = pl.Trainer(callbacks=[LogPredictionSamplesCallback()])
```

## Use multiple GPUs with Lightning and W\&B

PyTorch Lightning has Multi-GPU support through its DDP Interface. However, PyTorch Lightning's design requires you to be careful about how you instantiate your GPUs.

Lightning assumes that each GPU (or Rank) in your training loop must be instantiated in exactly the same way, with the same initial conditions. However, only the rank 0 process gets access to the `wandb.run` object, and for non-zero rank processes: `wandb.run = None`. This could cause your non-zero processes to fail. Such a situation can put you in a **deadlock** because the rank 0 process will wait for the non-zero rank processes to join, which have already crashed.

For this reason, be careful about how you set up your training code. The recommended way to set it up is to make your code independent of the `wandb.run` object.
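One way to keep setup code independent of `wandb.run` is to branch on the process rank and to tolerate a missing run object. The following is a minimal sketch, not part of the W\&B or Lightning APIs: the helper names `is_main_process` and `run_dir_or_default` are hypothetical, and it assumes the launcher exposes the rank through the `LOCAL_RANK` environment variable.

```python
import os


def is_main_process(env=None):
    """Return True for local rank 0, or when LOCAL_RANK is not set at all."""
    env = os.environ if env is None else env
    return int(env.get("LOCAL_RANK", "0")) == 0


def run_dir_or_default(run, default):
    """Return run.dir when a run object exists, else a fallback directory.

    `run` may be None on non-zero ranks, so it is never dereferenced blindly.
    """
    return default if run is None else run.dir


# Every rank can call these safely, whatever the value of wandb.run.
checkpoint_dir = run_dir_or_default(None, "checkpoints")
```

The full training script that follows avoids the problem altogether: it never reads `wandb.run` and leaves all logging to the `WandbLogger`.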
```python theme={null}
import lightning.pytorch as pl
import torch
from lightning.pytorch.callbacks import ModelCheckpoint
from lightning.pytorch.loggers import WandbLogger
from torch import nn
from torch.utils.data import DataLoader


class MNISTClassifier(pl.LightningModule):
    def __init__(self):
        super(MNISTClassifier, self).__init__()

        self.model = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )

        self.loss = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        loss = self.loss(y_hat, y)

        self.log("train/loss", loss)
        # Lightning expects the loss tensor (or a dict with a "loss" key)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.forward(x)
        loss = self.loss(y_hat, y)

        self.log("val/loss", loss)
        return {"val_loss": loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.001)


def main():
    # Setting all the random seeds to the same value.
    # This is important in a distributed training setting.
    # Each rank will get its own set of initial weights.
    # If they don't match up, the gradients will not match either,
    # leading to training that may not converge.
    pl.seed_everything(1)

    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)
    val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False, num_workers=4)

    model = MNISTClassifier()
    wandb_logger = WandbLogger(project="")
    callbacks = [
        ModelCheckpoint(
            dirpath="checkpoints",
            every_n_train_steps=100,
        ),
    ]
    # `accelerator` and `devices` replace the `gpus` argument removed in Lightning 2.0
    trainer = pl.Trainer(
        max_epochs=3,
        accelerator="gpu",
        devices=2,
        logger=wandb_logger,
        strategy="ddp",
        callbacks=callbacks,
    )
    trainer.fit(model, train_loader, val_loader)
```

## Examples

You can follow along in a [video tutorial with a Colab notebook](https://wandb.me/lit-colab).

## Frequently asked questions

### How does W\&B integrate with Lightning?

The core integration is based on the [Lightning `loggers` API](https://lightning.ai/docs/pytorch/stable/extensions/logging.html), which lets you write much of your logging code in a framework-agnostic way.
`Logger`s are passed to the [Lightning `Trainer`](https://lightning.ai/docs/pytorch/stable/common/trainer.html) and are triggered based on that API's rich [hook-and-callback system](https://lightning.ai/docs/pytorch/stable/extensions/callbacks.html). This keeps your research code well-separated from engineering and logging code.

### What does the integration log without any additional code?

We'll save your model checkpoints to W\&B, where you can view them or download them for use in future runs. We'll also capture [system metrics](/models/ref/python/experiments/system-metrics), like GPU usage and network I/O, environment information, like hardware and OS information, [code state](/models/app/features/panels/code/) (including git commit and diff patch, notebook contents and session history), and anything printed to standard output.

### What if I need to use `wandb.run` in my training setup?

You need to expand the scope of the variable you need to access yourself. In other words, make sure that the initial conditions are the same on all processes.

```python theme={null}
if os.environ.get("LOCAL_RANK", None) is None:
    os.environ["WANDB_DIR"] = wandb.run.dir
```

If they are, you can use `os.environ["WANDB_DIR"]` to set up the model checkpoints directory. This way, any non-zero rank process can access `wandb.run.dir`.

# OpenAI Gym
Source: https://docs.wandb.ai/models/integrations/openai-gym

Integrate W&B with OpenAI Gym to track reinforcement learning experiments and record episode performance videos.

"The team that has been maintaining Gym since 2021 has moved all future development to [Gymnasium](https://github.com/Farama-Foundation/Gymnasium), a drop in replacement for Gym (import gymnasium as gym), and Gym will not be receiving any future updates."
([Source](https://github.com/openai/gym#the-team-that-has-been-maintaining-gym-since-2021-has-moved-all-future-development-to-gymnasium-a-drop-in-replacement-for-gym-import-gymnasium-as-gym-and-gym-will-not-be-receiving-any-future-updates-please-switch-over-to-gymnasium-as-soon-as-youre-able-to-do-so-if-youd-like-to-read-more-about-the-story-behind-this-switch-please-check-out-this-blog-post))

Since Gym is no longer an actively maintained project, try out our integration with Gymnasium.

If you're using [OpenAI Gym](https://github.com/openai/gym), W\&B automatically logs videos of your environment generated by `gym.wrappers.Monitor`. Set the `monitor_gym` keyword argument of [`wandb.init()`](/models/ref/python/functions/init) to `True`, or call `wandb.gym.monitor()`.

Our gym integration is very light. We [look at the name of the video file](https://github.com/wandb/wandb/blob/master/wandb/integration/gym/__init__.py#L15) being logged from `gym` and name the logged video after it, or fall back to `"videos"` if we don't find a match. If you want more control, you can always manually [log a video](/models/track/log/media/).

The [OpenRL Benchmark](https://wandb.me/openrl-benchmark-report) by [CleanRL](https://github.com/vwxyzjn/cleanrl) uses this integration for its OpenAI Gym examples. You can find source code (including [the specific code used for specific runs](https://wandb.ai/cleanrl/cleanrl.benchmark/runs/2jrqfugg/code?workspace=user-costa-huang)) that demonstrates how to use gym with W\&B.

# PyTorch
Source: https://docs.wandb.ai/models/integrations/pytorch

Integrate W&B with PyTorch for experiment tracking, dataset versioning, and logging of metrics, gradients, and models.

Use [W\&B](https://wandb.ai) for machine learning experiment tracking, dataset versioning, and project collaboration.

## What this notebook covers

We show you how to integrate W\&B with your PyTorch code to add experiment tracking to your pipeline.
```python theme={null}
# import the library
import wandb

# capture a dictionary of hyperparameters with config
config = {"learning_rate": 0.001, "epochs": 100, "batch_size": 128}

# start a new experiment
with wandb.init(project="new-sota-model", config=config) as run:
    # set up model and data
    model, dataloader = get_model(), get_data()

    # optional: track gradients
    run.watch(model)

    for batch in dataloader:
        metrics = model.training_step()
        # log metrics inside your training loop to visualize model performance
        run.log(metrics)

    # optional: save model at the end
    model.to_onnx()
    run.save("model.onnx")
```

Follow along with a [video tutorial](https://wandb.me/pytorch-video).

**Note**: Sections starting with *Step* are all you need to integrate W\&B in an existing pipeline. The rest just loads data and defines a model.

## Install, import, and log in

```python theme={null}
import os
import random

import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from tqdm.auto import tqdm

# Ensure deterministic behavior.
# Fixed integer seeds are used because, by default, hash() of a string
# changes between Python processes.
SEED = 42
torch.backends.cudnn.deterministic = True
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed_all(SEED)

# Device configuration
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# remove slow mirror from list of MNIST mirrors
torchvision.datasets.MNIST.mirrors = [mirror for mirror in torchvision.datasets.MNIST.mirrors
                                      if not mirror.startswith("http://yann.lecun.com")]
```

### Step 0: Install W\&B

To get started, we'll need to get the library. `wandb` is easily installed using `pip`.

```python theme={null}
!pip install wandb onnx -Uq
```

### Step 1: Import W\&B and log in

In order to log data to our web service, you'll need to log in.
If this is your first time using W\&B, you'll need to sign up for a free account at the link that appears.

```python theme={null}
import wandb

wandb.login()
```

## Define the experiment and pipeline

### Track metadata and hyperparameters with `wandb.init()`

Programmatically, the first thing we do is define our experiment: what are the hyperparameters? What metadata is associated with this run?

It's a pretty common workflow to store this information in a `config` dictionary (or similar object) and then access it as needed.

For this example, we're only letting a few hyperparameters vary and hand-coding the rest. But any part of your model can be part of the `config`.

We also include some metadata: we're using the MNIST dataset and a convolutional architecture. If we later work with, say, fully connected architectures on CIFAR in the same project, this will help us separate our runs.

```python theme={null}
config = dict(
    epochs=5,
    classes=10,
    kernels=[16, 32],
    batch_size=128,
    learning_rate=0.005,
    dataset="MNIST",
    architecture="CNN")
```

Now, let's define the overall pipeline, which is pretty typical for model-training:

1. we first `make` a model, plus associated data and optimizer, then
2. we `train` the model accordingly and finally
3. `test` it to see how training went.

We'll implement these functions below.

```python theme={null}
def model_pipeline(hyperparameters):

    # tell wandb to get started
    with wandb.init(project="pytorch-demo", config=hyperparameters) as run:
        # access all HPs through run.config, so logging matches execution.
        config = run.config

        # make the model, data, and optimization problem
        model, train_loader, test_loader, criterion, optimizer = make(config)
        print(model)

        # and use them to train the model
        train(model, train_loader, criterion, optimizer, config)

        # and test its final performance
        test(model, test_loader)

    return model
```

The only difference here from a standard pipeline is that it all occurs inside the context of `wandb.init()`.
Calling this function sets up a line of communication between your code and our servers.

Passing the `config` dictionary to `wandb.init()` immediately logs all that information to us, so you'll always know what hyperparameter values you set your experiment to use.

To ensure the values you chose and logged are always the ones that get used in your model, we recommend using the `run.config` copy of your object. Check the definition of `make` below to see some examples.

> *Side Note*: We take care to run our code in separate processes,
> so that any issues on our end
> (such as if a giant sea monster attacks our data centers)
> don't crash your code.
> Once the issue is resolved, such as when the Kraken returns to the deep,
> you can log the data with `wandb sync`.

```python theme={null}
def make(config):
    # Make the data
    train, test = get_data(train=True), get_data(train=False)
    train_loader = make_loader(train, batch_size=config.batch_size)
    test_loader = make_loader(test, batch_size=config.batch_size)

    # Make the model
    model = ConvNet(config.kernels, config.classes).to(device)

    # Make the loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(
        model.parameters(), lr=config.learning_rate)

    return model, train_loader, test_loader, criterion, optimizer
```

### Define the data loading and model

Now, we need to specify how the data is loaded and what the model looks like.

This part is very important, but it's no different from what it would be without `wandb`, so we won't dwell on it.
```python theme={null}
def get_data(slice=5, train=True):
    full_dataset = torchvision.datasets.MNIST(root=".",
                                              train=train,
                                              transform=transforms.ToTensor(),
                                              download=True)
    # equiv to slicing with [::slice]
    sub_dataset = torch.utils.data.Subset(
        full_dataset, indices=range(0, len(full_dataset), slice))

    return sub_dataset


def make_loader(dataset, batch_size):
    loader = torch.utils.data.DataLoader(dataset=dataset,
                                         batch_size=batch_size,
                                         shuffle=True,
                                         pin_memory=True, num_workers=2)
    return loader
```

Defining the model is normally the fun part.

But nothing changes with `wandb`, so we're gonna stick with a standard ConvNet architecture.

Don't be afraid to mess around with this and try some experiments. All your results will be logged on [wandb.ai](https://wandb.ai).

```python theme={null}
# Conventional and convolutional neural network

class ConvNet(nn.Module):
    def __init__(self, kernels, classes=10):
        super(ConvNet, self).__init__()

        self.layer1 = nn.Sequential(
            nn.Conv2d(1, kernels[0], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        # The input channels of layer2 must match the output channels of layer1
        self.layer2 = nn.Sequential(
            nn.Conv2d(kernels[0], kernels[1], kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.fc = nn.Linear(7 * 7 * kernels[-1], classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        return out
```

### Define training logic

Moving on in our `model_pipeline`, it's time to specify how we `train`.

Two `wandb` functions come into play here: `watch` and `log`.

## Track gradients with `run.watch()` and everything else with `run.log()`

`run.watch()` logs the gradients and the parameters of your model every `log_freq` steps of training.

All you need to do is call it before you start training.

The rest of the training code remains the same: we iterate over epochs and batches, running forward and backward passes and applying our `optimizer`.
```python theme={null}
def train(model, loader, criterion, optimizer, config):
    # Reuse the run that `model_pipeline` already started
    run = wandb.run

    # Tell wandb to watch what the model gets up to: gradients, weights, and more.
    run.watch(model, criterion, log="all", log_freq=10)

    # Run training and track with wandb
    total_batches = len(loader) * config.epochs
    example_ct = 0  # number of examples seen
    batch_ct = 0
    for epoch in tqdm(range(config.epochs)):
        for _, (images, labels) in enumerate(loader):

            loss = train_batch(images, labels, model, optimizer, criterion)
            example_ct += len(images)
            batch_ct += 1

            # Report metrics every 25th batch
            if ((batch_ct + 1) % 25) == 0:
                train_log(loss, example_ct, epoch)


def train_batch(images, labels, model, optimizer, criterion):
    images, labels = images.to(device), labels.to(device)

    # Forward pass ➡
    outputs = model(images)
    loss = criterion(outputs, labels)

    # Backward pass ⬅
    optimizer.zero_grad()
    loss.backward()

    # Step with optimizer
    optimizer.step()

    return loss
```

The only difference is in the logging code: where previously you might have reported metrics by printing to the terminal, now you pass the same information to `run.log()`.

`run.log()` expects a dictionary with strings as keys. These strings identify the objects being logged, which make up the values. You can also optionally log which `step` of training you're on.

> *Side Note*: I like to use the number of examples the model has seen,
> since this makes for easier comparison across batch sizes,
> but you can use raw steps or batch count. For longer training runs, it can also make sense to log by `epoch`.
```python theme={null}
def train_log(loss, example_ct, epoch):
    # Log to the run that `model_pipeline` already started
    run = wandb.run

    # Log the loss and epoch number
    # This is where we log the metrics to W&B
    run.log({"epoch": epoch, "loss": loss}, step=example_ct)
    print(f"Loss after {str(example_ct).zfill(5)} examples: {loss:.3f}")
```

### Define testing logic

Once the model is done training, we want to test it: run it against some fresh data from production, perhaps, or apply it to some hand-curated examples.

## (Optional) Call `run.save()`

This is also a great time to save the model's architecture and final parameters to disk. For maximum compatibility, we'll `export` our model in the [Open Neural Network eXchange (ONNX) format](https://onnx.ai/).

Passing that filename to `run.save()` ensures that the model parameters are saved to W\&B's servers: no more losing track of which `.h5` or `.pb` corresponds to which training runs.

For more advanced `wandb` features for storing, versioning, and distributing models, check out our [Artifacts tools](https://www.wandb.com/artifacts).

```python theme={null}
def test(model, test_loader):
    model.eval()

    # Log to the run that `model_pipeline` already started
    run = wandb.run

    # Run the model on some test examples
    with torch.no_grad():
        correct, total = 0, 0
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        print(f"Accuracy of the model on the {total} " +
              f"test images: {correct / total:%}")

        run.log({"test_accuracy": correct / total})

    # Save the model in the exchangeable ONNX format
    torch.onnx.export(model, images, "model.onnx")
    run.save("model.onnx")
```

### Run training and watch your metrics live on wandb.ai

Now that we've defined the whole pipeline and slipped in those few lines of W\&B code, we're ready to run our fully tracked experiment.
We'll report a few links to you: our documentation, the Project page, which organizes all the runs in a project, and the Run page, where this run's results will be stored.

Navigate to the Run page and check out these tabs:

1. **Charts**, where the model gradients, parameter values, and loss are logged throughout training
2. **System**, which contains a variety of system metrics, including Disk I/O utilization, CPU and GPU metrics (watch that temperature soar), and more
3. **Logs**, which has a copy of anything pushed to standard out during training
4. **Files**, where, once training is complete, you can click on the `model.onnx` to view our network with the [Netron model viewer](https://github.com/lutzroeder/netron).

Once the run is finished, when the `with wandb.init()` block exits, we'll also print a summary of the results in the cell output.

```python theme={null}
# Build, train and analyze the model with the pipeline
model = model_pipeline(config)
```

### Test Hyperparameters with Sweeps

We only looked at a single set of hyperparameters in this example. But an important part of most ML workflows is iterating over a number of hyperparameters.

You can use W\&B Sweeps to automate hyperparameter testing and explore the space of possible models and optimization strategies.

Check out a [Colab notebook demonstrating hyperparameter optimization using W\&B Sweeps](https://wandb.me/sweeps-colab).

Running a hyperparameter sweep with W\&B takes three steps:

1. **Define the sweep:** We do this by creating a dictionary or a [YAML file](/models/sweeps/define-sweep-configuration/) that specifies the parameters to search through, the search strategy, the optimization metric, and so on.
2. **Initialize the sweep:** `sweep_id = wandb.sweep(sweep_config)`
3. **Run the sweep agent:** `wandb.agent(sweep_id, function=train)`

That's all there is to running a hyperparameter sweep.
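The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tutorial's own sweep: the search space values, the `run_sweep` helper, and the `count` of 5 trials are chosen for the example, while `wandb.sweep()` and `wandb.agent()` are the calls named in the steps.

```python
# Step 1: define the sweep as a dictionary.
# The parameter names mirror the `config` keys used earlier; the values are assumptions.
sweep_config = {
    "method": "random",  # search strategy
    "metric": {"name": "loss", "goal": "minimize"},  # metric to optimize
    "parameters": {
        "learning_rate": {"values": [0.001, 0.005, 0.01]},
        "batch_size": {"values": [64, 128]},
        "epochs": {"value": 5},  # held constant across trials
    },
}


def run_sweep(train_fn, project="pytorch-demo", count=5):
    """Create the sweep and run `count` trials of `train_fn` (hypothetical helper)."""
    import wandb  # imported here so the config can be inspected without wandb installed

    # Step 2: initialize the sweep
    sweep_id = wandb.sweep(sweep_config, project=project)
    # Step 3: run the sweep agent
    wandb.agent(sweep_id, function=train_fn, project=project, count=count)
    return sweep_id
```

The agent calls the function it is given with no arguments, so that function would need to start its own run with `wandb.init()` and read the sampled values from `run.config`.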
## Example gallery

Explore examples of projects tracked and visualized with W\&B in our [Gallery →](https://app.wandb.ai/gallery).

## Advanced setup

1. [Environment variables](/platform/hosting/env-vars/): Set API keys in environment variables so you can run training on a managed cluster.
2. [Offline mode](/support/models/articles/how-do-i-run-wandb-offline): Use `offline` mode to train offline and sync results later.
3. [On-prem](/platform/hosting/hosting-options/self-managed): Install W\&B in a private cloud or air-gapped servers in your own infrastructure. We have local installations for everyone from academics to enterprise teams.
4. [Sweeps](/models/sweeps/): Set up hyperparameter search quickly with our lightweight tool for tuning.

# PyTorch Geometric
Source: https://docs.wandb.ai/models/integrations/pytorch-geometric

Integrate W&B with PyTorch Geometric for graph visualization and experiment tracking in geometric deep learning.

[PyTorch Geometric](https://github.com/pyg-team/pytorch_geometric) or PyG is one of the most popular libraries for geometric deep learning, and W\&B works well with it for visualizing graphs and tracking experiments.

After you have installed PyTorch Geometric, follow these steps to get started.

## Sign up and create an API key

An API key authenticates your machine to W\&B. You can generate an API key from your user profile.

For a more streamlined approach, create an API key by going directly to [User Settings](https://wandb.ai/settings). Copy the newly created API key immediately and save it in a secure location such as a password manager.

1. Click your user profile icon in the upper right corner.
2. Select **User Settings**, then scroll to the **API Keys** section.

## Install the `wandb` library and log in

To install the `wandb` library locally and log in:

1. Set the `WANDB_API_KEY` [environment variable](/models/track/environment-variables/) to your API key.
   ```bash theme={null}
   export WANDB_API_KEY=
   ```

2. Install the `wandb` library and log in.

   ```shell theme={null}
   pip install wandb

   wandb login
   ```

```bash theme={null}
pip install wandb
```

```python theme={null}
import wandb

wandb.login()
```

```notebook theme={null}
!pip install wandb

import wandb
wandb.login()
```

## Visualize the graphs

You can save details about the input graphs, including number of edges, number of nodes and more. W\&B supports logging plotly charts and HTML panels, so any visualizations you create for your graph can also be logged to W\&B.

### Use PyVis

The following snippet shows how you could do that with PyVis and HTML.

```python theme={null}
from pyvis.network import Network
from tqdm import tqdm
import wandb

with wandb.init(project="graph_vis") as run:
    net = Network(height="750px", width="100%", bgcolor="#222222", font_color="white")

    # Add the edges from the PyG graph `g` to the PyVis network
    for e in tqdm(g.edge_index.T):
        src = e[0].item()
        dst = e[1].item()

        net.add_node(dst)
        net.add_node(src)

        net.add_edge(src, dst, value=0.1)

    # Save the PyVis visualisation to a HTML file
    net.show("graph.html")
    run.log({"eda/graph": wandb.Html("graph.html")})
```

### Use Plotly

To use plotly to create a graph visualization, first convert the PyG graph to a networkx object. Then create Plotly scatter plots for both nodes and edges. The snippet below can be used for this task.
```python theme={null}
import networkx as nx
import plotly.graph_objects as go
import wandb
from torch_geometric.utils import to_networkx


def create_vis(graph):
    G = to_networkx(graph)
    pos = nx.spring_layout(G)

    # Each edge contributes two endpoints plus a None separator
    edge_x = []
    edge_y = []
    for edge in G.edges():
        x0, y0 = pos[edge[0]]
        x1, y1 = pos[edge[1]]
        edge_x.append(x0)
        edge_x.append(x1)
        edge_x.append(None)
        edge_y.append(y0)
        edge_y.append(y1)
        edge_y.append(None)

    edge_trace = go.Scatter(
        x=edge_x, y=edge_y,
        line=dict(width=0.5, color='#888'),
        hoverinfo='none',
        mode='lines'
    )

    node_x = []
    node_y = []
    for node in G.nodes():
        x, y = pos[node]
        node_x.append(x)
        node_y.append(y)

    node_trace = go.Scatter(
        x=node_x, y=node_y,
        mode='markers',
        hoverinfo='text',
        line_width=2
    )

    fig = go.Figure(data=[edge_trace, node_trace], layout=go.Layout())

    return fig


with wandb.init(project='visualize_graph') as run:
    run.log({'graph': wandb.Plotly(create_vis(graph))})
```

A visualization created using the example function and logged inside a W&B Table.

## Log metrics

You can use W\&B to track your experiments and related metrics, such as loss functions, accuracy, and more. Add the following lines to your training loop:

```python theme={null}
with wandb.init(project="my_project", entity="my_entity") as run:
    run.log({
        'train/loss': training_loss,
        'train/acc': training_acc,
        'val/loss': validation_loss,
        'val/acc': validation_acc
    })
```

## More resources

* [Recommending Amazon Products using Graph Neural Networks in PyTorch Geometric](https://wandb.ai/manan-goel/gnn-recommender/reports/Recommending-Amazon-Products-using-Graph-Neural-Networks-in-PyTorch-Geometric--VmlldzozMTA3MzYw#what-does-the-data-look-like?)
* [Point Cloud Classification using PyTorch Geometric](https://wandb.ai/geekyrakshit/pyg-point-cloud/reports/Point-Cloud-Classification-using-PyTorch-Geometric--VmlldzozMTExMTE3)
* [Point Cloud Segmentation using PyTorch Geometric](https://wandb.ai/wandb/point-cloud-segmentation/reports/Point-Cloud-Segmentation-using-Dynamic-Graph-CNN--VmlldzozMTk5MDcy)

# Hugging Face Simple Transformers
Source: https://docs.wandb.ai/models/integrations/simpletransformers

How to integrate W&B with the Transformers library by Hugging Face.

This library is based on the Transformers library by Hugging Face. Simple Transformers lets you quickly train and evaluate Transformer models. Only 3 lines of code are needed to initialize a model, train the model, and evaluate a model. It supports Sequence Classification, Token Classification (NER), Question Answering, Language Model Fine-Tuning, Language Model Training, Language Generation, T5 Model, Seq2Seq Tasks, Multi-Modal Classification and Conversational AI.

To use W\&B for visualizing model training, set a project name for W\&B in the `wandb_project` attribute of the `args` dictionary. This logs all hyperparameter values, training losses, and evaluation metrics to the given project.

```python theme={null}
model = ClassificationModel('roberta', 'roberta-base', args={'wandb_project': 'project-name'})
```

Any additional arguments that go into `wandb.init()` can be passed as `wandb_kwargs`.

## Structure

The library is designed to have a separate class for every NLP task. The classes that provide similar functionality are grouped together.

* `simpletransformers.classification` - Includes all Classification models.
  * `ClassificationModel`
  * `MultiLabelClassificationModel`
* `simpletransformers.ner` - Includes all Named Entity Recognition models.
  * `NERModel`
* `simpletransformers.question_answering` - Includes all Question Answering models.
  * `QuestionAnsweringModel`

Here are some minimal examples.

## MultiLabel Classification

```python theme={null}
model = MultiLabelClassificationModel(
    "distilbert",
    "distilbert-base-uncased",
    num_labels=6,
    args={
        "reprocess_input_data": True,
        "overwrite_output_dir": True,
        "num_train_epochs": epochs,
        "learning_rate": learning_rate,
        "wandb_project": "simpletransformers",
    },
)
# Train the model
model.train_model(train_df)

# Evaluate the model
result, model_outputs, wrong_predictions = model.eval_model(eval_df)
```

## Question answering

```python theme={null}
train_args = {
    'learning_rate': wandb.config.learning_rate,
    'num_train_epochs': 2,
    'max_seq_length': 128,
    'doc_stride': 64,
    'overwrite_output_dir': True,
    'reprocess_input_data': False,
    'train_batch_size': 2,
    'fp16': False,
    'wandb_project': "simpletransformers"
}

model = QuestionAnsweringModel('distilbert', 'distilbert-base-cased', args=train_args)
model.train_model(train_data)
```

SimpleTransformers provides classes as well as training scripts for all common natural language tasks. Here is the complete list of global arguments that are supported by the library, with their default values.
```text theme={null}
global_args = {
  "adam_epsilon": 1e-8,
  "best_model_dir": "outputs/best_model",
  "cache_dir": "cache_dir/",
  "config": {},
  "do_lower_case": False,
  "early_stopping_consider_epochs": False,
  "early_stopping_delta": 0,
  "early_stopping_metric": "eval_loss",
  "early_stopping_metric_minimize": True,
  "early_stopping_patience": 3,
  "encoding": None,
  "eval_batch_size": 8,
  "evaluate_during_training": False,
  "evaluate_during_training_silent": True,
  "evaluate_during_training_steps": 2000,
  "evaluate_during_training_verbose": False,
  "fp16": True,
  "fp16_opt_level": "O1",
  "gradient_accumulation_steps": 1,
  "learning_rate": 4e-5,
  "local_rank": -1,
  "logging_steps": 50,
  "manual_seed": None,
  "max_grad_norm": 1.0,
  "max_seq_length": 128,
  "multiprocessing_chunksize": 500,
  "n_gpu": 1,
  "no_cache": False,
  "no_save": False,
  "num_train_epochs": 1,
  "output_dir": "outputs/",
  "overwrite_output_dir": False,
  "process_count": cpu_count() - 2 if cpu_count() > 2 else 1,
  "reprocess_input_data": True,
  "save_best_model": True,
  "save_eval_checkpoints": True,
  "save_model_every_epoch": True,
  "save_steps": 2000,
  "save_optimizer_and_scheduler": True,
  "silent": False,
  "tensorboard_dir": None,
  "train_batch_size": 8,
  "use_cached_eval_features": False,
  "use_early_stopping": False,
  "use_multiprocessing": True,
  "wandb_kwargs": {},
  "wandb_project": None,
  "warmup_ratio": 0.06,
  "warmup_steps": 0,
  "weight_decay": 0,
}
```

Refer to [simpletransformers on github](https://github.com/ThilinaRajapakse/simpletransformers) for more detailed documentation.

Check out [this W\&B report](https://app.wandb.ai/cayush/simpletransformers/reports/Using-simpleTransformer-on-common-NLP-applications---Vmlldzo4Njk2NA) that covers training transformers on some of the most popular GLUE benchmark datasets. [Try it out yourself on colab](https://colab.research.google.com/drive/1oXROllqMqVvBFcPgTKJRboTq96uWuqSz?usp=sharing).
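As noted earlier in this section, extra `wandb.init()` arguments travel through `wandb_kwargs`. The sketch below shows one way to assemble such an `args` dictionary. The `build_args` and `make_model` helpers, the run name, and the tags are hypothetical and only illustrate the shape of the dictionary; `wandb_project` and `wandb_kwargs` are the keys listed in `global_args`, and `name` and `tags` are `wandb.init()` arguments.

```python
def build_args(project, run_name, tags):
    """Assemble a Simple Transformers `args` dict with W&B settings (hypothetical helper)."""
    return {
        "num_train_epochs": 1,
        "overwrite_output_dir": True,
        # Project that receives the hyperparameters, losses, and metrics
        "wandb_project": project,
        # Passed on to wandb.init() by the library
        "wandb_kwargs": {"name": run_name, "tags": list(tags)},
    }


args = build_args("simpletransformers", "roberta-baseline", ["roberta", "demo"])


def make_model(model_args):
    """Create the classifier; the import is deferred so `args` can be built without the library."""
    from simpletransformers.classification import ClassificationModel

    return ClassificationModel("roberta", "roberta-base", args=model_args)
```

Calling `make_model(args)` and then `train_model(...)` on the result would start a run in the `simpletransformers` project with the chosen name and tags.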
# TensorFlow
Source: https://docs.wandb.ai/models/integrations/tensorflow

Integrate W&B with TensorFlow for logging custom metrics, using estimator hooks, and TensorBoard log synchronization.

## Get started

If you're already using TensorBoard, it's easy to integrate with wandb.

```python theme={null}
import tensorflow as tf
import wandb
```

## Log custom metrics

If you need to log additional custom metrics that aren't being logged to TensorBoard, you can call `run.log()` in your code, for example `run.log({"custom": 0.8})`.

Setting the step argument in `run.log()` is turned off when syncing TensorBoard. If you'd like to set a different step count, you can log the metrics with a step metric as:

```python theme={null}
with wandb.init(config=tf.flags.FLAGS, sync_tensorboard=True) as run:
    run.log({"custom": 0.8, "global_step": global_step}, step=global_step)
```

## TensorFlow estimators hook

If you want more control over what gets logged, wandb also provides a hook for TensorFlow estimators. It will log all `tf.summary` values in the graph.

```python theme={null}
import tensorflow as tf
import wandb

run = wandb.init(config=tf.FLAGS)

# `estimator` is your existing TensorFlow estimator
estimator.train(hooks=[wandb.tensorflow.WandbHook(steps_per_log=1000)])
run.finish()
```

## Log manually

The simplest way to log metrics in TensorFlow is by logging `tf.summary` with the TensorFlow logger:

```python theme={null}
import tensorflow as tf
import wandb

run = wandb.init(config=tf.flags.FLAGS, sync_tensorboard=True)
with tf.Session() as sess:
    # ...
    wandb.tensorflow.log(tf.summary.merge_all())
```

With TensorFlow 2, the recommended way to train a model with a custom loop is to use `tf.GradientTape`. You can read more in the [TensorFlow custom training walkthrough](https://www.tensorflow.org/tutorials/customization/custom_training_walkthrough).
If you want to incorporate `wandb` to log metrics in your custom TensorFlow training loops, you can follow this snippet:

```python theme={null}
with tf.GradientTape() as tape:
    # Get the probabilities
    predictions = model(features)
    # Calculate the loss
    loss = loss_func(labels, predictions)

# Log your metrics (`run.log()` expects a dictionary)
run.log({"loss": loss.numpy()})
# Get the gradients
gradients = tape.gradient(loss, model.trainable_variables)
# Update the weights
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```

A [full example of customizing training loops in TensorFlow 2](https://www.wandb.com/articles/wandb-customizing-training-loops-in-tensorflow-2) is available.

## How is W\&B different from TensorBoard?

When the cofounders started working on W\&B, they were inspired to build a tool for the frustrated TensorBoard users at OpenAI. Here are a few things we've focused on improving:

1. **Reproduce models**: W\&B is good for experimentation, exploration, and reproducing models later. We capture not just the metrics, but also the hyperparameters and version of the code, and we can save your version-control status and model checkpoints for you so your project is reproducible.
2. **Automatic organization**: Whether you're picking up a project from a collaborator, coming back from a vacation, or dusting off an old project, W\&B makes it easy to see all the models that have been tried so no one wastes hours, GPU cycles, or carbon re-running experiments.
3. **Fast, flexible integration**: Add W\&B to your project in 5 minutes. Install our free open-source Python package and add a couple of lines to your code, and every time you run your model you'll have nice logged metrics and records.
4. **Persistent, centralized dashboard**: No matter where you train your models, whether on your local machine, in a shared lab cluster, or on spot instances in the cloud, your results are shared to the same centralized dashboard.
   You don't need to spend your time copying and organizing TensorBoard files from different machines.
5. **Powerful tables**: Search, filter, sort, and group results from different models. It's easy to look over thousands of model versions and find the best performing models for different tasks. TensorBoard isn't built to work well on large projects.
6. **Tools for collaboration**: Use W\&B to organize complex machine learning projects. It's easy to share a link to W\&B, and you can use private teams to have everyone sending results to a shared project. We also support collaboration via reports: add interactive visualizations and describe your work in markdown. This is a great way to keep a work log, share findings with your supervisor, or present findings to your lab or team.

Get started with a [free account](https://wandb.ai).

## Examples

We've created a few examples for you to see how the integration works:

* [Example on GitHub](https://github.com/wandb/examples/blob/master/examples/tensorflow/tf-estimator-mnist/mnist.py): MNIST example using TensorFlow Estimators
* [Example on GitHub](https://github.com/wandb/examples/blob/master/examples/tensorflow/tf-cnn-fashion/train.py): Fashion MNIST example using raw TensorFlow
* [W\&B dashboard](https://app.wandb.ai/l2k2/examples-tf-estimator-mnist/runs/p0ifowcb): View results on W\&B
* Customizing Training Loops in TensorFlow 2 - [Article](https://www.wandb.com/articles/wandb-customizing-training-loops-in-tensorflow-2) | [Dashboard](https://app.wandb.ai/sayakpaul/custom_training_loops_tf)

# PyTorch torchtune
Source: https://docs.wandb.ai/models/integrations/torchtune

Use W&B logging in PyTorch torchtune for tracking LLM fine-tuning experiments with the WandBLogger metric logger.

[torchtune](https://meta-pytorch.org/torchtune/stable/index.html) is a PyTorch-based library designed to streamline the authoring, fine-tuning, and experimentation processes for large language models (LLMs).
Additionally, torchtune has built-in support for [logging with W\&B](https://meta-pytorch.org/torchtune/stable/deep_dives/wandb_logging.html), enhancing tracking and visualization of training processes.

TorchTune training dashboard

Check the W\&B blog post on [Fine-tuning Mistral 7B using torchtune](https://wandb.ai/capecape/torchtune-mistral/reports/torchtune-The-new-PyTorch-LLM-fine-tuning-library---Vmlldzo3NTUwNjM0).

## W\&B logging at your fingertips

Override command line arguments at launch:

```bash theme={null}
tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
  metric_logger._component_=torchtune.utils.metric_logging.WandBLogger \
  metric_logger.project="llama3_lora" \
  log_every_n_steps=5
```

Enable W\&B logging on the recipe's config:

```yaml theme={null}
# inside llama3/8B_lora_single_device.yaml
metric_logger:
  _component_: torchtune.utils.metric_logging.WandBLogger
  project: llama3_lora
log_every_n_steps: 5
```

## Use the W\&B metric logger

Enable W\&B logging in the recipe's config file by modifying the `metric_logger` section. Change `_component_` to the `torchtune.utils.metric_logging.WandBLogger` class. You can also pass a `project` name and `log_every_n_steps` to customize the logging behavior.

You can also pass any other `kwargs` as you would to the [wandb.init()](/models/ref/python/functions/init) method. For example, if you are working on a team, you can pass the `entity` argument to the `WandBLogger` class to specify the team name.
```yaml theme={null}
# inside llama3/8B_lora_single_device.yaml
metric_logger:
  _component_: torchtune.utils.metric_logging.WandBLogger
  project: llama3_lora
  entity: my_project
  job_type: lora_finetune_single_device
  group: my_awesome_experiments
log_every_n_steps: 5
```

```shell theme={null}
tune run lora_finetune_single_device --config llama3/8B_lora_single_device \
  metric_logger._component_=torchtune.utils.metric_logging.WandBLogger \
  metric_logger.project="llama3_lora" \
  metric_logger.entity="my_project" \
  metric_logger.job_type="lora_finetune_single_device" \
  metric_logger.group="my_awesome_experiments" \
  log_every_n_steps=5
```

## What is logged?

You can explore the W\&B dashboard to see the logged metrics. By default, W\&B logs all of the hyperparameters from the config file and the launch overrides.

W\&B captures the resolved config on the **Overview** tab. W\&B also stores the config in YAML format on the [Files tab](https://wandb.ai/capecape/torchtune/runs/joyknwwa/files).

TorchTune configuration

### Logged metrics

Each recipe has its own training loop. Check each individual recipe to see its logged metrics, which include these by default:

| Metric              | Description |
| ------------------- | ----------- |
| `loss`              | The loss of the model |
| `lr`                | The learning rate |
| `tokens_per_second` | The tokens per second of the model |
| `grad_norm`         | The gradient norm of the model |
| `global_step`       | The current step in the training loop. It takes gradient accumulation into account: gradients are accumulated and the model is updated once every `gradient_accumulation_steps` batches, and each optimizer step counts as one step. |

`global_step` is not the same as the number of training steps.
It corresponds to the current step in the training loop and takes gradient accumulation into account: every time an optimizer step is taken, `global_step` is incremented by 1. For example, if the dataloader has 10 batches, gradient accumulation steps is 2, and you run for 3 epochs, the optimizer steps 15 times (10 / 2 = 5 steps per epoch, times 3 epochs), so `global_step` ranges from 1 to 15.

The streamlined design of torchtune makes it easy to add custom metrics or modify the existing ones. It suffices to modify the corresponding [recipe file](https://github.com/meta-pytorch/torchtune/tree/main/recipes). For example, you could log `current_epoch` as a percentage of the total number of epochs as follows:

```python theme={null}
# inside the `train` function in the recipe file
self._metric_logger.log_dict(
    {"current_epoch": self.epochs * self.global_step / self._steps_per_epoch},
    step=self.global_step,
)
```

This is a fast-evolving library, and the current metrics are subject to change. If you want to add a custom metric, you should modify the recipe and call the corresponding `self._metric_logger.*` function.

## Save and load checkpoints

The torchtune library supports various [checkpoint formats](https://meta-pytorch.org/torchtune/stable/deep_dives/checkpointer.html). Depending on the origin of the model you are using, you should switch to the appropriate [checkpointer class](https://meta-pytorch.org/torchtune/stable/deep_dives/checkpointer.html).

If you want to save the model checkpoints to [W\&B Artifacts](/models/artifacts/), the simplest solution is to override the `save_checkpoint` function inside the corresponding recipe.

Here is an example of how you can override the `save_checkpoint` function to save the model checkpoints to W\&B Artifacts.

```python theme={null}
def save_checkpoint(self, epoch: int) -> None:
    ...
    ## Let's save the checkpoint to W&B
    ## depending on the Checkpointer Class the file will be named differently
    ## Here is an example for the full_finetune case
    checkpoint_file = Path.joinpath(
        self._checkpointer._output_dir, f"torchtune_model_{epoch}"
    ).with_suffix(".pt")
    wandb_artifact = wandb.Artifact(
        name=f"torchtune_model_{epoch}",
        type="model",
        # description of the model checkpoint
        description="Model checkpoint",
        # you can add whatever metadata you want as a dict
        metadata={
            utils.SEED_KEY: self.seed,
            utils.EPOCHS_KEY: self.epochs_run,
            utils.TOTAL_EPOCHS_KEY: self.total_epochs,
            utils.MAX_STEPS_KEY: self.max_steps_per_epoch,
        },
    )
    wandb_artifact.add_file(str(checkpoint_file))
    wandb.log_artifact(wandb_artifact)
```

# XGBoost
Source: https://docs.wandb.ai/models/integrations/xgboost

Integrate W&B with XGBoost to log gradient boosting metrics, feature importance, and model performance automatically.

The `wandb` library has a `WandbCallback` callback for logging metrics, configs, and saved boosters from training with XGBoost. Here you can see a [live W\&B Dashboard](https://wandb.ai/morg/credit_scorecard) with outputs from the XGBoost `WandbCallback`.

W&B Dashboard using XGBoost

## Get started

Logging XGBoost metrics, configs, and booster models to W\&B is as easy as passing the `WandbCallback` to XGBoost:

```python theme={null}
import wandb
from wandb.integration.xgboost import WandbCallback
from xgboost import XGBClassifier

...
# Start a wandb run
with wandb.init() as run:
    # Pass WandbCallback to the model
    # (some XGBoost releases may expect `callbacks` in the constructor instead of `fit()`)
    bst = XGBClassifier()
    bst.fit(X_train, y_train, callbacks=[WandbCallback(log_model=True)])
```

You can open [this notebook](https://wandb.me/xgboost) for a comprehensive look at logging with XGBoost and W\&B.

## `WandbCallback` reference

### Functionality

Passing `WandbCallback` to an XGBoost model will:

* Log the booster model configuration to W\&B.
* Log evaluation metrics collected by XGBoost, such as rmse and accuracy, to W\&B.
* Log training metrics collected by XGBoost (if you provide data to `eval_set`).
* Log the best score and the best iteration.
* Save and upload your trained model to W\&B Artifacts (when `log_model=True`).
* Log a feature importance plot when `log_feature_importance=True` (default).
* Capture the best eval metric in `wandb.Run.summary` when `define_metric=True` (default).

### Arguments

* `log_model`: (boolean) if True, save and upload the model to W\&B Artifacts.
* `log_feature_importance`: (boolean) if True, log a feature importance bar plot.
* `importance_type`: (str) one of `{weight, gain, cover, total_gain, total_cover}` for tree models, or `weight` for linear models.
* `define_metric`: (boolean) if True (default), capture model performance at the best step, instead of the last step, of training in your `run.summary`.

You can review the [source code for WandbCallback](https://github.com/wandb/wandb/blob/main/wandb/integration/xgboost/xgboost.py).

For additional examples, check out the [repository of examples on GitHub](https://github.com/wandb/examples/tree/master/examples/boosting-algorithms).

## Tune your hyperparameters with Sweeps

Getting the maximum performance out of models requires tuning hyperparameters, like tree depth and learning rate. W\&B [Sweeps](/models/sweeps/) is a powerful toolkit for configuring, orchestrating, and analyzing large hyperparameter testing experiments.
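
As a rough illustration of what a sweep over tree depth and learning rate might look like, here is a minimal sketch of a sweep configuration expressed as a Python dictionary. The parameter ranges, the `subsample` entry, the metric name `validation_0-auc`, and the helper `tuned_parameters` are assumptions made for this example; use the metric name that your own runs actually log.

```python
# Hypothetical sweep configuration for an XGBoost classifier.
# The ranges and the metric name are illustrative assumptions.
sweep_config = {
    "method": "bayes",  # search strategy; "grid" and "random" are other options
    "metric": {"name": "validation_0-auc", "goal": "maximize"},
    "parameters": {
        # Discrete choices for tree depth
        "max_depth": {"values": [3, 5, 7, 9]},
        # Learning rate sampled on a log scale between min and max
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 0.01,
            "max": 0.3,
        },
        # Fraction of rows sampled per tree
        "subsample": {"distribution": "uniform", "min": 0.5, "max": 1.0},
    },
}


def tuned_parameters(config):
    """Return the sorted names of the hyperparameters the sweep varies."""
    return sorted(config["parameters"])


print(tuned_parameters(sweep_config))
```

A dictionary like this is typically passed to `wandb.sweep()` to create the sweep, and `wandb.agent()` then runs a training function for each sampled configuration. Those calls are left out of the sketch because they require a W\&B login.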
You can also try this [XGBoost & Sweeps Python script](https://github.com/wandb/examples/blob/master/examples/wandb-sweeps/sweeps-xgboost/xgboost_tune.py). XGBoost performance comparison # LLM Evaluation Jobs Source: https://docs.wandb.ai/models/launch Evaluate model checkpoints or hosted API models within W&B and analyze the results using automatically generated leaderboards. [LLM Evaluation Jobs](/models/launch) is a benchmarking framework for evaluating an LLM model's performance, using infrastructure managed by CoreWeave. Choose from a comprehensive suite of modern, industry-standard [model evaluation benchmarks](/models/launch/evaluations), then view, analyze, and share the results using automatic leaderboards and charts in W\&B Models. LLM Evaluation Jobs removes the complexity of deploying and maintaining GPU infrastructure yourself. LLM Evaluation Jobs is in **Preview** for [W\&B Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud). Compute is free during the preview period. [Learn more](/models/launch#pricing) ## How it works Evaluate a model checkpoint or a publicly accessible hosted OpenAI-compatible model in just a few steps: 1. Set up an evaluation job in W\&B Models. Define its benchmarks and configuration, such as whether to generate a leaderboard. 2. Launch the evaluation job. 3. View and analyze the results and leaderboard. Each time you launch an evaluation job with the same destination project, the project's leaderboard updates automatically. Example evaluation job leaderboard ## Next steps * Browse the [Evaluation benchmark catalog](/models/launch/evaluations) * [Evaluate a model checkpoint](/models/launch/evaluate-model-checkpoint) * [Evaluate an API-hosted model](/models/launch/evaluate-hosted-model) ## More details ### Pricing LLM Evaluation Jobs evaluates a model checkpoint or hosted API against popular benchmarks on fully-managed CoreWeave compute, with no infrastructure to manage. 
You pay only for resources consumed, not idle time. Pricing has two components: compute and storage. Compute is free during public preview, and we will announce pricing at general availability. Stored results include metrics and per-example traces saved in Models runs. Storage is billed monthly based on data volume. During the preview period, LLM Evaluation Jobs is available for Multi-tenant Cloud only. See the [Pricing](https://wandb.ai/pricing) page for details. ### Job limits An individual evaluation job has these limits: * The maximum size for a model to evaluate is 86 GB, including context. * Each job is limited to two GPUs. ### Requirements * To evaluate a model checkpoint, the model weights must be packaged as a VLLM-compatible artifact. See [Example: Prepare a model](/models/launch/evaluate-model-checkpoint#example-prepare-a-model) for details and example code. * To evaluate an OpenAI-compatible model, it must be accessible at a public URL, and an organization or team admin must configure a team secret with the API key for authentication. * Certain benchmarks use OpenAI models for scoring. To run these benchmarks, an organization or team admin must configure team secrets with the required API keys. See the [Evaluation benchmark catalog](/models/launch/evaluations) to determine whether a benchmark has this requirement. * Certain benchmarks require access to gated datasets in Hugging Face. To run one of these benchmarks, an organization or team admin must request access to the gated dataset in Hugging Face, generate a Hugging Face user access token, and configure it as a team secret. See the [Evaluation benchmark catalog](/models/launch/evaluations) to determine whether a benchmark has this requirement. 
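
Because of the 86 GB model size limit listed under Job limits, it can help to check the size of a local checkpoint directory before packaging it as an artifact. The following is a minimal sketch under stated assumptions: it treats 1 GB as 10^9 bytes, it does not account for the "including context" portion of the limit, and `MAX_MODEL_BYTES`, `directory_size_bytes`, and `fits_job_limit` are hypothetical names that are not part of any W\&B API.

```python
from pathlib import Path

# Assumption: the documented 86 GB limit is interpreted as decimal gigabytes.
MAX_MODEL_BYTES = 86 * 10**9


def directory_size_bytes(path):
    """Sum the sizes of all regular files under `path`, recursively."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def fits_job_limit(path, limit=MAX_MODEL_BYTES):
    """Return True if the files under `path` total at most `limit` bytes."""
    return directory_size_bytes(path) <= limit
```

For example, `fits_job_limit("path/to/checkpoint")` gives a quick local estimate for a hypothetical checkpoint directory. Treat it as a rough pre-check only, since the service may measure model size differently.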
For more details and instructions for meeting these requirements, see: * [Evaluate a model checkpoint](/models/launch/evaluate-model-checkpoint) * [Evaluate a hosted API model](/models/launch/evaluate-hosted-model) # Evaluate a hosted API model Source: https://docs.wandb.ai/models/launch/evaluate-hosted-model Evaluate a hosted API model using infrastructure managed by CoreWeave LLM Evaluation Jobs is in **Preview** for [W\&B Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud). Compute is free during the preview period. [Learn more](/models/launch#pricing) This page shows how to use [LLM Evaluation Jobs](/models/launch) to run a series of evaluation benchmarks on a hosted API model at a publicly accessible URL, using infrastructure managed by CoreWeave. Running these benchmarks helps you compare model performance, validate model quality, and publish results to a shared leaderboard without managing your own evaluation infrastructure. To evaluate a model checkpoint saved as an artifact in W\&B Models, see [Evaluate a model checkpoint](/models/launch/evaluate-model-checkpoint) instead. ## Prerequisites Before you create an evaluation job, complete the following: 1. Review the [requirements and limitations](/models/launch#more-details) for LLM Evaluation Jobs. 2. To run certain benchmarks, a team admin must add the required API keys as team-scoped secrets. Any team member can specify the secret when configuring an evaluation job. * An **OpenAI API key**: Used by benchmarks that use OpenAI models for scoring. Required if the field **Scorer API key** appears after you select a benchmark. The secret must be named `OPENAI_API_KEY`. * A **Hugging Face user access token**: Required for certain benchmarks like `lingoly` and `lingoly2` that require access to one or more gated Hugging Face datasets. Required if the field **Hugging Face Token** appears after you select a benchmark. The API key must have access to the relevant dataset. 
See the Hugging Face documentation for [User access tokens](https://huggingface.co/docs/hub/en/security-tokens) and [accessing gated datasets](https://huggingface.co/docs/hub/en/datasets-gated#access-gated-datasets-as-a-user). * To evaluate a model provided by [Serverless Inference](/inference), an organization or team admin must create `WANDB_API_KEY` with any value. The secret isn't used for authentication. 3. The model to evaluate must be available at a publicly accessible URL. An organization or team admin must create a team-scoped secret with the API key for authentication. 4. Create a new [W\&B project](/models/track/project-page) for the evaluation results. From the project sidebar, click **Create new project**. 5. Review the documentation for a given benchmark to understand how it works and learn about specific requirements. The [Available evaluation benchmarks](/models/launch/evaluations) reference includes relevant links. ## Evaluate your model Follow these steps to set up and launch an evaluation job. When you finish, your benchmark runs are queued on CoreWeave-managed infrastructure and their results appear in the destination W\&B project you specify. 1. Log in to W\&B, then click **Launch** in the project sidebar. The **LLM Evaluation Jobs** page displays. 2. Click **Evaluate hosted API model** to set up the evaluation. 3. Select a destination project to save the evaluation results to. 4. In the **Model** section, specify the base URL and model name to evaluate, and select the API key to use for authentication. Provide the model name in OpenAI-compatible format defined by the [AI Security Institute](https://inspect.aisi.org.uk/providers.html#openai-api). For example, specify an OpenAI model in the following syntax, where `[MODEL-NAME]` is the name of the model: `openai/[MODEL-NAME]`. For a list of hosted model providers and models, see [AI Security Institute's model provider reference](https://inspect.aisi.org.uk/providers.html). 
   * To evaluate a model provided by [Serverless Inference](/inference), set the base URL to `https://api.inference.wandb.ai/v1` and specify the model name in the following syntax, where `[MODEL-ID]` is the model ID: `openai-api/wandb/[MODEL-ID]`. Refer to the [Inference model catalog](/inference/models) for details.
   * To use the [OpenRouter](https://inspect.aisi.org.uk/providers.html#openrouter) provider, prefix the model name with `openrouter` in the following syntax, where `[MODEL-NAME]` is the name of the model: `openrouter/[MODEL-NAME]`.
   * To evaluate a custom OpenAI-compatible model, specify the model name in the following syntax, where `[MODEL-NAME]` is the name of the model: `openai-api/wandb/[MODEL-NAME]`.
5. Click **Select evaluations**, then select up to four benchmarks to run.
6. If you select benchmarks that use OpenAI models for scoring, the **Scorer API key** field displays. Click it, then select the `OPENAI_API_KEY` secret. A team admin can create a secret from this drawer by clicking **Create secret**.
7. If you select benchmarks that require access to gated datasets in Hugging Face, a **Hugging Face token** field displays. [Request access to the relevant dataset](https://huggingface.co/docs/hub/en/datasets-gated#access-gated-datasets-as-a-user), then select the secret that contains the Hugging Face user access token.
8. Optional: Set **Sample limit** to a positive integer to limit the maximum number of benchmark samples to evaluate. Otherwise, the job includes all samples in the task.
9. To create a leaderboard automatically, click **Publish results to leaderboard**. The leaderboard displays all evaluations together in a workspace panel, and you can also share it in a report.
10. Click **Launch** to launch the evaluation job.
11. Click the circular arrow icon at the top of the page to open the recent run modal. Evaluation jobs appear with your other recent runs.
Click the name of a finished run to open it in single-run view, or click the **Leaderboard** link to open the leaderboard directly. For details, see [View the results](#view-the-results). This example job runs the `simpleqa` benchmark against the OpenAI model `o4-mini`: Example hosted model evaluation job If you published results to a leaderboard, you can compare evaluations side by side. This example leaderboard visualizes the performance of several OpenAI models together: Example leaderboard visualizing the performance of several hosted models ## Review evaluation results Review your evaluation job results in W\&B Models in the destination project's workspace. 1. Click the circular arrow icon at the top of the page to open the recent run modal, where evaluation jobs appear with other runs in the project. If the evaluation job has a leaderboard, click **Leaderboard** to open the leaderboard in full screen, or click a run name to open it in the project in single-run view. 2. View the evaluation job's traces in the **Evaluations** section of a workspace or in the **Traces** tab of the **Weave** sidebar panel. 3. Click the **Overview** tab to view detailed information about the evaluation job, including its configuration and summary metrics. 4. Click the **Logs** tab to view, search, or download the evaluation job's debug logs. 5. Click the **Files** tab to browse, view, or download the evaluation job's files, including code, log, configuration, and other output files. ## Customize a leaderboard The leaderboard shows results for all evaluation jobs sent to a given project, with one row per benchmark per evaluation job. Columns display details like the trace, input values, and output values for the evaluation job. For more information about leaderboards, see [Leaderboards in Weave](/weave/guides/core-types/leaderboards). To give feedback about a result from the leaderboard, click the emoji icon or the chat icon in the **Feedback** column. 
* By default, all evaluation jobs are displayed. Filter or search for an evaluation job using the run selector at the left. * By default, evaluation jobs are ungrouped. To group by one or more columns, click the **Group** icon. You can show or hide a group, or expand a group to view its runs. * By default, all operations are displayed. To display only a single operation, click **All ops** and select an operation. * To sort by a column, click the column heading. To customize the display of columns, click **Columns**. * By default, headers are organized in a single level. You can increase the header depth to organize related headers together. * Select or deselect individual columns to show or hide them, or show or hide all columns with a click. * Pin columns to display them before unpinned columns. ## Export a leaderboard To export a leaderboard: 1. Click the download icon, located near the **Columns** button. 2. To optimize the export size, only the trace roots are exported by default. To export full traces, turn off **Trace roots only**. 3. To optimize the export size, feedback and costs are not exported by default. To include them in the export, toggle **Feedback** or **Costs**. 4. By default, the export is in JSONL format. To customize the format, click **Export to file** and select a format. 5. To export the leaderboard in your browser, click **Export**. 6. To export the leaderboard programmatically, select **Python** or **cURL**, then click **Copy** and run the script or command. ## Re-run an evaluation job Depending on your situation, there are multiple ways to re-run an evaluation job or view its configuration. * To re-run the last evaluation job again, follow the steps in [Evaluate your model](#evaluate-your-model). Select the destination project, then the model artifact details and the selected benchmarks you selected last time are populated automatically. Optionally, make adjustments, then launch the evaluation job. 
* To re-run an evaluation job from the project's **Runs** tab or run selector, hover over the run name and click the play icon. The job configuration drawer displays with the settings pre-populated. Optionally adjust the settings, then click **Launch**. * To re-run an evaluation job from a different project, import its configuration: 1. Follow the steps in [Evaluate your model](#evaluate-your-model). After you select the destination project, click **Import configuration**. 2. Select the project that contains the evaluation job to import, then select the evaluation job run. The job configuration drawer displays with the settings pre-populated. 3. Optionally adjust the configuration. 4. Click **Launch**. ## Export an evaluation job configuration Export an evaluation job's configuration from the run's **Files** tab. 1. Open the run in single-run view. 2. Click the **Files** tab. 3. Click the download button next to `config.yaml` to download it locally. # Evaluate a model checkpoint Source: https://docs.wandb.ai/models/launch/evaluate-model-checkpoint Evaluate a VLLM-compatible model checkpoint using infrastructure managed by CoreWeave LLM Evaluation Jobs is in **Preview** for [W\&B Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud). Compute is free during the preview period. [Learn more](/models/launch#pricing) This page shows how to use [LLM Evaluation Jobs](/models/launch) to run a series of evaluation benchmarks on a fine-tuned model in W\&B Models, using infrastructure managed by CoreWeave. To evaluate a hosted API model served at a publicly accessible URL, see [Evaluate an API-hosted model](/models/launch/evaluate-hosted-model) instead, or run a small benchmark against a public OpenAI model endpoint with a streamlined [Quickstart](/models/launch#quickstart). ## Prerequisites Before you evaluate a model checkpoint, complete the following: 1. Review the [requirements and limitations](/models/launch#more-details) for LLM Evaluation Jobs. 2. 
To run certain benchmarks, a team admin must add the required API keys as [team-scoped secrets](/platform/secrets#add-a-secret). Any team member can specify the secret when configuring an evaluation job. See the [Evaluation benchmark catalog](/models/launch/evaluations) for requirements.
   * An **OpenAI API key**: Used by benchmarks that use OpenAI models for scoring. Required if the field **Scorer API key** appears after you select a benchmark. The secret must be named `OPENAI_API_KEY`.
   * A **Hugging Face user access token**: Required for certain benchmarks like `lingoly` and `lingoly2` that require access to one or more gated Hugging Face datasets. Required if the field **Hugging Face token** appears after you select a benchmark. The token must have access to the relevant dataset. See the Hugging Face documentation for [User access tokens](https://huggingface.co/docs/hub/en/security-tokens) and [accessing gated datasets](https://huggingface.co/docs/hub/en/datasets-gated#access-gated-datasets-as-a-user).
3. Create a new [W\&B project](/models/track/project-page) for the evaluation results. From the project sidebar, click **Create new project**.
4. Package the model in VLLM-compatible format and save it as an artifact in W\&B Models. An attempt to benchmark any other type of artifact fails. For one approach, see the following [Example: Prepare a model](#example-prepare-a-model) section.
5. Review the documentation for a given benchmark to understand how it works and learn about specific requirements. For convenience, the [Available evaluation benchmarks](/models/launch/evaluations) reference includes relevant links.

## Evaluate your model

After you complete the prerequisites, follow these steps to set up and launch an evaluation job:

1. Log in to W\&B, then click **Launch** in the project sidebar. The **LLM Evaluation Jobs** page displays.
2. Click **Evaluate model checkpoint** to set up the evaluation job.
3. Select a destination project to save the evaluation results to.
4. In the **Model artifact** section, specify the project, artifact, and version of the prepared model to evaluate. 5. Click **Evaluations**, then select up to four benchmarks. 6. If you select benchmarks that use OpenAI models for scoring, the **Scorer API key** field displays. Click it, then select the `OPENAI_API_KEY` secret. For convenience, a team admin can create a secret from this drawer by clicking **Create secret**. 7. If you select benchmarks that require access to gated datasets in Hugging Face, a **Hugging Face token** field displays. [Request access to the relevant dataset](https://huggingface.co/docs/hub/en/datasets-gated#access-gated-datasets-as-a-user), then select the secret that contains the Hugging Face user access token. 8. Optional: Set **Sample limit** to a positive integer to limit the maximum number of benchmark samples to evaluate. Otherwise, the job includes all samples in the task. 9. To create a leaderboard automatically, click **Publish results to leaderboard**. The leaderboard displays all evaluations together in a workspace panel, and you can also share it in a report. 10. Click **Launch** to launch the evaluation job. 11. Click the circular arrow icon at the top of the page to open the recent run modal. Evaluation jobs appear with your other recent runs. Click the name of a finished run to open it in single-run view, or click the **Leaderboard** link to open the leaderboard directly. For details, see [View the results](#view-the-results). After you evaluate your first model, many fields are pre-filled with the most recent values when you configure your next evaluation job. 
This example evaluation job runs two benchmarks against an artifact:

Example model checkpoint evaluation job

This example leaderboard visualizes the performance of several models together:

Example leaderboard visualizing the performance of several models against several benchmark tasks

## Review evaluation results

Review your evaluation job results in W\&B Models in the destination project's workspace.

1. Click the circular arrow icon at the top of the page to open the recent run modal, where evaluation jobs appear with other runs in the project. If the evaluation job has a leaderboard, click **Leaderboard** to open the leaderboard in full screen, or click a run name to open it in the project in single-run view.
2. View the evaluation job's traces in the **Evaluations** section of a workspace or in the **Traces** tab of the **Weave** sidebar panel.
3. Click the **Overview** tab to view detailed information about the evaluation job, including its configuration and summary metrics.
4. Click the **Logs** tab to view, search, or download the evaluation job's debug logs.
5. Click the **Files** tab to browse, view, or download the evaluation job's files, including code, log, configuration, and other output files.

## Customize a leaderboard

The leaderboard shows results for all evaluation jobs sent to a given project, with one row per benchmark per evaluation job. Columns display details like the trace, input values, and output values for the evaluation job. For more information about leaderboards, see [Leaderboards in Weave](/weave/guides/core-types/leaderboards).

To give feedback about a result from the leaderboard, click the emoji icon or the chat icon in the **Feedback** column.

* By default, all evaluation jobs are displayed. Filter or search for an evaluation job using the run selector at the left.
* By default, evaluation jobs are ungrouped. To group by one or more columns, click the **Group** icon. You can show or hide a group, or expand a group to view its runs.
* By default, all operations are displayed. To display only a single operation, click **All ops** and select an operation.
* To sort by a column, click the column heading. To customize the display of columns, click **Columns**.
  * By default, headers are organized in a single level. You can increase the header depth to organize related headers together.
  * Select or deselect individual columns to show or hide them, or show or hide all columns with a click.
  * Pin columns to display them before unpinned columns.

## Export a leaderboard

To export a leaderboard:

1. Click the download icon, located near the **Columns** button.
2. To optimize the export size, only the trace roots are exported by default. To export full traces, turn off **Trace roots only**.
3. To optimize the export size, feedback and costs are not exported by default. To include them in the export, toggle **Feedback** or **Costs**.
4. By default, the export is in JSONL format. To customize the format, click **Export to file** and select a format.
5. To export the leaderboard in your browser, click **Export**.
6. To export the leaderboard programmatically, select **Python** or **cURL**, then click **Copy** and run the script or command.

## Re-run an evaluation job

Depending on your situation, there are multiple ways to re-run an evaluation job or view its configuration.

* To re-run the most recent evaluation job, follow the steps in [Evaluate your model](#evaluate-your-model). After you select the destination project, the model artifact details and the benchmarks you selected last time are populated automatically. Optionally, make adjustments, then launch the evaluation job.
* To re-run an evaluation job from the project's **Runs** tab or run selector, hover over the run name and click the play icon. The job configuration drawer displays with the settings pre-populated. Optionally adjust the settings, then click **Launch**.
* To re-run an evaluation job from a different project, import its configuration:
  1. Follow the steps in [Evaluate your model](#evaluate-your-model). After you select the destination project, click **Import configuration**.
  2. Select the project that contains the evaluation job to import, then select the evaluation job run. The job configuration drawer displays with the settings pre-populated.
  3. Optionally adjust the configuration.
  4. Click **Launch**.

## Export an evaluation job configuration

Export an evaluation job's configuration from the run's **Files** tab.

1. Open the run in single-run view.
2. Click the **Files** tab.
3. Click the download button next to `config.yaml` to download it locally.

## Example: Prepare a model

Before you can evaluate a model checkpoint, you must package it in vLLM-compatible format and save it as an artifact in W\&B Models. This example shows one way to do this:

```python lines theme={null}
from transformers import AutoTokenizer, AutoModelForCausalLM
import wandb

# Load your model
model_name = "your-model-name"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Save in vLLM-compatible format
save_dir = "path/to/save"
tokenizer.save_pretrained(save_dir)
model.save_pretrained(save_dir)

# Save to W&B Models
wandb_run = wandb.init(entity="your-entity-name", project="your-project-name")
# wandb.Artifact requires a type; "model" is used here for a model checkpoint
artifact = wandb.Artifact(name="your-artifact-name", type="model")
artifact.add_dir(save_dir)
logged_artifact = wandb_run.log_artifact(artifact)
# Block until the artifact finishes uploading
logged_artifact.wait()
wandb.finish()
```

# Evaluation benchmark catalog

Source: https://docs.wandb.ai/models/launch/evaluations

Browse the evaluation benchmarks available through LLM Evaluation Jobs

LLM Evaluation Jobs is in **Preview** for [W\&B Multi-tenant Cloud](/platform/hosting/hosting-options/multi_tenant_cloud). Compute is free during the preview period.
[Learn more](/models/launch#pricing)

This page catalogs the evaluation benchmarks available through [LLM Evaluation Jobs](/models/launch), organized by category. Use it to discover which benchmarks you can run, identify their task IDs, and check whether a benchmark requires additional credentials.

Some benchmarks require additional credentials. A team admin must add these credentials as [team-scoped secrets](/platform/secrets#add-a-secret) before any team member can use the benchmarks in an evaluation job:

* If a benchmark has `Yes` in the **OpenAI Scorer** column, the benchmark uses OpenAI models for scoring. An organization or team admin must add an OpenAI API key as a team secret. When you configure an evaluation job with a benchmark that has this requirement, set the **Scorer API key** field to the secret.
* If a benchmark has an entry in the **Gated HF Dataset** column, the benchmark requires access to a gated Hugging Face dataset. An organization or team admin must request access to the dataset in Hugging Face. The admin then creates a Hugging Face user access token and configures a team secret with that token. When you configure a benchmark with this requirement, set the **Hugging Face token** field to the secret.

## Knowledge

Evaluate factual knowledge across domains such as science, language, and general reasoning.

| Evaluation | Task ID | OpenAI Scorer | Gated HF Dataset | Description |
| --- | --- | --- | --- | --- |
| [BoolQ](https://github.com/google-research-datasets/boolean-questions) | `boolq` | | | Boolean yes/no questions from natural language queries |
| [GPQA Diamond](https://arxiv.org/abs/2311.12022) | `gpqa_diamond` | | | Graduate-level science questions (highest quality subset) |
| [HLE](https://arxiv.org/abs/2501.14249) | `hle` | | Yes | Humanity's Last Exam: expert-level questions across many subjects |
| [Lingoly](https://arxiv.org/abs/2406.06196) | `lingoly` | | Yes | Linguistics olympiad problems |
| [Lingoly Too](https://arxiv.org/abs/2503.02972) | `lingoly_too` | | Yes | Extended linguistics challenge problems |
| [MMIU](https://arxiv.org/abs/2408.02718) | `mmiu` | | | Multimodal Multi-image Understanding benchmark |
| [MMLU (0-shot)](https://github.com/hendrycks/test) | `mmlu_0_shot` | | | Massive Multitask Language Understanding without examples |
| [MMLU (5-shot)](https://github.com/hendrycks/test) | `mmlu_5_shot` | | | Massive Multitask Language Understanding with 5 examples |
| [MMLU-Pro](https://arxiv.org/abs/2406.01574) | `mmlu_pro` | | | More challenging version of MMLU |
| [ONET M6](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/onet) | `onet_m6` | | | Occupational knowledge benchmark |
| [PAWS](https://github.com/google-research-datasets/paws) | `paws` | | | Paraphrase Adversaries from Word Scrambling: paraphrase identification |
| [SevenLLM MCQ (English)](https://arxiv.org/abs/2405.03446) | `sevenllm_mcq_en` | | | Multiple choice questions in English |
| [SevenLLM MCQ (Chinese)](https://arxiv.org/abs/2405.03446) | `sevenllm_mcq_zh` | | | Multiple choice questions in Chinese |
| [SevenLLM QA (English)](https://arxiv.org/abs/2405.03446) | `sevenllm_qa_en` | | | Question answering in English |
| [SevenLLM QA (Chinese)](https://arxiv.org/abs/2405.03446) | `sevenllm_qa_zh` | | | Question answering in Chinese |
| [SimpleQA](https://openai.com/index/introducing-simpleqa/) | `simpleqa` | Yes | | Straightforward factual question answering |
| [SimpleQA Verified](https://openai.com/index/introducing-simpleqa/) | `simpleqa_verified` | | | Verified subset of SimpleQA with validated answers |
| [WorldSense](https://github.com/facebookresearch/worldsense) | `worldsense` | | | Evaluates understanding of world knowledge and common sense |

## Reasoning

Evaluate logical thinking, problem-solving, and common-sense reasoning capabilities.

| Evaluation | Task ID | OpenAI Scorer | Gated HF Dataset | Description |
| --- | --- | --- | --- | --- |
| [AGIE AQUA-RAT](https://arxiv.org/abs/1705.04146) | `agie_aqua_rat` | | | Algebraic question answering with rationales |
| [AGIE LogiQA (English)](https://arxiv.org/abs/2007.08124) | `agie_logiqa_en` | | | Logical reasoning questions in English |
| [AGIE LSAT Analytical Reasoning](https://www.lsac.org/) | `agie_lsat_ar` | | | LSAT analytical reasoning (logic games) problems |
| [AGIE LSAT Logical Reasoning](https://www.lsac.org/) | `agie_lsat_lr` | | | LSAT logical reasoning questions |
| [ARC Challenge](https://huggingface.co/datasets/allenai/ai2_arc) | `arc_challenge` | | | Challenging science questions requiring reasoning (AI2 Reasoning Challenge) |
| [ARC Easy](https://huggingface.co/datasets/allenai/ai2_arc) | `arc_easy` | | | Easier set of science questions from the ARC dataset |
| [BBH](https://github.com/suzgunmirac/BIG-Bench-Hard) | `bbh` | | | BIG-Bench Hard: challenging tasks from BIG-Bench |
| [CoCoNot](https://arxiv.org/abs/2310.03697) | `coconot` | | | Counterfactual commonsense reasoning benchmark |
| [CommonsenseQA](https://huggingface.co/datasets/tau/commonsense_qa) | `commonsense_qa` | | | Commonsense reasoning questions |
| [HellaSwag](https://arxiv.org/abs/1905.07830) | `hellaswag` | | | Commonsense natural language inference |
| [MUSR](https://arxiv.org/abs/2310.16049) | `musr` | | | Multi-step reasoning benchmark |
| [PIQA](https://yonatanbisk.com/piqa/) | `piqa` | | | Physical commonsense reasoning |
| [WinoGrande](https://winogrande.allenai.org/) | `winogrande` | | | Commonsense reasoning via pronoun resolution |

## Math

Evaluate mathematical problem-solving across difficulty levels, from grade school to competition-level problems.

| Evaluation | Task ID | OpenAI Scorer | Gated HF Dataset | Description |
| --- | --- | --- | --- | --- |
| [AGIE Math](https://arxiv.org/abs/2410.12211) | `agie_math` | | | Advanced mathematical reasoning from AGIE benchmark suite |
| [AGIE SAT Math](https://satsuite.collegeboard.org/sat) | `agie_sat_math` | | | SAT mathematics questions |
| [AIME 2024](https://artofproblemsolving.com/wiki/index.php/AIME_Problems_and_Solutions) | `aime2024` | | | American Invitational Mathematics Examination problems from 2024 |
| [AIME 2025](https://artofproblemsolving.com/wiki/index.php/AIME_Problems_and_Solutions) | `aime2025` | | | American Invitational Mathematics Examination problems from 2025 |
| [GSM8K](https://github.com/openai/grade-school-math) | `gsm8k` | | | Grade School Math 8K: multi-step math word problems |
| [InfiniteBench Math Calc](https://arxiv.org/abs/2402.13718) | `infinite_bench_math_calc` | | | Mathematical calculations in long contexts |
| [InfiniteBench Math Find](https://arxiv.org/abs/2402.13718) | `infinite_bench_math_find` | | | Finding mathematical patterns in long contexts |
| [MATH](https://github.com/hendrycks/math) | `math` | | | Competition-level mathematics problems |
| [MGSM](https://github.com/google-research/url-nlp/tree/main/mgsm) | `mgsm` | | | Multilingual Grade School Math |

## Code

Evaluate programming and software development capabilities such as debugging, code execution prediction, and function calling.

| Evaluation | Task ID | OpenAI Scorer | Gated HF Dataset | Description |
| --- | --- | --- | --- | --- |
| [BFCL](https://gorilla.cs.berkeley.edu/blogs/8_berkeley_function_calling_leaderboard.html) | `bfcl` | | | Berkeley Function Calling Leaderboard: tests function calling and tool use capabilities |
| [InfiniteBench Code Debug](https://arxiv.org/abs/2402.13718) | `infinite_bench_code_debug` | | | Long-context code debugging tasks |
| [InfiniteBench Code Run](https://arxiv.org/abs/2402.13718) | `infinite_bench_code_run` | | | Long-context code execution prediction |

## Reading

Evaluate reading comprehension and information extraction from complex texts.

| Evaluation | Task ID | OpenAI Scorer | Gated HF Dataset | Description |
| --- | --- | --- | --- | --- |
| [AGIE LSAT Reading Comprehension](https://www.lsac.org/) | `agie_lsat_rc` | | | LSAT reading comprehension passages and questions |
| [AGIE SAT English](https://satsuite.collegeboard.org/sat) | `agie_sat_en` | | | SAT reading and writing questions with passages |
| [AGIE SAT English (No Passage)](https://satsuite.collegeboard.org/sat) | `agie_sat_en_without_passage` | | | SAT English questions without accompanying passages |
| [DROP](https://github.com/allenai/allennlp-reading-comprehension/blob/master/allennlp_rc/eval/drop_eval.py) | `drop` | | | Discrete Reasoning Over Paragraphs: reading comprehension requiring numerical reasoning |
| [RACE-H](https://www.cs.cmu.edu/~glai1/data/race/) | `race_h` | | | Reading comprehension from English exams (high difficulty) |
| [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) | `squad` | | | Stanford Question Answering Dataset: extractive question answering on Wikipedia articles |

## Long context

Evaluate the ability to process and reason over extended contexts, including retrieval and pattern recognition.

| Evaluation | Task ID | OpenAI Scorer | Gated HF Dataset | Description |
| --- | --- | --- | --- | --- |
| [InfiniteBench KV Retrieval](https://arxiv.org/abs/2402.13718) | `infinite_bench_kv_retrieval` | | | Key-value retrieval in long contexts |
| [InfiniteBench LongBook (English)](https://arxiv.org/abs/2402.13718) | `infinite_bench_longbook_choice_eng` | | | Multiple choice questions on long books |
| [InfiniteBench LongDialogue QA (English)](https://arxiv.org/abs/2402.13718) | `infinite_bench_longdialogue_qa_eng` | | | Question answering over long dialogues |
| [InfiniteBench Number String](https://arxiv.org/abs/2402.13718) | `infinite_bench_number_string` | | | Number pattern recognition in long sequences |
| [InfiniteBench Passkey](https://arxiv.org/abs/2402.13718) | `infinite_bench_passkey` | | | Retrieval of information from long context |
| [NIAH](https://arxiv.org/abs/2406.07230) | `niah` | | | Needle in a Haystack: long-context retrieval test |

## Safety

Evaluate alignment, bias detection, harmful content resistance, and truthfulness.

| Evaluation | Task ID | OpenAI Scorer | Gated HF Dataset | Description |
| --- | --- | --- | --- | --- |
| [AgentHarm](https://arxiv.org/abs/2410.09024) | `agentharm` | Yes | | Tests model resistance to harmful agent behavior and misuse scenarios |
| [AgentHarm Benign](https://arxiv.org/abs/2410.09024) | `agentharm_benign` | Yes | | Benign baseline for AgentHarm to measure false positive rates |
| [Agentic Misalignment](https://arxiv.org/abs/2510.05179) | `agentic_misalignment` | | | Evaluates potential misalignment in agentic behavior |
| [AHB](https://arxiv.org/abs/2503.04804) | `ahb` | | | Agent Harmful Behavior: tests resistance to harmful agentic actions |
| [AIRBench](https://arxiv.org/abs/2410.02407) | `air_bench` | | | Tests adversarial instruction resistance |
| [BBEH](https://arxiv.org/abs/2502.19187) | `bbeh` | | | BIG-Bench Extra Hard: harder successors to the BIG-Bench Hard tasks |
| [BBEH Mini](https://arxiv.org/abs/2502.19187) | `bbeh_mini` | | | Smaller version of BBEH benchmark |
| [BBQ](https://arxiv.org/abs/2110.08193) | `bbq` | | | Bias Benchmark for Question Answering |
| [BOLD](https://arxiv.org/abs/2101.11718) | `bold` | | | Bias in Open-Ended Language Generation Dataset |
| [CYSE3 Visual Prompt Injection](https://arxiv.org/abs/2408.01605) | `cyse3_visual_prompt_injection` | | | Tests resistance to visual prompt injection attacks |
| [Make Me Pay](https://arxiv.org/abs/2410.08691) | `make_me_pay` | | | Tests resistance to financial scam and fraud scenarios |
| [MASK](https://arxiv.org/abs/2503.03750) | `mask` | Yes | Yes | Tests model's handling of sensitive information |
| [Personality BFI](https://github.com/UKGovernmentBEIS/inspect_evals/tree/main/src/inspect_evals/personality) | `personality_BFI` | | | Big Five personality trait assessment |
| [Personality TRAIT](https://arxiv.org/abs/2406.14703) | `personality_TRAIT` | | Yes | Comprehensive personality trait evaluation |
| SOSBench | `sosbench` | Yes | | Safety and oversight stress test |
| [StereoSet](https://github.com/moinnadeem/StereoSet) | `stereoset` | | | Measures stereotypical biases in language models |
| [StrongREJECT](https://arxiv.org/abs/2402.10260) | `strong_reject` | | | Tests model's ability to reject harmful requests |
| [Sycophancy](https://arxiv.org/abs/2310.13548) | `sycophancy` | | | Evaluates tendency toward sycophantic behavior |
| [TruthfulQA](https://github.com/sylinrl/TruthfulQA) | `truthfulqa` | | | Tests model truthfulness and resistance to falsehoods |
| [UCCB](https://huggingface.co/datasets/CraneAILabs/UCCB) | `uccb` | | | Unsafe Content Classification Benchmark |
| [WMDP Bio](https://www.wmdp.ai/) | `wmdp_bio` | | | Tests hazardous knowledge in biology |
| [WMDP Chem](https://www.wmdp.ai/) | `wmdp_chem` | | | Tests hazardous knowledge in chemistry |
| [WMDP Cyber](https://www.wmdp.ai/) | `wmdp_cyber` | | | Tests hazardous knowledge in cybersecurity |
| [XSTest](https://arxiv.org/abs/2308.01263) | `xstest` | Yes | | Exaggerated safety test for over-refusal detection |

## Domain-specific

Evaluate specialized knowledge in medicine, chemistry, law, biology, and other professional fields.

| Evaluation | Task ID | OpenAI Scorer | Gated HF Dataset | Description |
| --- | --- | --- | --- | --- |
| [ChemBench](https://arxiv.org/abs/2404.01475) | `chembench` | | | Chemistry knowledge and problem-solving benchmark |
| [HealthBench](https://arxiv.org/abs/2406.09746) | `healthbench` | Yes | | Healthcare and medical knowledge evaluation |
| [HealthBench Consensus](https://arxiv.org/abs/2406.09746) | `healthbench_consensus` | Yes | | Healthcare questions with expert consensus |
| [HealthBench Hard](https://arxiv.org/abs/2406.09746) | `healthbench_hard` | Yes | | Challenging healthcare scenarios |
| [LabBench Cloning Scenarios](https://arxiv.org/abs/2407.10362) | `lab_bench_cloning_scenarios` | | | Laboratory experiment planning and cloning |
| [LabBench DBQA](https://arxiv.org/abs/2407.10362) | `lab_bench_dbqa` | | | Database question answering for lab scenarios |
| [LabBench FigQA](https://arxiv.org/abs/2407.10362) | `lab_bench_figqa` | | | Figure interpretation in scientific contexts |
| [LabBench LitQA](https://arxiv.org/abs/2407.10362) | `lab_bench_litqa` | | | Literature-based question answering for research |
| [LabBench ProtocolQA](https://arxiv.org/abs/2407.10362) | `lab_bench_protocolqa` | | | Laboratory protocol understanding |
| [LabBench SeqQA](https://arxiv.org/abs/2407.10362) | `lab_bench_seqqa` | | | Biological sequence analysis questions |
| [LabBench SuppQA](https://arxiv.org/abs/2407.10362) | `lab_bench_suppqa` | | | Supplementary material interpretation |
| [LabBench TableQA](https://arxiv.org/abs/2407.10362) | `lab_bench_tableqa` | | | Table interpretation in scientific papers |
| [MedQA](https://github.com/jind11/MedQA) | `medqa` | | | Medical licensing exam questions |
| [PubMedQA](https://pubmedqa.github.io/) | `pubmedqa` | | | Biomedical question answering from research abstracts |
| [SEC-QA v1](https://arxiv.org/abs/2406.14806) | `sec_qa_v1` | | | SEC filing question answering |
| [SEC-QA v1 (5-shot)](https://arxiv.org/abs/2406.14806) | `sec_qa_v1_5_shot` | | | SEC-QA with 5 examples |
| [SEC-QA v2](https://arxiv.org/abs/2406.14806) | `sec_qa_v2` | | | Updated SEC filing benchmark |
| [SEC-QA v2 (5-shot)](https://arxiv.org/abs/2406.14806) | `sec_qa_v2_5_shot` | | | SEC-QA v2 with 5 examples |

## Multimodal

Evaluate vision and language understanding combining visual and textual inputs.

| Evaluation | Task ID | OpenAI Scorer | Gated HF Dataset | Description |
| --- | --- | --- | --- | --- |
| [DocVQA](https://www.docvqa.org/) | `docvqa` | | | Document Visual Question Answering: questions about document images |
| [MathVista](https://mathvista.github.io/) | `mathvista` | | | Mathematical reasoning with visual contexts combining vision and math |
| [MMMU Multiple Choice](https://mmmu-benchmark.github.io/) | `mmmu_multiple_choice` | | | Multimodal understanding with multiple choice format |
| [MMMU Open](https://mmmu-benchmark.github.io/) | `mmmu_open` | | | Multimodal understanding with open-ended responses |
| [V\*Star Bench Attribute Recognition](https://arxiv.org/abs/2411.10006) | `vstar_bench_attribute_recognition` | | | Visual attribute recognition tasks |
| [V\*Star Bench Spatial Relationship](https://arxiv.org/abs/2411.10006) | `vstar_bench_spatial_relationship_reasoning` | | | Spatial reasoning with visual inputs |

## Instruction following

Evaluate adherence to specific instructions and formatting requirements.

| Evaluation | Task ID | OpenAI Scorer | Gated HF Dataset | Description |
| --- | --- | --- | --- | --- |
| [IFEval](https://arxiv.org/abs/2311.07911) | `ifeval` | | | Tests precise instruction-following capabilities |

## System

Basic system validation and pre-flight checks.

| Evaluation | Task ID | OpenAI Scorer | Gated HF Dataset | Description |
| --- | --- | --- | --- | --- |
| [Pre-Flight](https://ukgovernmentbeis.github.io/inspect_evals/evals/knowledge/pre_flight/) | `pre_flight` | | | Basic system check and validation test |

## Next steps

* [Evaluate a model checkpoint](/models/launch/evaluate-model-checkpoint)
* [Evaluate a hosted API model](/models/launch/evaluate-hosted-model)
* View details about specific benchmarks at [AISI Inspect Evals](https://inspect.aisi.org.uk/evals/)

# Registry overview

Source: https://docs.wandb.ai/models/registry

Use W&B Registry to manage and share artifact versions across your organization

W\&B Registry is a curated central repository of [W\&B Artifact versions](/models/artifacts/create-a-new-artifact-version) within your organization. Users who [have permission](/models/registry/configure_registry/) within your organization can [download and use artifacts](/models/registry/download_use_artifact/), share, and collaboratively manage the lifecycle of all artifacts, regardless of the team that the user belongs to.

Use the Registry to track artifact versions, audit the history of an artifact's usage and changes, ensure governance and compliance of your artifacts, and [automate downstream processes such as model CI/CD](/models/automations/).
In summary, use W\&B Registry to:

* [Promote](/models/registry/link_version/) artifact versions that satisfy a machine learning task to other users in your organization.
* Organize [artifacts with tags](/models/registry/organize-with-tags/) so that you can find or reference specific artifacts.
* Track an [artifact’s lineage](/models/registry/lineage/) and audit the history of changes.
* [Automate](/models/automations/) downstream processes such as model CI/CD.
* [Manage who in your organization](/models/registry/configure_registry/) can access artifacts in each registry.

The following image shows the W\&B Registry landing page. A registry called `Model` is starred. Two collections are shown: `DemoModels` and `Zoo_Classifier_Models`.

W&B Registry

## Learn the basics

Each organization initially contains two registries that you can use to organize your model and dataset artifacts called **Models** and **Datasets**, respectively. You can create [additional registries to organize other artifact types based on your organization's needs](/models/registry/create_registry).

Each [*registry*](/models/registry/configure_registry/) consists of one or more [*collections*](/models/registry/create_collection/). Each collection represents a distinct task or use case.

To add an artifact to a registry, you first log a [specific artifact version to W\&B](/models/artifacts/create-a-new-artifact-version/). Each time you log an artifact, W\&B automatically assigns a version to that artifact. Artifact versions use 0 indexing, so the first version is `v0`, the second version is `v1`, and so on.

Once you log an artifact to W\&B, you can then link that specific artifact version to a collection in the registry.

The term "link" refers to pointers that connect where W\&B stores the artifact and where the artifact is accessible in the registry. W\&B does not duplicate artifacts when you link an artifact to a collection.
As an example, the following code logs a file called `"my_model.txt"` as a model artifact and links it to a collection named `"first-collection"` within a registry called `"model"`:

1. Initialize a W\&B Run with `wandb.init()`.
2. Log the artifact to W\&B with `wandb.Run.log_artifact()`.
3. Specify the name of the collection and registry to link your artifact version to.
4. Link the artifact to the collection using `wandb.Run.link_artifact()`.

Save this Python code to a script and run it. W\&B Python SDK version 0.18.6 or newer is required.

```python title="hello_collection.py" theme={null}
import wandb
import random

# Initialize a W&B Run to track the artifact
with wandb.init(project="registry_quickstart") as run:
    # Create a simulated model file so that you can log it
    with open("my_model.txt", "w") as f:
        f.write("Model: " + str(random.random()))

    # Log the artifact to W&B
    logged_artifact = run.log_artifact(
        artifact_or_path="./my_model.txt",
        name="gemma-finetuned",
        type="model"  # Specifies artifact type
    )

    # Specify the name of the collection and registry
    # you want to publish the artifact to
    COLLECTION_NAME = "first-collection"
    REGISTRY_NAME = "model"

    # Link the artifact to the registry
    run.link_artifact(
        artifact=logged_artifact,
        target_path=f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"
    )
```

W\&B automatically creates a collection for you if the collection you specify in the returned run object's `wandb.Run.link_artifact(target_path = "")` method does not exist within the registry you specify.

Continuing from the previous example, after you run the script, navigate to W\&B Registry to view artifact versions that you and other members of your organization publish. Select **Registry** in the project sidebar below **Applications**. Select the `"Model"` registry. Within the registry, you should see the `"first-collection"` collection with your linked artifact version.
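
After a version is linked, other scripts can refer to it by its registry path. The following is a minimal sketch, assuming the `wandb-registry-{registry}/{collection}` path format used in the script above, with the version or alias appended after a colon, and the Public API's `wandb.Api().artifact()` call. The helper names `registry_artifact_path` and `download_linked_artifact` are hypothetical, and the download step needs valid W\&B credentials and network access.

```python
def registry_artifact_path(registry: str, collection: str, alias: str = "latest") -> str:
    """Build the full name of a linked artifact version.

    The "wandb-registry-" prefix matches the target_path used when linking.
    """
    return f"wandb-registry-{registry}/{collection}:{alias}"


def download_linked_artifact(registry: str, collection: str, alias: str = "latest") -> str:
    """Download a linked artifact version and return the local directory."""
    # Imported here so the path helper above works without the SDK installed.
    import wandb

    api = wandb.Api()
    artifact = api.artifact(name=registry_artifact_path(registry, collection, alias))
    return artifact.download()


# Build the path for the first version linked by the quickstart script.
print(registry_artifact_path("model", "first-collection", "v0"))
# prints "wandb-registry-model/first-collection:v0"
```

To fetch the files, call `download_linked_artifact("model", "first-collection")` from an authenticated environment. Depending on your SDK version and deployment, you might need to qualify the path further, so treat the exact format as an assumption to verify.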
Once you link an artifact version to a collection within a registry, members of your organization can [view](/models/registry/lineage), [download](/models/registry/download_use_artifact), [organize](/models/registry/organize-with-tags), and manage your artifact versions, create downstream automations, and more if they have the proper permissions.

If an artifact version logs metrics (such as by using `wandb.Run.log_artifact()`), you can view metrics for that version from its details page, and you can compare metrics across artifact versions from the collection's page. Refer to [View linked artifacts in a registry](/models/registry/link_version/#view-linked-artifacts-in-a-registry).

## Enable W\&B Registry

Based on your deployment type, satisfy the following conditions to enable W\&B Registry:

| Deployment type | How to enable |
| --- | --- |
| Multi-tenant Cloud | No action required. W\&B Registry is available on the W\&B App. |
| Dedicated Cloud | Contact your account team to enable W\&B Registry for your deployment. |
| Self-Managed | For Server v0.71.0 or newer, no action required. Registry is enabled by default. For older supported Server versions, contact your W\&B account team to enable Registry. |

## Resources to get started

Depending on your use case, explore the following resources to get started with the W\&B Registry:

* Check out the tutorial video:
  * [Getting started with Registry from W\&B](https://www.youtube.com/watch?v=p4XkVOsjIeM)
* Take the W\&B [Model CI/CD](https://www.wandb.courses/courses/enterprise-model-management) course and learn how to:
  * Use W\&B Registry to manage and version your artifacts, track lineage, and promote models through different lifecycle stages.
  * Automate your model management workflows using webhooks.
  * Integrate the registry with external ML systems and tools for model evaluation, monitoring, and deployment.

# Reference an artifact version with aliases

Source: https://docs.wandb.ai/models/registry/aliases

Use default, custom, and protected aliases to reference specific artifact versions in W&B Registry.

Reference a specific [artifact version](/models/artifacts/create-a-new-artifact-version/) with one or more aliases. [W\&B automatically assigns aliases](/models/registry/aliases/#default-aliases) to each artifact you link with the same name. You can also [create one or more custom aliases](/models/registry/aliases/#custom-aliases) to reference a specific artifact version.

In the Registry UI, each alias appears as a rectangle that contains the alias name. If an [alias is protected](/models/registry/aliases/#protected-aliases), it appears as a gray rectangle with a lock icon. Otherwise, the alias appears as an orange rectangle. Aliases are not shared across registries.

**When to use an alias versus using a tag**

Use an alias to reference a specific artifact version. Each alias within a collection is unique. Only one artifact version can have a specific alias at a time.

Use tags to organize and group artifact versions or collections based on a common theme. Multiple artifact versions and collections can share the same tag.

When you add an alias to an artifact version, you can optionally start a [Registry automation](/models/automations/automation-events/#registry) to notify a Slack channel or trigger a webhook. If the automation calls a webhook that needs an access token or other sensitive value in the request, store those strings as [team secrets](/platform/secrets) and select them when you configure the webhook for the automation.

## Default aliases

W\&B automatically assigns the following aliases to each artifact version you link with the same name:

* The `latest` alias to the most recent artifact version you link to a collection.
* A unique version number. W\&B counts each artifact version you link, starting from zero, and uses that count to assign a unique version number to the artifact version.

For example, if you link an artifact named `zoo_model` three times, W\&B creates three aliases `v0`, `v1`, and `v2` respectively. `v2` also has the `latest` alias.

## Custom aliases

Create one or more custom aliases for a specific artifact version based on your use case. For example:

* You might use aliases such as `dataset_version_v0`, `dataset_version_v1`, and `dataset_version_v2` to identify which dataset a model was trained on.
* You might use a `best_model` alias to keep track of the best performing model artifact version.

Any user with a [**Member** or **Admin** registry role](/models/registry/configure_registry/#registry-roles) on a registry can add or remove a custom alias from a linked artifact in that registry. Users with the [**Restricted Viewer** or **Viewer** roles](/models/registry/configure_registry/#registry-roles) cannot add or remove aliases.

[Protected aliases](/models/registry/aliases/#protected-aliases) provide a way to label and identify which artifact versions to protect from modification or deletion.

You can create a custom alias with the W\&B Registry or the Python SDK. Choose the option below that best fits your use case.

**W\&B Registry**

1. Navigate to the W\&B Registry.
2. Click the **View details** button in a collection.
3. Within the **Versions** section, click the **View** button for a specific artifact version.
4. Click the **+** button next to the **Aliases** field to add one or more aliases.

**Python SDK**

When you link an artifact version to a collection with the Python SDK, you can optionally provide a list of one or more aliases as an argument to the `aliases` parameter in [`link_artifact()`](/models/ref/python/experiments/run.md/#link_artifact). If an alias you provide does not already exist, W\&B creates it for you as a [non-protected alias](#custom-aliases).
The following code snippet demonstrates how to link an artifact version to a collection and add aliases to that artifact version with the Python SDK. Replace the placeholder values with your own:

```python theme={null}
import wandb

# Initialize a run
with wandb.init(entity = "", project = "") as run:

    # Create an artifact object
    # The type parameter specifies both the type of the
    # artifact object and the collection type
    artifact = wandb.Artifact(name = "", type = "")

    # Add the file to the artifact object.
    # Specify the path to the file on your local machine.
    artifact.add_file(local_path = "")

    # Specify the collection and registry to link the artifact to
    REGISTRY_NAME = ""
    COLLECTION_NAME = ""
    target_path = f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"

    # Link the artifact version to the collection
    # Add one or more aliases to this artifact version
    run.link_artifact(
        artifact = artifact,
        target_path = target_path,
        aliases = ["", ""]
    )
```

### Protected aliases

Use a [protected alias](/models/registry/aliases/#protected-aliases) to both label and identify artifact versions that should not be modified or deleted. For example, consider using a `production` protected alias to label and identify artifact versions that are in use in your organization's machine learning production pipeline.

[Registry admin](/models/registry/configure_registry/#registry-roles) users and [service accounts](/support/models/articles/what-is-a-service-account-and-why-is-it-) with the **Admin** role can create protected aliases and add or remove protected aliases from an artifact version. Users and service accounts with **Member**, **Viewer**, and **Restricted Viewer** roles cannot unlink a protected version or delete a collection that contains a protected alias. See [Configure registry access](/models/registry/configure_registry/) for details.

Common protected aliases include:

* **Production**: The artifact version is ready for production use.
* **Staging**: The artifact version is ready for testing. #### Create a protected alias The following steps describe how to create a protected alias in the W\&B Registry UI: 1. Navigate to the W\&B Registry. 2. Select a registry. 3. Click the gear button on the top right of the page to view the registry's settings. 4. Within the **Protected Aliases** section, click the **+** button to add one or more protected aliases. After creation, each protected alias appears as a gray rectangle with a lock icon in the **Protected Aliases** section. Unlike custom aliases that are not protected, creating protected aliases is available exclusively in the W\&B Registry UI and not programmatically with the Python SDK. To add a protected alias to an artifact version, you can use the W\&B Registry UI or the Python SDK. The following steps describe how to add a protected alias to an artifact version with the W\&B Registry UI: 1. Navigate to the W\&B Registry. 2. Click the **View details** button in a collection. 3. Within the **Versions** section, select the **View** button for a specific artifact version. 4. Click the **+** button to add one or more protected aliases next to the **Aliases** field. After a protected alias is created, an admin can add it to an artifact version programmatically with the Python SDK. See the W\&B Registry and Python SDK tabs in [Create a custom alias](#custom-aliases) section above for an example on how to add a protected alias to an artifact version. ## Find existing aliases You can find existing aliases with the [global search bar in the W\&B Registry](/models/registry/search_registry/#search-for-registry-items). To find a protected alias: 1. Navigate to the W\&B Registry. 2. Specify the search term in the search bar at the top of the page. Press Enter to search. Search results appear below the search bar if the term you specify matches an existing registry, collection name, artifact version tag, collection tag, or alias. 
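You can also check programmatically which artifact version an alias currently points to. The following is a minimal sketch, not an official recipe. It assumes the `wandb-registry-{registry}/{collection}:{alias}` path format used when downloading linked artifacts, and it reads the `version` and `aliases` properties of the returned artifact object. `resolve_alias` is a hypothetical helper name, and the `api` argument is expected to be a `wandb.Api()` instance:

```python
def resolve_alias(api, registry, collection, alias):
    """Return (version, aliases) for the artifact version that `alias` points to.

    `api` is assumed to be a wandb.Api() instance. This helper is
    illustrative and is not part of the W&B SDK.
    """
    # Build the path to the linked artifact version
    path = f"wandb-registry-{registry}/{collection}:{alias}"
    artifact = api.artifact(name=path)
    # `version` is a string such as "v2"; `aliases` lists every alias on that version
    return artifact.version, list(artifact.aliases)
```

For example, `resolve_alias(wandb.Api(), "<registry>", "<collection>", "production")` would return the version string and the full alias list of whichever version currently carries the `production` alias.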
## Example

The following code example is a continuation of this [W\&B Registry Tutorial notebook](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/wandb_registry/zoo_wandb.ipynb). To use the following code, you must first [retrieve and process the Zoo dataset as described in the notebook](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/wandb_registry/zoo_wandb.ipynb#scrollTo=87fecd29-8146-41e2-86fb-0bb4e3e3350a). Once you have the Zoo dataset, you can create an artifact version and add custom aliases to it.

The following code snippet shows how to create an artifact version and add custom aliases to it. The example uses the Zoo dataset from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/111/zoo) and the `Zoo_Classifier_Models` collection in the `Model` registry.

```python theme={null}
import wandb

# Initialize a run
with wandb.init(entity = "smle-reg-team-2", project = "zoo_experiment") as run:

    # Create an artifact object
    # The type parameter specifies both the type of the
    # artifact object and the collection type
    artifact = wandb.Artifact(name = "zoo_dataset", type = "dataset")

    # Add the files to the artifact object.
    # Specify the path to each file on your local machine.
    artifact.add_file(local_path="zoo_dataset.pt", name="zoo_dataset")
    artifact.add_file(local_path="zoo_labels.pt", name="zoo_labels")

    # Specify the collection and registry to link the artifact to
    REGISTRY_NAME = "Model"
    COLLECTION_NAME = "Zoo_Classifier_Models"
    target_path = f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"

    # Link the artifact version to the collection
    # Add one or more aliases to this artifact version
    run.link_artifact(
        artifact = artifact,
        target_path = target_path,
        aliases = ["production-us", "production-eu"]
    )
```

1. First, you create an artifact object (`wandb.Artifact()`).
2. Next, you add two dataset PyTorch tensors to the artifact object with `wandb.Artifact.add_file()`.
3. Lastly, you link the artifact version to the `Zoo_Classifier_Models` collection in the `Model` registry with `link_artifact()`. You also add two custom aliases to the artifact version by passing `production-us` and `production-eu` as arguments to the `aliases` parameter.

# Configure registry access
Source: https://docs.wandb.ai/models/registry/configure_registry

Configure W&B Registry access by managing users and teams, assigning roles, and setting role-based permissions.

A registry admin can [configure registry roles](/models/registry/configure_registry/#configure-registry-roles), [add users](/models/registry/configure_registry/#add-a-user-or-a-team-to-a-registry), or [remove users](/models/registry/configure_registry/#remove-a-user-or-team-from-a-registry) from a registry by configuring the registry's settings.

## Manage users

### Add a user or a team

Registry admins can add individual users or entire teams to a registry. To add a user or team to a registry:

1. Navigate to the W\&B Registry.
2. Select the registry you want to add a user or team to.
3. Click the gear icon in the upper right corner to access the registry settings.
4. In the **Registry access** section, click **Add access**.
5. Enter one or more usernames, emails, or team names in the **Include users and teams** field.
6. Click **Add access**.

Learn more about [configuring user roles in a registry](/models/registry/configure_registry/#configure-registry-roles), or [registry role permissions](/models/registry/configure_registry/#registry-role-permissions).

### Remove a user or team

A registry admin can remove individual users or entire teams from a registry. To remove a user or team from a registry:

1. Navigate to the W\&B Registry at [https://wandb.ai/registry/](https://wandb.ai/registry/).
2. Select the registry you want to remove a user or team from.
3. Click the gear icon in the upper right corner to access the registry settings.
4.
Navigate to the **Registry access** section and type in the username, email, or team you want to remove.
5. Click the **Delete** button.

Removing a user from a team also removes that user's access to the registry.

### Change the owner of a registry

A registry admin can designate any member as a registry's owner, including a **Restricted Viewer** or a **Viewer**. Registry ownership is primarily for accountability purposes and does not confer any additional permissions beyond those granted by the user's assigned role.

To change the owner:

1. Navigate to the W\&B Registry at [https://wandb.ai/registry/](https://wandb.ai/registry/).
2. Select the registry you want to configure.
3. Click the gear icon in the upper right corner.
4. Scroll to the **Registry members and roles** section.
5. Hover over the row for a member.
6. Click the **action** menu at the end of the row, then click **Make owner**.

## Configure Registry roles

This section shows how to configure roles for Registry members. For more information about Registry roles, including the capabilities of each role, order of precedence, defaults, and more, see [Details about Registry roles](#details-about-registry-roles).

1. Navigate to the W\&B Registry at [https://wandb.ai/registry/](https://wandb.ai/registry/).
2. Select the registry you want to configure.
3. Click the gear icon in the upper right corner.
4. Scroll to the **Registry members and roles** section.
5. Within the **Member** field, search for the user or team you want to edit permissions for.
6. In the **Registry role** column, click the user's role.
7. From the dropdown, select the role you want to assign to the user.

## Details about Registry roles

The following sections give more information about Registry roles.

Your [role in a team](/platform/app/settings-page/teams/#team-roles-and-permissions) has no impact or relationship to your role in any registry.
### Default roles

W\&B automatically assigns a default **registry role** to a user or team when they are added to a registry. This role determines what they can do in that registry.

| Entity | Default registry role<br />(Dedicated Cloud / Self-Managed) | Default registry role<br />(Multi-tenant Cloud) |
| ----------------------------------- | --------------------------------------------------------------------------- | ---------------------------------------------------------- |
| Team | Restricted Viewer (Server v0.75.0+)<br />Viewer (Server v0.74.x and below) | Restricted Viewer |
| User or service account (non admin) | Restricted Viewer (Server v0.75.0+)<br />Viewer (Server v0.74.x and below) | Restricted Viewer |
| Service account (non admin) | Member<sup>1</sup> | Member<sup>1</sup> |
| Org admin | Admin | Admin |

<sup>1</sup>: Service accounts cannot have **Viewer** or **Restricted Viewer** roles.

A registry admin can assign or modify roles for users and teams in the registry. See [Configure user roles in a registry](/models/registry/configure_registry/#configure-registry-roles) for more information.

### Role permissions

The following table lists each Registry role, along with the permissions provided by each role:

| Permission | Permission Group | Restricted Viewer<br />(Multi-tenant Cloud, by invitation) | Viewer | Member | Admin |
| ---------------------------------------------------------------------------------------------------- | ---------------- | ---------------------------------------------------------- | :----: | :----: | :---: |
| View a collection's details | Read | ✓ | ✓ | ✓ | ✓ |
| View a linked artifact's details | Read | ✓ | ✓ | ✓ | ✓ |
| Usage: Consume an artifact in a registry with use\_artifact | Read | | ✓ | ✓ | ✓ |
| Download a linked artifact | Read | | ✓ | ✓ | ✓ |
| Download files from an artifact's file viewer | Read | | ✓ | ✓ | ✓ |
| Search a registry | Read | ✓ | ✓ | ✓ | ✓ |
| View a registry's settings and user list | Read | ✓ | ✓ | ✓ | ✓ |
| Create a new automation for a collection | Create | | | ✓ | ✓ |
| Turn on Slack notifications for new version being added | Create | | | ✓ | ✓ |
| Create a new collection | Create | | | ✓ | ✓ |
| Create a new registry | Create | | | ✓ | ✓ |
| Edit collection card (description) | Update | | | ✓ | ✓ |
| Edit linked artifact description | Update | | | ✓ | ✓ |
| Add or delete a collection's tag | Update | | | ✓ | ✓ |
| Add or delete an alias from a linked artifact | Update | | | ✓ | ✓ |
| Add or delete a [protected alias](/models/registry/aliases#protected-aliases) from a linked artifact | Update | | | | ✓ |
| Create or delete a [protected alias](/models/registry/aliases#protected-aliases) | Update | | | | ✓ |
| Link a new artifact | Update | | | ✓ | ✓ |
| Edit allowed types list for a registry | Update | | | ✓ | ✓ |
| Edit registry name | Update | | | ✓ | ✓ |
| Delete a collection | Delete | | | ✓ | ✓ |
| Delete an automation | Delete | | | ✓ | ✓ |
| Unlink an artifact from a registry | Delete | | | ✓ | ✓ |
| Edit accepted artifact types for a registry | Admin | | | | ✓ |
| Change registry visibility (Organization or Restricted) | Admin | | | | ✓ |
| Add users to a registry | Admin | | | | ✓ |
| Assign or change a user's role in a registry | Admin | | | | ✓ |

### Inherited Registry role

The registry's membership list shows each user's inherited (effective) registry role (in light gray) next to the role dropdown in their row.

A user's effective role in a particular registry matches their *highest* role among their role in the organization, the registry, and the team that owns the registry, whether inherited or explicitly assigned. For example:

* A team **Admin** or organization **Admin** with the **Viewer** role in a particular registry owned by the team is effectively an **Admin** of the registry.
* A registry **Viewer** with the **Member** role in the team is effectively a **Member** of the registry.
* A team **Viewer** with the **Member** role in a particular registry is effectively a **Member** of the registry.

### Restricted Viewer role details

The **Restricted Viewer** role is Generally Available (GA). For Dedicated Cloud and Self-Managed, Server v0.75.0 or newer is required.

This role provides read-only access to registry artifacts without the ability to create, update, or delete collections, automations, or other registry resources.

Unlike a **Viewer**, a **Restricted Viewer**:

* Cannot download artifact files or access file contents.
* Cannot use artifacts with `wandb.Run.use_artifact()` in the W\&B SDK.

#### SDK compatibility

**SDK version requirement**

To use the W\&B SDK to access artifacts as a **Restricted Viewer**, you must use W\&B SDK version 0.19.9 or higher. Otherwise, some SDK commands will result in permission errors.

When a **Restricted Viewer** uses the SDK, certain functions are not available or work differently.
The following methods are not available and result in permission errors: * [`Run.use_artifact()`](/models/ref/python/experiments/run/#method-runuse_artifact) * [`Artifact.download()`](/models/ref/python/experiments/artifact/#method-artifactdownload) * [`Artifact.file()`](/models/ref/python/experiments/artifact/#method-artifactfile) * [`Artifact.files()`](/models/ref/python/experiments/artifact/#method-artifactfiles) The following methods are limited to artifact metadata: * [`Artifact.get_entry()`](/models/ref/python/experiments/artifact/#method-artifactget_entry) * [`Artifact.get_path()`](/models/ref/python/experiments/artifact/#method-artifactget_path) * [`Artifact.get()`](/models/ref/python/experiments/artifact/#method-artifactget) * [`Artifact.verify()`](/models/ref/python/experiments/artifact/#method-artifactverify) ### Cross-registry permissions A user can have different roles in different registries. For example, a user can be a **Restricted Viewer** in Registry A but a **Viewer** in Registry B. In this case: * The same artifact linked to both registries will have different access levels * In Registry A, the user is a **Restricted Viewer** and cannot download files or use the artifact * In Registry B, the user is a **Viewer** and can download files and use the artifact * In other words, access is determined by the registry in which the artifact is accessed # Create a collection Source: https://docs.wandb.ai/models/registry/create_collection Create a collection of linked artifact versions within a W&B Registry and configure accepted artifact types. A *collection* is a set of linked artifact versions within a registry. Each collection represents a distinct task or use case. For example, within a registry you might have multiple collections. Each collection contains a different dataset such as MNIST, CIFAR-10, or ImageNet. 
As another example, you might have a registry called "chatbot" that contains a collection for model artifacts, another collection for dataset artifacts, and another collection for fine-tuned model artifacts.

How you organize a registry and its collections is up to you.

If you are familiar with W\&B Model Registry, you might be aware of registered models. Registered models in the Model Registry are now referred to as collections in the W\&B Registry.

## Collection types

Each collection accepts one, and only one, *type* of artifact. The type you specify restricts what sort of artifacts you, and other members of your organization, can link to that collection.

You can think of artifact types as similar to data types in programming languages such as Python. In this analogy, a collection can store strings, integers, or floats but not a mix of these data types.

For example, suppose you create a collection that accepts "dataset" artifact types. This means that you can only link future artifact versions that have the type "dataset" to this collection. Similarly, you can only link artifacts of type "model" to a collection that accepts only model artifact types.

You specify an artifact's type when you create that artifact object. Note the `type` field in `wandb.Artifact()`:

```python theme={null}
import wandb

# Initialize a run
with wandb.init(
    entity = "",
    project = ""
) as run:

    # Create an artifact object
    artifact = wandb.Artifact(
        name="",
        type=""
    )
```

When you create a collection, you can select from a list of predefined artifact types. The artifact types available to you depend on the registry that the collection belongs to.

Before you link an artifact to a collection or create a new collection, [investigate the types of artifacts that collection accepts](#check-the-types-of-artifact-that-a-collection-accepts).

### Check the types of artifact that a collection accepts

Before you link to a collection, inspect the artifact type that the collection accepts.
You can inspect the artifact types that a collection accepts programmatically with the W\&B Python SDK or interactively with the W\&B App.

An error message appears if you try to link an artifact to a collection that does not accept that artifact type.

**W\&B App**

You can find the accepted artifact types on the registry card on the homepage or within a registry's settings page. For both methods, first navigate to your W\&B Registry.

Within the homepage of the W\&B Registry, you can view the accepted artifact types by scrolling to the registry card of that registry. The gray horizontal ovals within the registry card list the artifact types that the registry accepts. For example, a **Model** registry card might list two artifact types: **model** and **model-new**.

To view accepted artifact types within a registry's settings page:

1. Click on the registry card you want to view the settings for.
2. Click on the gear icon in the upper right corner.
3. Scroll to the **Accepted artifact types** field.

**Python SDK**

Programmatically view the artifact types that a registry accepts with the W\&B Python SDK:

```python theme={null}
import wandb

registry_name = ""
artifact_types = wandb.Api().project(name=f"wandb-registry-{registry_name}").artifact_types()

# Build a list so the names are printed rather than a generator object
print([artifact_type.name for artifact_type in artifact_types])
```

Note that the preceding code snippet does not initialize a run. Creating a run is unnecessary if you are only querying the W\&B API and not tracking an experiment, artifact, and so on.

Once you know what type of artifact a collection accepts, you can [create a collection](#create-a-collection).

## Create a collection

Interactively or programmatically create a collection within a registry. You cannot change the type of artifact that a collection accepts after you create it.
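Because a collection's accepted type cannot be changed later, it can help to check an artifact's type against the registry before linking. The sketch below is illustrative only. It reuses the `project(...).artifact_types()` query from the snippet in the previous section and assumes the registry lists its accepted types explicitly (behavior for registries that accept all types is not covered here). `accepted_artifact_types` and `registry_accepts` are hypothetical helper names, and `api` is expected to be a `wandb.Api()` instance:

```python
def accepted_artifact_types(api, registry_name):
    """Return the names of the artifact types a registry lists as accepted.

    `api` is assumed to be a wandb.Api() instance. The query mirrors the
    documented snippet; this helper is not part of the W&B SDK.
    """
    project = api.project(name=f"wandb-registry-{registry_name}")
    return [artifact_type.name for artifact_type in project.artifact_types()]


def registry_accepts(api, registry_name, artifact_type):
    """Return True if `artifact_type` appears in the registry's accepted types."""
    return artifact_type in accepted_artifact_types(api, registry_name)
```

A caller might run `registry_accepts(wandb.Api(), "<registry>", "model")` before calling `link_artifact()` and skip the link, or raise a clearer error, when the check returns `False`.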
### Programmatically create a collection

Use the `wandb.Run.link_artifact()` method to link an artifact to a collection. Specify both the collection and the registry in the `target_path` field as a path of the form:

```python theme={null}
f"wandb-registry-{registry_name}/{collection_name}"
```

Where `registry_name` is the name of the registry and `collection_name` is the name of the collection. Make sure to prepend the prefix `wandb-registry-` to the registry name.

W\&B automatically creates a collection for you if you try to link an artifact to a collection that does not exist. If you specify a collection that exists, W\&B links the artifact to the existing collection.

The following code snippet shows how to programmatically create a collection. Replace the placeholder values with your own:

```python theme={null}
import wandb

# Initialize a run
with wandb.init(entity = "", project = "") as run:

    # Create an artifact object
    artifact = wandb.Artifact(
        name = "",
        type = ""
    )

    registry_name = ""
    collection_name = ""
    target_path = f"wandb-registry-{registry_name}/{collection_name}"

    # Link the artifact to a collection
    run.link_artifact(artifact = artifact, target_path = target_path)
```

### Interactively create a collection

The following steps describe how to interactively create a collection using the W\&B Registry:

1. Navigate to the W\&B Registry at [https://wandb.ai/registry/](https://wandb.ai/registry/).
2. Select a registry.
3. Click the **Create collection** button in the upper right corner.
4. Provide a name for your collection in the **Name** field.
5. Select a type from the **Type** dropdown. Or, if the registry enables custom artifact types, provide one or more artifact types that this collection accepts.
6. Optionally provide a description of your collection in the **Description** field.
7. Optionally add one or more tags in the **Tags** field.
8. Click **Link version**.
9. From the **Project** dropdown, select the project where your artifact is stored.
10. From the **Artifact** collection dropdown, select your artifact.
11. From the **Version** dropdown, select the artifact version you want to link to your collection.
12. Click the **Create collection** button.

# Create a registry
Source: https://docs.wandb.ai/models/registry/create_registry

Create a W&B Registry with configurable visibility and accepted artifact types using the App UI or Python SDK.

A registry offers flexibility and control over the artifact types that you can use, allows you to restrict the registry's visibility, and more.

## Create a registry

Create a registry interactively with the W\&B Registry UI or programmatically with the W\&B Python SDK.

To create a registry in the W\&B Registry UI:

1. Navigate to the W\&B Registry at [https://wandb.ai/registry/](https://wandb.ai/registry/).
2. Click the **Create registry** button.
3. Provide a name for the registry in the **Name** field.
4. Optionally provide a description of the registry.
5. Select who can view the registry from the **Registry visibility** dropdown. See [Registry visibility types](./configure_registry#registry-visibility-types) for more information on registry visibility options.
6. Select either **All types** or **Specify types** from the **Accepted artifacts type** dropdown.
7. (If you select **Specify types**) Add one or more artifact types that your registry accepts.
8. Click the **Create registry** button.

Use the [`wandb.Api().create_registry()`](/models/ref/python/#method-apicreate_registry) method to create a registry programmatically. Provide a name and [visibility](#visibility-types) for the registry with the `name` and `visibility` parameters, respectively.

Copy and paste the code block below.
Replace the values enclosed in `<>` with your own:

```python theme={null}
import wandb

registry = wandb.Api().create_registry(
    name="",
    visibility="< 'restricted' | 'organization' >",
)
```

See the [`wandb.Api().create_registry()`](/models/ref/python/#method-apicreate_registry) method reference for a full list of parameters that you can provide when you create a registry.

An artifact type cannot be removed from a registry once it is saved in the registry's settings.

For example, a user creating a registry called `Fine_Tuned_Models` can set its visibility to **Restricted** so that only members who are manually added to the registry can access it.

## Visibility types

The *visibility* of a registry determines who can access that registry. Restricting the visibility of a registry helps ensure that only specified members can access that registry.

A registry has two visibility options:

| Visibility | Description |
| ------------ | ---------------------------------------------------------- |
| Restricted | Only invited organization members can access the registry. |
| Organization | Everyone in the org can access the registry. |

A team admin or registry admin can set the visibility of a registry.

The user who creates a registry with Restricted visibility is added to the registry automatically as its registry admin.

## Configure the visibility of a registry

A team admin or registry admin can assign the visibility of a registry during or after the creation of a registry.

To restrict the visibility of an existing registry:

1. Navigate to the W\&B Registry at [https://wandb.ai/registry/](https://wandb.ai/registry/).
2. Select a registry.
3. Click the gear icon in the upper right corner.
4. From the **Registry visibility** dropdown, select the desired registry visibility.
5. If you select **Restricted visibility**:
   1. Add members of your organization that you want to have access to this registry. Scroll to the **Registry members and roles** section and click the **Add member** button.
   2. Within the **Member** field, add the email or username of the member you want to add.
   3. Click **Add new member**.

See [Create a registry](./create_registry#create-a-custom-registry) for more information on how to assign the visibility of a registry when a team admin creates it.

# Delete registry
Source: https://docs.wandb.ai/models/registry/delete_registry

Delete a W&B Registry as a Team admin or Registry admin using the Python SDK or the W&B App UI.

This page shows how a Team admin or Registry admin can delete a registry.

* A Team admin can delete any registry in the organization.
* A Registry admin can delete a registry that they created.

Deleting a registry also deletes collections that belong to that registry, but does not delete artifacts linked to the registry. Such an artifact remains in the original project that the artifact was logged to.

Use the `wandb` API's `delete()` method to delete a registry programmatically. The following example illustrates how to:

1. Fetch the registry you want to delete with `api.registry()`.
2. Call the `delete()` method on the returned registry object to delete the registry.

```python theme={null}
import wandb

# Initialize the W&B API
api = wandb.Api()

# Fetch the registry you want to delete
fetched_registry = api.registry("")

# Delete the registry
fetched_registry.delete()
```

To delete a registry in the W\&B App:

1. Navigate to the W\&B Registry at [https://wandb.ai/registry/](https://wandb.ai/registry/).
2. Select the registry you want to delete.
3. Click the gear icon in the upper right corner to view the registry's settings.
4. To delete the registry, click the trash can icon in the upper right corner of the settings page.
5. Confirm the registry to delete by entering its name in the modal that appears, then click **Delete**.
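The W\&B App asks you to type the registry's name before deleting it, while the SDK call deletes immediately. A script can mirror that confirmation step. The following is a minimal sketch under stated assumptions: `delete_registry_confirmed` is a hypothetical helper, not part of the W\&B SDK, and `api` is expected to be a `wandb.Api()` instance exposing the `registry()` and `delete()` calls shown above:

```python
def delete_registry_confirmed(api, registry_name, typed_name):
    """Delete a registry only if `typed_name` matches `registry_name` exactly.

    `api` is assumed to be a wandb.Api() instance. The name check imitates
    the confirmation modal in the W&B App; it is an illustrative safeguard.
    """
    if typed_name != registry_name:
        raise ValueError(
            f"Confirmation {typed_name!r} does not match registry {registry_name!r}"
        )
    # Fetch the registry, then delete it
    registry = api.registry(registry_name)
    registry.delete()
    return registry_name
```

Raising before any API call means a mistyped name never reaches the server, which is the main point of the guard.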
# Download an artifact from a registry Source: https://docs.wandb.ai/models/registry/download_use_artifact Download an artifact linked to a W&B Registry collection by constructing its path and using the Python SDK. Use the W\&B Python SDK to download an artifact linked to a registry. To download and use an artifact, you need to know the name of the registry, the name of the collection, and the alias or index of the artifact version you want to download. Once you know the properties of the artifact, you can [construct the path to the linked artifact](#construct-path-to-linked-artifact) and download the artifact. Alternatively, you can [copy and paste a pre-generated code snippet](#copy-and-paste-pre-generated-code-snippet) from the W\&B App UI to download an artifact linked to a registry. ## Construct path to linked artifact To download an artifact linked to a registry, you must know the path of that linked artifact. The path consists of the registry name, collection name, and the alias or index of the artifact version you want to access. Once you have the registry, collection, and alias or index of the artifact version, you can construct the path to the linked artifact using the following string template: ```python theme={null} # Artifact name with version index specified f"wandb-registry-{REGISTRY}/{COLLECTION}:v{INDEX}" # Artifact name with alias specified f"wandb-registry-{REGISTRY}/{COLLECTION}:{ALIAS}" ``` Replace the values within the curly braces `{}` with the name of the registry, collection, and the alias or index of the artifact version you want to access. Use the `wandb.Run.use_artifact()` method to access the artifact and download its contents once you have the path of the linked artifact. The following code snippet shows how to use and download an artifact linked to the W\&B Registry. 
Replace the placeholder values with your own:

```python theme={null}
import wandb

REGISTRY = ''
COLLECTION = ''
ALIAS = ''

with wandb.init(entity = '', project = '') as run:
    artifact_name = f"wandb-registry-{REGISTRY}/{COLLECTION}:{ALIAS}"
    # artifact_name = '' # Copy and paste Full name specified in the W&B Registry UI
    fetched_artifact = run.use_artifact(artifact_or_name = artifact_name)
    download_path = fetched_artifact.download()
```

Calling `wandb.Run.use_artifact()` inside a [run](/models/runs) marks the artifact you download as an input to that run. Marking an artifact as the input to a run enables W\&B to track the lineage of that artifact.

If you do not want to create a run, you can use the `wandb.Api()` object to access the artifact:

```python theme={null}
import wandb

REGISTRY = ""
COLLECTION = ""
VERSION = ""

api = wandb.Api()
artifact_name = f"wandb-registry-{REGISTRY}/{COLLECTION}:{VERSION}"
artifact = api.artifact(name = artifact_name)
```
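The two path templates described in this section (version index or alias) can be wrapped in a small helper so that scripts build the path consistently. This is a sketch for illustration; `registry_artifact_path` is a hypothetical function name and is not provided by the W\&B SDK:

```python
def registry_artifact_path(registry, collection, *, alias=None, index=None):
    """Build the path to a linked artifact version from an alias or a version index.

    Exactly one of `alias` or `index` must be given. Illustrative helper only.
    """
    if (alias is None) == (index is None):
        raise ValueError("Provide exactly one of `alias` or `index`")
    # An index becomes a version alias such as "v0"; an alias is used as-is
    suffix = alias if alias is not None else f"v{index}"
    return f"wandb-registry-{registry}/{collection}:{suffix}"
```

The returned string can be passed as `artifact_or_name` to `run.use_artifact()` or as `name` to `api.artifact()`.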
Example: Use and download an artifact linked to the W\&B Registry The following code example shows how a user can download an artifact linked to a collection called `phi3-finetuned` in the **Fine-tuned Models** registry. The alias of the artifact version is set to `production`. ```python theme={null} import wandb TEAM_ENTITY = "product-team-applications" PROJECT_NAME = "user-stories" REGISTRY = "Fine-tuned Models" COLLECTION = "phi3-finetuned" ALIAS = 'production' # Initialize a run inside the specified team and project with wandb.init(entity=TEAM_ENTITY, project = PROJECT_NAME) as run: artifact_name = f"wandb-registry-{REGISTRY}/{COLLECTION}:{ALIAS}" # Access an artifact and mark it as input to your run for lineage tracking fetched_artifact = run.use_artifact(artifact_or_name = artifact_name) # Download artifact. Returns path to downloaded contents downloaded_path = fetched_artifact.download() ```
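As the comment in the example notes, `download()` returns the path to the downloaded contents. A minimal sketch of inspecting that location with the standard library follows. `list_downloaded_files` is a hypothetical helper, and the temporary directory and file names below only stand in for a real download path.

```python
import tempfile
from pathlib import Path


def list_downloaded_files(download_path):
    """Return the sorted relative paths of all files under download_path."""
    root = Path(download_path)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


# Stand-in for the directory returned by fetched_artifact.download()
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model").mkdir()
    (Path(tmp) / "model" / "weights.bin").write_bytes(b"\x00")
    (Path(tmp) / "config.json").write_text("{}")
    files = list_downloaded_files(tmp)

print(files)  # ['config.json', 'model/weights.bin']
```

In a real script, you would pass `downloaded_path` from the example above instead of the temporary directory.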
See [`wandb.Run.use_artifact()`](/models/ref/python/experiments/run#use_artifact) and [`Artifact.download()`](/models/ref/python/experiments/artifact#download) in the API Reference for parameters and return type.

**Users with a personal entity that belong to multiple organizations**

Users with a personal entity that belong to multiple organizations must also specify either the name of their organization or use a team entity when accessing artifacts linked to a registry.

```python theme={null}
import wandb

REGISTRY = "<registry_name>"
COLLECTION = "<collection_name>"
VERSION = "<version>"

# Option 1: use your team entity to instantiate the API
api = wandb.Api(overrides={"entity": "<team_entity>"})
artifact_name = f"wandb-registry-{REGISTRY}/{COLLECTION}:{VERSION}"
artifact = api.artifact(name = artifact_name)

# Option 2: specify the name of your organization in the path
ORG_NAME = "<org_name>"
api = wandb.Api()
artifact_name = f"{ORG_NAME}/wandb-registry-{REGISTRY}/{COLLECTION}:{VERSION}"
artifact = api.artifact(name = artifact_name)
```

## Copy and paste pre-generated code snippet

W\&B creates a code snippet that you can copy and paste into your Python script, notebook, or terminal to download an artifact linked to a registry.

1. Navigate to the W\&B Registry.
2. Select the name of the registry that contains your artifact.
3. Select the name of the collection.
4. From the list of artifact versions, select the version you want to access.
5. Select the **Usage** tab.
6. Copy the code snippet shown in the **Usage API** section.
7. Paste the code snippet into your Python script, notebook, or terminal.

[Image: Step-by-step process to locate and copy usage code snippet from the registry UI]

# Lineage graphs and audit history
Source: https://docs.wandb.ai/models/registry/lineage

Use lineage graphs to visualize a linked artifact's history and audit a collection's history.

Use a lineage graph to visualize a linked artifact's history. Audit a collection's history to track changes made to artifacts in that collection.
## Lineage graphs

Within a collection in the W\&B Registry, you can view a history of the artifacts that an ML experiment uses. This history is called a *lineage graph*.

A lineage graph shows:

* Artifacts used as [inputs to a run](/models/artifacts/explore-and-traverse-an-artifact-graph#track-the-input-of-a-run).
* Artifacts created as [outputs from a run](/models/artifacts/explore-and-traverse-an-artifact-graph#track-the-output-of-a-run).

In other words, a lineage graph shows the input and output of a run.

For example, the following image shows a typical lineage graph for artifacts created and used throughout an ML experiment:

[Image: Registry lineage]

From left to right, the image shows:

1. Multiple runs log the `split_zoo_dataset:v0` artifact.
2. The "zesty-snowball-7" run uses the `split_zoo_dataset:v0` artifact for training.
3. The output of the "zesty-snowball-7" run is a model artifact called `zoo-qne08r7u:v0`.
4. A run called "glamorous-planet-8" uses the model artifact `zoo-qne08r7u:v0` to evaluate the model.

To view a lineage graph for an artifact in a collection:

1. Navigate to the W\&B Registry.
2. Select the collection that contains the artifact.
3. From the dropdown, select the artifact version whose lineage graph you want to view.
4. Select the **Lineage** tab.
5. Select a node to view detailed information about the run or artifact.

See [Enable lineage graph tracking](/models/artifacts/explore-and-traverse-an-artifact-graph#enable-lineage-graph-tracking) to learn how to track the input and output of a run using the W\&B Python SDK.

The following image shows the expanded detailed view of a run (`zesty-snowball-7`) when you select a run node in the lineage graph:

[Image: Expanded lineage node]

The following image shows the expanded detailed view of an artifact (`zoo-qne08r7u:v0`) when you select an artifact node in the lineage graph:

[Image: Expanded artifact node details]

You can also view lineage graphs for artifacts you log to W\&B that are not part of a collection.
See [Explore artifact graphs](/models/artifacts/explore-and-traverse-an-artifact-graph) for more information.

### Create a custom view

Click **Custom** in the top-right corner of the lineage graph to create a custom view. You can filter and customize the lineage graph with the following options:

* **Filter by artifact type**: Filter by artifact type values logged to W\&B. For example, if you log an artifact with the type "dataset", then "dataset" is available as a filter value.
* **Filter by run job type**: Filter by run job type values logged to W\&B. For example, if you log a run with the job type "training", then "training" is available as a filter value.
* **Include extended lineage**: Display items that are not in the direct lineage for the base version.
* **Include generated artifacts**: Display items that were created programmatically.
* **Expand clusters**: Do not group together similar items that have similar connections.
* **Downstream hops**: Descendant generations relative to the active node.
* **Upstream hops**: Ancestor generations relative to the active node.

To reset the lineage graph to its default view, click the backward arrow button.

## Audit a collection's history

View actions that members of your organization take on a collection. You can view:

* If an alias was added to or removed from an artifact version.
* If an artifact version was added to or removed from a collection.

For both actions, you can view the user that performed the action and the date the action occurred.

To view a collection's action history:

1. Navigate to the W\&B Registry.
2. Select the collection whose action history you want to view.
3. Select the dropdown menu next to the collection name.
4. Select the **Action History** option.

# Link an artifact version to a collection
Source: https://docs.wandb.ai/models/registry/link_version

Link an artifact version to a collection in W&B Registry to share it across your organization.
To make an artifact version available to your organization, *link* it to a [collection](/models/registry/create_collection) in the W\&B Registry. Linking moves the version from a [private, project-level scope to a shared, organization-level scope](/models/registry/create_registry#visibility-types). You can link an artifact version [programmatically by using the W\&B Python SDK or interactively in the W\&B App](/models/registry/link_version#link-an-artifact-to-a-collection). When you link an artifact, W\&B creates a reference between the source artifact and the collection entry. The linked version points to the source artifact version that was logged to a run within a project. You can view both the linked version in the collection and the source version in the project where it was logged. ## Link an artifact to a collection Based on your use case, follow the instructions described in the tabs below to link an artifact version. Before you start, check the following: * The types of artifacts that collection permits. For more information about collection types, see "Collection types" within [Create a collection](./create_collection). * The registry that the collection belongs to already exists. To check that the registry exists, navigate to the [Registry App and search for](/models/registry/search_registry) the name of the registry. Programmatically link an artifact version to a collection with [`wandb.Run.link_artifact()`](/models/ref/python/experiments/run#link_artifact) or [`wandb.Artifact.link()`](/models/ref/python/experiments/artifact#method-artifactlink). Use `wandb.Run.link_artifact()` to link an artifact version [within the context of a run](#link-an-artifact-version-within-the-context-of-a-run). Use `wandb.Artifact.link()` to link an *existing artifact version* [outside the context of a run](#link-an-artifact-version-outside-the-context-of-a-run). 
For both approaches, specify the name of the artifact (`wandb.Artifact(name="<artifact_name>")`), the type of artifact (`wandb.Artifact(type="<collection_type>")`), and the `target_path` (passed to `wandb.Run.link_artifact()` or `wandb.Artifact.link()`) of the collection and registry you want to link the artifact version to.

The target path consists of the prefix `wandb-registry-`, the name of the registry, a forward slash, and the name of the collection:

```text theme={null}
wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}
```

### Link an artifact version within the context of a run

Use `wandb.Run.link_artifact()` to link an artifact version within the context of a run. To do so, first initialize a run with `wandb.init()`. Next, create an artifact object and add files to it. Finally, use the `wandb.Run.link_artifact()` method to link the artifact version to the collection.

When you use this approach, a run is created in your W\&B project. The artifact version is linked to the collection and is associated with that run.

Copy and paste the code snippet below. Replace values enclosed in `<>` with your own:

```python theme={null}
import wandb

entity = "<team_entity>"    # Your team entity
project = "<project_name>"  # The name of the project that contains your artifact

# Initialize a run
with wandb.init(entity = entity, project = project) as run:

    # Create an artifact object
    # The type parameter specifies both the type of the
    # artifact object and the collection type
    artifact = wandb.Artifact(name = "<artifact_name>", type = "<collection_type>")

    # Add the file to the artifact object.
    # Specify the path to the file on your local machine.
    artifact.add_file(local_path = "<local_path_to_artifact>")

    # Specify the collection and registry to link the artifact to
    REGISTRY_NAME = "<registry_name>"
    COLLECTION_NAME = "<collection_name>"
    target_path = f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"

    # Link the artifact to the collection
    run.link_artifact(artifact = artifact, target_path = target_path)
```

### Link an artifact version outside the context of a run

Use `wandb.Artifact.link()` to link an existing artifact version outside the context of a run. With this approach, you do not need to initialize a run with `wandb.init()`, so no run is created in your W\&B project. In other words, the artifact version is linked to the collection without being associated with a run.

First, create an artifact object and add files to it. Next, use the `wandb.Artifact.link()` method to link the artifact version to the collection.

Copy and paste the code snippet below. Replace values enclosed in `<>` with your own:

```python theme={null}
import wandb

# Create an artifact object
# The type parameter specifies both the type of the
# artifact object and the collection type
artifact = wandb.Artifact(name = "<artifact_name>", type = "<collection_type>")

# Add the file to the artifact object.
# Specify the path to the file on your local machine.
artifact.add_file(local_path = "<local_path_to_artifact>")

# Specify the collection and registry to link the artifact to
REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"
target_path = f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}"

# Link the artifact to the collection
artifact.link(target_path = target_path)
```

1. Navigate to the W\&B Registry.

   [Image: W&B Registry UI with project sidebar]
2. Hover your mouse next to the name of the collection you want to link an artifact version to.
3. Select the **action** menu next to **View details**.
4. From the dropdown, select **Link new version**.
5. From the sidebar that appears, select the name of a team from the **Team** dropdown.
6. From the **Project** dropdown, select the name of the project that contains your artifact.
7.
From the **Artifact** dropdown, select the name of the artifact.
8. From the **Version** dropdown, select the artifact version you want to link to the collection.

1. Navigate to your project's artifact browser on the W\&B App at: `https://wandb.ai/<entity>/<project>/artifacts`
2. Select the Artifacts icon in the project sidebar.
3. Click on the artifact version you want to link to your registry.
4. Within the **Version overview** section, click the **Link to registry** button.
5. From the modal that appears on the right of the screen, select an artifact from the **Select a register model** menu dropdown.
6. Click **Next step**.
7. (Optional) Select an alias from the **Aliases** dropdown.
8. Click **Link to registry**.

You can [view a linked artifact's metadata, version data, usage, lineage information](/models/registry/link_version#view-linked-artifacts-in-a-registry) and more in the Registry App.

## View linked artifacts in a registry

View information about linked artifacts such as metadata, lineage, and usage information in the W\&B Registry.

1. Navigate to the W\&B Registry.
2. Select the name of the registry that you linked the artifact to.
3. Select the name of the collection.
4. If the collection's artifacts log metrics, compare metrics across versions by clicking **Show metrics**.
5. From the list of artifact versions, select the version you want to access. Version numbers are incrementally assigned to each linked artifact version starting with `v0`.
6. To view details about an artifact version, click the version. From the tabs on this page, you can view that version's metadata (including logged metrics), lineage, and usage information.

Make note of the **Full Name** field within the **Version** tab. The full name of a linked artifact consists of the registry, collection name, and the alias or index of the artifact version.
```text title="Full name of a linked artifact" theme={null}
wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{INTEGER}
```

You need the full name of a linked artifact to access the artifact version programmatically.

## Troubleshooting

Below are some common things to double-check if you are not able to link an artifact.

### Logging artifacts from a personal account

Artifacts logged to W\&B with a personal entity cannot be linked to the registry. Only artifacts logged within an organization's team can be linked to the organization's registry, so make sure that you log artifacts using a team entity within your organization.

#### Find your team entity

W\&B uses the name of your team as the team's entity. For example, if your team is called **team-awesome**, your team entity is `team-awesome`.

To confirm the name of your team:

1. Navigate to your team's W\&B profile page.
2. Copy the site's URL. It has the form `https://wandb.ai/<team>`, where `<team>` is both the name of your team and the team's entity.

#### Log from a team entity

1. Specify the team as the entity when you initialize a run with [`wandb.init()`](/models/ref/python/functions/init). If you do not specify the `entity` when you initialize a run, the run uses your default entity, which may or may not be your team entity.

   ```python theme={null}
   import wandb

   with wandb.init(entity='<team_entity>', project='<project_name>') as run:
       # Log your artifact here
       pass
   ```

2. Log the artifact to the run either with `wandb.Run.log_artifact()` or by creating an Artifact object and then adding files to it with:

   ```python theme={null}
   artifact = wandb.Artifact(name="<artifact_name>", type="<type>")
   ```

   To log artifacts, see [Construct artifacts](/models/artifacts/construct-an-artifact/).
3. If an artifact is logged to your personal entity, you will need to re-log it to an entity within your organization.
### Confirm the path of a registry in the W\&B App UI

There are two ways to confirm the path of a registry with the UI: create an empty collection and view the collection details, or copy and paste the autogenerated code on the collection's homepage.

#### Copy and paste autogenerated code

1. Navigate to the W\&B Registry at [https://wandb.ai/registry/](https://wandb.ai/registry/).
2. Click the registry you want to link an artifact to.
3. At the top of the page, you will see an autogenerated code block.
4. Copy and paste this code block into your code. Make sure to replace the last part of the path with the name of your collection.

[Image: Auto-generated code snippet]

#### Create an empty collection

1. Navigate to the W\&B Registry at [https://wandb.ai/registry/](https://wandb.ai/registry/).
2. Click the registry you want to link an artifact to.
3. Click on the empty collection. If an empty collection does not exist, create a new collection.
4. Within the code snippet that appears, identify the `target_path` field within `.link_artifact()`.
5. (Optional) Delete the collection.

[Image: Create an empty collection]

For example, after completing the steps outlined, you find the code block with the `target_path` parameter:

```python theme={null}
target_path = "smle-registries-bug-bash/wandb-registry-Golden Datasets/raw_images"
```

Breaking this down into its components, you can see what you need to create the path to link your artifact programmatically:

```python theme={null}
ORG_ENTITY_NAME = "smle-registries-bug-bash"
REGISTRY_NAME = "Golden Datasets"
COLLECTION_NAME = "raw_images"
```

Make sure that you replace the name of the temporary collection with the name of the collection that you want to link your artifact to.

# Organize versions with tags
Source: https://docs.wandb.ai/models/registry/organize-with-tags

Use tags to organize collections or artifact versions within collections. You can add, remove, and edit tags with the Python SDK or W&B App UI.
Create and add tags to organize your collections or artifact versions within your registry. Add, modify, view, or remove tags to a collection or artifact version with the W\&B App UI or the W\&B Python SDK. **When to use a tag versus using an alias** Use aliases when you need to reference a specific artifact version uniquely. For example, use an alias such as 'production' or 'latest' to ensure that `artifact_name:alias` always points to a single, specific version. Use tags when you want more flexibility for grouping or searching. Tags are ideal when multiple versions or collections can share the same label, and you don’t need the guarantee that only one version is associated with a specific identifier. ## Add a tag to a collection Use the W\&B App UI or Python SDK to add a tag to a collection: Use the W\&B App UI to add a tag to a collection: 1. Navigate to the [W\&B Registry](https://wandb.ai/registry). 2. Click on a registry card. 3. Click **View details** next to the name of a collection. 4. Within the collection card, click on the plus icon (**+**) next to the **Tags** field and type in the name of the tag. 5. Press **Enter** on your keyboard. Adding tags to a Registry collection ```python theme={null} import wandb COLLECTION_TYPE = "" REGISTRY_NAME = "" COLLECTION_NAME = "" full_name = f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}" collection = wandb.Api().artifact_collection( type_name = COLLECTION_TYPE, name = full_name ) collection.tags = ["your-tag"] collection.save() ``` ## Update tags that belong to a collection Update a tag programmatically by reassigning or by mutating the `tags` attribute. W\&B recommends, and it is good Python practice, that you reassign the `tags` attribute instead of in-place mutation. For example, the following code snippet shows common ways to update a list with reassignment. 
For brevity, we continue the code example from the [Add a tag to a collection section](#add-a-tag-to-a-collection):

```python theme={null}
collection.tags = [*collection.tags, "new-tag", "other-tag"]
collection.tags = collection.tags + ["new-tag", "other-tag"]

collection.tags = set(collection.tags) - set(tags_to_delete)
collection.tags = []  # deletes all tags
```

The following code snippet shows how you can use in-place mutation to update tags that belong to a collection:

```python theme={null}
collection.tags += ["new-tag", "other-tag"]
collection.tags.append("new-tag")

collection.tags.extend(["new-tag", "other-tag"])
collection.tags[:] = ["new-tag", "other-tag"]
collection.tags.remove("existing-tag")
collection.tags.pop()
collection.tags.clear()
```

## View tags that belong to a collection

Use the W\&B App UI to view tags added to a collection:

1. Navigate to the [W\&B Registry](https://wandb.ai/registry).
2. Click on a registry card.
3. Click **View details** next to the name of a collection.

If a collection has one or more tags, you can view those tags within the collection card next to the **Tags** field.

[Image: Registry collection with selected tags]

Tags added to a collection also appear next to the name of that collection.

For example, in the following image, a tag called "tag1" was added to the "zoo-dataset-tensors" collection.

[Image: Tag management]

## Remove a tag from a collection

Use the W\&B App UI to remove a tag from a collection:

1. Navigate to the [W\&B Registry](https://wandb.ai/registry).
2. Click on a registry card.
3. Click **View details** next to the name of a collection.
4. Within the collection card, hover your mouse over the name of the tag you want to remove.
5. Click on the cancel button (**X** icon).

## Add a tag to an artifact version

Add a tag to an artifact version linked to a collection with the W\&B App UI or with the Python SDK.

1. Navigate to the W\&B Registry at [https://wandb.ai/registry](https://wandb.ai/registry)
2.
Click on a registry card.
3. Click **View details** next to the name of the collection that contains the artifact version you want to tag.
4. Scroll down to **Versions**.
5. Click **View** next to an artifact version.
6. Within the **Version** tab, click on the plus icon (**+**) next to the **Tags** field and type in the name of the tag.
7. Press **Enter** on your keyboard.

[Image: Adding tags to artifact versions]

Fetch the artifact version whose tags you want to add or update. Once you have the artifact version, use the artifact object's `tags` attribute to add or modify the tags of that artifact. Pass in one or more tags as a list to the artifact's `tags` attribute.

Like other artifacts, you can fetch an artifact from W\&B without creating a run, or you can create a run and fetch the artifact within that run. In either case, make sure to call the artifact object's `save` method to update the artifact on the W\&B servers.

Copy and paste the appropriate code snippet below to add or modify an artifact version's tags. Replace the values in `<>` with your own.
The following code snippet shows how to fetch an artifact and add a tag without creating a new run: ```python title="Add a tag to an artifact version without creating a new run" theme={null} import wandb ARTIFACT_TYPE = "" REGISTRY_NAME = "" COLLECTION_NAME = "" VERSION = "" artifact_name = f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}" artifact = wandb.Api().artifact(name = artifact_name, type = ARTIFACT_TYPE) artifact.tags = ["tag2"] # Provide one or more tags in a list artifact.save() ``` The following code snippet shows how to fetch an artifact and add a tag by creating a new run: ```python title="Add a tag to an artifact version during a run" theme={null} import wandb REGISTRY_NAME = "" COLLECTION_NAME = "" VERSION = "" with wandb.init(entity = "", project="") as run: artifact_name = f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}" artifact = run.use_artifact(artifact_or_name = artifact_name) artifact.tags = ["tag2"] # Provide one or more tags in a list artifact.save() ``` ## Update tags that belong to an artifact version Update a tag programmatically by reassigning or by mutating the `tags` attribute. W\&B recommends, and it is good Python practice, that you reassign the `tags` attribute instead of in-place mutation. For example, the following code snippet shows common ways to update a list with reassignment. 
For brevity, we continue the code example from the [Add a tag to an artifact version section](#add-a-tag-to-an-artifact-version):

```python theme={null}
artifact.tags = [*artifact.tags, "new-tag", "other-tag"]
artifact.tags = artifact.tags + ["new-tag", "other-tag"]

artifact.tags = set(artifact.tags) - set(tags_to_delete)
artifact.tags = []  # deletes all tags
```

The following code snippet shows how you can use in-place mutation to update tags that belong to an artifact version:

```python theme={null}
artifact.tags += ["new-tag", "other-tag"]
artifact.tags.append("new-tag")

artifact.tags.extend(["new-tag", "other-tag"])
artifact.tags[:] = ["new-tag", "other-tag"]
artifact.tags.remove("existing-tag")
artifact.tags.pop()
artifact.tags.clear()
```

## View tags that belong to an artifact version

View tags that belong to an artifact version that is linked to a registry with the W\&B App UI or with the Python SDK.

1. Navigate to the [W\&B Registry](https://wandb.ai/registry).
2. Click on a registry card.
3. Click **View details** next to the name of the collection that contains the artifact version.
4. Scroll down to the **Versions** section.

If an artifact version has one or more tags, you can view those tags within the **Tags** column.

[Image: Artifact version with tags]

Fetch the artifact version to view its tags. Once you have the artifact version, you can view the tags that belong to that artifact through the artifact object's `tags` attribute.

Similar to other artifacts, you can fetch an artifact from W\&B without creating a run, or you can create a run and fetch the artifact within that run.

Copy and paste the appropriate code snippet below to view an artifact version's tags. Replace the values in `<>` with your own.
The following code snippet shows how to fetch and view an artifact version's tags without creating a new run:

```python title="View the tags of an artifact version without creating a new run" theme={null}
import wandb

ARTIFACT_TYPE = "<type>"
REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"
VERSION = "<artifact_version>"

artifact_name = f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}"

artifact = wandb.Api().artifact(name = artifact_name, type = ARTIFACT_TYPE)
print(artifact.tags)
```

The following code snippet shows how to fetch and view an artifact version's tags by creating a new run:

```python title="View the tags of an artifact version during a run" theme={null}
import wandb

REGISTRY_NAME = "<registry_name>"
COLLECTION_NAME = "<collection_name>"
VERSION = "<artifact_version>"

with wandb.init(entity = "<entity>", project="<project>") as run:
    artifact_name = f"wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}:v{VERSION}"
    artifact = run.use_artifact(artifact_or_name = artifact_name)
    print(artifact.tags)
```

## Remove a tag from an artifact version

1. Navigate to the [W\&B Registry](https://wandb.ai/registry).
2. Click on a registry card.
3. Click **View details** next to the name of the collection that contains the artifact version.
4. Scroll down to **Versions**.
5. Click **View** next to an artifact version.
6. Within the **Version** tab, hover your mouse over the name of the tag.
7. Click on the cancel button (**X** icon).

## Search existing tags

Use the W\&B App UI to search existing tags in collections and artifact versions:

1. Navigate to the [W\&B Registry](https://wandb.ai/registry).
2. Click on a registry card.
3. Within the search bar, type in the name of a tag.
Tag-based search ## Find artifact versions with a specific tag Use the W\&B Python SDK to find artifact versions that have a set of tags: ```python theme={null} import wandb api = wandb.Api() tagged_artifact_versions = api.artifacts( type_name = "", name = "", tags = ["", ""] ) for artifact_version in tagged_artifact_versions: print(artifact_version.tags) ``` # Annotate collections Source: https://docs.wandb.ai/models/registry/registry_cards Add descriptions and documentation to W&B Registry collections to help users understand their purpose and contents. Add human-friendly text to your collections to help users understand the purpose of the collection and the artifacts it contains. Depending on the collection, you might want to include information about the training data, model architecture, task, license, references, and deployment. The following lists some topics worth documenting in a collection: W\&B recommends including at minimum these details: * **Summary**: The purpose of the collection. The machine learning framework used for the machine learning experiment. * **License**: The legal terms and permissions associated with the use of the machine learning model. It helps model users understand the legal framework under which they can utilize the model. Common licenses include Apache 2.0, MIT, and GPL. * **References**: Citations or references to relevant research papers, datasets, or external resources. If your collection contains training data, consider including these additional details: * **Training data**: Describe the training data used * **Processing**: Processing done on the training data set. * **Data storage**: Where is that data stored and how to access it. If your collection contains a machine learning model, consider including these additional details: * **Architecture**: Information about the model architecture, layers, and any specific design choices. 
* **Task**: The specific type of task or problem that the collection's model is designed to perform. It's a categorization of the model's intended capability.
* **Deserialize the model**: Provide information on how someone on your team can load the model into memory.
* **Deployment**: Details on how and where the model is deployed and guidance on how the model is integrated into other enterprise systems, such as workflow orchestration platforms.

## Add a description to a collection

Interactively or programmatically add a description to a collection with the W\&B Registry UI or Python SDK.

1. Navigate to the [W\&B Registry](https://wandb.ai/registry/).
2. Click on a collection.
3. Select **View details** next to the name of the collection.
4. Within the **Description** field, provide information about your collection. Format the text with [Markdown markup language](https://www.markdownguide.org/).

Use the [`wandb.Api().artifact_collection()`](/models/ref/python/public-api/api#artifact_collection) method to access a collection's description. Use the returned object's `description` property to add or update the collection's description.

Specify the collection's type for the `type_name` parameter and the collection's full name for the `name` parameter. A collection's full name consists of the prefix `wandb-registry-`, the name of the registry, a forward slash, and the name of the collection:

```text theme={null}
wandb-registry-{REGISTRY_NAME}/{COLLECTION_NAME}
```

Copy and paste the following code snippet into your Python script or notebook. Replace values enclosed in angle brackets (`<>`) with your own.

```python theme={null}
import wandb

api = wandb.Api()

collection = api.artifact_collection(
    type_name = "<collection_type>",
    name = "<collection_full_name>"
)

collection.description = "This is a description."
collection.save()
```

For example, the following image shows a collection that documents a model's architecture, intended use, performance information and more.
[Image: Collection card]

# Find registry items
Source: https://docs.wandb.ai/models/registry/search_registry

Learn how to search for registries, collections, and artifact versions in the W&B Registry using the global search bar or queries.

Use the [global search bar in the W\&B Registry](./search_registry#search-for-registry-items) to find a registry, collection, artifact version tag, collection tag, or alias. You can use queries to [filter registries, collections, and artifact versions](/models/registry/search_registry#query-registry-items) based on specific criteria using the W\&B Python SDK. The syntax and available operators you can use to query the W\&B Registry are similar, but not identical, to MongoDB queries.

Only items that you have permission to view appear in the search results.

## Search for registry items

Use the W\&B App to search for a registry item:

1. Navigate to the W\&B Registry.
2. Specify the search term in the search bar at the top of the page. Press Enter to search.

Search results appear below the search bar if the term you specify matches an existing registry, collection name, artifact version tag, collection tag, or alias.

[Image: Searching within a Registry]

## Query registry items

Use [`wandb.Api().registries()`](/models/ref/python/public-api/api#registries) and *query predicates* to filter registries, collections, and artifact versions. A query predicate is a condition that specifies the criteria that returned items must meet.

To create a query predicate, use a JSON-like dictionary that consists of a [query name](/models/registry/search_registry#filterable-fields), one or more [operators](/models/registry/search_registry#supported-operators), and values.
The following code snippet shows the general structure of a query predicate:

```python theme={null}
{
    "query_name": {
        "operator": value
    }
}
```

The following sections describe the available registry [query names](/models/registry/search_registry#filterable-fields), [supported operators](/models/registry/search_registry#supported-operators), and [example queries](/models/registry/search_registry#example-queries).

### Filterable fields

The following table lists query names you can use based on the type of item you want to filter:

| Item type   | Query names                                              |
| ----------- | -------------------------------------------------------- |
| registries  | `name`, `description`, `created_at`, `updated_at`        |
| collections | `name`, `tag`, `description`, `created_at`, `updated_at` |
| versions    | `tag`, `alias`, `created_at`, `updated_at`, `metadata`   |

### Supported operators

W\&B supports the following comparison and logical operators for filtering registry items:

#### Comparison operators

| Operator | Description              |
| -------- | ------------------------ |
| `$eq`    | Equal to                 |
| `$ne`    | Not equal to             |
| `$gt`    | Greater than             |
| `$gte`   | Greater than or equal to |
| `$lt`    | Less than                |
| `$lte`   | Less than or equal to    |

#### Logical operators

| Operator | Description                                                                                       |
| -------- | ------------------------------------------------------------------------------------------------- |
| `$and`   | Performs [AND](https://en.wikipedia.org/wiki/Logical_conjunction) logic on one or more conditions |
| `$or`    | Performs [OR](https://en.wikipedia.org/wiki/Logical_disjunction) logic on one or more conditions  |
| `$nor`   | Performs [NOR](https://en.wikipedia.org/wiki/Logical_NOR) logic on one or more conditions         |
| `$not`   | Performs [NOT](https://en.wikipedia.org/wiki/Negation) logic on a condition                       |

#### Other operators

| Operator    | Description                         |
| ----------- | ----------------------------------- |
| `$regex`    | Regular expression pattern matching |
| `$exists`   | Field exists/doesn't exist          |
| `$contains` | String contains value               |

### Example queries

The following code examples demonstrate some common search scenarios.

To use the `wandb.Api().registries()` method, first import the W\&B Python SDK ([`wandb`](/models/ref/python/)) library:

```python theme={null}
import wandb

# (Optional) Create an instance of the wandb.Api() class for readability
api = wandb.Api()
```

Filter all registries whose name contains the string `model`:

```python theme={null}
# Filter all registries whose name contains the string `model`
registry_filters = {
    "name": {"$regex": "model"}
}

# Returns an iterable of all registries that match the filters
registries = api.registries(filter=registry_filters)
```

Filter all collections, independent of registry, that contain the string `yolo` in the collection name:

```python theme={null}
# Filter all collections, independent of registry, that
# contain the string `yolo` in the collection name
collection_filters = {
    "name": {"$regex": "yolo"}
}

# Returns an iterable of all collections that match the filters
collections = api.registries().collections(filter=collection_filters)
```

Filter all collections, independent of registry, that contain the string `yolo` in the collection name and have `cnn` as a tag:

```python theme={null}
# Filter all collections, independent of registry, that contain the
# string `yolo` in the collection name and have `cnn` as a tag
collection_filters = {
    "name": {"$regex": "yolo"},
    "tag": "cnn"
}

# Returns an iterable of all collections that match the filters
collections = api.registries().collections(filter=collection_filters)
```

Find all artifact versions in registries whose name contains the string `model`, and that have either the tag `image-classification` or the `production` alias:

```python theme={null}
# Find all artifact versions in registries whose name contains the
# string `model`, and that have either the tag `image-classification`
# or the `production` alias
registry_filters = {
    "name": {"$regex": "model"}
}

# Use logical $or operator to filter artifact versions
version_filters = {
    "$or": [
        {"tag": "image-classification"},
        {"alias": "production"}
    ]
}

# Returns an iterable of all artifact versions that match the filters
artifacts = api.registries(filter=registry_filters).collections().versions(filter=version_filters)
```

Each item in the `artifacts` iterable in the previous code snippet is an instance of the `Artifact` class. This means that you can access each artifact's attributes, such as `name`, `collection`, `aliases`, `tags`, `created_at`, and more:

```python theme={null}
for art in artifacts:
    print(f"artifact name: {art.name}")
    print(f"collection artifact belongs to: {art.collection.name}")
    print(f"artifact aliases: {art.aliases}")
    print(f"tags attached to artifact: {art.tags}")
    print(f"artifact created at: {art.created_at}\n")
```

For a complete list of an artifact object's attributes, see the [Artifacts Class](/models/ref/python/experiments/artifact/) in the API Reference docs.

Filter all artifact versions, independent of registry or collection, that have the `latest` alias and were created between 2024-01-08 and 2025-03-04 at 13:10 UTC:

```python theme={null}
# Find all artifact versions with the `latest` alias that were
# created between 2024-01-08 and 2025-03-04 at 13:10 UTC.
artifact_filters = {
    "alias": "latest",
    "created_at": {"$gte": "2024-01-08", "$lte": "2025-03-04 13:10:00"},
}

# Returns an iterable of all artifact versions that match the filters
artifacts = api.registries().collections().versions(filter=artifact_filters)
```

Specify the date and time in `YYYY-MM-DD HH:MM:SS` format for `created_at` and `updated_at` queries. You can omit the hours, minutes, and seconds if you want to filter by date only.

# Reports overview

Source: https://docs.wandb.ai/models/reports

Project management and collaboration tools for machine learning projects

Use W\&B Reports to:

* Organize Runs.
* Embed and automate visualizations.
* Describe your findings.
* Share updates with collaborators, either as a LaTeX zip file or a PDF.
The following image shows a section of a report created from metrics that were logged to W\&B over the course of training.

[Image: W&B report with benchmark results]

View the [report that this image was taken from](https://wandb.ai/stacey/saferlife/reports/SafeLife-Benchmark-Experiments--Vmlldzo0NjE4MzM).

## How it works

Create a collaborative report with a few clicks.

1. Navigate to your W\&B project workspace in the W\&B App.
2. Click the **Create report** button in the upper right corner of your workspace.
3. A modal titled **Create Report** appears. Select the charts and panels you want to add to your report. You can add or remove charts and panels later.
4. Click **Create report**.
5. Edit the report to your desired state.
6. Click **Publish to project**.
7. Click the **Share** button to share your report with collaborators.

See the [Create a report](/models/reports/create-a-report/) page for more information on how to create reports interactively and programmatically with the W\&B Python SDK.

## How to get started

Depending on your use case, explore the following resources to get started with W\&B Reports:

* Check out our [video demonstration](https://www.youtube.com/watch?v=2xeJIv_K_eI) to get an overview of W\&B Reports.
* Explore the [Reports gallery](/models/reports/reports-gallery/) for examples of live reports.
* Try the [Programmatic Workspaces](https://colab.research.google.com/github/wandb/wandb-workspaces/blob/Update-wandb-workspaces-tuturial/Workspace_tutorial.ipynb) notebook to learn how to create and customize your workspace.
* Read curated Reports in [W\&B Fully Connected](https://wandb.me/fc).

## Recommended practices and tips

For recommended practices and tips for reports, see [Best Practices: Reports](https://wandb.ai/wandb/pytorch-lightning-e2e/reports/W-B-Best-Practices-Guide--VmlldzozNTU1ODY1#reports).
# Clone and export reports

Source: https://docs.wandb.ai/models/reports/clone-and-export-reports

Export W&B Reports as PDF or LaTeX files, and clone reports using the App UI or the Report and Workspace API.

W\&B Report and Workspace API is in Public Preview.

This page describes how to export a W\&B Report to a portable file format. It also describes how to clone an existing report to reuse its structure as a starting point for new work.

## Export a report

Export a report as a PDF or LaTeX file to share its contents outside of W\&B or to archive a static version of your analysis.

In your report, select the **action** menu. Choose **Download** and select either PDF or LaTeX output format.

## Clone a report

Clone a report to reuse an existing project's template and formatting as the basis for a new report. You can clone reports in the W\&B App UI or programmatically with the Report and Workspace API.

In your report, select the **action** menu. Choose **Clone this report**. In the modal, pick a destination for your cloned report. Choose **Clone report**.

When you clone a report, you specify the destination. If you clone it to a team, all team members can view it. If you clone it to your personal account, only you can view it by default.

Use the Report and Workspace API to load an existing report from its URL and reuse it as a template for a new report. Replace `PROJECT` with the name of your W\&B project.

```python theme={null}
import wandb_workspaces.reports.v2 as wr

PROJECT = "PROJECT"  # Replace with the name of your W&B project

report = wr.Report(
    project=PROJECT, title="Quickstart Report", description="That was easy!"
)  # Create
report.save()  # Save
new_report = wr.Report.from_url(report.url)  # Load
```

After you load the report, edit the content in `new_report.blocks` to customize the cloned report, then save it. Replace `ENTITY` with your W\&B entity name.
```python theme={null}
pg = wr.PanelGrid(
    runsets=[
        wr.Runset(ENTITY, PROJECT, "First Run Set"),
        wr.Runset(ENTITY, PROJECT, "Elephants Only!", query="elephant"),
    ],
    panels=[
        wr.LinePlot(x="Step", y=["val_acc"], smoothing_factor=0.8),
        wr.BarPlot(metrics=["acc"]),
        wr.MediaBrowser(media_keys="img", num_columns=1),
        wr.RunComparer(diff_only="split", layout={"w": 24, "h": 9}),
    ],
)

new_report.blocks = (
    report.blocks[:1] + [wr.H1("Panel Grid Example"), pg] + report.blocks[1:]
)
new_report.save()
```

# Collaborate on reports

Source: https://docs.wandb.ai/models/reports/collaborate-on-reports

Collaborate and share W&B Reports with peers, co-workers, and your team.

This page describes ways to collaborate on reports with your team. Use these workflows to share results, gather feedback, and keep important reports accessible. You can share a report, edit it collaboratively, add comments, or star it for quick access.

## Share a report

When viewing a report, click **Share**. You can share a report in one of the following ways:

* To share a link to the report with an email address or a username, click **Invite**. Enter an email address or username, select **Can view** or **Can edit**, then click **Invite**. If you share by email, the recipient doesn't need to be a member of your organization or team.
* To generate a sharing link instead, click **Share**. Adjust the permissions for the link, then click **Copy report link**. Share the link with the member.

When viewing the report, click a panel to open it in full-screen mode. If you copy the URL from the browser and share it with another user, the panel opens directly in full-screen mode when they access the link.

## Edit a report

Multiple team members can edit the same report and publish changes when they're ready. When any team member clicks the **Edit** button to begin editing the report, W\&B automatically saves a draft. Select **Save to report** to publish your changes.
If an edit conflict occurs, such as when two team members edit the report at once, a warning notification helps you resolve any conflicts.

## Comment on reports

To leave a comment on a report, click **Comment**.

To comment directly on a panel, hover over the panel, then click the comment button.

## Star a report

If your team has many reports, click the open star icon at the top of a report to add it to your favorites. To remove the star, click the closed star icon.

When viewing your team's list of reports, click the open star in a report's row to add it to your favorites. Starred reports appear at the top of the list. From the list of reports, you can see how many members have starred each report to gauge its popularity.

# Create a report

Source: https://docs.wandb.ai/models/reports/create-a-report

Create a W&B Report with the W&B App or programmatically.

W\&B Report and Workspace API is in Public Preview.

Select a tab to learn how to create a report in the W\&B App or programmatically with the Report and Workspace API.

For an example of how to programmatically create a report, see this [Google Colab](https://colab.research.google.com/github/wandb/examples/blob/master/colabs/intro/Report_API_Quickstart.ipynb).

1. Navigate to your project workspace in the W\&B App.
2. In the upper-right corner of your workspace, click **Create report**.
3. A modal appears. Select the charts you want to start with. You can add or delete charts later from the report interface.
4. Select the **Filter run sets** option to prevent new runs from being added to your report. You can toggle this option on or off.

A draft report is saved automatically. Access it in the **Reports** tab.

1. Navigate to your project workspace in the W\&B App.
2. Select the **Reports** tab in your project.
3. Click **Create report**.
Create a report programmatically:

1. Install the W\&B SDK (`wandb`) and the Report and Workspace API (`wandb-workspaces`):

   ```bash theme={null}
   pip install wandb wandb-workspaces
   ```

2. Import the W\&B Python SDK and the Report and Workspace API.

   ```python theme={null}
   import wandb
   import wandb_workspaces.reports.v2 as wr
   ```

3. Create a report instance with the `wandb_workspaces.reports.v2.Report` class. Specify a name for the project.

   ```python theme={null}
   report = wr.Report(project="report_standard")
   ```

4. Save the report. Reports aren't uploaded to W\&B until you call the `.save()` method:

   ```python theme={null}
   report.save()
   ```

For more information, see [Edit a report](/models/reports/edit-a-report).

# Compare runs across projects

Source: https://docs.wandb.ai/models/reports/cross-project-reports

Compare runs from two different projects with cross-project reports.

Watch a [video that demonstrates how to compare runs across projects](https://www.youtube.com/watch?v=uD4if_nGrs4) (2 min).

Cross-project reports let you compare runs from two different projects side by side, which is useful when you want to evaluate experiments tracked under separate W\&B projects without duplicating data. Use the project selector in the run set table to pick a project.

The visualizations in the section pull columns from the first active run set. If you don't see the metric you're looking for in the line plot, make sure that the first run set checked in the section has that column available.

This feature supports history data on time series lines, but doesn't support pulling different summary metrics from different projects. In other words, you can't create a scatter plot from columns that are only logged in another project.
If you need to compare runs from two projects and the columns aren't working, add a tag to the runs in one project and then move those runs to the other project. You can still filter only the runs from each project, but the report includes all the columns for both sets of runs.

## View-only report links

When you need to share a report with someone outside your project (such as a stakeholder who doesn't have a W\&B account), use a view-only link. Share a view-only link to a report that is in a private project or team project.

View-only report links add a secret access token to the URL, so anyone who opens the link can view the page. Anyone can use the magic link to view the report without logging in first. For customers on [W\&B Local](/platform/hosting/) private cloud installations, these links remain behind your firewall, so only members of your team with access to your private instance *and* access to the view-only link can view the report.

In **view-only mode**, someone who isn't logged in can see the charts and mouse over to see tooltips of values, zoom in and out on charts, and scroll through columns in the table. In view-only mode, they can't create new charts or new table queries to explore the data. View-only visitors to the report link can't click a run to get to the run page. Also, view-only visitors can't see the share modal. Instead, they see a tooltip on hover that says: `Sharing not available for view only access`.

The magic links are only available for "Private" and "Team" projects. For "Public" (anyone can view) or "Open" (anyone can view and contribute runs) projects, you can't turn the links on or off because the project is public, meaning it's already available to anyone with the link.

## Send a graph to a report

If you want to preserve a chart from a workspace alongside related analysis, you can send it directly to a report. Send a graph from your workspace to a report to keep track of your progress.
Click the dropdown menu on the chart or panel you'd like to copy to a report and click **Add to report** to select the destination report.

# Edit a report

Source: https://docs.wandb.ai/models/reports/edit-a-report

Edit a report interactively with the App UI or programmatically with the W&B SDK.

W\&B Report and Workspace API is in Public Preview.

This page describes how to edit a report interactively with the App UI or programmatically with the W\&B SDK. It covers adding plots, run sets, code blocks, markdown, HTML elements, and rich media, as well as filtering and grouping run sets, organizing report layout, and visualizing multi-dimensional relationships.

A report's body consists of *blocks*. Blocks contain text, images, embedded visualizations, plots from experiments and runs, and panel grids.

A *panel grid* is a specific type of block that holds panels and *run sets*. A run set is a collection of runs logged to a project in W\&B. Panels are visualizations of run set data.

Run the [Programmatic workspaces notebook](https://colab.research.google.com/github/wandb/wandb-workspaces/blob/Update-wandb-workspaces-tuturial/Workspace_tutorial.ipynb) for an end-to-end tutorial of creating a customized saved workspace view.

To programmatically edit a report, you must have the W\&B Report and Workspace API (`wandb-workspaces`) installed along with the W\&B Python SDK:

```bash theme={null}
pip install wandb wandb-workspaces
```

## Add plots

Plots let you visualize run data inside a report. Each panel grid has a set of run sets and a set of panels. The run sets at the bottom of the section control what data appears on the panels in the grid. Create a new panel grid if you want to add charts that pull data from a different set of runs.

Enter a forward slash (`/`) in the report to display a dropdown menu. Select **Add panel** to add a panel. You can add any panel that W\&B supports, including a line plot, scatter plot, or parallel coordinates chart.
Add plots to a report programmatically with the SDK. Pass a list of one or more plot or chart objects to the `panels` parameter in the `PanelGrid` Public API Class. Create a plot or chart object with its associated Python Class.

The following examples demonstrate how to create a line plot and scatter plot.

```python theme={null}
import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(
    project="report-editing",
    title="Report title",
    description="Report description.",
)

blocks = [
    wr.PanelGrid(
        panels=[
            # Pass y as a list, as in the other LinePlot examples on this page
            wr.LinePlot(x="time", y=["velocity"]),
            wr.ScatterPlot(x="time", y="acceleration"),
        ]
    )
]

report.blocks = blocks
report.save()
```

For more information about available plots and charts you can add to a report programmatically, see `wr.panels`.

## Add run sets

Add run sets from projects interactively with the App UI or the W\&B SDK.

Enter a forward slash (`/`) in the report to display a dropdown menu. From the dropdown, choose **Panel Grid**. W\&B automatically imports the run set from the project the report was created from.

If you import a panel into a report, run names are inherited from the project. In the report, you can optionally [rename a run](/models/runs/#rename-a-run) to give the reader more context. The run is renamed only in the individual panel. If you clone the panel in the same report, the run is also renamed in the cloned panel.

1. In the report, click the pencil icon to open the report editor.
2. In the run set, find the run to rename. Hover over the run name and click the **action** menu. Select one of the following choices, then submit the form.
   * **Rename run for project**: Rename the run across the entire project. To generate a new random name, leave the field blank.
   * **Rename run for panel grid**: Rename the run only in the report, preserving the existing name in other contexts. Generating a new random name is not supported.
3. Click **Publish report**.
Add run sets from projects with the `wr.Runset()` and `wr.PanelGrid` Classes. To add a run set, follow these steps:

1. Create a `wr.Runset()` object instance. Provide the name of the project that contains the run sets for the `project` parameter and the entity that owns the project for the `entity` parameter.
2. Create a `wr.PanelGrid()` object instance. Pass a list of one or more run set objects to the `runsets` parameter.
3. Store one or more `wr.PanelGrid()` object instances in a list.
4. Update the report instance's `blocks` attribute with the list of panel grid instances.

Replace `[PROJECT-NAME]` with your W\&B project name and `[ENTITY-NAME]` with your W\&B entity name in the following example:

```python theme={null}
import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(
    project="report-editing",
    title="Report title",
    description="Report description.",
)

panel_grids = wr.PanelGrid(
    runsets=[wr.Runset(project="[PROJECT-NAME]", entity="[ENTITY-NAME]")]
)

report.blocks = [panel_grids]
report.save()
```

You can optionally add run sets and panels with one call to the SDK:

```python theme={null}
import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(
    project="report-editing",
    title="Report title",
    description="Report description.",
)

panel_grids = wr.PanelGrid(
    panels=[
        wr.LinePlot(
            title="line title",
            x="x",
            y=["y"],
            range_x=[0, 100],
            range_y=[0, 100],
            log_x=True,
            log_y=True,
            title_x="x axis title",
            title_y="y axis title",
            ignore_outliers=True,
            groupby="hyperparam1",
            groupby_aggfunc="mean",
            groupby_rangefunc="minmax",
            smoothing_factor=0.5,
            smoothing_type="gaussian",
            smoothing_show_original=True,
            max_runs_to_show=10,
            plot_type="stacked-area",
            font_size="large",
            legend_position="west",
        ),
        wr.ScatterPlot(
            title="scatter title",
            x="y",
            y="y",
            # z='x',
            range_x=[0, 0.0005],
            range_y=[0, 0.0005],
            # range_z=[0,1],
            log_x=False,
            log_y=False,
            # log_z=True,
            running_ymin=True,
            running_ymean=True,
            running_ymax=True,
            font_size="small",
            regression=True,
        ),
    ],
    runsets=[wr.Runset(project="[PROJECT-NAME]", entity="[ENTITY-NAME]")],
)

report.blocks = [panel_grids]
report.save()
```

## Freeze a run set

A report automatically updates run sets to show the latest data from the project. Freeze a run set when you want to preserve its state in a report at a point in time, so that later runs don't change what the report displays.

To freeze a run set when viewing a report, click the snowflake icon in its panel grid near the **Filter** button.

## Group a run set programmatically

Group runs in a run set programmatically with the [Workspace and Reports API](/models/ref/wandb_workspaces/reports). Grouping organizes related runs together in panels, which makes it easier to compare configurations and results.

You can group runs in a run set by config values, run metadata, or summary metrics. The following table lists the available grouping methods along with the available keys for each grouping method:

| Grouping method | Description                   | Available keys                                                |
| --------------- | ----------------------------- | ------------------------------------------------------------- |
| Config values   | Group runs by config values   | Values specified in config parameter in `wandb.init(config=)` |
| Run metadata    | Group runs by run metadata    | `State`, `Name`, `JobType`                                    |
| Summary metrics | Group runs by summary metrics | Values you log to a run with `wandb.Run.log()`                |

### Group runs by config values

Group runs by config values to compare runs with similar configurations. Config values are parameters you specify in your run configuration (`wandb.init(config=)`). To group runs by config values, use the `config.[KEY]` syntax, where `[KEY]` is the name of the config value you want to group by.

For example, the following code snippet first initializes a run with a config value for `group`, then groups runs in a report based on the `group` config value. Replace `[ENTITY]` and `[PROJECT]` with your W\&B entity and project names.
```python theme={null}
import wandb
import wandb_workspaces.reports.v2 as wr

entity = "[ENTITY]"
project = "[PROJECT]"

for group in ["control", "experiment_a", "experiment_b"]:
    for i in range(3):
        with wandb.init(entity=entity, project=project, group=group, config={"group": group, "run": i}, name=f"{group}_run_{i}") as run:
            # Simulate some training
            for step in range(100):
                run.log({
                    "acc": 0.5 + (step / 100) * 0.3 + (i * 0.05),
                    "loss": 1.0 - (step / 100) * 0.5
                })
```

Within your Python script or notebook, you can then group runs by the `config.group` value:

```python theme={null}
runset = wr.Runset(
    project=project,
    entity=entity,
    groupby=["config.group"]  # Group by the "group" config value
)
```

Continuing from the previous example, you can create a report with the grouped run set:

```python theme={null}
report = wr.Report(
    entity=entity,
    project=project,
    title="Grouped Runs Example",
)

report.blocks = [
    wr.PanelGrid(
        runsets=[runset],
    )
]

report.save()
```

### Group runs by run metadata

Group runs by a run's name (`Name`), state (`State`), or job type (`JobType`).

Continuing from the previous example, you can group your runs by their name with the following code snippet:

```python theme={null}
runset = wr.Runset(
    project=project,
    entity=entity,
    groupby=["Name"]  # Group by run names
)
```

The name of the run is the name you specify in the `wandb.init(name=)` parameter. If you don't specify a name, W\&B generates a random name for the run.

You can find the name of the run in the **Overview** page of a run in the W\&B App or programmatically with `Api.runs().run.name`.

### Group runs by summary metrics

The following examples demonstrate how to group runs by summary metrics. Summary metrics are the values you log to a run with `wandb.Run.log()`. After you log a run, you can find the names of your summary metrics in the W\&B App under the **Summary** section of a run's **Overview** page.
The syntax for grouping runs by summary metrics is `summary.[KEY]`, where `[KEY]` is the name of the summary metric you want to group by.

For example, suppose you log a summary metric called `acc`. Replace `[ENTITY]` and `[PROJECT]` with your W\&B entity and project names:

```python theme={null}
import wandb
import wandb_workspaces.reports.v2 as wr

entity = "[ENTITY]"
project = "[PROJECT]"

for group in ["control", "experiment_a", "experiment_b"]:
    for i in range(3):
        with wandb.init(entity=entity, project=project, group=group, config={"group": group, "run": i}, name=f"{group}_run_{i}") as run:
            # Simulate some training
            for step in range(100):
                run.log({
                    "acc": 0.5 + (step / 100) * 0.3 + (i * 0.05),
                    "loss": 1.0 - (step / 100) * 0.5
                })
```

You can then group runs by the `summary.acc` summary metric:

```python theme={null}
runset = wr.Runset(
    project=project,
    entity=entity,
    groupby=["summary.acc"]  # Group by summary values
)
```

## Filter a run set programmatically

Programmatically filter run sets and add them to a report with the [Workspace and Reports API](/models/ref/wandb_workspaces/reports). Filtering narrows a run set to the specific runs you want to display, based on config values, metrics, tags, or run properties.

The general syntax for a filter expression is:

```text theme={null}
Filter('[KEY]') operation [VALUE]
```

In this expression, `[KEY]` is the name of the filter, `operation` is a comparison operator (for example, `>`, `<`, `==`, `in`, or `not in`), and `[VALUE]` is the value to compare against. Combine multiple expressions with `and` and `or`. `Filter` is a placeholder for the type of filter you want to apply. The following table lists the available filters and their descriptions:

| Filter                   | Description               | Available keys                                                               |
| ------------------------ | ------------------------- | ---------------------------------------------------------------------------- |
| `Config('[KEY]')`        | Filter by config values   | Values specified in `config` parameter in `wandb.init(config=)`.             |
| `SummaryMetric('[KEY]')` | Filter by summary metrics | Values you log to a run with `wandb.Run.log()`.                              |
| `Tags('[KEY]')`          | Filter by tags            | Tag values that you add to your run (programmatically or with the W\&B App). |
| `Metric('[KEY]')`        | Filter by run properties  | `tags`, `state`, `displayName`, `jobType`                                    |

After you define your filters, you can create a report and pass the filtered run sets to `wr.PanelGrid(runsets=)`. See the **Report and Workspace API** tabs throughout this page for more information about how to add various elements to a report programmatically.

The following examples demonstrate how to filter run sets in a report. Replace values enclosed in brackets (for example, `[ENTITY]` and `[PROJECT]`) with your own values.

### Config filters

Filter a run set by one or more config values. Config values are parameters you specify in your run configuration (`wandb.init(config=)`).

For example, the following code snippet first initializes a run with config values for `learning_rate` and `batch_size`, then filters runs in a report based on the `learning_rate` config value.

```python theme={null}
import wandb

config = {
    "learning_rate": 0.01,
    "batch_size": 32,
}

with wandb.init(project="[PROJECT]", entity="[ENTITY]", config=config) as run:
    # Your training code here
    pass
```

Within your Python script or notebook, you can then programmatically filter runs that have a learning rate greater than `0.01`.
```python theme={null}
import wandb_workspaces.reports.v2 as wr

runset = wr.Runset(
    entity="[ENTITY]",
    project="[PROJECT]",
    filters="Config('learning_rate') > 0.01"
)
```

You can also filter by multiple config values with the `and` operator:

```python theme={null}
runset = wr.Runset(
    entity="[ENTITY]",
    project="[PROJECT]",
    filters="Config('learning_rate') > 0.01 and Config('batch_size') == 32"
)
```

Continuing from the previous example, you can create a report with the filtered run set as follows:

```python theme={null}
report = wr.Report(
    entity="[ENTITY]",
    project="[PROJECT]",
    title="My Report"
)

report.blocks = [
    wr.PanelGrid(
        runsets=[runset],
        panels=[
            wr.LinePlot(
                x="Step",
                y=["accuracy"],
            )
        ]
    )
]

report.save()
```

### Metric filters

Filter run sets based on a run's tag (`tags`), run state (`state`), run name (`displayName`), or job type (`jobType`).

`Metric` filters use a different syntax than other filters. You must pass values as a list.

```text theme={null}
Metric('[KEY]') operation [VALUE]
```

For example, consider the following Python snippet that creates three runs and assigns each of them a name:

```python theme={null}
import wandb

# Create one run per iteration so that each run gets its own name
for i in range(3):
    with wandb.init(project="[PROJECT]", entity="[ENTITY]", name=f"run{i+1}") as run:
        # Your training code here
        pass
```

When you create your report, you can filter runs by their display name. For example, to filter runs with names `run1`, `run2`, and `run3`, you can use the following code:

```python theme={null}
runset = wr.Runset(
    entity="[ENTITY]",
    project="[PROJECT]",
    filters="Metric('displayName') in ['run1', 'run2', 'run3']"
)
```

You can find the name of the run in the **Overview** page of a run in the W\&B App or programmatically with `Api.runs().run.name`.
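Because the `filters` argument is a plain string, a list-valued `Metric` expression can be assembled from Python values instead of being typed by hand. The following is a minimal sketch of that idea. `metric_in_filter` is a hypothetical helper written for this example and is not part of the W\&B SDK or the Report and Workspace API. It assumes the values contain no single quotes.

```python
def metric_in_filter(key, values, negate=False):
    """Build a list-valued Metric filter string.

    Hypothetical helper for illustration only; not part of wandb or
    wandb-workspaces. Assumes the values contain no single quotes.
    """
    if not values:
        raise ValueError("values must contain at least one item")
    # Quote each value the same way the examples on this page do.
    quoted = ", ".join(f"'{value}'" for value in values)
    operation = "not in" if negate else "in"
    return f"Metric('{key}') {operation} [{quoted}]"


# Build the display-name filter for run1, run2, and run3.
run_names = [f"run{i + 1}" for i in range(3)]
name_filter = metric_in_filter("displayName", run_names)
print(name_filter)
```

You can then pass the resulting string to the `filters` parameter, for example `wr.Runset(entity="[ENTITY]", project="[PROJECT]", filters=name_filter)`.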
The following examples demonstrate how to filter a run set by the run's state (`finished`, `crashed`, or `running`):

```python theme={null}
runset = wr.Runset(
    entity="[ENTITY]",
    project="[PROJECT]",
    filters="Metric('state') in ['finished']"
)
```

```python theme={null}
runset = wr.Runset(
    entity="[ENTITY]",
    project="[PROJECT]",
    filters="Metric('state') not in ['crashed']"
)
```

### SummaryMetric filters

The following examples demonstrate how to filter a run set by summary metrics. Summary metrics are the values you log to a run with `wandb.Run.log()`. After you log a run, you can find the names of your summary metrics in the W\&B App under the **Summary** section of a run's **Overview** page.

```python theme={null}
runset = wr.Runset(
    entity="[ENTITY]",
    project="[PROJECT]",
    filters="SummaryMetric('accuracy') > 0.9"
)
```

```python theme={null}
runset = wr.Runset(
    entity="[ENTITY]",
    project="[PROJECT]",
    filters="Metric('state') in ['finished'] and SummaryMetric('train/train_loss') < 0.5"
)
```

### Tags filters

The following code snippet shows how to filter a run set by its tags. Tags are values you add to a run (programmatically or with the W\&B App).

```python theme={null}
runset = wr.Runset(
    entity="[ENTITY]",
    project="[PROJECT]",
    filters="Tags('training') == 'training'"
)
```

## Add code blocks

Add code blocks to your report interactively with the App UI or with the W\&B SDK.

In the App UI, enter a forward slash (`/`) in the report to display a dropdown menu. From the dropdown, choose **Code**.

Select the name of the programming language on the right side of the code block to expand a dropdown. From the dropdown, select your programming language syntax. You can choose from JavaScript, Python, CSS, JSON, HTML, Markdown, and YAML.

To create a code block programmatically, use the `wr.CodeBlock` class. Pass the code you want to display to the `code` parameter and the name of its language to the `language` parameter.
The following example demonstrates a list in a YAML file:

```python theme={null}
import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.CodeBlock(
        code=["this:", "- is", "- a", "cool:", "- yaml", "- file"],
        language="yaml"
    )
]

report.save()
```

This renders a code block similar to:

```yaml theme={null}
this:
- is
- a
cool:
- yaml
- file
```

The following example demonstrates a Python code block:

```python theme={null}
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [wr.CodeBlock(code=["Hello, World!"], language="python")]

report.save()
```

This renders a code block similar to:

```md theme={null}
Hello, World!
```

## Add markdown

Add markdown to your report interactively with the App UI or with the W\&B SDK.

In the App UI, enter a forward slash (`/`) in the report to display a dropdown menu. From the dropdown, choose **Markdown**.

To create a markdown block programmatically, use the `wr.MarkdownBlock` class. Pass a string to the `text` parameter:

```python theme={null}
import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.MarkdownBlock(text="Markdown cell with *italics* and **bold** and $e=mc^2$")
]

report.save()
```

This renders a markdown block similar to:

[Image: Rendered markdown block]

## Add HTML elements

Add HTML elements to your report interactively with the App UI or with the W\&B SDK.

In the App UI, enter a forward slash (`/`) in the report to display a dropdown menu. From the dropdown, select a type of text block. For example, to create an H2 heading block, select the `Heading 2` option.

To add HTML elements programmatically, pass a list of one or more HTML elements to the report's `blocks` attribute.
The following example demonstrates how to create an H1, an H2, and an unordered list:

```python theme={null}
import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.H1(text="How Programmatic Reports work"),
    wr.H2(text="Heading 2"),
    wr.UnorderedList(items=["Bullet 1", "Bullet 2"]),
]

report.save()
```

This renders the HTML elements as follows:

[Image: Rendered HTML elements]

## Embed rich media links

Embed rich media within the report with the App UI or with the W\&B SDK.

In the App UI, copy and paste URLs into a report to embed rich media. The following animations demonstrate how to copy and paste URLs from Twitter, YouTube, and SoundCloud.

### Twitter

Copy and paste a Tweet link URL into a report to view the Tweet within the report.

[Image: Embedding Twitter content]

### YouTube

Copy and paste a YouTube video URL to embed a video in the report.

[Image: Embedding YouTube videos]

### SoundCloud

Copy and paste a SoundCloud link to embed an audio file into a report.

[Image: Embedding SoundCloud audio]

To embed media programmatically, pass a list of one or more embedded media objects to the report's `blocks` attribute. The following example demonstrates how to embed video and Twitter media into a report:

```python theme={null}
import wandb
import wandb_workspaces.reports.v2 as wr

report = wr.Report(project="report-editing")

report.blocks = [
    wr.Video(url="https://www.youtube.com/embed/6riDJMI-Y8U"),
    wr.Twitter(
        # Embed HTML for the Tweet; the markup is omitted here
        embed_html='\n'
    ),
]
report.save()
```

## Duplicate panel grids

Duplicate a panel grid to reuse its layout in the same report or in a different report.

Select a panel grid, then copy and paste it to duplicate it in the same report or in a different report.

To highlight a whole panel grid section, select the drag handle in the upper right corner. Click and drag to highlight and select a region in a report, such as panel grids, text, and headings.
[Image: Copying panel grids]

## Delete panel grids

Select a panel grid and press `delete` on your keyboard to delete a panel grid.

[Image: Deleting panel grids]

## Collapse headers to organize reports

Collapse headers in a report to hide content within a text block. When the report loads, only expanded headers show content. Collapsing headers in reports helps organize your content and prevents excessive data loading. The following gif demonstrates the process.

[Image: Collapsing headers in a report]

## Visualize relationships across multiple dimensions

To compare more variables than a 2D plot can show, you can use a color gradient as an additional dimension. Using a color gradient to represent one of the variables can make patterns easier to interpret.

1. Choose a variable to represent with a color gradient, like penalty scores or learning rates. This provides a clearer understanding of how penalty (color) interacts with reward/side effects (y-axis) over training time (x-axis).
2. Highlight key trends. Hover over a specific group of runs to highlight them in the visualization.

# Embed a report

Source: https://docs.wandb.ai/models/reports/embed-reports

Embed W&B reports directly into Notion or with an HTML `iframe` element.

Embed a W\&B report in an external page or tool so that your team can view live experiment results alongside other documentation. This page describes how to generate an embed code from a report and use it in HTML, Confluence, Notion, or Gradio.

## HTML `iframe` element

Use an HTML `iframe` element to embed a report on any web page that accepts custom HTML. W\&B provides a prebuilt embed code that you can copy directly from the report.

Select the **Share** button in the upper right of a report to open a dialog, then select **Copy embed code**. The copied snippet is an `iframe` HTML element that you can paste into any page that accepts custom HTML.

Only *public* reports are viewable when embedded.
[Image: Getting embed code]

## Confluence

The following animation demonstrates how to insert the direct link to the report within an `iframe` cell in Confluence.

[Image: Embedding in Confluence]

## Notion

The following animation demonstrates how to insert a report into a Notion document using an **Embed** block in Notion and the report's embed code.

[Image: Embedding in Notion]

## Gradio

To embed a W\&B report in a Gradio app, including an app hosted on Hugging Face Spaces, use Gradio's `gr.HTML` element to render an `iframe` that points to the report URL.

```python theme={null} import gradio as gr def wandb_report(url): iframe = f'