Experiments

These classes comprise the core building blocks for tracking machine learning experiments, managing artifacts, and configuring SDK behavior. These foundational classes enable you to log metrics, store model checkpoints, version datasets, and manage experiment configurations with full reproducibility and collaboration features.

For more details on using these classes in ML experiments, consult the Experiments and Artifacts docs.

Core Classes

  • Run: The primary unit of computation logged by W&B, representing a single ML experiment with metrics, configurations, and outputs.
  • Artifact: Flexible and lightweight building block for dataset and model versioning with automatic deduplication and lineage tracking.
  • Settings: Configuration management for the W&B SDK, controlling behavior from logging to API interactions.

Getting Started

Track an experiment

Create and track a machine learning experiment with metrics logging:

import wandb

# Initialize a new run
with wandb.init(project="my-experiments", config={"learning_rate": 0.001}) as run:
    # Access configuration
    config = run.config
    
    # Log metrics during training, tracking the best accuracy seen so far
    best_accuracy = 0.0
    for epoch in range(10):
        metrics = train_one_epoch()  # Your training logic
        best_accuracy = max(best_accuracy, metrics["accuracy"])
        run.log({
            "loss": metrics["loss"],
            "accuracy": metrics["accuracy"],
            "epoch": epoch
        })
    
    # Log summary metrics
    run.summary["best_accuracy"] = best_accuracy

Version a model artifact

Create and log a versioned model artifact with metadata:

import wandb

with wandb.init(project="my-models") as run:
    # Train your model
    model = train_model()
    
    # Create an artifact for the model
    model_artifact = wandb.Artifact(
        name="my-model",
        type="model",
        description="ResNet-50 trained on ImageNet subset",
        metadata={
            "architecture": "ResNet-50",
            "dataset": "ImageNet-1K",
            "accuracy": 0.95
        }
    )
    
    # Add model files to the artifact
    model_artifact.add_file("model.pt")
    model_artifact.add_dir("model_configs/")
    
    # Log the artifact to W&B
    run.log_artifact(model_artifact)

Configure SDK settings

Customize W&B SDK behavior for your specific requirements:

import wandb

# Configure settings programmatically
settings = wandb.Settings(
    project="production-runs",
    entity="my-team",
    mode="offline",  # Run offline, sync later
    save_code=True,   # Save source code
    quiet=True        # Reduce console output
)

# Or use environment variables
# export WANDB_PROJECT=production-runs
# export WANDB_MODE=offline

# Initialize with custom settings
with wandb.init(settings=settings) as run:
    # Your experiment code here
    pass

Track artifact lineage

Track relationships between datasets, models, and evaluations:

import wandb

with wandb.init(project="ml-pipeline") as run:
    # Use a dataset artifact
    dataset = run.use_artifact("dataset:v1")
    dataset_dir = dataset.download()
    
    # Train model using the dataset
    model = train_on_dataset(dataset_dir)
    
    # Create model artifact with dataset lineage
    model_artifact = wandb.Artifact(
        name="trained-model",
        type="model"
    )
    model_artifact.add_file("model.pt")
    
    # Log with automatic lineage tracking
    run.log_artifact(model_artifact)

1 - Artifact

class Artifact

Flexible and lightweight building block for dataset and model versioning.

Construct an empty W&B Artifact. Populate an artifact's contents with methods that begin with add. Once the artifact has all the desired files, you can call run.log_artifact() to log it.
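
For example, a minimal create-and-log flow (the file name model.pt here is a placeholder) looks like this:

import wandb

# Construct an empty artifact, populate it, then log it from a run.
artifact = wandb.Artifact(name="example-model", type="model")
artifact.add_file("model.pt")  # placeholder: any local file

with wandb.init(project="artifact-example") as run:
    run.log_artifact(artifact)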

method Artifact.__init__

__init__(
    name: 'str',
    type: 'str',
    description: 'str | None' = None,
    metadata: 'dict[str, Any] | None' = None,
    incremental: 'bool' = False,
    use_as: 'str | None' = None
) → None

Args:

  • name (str): A human-readable name for the artifact. Use the name to identify a specific artifact in the W&B App UI or programmatically. You can interactively reference an artifact with the use_artifact Public API. A name can contain letters, numbers, underscores, hyphens, and dots. The name must be unique across a project.
  • type (str): The artifact’s type. Use the type of an artifact to both organize and differentiate artifacts. You can use any string that contains letters, numbers, underscores, hyphens, and dots. Common types include dataset or model. Include model within your type string if you want to link the artifact to the W&B Model Registry. Note that some types are reserved for internal use and cannot be set by users. Such types include job and types that start with wandb-.
  • description (str | None) = None: A description of the artifact. For Model or Dataset Artifacts, add documentation for your standardized team model or dataset card. View an artifact’s description programmatically with the Artifact.description attribute or in the W&B App UI. W&B renders the description as markdown in the W&B App.
  • metadata (dict[str, Any] | None) = None: Additional information about an artifact. Specify metadata as a dictionary of key-value pairs. You can specify no more than 100 total keys.
  • incremental: Use Artifact.new_draft() method instead to modify an existing artifact.
  • use_as: Deprecated.
  • is_link: Boolean flag indicating whether the artifact is a linked artifact (True) or a source artifact (False).

Returns: An Artifact object.


property Artifact.aliases

List of one or more semantically-friendly references or identifying “nicknames” assigned to an artifact version.

Aliases are mutable references that you can programmatically reference. Change an artifact’s alias with the W&B App UI or programmatically. See Create new artifact versions for more information.

Returns:

  • list[str]: The aliases property value.

property Artifact.collection

The collection this artifact was retrieved from.

A collection is an ordered group of artifact versions. If this artifact was retrieved from a portfolio / linked collection, that collection will be returned rather than the collection that an artifact version originated from. The collection that an artifact originates from is known as the source sequence.

Returns:

  • ArtifactCollection: The collection property value.

property Artifact.commit_hash

The hash returned when this artifact was committed.

Returns:

  • str: The commit_hash property value.

property Artifact.created_at

Timestamp when the artifact was created.

Returns:

  • str: The created_at property value.

property Artifact.description

A description of the artifact.

Returns:

  • str | None: The description property value.

property Artifact.digest

The logical digest of the artifact.

The digest is the checksum of the artifact’s contents. If an artifact has the same digest as the current latest version, then log_artifact is a no-op.

Returns:

  • str: The digest property value.

property Artifact.entity

The name of the entity that the artifact collection belongs to.

If the artifact is a link, the entity will be the entity of the linked artifact.

Returns:

  • str: The entity property value.

property Artifact.file_count

The number of files (including references).

Returns:

  • int: The file_count property value.

property Artifact.history_step

The nearest step at which history metrics were logged for the source run of the artifact.

Examples:

run = artifact.logged_by()
if run and (artifact.history_step is not None):
    history = run.sample_history(
        min_step=artifact.history_step,
        max_step=artifact.history_step + 1,
        keys=["my_metric"],
    )

Returns:

  • int | None: The history_step property value.

property Artifact.id

The artifact’s ID.

Returns:

  • str | None: The id property value.

property Artifact.is_link

Boolean flag indicating if the artifact is a link artifact.

True: The artifact is a link artifact to a source artifact. False: The artifact is a source artifact.

Returns:

  • bool: The is_link property value.

property Artifact.linked_artifacts

Returns a list of all the linked artifacts of a source artifact.

If the artifact is a link artifact (artifact.is_link == True), it will return an empty list. Limited to 500 results.

Returns:

  • list[Artifact]: The linked_artifacts property value.

property Artifact.manifest

The artifact’s manifest.

The manifest lists all of its contents, and can’t be changed once the artifact has been logged.

Returns:

  • ArtifactManifest: The manifest property value.

property Artifact.metadata

User-defined artifact metadata.

Structured data associated with the artifact.

Returns:

  • dict: The metadata property value.

property Artifact.name

The artifact name and version of the artifact.

A string with the format {collection}:{alias}. If fetched before an artifact is logged/saved, the name won’t contain the alias. If the artifact is a link, the name will be the name of the linked artifact.

Returns:

  • str: The name property value.

property Artifact.project

The name of the project that the artifact collection belongs to.

If the artifact is a link, the project will be the project of the linked artifact.

Returns:

  • str: The project property value.

property Artifact.qualified_name

The entity/project/name of the artifact.

If the artifact is a link, the qualified name will be the qualified name of the linked artifact path.

Returns:

  • str: The qualified_name property value.

property Artifact.size

The total size of the artifact in bytes.

Includes any references tracked by this artifact.

Returns:

  • int: The size property value.

property Artifact.source_artifact

Returns the source artifact. The source artifact is the original logged artifact.

If the artifact itself is a source artifact (artifact.is_link == False), it will return itself.

Returns:

  • Artifact: The source_artifact property value.

property Artifact.source_collection

The artifact’s source collection.

The source collection is the collection that the artifact was logged from.

Returns:

  • ArtifactCollection: The source_collection property value.

property Artifact.source_entity

The name of the entity of the source artifact.

Returns:

  • str: The source_entity property value.

property Artifact.source_name

The artifact name and version of the source artifact.

A string with the format {source_collection}:{alias}. Before the artifact is saved, contains only the name since the version is not yet known.

Returns:

  • str: The source_name property value.

property Artifact.source_project

The name of the project of the source artifact.

Returns:

  • str: The source_project property value.

property Artifact.source_qualified_name

The source_entity/source_project/source_name of the source artifact.

Returns:

  • str: The source_qualified_name property value.

property Artifact.source_version

The source artifact’s version.

A string with the format v{number}.

Returns:

  • str: The source_version property value.

property Artifact.state

The status of the artifact. One of: “PENDING”, “COMMITTED”, or “DELETED”.

Returns:

  • str: The state property value.

property Artifact.tags

List of one or more tags assigned to this artifact version.

Returns:

  • list[str]: The tags property value.

property Artifact.ttl

The time-to-live (TTL) policy of an artifact.

Artifacts are deleted shortly after a TTL policy’s duration passes. If set to None, the artifact deactivates TTL policies and will not be scheduled for deletion, even if there is a team default TTL. An artifact inherits a TTL policy from the team default if the team administrator defines a default TTL and there is no custom policy set on an artifact.

Raises:

  • ArtifactNotLoggedError: Unable to fetch inherited TTL if the artifact has not been logged or saved.

Returns:

  • timedelta | None: The ttl property value.
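
As a sketch of setting a TTL policy (the artifact path below is hypothetical), fetch a logged artifact, assign a timedelta, and save:

from datetime import timedelta

import wandb

api = wandb.Api()
artifact = api.artifact("my-entity/my-project/my-dataset:latest")  # hypothetical path

# Schedule deletion 30 days from now; set to None to opt out of TTL.
artifact.ttl = timedelta(days=30)
artifact.save()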

property Artifact.type

The artifact’s type. Common types include dataset or model.

Returns:

  • str: The type property value.

property Artifact.updated_at

The time when the artifact was last updated.

Returns:

  • str: The updated_at property value.

property Artifact.url

Constructs the URL of the artifact.

Returns:

  • str: The URL of the artifact.

property Artifact.use_as

Deprecated.

Returns:

  • str | None: The use_as property value.

property Artifact.version

The artifact’s version.

A string with the format v{number}. If the artifact is a link artifact, the version will be from the linked collection.

Returns:

  • str: The version property value.

method Artifact.add

add(
    obj: 'WBValue',
    name: 'StrPath',
    overwrite: 'bool' = False
) → ArtifactManifestEntry

Add wandb.WBValue obj to the artifact.

Args:

  • obj: The object to add. Currently supported types include Bokeh, JoinedTable, PartitionedTable, Table, Classes, ImageMask, BoundingBoxes2D, Audio, Image, Video, Html, Object3D.
  • name: The path within the artifact to add the object.
  • overwrite: If True, overwrite existing objects with the same file path if applicable.

Returns: The added manifest entry

Raises:

  • ArtifactFinalizedError: You cannot make changes to the current artifact version because it is finalized. Log a new artifact version instead.

method Artifact.add_dir

add_dir(
    local_path: 'str',
    name: 'str | None' = None,
    skip_cache: 'bool | None' = False,
    policy: "Literal['mutable', 'immutable'] | None" = 'mutable',
    merge: 'bool' = False
) → None

Add a local directory to the artifact.

Args:

  • local_path: The path of the local directory.
  • name: The subdirectory name within an artifact. The name you specify appears in the W&B App UI nested by artifact’s type. Defaults to the root of the artifact.
  • skip_cache: If set to True, W&B will not copy/move files to the cache while uploading.
  • policy: By default, “mutable”.
    • mutable: Create a temporary copy of the file to prevent corruption during upload.
    • immutable: Disable protection, rely on the user not to delete or change the file.
  • merge: If False (default), throws ValueError if a file was already added in a previous add_dir call and its content has changed. If True, overwrites existing files with changed content. Always adds new files and never removes files. To replace an entire directory, pass a name when adding the directory using add_dir(local_path, name=my_prefix) and call remove(my_prefix) to remove the directory, then add it again.

Raises:

  • ArtifactFinalizedError: You cannot make changes to the current artifact version because it is finalized. Log a new artifact version instead.
  • ValueError: Policy must be “mutable” or “immutable”

method Artifact.add_file

add_file(
    local_path: 'str',
    name: 'str | None' = None,
    is_tmp: 'bool | None' = False,
    skip_cache: 'bool | None' = False,
    policy: "Literal['mutable', 'immutable'] | None" = 'mutable',
    overwrite: 'bool' = False
) → ArtifactManifestEntry

Add a local file to the artifact.

Args:

  • local_path: The path to the file being added.
  • name: The path within the artifact to use for the file being added. Defaults to the basename of the file.
  • is_tmp: If true, then the file is renamed deterministically to avoid collisions.
  • skip_cache: If True, do not copy files to the cache after uploading.
  • policy: By default, set to “mutable”. If set to “mutable”, create a temporary copy of the file to prevent corruption during upload. If set to “immutable”, disable protection and rely on the user not to delete or change the file.
  • overwrite: If True, overwrite the file if it already exists.

Returns: The added manifest entry.

Raises:

  • ArtifactFinalizedError: You cannot make changes to the current artifact version because it is finalized. Log a new artifact version instead.
  • ValueError: Policy must be “mutable” or “immutable”

method Artifact.add_reference

add_reference(
    uri: 'ArtifactManifestEntry | str',
    name: 'StrPath | None' = None,
    checksum: 'bool' = True,
    max_objects: 'int | None' = None
) → Sequence[ArtifactManifestEntry]

Add a reference denoted by a URI to the artifact.

Unlike files or directories that you add to an artifact, references are not uploaded to W&B. For more information, see Track external files.

By default, the following schemes are supported:

  • http(s): The size and digest of the file will be inferred by the Content-Length and the ETag response headers returned by the server.
  • s3: The checksum and size are pulled from the object metadata. If bucket versioning is enabled, then the version ID is also tracked.
  • gs: The checksum and size are pulled from the object metadata. If bucket versioning is enabled, then the version ID is also tracked.
  • https, domain matching *.blob.core.windows.net (Azure): The checksum and size are pulled from the blob metadata. If storage account versioning is enabled, then the version ID is also tracked.
  • file: The checksum and size are pulled from the file system. This scheme is useful if you have an NFS share or other externally mounted volume containing files you wish to track but not necessarily upload.

For any other scheme, the digest is just a hash of the URI and the size is left blank.

Args:

  • uri: The URI path of the reference to add. The URI path can be an object returned from Artifact.get_entry to store a reference to another artifact’s entry.
  • name: The path within the artifact to place the contents of this reference.
  • checksum: Whether or not to checksum the resource(s) located at the reference URI. Checksumming is strongly recommended as it enables automatic integrity validation. Disabling checksumming will speed up artifact creation but reference directories will not be iterated through, so the objects in the directory will not be saved to the artifact. We recommend setting checksum=False when adding reference objects, in which case a new version will only be created if the reference URI changes.
  • max_objects: The maximum number of objects to consider when adding a reference that points to a directory or bucket store prefix. By default, the maximum number of objects allowed for Amazon S3, GCS, Azure, and local files is 10,000,000. Other URI schemas do not have a maximum.

Returns: The added manifest entries.

Raises:

  • ArtifactFinalizedError: You cannot make changes to the current artifact version because it is finalized. Log a new artifact version instead.
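
For instance, a minimal sketch that tracks objects in an S3 bucket without uploading them (the bucket URI is a placeholder):

import wandb

with wandb.init(project="reference-example") as run:
    artifact = wandb.Artifact(name="external-data", type="dataset")
    # Reference objects under the prefix; only checksums and sizes are stored.
    artifact.add_reference("s3://my-bucket/datasets/train/")  # placeholder URI
    run.log_artifact(artifact)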

method Artifact.checkout

checkout(root: 'str | None' = None) → str

Replace the specified root directory with the contents of the artifact.

WARNING: This will delete all files in root that are not included in the artifact.

Args:

  • root: The directory to replace with this artifact’s files.

Returns: The path of the checked out contents.

Raises:

  • ArtifactNotLoggedError: If the artifact is not logged.

method Artifact.delete

delete(delete_aliases: 'bool' = False) → None

Delete an artifact and its files.

If called on a linked artifact, only the link is deleted, and the source artifact is unaffected.

Use artifact.unlink() instead of artifact.delete() to remove a link between a source artifact and a linked artifact.

Args:

  • delete_aliases: If set to True, deletes all aliases associated with the artifact. Otherwise, this raises an exception if the artifact has existing aliases. This parameter is ignored if the artifact is linked (a member of a portfolio collection).

Raises:

  • ArtifactNotLoggedError: If the artifact is not logged.

method Artifact.download

download(
    root: 'StrPath | None' = None,
    allow_missing_references: 'bool' = False,
    skip_cache: 'bool | None' = None,
    path_prefix: 'StrPath | None' = None,
    multipart: 'bool | None' = None
) → FilePathStr

Download the contents of the artifact to the specified root directory.

Existing files located within root are not modified. Explicitly delete root before you call download if you want the contents of root to exactly match the artifact.

Args:

  • root: The directory W&B stores the artifact’s files.
  • allow_missing_references: If set to True, any invalid reference paths will be ignored while downloading referenced files.
  • skip_cache: If set to True, the artifact cache will be skipped when downloading and W&B will download each file into the default root or specified download directory.
  • path_prefix: If specified, only files with a path that starts with the given prefix will be downloaded. Uses unix format (forward slashes).
  • multipart: If set to None (default), the artifact will be downloaded in parallel using multipart download if individual file size is greater than 2GB. If set to True or False, the artifact will be downloaded in parallel or serially regardless of the file size.

Returns: The path to the downloaded contents.

Raises:

  • ArtifactNotLoggedError: If the artifact is not logged.

method Artifact.file

file(root: 'str | None' = None) → StrPath

Download a single file artifact to the directory you specify with root.

Args:

  • root: The root directory to store the file. Defaults to ./artifacts/self.name/.

Returns: The full path of the downloaded file.

Raises:

  • ArtifactNotLoggedError: If the artifact is not logged.
  • ValueError: If the artifact contains more than one file.

method Artifact.files

files(names: 'list[str] | None' = None, per_page: 'int' = 50) → ArtifactFiles

Iterate over all files stored in this artifact.

Args:

  • names: The filename paths relative to the root of the artifact you wish to list.
  • per_page: The number of files to return per request.

Returns: An iterator containing File objects.

Raises:

  • ArtifactNotLoggedError: If the artifact is not logged.

method Artifact.finalize

finalize() → None

Finalize the artifact version.

You cannot modify an artifact version once it is finalized because the artifact is logged as a specific artifact version. Create a new artifact version to log more data to an artifact. An artifact is automatically finalized when you log the artifact with log_artifact.


method Artifact.get

get(name: 'str') → WBValue | None

Get the WBValue object located at the artifact relative name.

Args:

  • name: The artifact relative name to retrieve.

Returns: W&B object that can be logged with run.log() and visualized in the W&B UI.

Raises:

  • ArtifactNotLoggedError: if the artifact isn’t logged or the run is offline.

method Artifact.get_added_local_path_name

get_added_local_path_name(local_path: 'str') → str | None

Get the artifact relative name of a file added by a local filesystem path.

Args:

  • local_path: The local path to resolve into an artifact relative name.

Returns: The artifact relative name.


method Artifact.get_entry

get_entry(name: 'StrPath') → ArtifactManifestEntry

Get the entry with the given name.

Args:

  • name: The artifact relative name to get.

Returns: The manifest entry with the given name.

Raises:

  • ArtifactNotLoggedError: if the artifact isn’t logged or the run is offline.
  • KeyError: if the artifact doesn’t contain an entry with the given name.

method Artifact.get_path

get_path(name: 'StrPath') → ArtifactManifestEntry

Deprecated. Use get_entry(name).


method Artifact.is_draft

is_draft() → bool

Check if artifact is not saved.

Returns: Boolean. False if artifact is saved. True if artifact is not saved.


method Artifact.json_encode

json_encode() → dict[str, Any]

Returns the artifact encoded to the JSON format.

Returns: A dict with string keys representing attributes of the artifact.


method Artifact.link

link(target_path: 'str', aliases: 'list[str] | None' = None) → Artifact

Link this artifact to a portfolio (a promoted collection of artifacts).

Args:

  • target_path: The path to the portfolio inside a project. The target path must adhere to one of the following schemas {portfolio}, {project}/{portfolio} or {entity}/{project}/{portfolio}. To link the artifact to the Model Registry, rather than to a generic portfolio inside a project, set target_path to the following schema {"model-registry"}/{Registered Model Name} or {entity}/{"model-registry"}/{Registered Model Name}.
  • aliases: A list of strings that uniquely identifies the artifact inside the specified portfolio.

Raises:

  • ArtifactNotLoggedError: If the artifact is not logged.

Returns: The linked artifact.
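
A minimal sketch of linking a logged artifact to the Model Registry (the file path and registered model name are hypothetical):

import wandb

with wandb.init(project="my-models") as run:
    artifact = run.log_artifact("model.pt", name="my-model", type="model")
    artifact.wait()  # ensure the artifact version is committed before linking
    artifact.link(target_path="model-registry/My Registered Model")  # hypothetical name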


method Artifact.logged_by

logged_by() → Run | None

Get the W&B run that originally logged the artifact.

Returns: The W&B run that originally logged the artifact, or None.

Raises:

  • ArtifactNotLoggedError: If the artifact is not logged.

method Artifact.new_draft

new_draft() → Artifact

Create a new draft artifact with the same content as this committed artifact.

Modifying an existing artifact creates a new artifact version known as an “incremental artifact”. The artifact returned can be extended or modified and logged as a new version.

Returns: An Artifact object.

Raises:

  • ArtifactNotLoggedError: If the artifact is not logged.

method Artifact.new_file

new_file(
    name: 'str',
    mode: 'str' = 'x',
    encoding: 'str | None' = None
) → Iterator[IO]

Open a new temporary file and add it to the artifact.

Args:

  • name: The name of the new file to add to the artifact.
  • mode: The file access mode to use to open the new file.
  • encoding: The encoding used to open the new file.

Returns: A new file object that can be written to. Upon closing, the file is automatically added to the artifact.

Raises:

  • ArtifactFinalizedError: You cannot make changes to the current artifact version because it is finalized. Log a new artifact version instead.
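
Because the returned object is a context manager, the usual pattern is a with block; this sketch writes a small JSON file into a draft artifact:

import wandb

artifact = wandb.Artifact(name="config-artifact", type="dataset")

# The file is created in a temporary location and added to the
# artifact automatically when the with block closes it.
with artifact.new_file("config.json") as f:
    f.write('{"learning_rate": 0.001}')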

method Artifact.remove

remove(item: 'StrPath | ArtifactManifestEntry') → None

Remove an item from the artifact.

Args:

  • item: The item to remove. Can be a specific manifest entry or the name of an artifact-relative path. If the item matches a directory all items in that directory will be removed.

Raises:

  • ArtifactFinalizedError: You cannot make changes to the current artifact version because it is finalized. Log a new artifact version instead.
  • FileNotFoundError: If the item isn’t found in the artifact.

method Artifact.save

save(
    project: 'str | None' = None,
    settings: 'wandb.Settings | None' = None
) → None

Persist any changes made to the artifact.

If currently in a run, that run will log this artifact. If not currently in a run, a run of type “auto” is created to track this artifact.

Args:

  • project: A project to use for the artifact in the case that a run is not already in context.
  • settings: A settings object to use when initializing an automatic run. Most commonly used in testing harness.

method Artifact.unlink

unlink() → None

Unlink this artifact if it is currently a member of a promoted collection of artifacts.

Raises:

  • ArtifactNotLoggedError: If the artifact is not logged.
  • ValueError: If the artifact is not linked, in other words, it is not a member of a portfolio collection.

method Artifact.used_by

used_by() → list[Run]

Get a list of the runs that have used this artifact and its linked artifacts.

Returns: A list of Run objects.

Raises:

  • ArtifactNotLoggedError: If the artifact is not logged.

method Artifact.verify

verify(root: 'str | None' = None) → None

Verify that the contents of an artifact match the manifest.

All files in the directory are checksummed and the checksums are then cross-referenced against the artifact’s manifest. References are not verified.

Args:

  • root: The directory to verify. If None, the artifact will be downloaded to ‘./artifacts/self.name/’.

Raises:

  • ArtifactNotLoggedError: If the artifact is not logged.
  • ValueError: If the verification fails.

method Artifact.wait

wait(timeout: 'int | None' = None) → Artifact

If needed, wait for this artifact to finish logging.

Args:

  • timeout: The time, in seconds, to wait.

Returns: An Artifact object.

2 - Run

class Run

A unit of computation logged by W&B. Typically, this is an ML experiment.

Call wandb.init() to create a new run. wandb.init() starts a new run and returns a wandb.Run object. Each run is associated with a unique ID (run ID). W&B recommends using a context manager (the with statement) to automatically finish the run.

For distributed training experiments, you can either track each process separately using one run per process or track all processes to a single run. See Log distributed training experiments for more information.

You can log data to a run with wandb.Run.log(). Anything you log using wandb.Run.log() is sent to that run. See Create an experiment or the wandb.init API reference page for more information.

There is another Run object in the wandb.apis.public namespace. Use that object to interact with runs that have already been created.

Attributes:

  • summary: (Summary) A summary of the run, which is a dictionary-like object. For more information, see Log summary metrics (https://docs.wandb.ai/guides/track/log/log-summary/).

Examples: Create a run with wandb.init():

import wandb

# Start a new run and log some data
# Use context manager (`with` statement) to automatically finish the run
with wandb.init(entity="entity", project="project") as run:
    run.log({"accuracy": acc, "loss": loss})

property Run.config

Config object associated with this run.

Returns:

  • wandb_config.Config: The config property value.

property Run.config_static

Static config object associated with this run.

Returns:

  • wandb_config.ConfigStatic: The config_static property value.

property Run.dir

The directory where files associated with the run are saved.

Returns:

  • str: The dir property value.

property Run.disabled

True if the run is disabled, False otherwise.

Returns:

  • bool: The disabled property value.

property Run.entity

The name of the W&B entity associated with the run.

Entity can be a username or the name of a team or organization.

Returns:

  • str: The entity property value.

property Run.group

Returns the name of the group associated with this run.

Grouping runs together allows related experiments to be organized and visualized collectively in the W&B UI. This is especially useful for scenarios such as distributed training or cross-validation, where multiple runs should be viewed and managed as a unified experiment.

In shared mode, where all processes share the same run object, setting a group is usually unnecessary, since there is only one run and no grouping is required.

Returns:

  • str: The group property value.

property Run.id

Identifier for this run.

Returns:

  • str: The id property value.

property Run.job_type

Name of the job type associated with the run.

View a run’s job type in the run’s Overview page in the W&B App.

You can use this to categorize runs by their job type, such as “training”, “evaluation”, or “inference”. This is useful for organizing and filtering runs in the W&B UI, especially when you have multiple runs with different job types in the same project. For more information, see Organize runs.

Returns:

  • str: The job_type property value.

property Run.name

Display name of the run.

Display names are not guaranteed to be unique and may be descriptive. By default, they are randomly generated.

Returns:

  • str | None: The name property value.

property Run.notes

Notes associated with the run, if there are any.

Notes can be a multiline string and can also use markdown and latex equations inside $$, like $x + 3$.

Returns:

  • str | None: The notes property value.

property Run.offline

True if the run is offline, False otherwise.

Returns:

  • bool: The offline property value.

property Run.path

Path to the run.

Run paths include entity, project, and run ID, in the format entity/project/run_id.

Returns:

  • str: The path property value.

property Run.project

Name of the W&B project associated with the run.

Returns:

  • str: The project property value.

property Run.project_url

URL of the W&B project associated with the run, if there is one.

Offline runs do not have a project URL.

Returns:

  • str | None: The project_url property value.

property Run.resumed

True if the run was resumed, False otherwise.

Returns:

  • bool: The resumed property value.

property Run.settings

A frozen copy of run’s Settings object.

Returns:

  • Settings: The settings property value.

property Run.start_time

Unix timestamp (in seconds) of when the run started.

Returns:

  • float: The start_time property value.

property Run.sweep_id

Identifier for the sweep associated with the run, if there is one.

Returns:

  • str | None: The sweep_id property value.

property Run.sweep_url

URL of the sweep associated with the run, if there is one.

Offline runs do not have a sweep URL.

Returns:

  • str | None: The sweep_url property value.

property Run.tags

Tags associated with the run, if there are any.

Returns:

  • tuple | None: The tags property value.

property Run.url

The url for the W&B run, if there is one.

Offline runs will not have a url.

Returns:

  • str | None: The url property value.

method Run.alert

alert(
    title: 'str',
    text: 'str',
    level: 'str | AlertLevel | None' = None,
    wait_duration: 'int | float | timedelta | None' = None
) → None

Create an alert with the given title and text.

Args:

  • title: The title of the alert, must be less than 64 characters long.
  • text: The text body of the alert.
  • level: The alert level to use, either: INFO, WARN, or ERROR.
  • wait_duration: The time to wait (in seconds) before sending another alert with this title.
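
For example, a minimal sketch that sends a warning-level alert when a metric falls below a threshold (the metric value is a placeholder):

import wandb
from wandb import AlertLevel

with wandb.init(project="alert-example") as run:
    accuracy = 0.4  # placeholder value from your evaluation
    if accuracy < 0.5:
        run.alert(
            title="Low accuracy",
            text=f"Accuracy {accuracy} is below 0.5",
            level=AlertLevel.WARN,
            wait_duration=300,  # seconds to wait before re-sending this alert
        )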

method Run.define_metric

define_metric(
    name: 'str',
    step_metric: 'str | wandb_metric.Metric | None' = None,
    step_sync: 'bool | None' = None,
    hidden: 'bool | None' = None,
    summary: 'str | None' = None,
    goal: 'str | None' = None,
    overwrite: 'bool | None' = None
) → wandb_metric.Metric

Customize metrics logged with wandb.Run.log().

Args:

  • name: The name of the metric to customize.
  • step_metric: The name of another metric to serve as the X-axis for this metric in automatically generated charts.
  • step_sync: Automatically insert the last value of step_metric into wandb.Run.log() if it is not provided explicitly. Defaults to True if step_metric is specified.
  • hidden: Hide this metric from automatic plots.
  • summary: Specify aggregate metrics added to summary. Supported aggregations include “min”, “max”, “mean”, “last”, “first”, “best”, “copy”, and “none”. “none” prevents a summary from being generated. “best” is used together with the goal parameter but is deprecated; use “min” or “max” instead. “copy” is also deprecated and should not be used.
  • goal: Specify how to interpret the “best” summary type. Supported options are “minimize” and “maximize”. “goal” is deprecated and should not be used; use a summary of “min” or “max” instead.
  • overwrite: If false, then this call is merged with previous define_metric calls for the same metric by using their values for any unspecified parameters. If true, then unspecified parameters overwrite values specified by previous calls.

Returns: An object that represents this call but can otherwise be discarded.
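
A minimal sketch: use a custom “epoch” metric as the X-axis for a validation metric and keep its minimum in the summary:

import wandb

with wandb.init(project="define-metric-example") as run:
    run.define_metric("epoch")
    run.define_metric("val/loss", step_metric="epoch", summary="min")

    for epoch in range(5):
        # val/loss here is synthetic data for illustration
        run.log({"epoch": epoch, "val/loss": 1.0 / (epoch + 1)})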


method Run.display

display(height: 'int' = 420, hidden: 'bool' = False) → bool

Display this run in Jupyter.


method Run.finish

finish(exit_code: 'int | None' = None, quiet: 'bool | None' = None) → None

Finish a run and upload any remaining data.

Marks the completion of a W&B run and ensures all data is synced to the server. The run’s final state is determined by its exit conditions and sync status.

Run States:

  • Running: Active run that is logging data and/or sending heartbeats.
  • Crashed: Run that stopped sending heartbeats unexpectedly.
  • Finished: Run completed successfully (exit_code=0) with all data synced.
  • Failed: Run completed with errors (exit_code!=0).
  • Killed: Run was forcibly stopped before it could finish.

Args:

  • exit_code: Integer indicating the run’s exit status. Use 0 for success, any other value marks the run as failed.
  • quiet: Deprecated. Configure logging verbosity using wandb.Settings(quiet=...).

method Run.finish_artifact

finish_artifact(
    artifact_or_path: 'Artifact | str',
    name: 'str | None' = None,
    type: 'str | None' = None,
    aliases: 'list[str] | None' = None,
    distributed_id: 'str | None' = None
) → Artifact

Finishes a non-finalized artifact as output of a run.

Subsequent “upserts” with the same distributed ID will result in a new version.

Args:

  • artifact_or_path: A path to the contents of this artifact, can be in the following forms:
    • /local/directory
    • /local/directory/file.txt
    • s3://bucket/path
    You can also pass an Artifact object created by calling wandb.Artifact.
  • name: An artifact name. May be prefixed with entity/project. Valid names can be in the following forms:
    • name:version
    • name:alias
    • digest
    This will default to the basename of the path prepended with the current run id if not specified.
  • type: The type of artifact to log, examples include dataset, model
  • aliases: Aliases to apply to this artifact, defaults to ["latest"]
  • distributed_id: Unique string that all distributed jobs share. If None, defaults to the run’s group name.

Returns: An Artifact object.


method Run.link_artifact

link_artifact(
    artifact: 'Artifact',
    target_path: 'str',
    aliases: 'list[str] | None' = None
) → Artifact

Link the given artifact to a portfolio (a promoted collection of artifacts).

Linked artifacts are visible in the UI for the specified portfolio.

Args:

  • artifact: the (public or local) artifact which will be linked
  • target_path: str - takes the following forms: {portfolio}, {project}/{portfolio}, or {entity}/{project}/{portfolio}
  • aliases: List[str] - optional alias(es) that will only be applied on this linked artifact inside the portfolio. The alias “latest” will always be applied to the latest version of an artifact that is linked.

Returns: The linked artifact.


method Run.link_model

link_model(
    path: 'StrPath',
    registered_model_name: 'str',
    name: 'str | None' = None,
    aliases: 'list[str] | None' = None
) → Artifact | None

Log a model artifact version and link it to a registered model in the model registry.

Linked model versions are visible in the UI for the specified registered model.

This method will:

  • Check if ’name’ model artifact has been logged. If so, use the artifact version that matches the files located at ‘path’ or log a new version. Otherwise log files under ‘path’ as a new model artifact, ’name’ of type ‘model’.
  • Check if registered model with name ‘registered_model_name’ exists in the ‘model-registry’ project. If not, create a new registered model with name ‘registered_model_name’.
  • Link version of model artifact ’name’ to registered model, ‘registered_model_name’.
  • Attach aliases from ‘aliases’ list to the newly linked model artifact version.

Args:

  • path: (str) A path to the contents of this model, can be in the following forms:
    • /local/directory
    • /local/directory/file.txt
    • s3://bucket/path
  • registered_model_name: The name of the registered model that the model is to be linked to. A registered model is a collection of model versions linked to the model registry, typically representing a team’s specific ML Task. The entity that this registered model belongs to will be derived from the run.
  • name: The name of the model artifact that files in ‘path’ will be logged to. This will default to the basename of the path prepended with the current run id if not specified.
  • aliases: Aliases that will only be applied on this linked artifact inside the registered model. The alias “latest” will always be applied to the latest version of an artifact that is linked.

Raises:

  • AssertionError: If registered_model_name is a path or if model artifact ’name’ is of a type that does not contain the substring ‘model’.
  • ValueError: If name has invalid special characters.

Returns: The linked artifact if linking was successful, otherwise None.


method Run.log

log(
    data: 'dict[str, Any]',
    step: 'int | None' = None,
    commit: 'bool | None' = None
) → None

Upload run data.

Use log to log data from runs, such as scalars, images, video, histograms, plots, and tables. See Log objects and media for code snippets, best practices, and more.

Basic usage:

import wandb

with wandb.init() as run:
     run.log({"train-loss": 0.5, "accuracy": 0.9})

The previous code snippet saves the loss and accuracy to the run’s history and updates the summary values for these metrics.

Visualize logged data in a workspace at wandb.ai, or locally on a self-hosted instance of the W&B app, or export data to visualize and explore locally, such as in a Jupyter notebook, with the Public API.

Logged values don’t have to be scalars. You can log any W&B supported Data Type such as images, audio, video, and more. For example, you can use wandb.Table to log structured data. See Log tables, visualize and query data tutorial for more details.

W&B organizes metrics with a forward slash (/) in their name into sections named using the text before the final slash. For example, the following results in two sections named “train” and “validate”:

with wandb.init() as run:
     # Log metrics in the "train" section.
     run.log(
         {
             "train/accuracy": 0.9,
             "train/loss": 30,
             "validate/accuracy": 0.8,
             "validate/loss": 20,
         }
     )

Only one level of nesting is supported; run.log({"a/b/c": 1}) produces a section named “a/b”.

run.log() is not intended to be called more than a few times per second. For optimal performance, limit your logging to once every N iterations, or collect data over multiple iterations and log it in a single step.

By default, each call to log creates a new “step”. The step must always increase, and it is not possible to log to a previous step. You can use any metric as the X axis in charts. See Custom log axes for more details.

In many cases, it is better to treat the W&B step like you’d treat a timestamp rather than a training step.

with wandb.init() as run:
     # Example: log an "epoch" metric for use as an X axis.
     run.log({"epoch": 40, "train-loss": 0.5})

It is possible to use multiple wandb.Run.log() invocations to log to the same step with the step and commit parameters. The following are all equivalent:

with wandb.init() as run:
     # Normal usage:
     run.log({"train-loss": 0.5, "accuracy": 0.8})
     run.log({"train-loss": 0.4, "accuracy": 0.9})

     # Implicit step without auto-incrementing:
     run.log({"train-loss": 0.5}, commit=False)
     run.log({"accuracy": 0.8})
     run.log({"train-loss": 0.4}, commit=False)
     run.log({"accuracy": 0.9})

     # Explicit step:
     run.log({"train-loss": 0.5}, step=current_step)
     run.log({"accuracy": 0.8}, step=current_step)
     current_step += 1
     run.log({"train-loss": 0.4}, step=current_step)
     run.log({"accuracy": 0.9}, step=current_step)

Args:

  • data: A dict with str keys and values that are serializable Python objects, including: int, float and string; any of the wandb.data_types; lists, tuples and NumPy arrays of serializable Python objects; other dicts of this structure.
  • step: The step number to log. If None, then an implicit auto-incrementing step is used. See the notes in the description.
  • commit: If true, finalize and upload the step. If false, then accumulate data for the step. See the notes in the description. If step is None, then the default is commit=True; otherwise, the default is commit=False.

Examples: For more detailed examples, see our guides to logging.

Basic usage

import wandb

with wandb.init() as run:
    run.log({"train-loss": 0.5, "accuracy": 0.9})

Incremental logging

import wandb

with wandb.init() as run:
    run.log({"loss": 0.2}, commit=False)
    # Somewhere else when I'm ready to report this step:
    run.log({"accuracy": 0.8})

Histogram

import numpy as np
import wandb

# sample gradients at random from normal distribution
gradients = np.random.randn(100, 100)
with wandb.init() as run:
    run.log({"gradients": wandb.Histogram(gradients)})

Image from NumPy

import numpy as np
import wandb

with wandb.init() as run:
    examples = []
    for i in range(3):
         pixels = np.random.randint(low=0, high=256, size=(100, 100, 3))
         image = wandb.Image(pixels, caption=f"random field {i}")
         examples.append(image)
    run.log({"examples": examples})

Image from PIL

import numpy as np
from PIL import Image as PILImage
import wandb

with wandb.init() as run:
    examples = []
    for i in range(3):
         pixels = np.random.randint(
             low=0,
             high=256,
             size=(100, 100, 3),
             dtype=np.uint8,
         )
         pil_image = PILImage.fromarray(pixels, mode="RGB")
         image = wandb.Image(pil_image, caption=f"random field {i}")
         examples.append(image)
    run.log({"examples": examples})

Video from NumPy

import numpy as np
import wandb

with wandb.init() as run:
    # axes are (time, channel, height, width)
    frames = np.random.randint(
         low=0,
         high=256,
         size=(10, 3, 100, 100),
         dtype=np.uint8,
    )
    run.log({"video": wandb.Video(frames, fps=4)})

Matplotlib plot

from matplotlib import pyplot as plt
import numpy as np
import wandb

with wandb.init() as run:
    fig, ax = plt.subplots()
    x = np.linspace(0, 10)
    y = x * x
    ax.plot(x, y)  # plot y = x^2
    run.log({"chart": fig})

PR Curve

import wandb

with wandb.init() as run:
    run.log({"pr": wandb.plot.pr_curve(y_test, y_probas, labels)})

3D Object

import wandb

with wandb.init() as run:
    run.log(
         {
             "generated_samples": [
                 wandb.Object3D(open("sample.obj")),
                 wandb.Object3D(open("sample.gltf")),
                 wandb.Object3D(open("sample.glb")),
             ]
         }
    )

Raises:

  • wandb.Error: If called before wandb.init().
  • ValueError: If invalid data is passed.

method Run.log_artifact

log_artifact(
    artifact_or_path: 'Artifact | StrPath',
    name: 'str | None' = None,
    type: 'str | None' = None,
    aliases: 'list[str] | None' = None,
    tags: 'list[str] | None' = None
) → Artifact

Declare an artifact as an output of a run.

Args:

  • artifact_or_path: (str or Artifact) A path to the contents of this artifact, can be in the following forms:
    • /local/directory
    • /local/directory/file.txt
    • s3://bucket/path
    You can also pass an Artifact object created by calling wandb.Artifact.
  • name: (str, optional) An artifact name. Valid names can be in the following forms:
    • name:version
    • name:alias
    • digest
    This will default to the basename of the path prepended with the current run id if not specified.
  • type: (str) The type of artifact to log, examples include dataset, model
  • aliases: (list, optional) Aliases to apply to this artifact, defaults to ["latest"]
  • tags: (list, optional) Tags to apply to this artifact, if any.

Returns: An Artifact object.
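
For example, a minimal sketch that logs a local directory as a dataset artifact (the directory path and names are placeholders):

import wandb

with wandb.init(project="log-artifact-example") as run:
    run.log_artifact(
        "data/processed/",  # placeholder directory
        name="processed-data",
        type="dataset",
        aliases=["latest", "nightly"],
    )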


method Run.log_code

log_code(
    root: 'str | None' = '.',
    name: 'str | None' = None,
    include_fn: 'Callable[[str, str], bool] | Callable[[str], bool]' = <function _is_py_requirements_or_dockerfile>,
    exclude_fn: 'Callable[[str, str], bool] | Callable[[str], bool]' = <function exclude_wandb_fn>
) → Artifact | None

Save the current state of your code to a W&B Artifact.

By default, it walks the current directory and logs all files that end with .py.

Args:

  • root: The relative (to os.getcwd()) or absolute path to recursively find code from.
  • name: (str, optional) The name of our code artifact. By default, we’ll name the artifact source-$PROJECT_ID-$ENTRYPOINT_RELPATH. There may be scenarios where you want many runs to share the same artifact. Specifying name allows you to achieve that.
  • include_fn: A callable that accepts a file path and (optionally) root path and returns True when it should be included and False otherwise. This defaults to lambda path, root: path.endswith(".py").
  • exclude_fn: A callable that accepts a file path and (optionally) root path and returns True when it should be excluded and False otherwise. This defaults to a function that excludes all files within <root>/.wandb/ and <root>/wandb/ directories.

Examples: Basic usage

import wandb

with wandb.init() as run:
    run.log_code()

Advanced usage

import wandb

with wandb.init() as run:
    run.log_code(
         root="../",
         include_fn=lambda path: path.endswith(".py") or path.endswith(".ipynb"),
         exclude_fn=lambda path, root: os.path.relpath(path, root).startswith(
             "cache/"
         ),
    )

Returns: An Artifact object if code was logged


method Run.log_model

log_model(
    path: 'StrPath',
    name: 'str | None' = None,
    aliases: 'list[str] | None' = None
) → None

Logs a model artifact containing the contents inside the ‘path’ to a run and marks it as an output to this run.

The name of the model artifact can only contain alphanumeric characters, underscores, and hyphens.

Args:

  • path: (str) A path to the contents of this model, can be in the following forms:
    • /local/directory
    • /local/directory/file.txt
    • s3://bucket/path
  • name: A name to assign to the model artifact that the file contents will be added to. This will default to the basename of the path prepended with the current run id if not specified.
  • aliases: Aliases to apply to the created model artifact, defaults to ["latest"]

Raises:

  • ValueError: If name has invalid special characters.

Returns: None


method Run.mark_preempting

mark_preempting() → None

Mark this run as preempting.

Also tells the internal process to immediately report this to server.


method Run.restore

restore(
    name: 'str',
    run_path: 'str | None' = None,
    replace: 'bool' = False,
    root: 'str | None' = None
) → None | TextIO

Download the specified file from cloud storage.

File is placed into the current directory or run directory. By default, will only download the file if it doesn’t already exist.

Args:

  • name: The name of the file.
  • run_path: Optional path to a run to pull files from, i.e. username/project_name/run_id. This is required if wandb.init has not been called.
  • replace: Whether to download the file even if it already exists locally
  • root: The directory to download the file to. Defaults to the current directory or the run directory if wandb.init was called.

Returns: None if it can’t find the file, otherwise a file object open for reading.

Raises:

  • CommError: If W&B can’t connect to the W&B backend.
  • ValueError: If the file is not found or can’t find run_path.

method Run.save

save(
    glob_str: 'str | os.PathLike',
    base_path: 'str | os.PathLike | None' = None,
    policy: 'PolicyName' = 'live'
) → bool | list[str]

Sync one or more files to W&B.

Relative paths are relative to the current working directory.

A Unix glob, such as “myfiles/*”, is expanded at the time save is called regardless of the policy. In particular, new files are not picked up automatically.

A base_path may be provided to control the directory structure of uploaded files. It should be a prefix of glob_str, and the directory structure beneath it is preserved.

When given an absolute path or glob and no base_path, one directory level is preserved, as in the examples below.

Args:

  • glob_str: A relative or absolute path or Unix glob.
  • base_path: A path to use to infer a directory structure; see examples.
  • policy: One of live, now, or end.
    • live: upload the file as it changes, overwriting the previous version
    • now: upload the file once now
    • end: upload file when the run ends

Returns: Paths to the symlinks created for the matched files.

For historical reasons, this may return a boolean in legacy code.

import wandb

run = wandb.init()

run.save("these/are/myfiles/*")
# => Saves files in a "these/are/myfiles/" folder in the run.

run.save("these/are/myfiles/*", base_path="these")
# => Saves files in an "are/myfiles/" folder in the run.

run.save("/User/username/Documents/run123/*.txt")
# => Saves files in a "run123/" folder in the run. See note below.

run.save("/User/username/Documents/run123/*.txt", base_path="/User")
# => Saves files in a "username/Documents/run123/" folder in the run.

run.save("files/*/saveme.txt")
# => Saves each "saveme.txt" file in an appropriate subdirectory
#    of "files/".

# Explicitly finish the run since a context manager is not used.
run.finish()

method Run.status

status() → RunStatus

Get sync info from the internal backend, about the current run’s sync status.


method Run.unwatch

unwatch(
    models: 'torch.nn.Module | Sequence[torch.nn.Module] | None' = None
) → None

Remove pytorch model topology, gradient and parameter hooks.

Args:

  • models: Optional list of pytorch models that have had watch called on them.

method Run.upsert_artifact

upsert_artifact(
    artifact_or_path: 'Artifact | str',
    name: 'str | None' = None,
    type: 'str | None' = None,
    aliases: 'list[str] | None' = None,
    distributed_id: 'str | None' = None
) → Artifact

Declare (or append to) a non-finalized artifact as output of a run.

Note that you must call run.finish_artifact() to finalize the artifact. This is useful when distributed jobs need to all contribute to the same artifact.

Args:

  • artifact_or_path: A path to the contents of this artifact, can be in the following forms:
    • /local/directory
    • /local/directory/file.txt
    • s3://bucket/path
  • name: An artifact name. May be prefixed with “entity/project”. Defaults to the basename of the path prepended with the current run ID if not specified. Valid names can be in the following forms:
    • name:version
    • name:alias
    • digest
  • type: The type of artifact to log. Common examples include dataset, model.
  • aliases: Aliases to apply to this artifact, defaults to ["latest"].
  • distributed_id: Unique string that all distributed jobs share. If None, defaults to the run’s group name.

Returns: An Artifact object.
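
A sketch of the distributed pattern (the project, group, and file names are placeholders): each worker upserts its shard with a shared distributed_id, and a final job finalizes the artifact:

import wandb

# In each worker process:
with wandb.init(project="pipeline", group="dist-job") as run:
    artifact = wandb.Artifact(name="shards", type="dataset")
    artifact.add_file("shard-0.parquet")  # placeholder: this worker's shard
    run.upsert_artifact(artifact, distributed_id="dist-job")

# In a final job, once all workers have upserted:
with wandb.init(project="pipeline", group="dist-job") as run:
    final = wandb.Artifact(name="shards", type="dataset")
    run.finish_artifact(final, distributed_id="dist-job")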


method Run.use_artifact

use_artifact(
    artifact_or_name: 'str | Artifact',
    type: 'str | None' = None,
    aliases: 'list[str] | None' = None,
    use_as: 'str | None' = None
) → Artifact

Declare an artifact as an input to a run.

Call download or file on the returned object to get the contents locally.

Args:

  • artifact_or_name: The name of the artifact to use. May be prefixed with the name of the project the artifact was logged to (“<project>” or “<entity>/<project>”). If no entity is specified in the name, the Run or API setting’s entity is used. Valid names can be in the following forms:
    • name:version
    • name:alias
  • type: The type of artifact to use.
  • aliases: Aliases to apply to this artifact
  • use_as: This argument is deprecated and does nothing.

Returns: An Artifact object.

Examples:

import wandb

run = wandb.init(project="<example>")

# Use an artifact by name and alias
artifact_a = run.use_artifact(artifact_or_name="<name>:<alias>")

# Use an artifact by name and version
artifact_b = run.use_artifact(artifact_or_name="<name>:v<version>")

# Use an artifact by entity/project/name:alias
artifact_c = run.use_artifact(
   artifact_or_name="<entity>/<project>/<name>:<alias>"
)

# Use an artifact by entity/project/name:version
artifact_d = run.use_artifact(
   artifact_or_name="<entity>/<project>/<name>:v<version>"
)

# Explicitly finish the run since a context manager is not used.
run.finish()

method Run.use_model

use_model(name: 'str') → FilePathStr

Download the files logged in a model artifact ’name’.

Args:

  • name: A model artifact name. ’name’ must match the name of an existing logged model artifact. May be prefixed with entity/project/. Valid names can be in the following forms
    • model_artifact_name:version
    • model_artifact_name:alias

Returns:

  • path (str): Path to downloaded model artifact file(s).

Raises:

  • AssertionError: If model artifact ’name’ is of a type that does not contain the substring ‘model’.
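
Paired with Run.log_model, a minimal round trip might look like this (the file path and names are placeholders):

import wandb

with wandb.init(project="model-example") as run:
    # Log model files as a model artifact, then fetch them back by name.
    run.log_model(path="model.pt", name="my-model")  # placeholder path
    downloaded_path = run.use_model(name="my-model:latest")
    print(downloaded_path)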

method Run.watch

watch(
    models: 'torch.nn.Module | Sequence[torch.nn.Module]',
    criterion: 'torch.F | None' = None,
    log: "Literal['gradients', 'parameters', 'all'] | None" = 'gradients',
    log_freq: 'int' = 1000,
    idx: 'int | None' = None,
    log_graph: 'bool' = False
) → None

Hook into given PyTorch model to monitor gradients and the model’s computational graph.

This function can track parameters, gradients, or both during training.

Args:

  • models: A single model or a sequence of models to be monitored.
  • criterion: The loss function being optimized (optional).
  • log: Specifies whether to log “gradients”, “parameters”, or “all”. Set to None to disable logging. (default=“gradients”).
  • log_freq: Frequency (in batches) to log gradients and parameters. (default=1000)
  • idx: Index used when tracking multiple models with wandb.watch. (default=None)
  • log_graph: Whether to log the model’s computational graph. (default=False)

Raises:

  • ValueError: If wandb.init() has not been called or if any of the models are not instances of torch.nn.Module.
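A minimal sketch of watch with a toy PyTorch model; the model, data, and hyperparameters are illustrative placeholders:

import torch
import torch.nn as nn
import wandb

model = nn.Linear(10, 1)

with wandb.init(project="watch-demo") as run:
    # Log gradients and parameters every 100 batches.
    run.watch(model, log="all", log_freq=100)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for step in range(500):
        x = torch.randn(32, 10)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()  # watch hooks fire on backward passes
        optimizer.step()
        run.log({"loss": loss.item()})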

3 - Settings

class Settings

Settings for the W&B SDK.

This class manages configuration settings for the W&B SDK, ensuring type safety and validation of all settings. Settings are accessible as attributes and can be initialized programmatically, through environment variables (WANDB_ prefix), and with configuration files.

The settings are organized into three categories:

  1. Public settings: Core configuration options that users can safely modify to customize W&B’s behavior for their specific needs.
  2. Internal settings: Settings prefixed with ‘x_’ that handle low-level SDK behavior. These settings are primarily for internal use and debugging. While they can be modified, they are not considered part of the public API and may change without notice in future versions.
  3. Computed settings: Read-only settings that are automatically derived from other settings or the environment.
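For example, the three categories look like this in practice (a minimal sketch; the names and values are placeholders):

import wandb

settings = wandb.Settings(
    project="demo",       # public setting
    x_service_wait=60.0,  # internal ("x_"-prefixed) setting; may change without notice
)
print(settings.run_mode)  # computed setting, derived from the `mode` setting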

method Settings.__init__

__init__(
    allow_offline_artifacts: 'bool' = True,
    allow_val_change: 'bool' = False,
    anonymous: 'Literal['must', 'allow', 'never'] | None' = None,
    api_key: 'str | None' = None,
    azure_account_url_to_access_key: 'dict[str, str] | None' = None,
    base_url: 'str' = 'https://api.wandb.ai',
    code_dir: 'str | None' = None,
    config_paths: 'Sequence | None' = None,
    console: 'Literal['auto', 'off', 'wrap', 'redirect', 'wrap_raw', 'wrap_emu']' = 'auto',
    console_multipart: 'bool' = False,
    credentials_file: 'str' = None,
    disable_code: 'bool' = False,
    disable_git: 'bool' = False,
    disable_job_creation: 'bool' = True,
    docker: 'str | None' = None,
    email: 'str | None' = None,
    entity: 'str | None' = None,
    organization: 'str | None' = None,
    force: 'bool' = False,
    fork_from: 'RunMoment | None' = None,
    git_commit: 'str | None' = None,
    git_remote: 'str' = 'origin',
    git_remote_url: 'str | None' = None,
    git_root: 'str | None' = None,
    heartbeat_seconds: 'int' = 30,
    host: 'str | None' = None,
    http_proxy: 'str | None' = None,
    https_proxy: 'str | None' = None,
    identity_token_file: 'str | None' = None,
    ignore_globs: 'Sequence' = (),
    init_timeout: 'float' = 90.0,
    insecure_disable_ssl: 'bool' = False,
    job_name: 'str | None' = None,
    job_source: 'Literal['repo', 'artifact', 'image'] | None' = None,
    label_disable: 'bool' = False,
    launch: 'bool' = False,
    launch_config_path: 'str | None' = None,
    login_timeout: 'float | None' = None,
    mode: 'Literal['online', 'offline', 'shared', 'disabled', 'dryrun', 'run']' = 'online',
    notebook_name: 'str | None' = None,
    program: 'str | None' = None,
    program_abspath: 'str | None' = None,
    program_relpath: 'str | None' = None,
    project: 'str | None' = None,
    quiet: 'bool' = False,
    reinit: 'Literal['default', 'return_previous', 'finish_previous', 'create_new'] | bool' = 'default',
    relogin: 'bool' = False,
    resume: 'Literal['allow', 'must', 'never', 'auto'] | None' = None,
    resume_from: 'RunMoment | None' = None,
    resumed: 'bool' = False,
    root_dir: 'str' = None,
    run_group: 'str | None' = None,
    run_id: 'str | None' = None,
    run_job_type: 'str | None' = None,
    run_name: 'str | None' = None,
    run_notes: 'str | None' = None,
    run_tags: 'tuple[str, Ellipsis] | None' = None,
    sagemaker_disable: 'bool' = False,
    save_code: 'bool | None' = None,
    settings_system: 'str | None' = None,
    max_end_of_run_history_metrics: 'int' = 10,
    max_end_of_run_summary_metrics: 'int' = 10,
    show_colors: 'bool | None' = None,
    show_emoji: 'bool | None' = None,
    show_errors: 'bool' = True,
    show_info: 'bool' = True,
    show_warnings: 'bool' = True,
    silent: 'bool' = False,
    start_method: 'str | None' = None,
    strict: 'bool | None' = None,
    summary_timeout: 'int' = 60,
    summary_warnings: 'int' = 5,
    sweep_id: 'str | None' = None,
    sweep_param_path: 'str | None' = None,
    symlink: 'bool' = None,
    sync_tensorboard: 'bool | None' = None,
    table_raise_on_max_row_limit_exceeded: 'bool' = False,
    username: 'str | None' = None,
    x_cli_only_mode: 'bool' = False,
    x_disable_meta: 'bool' = False,
    x_disable_stats: 'bool' = False,
    x_disable_viewer: 'bool' = False,
    x_disable_machine_info: 'bool' = False,
    x_executable: 'str | None' = None,
    x_extra_http_headers: 'dict[str, str] | None' = None,
    x_file_stream_max_bytes: 'int | None' = None,
    x_file_stream_max_line_bytes: 'int | None' = None,
    x_file_stream_transmit_interval: 'float | None' = None,
    x_file_stream_retry_max: 'int | None' = None,
    x_file_stream_retry_wait_min_seconds: 'float | None' = None,
    x_file_stream_retry_wait_max_seconds: 'float | None' = None,
    x_file_stream_timeout_seconds: 'float | None' = None,
    x_file_transfer_retry_max: 'int | None' = None,
    x_file_transfer_retry_wait_min_seconds: 'float | None' = None,
    x_file_transfer_retry_wait_max_seconds: 'float | None' = None,
    x_file_transfer_timeout_seconds: 'float | None' = None,
    x_files_dir: 'str | None' = None,
    x_flow_control_custom: 'bool | None' = None,
    x_flow_control_disabled: 'bool | None' = None,
    x_graphql_retry_max: 'int | None' = None,
    x_graphql_retry_wait_min_seconds: 'float | None' = None,
    x_graphql_retry_wait_max_seconds: 'float | None' = None,
    x_graphql_timeout_seconds: 'float | None' = None,
    x_internal_check_process: 'float' = 8.0,
    x_jupyter_name: 'str | None' = None,
    x_jupyter_path: 'str | None' = None,
    x_jupyter_root: 'str | None' = None,
    x_label: 'str | None' = None,
    x_live_policy_rate_limit: 'int | None' = None,
    x_live_policy_wait_time: 'int | None' = None,
    x_log_level: 'int' = 20,
    x_network_buffer: 'int | None' = None,
    x_primary: 'bool' = True,
    x_proxies: 'dict[str, str] | None' = None,
    x_runqueue_item_id: 'str | None' = None,
    x_save_requirements: 'bool' = True,
    x_server_side_derived_summary: 'bool' = False,
    x_server_side_expand_glob_metrics: 'bool' = True,
    x_service_transport: 'str | None' = None,
    x_service_wait: 'float' = 30.0,
    x_skip_transaction_log: 'bool' = False,
    x_start_time: 'float | None' = None,
    x_stats_pid: 'int' = 16976,
    x_stats_sampling_interval: 'float' = 15.0,
    x_stats_neuron_monitor_config_path: 'str | None' = None,
    x_stats_dcgm_exporter: 'str | None' = None,
    x_stats_open_metrics_endpoints: 'dict[str, str] | None' = None,
    x_stats_open_metrics_filters: 'dict[str, dict[str, str]] | Sequence | None' = None,
    x_stats_open_metrics_http_headers: 'dict[str, str] | None' = None,
    x_stats_disk_paths: 'Sequence | None' = ('/',),
    x_stats_cpu_count: 'int | None' = None,
    x_stats_cpu_logical_count: 'int | None' = None,
    x_stats_gpu_count: 'int | None' = None,
    x_stats_gpu_type: 'str | None' = None,
    x_stats_gpu_device_ids: 'Sequence | None' = None,
    x_stats_buffer_size: 'int' = 0,
    x_stats_coreweave_metadata_base_url: 'str' = 'http://169.254.169.254',
    x_stats_coreweave_metadata_endpoint: 'str' = '/api/v2/cloud-init/meta-data',
    x_stats_track_process_tree: 'bool' = False,
    x_sync: 'bool' = False,
    x_update_finish_state: 'bool' = True
) → None

Args:

  • allow_offline_artifacts (bool): Flag to allow table artifacts to be synced in offline mode. To revert to the old behavior, set this to False.

  • allow_val_change (bool): Flag to allow modification of Config values after they’ve been set.

  • anonymous (Optional[Literal[‘must’, ‘allow’, ‘never’]]): Controls anonymous data logging. Possible values are:

    • “never”: requires you to link your W&B account before tracking the run, so you don’t accidentally create an anonymous run.
    • “allow”: lets a logged-in user track runs with their account, but lets someone who is running the script without a W&B account see the charts in the UI.
    • “must”: sends the run to an anonymous account instead of to a signed-up user account.
  • api_key (Optional[str]): The W&B API key.

  • azure_account_url_to_access_key (Optional[Dict[str, str]]): Mapping of Azure account URLs to their corresponding access keys for Azure integration.

  • base_url (str): The URL of the W&B backend for data synchronization.

  • code_dir (Optional[str]): Directory containing the code to be tracked by W&B.

  • config_paths (Optional[Sequence]): Paths to files to load configuration from into the Config object.

  • console (Literal[‘auto’, ‘off’, ‘wrap’, ‘redirect’, ‘wrap_raw’, ‘wrap_emu’]): The type of console capture to be applied. Possible values are:

    • “auto”: Automatically selects the console capture method based on the system environment and settings.
    • “off”: Disables console capture.
    • “redirect”: Redirects low-level file descriptors for capturing output.
    • “wrap”: Overrides the write methods of sys.stdout/sys.stderr. Will be mapped to either “wrap_raw” or “wrap_emu” based on the state of the system.
    • “wrap_raw”: Same as “wrap” but captures raw output directly instead of through an emulator. Derived from the wrap setting and should not be set manually.
    • “wrap_emu”: Same as “wrap” but captures output through an emulator. Derived from the wrap setting and should not be set manually.

  • console_multipart (bool): Whether to produce multipart console log files.

  • credentials_file (str): Path to file for writing temporary access tokens.

  • disable_code (bool): Whether to disable capturing the code.

  • disable_git (bool): Whether to disable capturing the git state.

  • disable_job_creation (bool): Whether to disable the creation of a job artifact for W&B Launch.

  • docker (Optional[str]): The Docker image used to execute the script.

  • email (Optional[str]): The email address of the user.

  • entity (Optional[str]): The W&B entity, such as a user or a team.

  • organization (Optional[str]): The W&B organization.

  • force (bool): Whether to pass the force flag to wandb.login().

  • fork_from (Optional[RunMoment]): Specifies a point in a previous execution of a run to fork from. The point is defined by the run ID, a metric, and its value. Currently, only the metric ‘_step’ is supported.

  • git_commit (Optional[str]): The git commit hash to associate with the run.

  • git_remote (str): The git remote to associate with the run.

  • git_remote_url (Optional[str]): The URL of the git remote repository.

  • git_root (Optional[str]): Root directory of the git repository.

  • host (Optional[str]): Hostname of the machine running the script.

  • http_proxy (Optional[str]): Custom proxy servers for http requests to W&B.

  • https_proxy (Optional[str]): Custom proxy servers for https requests to W&B.

  • identity_token_file (Optional[str]): Path to file containing an identity token (JWT) for authentication.

  • ignore_globs (Sequence): Unix glob patterns relative to files_dir specifying files to exclude from upload.

  • init_timeout (float): Time in seconds to wait for the wandb.init call to complete before timing out.

  • insecure_disable_ssl (bool): Whether to insecurely disable SSL verification.

  • job_name (Optional[str]): Name of the Launch job running the script.

  • job_source (Optional[Literal[‘repo’, ‘artifact’, ‘image’]]): Source type for Launch.

  • label_disable (bool): Whether to disable automatic labeling features.

  • launch_config_path (Optional[str]): Path to the launch configuration file.

  • login_timeout (Optional[float]): Time in seconds to wait for login operations before timing out.

  • mode (Literal[‘online’, ‘offline’, ‘shared’, ‘disabled’, ‘dryrun’, ‘run’]): The operating mode for W&B logging and synchronization.

  • notebook_name (Optional[str]): Name of the notebook if running in a Jupyter-like environment.

  • program (Optional[str]): Path to the script that created the run, if available.

  • program_abspath (Optional[str]): The absolute path from the root repository directory to the script that created the run. Root repository directory is defined as the directory containing the .git directory, if it exists. Otherwise, it’s the current working directory.

  • program_relpath (Optional[str]): The relative path to the script that created the run.

  • project (Optional[str]): The W&B project ID.

  • quiet (bool): Flag to suppress non-essential output.

  • reinit (Union[Literal[‘default’, ‘return_previous’, ‘finish_previous’, ‘create_new’], bool]): What to do when wandb.init() is called while a run is active. Options:

    • “default”: Use “finish_previous” in notebooks and “return_previous” otherwise.
    • “return_previous”: Return the most recently created run that is not yet finished. This does not update wandb.run; see the “create_new” option.
    • “finish_previous”: Finish all active runs, then return a new run.
    • “create_new”: Create a new run without modifying other active runs. Does not update wandb.run or top-level functions like wandb.log, so some older integrations that rely on the global run will not work.

  reinit can also be a boolean, but this is deprecated: False is equivalent to “return_previous”, and True to “finish_previous”. See the sketch after this argument list.
  • relogin (bool): Flag to force a new login attempt.

  • resume (Optional[Literal[‘allow’, ‘must’, ‘never’, ‘auto’]]): Specifies the resume behavior for the run. Options:

    • “must”: Resumes from an existing run with the same ID. If no such run exists, it will result in failure.
    • “allow”: Attempts to resume from an existing run with the same ID. If none is found, a new run will be created.
    • “never”: Always starts a new run. If a run with the same ID already exists, it will result in failure.
    • “auto”: Automatically resumes from the most recent failed run on the same machine.
  • resume_from (Optional[RunMoment]): Specifies a point in a previous execution of a run to resume from. The point is defined by the run ID, a metric, and its value. Currently, only the metric ‘_step’ is supported.

  • root_dir (str): The root directory to use as the base for all run-related paths. In particular, this is used to derive the wandb directory and the run directory.

  • run_group (Optional[str]): Group identifier for related runs. Used for grouping runs in the UI.

  • run_id (Optional[str]): The ID of the run.

  • run_job_type (Optional[str]): Type of job being run (e.g., training, evaluation).

  • run_name (Optional[str]): Human-readable name for the run.

  • run_notes (Optional[str]): Additional notes or description for the run.

  • run_tags (Optional[Tuple[str, Ellipsis]]): Tags to associate with the run for organization and filtering.

  • sagemaker_disable (bool): Flag to disable SageMaker-specific functionality.

  • save_code (Optional[bool]): Whether to save the code associated with the run.

  • settings_system (Optional[str]): Path to the system-wide settings file.

  • max_end_of_run_history_metrics (int): Maximum number of history sparklines to display at the end of a run.

  • max_end_of_run_summary_metrics (int): Maximum number of summary metrics to display at the end of a run.

  • show_errors (bool): Whether to display error messages.

  • show_info (bool): Whether to display informational messages.

  • show_warnings (bool): Whether to display warning messages.

  • silent (bool): Flag to suppress all output.

  • strict (Optional[bool]): Whether to enable strict mode for validation and error checking.

  • summary_timeout (int): Time in seconds to wait for summary operations before timing out.

  • sweep_id (Optional[str]): Identifier of the sweep this run belongs to.

  • sweep_param_path (Optional[str]): Path to the sweep parameters configuration.

  • symlink (bool): Whether to use symlinks (True by default except on Windows).

  • sync_tensorboard (Optional[bool]): Whether to synchronize TensorBoard logs with W&B.

  • table_raise_on_max_row_limit_exceeded (bool): Whether to raise an exception when table row limits are exceeded.

  • username (Optional[str]): Username.

  • x_disable_meta (bool): Flag to disable the collection of system metadata.

  • x_disable_stats (bool): Flag to disable the collection of system metrics.

  • x_extra_http_headers (Optional[Dict[str, str]]): Additional headers to add to all outgoing HTTP requests.

  • x_label (Optional[str]): Label to assign to system metrics and console logs collected for the run. This is used to group data by on the frontend and can be used to distinguish data from different processes in a distributed training job.

  • x_primary (bool): Determines whether to save internal wandb files and metadata. In a distributed setting, this is useful for avoiding file overwrites from secondary processes when only system metrics and logs are needed, as the primary process handles the main logging.

  • x_save_requirements (bool): Flag to save the requirements file.

  • x_server_side_derived_summary (bool): Flag to delegate automatic computation of summary from history to the server. This does not disable user-provided summary updates.

  • x_service_wait (float): Time in seconds to wait for the wandb-core internal service to start.

  • x_skip_transaction_log (bool): Whether to skip saving the run events to the transaction log. This is only relevant for online runs. Can be used to reduce the amount of data written to disk. Should be used with caution, as it removes the guarantees about recoverability.

  • x_stats_sampling_interval (float): Sampling interval for the system monitor in seconds.

  • x_stats_dcgm_exporter (Optional[str]): Endpoint to extract NVIDIA DCGM metrics from. DCGM-related metrics are extracted from a query to the Prometheus /api/v1/query endpoint; it is a common practice to aggregate metrics reported by the instances of the DCGM Exporter running on different nodes in a cluster using Prometheus. Example:

    • http://localhost:9400/api/v1/query?query=DCGM_FI_DEV_GPU_TEMP{node="l1337", cluster="globular"}
  • x_stats_open_metrics_endpoints (Optional[Dict[str, str]]): OpenMetrics /metrics endpoints to monitor for system metrics.

  • x_stats_open_metrics_filters (Union[Dict[str, Dict[str, str]], Sequence, None]): Filter to apply to metrics collected from OpenMetrics /metrics endpoints. Supports two formats:

    • {“metric regex pattern, including endpoint name as prefix”: {“label”: “label value regex pattern”}}
    • (“metric regex pattern 1”, “metric regex pattern 2”, …)
  • x_stats_open_metrics_http_headers (Optional[Dict[str, str]]): HTTP headers to add to OpenMetrics requests.

  • x_stats_disk_paths (Optional[Sequence]): System paths to monitor for disk usage.

  • x_stats_cpu_count (Optional[int]): System CPU count. If set, overrides the auto-detected value in the run metadata.

  • x_stats_cpu_logical_count (Optional[int]): Logical CPU count. If set, overrides the auto-detected value in the run metadata.

  • x_stats_gpu_count (Optional[int]): GPU device count. If set, overrides the auto-detected value in the run metadata.

  • x_stats_gpu_type (Optional[str]): GPU device type. If set, overrides the auto-detected value in the run metadata.

  • x_stats_gpu_device_ids (Optional[Sequence]): GPU device indices to monitor. If not set, the system monitor captures metrics for all GPUs. Assumes 0-based indexing matching CUDA/ROCm device enumeration.

  • x_stats_track_process_tree (bool): Monitor the entire process tree for resource usage, starting from x_stats_pid. When True, the system monitor aggregates the RSS, CPU%, and thread count from the process with PID x_stats_pid and all of its descendants. This can have a performance overhead and is disabled by default.

  • x_update_finish_state (bool): Flag to indicate whether this process can update the run’s final state on the server. Set to False in distributed training when only the main process should determine the final state.

Returns: A Settings object.
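As referenced in the reinit argument above, here is a minimal sketch of reinit="create_new": a second wandb.init() call creates an independent run instead of finishing or returning the active one (the project name is a placeholder):

import wandb

run_a = wandb.init(project="demo", settings=wandb.Settings(reinit="create_new"))
run_b = wandb.init(project="demo", settings=wandb.Settings(reinit="create_new"))

run_a.log({"loss": 0.5})  # each run logs independently
run_b.log({"loss": 0.7})

run_a.finish()
run_b.finish()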

classmethod Settings.catch_private_settings

catch_private_settings(
    values
) → None

Check if a private field is provided and assign to the corresponding public one.

This is a compatibility layer to handle previous versions of the settings.

method Settings.validate_skip_transaction_log

validate_skip_transaction_log() → None

classmethod Settings.validate_run_tags

validate_run_tags(
    value
) → None

Validate run tags.

This method:

  • Validates that each tag is between 1 and 64 characters in length (inclusive)
  • Converts a single string value to a tuple
  • Preserves None values

Args:

  • value: A string, list, tuple, or None representing tags

Returns:

  • tuple: A tuple of validated tags, or None

Raises:

  • ValueError: If any tag is empty or exceeds 64 characters
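A small sketch of these rules as they apply when tags are set through wandb.Settings (the tag values are placeholders; the failing case is expected to raise a validation error per the rules above):

import wandb

s = wandb.Settings(run_tags="baseline")
print(s.run_tags)  # a single string is converted to a tuple: ("baseline",)

try:
    wandb.Settings(run_tags=("",))  # an empty tag should fail validation
except Exception as err:
    print(err)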

property Settings.colab_url

The URL to the Colab notebook, if running in Colab.

Returns:

  • Optional[str]: The colab_url property value.

property Settings.deployment

property Settings.files_dir

Absolute path to the local directory where the run’s files are stored.

Returns:

  • str: The files_dir property value.

property Settings.is_local

property Settings.log_dir

The directory for storing log files.

Returns:

  • str: The log_dir property value.

property Settings.log_internal

The path to the file to use for internal logs.

Returns:

  • str: The log_internal property value.

property Settings.log_symlink_internal

The path to the symlink to the internal log file of the most recent run.

Returns:

  • str: The log_symlink_internal property value.

property Settings.log_symlink_user

The path to the symlink to the user-process log file of the most recent run.

Returns:

  • str: The log_symlink_user property value.

property Settings.log_user

The path to the file to use for user-process logs.

Returns:

  • str: The log_user property value.

property Settings.project_url

The W&B URL where the project can be viewed.

Returns:

  • str: The project_url property value.

property Settings.resume_fname

The path to the resume file.

Returns:

  • str: The resume_fname property value.

property Settings.run_mode

The mode of the run. Can be either “run” or “offline-run”.

Returns:

  • Literal['run', 'offline-run']: The run_mode property value.

property Settings.run_url

The W&B URL where the run can be viewed.

Returns:

  • str: The run_url property value.

property Settings.settings_workspace

The path to the workspace settings file.

Returns:

  • str: The settings_workspace property value.

property Settings.sweep_url

The W&B URL where the sweep can be viewed.

Returns:

  • str: The sweep_url property value.

property Settings.sync_dir

The directory for storing the run’s files.

Returns:

  • str: The sync_dir property value.

property Settings.sync_file

Path to the append-only binary transaction log file.

Returns:

  • str: The sync_file property value.

property Settings.sync_symlink_latest

Path to the symlink to the most recent run’s transaction log file.

Returns:

  • str: The sync_symlink_latest property value.

property Settings.timespec

The time specification for the run.

Returns:

  • str: The timespec property value.

property Settings.wandb_dir

Full path to the wandb directory.

Returns:

  • str: The wandb_dir property value.

method Settings.update_from_system_config_file

update_from_system_config_file() → None

Update settings from the system config file.

method Settings.update_from_workspace_config_file

update_from_workspace_config_file() → None

Update settings from the workspace config file.

method Settings.update_from_env_vars

update_from_env_vars(
    environ: 'Dict[str, Any]'
) → None

Update settings from environment variables.

method Settings.update_from_system_environment

update_from_system_environment() → None

Update settings from the system environment.

method Settings.update_from_dict

update_from_dict(
    settings: 'Dict[str, Any]'
) → None

Update settings from a dictionary.
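A minimal sketch of updating an existing Settings object in place; the keys are setting names from the reference above, and the values are placeholders:

import wandb

settings = wandb.Settings(project="demo")
settings.update_from_dict({"mode": "offline", "quiet": True})
print(settings.mode)  # "offline"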

method Settings.update_from_settings

update_from_settings(
    settings: 'Settings'
) → None

Update settings from another instance of Settings.

method Settings.to_proto

to_proto() → wandb_settings_pb2.Settings

Generate a protobuf representation of the settings.

4 - System Metrics Reference

Metrics automatically logged by W&B.

This page provides detailed information about the system metrics that are tracked by the W&B SDK.

CPU

Process CPU Percent (CPU)

Percentage of CPU usage by the process, normalized by the number of available CPUs.

W&B assigns a cpu tag to this metric.

Process CPU Threads

The number of threads utilized by the process.

W&B assigns a proc.cpu.threads tag to this metric.

Disk

By default, the usage metrics are collected for the / path. To configure the paths to be monitored, use the following setting:

run = wandb.init(
    settings=wandb.Settings(
        x_stats_disk_paths=("/System/Volumes/Data", "/home", "/mnt/data"),
    ),
)

Disk Usage Percent

Represents the total system disk usage in percentage for specified paths.

W&B assigns a disk.{path}.usagePercent tag to this metric.

Disk Usage

Represents the total system disk usage in gigabytes (GB) for specified paths. Each accessible path is sampled, and its disk usage (in GB) is appended to the samples.

W&B assigns a disk.{path}.usageGB tag to this metric.

Disk In

Indicates the total system disk read in megabytes (MB). The initial disk read bytes are recorded when the first sample is taken. Subsequent samples calculate the difference between the current read bytes and the initial value.

W&B assigns a disk.in tag to this metric.

Disk Out

Represents the total system disk write in megabytes (MB). Similar to Disk In, the initial disk write bytes are recorded when the first sample is taken. Subsequent samples calculate the difference between the current write bytes and the initial value.

W&B assigns a disk.out tag to this metric.

Memory

Process Memory RSS

Represents the Memory Resident Set Size (RSS) in megabytes (MB) for the process. RSS is the portion of memory occupied by a process that is held in main memory (RAM).

W&B assigns a proc.memory.rssMB tag to this metric.

Process Memory Percent

Indicates the memory usage of the process as a percentage of the total available memory.

W&B assigns a proc.memory.percent tag to this metric.

Memory Percent

Represents the total system memory usage as a percentage of the total available memory.

W&B assigns a memory_percent tag to this metric.

Memory Available

Indicates the total available system memory in megabytes (MB).

W&B assigns a proc.memory.availableMB tag to this metric.

Network

Network Sent

Represents the total bytes sent over the network. The initial bytes sent are recorded when the metric is first initialized. Subsequent samples calculate the difference between the current bytes sent and the initial value.

W&B assigns a network.sent tag to this metric.

Network Received

Indicates the total bytes received over the network. Similar to Network Sent, the initial bytes received are recorded when the metric is first initialized. Subsequent samples calculate the difference between the current bytes received and the initial value.

W&B assigns a network.recv tag to this metric.

NVIDIA GPU

In addition to the metrics described below, if the process and/or its descendants use a particular GPU, W&B captures the corresponding metrics as gpu.process.{gpu_index}.{metric_name}.

GPU Memory Utilization

Represents the GPU memory utilization in percent for each GPU.

W&B assigns a gpu.{gpu_index}.memory tag to this metric.

GPU Memory Allocated

Indicates the GPU memory allocated as a percentage of the total available memory for each GPU.

W&B assigns a gpu.{gpu_index}.memoryAllocated tag to this metric.

GPU Memory Allocated Bytes

Specifies the GPU memory allocated in bytes for each GPU.

W&B assigns a gpu.{gpu_index}.memoryAllocatedBytes tag to this metric.

GPU Utilization

Reflects the GPU utilization in percent for each GPU.

W&B assigns a gpu.{gpu_index}.gpu tag to this metric.

GPU Temperature

The GPU temperature in Celsius for each GPU.

W&B assigns a gpu.{gpu_index}.temp tag to this metric.

GPU Power Usage Watts

Indicates the GPU power usage in Watts for each GPU.

W&B assigns a gpu.{gpu_index}.powerWatts tag to this metric.

GPU Power Usage Percent

Reflects the GPU power usage as a percentage of its power capacity for each GPU.

W&B assigns a gpu.{gpu_index}.powerPercent tag to this metric.

GPU SM Clock Speed

Represents the clock speed of the Streaming Multiprocessor (SM) on the GPU in MHz. This metric is indicative of the processing speed within the GPU cores responsible for computation tasks.

W&B assigns a gpu.{gpu_index}.smClock tag to this metric.

GPU Memory Clock Speed

Represents the clock speed of the GPU memory in MHz, which influences the rate of data transfer between the GPU memory and processing cores.

W&B assigns a gpu.{gpu_index}.memoryClock tag to this metric.

GPU Graphics Clock Speed

Represents the base clock speed for graphics rendering operations on the GPU, expressed in MHz. This metric often reflects performance during visualization or rendering tasks.

W&B assigns a gpu.{gpu_index}.graphicsClock tag to this metric.

GPU Corrected Memory Errors

Tracks the count of memory errors on the GPU that are automatically corrected by error-checking protocols (such as ECC), indicating recoverable hardware issues.

W&B assigns a gpu.{gpu_index}.correctedMemoryErrors tag to this metric.

GPU Uncorrected Memory Errors

Tracks the count of uncorrected memory errors on the GPU, indicating non-recoverable errors that can impact processing reliability.

W&B assigns a gpu.{gpu_index}.unCorrectedMemoryErrors tag to this metric.

GPU Encoder Utilization

Represents the percentage utilization of the GPU’s video encoder, indicating its load when encoding tasks (for example, video rendering) are running.

W&B assigns a gpu.{gpu_index}.encoderUtilization tag to this metric.

AMD GPU

W&B extracts metrics from the output of the rocm-smi tool supplied by AMD (rocm-smi -a --json).

ROCm 6.x (latest) and 5.x formats are supported. Learn more about ROCm formats in the AMD ROCm documentation. The newer format includes more details.

AMD GPU Utilization

Represents the GPU utilization in percent for each AMD GPU device.

W&B assigns a gpu.{gpu_index}.gpu tag to this metric.

AMD GPU Memory Allocated

Indicates the GPU memory allocated as a percentage of the total available memory for each AMD GPU device.

W&B assigns a gpu.{gpu_index}.memoryAllocated tag to this metric.

AMD GPU Temperature

The GPU temperature in Celsius for each AMD GPU device.

W&B assigns a gpu.{gpu_index}.temp tag to this metric.

AMD GPU Power Usage Watts

The GPU power usage in Watts for each AMD GPU device.

W&B assigns a gpu.{gpu_index}.powerWatts tag to this metric.

AMD GPU Power Usage Percent

Reflects the GPU power usage as a percentage of its power capacity for each AMD GPU device.

W&B assigns a gpu.{gpu_index}.powerPercent to this metric.

Apple ARM Mac GPU

Apple GPU Utilization

Indicates the GPU utilization in percent for Apple GPU devices, specifically on ARM Macs.

W&B assigns a gpu.0.gpu tag to this metric.

Apple GPU Memory Allocated

The GPU memory allocated as a percentage of the total available memory for Apple GPU devices on ARM Macs.

W&B assigns a gpu.0.memoryAllocated tag to this metric.

Apple GPU Temperature

The GPU temperature in Celsius for Apple GPU devices on ARM Macs.

W&B assigns a gpu.0.temp tag to this metric.

Apple GPU Power Usage Watts

The GPU power usage in Watts for Apple GPU devices on ARM Macs.

W&B assigns a gpu.0.powerWatts tag to this metric.

Apple GPU Power Usage Percent

The GPU power usage as a percentage of its power capacity for Apple GPU devices on ARM Macs.

W&B assigns a gpu.0.powerPercent tag to this metric.

Graphcore IPU

Graphcore IPUs (Intelligence Processing Units) are unique hardware accelerators designed specifically for machine intelligence tasks.

IPU Device Metrics

These metrics represent various statistics for a specific IPU device. Each metric has a device ID (device_id) and a metric key (metric_key) to identify it. W&B assigns a ipu.{device_id}.{metric_key} tag to this metric.

Metrics are extracted using the proprietary gcipuinfo library, which interacts with Graphcore’s gcipuinfo binary. The sample method fetches these metrics for each IPU device associated with the process ID (pid). To avoid logging redundant data, a metric is logged only when its value changes, or the first time a device’s metrics are fetched.

For each metric, the method parse_metric is used to extract the metric’s value from its raw string representation. The metrics are then aggregated across multiple samples using the aggregate method.

The following lists available metrics and their units:

  • Average Board Temperature (average board temp (C)): Temperature of the IPU board in Celsius.
  • Average Die Temperature (average die temp (C)): Temperature of the IPU die in Celsius.
  • Clock Speed (clock (MHz)): The clock speed of the IPU in MHz.
  • IPU Power (ipu power (W)): Power consumption of the IPU in Watts.
  • IPU Utilization (ipu utilisation (%)): Percentage of IPU utilization.
  • IPU Session Utilization (ipu utilisation (session) (%)): IPU utilization percentage specific to the current session.
  • Data Link Speed (speed (GT/s)): Speed of data transmission in Giga-transfers per second.

Google Cloud TPU

Tensor Processing Units (TPUs) are Google’s custom-developed ASICs (Application Specific Integrated Circuits) used to accelerate machine learning workloads.

TPU Memory usage

The current High Bandwidth Memory usage in bytes per TPU core.

W&B assigns a tpu.{tpu_index}.memoryUsageBytes tag to this metric.

TPU Memory usage percentage

The current High Bandwidth Memory usage in percent per TPU core.

W&B assigns a tpu.{tpu_index}.memoryUsagePercent tag to this metric.

TPU Duty cycle

TensorCore duty cycle percentage per TPU device. Tracks the percentage of time over the sample period during which the accelerator TensorCore was actively processing. A larger value means better TensorCore utilization.

W&B assigns a tpu.{tpu_index}.dutyCycle tag to this metric.

AWS Trainium

AWS Trainium is a specialized hardware platform offered by AWS that focuses on accelerating machine learning workloads. The neuron-monitor tool from AWS is used to capture the AWS Trainium metrics.

Trainium Neuron Core Utilization

The utilization percentage of each NeuronCore, reported on a per-core basis.

W&B assigns a trn.{core_index}.neuroncore_utilization tag to this metric.

Trainium Host Memory Usage, Total

The total memory consumption on the host in bytes.

W&B assigns a trn.host_total_memory_usage tag to this metric.

Trainium Neuron Device Total Memory Usage

The total memory usage on the Neuron device in bytes.

W&B assigns a trn.neuron_device_total_memory_usage tag to this metric.

Trainium Host Memory Usage Breakdown

The following is a breakdown of memory usage on the host:

  • Application Memory (trn.host_total_memory_usage.application_memory): Memory used by the application.
  • Constants (trn.host_total_memory_usage.constants): Memory used for constants.
  • DMA Buffers (trn.host_total_memory_usage.dma_buffers): Memory used for Direct Memory Access buffers.
  • Tensors (trn.host_total_memory_usage.tensors): Memory used for tensors.

Trainium Neuron Core Memory Usage Breakdown

Detailed memory usage information for each NeuronCore:

  • Constants (trn.{core_index}.neuroncore_memory_usage.constants)
  • Model Code (trn.{core_index}.neuroncore_memory_usage.model_code)
  • Model Shared Scratchpad (trn.{core_index}.neuroncore_memory_usage.model_shared_scratchpad)
  • Runtime Memory (trn.{core_index}.neuroncore_memory_usage.runtime_memory)
  • Tensors (trn.{core_index}.neuroncore_memory_usage.tensors)

OpenMetrics

Capture and log metrics from external endpoints that expose OpenMetrics / Prometheus-compatible data, with support for custom regex-based filters applied to the metrics consumed from those endpoints.
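A minimal sketch of wiring this up through the Settings reference above; the endpoint name, URL, and filter patterns are placeholders:

import wandb

run = wandb.init(
    settings=wandb.Settings(
        # Endpoints to poll for OpenMetrics data; the key names the endpoint.
        x_stats_open_metrics_endpoints={"node1": "http://localhost:9400/metrics"},
        # Keep only GPU-temperature series whose "pod" label matches the pattern;
        # the metric pattern is prefixed with the endpoint name, per the format above.
        x_stats_open_metrics_filters={
            "node1.DCGM_FI_DEV_GPU_TEMP": {"pod": "train-.*"},
        },
    ),
)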

Refer to Monitoring GPU cluster performance in W&B for a detailed example of how to use this feature in a particular case of monitoring GPU cluster performance with the NVIDIA DCGM-Exporter.