Log tables
Use wandb.Table
to log data to visualize and query with W&B. In this guide, learn how to:
Create tables
To define a Table, specify the columns you want to see for each row of data. Each row might be a single item in your training dataset, a particular step or epoch during training, a prediction made by your model on a test item, an object generated by your model, etc. Each column has a fixed type: numeric, text, boolean, image, video, audio, etc. You do not need to specify the type in advance. Give each column a name, and make sure to only pass data of that type into that column index. For a more detailed example, see this report.
Use the wandb.Table
constructor in one of two ways:
- List of Rows: Log named columns and rows of data. For example the proceeding code snippet generates a table with two rows and three columns:
wandb.Table(columns=["a", "b", "c"], data=[["1a", "1b", "1c"], ["2a", "2b", "2c"]])
- Pandas DataFrame: Log a DataFrame using
wandb.Table(dataframe=my_df)
. Column names will be extracted from the DataFrame.
From an existing array or dataframe
# assume a model has returned predictions on four images
# with the following fields available:
# - the image id
# - the image pixels, wrapped in a wandb.Image()
# - the model's predicted label
# - the ground truth label
my_data = [
[0, wandb.Image("img_0.jpg"), 0, 0],
[1, wandb.Image("img_1.jpg"), 8, 0],
[2, wandb.Image("img_2.jpg"), 7, 1],
[3, wandb.Image("img_3.jpg"), 1, 1],
]
# create a wandb.Table() with corresponding columns
columns = ["id", "image", "prediction", "truth"]
test_table = wandb.Table(data=my_data, columns=columns)
Add data
Tables are mutable. As your script executes you can add more data to your table, up to 200,000 rows. There are two ways to add data to a table:
- Add a Row:
table.add_data("3a", "3b", "3c")
. Note that the new row is not represented as a list. If your row is in list format, use the star notation,*
,to expand the list to positional arguments:table.add_data(*my_row_list)
. The row must contain the same number of entries as there are columns in the table. - Add a Column:
table.add_column(name="col_name", data=col_data)
. Note that the length ofcol_data
must be equal to the table's current number of rows. Here,col_data
can be a list data, or a NumPy NDArray.
Adding data incrementally
This code sample shows how to create and populate a W&B table incrementally. You define the table with predefined columns, including confidence scores for all possible labels, and add data row by row during inference. You can also add data to tables incrementally when resuming runs.
# Define the columns for the table, including confidence scores for each label
columns = ["id", "image", "guess", "truth"]
for digit in range(10): # Add confidence score columns for each digit (0-9)
columns.append(f"score_{digit}")
# Initialize the table with the defined columns
test_table = wandb.Table(columns=columns)
# Iterate through the test dataset and add data to the table row by row
# Each row includes the image ID, image, predicted label, true label, and confidence scores
for img_id, img in enumerate(mnist_test_data):
true_label = mnist_test_data_labels[img_id] # Ground truth label
guess_label = my_model.predict(img) # Predicted label
test_table.add_data(
img_id, wandb.Image(img), guess_label, true_label
) # Add row data to the table
Adding data to resumed runs
You can incrementally update a W&B table in resumed runs by loading an existing table from an artifact, retrieving the last row of data, and adding the updated metrics. Then, reinitialize the table for compatibility and log the updated version back to W&B.
# Load the existing table from the artifact
best_checkpt_table = wandb.use_artifact(table_tag).get(table_name)
# Get the last row of data from the table for resuming
best_iter, best_metric_max, best_metric_min = best_checkpt_table.data[-1]
# Update the best metrics as needed
# Add the updated data to the table
best_checkpt_table.add_data(best_iter, best_metric_max, best_metric_min)
# Reinitialize the table with its updated data to ensure compatibility
best_checkpt_table = wandb.Table(
columns=["col1", "col2", "col3"], data=best_checkpt_table.data
)
# Log the updated table to Weights & Biases
wandb.log({table_name: best_checkpt_table})
Retrieve data
Once data is in a Table, access it by column or by row:
- Row Iterator: Users can use the row iterator of Table such as
for ndx, row in table.iterrows(): ...
to efficiently iterate over the data's rows. - Get a Column: Users can retrieve a column of data using
table.get_column("col_name")
. As a convenience, users can passconvert_to="numpy"
to convert the column to a NumPy NDArray of primitives. This is useful if your column contains media types such aswandb.Image
so that you can access the underlying data directly.
Save tables
After you generate a table of data in your script, for example a table of model predictions, save it to W&B to visualize the results live.
Log a table to a run
Use wandb.log()
to save your table to the run, like so:
run = wandb.init()
my_table = wandb.Table(columns=["a", "b"], data=[["1a", "1b"], ["2a", "2b"]])
run.log({"table_key": my_table})
Each time a table is logged to the same key, a new version of the table is created and stored in the backend. This means you can log the same table across multiple training steps to see how model predictions improve over time, or compare tables across different runs, as long as they're logged to the same key. You can log up to 200,000 rows.
To log more than 200,000 rows, you can override the limit with:
wandb.Table.MAX_ARTIFACT_ROWS = X
However, this would likely cause performance issues, such as slower queries, in the UI.
Access tables programmatically
In the backend, Tables are persisted as Artifacts. If you are interested in accessing a specific version, you can do so with the artifact API:
with wandb.init() as run:
my_table = run.use_artifact("run-<run-id>-<table-name>:<tag>").get("<table-name>")
For more information on Artifacts, see the Artifacts Chapter in the Developer Guide.
Visualize tables
Any table logged this way will show up in your Workspace on both the Run Page and the Project Page. For more information, see Visualize and Analyze Tables.
Artifact tables
Use artifact.add()
to log tables to the Artifacts section of your run instead of the workspace. This could be useful if you have a dataset that you want to log once and then reference for future runs.
run = wandb.init(project="my_project")
# create a wandb Artifact for each meaningful step
test_predictions = wandb.Artifact("mnist_test_preds", type="predictions")
# [build up your predictions data as above]
test_table = wandb.Table(data=data, columns=columns)
test_predictions.add(test_table, "my_test_key")
run.log_artifact(test_predictions)
Refer to this Colab for a detailed example of artifact.add() with image data and this Report for an example of how to use Artifacts and Tables to version control and deduplicate tabular data.
Join Artifact tables
You can join tables you have locally constructed or tables you have retrieved from other artifacts using wandb.JoinedTable(table_1, table_2, join_key)
.
Args | Description |
---|---|
table_1 | (str, wandb.Table , ArtifactEntry) the path to a wandb.Table in an artifact, the table object, or ArtifactEntry |
table_2 | (str, wandb.Table , ArtifactEntry) the path to a wandb.Table in an artifact, the table object, or ArtifactEntry |
join_key | (str, [str, str]) key or keys on which to perform the join |
To join two Tables you have logged previously in an artifact context, fetch them from the artifact and join the result into a new Table.
For example, demonstrates how to read one Table of original songs called 'original_songs'
and another Table of synthesized versions of the same songs called 'synth_songs'
. The proceeding code example joins the two tables on "song_id"
, and uploads the resulting table as a new W&B Table:
import wandb
run = wandb.init(project="my_project")
# fetch original songs table
orig_songs = run.use_artifact("original_songs:latest")
orig_table = orig_songs.get("original_samples")
# fetch synthesized songs table
synth_songs = run.use_artifact("synth_songs:latest")
synth_table = synth_songs.get("synth_samples")
# join tables on "song_id"
join_table = wandb.JoinedTable(orig_table, synth_table, "song_id")
join_at = wandb.Artifact("synth_summary", "analysis")
# add table to artifact and log to W&B
join_at.add(join_table, "synth_explore")
run.log_artifact(join_at)
Read this tutorial for an example on how to combine two previously stored tables stored in different Artifact objects.