Use `wandb.Table` to log data to visualize and query with W&B. In this guide, learn how to create tables, add data to them, retrieve data from them, save them to W&B, and join them.
To define a Table, specify the columns you want to see for each row of data. Each row might be a single item in your training dataset, a particular step or epoch during training, a prediction made by your model on a test item, an object generated by your model, etc. Each column has a fixed type: numeric, text, boolean, image, video, audio, etc. You don't need to specify the type in advance—simply give each column a name, and make sure to only pass data of that type into that column index. For a more detailed example, see this report.
Use the `wandb.Table` constructor in one of two ways:
1. List of Rows: Log named columns and rows of data. For example, `wandb.Table(columns=["a", "b", "c"], data=[["1a", "1b", "1c"], ["2a", "2b", "2c"]])` generates a table with two rows and three columns.
2. Pandas DataFrame: Log a DataFrame using `wandb.Table(dataframe=my_df)`. Column names will be extracted from the DataFrame.
```python
import wandb

# assume a model has returned predictions on four images
# with the following fields available:
# - the image id
# - the image pixels, wrapped in a wandb.Image()
# - the model's predicted label
# - the ground truth label
my_data = [
    [0, wandb.Image("img_0.jpg"), 0, 0],
    [1, wandb.Image("img_1.jpg"), 8, 0],
    [2, wandb.Image("img_2.jpg"), 7, 1],
    [3, wandb.Image("img_3.jpg"), 1, 1],
]

# create a wandb.Table() with corresponding columns
columns = ["id", "image", "prediction", "truth"]
test_table = wandb.Table(data=my_data, columns=columns)
```
Tables are mutable, and as your script executes you can add more data to your table, up to 200,000 rows. There are two ways to add data to a table:
1. Add a Row: `table.add_data("3a", "3b", "3c")`. Note that the new row is not represented as a list. If your row is in list format, use star notation to expand the list to positional arguments: `table.add_data(*my_row_list)`. The row must contain the same number of entries as there are columns in the table.
2. Add a Column: `table.add_column(name="col_name", data=col_data)`. Note that the length of `col_data` must be equal to the table's current number of rows. Here, `col_data` can be a list or a NumPy NDArray.
```python
import wandb

# create a Table with the same columns as above,
# plus confidence scores for all labels
columns = ["id", "image", "guess", "truth"]
for digit in range(10):
    columns.append("score_" + str(digit))
test_table = wandb.Table(columns=columns)

# run inference on every image, assuming my_model returns the
# predicted label, and the ground truth labels are available.
# my_model.predict_scores is a hypothetical method returning ten per-class scores.
for img_id, img in enumerate(mnist_test_data):
    true_label = mnist_test_data_labels[img_id]
    guess_label = my_model.predict(img)
    scores = my_model.predict_scores(img)
    # each row needs one entry per column: 4 fields followed by 10 scores
    test_table.add_data(img_id, wandb.Image(img), guess_label, true_label, *scores)
```
Once data is in a Table, access it by column or by row:
1. Row Iterator: Use the table's row iterator, for example `for ndx, row in table.iterrows(): ...`, to efficiently iterate over the data's rows.
2. Get a Column: Retrieve a column of data using `table.get_column("col_name")`. As a convenience, you can pass `convert_to="numpy"` to convert the column to a NumPy NDArray of primitives. This is useful if your column contains media types such as `wandb.Image`, because it gives you direct access to the underlying data.
After you generate a table of data in your script, for example a table of model predictions, save it to W&B to visualize the results live. If you'd like to save and version larger datasets, check out Artifact Tables.

Use `wandb.log()` (or `run.log()` on the run object) to save your table to the run, like so:
```python
import wandb
run = wandb.init()
my_table = wandb.Table(columns=["a", "b"], data=[["1a", "1b"], ["2a", "2b"]])
run.log({"table_key": my_table})  # "table_key" is an example key name
```
Each time a table is logged to the same key, a new version of the table is created and stored in the backend. This means you can log the same table across multiple training steps to see how model predictions improve over time, or compare tables across different runs, as long as they're logged to the same key. You can log up to 200,000 rows.
In the backend, Tables are persisted as Artifacts. If you are interested in accessing a specific version, you can do so via the artifact API:
```python
import wandb

with wandb.init() as run:
    my_table = run.use_artifact("run-<run-id>-<table-name>:<tag>").get("<table-name>")
```
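If you fetch table versions often, a small helper can assemble the artifact name. The sketch below follows the `run-<run-id>-<table-name>:<tag>` pattern given above; both function names, the example run ID, and the default tag `"latest"` are illustrative assumptions, and only the name builder runs offline.

```python
def table_artifact_name(run_id: str, table_name: str, tag: str = "latest") -> str:
    """Build an artifact reference following the run-<run-id>-<table-name>:<tag> pattern."""
    return f"run-{run_id}-{table_name}:{tag}"


def fetch_table_version(run, run_id: str, table_name: str, tag: str = "latest"):
    """Hypothetical helper: fetch one logged version of a table via the artifact API."""
    artifact = run.use_artifact(table_artifact_name(run_id, table_name, tag))
    return artifact.get(table_name)


print(table_artifact_name("abc123", "predictions", "v2"))
```

With a live run, a call might look like `fetch_table_version(run, "<run-id>", "<table-name>", "v0")` inside a `with wandb.init() as run:` block.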
Any table logged this way will show up in your Workspace on both the Run Page and the Project Page.
Use `artifact.add()` to log tables to the Artifacts section of your run instead of the workspace. This can be useful if you have a dataset that you want to log once and then reference in future runs. Refer to this Colab for a detailed example of `artifact.add()` with image data, and this Report for an example of how to use Artifacts and Tables to version control and deduplicate tabular data.
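```python
import wandb
run = wandb.init(project="my_project")
# create a wandb Artifact for each meaningful step
test_predictions = wandb.Artifact("mnist_test_preds", type="predictions")  # example type name
# [build up your predictions data as above]
test_table = wandb.Table(data=data, columns=columns)
# add the table under a name of your choice, then log the artifact
test_predictions.add(test_table, "my_test_key")
run.log_artifact(test_predictions)
```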
You can join tables you've locally constructed or tables you've retrieved from other artifacts using `wandb.JoinedTable(table_1, table_2, join_key)`.
To join two Tables you've logged previously in an artifact context, fetch them from the artifact and join the result into a new Table. For example, read one Table of original songs and another Table of synthesized versions of the same songs, join on `"song_id"`, and upload a new Table to explore (live example).
```python
import wandb

run = wandb.init(project="my_project")

# fetch original songs table
orig_songs = run.use_artifact('original_songs:latest')
orig_table = orig_songs.get("original_samples")

# fetch synthesized songs table
synth_songs = run.use_artifact('synth_songs:latest')
synth_table = synth_songs.get("synth_samples")

# join tables on "song_id"
join_table = wandb.JoinedTable(orig_table, synth_table, "song_id")
join_at = wandb.Artifact("synth_summary", "analysis")

# add table to artifact and log to W&B
join_at.add(join_table, "synth_explore")  # example name for the joined table
run.log_artifact(join_at)
```