Artifacts
Use W&B Artifacts to track and version data as the inputs and outputs of your W&B Runs. For example, a model training run might take in a dataset as input and produce a trained model as output. In addition to logging hyperparameters, metadata and metrics to a run, you can use an artifact to log, track and version the dataset used to train the model as input and another artifact for the resulting model checkpoints as outputs.
Use cases
You can use artifacts throughout your entire ML workflow as inputs and outputs of runs. You can use datasets, models, or even other artifacts as inputs for processing.
Use Case | Input | Output |
---|---|---|
Model Training | Dataset (training and validation data) | Trained Model |
Dataset Pre-Processing | Dataset (raw data) | Dataset (pre-processed data) |
Model Evaluation | Model + Dataset (test data) | W&B Table |
Model Optimization | Model | Optimized Model |
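As an illustration of the "Dataset Pre-Processing" row, the following is a minimal sketch of a run that takes one dataset artifact as input and logs another as output. The artifact names `raw_data` and `clean_data`, the file names `raw.txt` and `clean.txt`, and the `clean_lines` helper are assumptions made up for this example; only `wandb.init`, `use_artifact`, `download`, `wandb.Artifact`, `add_file`, `log_artifact`, and `finish` come from the W&B SDK.

```python
from pathlib import Path


def clean_lines(lines):
    """Toy pre-processing step: strip whitespace and drop empty lines."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line]


def preprocess_run(project="artifacts-example"):
    """Consume a raw dataset artifact and log a pre-processed one."""
    # Imported lazily so clean_lines() can be used without wandb installed.
    import wandb

    run = wandb.init(project=project, job_type="preprocess-dataset")

    # Input: mark the raw dataset as an input of this run and download it.
    raw = run.use_artifact("raw_data:latest")  # hypothetical artifact name
    raw_dir = Path(raw.download())

    # Transform: "raw.txt" is an assumed file name inside the artifact.
    cleaned = clean_lines((raw_dir / "raw.txt").read_text().splitlines())
    out_path = Path("clean.txt")
    out_path.write_text("\n".join(cleaned) + "\n")

    # Output: log the result as a new dataset artifact.
    clean = wandb.Artifact(name="clean_data", type="dataset")
    clean.add_file(local_path=str(out_path))
    run.log_artifact(clean)
    run.finish()
```

Calling `preprocess_run()` requires a W&B login and an existing `raw_data` artifact in the project. The other rows of the table follow the same pattern with different inputs and outputs.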
Create an artifact
Create an artifact with four lines of code:
- Create a W&B Run.
- Create an artifact object with the wandb.Artifact API.
- Add one or more files, such as a model file or dataset, to your artifact object. In this example, you'll add a single file.
- Log your artifact to W&B.
import wandb

run = wandb.init(project="artifacts-example", job_type="add-dataset")
artifact = wandb.Artifact(name="my_data", type="dataset")
artifact.add_file(local_path="./dataset.h5")  # Add the dataset file to the artifact
run.log_artifact(artifact)  # Logs the artifact version "my_data:v0"
See the track external files page for information on how to add references to files or directories stored in external object storage, like an Amazon S3 bucket.
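When an artifact should contain more than one file, you can call `add_file` once per file. The helper below is a rough sketch of that idea; the `add_files` function, its `pattern` argument, and the `./data` directory in the usage comment are hypothetical and not part of the W&B SDK.

```python
from pathlib import Path


def add_files(artifact, directory, pattern="*.h5"):
    """Add every file in `directory` matching `pattern` to `artifact`.

    Returns the sorted list of entry names that were added.
    """
    added = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            # Store each file under its base name inside the artifact.
            artifact.add_file(local_path=str(path), name=path.name)
            added.append(path.name)
    return added


# Usage, following the example above (assumes a ./data directory exists):
#   artifact = wandb.Artifact(name="my_data", type="dataset")
#   add_files(artifact, "./data")
#   run.log_artifact(artifact)
```

The SDK also provides an `add_dir` method for adding a whole directory in one call, which may be simpler when no filtering is needed.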
Download an artifact
To mark an artifact as an input to your run, pass its name to the use_artifact method, which returns an artifact object:
artifact = run.use_artifact("my_data:latest")  # Returns an artifact object for "my_data" and marks it as an input of the run
Then, use the returned object to download all contents of the artifact:
datadir = artifact.download()  # Downloads the full "my_data" artifact to the default directory
You can pass a custom path to the root parameter to download an artifact to a specific directory. For alternate ways to download artifacts and to see additional parameters, see the guide on downloading and using artifacts.
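The two download steps and the `root` parameter can be combined into a small helper, sketched below. The `download_artifact` function and the `./data` path are hypothetical; the sketch assumes `run` is the object returned by `wandb.init` and that `download` returns the local directory path.

```python
from pathlib import Path


def download_artifact(run, name, alias="latest", root=None):
    """Mark `name:alias` as an input of `run` and download its contents.

    If `root` is None, the artifact is downloaded to the default directory.
    Returns the local download directory as a Path.
    """
    artifact = run.use_artifact(f"{name}:{alias}")
    local_dir = artifact.download(root=root)
    return Path(local_dir)


# Usage (assumes an active run and an existing "my_data" artifact):
#   data_dir = download_artifact(run, "my_data", root="./data")
```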
Next steps
- Learn how to version, update, or delete artifacts.
- Learn how to trigger downstream workflows in response to changes to your artifacts with artifact automation.
- Learn about the model registry, a space that houses trained models.
- Explore the Python SDK and CLI reference guides.