> W&B's Embedding Projector allows users to plot multi-dimensional embeddings on a 2D plane using common dimension reduction algorithms like PCA, UMAP, and t-SNE.

# Embed objects

<Frame>
  <img src="https://mintcdn.com/wb-21fd5541/wYBIlf7cqDpGjWr9/images/weave/embedding_projector.png?fit=max&auto=format&n=wYBIlf7cqDpGjWr9&q=85&s=9fdd987748fbbae7dcb2ca1ecd1788f2" alt="Embedding projector" width="1846" height="1296" data-path="images/weave/embedding_projector.png" />
</Frame>

[Embeddings](https://developers.google.com/machine-learning/crash-course/embeddings/video-lecture) represent objects (people, images, posts, words, etc.) as a list of numbers, sometimes referred to as a *vector*. In machine learning and data science, embeddings can be generated with a variety of approaches across a range of applications. This page assumes you are familiar with embeddings and want to analyze them visually in W\&B.

## Embedding examples

* [Live Interactive Demo Report](https://wandb.ai/timssweeney/toy_datasets/reports/Feature-Report-W-B-Embeddings-Projector--VmlldzoxMjg2MjY4?accessToken=bo36zrgl0gref1th5nj59nrft9rc4r71s53zr2qvqlz68jwn8d8yyjdz73cqfyhq)
* [Example Colab](https://colab.research.google.com/drive/1DaKL4lZVh3ETyYEM1oJ46ffjpGs8glXA#scrollTo=D--9i6-gXBm_)

### Hello World

W\&B allows you to log embeddings using the `wandb.Table` class. Consider the following example of 3 embeddings, each consisting of 5 dimensions:

```python theme={null}
import wandb

with wandb.init(project="embedding_tutorial") as run:
  embeddings = [
      # D1   D2   D3   D4   D5
      [0.2, 0.4, 0.1, 0.7, 0.5],  # embedding 1
      [0.3, 0.1, 0.9, 0.2, 0.7],  # embedding 2
      [0.4, 0.5, 0.2, 0.2, 0.1],  # embedding 3
  ]
  run.log(
      {"embeddings": wandb.Table(columns=["D1", "D2", "D3", "D4", "D5"], data=embeddings)}
  )
```

After you run the above code, the W\&B dashboard has a new Table containing your data. Select `2D Projection` from the upper right panel selector to plot the embeddings in 2 dimensions. Smart defaults are selected automatically, and you can override them in the configuration menu by clicking the gear icon. In this example, all 5 available numeric dimensions are used automatically.

<Frame>
  <img src="https://mintcdn.com/wb-21fd5541/wKCrMJZKG3PxyJhv/images/app_ui/weave_hello_world.png?fit=max&auto=format&n=wKCrMJZKG3PxyJhv&q=85&s=dda0c4699ebae70004bfb6d27ba6d164" alt="2D projection example" width="1611" height="708" data-path="images/app_ui/weave_hello_world.png" />
</Frame>
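
To make "plot the embeddings in 2 dimensions" concrete, here is a minimal sketch of a plain PCA projection applied to the same three vectors. It assumes NumPy is installed, and the helper name `pca_2d` is hypothetical. It illustrates the general technique only and is not a description of how the W\&B projector is implemented.

```python
import numpy as np


def pca_2d(embeddings):
    """Project the rows of `embeddings` onto their first two principal components."""
    x = np.asarray(embeddings, dtype=float)
    # PCA operates on mean-centered data.
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


embeddings = [
    [0.2, 0.4, 0.1, 0.7, 0.5],  # embedding 1
    [0.3, 0.1, 0.9, 0.2, 0.7],  # embedding 2
    [0.4, 0.5, 0.2, 0.2, 0.1],  # embedding 3
]
points = pca_2d(embeddings)  # one (x, y) pair per embedding
```

Each row of `points` is the 2D position of the corresponding embedding. UMAP and t-SNE produce a 2D layout in the same sense, but with nonlinear methods.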

### Digits MNIST

While the above example shows the basic mechanics of logging embeddings, you typically work with many more dimensions and samples. Consider the MNIST Digits dataset ([UCI ML hand-written digits dataset](https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits)) made available via [scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html). This dataset has 1797 records, each with 64 dimensions. The task is a 10-class classification problem. You can also convert the input data to an image for visualization.

```python theme={null}
import wandb
from sklearn.datasets import load_digits

with wandb.init(project="embedding_tutorial") as run:

  # Load the dataset
  ds = load_digits(as_frame=True)
  df = ds.data

  # Create a "target" column and move it to the front
  df["target"] = ds.target.astype(str)
  cols = df.columns.tolist()
  df = df[cols[-1:] + cols[:-1]]

  # Create an "image" column from the 64 pixel values (row[1:] skips "target"),
  # scaled to [0, 1], and move it to the front
  df["image"] = df.apply(
      lambda row: wandb.Image(row[1:].values.reshape(8, 8) / 16.0), axis=1
  )
  cols = df.columns.tolist()
  df = df[cols[-1:] + cols[:-1]]

  run.log({"digits": df})
```

After you run the above code, the UI again shows a Table. Select `2D Projection` to configure the definition of the embedding, the coloring, the algorithm (PCA, UMAP, or t-SNE), the algorithm parameters, and even an overlay (in this case, the image is shown when you hover over a point). In this particular case, these are all "smart defaults", so a single click on `2D Projection` should give you something very similar. You can also [interact with this embedding tutorial example](https://wandb.ai/timssweeney/embedding_tutorial/runs/k6guxhum?workspace=user-timssweeney).

<Frame>
  <img src="https://mintcdn.com/wb-21fd5541/wYBIlf7cqDpGjWr9/images/weave/embedding_projector.png?fit=max&auto=format&n=wYBIlf7cqDpGjWr9&q=85&s=9fdd987748fbbae7dcb2ca1ecd1788f2" alt="MNIST digits projection" width="1846" height="1296" data-path="images/weave/embedding_projector.png" />
</Frame>

## Logging options

You can log embeddings in a number of different formats:

1. **Single Embedding Column:** Often your data is already in a "matrix"-like format. In this case, you can create a single embedding column, where the data type of the cell values can be `list[int]`, `list[float]`, or `np.ndarray`.
2. **Multiple Numeric Columns:** The two examples above use this approach and create a column for each dimension. Cells can currently hold a Python `int` or `float`.

<Frame>
  <img src="https://mintcdn.com/wb-21fd5541/wYBIlf7cqDpGjWr9/images/weave/logging_options.png?fit=max&auto=format&n=wYBIlf7cqDpGjWr9&q=85&s=dcafa5eca61ed6e79ad8b62e389af4f7" alt="Single embedding column" width="422" height="301" data-path="images/weave/logging_options.png" />
</Frame>

<Frame>
  <img src="https://mintcdn.com/wb-21fd5541/wYBIlf7cqDpGjWr9/images/weave/logging_option_image_right.png?fit=max&auto=format&n=wYBIlf7cqDpGjWr9&q=85&s=f57315d55a458696a91017aa720880bb" alt="Multiple numeric columns" width="416" height="387" data-path="images/weave/logging_option_image_right.png" />
</Frame>
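
The examples earlier on this page use multiple numeric columns. As a complement, the following is a minimal sketch of the single embedding column format, where each cell holds a whole `list[float]`. The helper names, the column names (`label`, `embedding`), and the logged key are assumptions chosen for illustration. The `wandb` import sits inside the logging function so that the row-building helper can run on its own.

```python
import random


def make_single_column_rows(num_rows, dim, seed=0):
    """Return rows of [label, embedding], where embedding is one list[float] cell."""
    rng = random.Random(seed)
    return [
        [f"item_{i}", [rng.random() for _ in range(dim)]]
        for i in range(num_rows)
    ]


def log_single_column(rows, project="embedding_tutorial"):
    """Log the rows as a two-column table: a label and a single embedding column."""
    # Imported here so the helper above can be used without wandb installed.
    import wandb

    with wandb.init(project=project) as run:
        table = wandb.Table(columns=["label", "embedding"], data=rows)
        run.log({"embeddings_single_column": table})


rows = make_single_column_rows(num_rows=3, dim=5)
# log_single_column(rows)  # uncomment to log to W&B
```

In practice the random vectors would be replaced by the output of your own model. The random data here only stands in for real embeddings.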

As with any table, you have several options for constructing it:

1. Directly from a **dataframe** using `wandb.Table(dataframe=df)`
2. Directly from a **list of data** using `wandb.Table(data=[...], columns=[...])`
3. Build the table **incrementally row-by-row** (great if you have a loop in your code). Add rows to your table using `table.add_data(...)`
4. Add an **embedding column** to your table (great if you have a list of predictions in the form of embeddings): `table.add_column("col_name", ...)`
5. Add a **computed column** (great if you have a function or model you want to map over your table): `table.add_computed_columns(lambda ndx, row: {"embedding": model.predict(row)})`
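
As an illustration of the row-by-row option, here is a minimal sketch that fills a table inside a loop with `table.add_data(...)`. The helper names `fill_table` and `log_incrementally`, the `embed` callable, and the column naming scheme are hypothetical. `fill_table` only relies on the table object having an `add_data` method.

```python
def fill_table(table, samples, embed):
    """Add one row per sample: the sample id followed by its embedding values."""
    for sample_id, features in samples:
        table.add_data(sample_id, *embed(features))
    return table


def log_incrementally(samples, embed, dim, project="embedding_tutorial"):
    """Build a table with one numeric column per dimension and log it."""
    # Imported here so fill_table can be exercised without wandb installed.
    import wandb

    with wandb.init(project=project) as run:
        columns = ["id"] + [f"D{i + 1}" for i in range(dim)]
        table = fill_table(wandb.Table(columns=columns), samples, embed)
        run.log({"embeddings_incremental": table})


# Example call, assuming a model with a `predict` method that returns `dim` floats:
# log_incrementally(samples, embed=model.predict, dim=5)
```

Keeping the loop separate from the logging call makes it straightforward to swap in a different embedding function without touching the W\&B code.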

## Plotting options

After selecting `2D Projection`, you can click the gear icon to edit the rendering settings. In addition to selecting the intended columns (see above), you can select an algorithm of interest (along with the desired parameters). Below you can see the parameters for UMAP and t-SNE respectively.

<Frame>
  <img src="https://mintcdn.com/wb-21fd5541/wYBIlf7cqDpGjWr9/images/weave/plotting_options_left.png?fit=max&auto=format&n=wYBIlf7cqDpGjWr9&q=85&s=64efef689f252c73012dea76df6ca303" alt="UMAP parameters" width="803" height="380" data-path="images/weave/plotting_options_left.png" />
</Frame>

<Frame>
  <img src="https://mintcdn.com/wb-21fd5541/wYBIlf7cqDpGjWr9/images/weave/plotting_options_right.png?fit=max&auto=format&n=wYBIlf7cqDpGjWr9&q=85&s=eb292dd05df661defc0956954ff668da" alt="t-SNE parameters" width="799" height="384" data-path="images/weave/plotting_options_right.png" />
</Frame>

<Note>
  W\&B currently downsamples to a random subset of 1000 rows and 50 dimensions for all three algorithms.
</Note>
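
If you want control over which rows appear in the plot, one option is to subsample the data yourself before logging it. The sketch below is an assumption-laden illustration: the helper name `subsample_rows` and the seed are hypothetical, and it does not reproduce the projector's own sampling, which this page does not describe in detail.

```python
import random


def subsample_rows(rows, max_rows=1000, seed=0):
    """Return at most max_rows rows, chosen at random but reproducibly."""
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)
    # Sort the sampled indices so the original row order is preserved.
    keep = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in keep]


rows = [[i, i * 0.5] for i in range(1797)]  # stand-in for 1797 table rows
subset = subsample_rows(rows)
```

Because the seed is fixed, repeated runs select the same rows, which keeps projections comparable across runs.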
