
# Redact PII from traces

> Automatically redact Personally Identifiable Information from traces

<Warning>
  This feature is only accessible via the Python SDK.
</Warning>

Some organizations process Personally Identifiable Information (PII), such as names, phone numbers, and email addresses, in their Large Language Model (LLM) workflows. Storing this data in Weights & Biases (W\&B) Weave can pose compliance and security risks. Preventing it from being logged can help keep your agent compliant with regulations such as [GDPR](https://gdpr.eu/) and [HIPAA](https://www.hhs.gov/hipaa/index.html).

The *Sensitive Data Protection* feature automatically redacts PII from a [trace](/weave/guides/tracking) before it is sent to Weave servers. This feature integrates [Microsoft Presidio](https://microsoft.github.io/presidio/) into the Weave Python SDK, so you control redaction settings at the SDK level.

The Sensitive Data Protection feature introduces the following functionality to the Python SDK:

* A `redact_pii` setting, which can be toggled on or off in the `weave.init()` call to enable PII redaction.
* Automatic redaction of [common entities](#entities-redacted-by-default) when `redact_pii` is set to `True`.
* Customizable redaction fields through the `redact_pii_fields` setting.
* Exclusion of specific entities from redaction through the `redact_pii_exclude_fields` setting.

## Enable PII redaction

To get started with the Sensitive Data Protection feature in Weave, complete the following steps:

1. Install the required dependencies:

   ```bash theme={null}
   pip install presidio-analyzer presidio-anonymizer
   ```

2. Modify your `weave.init()` call to enable redaction. When `redact_pii` is set to `True`, [common entities are redacted by default](#entities-redacted-by-default):

   ```python lines theme={null}
   import weave

   weave.init("my-project", settings={"redact_pii": True})
   ```

3. (Optional) Customize which entities are redacted using the `redact_pii_fields` setting:

   ```python lines theme={null}
   weave.init("my-project", settings={"redact_pii": True, "redact_pii_fields": ["CREDIT_CARD", "US_SSN"]})
   ```

   For a full list of the entities that can be detected and redacted, see [PII entities supported by Presidio](https://microsoft.github.io/presidio/supported_entities/).

4. (Optional) Exclude specific entities from redaction using the `redact_pii_exclude_fields` setting. This is useful when you want to keep the default redaction but preserve certain entity types. The following example redacts all [default entities](#entities-redacted-by-default) except `EMAIL_ADDRESS` and `PERSON`:

   ```python lines theme={null}
   weave.init("my-project", settings={"redact_pii": True, "redact_pii_exclude_fields": ["EMAIL_ADDRESS", "PERSON"]})
   ```
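
The settings from the steps above can be assembled in one place. The following is a minimal sketch: `build_redaction_settings` is a hypothetical helper, not part of the Weave SDK, and only the setting names come from this page. Because this page does not describe how `redact_pii_fields` and `redact_pii_exclude_fields` interact, the sketch assumes you pass one or the other, not both.

```python
from typing import Optional


def build_redaction_settings(
    fields: Optional[list[str]] = None,
    exclude_fields: Optional[list[str]] = None,
) -> dict:
    """Build a settings dict for weave.init() (hypothetical helper, not a Weave API)."""
    if fields and exclude_fields:
        # Assumption of this sketch: treat the two options as mutually exclusive.
        raise ValueError("Pass either fields or exclude_fields, not both.")

    settings: dict = {"redact_pii": True}
    if fields:
        settings["redact_pii_fields"] = list(fields)
    if exclude_fields:
        settings["redact_pii_exclude_fields"] = list(exclude_fields)
    return settings


settings = build_redaction_settings(exclude_fields=["EMAIL_ADDRESS", "PERSON"])
print(settings)

# In your application, pass the result to weave.init():
# import weave
# weave.init("my-project", settings=settings)
```

Keeping the settings in a single helper may make it easier to share one redaction policy across several scripts.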

## Entities redacted by default

The following entities are automatically redacted when PII redaction is enabled:

* `CREDIT_CARD`
* `CRYPTO`
* `EMAIL_ADDRESS`
* `ES_NIF`
* `FI_PERSONAL_IDENTITY_CODE`
* `IBAN_CODE`
* `IN_AADHAAR`
* `IN_PAN`
* `IP_ADDRESS`
* `LOCATION`
* `PERSON`
* `PHONE_NUMBER`
* `UK_NHS`
* `UK_NINO`
* `US_BANK_NUMBER`
* `US_DRIVER_LICENSE`
* `US_PASSPORT`
* `US_SSN`
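
To preview which entities would still be redacted after you set `redact_pii_exclude_fields`, you can compute the difference yourself. The sketch below copies the default list from this page into a local constant. `DEFAULT_PII_ENTITIES` and `remaining_entities` are illustrative names, not Weave APIs, and the set-difference behavior is an assumption based on how the exclude setting is described above.

```python
# Default entities, copied from the list on this page (not imported from Weave).
DEFAULT_PII_ENTITIES = [
    "CREDIT_CARD", "CRYPTO", "EMAIL_ADDRESS", "ES_NIF",
    "FI_PERSONAL_IDENTITY_CODE", "IBAN_CODE", "IN_AADHAAR", "IN_PAN",
    "IP_ADDRESS", "LOCATION", "PERSON", "PHONE_NUMBER",
    "UK_NHS", "UK_NINO", "US_BANK_NUMBER", "US_DRIVER_LICENSE",
    "US_PASSPORT", "US_SSN",
]


def remaining_entities(exclude_fields: list[str]) -> list[str]:
    """Return the default entities that are not in exclude_fields, in list order."""
    excluded = set(exclude_fields)
    return [entity for entity in DEFAULT_PII_ENTITIES if entity not in excluded]


remaining = remaining_entities(["EMAIL_ADDRESS", "PERSON"])
print(remaining)
```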

## Redacting sensitive keys with `REDACT_KEYS`

In addition to PII redaction, the Weave SDK supports redaction of custom keys using `REDACT_KEYS`. This is useful when you want to protect sensitive data that does not fall under the PII category but still needs to be kept private. Examples include:

* API keys
* Authentication headers
* Tokens
* Internal IDs
* Config values

### Pre-defined `REDACT_KEYS`

Weave redacts the values of the following keys by default:

```json theme={null}
[
  "api_key",
  "auth_headers",
  "authorization"
]
```

### Adding your own keys

You can extend this list with your own custom keys that you want to redact from traces:

```python lines theme={null}
import weave
from weave.utils import sanitize

client = weave.init("my-project", settings={"redact_pii": True})

# Add custom keys to redact
sanitize.add_redact_key("client_id")
sanitize.add_redact_key("token")

client_id = "123"
token = "789"

@weave.op
def test(client_id, token):
    return client_id + token

test(client_id, token)
```

When viewed in the Weave UI, the values of `client_id` and `token` appear as `"REDACTED"`:

```python lines theme={null}
client_id = "REDACTED"
token = "REDACTED"
```
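
Conceptually, key-based redaction replaces the value of any matching key before the data is logged. The following sketch illustrates the idea on a plain dictionary. It is not Weave's implementation: `redact_by_key` is a hypothetical function, and exact, case-sensitive key matching is an assumption of this sketch.

```python
from typing import Any

REDACTED_VALUE = "REDACTED"

# Start from the pre-defined keys listed on this page, then add custom ones.
redact_keys = {"api_key", "auth_headers", "authorization"}
redact_keys.update({"client_id", "token"})


def redact_by_key(data: Any, keys: set[str]) -> Any:
    """Return a copy of data with the values of matching dict keys replaced."""
    if isinstance(data, dict):
        return {
            key: REDACTED_VALUE if key in keys else redact_by_key(value, keys)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [redact_by_key(item, keys) for item in data]
    # Scalars and other types are returned unchanged.
    return data


inputs = {
    "client_id": "123",
    "token": "789",
    "prompt": "Hello",
    "config": {"api_key": "sk-example"},
}
print(redact_by_key(inputs, redact_keys))
```

Unlike PII redaction, this approach matches on key names rather than on the content of the values, so a secret stored under an unlisted key is left untouched.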

## Usage information

* This feature is only available in the Python SDK.
* Enabling redaction increases processing time, because trace data is analyzed with Presidio before it is sent to Weave servers.
