This feature is only accessible through the Python SDK.
Some organizations process Personally Identifiable Information (PII) such as names, phone numbers, and email addresses in their Large Language Model (LLM) workflows. Storing this data in W&B Weave poses compliance and security risks. Stripping this data before you log it helps keep your agent compliant with policies like GDPR and HIPAA.
The Sensitive Data Protection feature lets you automatically redact PII from a trace before Weave sends it to Weave servers. This feature integrates Microsoft Presidio into the Weave Python SDK, so you can control redaction settings at the SDK level.
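To make the idea of entity redaction concrete, the following is a minimal, self-contained sketch that replaces detected values with an entity-type placeholder. It is not Presidio and not Weave's implementation: the two regular expressions, the toy_redact function, and the `<ENTITY_TYPE>` placeholder format are all assumptions made for illustration. Presidio's recognizers are far more robust than these patterns.

```python
import re

# Toy patterns for illustration only (assumption); these are NOT Presidio's recognizers.
TOY_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def toy_redact(text, entities=None):
    """Replace each match of the selected toy patterns with <ENTITY_TYPE>.

    Hypothetical helper: `entities` limits redaction to the named entity types.
    When it is None, every toy pattern is applied.
    """
    if entities is None:
        selected = TOY_PATTERNS
    else:
        selected = {name: TOY_PATTERNS[name] for name in entities if name in TOY_PATTERNS}
    for name, pattern in selected.items():
        text = pattern.sub(f"<{name}>", text)
    return text


print(toy_redact("Reach Jane at jane@example.com or 555-123-4567."))
```

Note that the name "Jane" is left untouched here, because detecting person names requires a language model rather than a fixed pattern. This is one reason to rely on a library such as Presidio instead of hand-written rules.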
The Sensitive Data Protection feature introduces the following functionality to the Python SDK:
- A redact_pii setting, which you can enable or disable in the weave.init() call to turn on PII redaction.
- Automatic redaction of common entities when redact_pii = True.
- Customizable redaction fields with the configurable redact_pii_fields setting.
- Exclusion of specific entities from redaction with the redact_pii_exclude_fields setting.
Enable PII redaction
To get started with the Sensitive Data Protection feature in Weave, install the required dependencies, turn on PII redaction in weave.init(), and optionally tailor which entities Weave redacts:
- Install the required dependencies:
    pip install presidio-analyzer presidio-anonymizer
- Modify your weave.init() call to enable redaction. When redact_pii=True, Weave redacts common entities by default:
    import weave
    weave.init("my-project", settings={"redact_pii": True})
- Optional: Customize redaction fields with the redact_pii_fields parameter:
    weave.init("my-project", settings={"redact_pii": True, "redact_pii_fields": ["CREDIT_CARD", "US_SSN"]})
  For a full list of the entities that can be detected and redacted, see PII entities supported by Presidio.
- Optional: Exclude specific entities from redaction with the redact_pii_exclude_fields parameter. This is useful when you want to keep the default redaction but preserve certain entity types. The following example shows how to redact all default entities except EMAIL_ADDRESS and PERSON:
    weave.init("my-project", settings={"redact_pii": True, "redact_pii_exclude_fields": ["EMAIL_ADDRESS", "PERSON"]})
After you complete these steps, Weave redacts the configured PII entities from traces before sending them to Weave servers.
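If you configure redaction in more than one place, it can help to build the settings dictionary in a single function. The sketch below uses a hypothetical helper, build_redaction_settings, that is not part of the Weave SDK. It only assembles the three setting names described in the steps above. The weave.init() call is left as a comment so the block runs without Weave installed.

```python
def build_redaction_settings(enabled=True, fields=None, exclude_fields=None):
    """Hypothetical helper (not a Weave API): assemble the settings dict for weave.init().

    Only the keys documented above are used: redact_pii, redact_pii_fields,
    and redact_pii_exclude_fields. Optional keys are omitted when not provided.
    """
    settings = {"redact_pii": enabled}
    if fields:
        settings["redact_pii_fields"] = list(fields)
    if exclude_fields:
        settings["redact_pii_exclude_fields"] = list(exclude_fields)
    return settings


# Keep the default entities, but preserve email addresses and person names.
settings = build_redaction_settings(exclude_fields=["EMAIL_ADDRESS", "PERSON"])
print(settings)

# With Weave installed, pass the result to weave.init():
# import weave
# weave.init("my-project", settings=settings)
```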
Entities redacted by default
When you enable PII redaction without specifying redact_pii_fields, Weave automatically redacts the following entity types:
CREDIT_CARD
CRYPTO
EMAIL_ADDRESS
ES_NIF
FI_PERSONAL_IDENTITY_CODE
IBAN_CODE
IN_AADHAAR
IN_PAN
IP_ADDRESS
LOCATION
PERSON
PHONE_NUMBER
UK_NHS
UK_NINO
US_BANK_NUMBER
US_DRIVER_LICENSE
US_PASSPORT
US_SSN
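The interaction between the default list, redact_pii_fields, and redact_pii_exclude_fields can be sketched as a small function. This is an illustration of the behavior described on this page, not Weave's code. In particular, it assumes that redact_pii_fields replaces the default list and that exclusions are then subtracted from whichever list applies. How Weave behaves when both settings are supplied together is not stated here, so treat that case as an assumption. The name effective_entities is hypothetical.

```python
# Default entity types, copied from the list above.
DEFAULT_ENTITIES = [
    "CREDIT_CARD", "CRYPTO", "EMAIL_ADDRESS", "ES_NIF",
    "FI_PERSONAL_IDENTITY_CODE", "IBAN_CODE", "IN_AADHAAR", "IN_PAN",
    "IP_ADDRESS", "LOCATION", "PERSON", "PHONE_NUMBER", "UK_NHS",
    "UK_NINO", "US_BANK_NUMBER", "US_DRIVER_LICENSE", "US_PASSPORT", "US_SSN",
]


def effective_entities(fields=None, exclude_fields=None):
    """Hypothetical sketch: compute which entity types would be redacted.

    Assumption: `fields` (redact_pii_fields) replaces the defaults when given,
    and `exclude_fields` (redact_pii_exclude_fields) is subtracted afterwards.
    """
    base = DEFAULT_ENTITIES if fields is None else list(fields)
    excluded = set(exclude_fields or [])
    return [entity for entity in base if entity not in excluded]


# Defaults minus two entity types, matching the exclusion example earlier on this page.
print(effective_entities(exclude_fields=["EMAIL_ADDRESS", "PERSON"]))
```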
Redact sensitive keys with REDACT_KEYS
In addition to PII redaction, the Weave SDK supports redaction of custom keys with REDACT_KEYS. This is useful when you want to protect additional sensitive data that might not fall under the PII category but must remain private. Examples include:
- API keys
- Authentication headers
- Tokens
- Internal IDs
- Config values
Pre-defined REDACT_KEYS
Weave automatically redacts the following sensitive keys by default:
[
"api_key",
"auth_headers",
"authorization"
]
Add your own keys
If your application uses other field names that contain sensitive values, extend the default REDACT_KEYS list with your own keys:
import weave
from weave.utils import sanitize

client = weave.init("my-project", settings={"redact_pii": True})

# Add custom keys to redact
sanitize.add_redact_key("client_id")
sanitize.add_redact_key("token")

client_id = "123"
token = "789"

@weave.op
def test(client_id, token):
    return client_id + token

test(client_id, token)
When you view the trace in the Weave UI, the values of client_id and token appear as "REDACTED":
client_id = "REDACTED"
token = "REDACTED"
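Conceptually, key-based redaction replaces the value of any field whose name is in the redaction list, regardless of what the value contains. The following self-contained sketch shows one way this could work on nested data. It is not Weave's implementation: the redact_keys function, the recursive traversal, and the case-insensitive key matching are assumptions made for illustration.

```python
# Default keys, copied from the pre-defined list above.
REDACT_KEYS = {"api_key", "auth_headers", "authorization"}
REDACTED_VALUE = "REDACTED"


def redact_keys(obj, keys=REDACT_KEYS):
    """Hypothetical sketch: return a copy of obj with sensitive values replaced.

    Assumptions: keys are matched case-insensitively, and nested dicts and
    lists are traversed recursively. Other values are returned unchanged.
    """
    if isinstance(obj, dict):
        return {
            key: REDACTED_VALUE
            if isinstance(key, str) and key.lower() in keys
            else redact_keys(value, keys)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [redact_keys(item, keys) for item in obj]
    return obj


# Extend the defaults with custom keys, mirroring the add_redact_key example.
custom_keys = REDACT_KEYS | {"client_id", "token"}
inputs = {"client_id": "123", "token": "789", "prompt": "hello"}
print(redact_keys(inputs, custom_keys))
```

Because matching is by field name only, a secret stored under an unlisted key passes through unchanged, so it is worth reviewing the field names your application actually uses.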
- This feature is only available in the Python SDK.
- When you enable redaction, processing time increases because of the Presidio dependency.