MASK Action

The MASK action in ByteSizer anonymizes or obfuscates sensitive data fields. It operates on one or more columns of a Dask DataFrame, and each column can be configured with its own masking technique.

How It Works

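The MASK action is configured in the workflow YAML file. Each key under fields names a column; type selects the masking technique, and the remaining keys are that technique's parameters. Example: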

- id: mask
  action: MASK
  parameters:
    fields:
      name:
        type: faker
        provider: name
      email:
        type: hashlib
        algo: sha256
      ssn:
        type: ff3
        key: "0123456789abcdef0123456789abcdef"
        tweak: "abcdef9876543210"
        radix: 10
      phone:
        type: pseudonym
        salt: "mysalt"
      token:
        type: fernet
        key: ""
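
The per-column configuration above can be read as a dispatch table: look up each column's type, build a masking function from its parameters, and apply it to that column only. The following is a minimal sketch of that idea on plain Python records rather than a Dask DataFrame. The names MASKER_FACTORIES, make_hashlib_masker and mask_records are hypothetical and are not part of ByteSizer; only the hashlib type is wired in to keep the sketch short.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical factory: builds a masking function from one field's parameters.
def make_hashlib_masker(params: dict) -> Callable[[str], str]:
    algo = params.get("algo", "sha256")
    return lambda value: hashlib.new(algo, value.encode("utf-8")).hexdigest()

# Hypothetical registry keyed by the "type" value used in the YAML.
MASKER_FACTORIES: Dict[str, Callable[[dict], Callable[[str], str]]] = {
    "hashlib": make_hashlib_masker,
}

def mask_records(records: List[dict], fields: Dict[str, dict]) -> List[dict]:
    """Return copies of the records with each configured column masked."""
    maskers = {}
    for column, params in fields.items():
        factory = MASKER_FACTORIES[params["type"]]  # KeyError on unknown type
        maskers[column] = factory(params)
    return [
        {
            col: maskers[col](val) if col in maskers and val is not None else val
            for col, val in row.items()
        }
        for row in records
    ]

# Mirrors the "email" entry of the example configuration.
fields = {"email": {"type": "hashlib", "algo": "sha256"}}
rows = [{"id": 1, "email": "a@example.com"}]
masked = mask_records(rows, fields)
```

Columns that are not listed under fields (id in this sketch) pass through unchanged, and the input records are not modified.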

Supported Masking Techniques

1. Hashlib

Description: Hashes the value with a standard hashing algorithm. The result is deterministic: the same input always produces the same digest.
Library: hashlib

  • algo: Algorithm (e.g., sha256, sha512, md5)
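
As an illustration of what the algo parameter selects, here is a small sketch using Python's standard hashlib module. The function name hash_value, the UTF-8 encoding and the hex output are assumptions made for the example; ByteSizer's own encoding and output format are not specified here.

```python
import hashlib

def hash_value(value: str, algo: str = "sha256") -> str:
    """Hash a string with the named algorithm and return the hex digest."""
    # Reject names this Python build does not support before hashing.
    if algo not in hashlib.algorithms_available:
        raise ValueError(f"unsupported hash algorithm: {algo}")
    return hashlib.new(algo, value.encode("utf-8")).hexdigest()

digest_256 = hash_value("alice@example.com", "sha256")  # 64 hex characters
digest_512 = hash_value("alice@example.com", "sha512")  # 128 hex characters
```

Note that an unsalted hash of a low-entropy value such as an email address or phone number can be reversed by hashing candidate values and comparing, and that md5 is no longer considered collision-resistant.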

2. Fernet

Description: Encrypts the value using Fernet symmetric encryption. Unlike hashing, the original value can be recovered with the same key.
Library: cryptography (Fernet)

  • key: URL-safe Base64-encoded 32-byte key
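
To make the key format concrete, the sketch below builds and checks a key of the required shape using only the standard library. The helper names generate_fernet_key and is_valid_fernet_key are hypothetical; in practice the cryptography package's Fernet.generate_key() produces a key of the same shape.

```python
import base64
import os

def generate_fernet_key() -> bytes:
    """Return 32 random bytes encoded as URL-safe Base64 (44 characters)."""
    return base64.urlsafe_b64encode(os.urandom(32))

def is_valid_fernet_key(key) -> bool:
    """Check that a key decodes from URL-safe Base64 to exactly 32 bytes."""
    try:
        return len(base64.urlsafe_b64decode(key)) == 32
    except (ValueError, TypeError):
        return False

key = generate_fernet_key()
```

An empty string, as shown for the token field in the configuration example, does not pass this check. Whether ByteSizer generates a key in that case is not stated here, so supplying an explicit key is the safer assumption.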

3. Faker

Description: Replaces the value with realistic fake data such as names, addresses, emails.
Library: Faker

  • provider: Fake data type (e.g., name, address, email)

4. FF3 (Format-Preserving Encryption)

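Description: Applies format-preserving encryption: the output has the same length and character set as the input. This is useful for numeric or string fields, such as SSNs or account numbers, that must keep their format after masking.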
Library: FFX/FF3

  • key: Hex-encoded encryption key
  • tweak: Hex-encoded tweak value
  • radix: Numerical base (default: 10)
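
The three parameters have size constraints that are easy to get wrong in a YAML file. The sketch below checks them against the sizes defined for FF3 in NIST SP 800-38G: a 128-, 192- or 256-bit key, a 64-bit tweak (56-bit for the revised FF3-1), and a radix between 2 and 65536. The function name validate_ff3_params is hypothetical, and a given FF3 library may accept a narrower range than the standard allows.

```python
def validate_ff3_params(key_hex: str, tweak_hex: str, radix: int = 10) -> None:
    """Raise ValueError if the FF3 parameters have an invalid size or encoding."""
    key = bytes.fromhex(key_hex)      # ValueError if not valid hex
    tweak = bytes.fromhex(tweak_hex)  # ValueError if not valid hex
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 128, 192 or 256 bits (32, 48 or 64 hex chars)")
    if len(tweak) not in (7, 8):
        raise ValueError("tweak must be 56 bits (FF3-1) or 64 bits (FF3)")
    if not 2 <= radix <= 65536:
        raise ValueError("radix must be between 2 and 65536")

# The values from the ssn field of the example configuration pass the check.
validate_ff3_params("0123456789abcdef0123456789abcdef", "abcdef9876543210", 10)
```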

5. Pseudonym

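Description: Deterministic pseudonymization using salted hashing. The same input and salt always produce the same pseudonym, so masked values can still be joined or grouped.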

  • salt: Optional salt string
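
A minimal sketch of salted-hash pseudonymization follows. The choice of SHA-256, the salt-then-value concatenation, the 16-character truncation and the function name pseudonymize are all assumptions for illustration; the actual scheme used by ByteSizer is not described here.

```python
import hashlib

def pseudonymize(value: str, salt: str = "", length: int = 16) -> str:
    """Derive a stable pseudonym from a value and an optional salt."""
    digest = hashlib.sha256(salt.encode("utf-8") + value.encode("utf-8")).hexdigest()
    return digest[:length]  # shortened for readability; raises collision risk

a = pseudonymize("555-0100", salt="mysalt")
b = pseudonymize("555-0100", salt="mysalt")
c = pseudonymize("555-0100", salt="othersalt")
# a == b because the mapping is deterministic; a != c because the salt differs.
```

Keeping the salt secret matters: anyone who knows it can recompute pseudonyms for guessed inputs and match them against the masked column.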

Summary

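The MASK action applies an independently configured masking technique to each column, covering one-way hashing, reversible encryption, format-preserving encryption, fake-data substitution, and deterministic pseudonymization. It can be tailored for:

  • Generating anonymized test data
  • Obfuscating PII for analytics
  • Supporting GDPR/CCPA compliance efforts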