Overview

Data protection rules define how Orka detects and protects sensitive data. Orka uses field-level protection to secure specific fields, so tables with mixed sensitivity levels remain usable. Rules match patterns in column names and values to detect sensitive data. When Orka scans tables, it labels and protects matching columns with the configured action.

Security by design

Orka’s security is built in rather than added on:
  • Automatic detection - sensitive data is automatically identified during scanning
  • Before movement - protection is applied before data leaves the source
  • Field-level granularity - only sensitive fields are protected, not entire tables
  • Rule-based - consistent policies automatically applied across all data

Field-level protection

Orka protects individual fields instead of blocking entire tables:
  • Tables with a mix of sensitive and non-sensitive data can be safely used
  • Only the sensitive columns are protected
  • Users get access to valuable data that would otherwise be blocked
  • Compliance requirements are still met

Protection methods

When a data protection rule matches sensitive data, it applies one of four protection methods (actions):

1. Mask

Hides values with asterisks while preserving structure
  • Replaces characters with asterisks (for example: “user@email.com” → “****@email.com”)
  • Preserves data format and structure
  • Masked in catalog and destination systems
  • Users can see that data exists but can’t see actual values
Use masking when you want users to know data exists but not see the actual values.
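As a rough illustration of the masking behavior described above, the following Python sketch masks the local part of an email address and keeps the domain. The function name mask_email and the exact masking rule (one asterisk per character, domain left visible) are assumptions made for this example; Orka's actual masking logic may differ.

```python
def mask_email(value: str) -> str:
    """Mask the part before '@' with asterisks; keep the domain visible.

    Assumption for illustration only: one asterisk per masked character.
    """
    local, sep, domain = value.partition("@")
    if not sep:
        # Value has no '@', so mask every character.
        return "*" * len(value)
    return "*" * len(local) + "@" + domain


print(mask_email("user@email.com"))  # ****@email.com
```

The output keeps the shape of an email address, so users can tell what kind of data the column holds without seeing the value.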

2. Hash

One-way hash protecting values while preserving referential integrity
  • Converts values to consistent hash values
  • Same input always produces same output
  • Allows JOIN operations on these values in destination systems
  • Impossible to reverse back to original values
Use hashing when you need to maintain record relationships without exposing the actual values.
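The following Python sketch shows why consistent hashing keeps JOINs working. It uses SHA-256 from the standard library as a stand-in; the source does not say which hash algorithm Orka uses or whether it adds a salt or key, and the tables and the hash_value helper are hypothetical.

```python
import hashlib


def hash_value(value: str) -> str:
    """Return a deterministic SHA-256 hex digest (stand-in algorithm)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Two hypothetical tables that share an email column.
customers = [
    {"email": "ana@example.com", "plan": "pro"},
    {"email": "bo@example.com", "plan": "free"},
]
orders = [
    {"email": "ana@example.com", "total": 40},
    {"email": "ana@example.com", "total": 15},
]

# Protect the email column in both tables.
hashed_customers = [{**row, "email": hash_value(row["email"])} for row in customers]
hashed_orders = [{**row, "email": hash_value(row["email"])} for row in orders]

# Join on the hashed column: equal inputs produce equal digests.
plan_by_email = {row["email"]: row["plan"] for row in hashed_customers}
joined = [(plan_by_email[o["email"]], o["total"]) for o in hashed_orders]
print(joined)  # [('pro', 40), ('pro', 15)]
```

The join succeeds even though neither protected table contains a readable email address.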

3. Drop

Removes the column entirely (complete data suppression)
  • Column excluded from destination systems
  • Column values never appear in Orka
  • Most restrictive protection method
  • Used for highly sensitive data that should never leave the source
  • Can’t be reversed downstream
Use Drop sparingly because removed columns can’t be recovered in the lakehouse.

4. None (no action)

The rule identifies the data but applies no protection
  • Used to disable a data protection rule while still allowing it to be evaluated
  • Helpful for testing and rule development
  • Provides visibility without enforcement

Rule priority

If multiple rules match a single column, Orka protects the column with the highest-priority action:
Drop > Hash > Mask > None
The most restrictive protection wins when rules conflict.
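The priority order can be sketched in Python as a simple ranking. The names PRIORITY and resolve_action are hypothetical, and returning "None" when no rule matches is an assumption made for this example.

```python
# Higher number = more restrictive, following Drop > Hash > Mask > None.
PRIORITY = {"None": 0, "Mask": 1, "Hash": 2, "Drop": 3}


def resolve_action(matching_actions: list[str]) -> str:
    """Pick the most restrictive action among the rules matching a column."""
    if not matching_actions:
        # Assumption: a column with no matching rules gets no protection.
        return "None"
    return max(matching_actions, key=lambda action: PRIORITY[action])


print(resolve_action(["Mask", "Hash"]))          # Hash
print(resolve_action(["None", "Drop", "Mask"]))  # Drop
```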

Manage data protection rules

Only members of the Data protection admin group can manage data protection rules.
To manage data protection rules, go to Data protection rules in the sidebar under the Data protection section.

Create a data protection rule

Creating a data protection rule involves three steps:

Step 1: Identify sensitive data

Define patterns that identify sensitive data in your tables:
  • Column name pattern (required) - Regular expression to match column names
  • Column value pattern (required) - Regular expression to match values within columns
Use the Test pattern option to validate your regex patterns with sample data before you proceed. This helps ensure patterns match correctly.
Example pattern: Email detection
Column name pattern: (?i)(email|e_mail|emailaddress|mail)
Column value pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
The column name pattern uses case-insensitive matching (?i) to find columns named “email”, “e_mail”, “emailaddress” or “mail”. The column value pattern validates email format in the actual data.
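To try these two patterns outside Orka, a short Python sketch with the standard re module can help. It assumes the column name pattern is applied as a substring search and the value pattern as an anchored match; how Orka applies them is not stated here, and the helper names and sample inputs are hypothetical.

```python
import re

COLUMN_NAME_PATTERN = re.compile(r"(?i)(email|e_mail|emailaddress|mail)")
COLUMN_VALUE_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def name_matches(column_name: str) -> bool:
    # Assumption: the name pattern may match anywhere in the column name.
    return COLUMN_NAME_PATTERN.search(column_name) is not None


def value_matches(value: str) -> bool:
    # The value pattern is anchored with ^ and $, so it must match the whole value.
    return COLUMN_VALUE_PATTERN.match(value) is not None


for name in ["Email", "customer_e_mail", "mailing_city", "phone"]:
    print(name, name_matches(name))
# Email True
# customer_e_mail True
# mailing_city True
# phone False

for value in ["ana@example.com", "ana@example", "not-an-email"]:
    print(value, value_matches(value))
# ana@example.com True
# ana@example False
# not-an-email False
```

Under the substring assumption, mailing_city also matches the name pattern because it contains “mail”. Checking patterns against sample data surfaces this kind of unintended match early.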

Step 2: Apply protection

Configure the rule and choose a protection method:
  • Display name (required) - Human-readable name for the rule
  • Description (optional) - Explain what the rule detects and why
  • Protection method - Select Mask, Hash, Drop or None
  • Category - Classify the rule (PII, Financial, Health, Authentication or Custom)
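For planning or reviewing rules before entering them in the UI, the fields from steps 1 and 2 can be modeled as a small Python record. This is a hypothetical sketch: ProtectionRule and its field names are not an Orka API, and the validation only mirrors the required fields and allowed values listed above.

```python
import re
from dataclasses import dataclass

METHODS = {"Mask", "Hash", "Drop", "None"}
CATEGORIES = {"PII", "Financial", "Health", "Authentication", "Custom"}


@dataclass
class ProtectionRule:
    """Hypothetical record of one rule; not an Orka API object."""
    display_name: str
    column_name_pattern: str
    column_value_pattern: str
    method: str
    category: str
    description: str = ""  # optional

    def __post_init__(self) -> None:
        if not self.display_name.strip():
            raise ValueError("display name is required")
        if self.method not in METHODS:
            raise ValueError(f"unknown protection method: {self.method}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        for pattern in (self.column_name_pattern, self.column_value_pattern):
            try:
                re.compile(pattern)  # reject invalid regular expressions
            except re.error as exc:
                raise ValueError(f"invalid pattern {pattern!r}: {exc}") from exc


rule = ProtectionRule(
    display_name="Email addresses",
    column_name_pattern=r"(?i)(email|e_mail|emailaddress|mail)",
    column_value_pattern=r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
    method="Mask",
    category="PII",
    description="Detects email addresses in customer tables.",
)
print(rule.display_name, rule.method, rule.category)  # Email addresses Mask PII
```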

Step 3: Review changes

Review your rule configuration and save.

Edit data protection rules

Members of the Data protection admin group can edit:
  • Display name
  • Description
  • Category
  • Protection action

Delete data protection rules

Members of the Data protection admin group can delete data protection rules that are no longer needed.
Document rules clearly. Good descriptions help other administrators understand rule purpose.

Rule categories

Organize rules into categories for easier management:
  • Custom - organization-specific sensitive data
  • PII (Personally Identifiable Information) - names, addresses, phone numbers
  • Financial - credit cards, bank accounts, transaction data
  • Health - medical records, diagnoses, healthcare-related information
  • Authentication - passwords, tokens, API keys

Our recommendations

  • Create rules before you publish tables - create and test rules before you publish tables. Creating rules after you publish triggers re-scans and may impact active pipelines
  • Configure prebuilt rules first - start with prebuilt rules and set appropriate actions. These cover common sensitive data types and provide a strong baseline
  • Use the least restrictive protection that meets requirements - use Mask when structure is needed, Hash when you need to preserve relationships, and Drop only for highly sensitive data
  • Test rules before production - use the testing interface to validate rules with real sample data before you deploy to production
  • Document custom rules thoroughly - write clear descriptions so other administrators understand rule purpose
  • Audit rules regularly - review active rules regularly to ensure they still align with compliance requirements and business needs

Troubleshoot

Common causes of a rule not protecting data as expected:
  • Rule action set to None - rule matches but takes no action
  • Pattern doesn’t match - regular expression doesn’t match your data
  • Rule priority - another rule with higher priority applies instead
  • Rule not saved - changes were not saved
To fix:
  1. Use the pattern testing interface with sample data
  2. Verify the action is set (not None)
  3. Test the regular expression pattern
  4. Check for conflicting rules with higher priority
  5. Re-save the rule if needed