Create data protection rules - Orka Documentation

Overview

Data protection rules define how Orka detects and protects sensitive data. Orka uses field-level protection to secure specific fields, so tables with mixed sensitivity levels remain usable. Rules match patterns in column names and values to detect sensitive data. When Orka scans tables, it labels and protects matching columns with the configured action.

Security by design

Security is built into Orka rather than added on afterward:

Automatic detection - sensitive data is automatically identified during scanning
Before movement - protection is applied before data leaves the source
Field-level granularity - only sensitive fields are protected, not entire tables
Rule-based - consistent policies automatically applied across all data

Field-level protection

Orka protects individual fields instead of blocking entire tables:

Tables with a mix of sensitive and non-sensitive data can be safely used
Only the sensitive columns are protected
Users get access to valuable data that would otherwise be blocked
Compliance requirements are still met

Protection methods

When a data protection rule matches sensitive data, it applies one of four protection methods (actions):

1. Mask

Hides values with asterisks while preserving structure

Replaces characters with asterisks (for example, an email address at email.com is displayed as “****@email.com”)
Preserves data format and structure
Masked in catalog and destination systems
Users can see that data exists but can’t see actual values

Use masking when you want users to know data exists but not see the actual values.
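
As a rough illustration of the masking behavior described above, the following Python sketch replaces the local part of an email address with asterisks and keeps the domain. The function name mask_email and the sample address are hypothetical, and Orka's actual masking logic may differ (for example, in how it handles values that are not email addresses).

```python
def mask_email(value: str) -> str:
    """Illustrative only: mask the part before '@' and keep the domain."""
    local, sep, domain = value.partition("@")
    if not sep:
        # Assumption: values without an '@' are masked in full.
        return "*" * len(value)
    return "*" * len(local) + sep + domain


print(mask_email("jane@email.com"))  # ****@email.com
```

The masked value keeps its length and shape, so users can tell that an email address exists without seeing it.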

2. Hash

One-way hash protecting values while preserving referential integrity

Converts values to consistent hash values
Same input always produces same output
Allows JOIN operations on these values in destination systems
One-way: hashed values can’t be reversed to recover the originals

Use hashing when you need to maintain record relationships without exposing values.
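
The sketch below shows why consistent hashing keeps JOINs working. It uses SHA-256 from Python's standard library purely as a stand-in; the documentation does not state which hash algorithm Orka uses, and the tables, column names, and hash_value helper are made up for illustration.

```python
import hashlib


def hash_value(value: str) -> str:
    """Assumed stand-in for the Hash action: SHA-256 hex digest of the value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Two hypothetical tables that share an email column.
customers = [{"email": "ana@example.com", "plan": "pro"}]
orders = [
    {"email": "ana@example.com", "total": 40},
    {"email": "bo@example.com", "total": 15},
]

# Hash the sensitive column in both tables.
hashed_customers = [{**row, "email": hash_value(row["email"])} for row in customers]
hashed_orders = [{**row, "email": hash_value(row["email"])} for row in orders]

# Join on the hashed key: equal inputs produce equal hashes, so rows still line up.
plans = {row["email"]: row["plan"] for row in hashed_customers}
joined = [
    {"plan": plans[row["email"]], "total": row["total"]}
    for row in hashed_orders
    if row["email"] in plans
]
print(joined)  # [{'plan': 'pro', 'total': 40}]
```

The join result contains no email addresses, yet the order is still attributed to the right customer.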

3. Drop

Removes the column entirely (complete data suppression)

Column excluded from destination systems
Column values never appear in Orka
Most restrictive protection method
Used for highly sensitive data that should never leave the source
Can’t be reversed downstream

Use Drop sparingly because removed columns can’t be recovered in the lakehouse.

4. None (no action)

The rule identifies the data but applies no protection

Used to disable a data protection rule while still allowing it to be evaluated
Helpful for testing and rule development
Provides visibility without enforcement

Rule priority

If multiple rules match a single column, the column is automatically protected with the highest-priority action:

Drop > Hash > Mask > None

The most restrictive protection wins when rules conflict.
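
The priority order can be sketched as a simple lookup. This is a minimal illustration of the Drop > Hash > Mask > None ordering, not Orka's implementation; the resolve_action name is hypothetical, and returning "None" when no rule matches is an assumption.

```python
# Higher number = more restrictive action.
PRIORITY = {"None": 0, "Mask": 1, "Hash": 2, "Drop": 3}


def resolve_action(matching_actions: list[str]) -> str:
    """Return the most restrictive action among the rules that matched a column."""
    # Assumption: a column with no matching rules gets no protection.
    return max(matching_actions, key=PRIORITY.__getitem__, default="None")


print(resolve_action(["Mask", "Hash"]))          # Hash
print(resolve_action(["None", "Drop", "Mask"]))  # Drop
```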

Manage data protection rules

Only members of the data protection admin group can manage data protection rules.

To manage data protection rules, go to Data protection rules in the sidebar under the Data protection section.

Create a data protection rule

Creating a data protection rule involves three steps:

Step 1: Identify sensitive data

Define the patterns that identify sensitive data in your tables:

Column name pattern (required) - Regular expression to match column names
Column value pattern (required) - Regular expression to match values within columns

Use the Test pattern option to validate your regex patterns with sample data before you proceed. This helps ensure patterns match correctly.

Example pattern: Email detection

Column name pattern: (?i)(email|e_mail|emailaddress|mail)
Column value pattern: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

The column name pattern uses case-insensitive matching (?i) to find columns named “email”, “e_mail”, “emailaddress” or “mail”. The column value pattern validates email format in the actual data.
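
To try the example patterns outside of Orka, a short Python check like the one below can help. It assumes the column name pattern may match anywhere in the name and that each value is checked on its own; how Orka's scanner combines the two results is not specified here, so the function reports them separately. The check_column helper and sample data are hypothetical.

```python
import re

NAME_PATTERN = re.compile(r"(?i)(email|e_mail|emailaddress|mail)")
VALUE_PATTERN = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def check_column(name: str, samples: list[str]) -> tuple[bool, list[str]]:
    """Return whether the column name matches and which sample values match."""
    # Assumption: the name pattern is searched for anywhere in the column name.
    name_hit = NAME_PATTERN.search(name) is not None
    value_hits = [s for s in samples if VALUE_PATTERN.match(s)]
    return name_hit, value_hits


print(check_column(
    "Customer_Email",
    ["ana@example.com", "not-an-email", "bo@mail.example.org"],
))
# (True, ['ana@example.com', 'bo@mail.example.org'])

print(check_column("order_id", ["1001"]))
# (False, [])
```

Because the value pattern is anchored with ^ and $, values with leading or trailing whitespace do not match, which is worth checking when a rule misses data you expect it to catch.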

Step 2: Apply protection

Configure the rule and choose a protection method:

Display name (required) - Human-readable name for the rule
Description (optional) - Explain what the rule detects and why
Protection method - Select Mask, Hash, Drop or None
Category - Classify the rule (PII, Financial, Health, Authentication or Custom)

Step 3: Review changes

Review your rule configuration and save.

Edit data protection rules

Members of the Data protection admin group can edit:

Display name
Description
Category
Protection action

Delete data protection rules

Members of the Data protection admin group can delete data protection rules that are no longer needed.

Document rules clearly. Good descriptions help other administrators understand rule purpose.

Rule categories

Organize rules into categories for easier management:

Custom - organization-specific sensitive data
PII (Personally Identifiable Information) - names, addresses, phone numbers
Financial - credit cards, bank accounts, transaction data
Health - medical records, diagnoses, healthcare-related information
Authentication - passwords, tokens, API keys

Our recommendations

Create rules before you publish tables - create and test rules first. Creating rules after you publish triggers re-scans and may impact active pipelines
Configure prebuilt rules first - start with prebuilt rules and set appropriate actions. These cover common sensitive data types and provide a strong baseline
Use the least restrictive protection that meets requirements - use Mask when structure is needed, Hash when you need to preserve relationships, and Drop only for highly sensitive data
Test rules before production - use the testing interface to validate rules with real sample data before you deploy to production
Document custom rules thoroughly - write clear descriptions so other administrators understand rule purpose
Audit rules regularly - review active rules regularly to ensure they still align with compliance requirements and business needs

Troubleshoot

Why doesn't my data protection rule detect sensitive data?

Common causes:

Rule action set to None - rule matches but takes no action
Pattern doesn’t match - regular expression doesn’t match your data
Rule priority - another matching rule with a higher-priority action applies instead
Rule not saved - changes were not saved

To fix:

Use the pattern testing interface with sample data
Verify the action is set (not None)
Test the regular expression pattern
Check for conflicting rules with higher priority
Re-save the rule if needed

Related resources

Assign permissions