Pattern types
Data protection rules use two types of patterns to detect sensitive data:
- Column name: use when column names indicate sensitive data (for example, a column named `email` or `ssn`).
- Value: use to detect the actual data within columns.
For example, a rule that detects email addresses can combine both types:
- Column pattern: Case-insensitive match for “email”, “e_mail”, “emailaddress” or “mail”
- Value pattern: Standard email format validation
- Combined: Detect email columns by name or content
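The email example above can be sketched in Python to show how the two pattern types relate. The expressions and the `is_email_column` helper are illustrative assumptions, not the patterns Orka's rules actually use, and the value pattern is a deliberately simple approximation of email syntax.

```python
import re

# Hypothetical column-name pattern: "email", "e_mail", "emailaddress" or "mail",
# matched case-insensitively against the whole column name.
COLUMN_PATTERN = re.compile(r"^(e_?mail(address)?|mail)$", re.IGNORECASE)

# Hypothetical value pattern: a simplified email format check.
VALUE_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def is_email_column(column_name, sample_values):
    """Combined check: flag the column if its name OR any sampled value matches."""
    name_match = COLUMN_PATTERN.match(column_name) is not None
    value_match = any(VALUE_PATTERN.match(value) for value in sample_values)
    return name_match or value_match
```

With this sketch, a column named `Email` is flagged by name alone, while a column named `contact` is flagged only if one of its sampled values looks like an email address.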
Regex basics
If you’re new to regular expressions, here are the essential elements:

| Element | Meaning | Example | Matches |
|---|---|---|---|
| . | Any single character | a.c | “abc”, “a1c”, “a c” |
| * | Zero or more of previous | ab*c | “ac”, “abc”, “abbc” |
| + | One or more of previous | ab+c | “abc”, “abbc” (not “ac”) |
| ? | Zero or one of previous | ab?c | “ac”, “abc” (not “abbc”) |
| ^ | Start of string | ^hello | “hello world” (not “say hello”) |
| $ | End of string | world$ | “hello world” (not “world peace”) |
| \| | OR operator | cat\|dog | “cat”, “dog” |
| [] | Character class | [abc] | “a”, “b”, “c” |
| [^] | Negated character class | [^abc] | Any character except a, b, c |
| () | Grouping | (ab)+ | “ab”, “abab”, “ababab” |
| \b | Word boundary | \bcat\b | “cat” (not “category”) |
| \d | Any digit | \d{3} | “123”, “456” |
| \w | Word character | \w+ | “hello”, “user123” |
| \s | Whitespace | \s+ | a space, several spaces, a tab |
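To try the elements in the table, the following Python sketch checks a handful of the listed examples with the standard `re` module. It uses `re.search`, which looks for a match anywhere in the text; whether Orka matches patterns against the whole value or any part of it is not stated here, so treat that choice as an assumption.

```python
import re

# (pattern, text, expected result) triples taken from the table above.
CASES = [
    (r"a.c", "a1c", True),
    (r"ab*c", "ac", True),
    (r"ab+c", "ac", False),
    (r"ab?c", "abbc", False),
    (r"^hello", "say hello", False),
    (r"world$", "hello world", True),
    (r"cat|dog", "dog", True),
    (r"[^abc]", "a", False),
    (r"(ab)+", "abab", True),
    (r"\bcat\b", "category", False),
    (r"\d{3}", "456", True),
    (r"\w+", "user123", True),
]


def check(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


for pattern, text, expected in CASES:
    status = "ok" if check(pattern, text) == expected else "MISMATCH"
    print(f"{status}: {pattern!r} against {text!r}")
```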
Escape special characters
To match special characters literally, escape them with a backslash. For example, `\.` matches a literal period and `\$` matches a literal dollar sign.
Best practices
Test patterns before you apply them
Use Orka’s rule testing interface with actual sample data from your databases before you apply patterns to production.
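As a complement to the rule testing interface, a pattern can also be checked locally against labeled samples before it is entered as a rule. The sketch below is a minimal, hypothetical harness (the `evaluate_pattern` helper, the SSN-style pattern, and the sample values are all assumptions for illustration); it reports values the pattern should have matched but missed, and values it matched by mistake.

```python
import re


def evaluate_pattern(pattern, should_match, should_not_match):
    """Return (missed, false_hits) for a pattern against labeled samples."""
    compiled = re.compile(pattern)
    # Values that are sensitive but that the pattern fails to match.
    missed = [s for s in should_match if not compiled.fullmatch(s)]
    # Values that are not sensitive but that the pattern matches anyway.
    false_hits = [s for s in should_not_match if compiled.fullmatch(s)]
    return missed, false_hits


# Hypothetical samples; replace them with real values from your databases.
ssn_pattern = r"\d{3}-\d{2}-\d{4}"
missed, false_hits = evaluate_pattern(
    ssn_pattern,
    should_match=["123-45-6789", "123 45 6789"],
    should_not_match=["2024-01-15", "123-456-789"],
)
print("Missed:", missed)
print("False hits:", false_hits)
```

Here the space-separated sample ends up in `missed`, which signals that the pattern needs to allow more than one separator before it is applied to production data.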
Review prebuilt rules first
Before you create custom patterns, check if Orka’s prebuilt rules already cover your use case.
Document your patterns
Add clear descriptions to custom rules that explain what the pattern detects and why. This helps other members of the Data protection admin group understand and maintain your rules.
Account for data variations
Account for inconsistencies in real-world data: different separators, optional formatting, case variations and common misspellings.
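The variations listed above can be handled with optional elements and character classes. The following Python sketch shows one possible approach; both patterns are hypothetical examples rather than rules shipped with Orka.

```python
import re

# Hypothetical value pattern: accepts "-", ".", a space, or no separator at all.
SSN_VALUE = re.compile(r"^\d{3}[-. ]?\d{2}[-. ]?\d{4}$")

# Hypothetical column-name pattern: case-insensitive, optional "_" or "-"
# between words, and tolerant of the common misspelling "adress".
ADDRESS_COLUMN = re.compile(r"^e[-_]?mail[-_]?add?ress$", re.IGNORECASE)

for value in ["123-45-6789", "123 45 6789", "123.45.6789", "123456789"]:
    print(value, bool(SSN_VALUE.match(value)))

for name in ["emailaddress", "Email_Adress", "E-MAIL-ADDRESS"]:
    print(name, bool(ADDRESS_COLUMN.match(name)))
```

Looser patterns catch more variants but also raise the risk of false positives: the value pattern above matches any nine-digit number, so test it against real samples before relying on it.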