1. Choose source
Select the type of database you want to connect to. Orka currently supports only PostgreSQL as a source. Support for additional sources is coming soon.
2. Configure credentials
Enter the connection details:
- Host and port
- Credentials (username and password or service account)
- Database name
Use Test connection to verify the configuration.
Database permissions
The database user you configure must have the following permissions.
For PostgreSQL:
- SELECT privilege on all tables you want to scan
- CONNECT privilege on the database
- USAGE privilege on schemas containing the tables
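The PostgreSQL privileges listed above map to standard `GRANT` statements. The following is a minimal sketch that generates them; the role, database, and schema names are placeholders, and the helper functions `quote_ident` and `build_grants` are hypothetical, not part of Orka. It grants SELECT on every existing table in a schema, which may be broader than you need.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_grants(role: str, database: str, schemas: list[str]) -> list[str]:
    """Return GRANT statements covering CONNECT, USAGE, and SELECT."""
    statements = [
        f"GRANT CONNECT ON DATABASE {quote_ident(database)} TO {quote_ident(role)};"
    ]
    for schema in schemas:
        statements.append(
            f"GRANT USAGE ON SCHEMA {quote_ident(schema)} TO {quote_ident(role)};"
        )
        # Covers every table that currently exists in the schema; replace with
        # per-table grants if the role should read only some of them.
        statements.append(
            f"GRANT SELECT ON ALL TABLES IN SCHEMA {quote_ident(schema)} "
            f"TO {quote_ident(role)};"
        )
    return statements


if __name__ == "__main__":
    # "orka_reader", "analytics", and "public" are placeholder names.
    for stmt in build_grants("orka_reader", "analytics", ["public"]):
        print(stmt)
```

Run the generated statements as a database administrator, then use Test connection to confirm the result.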
Table ingestion
After you connect to a source, complete the table ingestion workflow in three steps.
Step 1: Specify tables
Orka displays the tables it finds with the following metadata:
- Row count - number of records in the table
- Column count - number of columns and fields
- Last modified date - when the table was last updated
Orka automatically filters out empty, backup and test tables.
Filter tables
You can use filters to narrow down the table list:
- Hide empty tables
- Hide test tables
- Hide backup tables
- Custom name filters
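To illustrate how these filters combine, here is a minimal sketch. It is not Orka's implementation: the `TableInfo` type, the `filter_tables` function, and the naming patterns used to recognize test and backup tables are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class TableInfo:
    name: str
    row_count: int


# Assumed naming conventions; Orka's actual detection rules are not documented here.
TEST_PATTERN = re.compile(r"(^|_)(test|tmp)(_|$)", re.IGNORECASE)
BACKUP_PATTERN = re.compile(r"(^|_)(backup|bak)(_|$)", re.IGNORECASE)


def filter_tables(tables, hide_empty=True, hide_test=True,
                  hide_backup=True, name_contains=None):
    """Return the tables that pass every active filter."""
    kept = []
    for table in tables:
        if hide_empty and table.row_count == 0:
            continue
        if hide_test and TEST_PATTERN.search(table.name):
            continue
        if hide_backup and BACKUP_PATTERN.search(table.name):
            continue
        # Custom name filter: case-insensitive substring match.
        if name_contains and name_contains.lower() not in table.name.lower():
            continue
        kept.append(table)
    return kept


if __name__ == "__main__":
    tables = [
        TableInfo("orders", 120),
        TableInfo("orders_backup", 120),
        TableInfo("test_orders", 5),
        TableInfo("audit_log", 0),
    ]
    print([t.name for t in filter_tables(tables)])
```

Each filter only removes tables, so enabling more filters can never make the list longer.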
Step 2: Scan and select
Orka runs an automated scanning process that:
- Samples rows from each selected table
- Extracts the schema and column metadata
- Evaluates data protection rules against column names and values
- Profiles the data to reveal key metrics
Understand profiling metrics
Profiling reports the following data quality metrics for each column:
| Metric | Description | Example Value |
|---|---|---|
| Null % | Percentage of null or empty values in the column | 0.02% means almost all values are populated |
| Cardinality | Number of unique values in the column | 15,264 unique values indicates high diversity |
| Uniqueness | Percentage of unique values compared to total rows | 99.92% means nearly every value is distinct |
| Matching rules | Which data protection rules match this column | “Email address”, “Social Security Number” |
Profiling helps you assess data usefulness before it’s moved to the lakehouse.
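The first three metrics can be reproduced on a sample of column values with a few lines of code. This sketch follows the descriptions in the table; the function name `profile_column` is hypothetical, and how Orka treats edge cases (for example, whether empty strings count toward the unique values) is an assumption.

```python
def profile_column(values):
    """Compute Null %, Cardinality, and Uniqueness for one column sample."""
    total = len(values)
    if total == 0:
        return {"null_pct": 0.0, "cardinality": 0, "uniqueness_pct": 0.0}
    # Assumption: None and empty strings both count as "null or empty".
    missing = sum(1 for v in values if v is None or v == "")
    # Assumption: null and empty values are excluded from the unique set.
    distinct = {v for v in values if v is not None and v != ""}
    return {
        "null_pct": 100.0 * missing / total,
        "cardinality": len(distinct),
        "uniqueness_pct": 100.0 * len(distinct) / total,
    }


if __name__ == "__main__":
    emails = ["a@x.com", "b@x.com", "a@x.com", None, ""]
    print(profile_column(emails))
```

A column with low Null % and high Uniqueness, such as an ID or email column, is usually a good join key; a column with high Null % may not be worth ingesting.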
Sensitivity classifications
Orka automatically identifies sensitive data based on data protection rules. Each column is classified and shows which protection method Orka applies.
Understand protection indicators
Scan results display colored badges that show how Orka protects sensitive columns:
- Masked - converts values to asterisks, hides sensitive values while preserving structure
- Hashed - applies one-way hash, protects values while allowing JOINs
- Dropped - removes column entirely, most restrictive option
The protection method depends on which data protection rule matched the column and its configured action.
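The three protection methods behave roughly as in the sketch below. This is an illustration only: the source does not say which hash function Orka uses or exactly which characters masking preserves, so SHA-256 and the "keep separators" rule are assumptions, and `mask`, `hash_value`, and `protect_row` are hypothetical names.

```python
import hashlib


def mask(value: str) -> str:
    """Replace letters and digits with asterisks, keeping separators."""
    return "".join("*" if ch.isalnum() else ch for ch in value)


def hash_value(value: str) -> str:
    """One-way hash; equal inputs give equal outputs, so JOINs still work."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def protect_row(row: dict, actions: dict) -> dict:
    """Apply the configured action ('mask', 'hash', or 'drop') per column."""
    protected = {}
    for column, value in row.items():
        action = actions.get(column)
        if action == "drop":
            continue  # the column is removed entirely
        if action == "mask":
            protected[column] = mask(str(value))
        elif action == "hash":
            protected[column] = hash_value(str(value))
        else:
            protected[column] = value  # no rule matched; value passes through
    return protected


if __name__ == "__main__":
    row = {"id": 1, "ssn": "123-45-6789", "email": "a@x.com", "notes": "vip"}
    actions = {"ssn": "mask", "email": "hash", "notes": "drop"}
    print(protect_row(row, actions))
```

Because hashing is deterministic, the same email hashes to the same value in every table, which is what allows JOINs on hashed columns. Masking discards the value, so masked columns cannot be joined.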
Step 3: Review and publish
Before you publish, review the scan results and add descriptions to help others understand what each table contains.
Table descriptions:
- Appear in data catalog search results
- Help analysts find relevant data faster
- Provide context about table purpose and contents
- Are indexed and searchable
You can add more tables from this source at any time. Go to the source detail page and click Add more tables to scan and publish additional tables.
Our recommendations
- Start with non-sensitive data - begin with tables that contain no sensitive data to understand the workflow without security approval delays
- Use table filters - hide empty, test and backup tables to focus on production data that matters
- Review scans carefully - review sensitivity classifications. False positives can delay data access, while false negatives can cause compliance issues
- Provide table documentation - if your database has table and column descriptions, Orka indexes them to make data discovery easier
Troubleshoot
Why can't I unpublish tables from the catalog?
Once you publish tables to the catalog, you can’t unpublish them, because published tables may have active pipelines that depend on them.
If you need to remove tables:
- Delete any active pipelines that use the tables
- Contact a member of the Admin group to discuss removing the tables from the catalog
Consider carefully which tables to publish before you complete the publish workflow.
Why can't I connect to my database?
Common causes:
| Issue | Solution |
|---|---|
| Incorrect credentials | Verify username/password or service account |
| Network connectivity | Check firewall rules and network access |
| Wrong host/port | Verify the hostname and port number |
| Database name typo | Confirm the exact database name |
| Insufficient permissions | Ensure the user has required database permissions |
Verify credentials outside of Orka first (for example, using a database client).
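Before you test credentials, it can help to rule out the network-related causes in the table. The sketch below checks only whether a TCP connection to the database host and port can be opened. It does not verify credentials or permissions, and `can_reach` is a hypothetical helper, not an Orka feature.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_reach("db.example.internal", 5432)` uses a placeholder hostname and the default PostgreSQL port. If the call returns False, check the host, port, and firewall rules first. If it returns True, move on to verifying the username, password, and database name with a database client.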
Why don't I see any tables after I connect?
Possible causes:
- The database contains no tables
- The user lacks permission to list tables
- Tables are in a different schema or catalog
- Active filters hide all tables
Solutions:
- Use a database client to verify that the database has tables
- Check that the user has the SELECT privilege on the tables and the USAGE privilege on the schemas that contain them
- Adjust the table filters (for example, show empty tables)
- Verify you connected to the correct database/schema
Why is table discovery or scan slow?
Slow table discovery can be caused by:
- Large number of tables in the database
- Slow database response
- Network latency
Slow scans can be caused by:
- Large tables
- Many tables selected for scan
- Complex data protection rules
- Database load
Solutions:
- Use filters to limit scope (hide empty, test, backup tables)
- Scan tables in smaller batches
- Schedule scans in off-peak hours
- Start with smaller tables to test the workflow
Why are so many fields marked as sensitive?
This happens when data protection rules are too broad and match more fields than you need.
To fix:
- Review which data protection rules match
- Check if prebuilt rules need adjustment
- Members of the Data protection admin group should review and adjust rule patterns
- Consider custom rules with more specific patterns
To catch overly broad patterns, test rule patterns with sample data before you scan large databases.
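One way to test a pattern is to run it against values that you know are not sensitive and see how many it flags. The sketch below assumes rules are expressed as regular expressions, which the source does not confirm; `match_rate` and the sample patterns are hypothetical.

```python
import re


def match_rate(pattern: str, samples: list[str]) -> float:
    """Return the fraction of sample values that the pattern fully matches."""
    if not samples:
        return 0.0
    compiled = re.compile(pattern)
    hits = sum(1 for s in samples if compiled.fullmatch(s))
    return hits / len(samples)


# Values that contain digits but are not Social Security Numbers.
NON_SENSITIVE = ["20240115", "555-0100", "100200300", "42"]

BROAD = r"\d+"                    # any run of digits
SPECIFIC = r"\d{3}-\d{2}-\d{4}"   # only the NNN-NN-NNNN shape


if __name__ == "__main__":
    print("broad:", match_rate(BROAD, NON_SENSITIVE))
    print("specific:", match_rate(SPECIFIC, NON_SENSITIVE))
```

A high match rate on non-sensitive samples suggests the pattern will produce false positives during a scan. Also run it against known sensitive values to confirm that a tighter pattern still catches what it should.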