Sources are connections to operational databases. You can discover tables, scan them for sensitive data, and publish them to the catalog. To connect a source, go to Sources in the sidebar, under the Connections section.

1. Choose source

Select the type of database you want to connect to. Orka currently supports only PostgreSQL as a source; support for additional sources is coming soon.

2. Configure credentials

Enter the connection details:
  • Host and port
  • Credentials (username and password or service account)
  • Database name
Use Test connection to verify configuration.
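Before you click Test connection, you can sanity-check the same details yourself. A minimal, stdlib-only Python sketch; the host, port, database, and user names are hypothetical examples, and the reachability check only confirms the port accepts TCP connections, not that the credentials are valid:

```python
# Sanity-check connection details before entering them in Orka.
# Host, port, database, and user below are hypothetical examples.
import socket

def build_dsn(host: str, port: int, dbname: str, user: str) -> str:
    """Assemble a libpq-style connection string from the details Orka asks for."""
    return f"host={host} port={port} dbname={dbname} user={user}"

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Check that the database host accepts TCP connections on the given port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, refused connection, timeout
        return False

dsn = build_dsn("db.example.internal", 5432, "analytics", "orka_reader")
print(dsn)
```

If the port is unreachable, check firewall rules and network access before blaming the credentials.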

Database permissions

The database user you configure must have the following permissions. For PostgreSQL:
  • SELECT privilege on all tables you want to scan
  • CONNECT privilege on the database
  • USAGE privilege on schemas containing the tables
Ensure the user has read access to the tables before attempting to connect.
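The permission list above can be granted in one pass. A sketch that generates the corresponding PostgreSQL GRANT statements; the role, database, schema, and table names are hypothetical examples:

```python
# Generate the PostgreSQL GRANT statements a read-only scan user needs:
# CONNECT on the database, USAGE on the schema, SELECT on each table.
# Role, database, schema, and table names are hypothetical examples.

def grant_statements(role: str, database: str, schema: str, tables: list[str]) -> list[str]:
    """Return the GRANT statements for a read-only scanning role."""
    stmts = [
        f"GRANT CONNECT ON DATABASE {database} TO {role};",
        f"GRANT USAGE ON SCHEMA {schema} TO {role};",
    ]
    stmts += [f"GRANT SELECT ON {schema}.{table} TO {role};" for table in tables]
    return stmts

for stmt in grant_statements("orka_reader", "analytics", "public", ["customers", "orders"]):
    print(stmt)
```

Run the printed statements as a database superuser (or the table owner) before configuring the source.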

Table ingestion

After you connect to a source, complete the table ingestion workflow in three steps.

Step 1: Specify tables

Orka displays the tables it finds with metadata:
  • Row count - number of records in the table
  • Column count - number of columns and fields
  • Last modified date - when the table was last updated
Orka automatically filters out empty, backup, and test tables.

Filter tables

You can use filters to narrow down the table list:
  • Hide empty tables
  • Hide test tables
  • Hide backup tables
  • Custom name filters
Select the tables you want to scan and profile. Choose only what you need in your lakehouse.
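The filters above can be sketched as a small selection pass. The metadata fields and the "test"/"backup" naming heuristics here are assumptions for illustration; Orka's own filter logic is not documented:

```python
# Sketch: narrow a discovered table list before scanning.
# The naming heuristics for test/backup tables are assumptions.
import fnmatch

tables = [
    {"name": "orders", "row_count": 120_000},
    {"name": "orders_backup", "row_count": 120_000},
    {"name": "test_users", "row_count": 50},
    {"name": "staging_tmp", "row_count": 0},
]

def filter_tables(tables, hide_empty=True, hide_test=True, hide_backup=True, name_glob=None):
    """Apply the filters described above; name_glob is a custom name filter."""
    keep = []
    for t in tables:
        if hide_empty and t["row_count"] == 0:
            continue
        if hide_test and t["name"].startswith("test_"):
            continue
        if hide_backup and t["name"].endswith("_backup"):
            continue
        if name_glob and not fnmatch.fnmatch(t["name"], name_glob):
            continue
        keep.append(t["name"])
    return keep

print(filter_tables(tables))  # → ['orders']
```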

Step 2: Scan and select

Orka runs an automated scanning process that:
  1. Samples rows from each selected table
  2. Extracts the schema and column metadata
  3. Evaluates data protection rules against column names and values
  4. Profiles the data to reveal key metrics
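The rule-evaluation step above can be sketched as a simplified matcher over a column's name and sampled values. The rule, its patterns, and the 80% threshold are assumptions for illustration, not Orka's documented matching logic:

```python
# Simplified sketch of step 3: evaluate a data protection rule against a
# column's name and sampled values. Rule patterns and the threshold are
# hypothetical, not Orka's actual logic.
import re

EMAIL_RULE = {
    "name": "Email address",
    "name_pattern": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "value_pattern": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def rule_matches(rule, column_name: str, sampled_values: list[str], threshold: float = 0.8) -> bool:
    """Match if the column name looks right, or if most sampled values do."""
    if rule["name_pattern"].search(column_name):
        return True
    if not sampled_values:
        return False
    hits = sum(bool(rule["value_pattern"].match(v)) for v in sampled_values)
    return hits / len(sampled_values) >= threshold

print(rule_matches(EMAIL_RULE, "contact_email", []))  # → True
```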

Understand profiling metrics

Profiling reveals valuable insights about your data quality:
  • Null %: percentage of null or empty values in the column. For example, 0.02% means almost all values are populated.
  • Cardinality: number of unique values in the column. For example, 15,264 unique values indicates high diversity.
  • Uniqueness: percentage of unique values compared to total rows. For example, 99.92% means nearly every value is distinct.
  • Matching rules: which data protection rules match this column, for example "Email address" or "Social Security Number".
Profiling helps you assess data usefulness before it’s moved to the lakehouse.
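The metrics above can be computed directly from a column's sampled values. A minimal sketch, treating None and empty strings as null per the Null % description:

```python
# Compute the profiling metrics described above from sampled column values.
# None and "" count as null, per the Null % definition; uniqueness is
# unique values as a share of total rows.

def profile_column(values: list) -> dict:
    total = len(values)
    non_null = [v for v in values if v not in (None, "")]
    cardinality = len(set(non_null))
    return {
        "null_pct": round(100 * (total - len(non_null)) / total, 2) if total else 0.0,
        "cardinality": cardinality,
        "uniqueness_pct": round(100 * cardinality / total, 2) if total else 0.0,
    }

print(profile_column(["a", "b", "b", None]))
# → {'null_pct': 25.0, 'cardinality': 2, 'uniqueness_pct': 50.0}
```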

Sensitivity classifications

Orka automatically identifies sensitive data based on data protection rules. Each column is classified and shows which protection method Orka applies.

Understand protection indicators

Scan results display colored badges that show how Orka protects sensitive columns:
  • Masked - converts values to asterisks, hides sensitive values while preserving structure
  • Hashed - applies one-way hash, protects values while allowing JOINs
  • Dropped - removes column entirely, most restrictive option
The protection method depends on which data protection rule matched the column and its configured action.
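The three protection methods can be illustrated in a few lines. The exact transformations Orka applies are not documented here, so this is only a sketch of the behavior described above:

```python
# Sketch of the three protection behaviors described above; the exact
# transformations Orka applies are not documented.
import hashlib

def mask(value: str) -> str:
    """Replace every character with an asterisk, preserving length/structure."""
    return "*" * len(value)

def hash_value(value: str) -> str:
    """One-way hash: equal inputs give equal outputs, so JOINs still line up."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def drop(row: dict, column: str) -> dict:
    """Remove the column entirely, the most restrictive option."""
    return {k: v for k, v in row.items() if k != column}

print(mask("alice@example.com"))  # → *****************
```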

Step 3: Review and publish

Before you publish, review the scan results and add descriptions to help others understand what each table contains. Table descriptions:
  • Appear in data catalog search results
  • Help analysts find relevant data faster
  • Provide context about table purpose and contents
  • Are indexed and searchable
You can add more tables from this source at any time. Go to the source detail page and click Add more tables to scan and publish additional tables.

Our recommendations

  • Start with non-sensitive data - begin with tables that contain no sensitive data to understand the workflow without security approval delays
  • Use table filters - hide empty, test and backup tables to focus on production data that matters
  • Review scans carefully - check the sensitivity classifications. False positives can delay data access, while false negatives can cause compliance issues
  • Provide table documentation - if your database has table and column descriptions, Orka indexes them to make data discovery easier

Troubleshoot

Remove published tables

Once you publish tables to the catalog, you can't unpublish them. Published tables may have active pipelines that depend on them. If you need to remove tables:
  1. Delete any active pipelines that use the tables
  2. Contact a member of the Admin group to discuss removing the tables from the catalog
Consider carefully which tables to publish before you complete the publish workflow.
Connection fails

Common causes:
  • Incorrect credentials: verify the username/password or service account
  • Network connectivity: check firewall rules and network access
  • Wrong host or port: verify the hostname and port number
  • Database name typo: confirm the exact database name
  • Insufficient permissions: ensure the user has the required database permissions
Verify credentials outside of Orka first (for example, using a database client).
No tables found

Possible causes:
  • The database contains no tables
  • The user lacks permission to list tables
  • Tables are in a different schema or catalog
  • Active filters hide all tables
To fix:
  1. Verify the database has tables with a database client
  2. Check that the user has SELECT on the tables and USAGE on their schemas
  3. Adjust table filters (for example, show empty tables)
  4. Verify you connected to the correct database/schema
Slow discovery or scans

Slow table discovery can be caused by:
  • Large number of tables in the database
  • Slow database response
  • Network latency
Slow data scan can be caused by:
  • Large tables
  • Many tables selected for scan
  • Complex data protection rules
  • Database load
To improve performance:
  • Use filters to limit scope (hide empty, test, backup tables)
  • Scan tables in smaller batches
  • Schedule scans in off-peak hours
  • Start with smaller tables to test the workflow
Too many columns flagged as sensitive

This happens when data protection rules are too broad and match more fields than you need. To fix:
  1. Review which data protection rules match
  2. Check if prebuilt rules need adjustment
  3. Members of the Data protection admin group should review and adjust rule patterns
  4. Consider custom rules with more specific patterns
Test rule patterns with sample data before you scan large databases to find overly broad patterns.
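One way to follow that advice: run a broad and a specific pattern over sample strings and compare the matches. The patterns below are illustrative only, not Orka's prebuilt rules:

```python
# Compare a broad pattern against a more specific one on sample data
# before scanning a large database. Patterns are illustrative only.
import re

broad = re.compile(r"\d{3}.\d{2}.\d{4}")      # "." matches anything: far more than SSNs
specific = re.compile(r"^\d{3}-\d{2}-\d{4}$")  # dashes required, whole value must match

samples = ["123-45-6789", "order 123 45 6789 shipped", "v1.0 build 123.45.6789x"]
for s in samples:
    print(s, bool(broad.search(s)), bool(specific.match(s)))
```

The broad pattern matches all three samples, while the specific pattern matches only the real SSN, so the broad rule would flag order numbers and version strings as sensitive.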