Sources are connections to operational databases. You can discover tables, scan them for sensitive data, and publish them to the catalog. To connect a source, go to Sources in the sidebar, under the Connections section.

1. Choose source

Select the type of database you want to connect to. Orka currently supports only PostgreSQL as a source; support for additional sources is coming soon.

2. Configure credentials

Enter the connection details:
  • Host and port
  • Credentials (username and password or service account)
  • Database name
Use Test connection to verify configuration.
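Before you click Test connection, you can sanity-check the same details yourself. A minimal, stdlib-only Python sketch; the host, port, database, and user names are hypothetical examples, and the reachability check only confirms the port accepts TCP connections, not that the credentials are valid:

```python
# Sanity-check connection details before entering them in Orka.
# Host, port, database, and user below are hypothetical examples.
import socket

def build_dsn(host: str, port: int, dbname: str, user: str) -> str:
    """Assemble a libpq-style connection string from the details Orka asks for."""
    return f"host={host} port={port} dbname={dbname} user={user}"

def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Check that the database host accepts TCP connections on the given port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failure, refused connection, timeout
        return False

dsn = build_dsn("db.example.internal", 5432, "analytics", "orka_reader")
print(dsn)
```

If the port is unreachable, check firewall rules and network access before blaming the credentials.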

Database permissions

The database user you configure must have the following permissions. For PostgreSQL:
  • SELECT privilege on all tables you want to scan
  • CONNECT privilege on the database
  • USAGE privilege on schemas containing the tables
Ensure the user has read access to the tables before attempting to connect.
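The permission list above can be granted in one pass. A sketch that generates the corresponding PostgreSQL GRANT statements; the role, database, schema, and table names are hypothetical examples:

```python
# Generate the PostgreSQL GRANT statements a read-only scan user needs:
# CONNECT on the database, USAGE on the schema, SELECT on each table.
# Role, database, schema, and table names are hypothetical examples.

def grant_statements(role: str, database: str, schema: str, tables: list[str]) -> list[str]:
    """Return the GRANT statements for a read-only scanning role."""
    stmts = [
        f"GRANT CONNECT ON DATABASE {database} TO {role};",
        f"GRANT USAGE ON SCHEMA {schema} TO {role};",
    ]
    stmts += [f"GRANT SELECT ON {schema}.{table} TO {role};" for table in tables]
    return stmts

for stmt in grant_statements("orka_reader", "analytics", "public", ["customers", "orders"]):
    print(stmt)
```

Run the printed statements as a database superuser (or the table owner) before configuring the source.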

Table ingestion

After you connect to a source, complete the table ingestion workflow in three steps.

Step 1: Specify tables

Orka displays the tables it finds with metadata:
  • Row count - number of records in the table
  • Column count - number of columns and fields
  • Last modified date - when the table was last updated
Orka automatically filters out empty, backup, and test tables.

Filter tables

You can use filters to narrow down the table list:
  • Hide empty tables
  • Hide test tables
  • Hide backup tables
  • Custom name filters
Select the tables you want to scan and profile. Choose only what you need in your lakehouse.
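The filters above can be sketched as a small selection pass. The metadata fields and the "test"/"backup" naming heuristics here are assumptions for illustration; Orka's own filter logic is not documented:

```python
# Sketch: narrow a discovered table list before scanning.
# The naming heuristics for test/backup tables are assumptions.
import fnmatch

tables = [
    {"name": "orders", "row_count": 120_000},
    {"name": "orders_backup", "row_count": 120_000},
    {"name": "test_users", "row_count": 50},
    {"name": "staging_tmp", "row_count": 0},
]

def filter_tables(tables, hide_empty=True, hide_test=True, hide_backup=True, name_glob=None):
    """Apply the filters described above; name_glob is a custom name filter."""
    keep = []
    for t in tables:
        if hide_empty and t["row_count"] == 0:
            continue
        if hide_test and t["name"].startswith("test_"):
            continue
        if hide_backup and t["name"].endswith("_backup"):
            continue
        if name_glob and not fnmatch.fnmatch(t["name"], name_glob):
            continue
        keep.append(t["name"])
    return keep

print(filter_tables(tables))  # → ['orders']
```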

Step 2: Scan and select

Orka runs an automated scanning process that:
  1. Samples rows from each selected table
  2. Extracts the schema and column metadata
  3. Evaluates data protection rules against column names and values
  4. Profiles the data to reveal key metrics
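The rule-evaluation step above can be sketched as a simplified matcher over a column's name and sampled values. The rule, its patterns, and the 80% threshold are assumptions for illustration, not Orka's documented matching logic:

```python
# Simplified sketch of step 3: evaluate a data protection rule against a
# column's name and sampled values. Rule patterns and the threshold are
# hypothetical, not Orka's actual logic.
import re

EMAIL_RULE = {
    "name": "Email address",
    "name_pattern": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "value_pattern": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def rule_matches(rule, column_name: str, sampled_values: list[str], threshold: float = 0.8) -> bool:
    """Match if the column name looks right, or if most sampled values do."""
    if rule["name_pattern"].search(column_name):
        return True
    if not sampled_values:
        return False
    hits = sum(bool(rule["value_pattern"].match(v)) for v in sampled_values)
    return hits / len(sampled_values) >= threshold

print(rule_matches(EMAIL_RULE, "contact_email", []))  # → True
```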

Understand profiling metrics

Profiling reveals valuable insights about your data quality:
  • Null %: percentage of null or empty values in the column. For example, 0.02% means almost all values are populated.
  • Cardinality: number of unique values in the column. For example, 15,264 unique values indicates high diversity.
  • Uniqueness: percentage of unique values compared to total rows. For example, 99.92% means nearly every value is distinct.
  • Matching rules: which data protection rules match this column, for example "Email address" or "Social Security Number".
Profiling helps you assess data usefulness before it’s moved to the lakehouse.
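The metrics above can be computed directly from a column's sampled values. A minimal sketch, treating None and empty strings as null per the Null % description:

```python
# Compute the profiling metrics described above from sampled column values.
# None and "" count as null, per the Null % definition; uniqueness is
# unique values as a share of total rows.

def profile_column(values: list) -> dict:
    total = len(values)
    non_null = [v for v in values if v not in (None, "")]
    cardinality = len(set(non_null))
    return {
        "null_pct": round(100 * (total - len(non_null)) / total, 2) if total else 0.0,
        "cardinality": cardinality,
        "uniqueness_pct": round(100 * cardinality / total, 2) if total else 0.0,
    }

print(profile_column(["a", "b", "b", None]))
# → {'null_pct': 25.0, 'cardinality': 2, 'uniqueness_pct': 50.0}
```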

Sensitivity classifications

Orka automatically identifies sensitive data based on data protection rules. Each column is classified and shows which protection method Orka applies.

Understand protection indicators

Scan results display colored badges that show how Orka protects sensitive columns:
  • Masked - converts values to asterisks, hides sensitive values while preserving structure
  • Hashed - applies one-way hash, protects values while allowing JOINs
  • Dropped - removes column entirely, most restrictive option
The protection method depends on which data protection rule matched the column and its configured action.
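The three protection methods can be illustrated in a few lines. The exact transformations Orka applies are not documented here, so this is only a sketch of the behavior described above:

```python
# Sketch of the three protection behaviors described above; the exact
# transformations Orka applies are not documented.
import hashlib

def mask(value: str) -> str:
    """Replace every character with an asterisk, preserving length/structure."""
    return "*" * len(value)

def hash_value(value: str) -> str:
    """One-way hash: equal inputs give equal outputs, so JOINs still line up."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def drop(row: dict, column: str) -> dict:
    """Remove the column entirely, the most restrictive option."""
    return {k: v for k, v in row.items() if k != column}

print(mask("alice@example.com"))  # → *****************
```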

Step 3: Review and publish

Before you publish, review the scan results and add descriptions to help others understand what each table contains. Table descriptions:
  • Appear in data catalog search results
  • Help analysts find relevant data faster
  • Provide context about table purpose and contents
  • Are indexed and searchable
You can add more tables from this source at any time. Go to the source detail page and click Add more tables to scan and publish additional tables.

Our recommendations

  • Start with non-sensitive data - begin with tables that contain no sensitive data to understand the workflow without security approval delays
  • Use table filters - hide empty, test and backup tables to focus on production data that matters
  • Review scans carefully - check the sensitivity classifications. False positives can delay data access, while false negatives can cause compliance issues
  • Provide table documentation - if your database has table and column descriptions, Orka indexes them to make data discovery easier

Troubleshoot

Remove published tables

Once you publish tables to the catalog, you can't unpublish them. Published tables may have active pipelines that depend on them. If you need to remove tables:
  1. Delete any active pipelines that use the tables
  2. Contact a member of the Admin group to discuss removing the tables from the catalog
Consider carefully which tables to publish before you complete the publish workflow.
Connection fails

Common causes:
  • Incorrect credentials: verify the username/password or service account
  • Network connectivity: check firewall rules and network access
  • Wrong host or port: verify the hostname and port number
  • Database name typo: confirm the exact database name
  • Insufficient permissions: ensure the user has the required database permissions
Verify credentials outside of Orka first (for example, using a database client).
No tables found

Possible causes:
  • The database contains no tables
  • The user lacks permission to list tables
  • Tables are in a different schema or catalog
  • Active filters hide all tables
To fix:
  1. Verify the database has tables with a database client
  2. Check that the user has SELECT on the tables and USAGE on their schemas
  3. Adjust table filters (for example, show empty tables)
  4. Verify you connected to the correct database/schema
Slow discovery or scans

Slow table discovery can be caused by:
  • Large number of tables in the database
  • Slow database response
  • Network latency
Slow data scan can be caused by:
  • Large tables
  • Many tables selected for scan
  • Complex data protection rules
  • Database load
To improve performance:
  • Use filters to limit scope (hide empty, test, backup tables)
  • Scan tables in smaller batches
  • Schedule scans in off-peak hours
  • Start with smaller tables to test the workflow
Too many columns flagged as sensitive

This happens when data protection rules are too broad and match more fields than you need. To fix:
  1. Review which data protection rules match
  2. Check if prebuilt rules need adjustment
  3. Members of the Data protection admin group should review and adjust rule patterns
  4. Consider custom rules with more specific patterns
Test rule patterns with sample data before you scan large databases to find overly broad patterns.
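One way to follow that advice: run a broad and a specific pattern over sample strings and compare the matches. The patterns below are illustrative only, not Orka's prebuilt rules:

```python
# Compare a broad pattern against a more specific one on sample data
# before scanning a large database. Patterns are illustrative only.
import re

broad = re.compile(r"\d{3}.\d{2}.\d{4}")      # "." matches anything: far more than SSNs
specific = re.compile(r"^\d{3}-\d{2}-\d{4}$")  # dashes required, whole value must match

samples = ["123-45-6789", "order 123 45 6789 shipped", "v1.0 build 123.45.6789x"]
for s in samples:
    print(s, bool(broad.search(s)), bool(specific.match(s)))
```

The broad pattern matches all three samples, while the specific pattern matches only the real SSN, so the broad rule would flag order numbers and version strings as sensitive.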