# Databricks Storage
OWOX Data Marts supports Databricks as a storage destination for your data marts. This guide will help you connect your Databricks workspace to OWOX.
## Overview

Databricks is a unified data analytics platform built on Apache Spark that provides a lakehouse architecture combining the best features of data lakes and data warehouses. With OWOX Data Marts, you can:
- Create and manage data marts in your Databricks workspace
- Use SQL warehouses for fast query execution
- Leverage Unity Catalog for data governance
- Store data in Delta Lake format with ACID transactions
## Prerequisites

Before connecting Databricks to OWOX, ensure you have:
- A Databricks workspace (AWS, Azure, or GCP)
- A SQL warehouse (compute resource for running queries)
- Appropriate permissions to:
  - Create and manage Personal Access Tokens
  - Create catalogs, schemas, and tables (or access to existing ones)
  - Execute queries on the SQL warehouse
## Connection Setup

### Step 1: Find Your Workspace URL

Your Databricks workspace URL (host) is the hostname you see in your browser when accessing Databricks.
Format by cloud provider:

- AWS: `dbc-12345678-90ab.cloud.databricks.com`
- Azure: `adb-123456789.7.azuredatabricks.net`
- GCP: `12345678901234.5.gcp.databricks.com`
To find it:

- Sign in to your Databricks workspace
- Look at the URL in your browser's address bar
- Copy the hostname (everything before the first `/`)
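
If you'd rather not eyeball the URL, a quick way to pull out just the hostname (the example URL below is hypothetical):

```python
from urllib.parse import urlparse

# Hypothetical URL copied from the browser's address bar
url = "https://dbc-12345678-90ab.cloud.databricks.com/sql/dashboards"

# netloc is everything between the scheme and the first "/",
# which is exactly the host value OWOX expects
print(urlparse(url).netloc)  # dbc-12345678-90ab.cloud.databricks.com
```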
### Step 2: Get SQL Warehouse HTTP Path

The HTTP path identifies which SQL warehouse will be used for query execution.
To find it:
- In your Databricks workspace, click SQL Warehouses in the sidebar
- Select the SQL warehouse you want to use
- Go to the Connection Details tab
- Copy the HTTP Path value
Example: `/sql/1.0/warehouses/abc123def456789`
### Step 3: Generate a Personal Access Token

Personal Access Tokens (PATs) provide secure authentication to Databricks.
To generate a token:
- Sign in to your Databricks workspace
- Click your username in the top right corner
- Select User Settings
- Go to the Developer tab
- Next to Access tokens, click Manage
- Click Generate new token
- (Optional) Enter a comment describing the token's purpose (e.g., "OWOX Data Marts")
- (Optional) Set a lifetime for the token
- Click Generate
- Important: Copy and save the token immediately. You won't be able to see it again!
Security recommendations:
- Never share your Personal Access Token
- Set an expiration date when possible
- Revoke tokens that are no longer needed
- Use separate tokens for different applications
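
To sanity-check the host, HTTP path, and token together before entering them in OWOX, you can run a quick query with the official `databricks-sql-connector` Python package; all three credential values below are placeholders:

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholder credentials - substitute the values from Steps 1-3
with sql.connect(
    server_hostname="dbc-12345678-90ab.cloud.databricks.com",  # Step 1: host
    http_path="/sql/1.0/warehouses/abc123def456789",           # Step 2: HTTP path
    access_token="<your-personal-access-token>",               # Step 3: PAT
) as connection:
    with connection.cursor() as cursor:
        # A trivial query that confirms authentication and warehouse access
        cursor.execute("SELECT current_user() AS user, current_catalog() AS catalog")
        print(cursor.fetchone())
```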
### Step 4: Configure Storage in OWOX

- Go to Settings → Storages in OWOX
- Click Add Storage
- Select Databricks as the storage type
- Fill in the required fields:
  - Title: A friendly name for this storage (e.g., "Production Databricks")
  - Host: Your workspace URL from Step 1
  - HTTP Path: SQL warehouse path from Step 2
  - Personal Access Token: Token generated in Step 3
- Click Test Connection to verify the connection
- Click Save
## Catalog and Schema Configuration

Unlike some other storages, the Databricks catalog and schema are not configured at the storage level. Instead, they are specified when creating connector data marts:

- When setting up a connector, you'll specify the fully qualified table name in the format `catalog.schema.table_name`
- This allows flexibility to use different catalogs and schemas for different data marts
- If you're not using Unity Catalog, you can use the default `hive_metastore` catalog

Examples:

- `main.analytics.user_events` (Unity Catalog)
- `hive_metastore.default.data` (without Unity Catalog)
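
For instance, querying a data mart table by its fully qualified name (the credentials are placeholders, and the table name reuses the hypothetical example above):

```python
from databricks import sql

with sql.connect(
    server_hostname="<host-from-step-1>",
    http_path="<http-path-from-step-2>",
    access_token="<token-from-step-3>",
) as connection:
    with connection.cursor() as cursor:
        # Three-level namespace: catalog.schema.table
        cursor.execute("SELECT * FROM main.analytics.user_events LIMIT 10")
        for row in cursor.fetchall():
            print(row)
```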
## Supported Features

### Data Types

Databricks Data Marts support the following Databricks SQL data types:
| OWOX Type | Databricks Type |
|---|---|
| string | STRING |
| number | DOUBLE |
| integer | BIGINT |
| boolean | BOOLEAN |
| date | DATE |
| datetime | TIMESTAMP |
| timestamp | TIMESTAMP |
| array | ARRAY |
| object | STRUCT<> |
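
As an illustration of the mapping, here is roughly what a table covering every OWOX type could look like in Databricks SQL DDL; the catalog, schema, table, and column names are all hypothetical:

```python
from databricks import sql

# Hypothetical DDL mirroring the type mapping table above
DDL = """
CREATE TABLE IF NOT EXISTS main.analytics.type_mapping_demo (
    name        STRING,                              -- OWOX string
    score       DOUBLE,                              -- OWOX number
    views       BIGINT,                              -- OWOX integer
    is_active   BOOLEAN,                             -- OWOX boolean
    day         DATE,                                -- OWOX date
    happened_at TIMESTAMP,                           -- OWOX datetime / timestamp
    tags        ARRAY<STRING>,                       -- OWOX array
    attrs       STRUCT<key: STRING, value: STRING>   -- OWOX object
) USING DELTA
"""

with sql.connect(
    server_hostname="<host>", http_path="<http-path>", access_token="<token>"
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(DDL)
```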
### Operations

- MERGE operations: Data updates use Databricks MERGE statements for efficient upserts (see the sketch below)
- Auto-create tables: Tables are created automatically if they don't exist
- Schema evolution: New columns are added automatically as needed
- Delta Lake: All tables use Delta Lake format with ACID transactions
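
OWOX generates the actual statements itself, but a minimal sketch of the kind of Delta Lake MERGE upsert involved (table and key names are hypothetical) looks like this:

```python
from databricks import sql

# Hypothetical upsert: staging rows are merged into the target by key
MERGE_SQL = """
MERGE INTO main.analytics.user_events AS target
USING main.analytics.user_events_staging AS source
ON target.event_id = source.event_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""

with sql.connect(
    server_hostname="<host>", http_path="<http-path>", access_token="<token>"
) as connection:
    with connection.cursor() as cursor:
        cursor.execute(MERGE_SQL)
```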
## Unity Catalog

If your workspace uses Unity Catalog:

- Ensure you have appropriate permissions on the catalog and schema
- The catalog will be created automatically if it doesn't exist (requires `CREATE CATALOG` permission)
- The schema will be created automatically if it doesn't exist (requires `CREATE SCHEMA` permission)
- Tables are created with the full three-level namespace: `catalog.schema.table`

If Unity Catalog is not enabled, you can use the default `hive_metastore` catalog.
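
OWOX issues these statements for you when needed; if you prefer to pre-create the namespace yourself (the names below are hypothetical), the equivalent SQL is:

```python
from databricks import sql

with sql.connect(
    server_hostname="<host>", http_path="<http-path>", access_token="<token>"
) as connection:
    with connection.cursor() as cursor:
        # Requires CREATE CATALOG permission on the metastore
        cursor.execute("CREATE CATALOG IF NOT EXISTS main")
        # Requires CREATE SCHEMA permission on the catalog
        cursor.execute("CREATE SCHEMA IF NOT EXISTS main.analytics")
```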
## Troubleshooting

### Connection Issues

Problem: "Failed to connect to Databricks"

Solutions:

- Verify your workspace URL is correct (no `https://` prefix needed)
- Check that the Personal Access Token is valid and not expired
- Ensure the token hasn't been revoked
- Verify the SQL warehouse is running (it should auto-start when needed)
Problem: "SQL warehouse not found"

Solutions:

- Verify the HTTP path is correct
- Check that the SQL warehouse exists and you have access to it (you can list the warehouses your token can see, as sketched below)
- Ensure the warehouse hasn't been deleted or renamed
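
One way to check, assuming the `requests` package is available: list the warehouses your token can see via the Databricks REST API and compare their ids with the last segment of your HTTP path (host and token below are placeholders):

```python
import requests  # pip install requests

HOST = "dbc-12345678-90ab.cloud.databricks.com"  # placeholder host
TOKEN = "<your-personal-access-token>"           # placeholder token

# List SQL warehouses visible to this token
response = requests.get(
    f"https://{HOST}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()

# The id should match the last segment of your HTTP path,
# e.g. /sql/1.0/warehouses/abc123def456789 -> abc123def456789
for warehouse in response.json().get("warehouses", []):
    print(warehouse["id"], warehouse["name"], warehouse["state"])
```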
### Permission Issues

Problem: "Permission denied" when creating tables

Solutions:

- Verify you have `CREATE TABLE` permission on the catalog/schema
- Check Unity Catalog permissions if applicable (see the grants sketch below)
- Ensure your user or service principal has appropriate grants
Problem: "Catalog not found" errors

Solutions:

- If using Unity Catalog, verify the catalog exists or you have `CREATE CATALOG` permission
- Use the correct three-level namespace: `catalog.schema.table`
- For non-Unity workspaces, use `hive_metastore` as the catalog name
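
To inspect which privileges you actually hold on a Unity Catalog schema (schema and principal names below are hypothetical):

```python
from databricks import sql

with sql.connect(
    server_hostname="<host>", http_path="<http-path>", access_token="<token>"
) as connection:
    with connection.cursor() as cursor:
        # List existing privileges on the schema
        cursor.execute("SHOW GRANTS ON SCHEMA main.analytics")
        for grant in cursor.fetchall():
            print(grant)
        # An owner or admin could then grant a missing privilege, e.g.:
        # GRANT USE SCHEMA, CREATE TABLE ON SCHEMA main.analytics TO `user@example.com`
```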
### Query Execution Issues

Problem: Queries are slow

Solutions:

- Check if the SQL warehouse is properly sized for your workload
- Consider using a larger warehouse size
- Enable auto-scaling if not already enabled
- Review query execution plans for optimization opportunities (see the `EXPLAIN` sketch below)
Problem: "Query execution failed" errors

Solutions:

- Check the error message for specific SQL syntax issues
- Verify table and column names are correctly quoted
- Ensure data types are compatible
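
A quick way to review a plan without running the query, using the placeholder table name from earlier:

```python
from databricks import sql

with sql.connect(
    server_hostname="<host>", http_path="<http-path>", access_token="<token>"
) as connection:
    with connection.cursor() as cursor:
        # EXPLAIN returns the physical plan instead of executing the query
        cursor.execute(
            "EXPLAIN SELECT * FROM main.analytics.user_events WHERE day = DATE'2024-01-01'"
        )
        print(cursor.fetchone()[0])
```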
## Best Practices

- **Warehouse Management**
  - Use appropriately sized SQL warehouses for your workload
  - Enable auto-stop to reduce costs when the warehouse is idle
  - Consider using different warehouses for different workloads
- **Security**
  - Rotate Personal Access Tokens regularly
  - Set token expiration dates
  - Use workspace-level or account-level tokens as appropriate
  - Implement IP access lists if security requirements demand it
- **Cost Optimization**
  - Use auto-stop for SQL warehouses
  - Right-size your warehouse (start small and scale up if needed)
  - Monitor DBU usage in the Databricks billing console
  - Consider using serverless SQL warehouses when available
- **Data Organization**
  - Use Unity Catalog for better data governance
  - Organize data marts into logical catalogs and schemas
  - Follow naming conventions for easier management
  - Document catalog and schema purposes
- **Performance**
  - Use Delta Lake's optimization features (`OPTIMIZE`, `VACUUM`; see the sketch after this list)
  - Consider partitioning large tables
  - Use appropriate data types for better compression
  - Monitor query performance and optimize as needed
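
For example, compacting and cleaning up a data mart table, using the placeholder name from earlier in this guide:

```python
from databricks import sql

with sql.connect(
    server_hostname="<host>", http_path="<http-path>", access_token="<token>"
) as connection:
    with connection.cursor() as cursor:
        # Compact many small files into fewer, larger ones for faster scans
        cursor.execute("OPTIMIZE main.analytics.user_events")
        # Remove data files no longer referenced by the Delta log
        # (the default retention threshold is 7 days)
        cursor.execute("VACUUM main.analytics.user_events")
```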
## Additional Resources

- Databricks SQL Warehouses Documentation
- Personal Access Tokens Documentation
- Unity Catalog Documentation
- Delta Lake Documentation
- Databricks SQL Reference
## Support

If you encounter issues not covered in this guide:
- Check the OWOX Documentation
- Contact OWOX Support
- Review Databricks documentation for platform-specific issues