
pg_duckdb

DuckDB Embedded in Postgres

Overview

Attributes

YesNoYesYesYesNoNo-

Packages

OS       Repo     Package Name                 Version   PG Major Versions
EL       PIGSTY   pg_duckdb_$v*                0.3.1     17, 16, 15, 14
Debian   PIGSTY   postgresql-$v-pg-duckdb      0.3.1     17, 16, 15, 14

Comments

Conflicts with (invalidates) duckdb_fdw; broken on EL8 because the system C++ toolchain is too old.


Availability

OS / Arch       PG17     PG16     PG15     PG14     PG13
el8.x86_64      -        -        -        -        -
el8.aarch64     -        -        -        -        -
el9.x86_64      0.3.1    0.3.1    0.3.1    0.3.1    -
el9.aarch64     0.3.1    0.3.1    0.3.1    0.3.1    -
d12.x86_64      0.3.1    0.3.1    0.3.1    0.3.1    -
d12.aarch64     0.3.1    0.3.1    0.3.1    0.3.1    -
u22.x86_64      0.3.1    0.3.1    0.3.1    0.3.1    -
u22.aarch64     0.3.1    0.3.1    0.3.1    0.3.1    -
u24.x86_64      0.3.1    0.3.1    0.3.1    0.3.1    -
u24.aarch64     0.3.1    0.3.1    0.3.1    0.3.1    -

Repo legend: CONTRIB / PGDG / PIGSTY (the pg_duckdb 0.3.1 packages ship from the PIGSTY repo)

Download

To add the required PGDG / PIGSTY upstream repository, use:

pig repo add pgdg -u    # add PGDG repo and update cache (leave existing repos)
pig repo add pigsty -u  # add PIGSTY repo and update cache (leave existing repos)
pig repo add pgsql -u   # add PGDG + Pigsty repo and update cache (leave existing repos)
pig repo set all -u     # set repo to all = NODE + PGSQL + INFRA  (remove existing repos)
./node.yml -t node_repo -e node_repo_modules=node,pgsql # -l <cluster>

Alternatively, download the latest packages directly from the PIGSTY repository.


Install

Install this extension with:

pig ext install pg_duckdb; # install by extension name, for the current active PG version

pig ext install pg_duckdb -v 17;   # install for PG 17
pig ext install pg_duckdb -v 16;   # install for PG 16
pig ext install pg_duckdb -v 15;   # install for PG 15
pig ext install pg_duckdb -v 14;   # install for PG 14
dnf install pg_duckdb_17*;
dnf install pg_duckdb_16*;
dnf install pg_duckdb_15*;
dnf install pg_duckdb_14*;
apt install postgresql-17-pg-duckdb;
apt install postgresql-16-pg-duckdb;
apt install postgresql-15-pg-duckdb;
apt install postgresql-14-pg-duckdb;
./pgsql.yml -t pg_ext -e '{"pg_extensions": ["pg_duckdb"]}' # -l <cls>

Preload this extension with:

shared_preload_libraries = 'pg_duckdb'; # add to pg cluster config

Create this extension with:

CREATE EXTENSION pg_duckdb;
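
To confirm the extension is visible to the server and created at the expected version, the stock PostgreSQL catalogs are enough (a minimal sketch; it assumes nothing beyond the standard catalogs):

-- Check the packaged version vs. the version created in the current database
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'pg_duckdb';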

Usage

pg_duckdb embeds DuckDB's columnar, vectorized analytics engine directly in PostgreSQL, enabling high-performance analytics while maintaining full SQL compatibility and ACID properties.

Query Acceleration

Force analytical queries to use DuckDB's vectorized execution engine for dramatic performance improvements.

-- Enable DuckDB execution for current session
SET duckdb.force_execution = true;

-- Analytical queries automatically use DuckDB engine
SELECT 
    product_category,
    AVG(price) as avg_price,
    COUNT(*) as total_items
FROM products 
GROUP BY product_category
ORDER BY avg_price DESC;

-- Check execution plan
EXPLAIN SELECT count(*) FROM large_table;
-- Shows: Custom Scan (DuckDBScan) with DuckDB execution plan
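
Flipping duckdb.force_execution per session is not the only option; the same GUC can be scoped with standard PostgreSQL mechanisms. A minimal sketch, assuming a hypothetical analytics_team role and analytics database:

-- Enable DuckDB execution by default for one role (role name is illustrative)
ALTER ROLE analytics_team SET duckdb.force_execution = true;

-- Or enable it by default for one database (database name is illustrative)
ALTER DATABASE analytics SET duckdb.force_execution = true;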

Data Lake Integration

Query cloud storage and modern data formats directly from PostgreSQL without ETL.

-- Read Parquet files from S3
SELECT customer_id, SUM(amount) as total_spent
FROM read_parquet('s3://data-lake/transactions/*.parquet')
WHERE purchase_date >= '2024-01-01'
GROUP BY customer_id;

-- Query CSV files with schema inference
SELECT * FROM read_csv('https://example.com/sales_data.csv')
WHERE region = 'North America';

-- Read JSON files with automatic flattening
SELECT event_type, COUNT(*) as event_count
FROM read_json('s3://events/user_events.json')
GROUP BY event_type;

-- Scan Apache Iceberg tables
SELECT * FROM iceberg_scan('s3://warehouse/orders/metadata.json')
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';

-- Query Delta Lake tables
SELECT product_id, SUM(quantity) as total_sales
FROM delta_scan('s3://delta-lake/sales/')
GROUP BY product_id;
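
Lake data and regular PostgreSQL tables can also be combined in a single query. The sketch below follows the same read_parquet style as the examples above; the customers table, the S3 path, and the column names are illustrative:

-- Join Parquet files in S3 against a local PostgreSQL table
SELECT c.customer_name, SUM(t.amount) AS total_spent
FROM read_parquet('s3://data-lake/transactions/*.parquet') t
JOIN customers c ON c.customer_id = t.customer_id
WHERE t.purchase_date >= '2024-01-01'
GROUP BY c.customer_name
ORDER BY total_spent DESC;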

Hybrid Storage Tables

Create DuckDB-backed tables for analytical workloads with columnar storage and compression.

-- Create analytics table using DuckDB storage
CREATE TABLE sales_analytics (
    transaction_id BIGINT,
    product_id INT,
    customer_id INT,
    sale_date DATE,
    amount DECIMAL(10,2),
    quantity INT
) USING duckdb;

-- Insert data (stored in columnar format)
INSERT INTO sales_analytics 
SELECT id, product_id, customer_id, sale_date, price * quantity, quantity
FROM transactions
WHERE sale_date >= '2024-01-01';

-- High-performance analytical queries
SELECT 
    product_id,
    DATE_TRUNC('month', sale_date) as month,
    SUM(amount) as monthly_revenue,
    AVG(quantity) as avg_quantity
FROM sales_analytics
GROUP BY product_id, month
ORDER BY monthly_revenue DESC;
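
The two patterns compose: a DuckDB-backed table can be bulk-loaded straight from lake files instead of from a heap table. A sketch assuming the sales_analytics table above and an illustrative S3 path whose Parquet columns match the table:

-- Load the columnar table directly from Parquet files in object storage
INSERT INTO sales_analytics (transaction_id, product_id, customer_id, sale_date, amount, quantity)
SELECT transaction_id, product_id, customer_id, sale_date, amount, quantity
FROM read_parquet('s3://data-lake/transactions/2024/*.parquet');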

Cloud Storage Security

Manage secure credentials for cloud data access.

-- Create S3 credentials
SELECT duckdb.create_simple_secret(
    type := 'S3',
    key_id := 'AKIAIOSFODNN7EXAMPLE',
    secret := 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
);

-- Create Azure credentials
SELECT duckdb.create_simple_secret(
    type := 'AZURE',
    connection_string := 'DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey'
);

-- Use credential chain for AWS
CREATE SERVER s3_server TYPE 's3' FOREIGN DATA WRAPPER duckdb
OPTIONS (provider 'credential_chain');

MotherDuck Cloud Integration

Connect to MotherDuck cloud analytics platform for distributed analytics.

-- Enable MotherDuck connection
CALL duckdb.enable_motherduck('your_token_here', 'production_db');

-- Create cloud-synchronized table
CREATE TABLE cloud_metrics(
    timestamp TIMESTAMP,
    metric_name TEXT,
    value DOUBLE,
    tags JSON
) USING duckdb;

-- Query across local and cloud data
SELECT 
    local.user_id,
    local.session_count,
    cloud.conversion_rate
FROM user_sessions local
JOIN ddb$shared_analytics.user_metrics cloud 
  ON local.user_id = cloud.user_id;

-- Force cloud synchronization
SELECT duckdb.force_motherduck_sync();

Extension Management

Leverage DuckDB's extension ecosystem for additional capabilities.

-- Install extensions (superuser required)
SELECT duckdb.install_extension('iceberg');
SELECT duckdb.install_extension('delta'); 
SELECT duckdb.install_extension('azure');

-- View installed extensions
SELECT name, loaded, installed 
FROM duckdb.extensions();

-- Extensions auto-load when functions are used
SELECT * FROM delta_scan('azure://container/delta-table/');

Direct DuckDB Queries

Execute raw DuckDB queries for advanced functionality.

-- Execute DuckDB query directly
SELECT duckdb.query('PRAGMA version');

-- Raw query with complex DuckDB syntax
SELECT duckdb.raw_query($$
    WITH monthly_sales AS (
        SELECT DATE_TRUNC('month', sale_date) as month, 
               SUM(amount) as total
        FROM sales_data 
        GROUP BY month
    )
    SELECT month, total,
           LAG(total) OVER (ORDER BY month) as prev_month,
           total - LAG(total) OVER (ORDER BY month) as growth
    FROM monthly_sales
$$);

-- Reset DuckDB instance
SELECT duckdb.recycle_ddb();

Configuration

Performance Tuning

-- Memory allocation (default: 80% of system memory)
SET duckdb.max_memory = '8GB';

-- Thread configuration
SET duckdb.threads = 8;

-- PostgreSQL scan optimization
SET duckdb.max_workers_per_postgres_scan = 4;

-- Disable specific execution for debugging
SET duckdb.force_execution = false;
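
SET only affects the current session. To persist the tuning parameters cluster-wide, the standard ALTER SYSTEM route applies (values are illustrative; run as a superuser, and note that some GUCs may require a restart rather than a reload):

-- Persist DuckDB tuning parameters in postgresql.auto.conf
ALTER SYSTEM SET duckdb.max_memory = '8GB';
ALTER SYSTEM SET duckdb.threads = 8;
SELECT pg_reload_conf();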

Security Controls

-- Restrict access to specific PostgreSQL roles
SET duckdb.postgres_role = 'analytics_team';

-- Disable filesystem access
SET duckdb.disabled_filesystems = 'LocalFileSystem';

-- Control extension installation
SET duckdb.autoinstall_known_extensions = false;
SET duckdb.allow_community_extensions = false;

-- External access control
SET duckdb.enable_external_access = true;
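
To audit what is actually in effect on a running cluster, the stock pg_settings view is sufficient (a minimal sketch; no pg_duckdb-specific views are assumed):

-- Inspect all pg_duckdb GUCs and where each value came from
SELECT name, setting, source
FROM pg_settings
WHERE name LIKE 'duckdb.%'
ORDER BY name;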

Data Type Compatibility

PostgreSQL Type           DuckDB Support   Notes
INTEGER, BIGINT           Full             Direct mapping
REAL, DOUBLE PRECISION    Full             Direct mapping
TEXT, VARCHAR             Full             Direct mapping
BOOLEAN                   Full             Direct mapping
UUID                      Full             Direct mapping
JSON / JSONB              Full             JSONB is converted to JSON
DATE, TIMESTAMP           Full             Microsecond precision
NUMERIC                   Limited          May lose precision
ARRAY (1-D)               Full             Multi-dimensional arrays not supported
ENUM                      None             Not supported
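
Where the limited NUMERIC support matters, one workaround consistent with the table above is to cast to DOUBLE PRECISION (accepting floating-point semantics) before aggregating; the table and column names are illustrative:

-- Cast NUMERIC explicitly to avoid precision surprises in DuckDB execution
SELECT product_id, SUM(amount::DOUBLE PRECISION) AS approx_revenue
FROM sales_analytics
GROUP BY product_id;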

Use Cases

  • Real-time Analytics: Accelerate analytical queries on transactional data by 10-100x
  • Data Lake Analytics: Query Parquet, Iceberg, and Delta Lake tables directly
  • Hybrid Workloads: Combine OLTP applications with high-performance analytics
  • Operational Dashboards: Build responsive dashboards on live PostgreSQL data
  • ETL Elimination: Remove data movement overhead for analytical workloads
  • Cloud Data Integration: Seamlessly access cloud storage from PostgreSQL

pg_duckdb transforms PostgreSQL into a powerful hybrid OLTP/OLAP database, enabling organizations to run analytical workloads at scale without sacrificing transactional consistency or requiring separate analytics infrastructure.