
pg_duckdb

DuckDB Embedded in Postgres

Overview

Attributes

YesNoYesYesYesNoNo-

Packages

OS       Repo     Package Name                 Version   PG Major Versions
EL       PIGSTY   pg_duckdb_$v*                0.3.1     17, 16, 15, 14
Debian   PIGSTY   postgresql-$v-pg-duckdb      0.3.1     17, 16, 15, 14

Comments

Conflicts with (invalidates) duckdb_fdw; broken on EL8 because the system C++ toolchain is too old.


Availability

OS / Arch       PG17     PG16     PG15     PG14     PG13
el8.x86_64      -        -        -        -        -
el8.aarch64     -        -        -        -        -
el9.x86_64      0.3.1    0.3.1    0.3.1    0.3.1    -
el9.aarch64     0.3.1    0.3.1    0.3.1    0.3.1    -
d12.x86_64      0.3.1    0.3.1    0.3.1    0.3.1    -
d12.aarch64     0.3.1    0.3.1    0.3.1    0.3.1    -
u22.x86_64      0.3.1    0.3.1    0.3.1    0.3.1    -
u22.aarch64     0.3.1    0.3.1    0.3.1    0.3.1    -
u24.x86_64      0.3.1    0.3.1    0.3.1    0.3.1    -
u24.aarch64     0.3.1    0.3.1    0.3.1    0.3.1    -

Repo legend: CONTRIB / PGDG / PIGSTY (the pg_duckdb 0.3.1 packages ship from the PIGSTY repo)

Download

To add the required PGDG / PIGSTY upstream repository, use:

pig repo add pgdg -u    # add PGDG repo and update cache (leave existing repos)
pig repo add pigsty -u  # add PIGSTY repo and update cache (leave existing repos)
pig repo add pgsql -u   # add PGDG + Pigsty repo and update cache (leave existing repos)
pig repo set all -u     # set repo to all = NODE + PGSQL + INFRA  (remove existing repos)
./node.yml -t node_repo -e node_repo_modules=node,pgsql # -l <cluster>

Alternatively, download the latest packages directly from the PIGSTY repository.


Install

Install this extension with:

pig ext install pg_duckdb; # install by extension name, for the current active PG version

pig ext install pg_duckdb -v 17;   # install for PG 17
pig ext install pg_duckdb -v 16;   # install for PG 16
pig ext install pg_duckdb -v 15;   # install for PG 15
pig ext install pg_duckdb -v 14;   # install for PG 14
dnf install pg_duckdb_17*;
dnf install pg_duckdb_16*;
dnf install pg_duckdb_15*;
dnf install pg_duckdb_14*;
apt install postgresql-17-pg-duckdb;
apt install postgresql-16-pg-duckdb;
apt install postgresql-15-pg-duckdb;
apt install postgresql-14-pg-duckdb;
./pgsql.yml -t pg_ext -e '{"pg_extensions": ["pg_duckdb"]}' # -l <cls>

Preload this extension with:

shared_preload_libraries = 'pg_duckdb'; # add to pg cluster config

Create this extension with:

CREATE EXTENSION pg_duckdb;
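
To confirm the extension is visible to the server and created at the expected version, the stock PostgreSQL catalogs are enough (a minimal sketch; it assumes nothing beyond the standard catalogs):

-- Check the packaged version vs. the version created in the current database
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'pg_duckdb';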

Usage

pg_duckdb embeds DuckDB's columnar, vectorized analytics engine directly in PostgreSQL, enabling high-performance analytics while maintaining full SQL compatibility and ACID properties.

Query Acceleration

Force analytical queries to use DuckDB's vectorized execution engine for dramatic performance improvements.

-- Enable DuckDB execution for current session
SET duckdb.force_execution = true;

-- Analytical queries automatically use DuckDB engine
SELECT 
    product_category,
    AVG(price) as avg_price,
    COUNT(*) as total_items
FROM products 
GROUP BY product_category
ORDER BY avg_price DESC;

-- Check execution plan
EXPLAIN SELECT count(*) FROM large_table;
-- Shows: Custom Scan (DuckDBScan) with DuckDB execution plan
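
Flipping duckdb.force_execution per session is not the only option; the same GUC can be scoped with standard PostgreSQL mechanisms. A minimal sketch, assuming a hypothetical analytics_team role and analytics database:

-- Enable DuckDB execution by default for one role (role name is illustrative)
ALTER ROLE analytics_team SET duckdb.force_execution = true;

-- Or enable it by default for one database (database name is illustrative)
ALTER DATABASE analytics SET duckdb.force_execution = true;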

Data Lake Integration

Query cloud storage and modern data formats directly from PostgreSQL without ETL.

-- Read Parquet files from S3
SELECT customer_id, SUM(amount) as total_spent
FROM read_parquet('s3://data-lake/transactions/*.parquet')
WHERE purchase_date >= '2024-01-01'
GROUP BY customer_id;

-- Query CSV files with schema inference
SELECT * FROM read_csv('https://example.com/sales_data.csv')
WHERE region = 'North America';

-- Read JSON files with automatic flattening
SELECT event_type, COUNT(*) as event_count
FROM read_json('s3://events/user_events.json')
GROUP BY event_type;

-- Scan Apache Iceberg tables
SELECT * FROM iceberg_scan('s3://warehouse/orders/metadata.json')
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';

-- Query Delta Lake tables
SELECT product_id, SUM(quantity) as total_sales
FROM delta_scan('s3://delta-lake/sales/')
GROUP BY product_id;
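
Lake data and regular PostgreSQL tables can also be combined in a single query. The sketch below follows the same read_parquet style as the examples above; the customers table, the S3 path, and the column names are illustrative:

-- Join Parquet files in S3 against a local PostgreSQL table
SELECT c.customer_name, SUM(t.amount) AS total_spent
FROM read_parquet('s3://data-lake/transactions/*.parquet') t
JOIN customers c ON c.customer_id = t.customer_id
WHERE t.purchase_date >= '2024-01-01'
GROUP BY c.customer_name
ORDER BY total_spent DESC;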

Hybrid Storage Tables

Create DuckDB-backed tables for analytical workloads with columnar storage and compression.

-- Create analytics table using DuckDB storage
CREATE TABLE sales_analytics (
    transaction_id BIGINT,
    product_id INT,
    customer_id INT,
    sale_date DATE,
    amount DECIMAL(10,2),
    quantity INT
) USING duckdb;

-- Insert data (stored in columnar format)
INSERT INTO sales_analytics 
SELECT id, product_id, customer_id, sale_date, price * quantity, quantity
FROM transactions
WHERE sale_date >= '2024-01-01';

-- High-performance analytical queries
SELECT 
    product_id,
    DATE_TRUNC('month', sale_date) as month,
    SUM(amount) as monthly_revenue,
    AVG(quantity) as avg_quantity
FROM sales_analytics
GROUP BY product_id, month
ORDER BY monthly_revenue DESC;
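
The two patterns compose: a DuckDB-backed table can be bulk-loaded straight from lake files instead of from a heap table. A sketch assuming the sales_analytics table above and an illustrative S3 path whose Parquet columns match the table:

-- Load the columnar table directly from Parquet files in object storage
INSERT INTO sales_analytics (transaction_id, product_id, customer_id, sale_date, amount, quantity)
SELECT transaction_id, product_id, customer_id, sale_date, amount, quantity
FROM read_parquet('s3://data-lake/transactions/2024/*.parquet');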

Cloud Storage Security

Manage secure credentials for cloud data access.

-- Create S3 credentials
SELECT duckdb.create_simple_secret(
    type := 'S3',
    key_id := 'AKIAIOSFODNN7EXAMPLE',
    secret := 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
);

-- Create Azure credentials
SELECT duckdb.create_simple_secret(
    type := 'AZURE',
    connection_string := 'DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey'
);

-- Use credential chain for AWS
CREATE SERVER s3_server TYPE 's3' FOREIGN DATA WRAPPER duckdb
OPTIONS (provider 'credential_chain');

MotherDuck Cloud Integration

Connect to MotherDuck cloud analytics platform for distributed analytics.

-- Enable MotherDuck connection
CALL duckdb.enable_motherduck('your_token_here', 'production_db');

-- Create cloud-synchronized table
CREATE TABLE cloud_metrics(
    timestamp TIMESTAMP,
    metric_name TEXT,
    value DOUBLE,
    tags JSON
) USING duckdb;

-- Query across local and cloud data
SELECT 
    local.user_id,
    local.session_count,
    cloud.conversion_rate
FROM user_sessions local
JOIN ddb$shared_analytics.user_metrics cloud 
  ON local.user_id = cloud.user_id;

-- Force cloud synchronization
SELECT duckdb.force_motherduck_sync();

Extension Management

Leverage DuckDB's extension ecosystem for additional capabilities.

-- Install extensions (superuser required)
SELECT duckdb.install_extension('iceberg');
SELECT duckdb.install_extension('delta'); 
SELECT duckdb.install_extension('azure');

-- View installed extensions
SELECT name, loaded, installed 
FROM duckdb.extensions();

-- Extensions auto-load when functions are used
SELECT * FROM delta_scan('azure://container/delta-table/');

Direct DuckDB Queries

Execute raw DuckDB queries for advanced functionality.

-- Execute DuckDB query directly
SELECT duckdb.query('PRAGMA version');

-- Raw query with complex DuckDB syntax
SELECT duckdb.raw_query($$
    WITH monthly_sales AS (
        SELECT DATE_TRUNC('month', sale_date) as month, 
               SUM(amount) as total
        FROM sales_data 
        GROUP BY month
    )
    SELECT month, total,
           LAG(total) OVER (ORDER BY month) as prev_month,
           total - LAG(total) OVER (ORDER BY month) as growth
    FROM monthly_sales
$$);

-- Reset DuckDB instance
SELECT duckdb.recycle_ddb();

Configuration

Performance Tuning

-- Memory allocation (default: 80% of system memory)
SET duckdb.max_memory = '8GB';

-- Thread configuration
SET duckdb.threads = 8;

-- PostgreSQL scan optimization
SET duckdb.max_workers_per_postgres_scan = 4;

-- Disable specific execution for debugging
SET duckdb.force_execution = false;
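
SET only affects the current session. To persist the tuning parameters cluster-wide, the standard ALTER SYSTEM route applies (values are illustrative; run as a superuser, and note that some GUCs may require a restart rather than a reload):

-- Persist DuckDB tuning parameters in postgresql.auto.conf
ALTER SYSTEM SET duckdb.max_memory = '8GB';
ALTER SYSTEM SET duckdb.threads = 8;
SELECT pg_reload_conf();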

Security Controls

-- Restrict access to specific PostgreSQL roles
SET duckdb.postgres_role = 'analytics_team';

-- Disable filesystem access
SET duckdb.disabled_filesystems = 'LocalFileSystem';

-- Control extension installation
SET duckdb.autoinstall_known_extensions = false;
SET duckdb.allow_community_extensions = false;

-- External access control
SET duckdb.enable_external_access = true;
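
To audit what is actually in effect on a running cluster, the stock pg_settings view is sufficient (a minimal sketch; no pg_duckdb-specific views are assumed):

-- Inspect all pg_duckdb GUCs and where each value came from
SELECT name, setting, source
FROM pg_settings
WHERE name LIKE 'duckdb.%'
ORDER BY name;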

Data Type Compatibility

PostgreSQL Type           DuckDB Support   Notes
INTEGER, BIGINT           Full             Direct mapping
REAL, DOUBLE PRECISION    Full             Direct mapping
TEXT, VARCHAR             Full             Direct mapping
BOOLEAN                   Full             Direct mapping
UUID                      Full             Direct mapping
JSON / JSONB              Full             JSONB is converted to JSON
DATE, TIMESTAMP           Full             Microsecond precision
NUMERIC                   Limited          May lose precision
ARRAY (1-D)               Full             Multi-dimensional arrays not supported
ENUM                      None             Not supported
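
Where the limited NUMERIC support matters, one workaround consistent with the table above is to cast to DOUBLE PRECISION (accepting floating-point semantics) before aggregating; the table and column names are illustrative:

-- Cast NUMERIC explicitly to avoid precision surprises in DuckDB execution
SELECT product_id, SUM(amount::DOUBLE PRECISION) AS approx_revenue
FROM sales_analytics
GROUP BY product_id;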

Use Cases

  • Real-time Analytics: Accelerate analytical queries on transactional data by 10-100x
  • Data Lake Analytics: Query Parquet, Iceberg, and Delta Lake tables directly
  • Hybrid Workloads: Combine OLTP applications with high-performance analytics
  • Operational Dashboards: Build responsive dashboards on live PostgreSQL data
  • ETL Elimination: Remove data movement overhead for analytical workloads
  • Cloud Data Integration: Seamlessly access cloud storage from PostgreSQL

pg_duckdb transforms PostgreSQL into a powerful hybrid OLTP/OLAP database, enabling organizations to run analytical workloads at scale without sacrificing transactional consistency or requiring separate analytics infrastructure.