pg_mooncake
Columnstore Table in Postgres
Overview
2440 | pg_mooncake | pg_mooncake | 0.1.2 | OLAP | MIT | C++ |
Attributes
Yes | No | No | No | Yes | No | No | - |
Packages
EL | PIGSTY | pg_mooncake_$v* | 0.1.2 | - | 17 16 15 14 13 |
Debian | PIGSTY | postgresql-$v-pg-mooncake | 0.1.2 | - | 17 16 15 14 13 |
Availability
PG17 | PG16 | PG15 | PG14 | PG13 | |
---|---|---|---|---|---|
el8.x86_64 | 0.1.2 | 0.1.2 | 0.1.2 | 0.1.2 | ✗ |
el8.aarch64 | 0.1.2 | 0.1.2 | 0.1.2 | 0.1.2 | ✗ |
el9.x86_64 | 0.1.2 | 0.1.2 | 0.1.2 | 0.1.2 | ✗ |
el9.aarch64 | 0.1.2 | 0.1.2 | 0.1.2 | 0.1.2 | ✗ |
d12.x86_64 | 0.1.2 | 0.1.2 | 0.1.2 | 0.1.2 | ✗ |
d12.aarch64 | 0.1.2 | 0.1.2 | 0.1.2 | 0.1.2 | ✗ |
u22.x86_64 | 0.1.2 | 0.1.2 | 0.1.2 | 0.1.2 | ✗ |
u22.aarch64 | 0.1.2 | 0.1.2 | 0.1.2 | 0.1.2 | ✗ |
u24.x86_64 | 0.1.2 | 0.1.2 | 0.1.2 | 0.1.2 | ✗ |
u24.aarch64 | 0.1.2 | 0.1.2 | 0.1.2 | 0.1.2 | ✗ |
Download
To add the required PGDG / PIGSTY upstream repository, use:
pig repo add pgdg -u # add PGDG repo and update cache (leave existing repos)
pig repo add pigsty -u # add PIGSTY repo and update cache (leave existing repos)
pig repo add pgsql -u # add PGDG + Pigsty repo and update cache (leave existing repos)
pig repo set all -u # set repo to all = NODE + PGSQL + INFRA (remove existing repos)
./node.yml -t node_repo -e node_repo_modules=node,pgsql # -l <cluster>
Or download the latest packages directly:
Install
Install this extension with:
pig ext install pg_mooncake; # install by extension name, for the current active PG version
pig ext install pg_mooncake -v 17; # install for PG 17
pig ext install pg_mooncake -v 16; # install for PG 16
pig ext install pg_mooncake -v 15; # install for PG 15
pig ext install pg_mooncake -v 14; # install for PG 14
dnf install pg_mooncake_17*;
dnf install pg_mooncake_16*;
dnf install pg_mooncake_15*;
dnf install pg_mooncake_14*;
apt install postgresql-17-pg-mooncake;
apt install postgresql-16-pg-mooncake;
apt install postgresql-15-pg-mooncake;
apt install postgresql-14-pg-mooncake;
./pgsql.yml -t pg_ext -e '{"pg_extensions": ["pg_mooncake"]}' # -l <cls>
Create this extension with:
CREATE EXTENSION pg_mooncake;
Usage
pg_mooncake creates real-time columnstore mirrors of PostgreSQL tables using Apache Iceberg format, enabling fast analytics queries with sub-second data freshness while maintaining full PostgreSQL compatibility.
Warning: This extension conflicts with
pg_duckdb
&duckdb_fdw
due to sharedlibduckdb.so
library usage.
Real-time Columnstore Mirrors
Create analytical mirrors of existing PostgreSQL tables that automatically sync data changes.
-- Create extension
CREATE EXTENSION pg_mooncake;
-- Create source table
CREATE TABLE sales_data(
id bigint PRIMARY KEY,
product_id int,
sale_date timestamp,
amount decimal(10,2),
region text
);
-- Create real-time columnstore mirror
CALL mooncake.create_table('sales_analytics', 'sales_data');
-- Insert data (automatically synced to mirror)
INSERT INTO sales_data VALUES
(1, 101, '2024-01-15 10:30:00', 1250.00, 'North'),
(2, 102, '2024-01-15 11:45:00', 890.50, 'South');
-- Query columnstore mirror for analytics
SELECT
region,
DATE_TRUNC('month', sale_date) as month,
SUM(amount) as total_sales,
AVG(amount) as avg_sale
FROM sales_analytics
GROUP BY region, month;
Direct Columnstore Tables
Create columnstore tables directly for analytical workloads.
-- Create columnstore table
CREATE TABLE user_activity(
user_id BIGINT,
activity_type TEXT,
activity_timestamp TIMESTAMP,
duration INT
) USING columnstore;
-- Insert data
INSERT INTO user_activity VALUES
(1, 'login', '2024-01-01 08:00:00', 120),
(2, 'page_view', '2024-01-01 08:05:00', 30),
(3, 'logout', '2024-01-01 08:30:00', 60);
-- Run analytical queries
SELECT
user_id,
activity_type,
SUM(duration) AS total_duration,
COUNT(*) AS activity_count
FROM user_activity
GROUP BY user_id, activity_type
ORDER BY user_id, activity_type;
Cloud Storage Integration
Configure pg_mooncake to use cloud storage backends like S3.
-- Create S3 credentials
SELECT mooncake.create_secret(
'my_s3_secret',
'S3',
'ACCESS_KEY_ID',
'SECRET_ACCESS_KEY',
'{"REGION": "us-west-2"}'
);
-- Set default S3 bucket
SET mooncake.default_bucket = 's3://my-analytics-bucket';
-- Disable local cache for cloud environments
SET mooncake.enable_local_cache = false;
-- Create columnstore table using S3 storage
CREATE TABLE metrics_data(
timestamp TIMESTAMP,
metric_name TEXT,
value DOUBLE PRECISION,
tags JSONB
) USING columnstore;
Multi-Format Data Loading
Load data from various file formats into columnstore tables.
-- Load from Parquet files
COPY metrics_data FROM 's3://data-lake/metrics/*.parquet';
-- Load from CSV files
COPY user_events FROM '/path/to/events.csv' WITH CSV HEADER;
-- Load from JSON files
COPY log_data FROM 's3://logs/application.json' WITH (FORMAT json);
External Data Lake Access
Query external Iceberg and Delta Lake tables directly.
-- Query external Iceberg table
SELECT COUNT(*) FROM iceberg_scan('s3://datalake/sales_iceberg/');
-- Query Delta Lake table
SELECT * FROM delta_scan('s3://datalake/user_events_delta/')
WHERE event_date >= '2024-01-01';
Hybrid Workloads
Combine transactional and analytical queries seamlessly.
-- Join columnstore table with regular PostgreSQL table
SELECT
t.symbol,
t.price,
c.company_name,
c.sector
FROM trades_analytics t
JOIN companies c ON t.symbol = c.symbol
WHERE t.trade_date >= CURRENT_DATE - INTERVAL '7 days'
AND t.volume > 1000000;
-- Mixed workload: OLTP inserts with OLAP queries
BEGIN;
INSERT INTO orders (customer_id, product_id, quantity, price)
VALUES (1001, 5502, 2, 29.99);
-- Immediate analytics on fresh data
SELECT
product_id,
SUM(quantity * price) as revenue,
COUNT(*) as order_count
FROM orders_analytics
WHERE order_date >= CURRENT_DATE
GROUP BY product_id;
COMMIT;
Performance Features
Query Acceleration
Queries on columnstore tables use DuckDB execution engine for high performance:
EXPLAIN SELECT
region,
AVG(amount) as avg_sale,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY amount) as p95
FROM sales_analytics
GROUP BY region;
-- Shows: Custom Scan (DuckDBScan) with optimized execution plan
Configuration Options
-- Set maximum memory usage (default: 1GB)
SET mooncake.max_memory = '4GB';
-- Configure thread count for parallel processing
SET mooncake.threads = 8;
-- Enable/disable local caching
SET mooncake.enable_local_cache = true;
-- Set data compression level
SET mooncake.compression = 'zstd';
Use Cases
- Real-time Analytics: Live dashboards and reports with sub-second data freshness
- Hybrid Workloads: Applications requiring both OLTP and OLAP capabilities
- Data Lake Integration: Bridge PostgreSQL with modern data lake architectures
- Performance Optimization: Accelerate analytical queries without application changes
- Operational Analytics: Real-time monitoring and alerting on transactional data
pg_mooncake provides PostgreSQL-native columnstore capabilities while maintaining compatibility with existing applications and enabling integration with modern data lake ecosystems through Apache Iceberg format.