Observability Infra
Modern observability stack with 3000+ metrics, 30+ dashboards, and enterprise-grade monitoring
Pigsty ships a modern monitoring stack built on industry best practices. Every component is monitored automatically, with 3000+ metrics and 30+ dashboards.
Complete Insight: Monitor everything from high-level cluster health to individual table statistics. Get complete insight into the past, present, and future of your infrastructure.
Architecture Overview
Pigsty's observability infrastructure leverages battle-tested open-source components in a cohesive, production-ready stack:
Grafana Visualization Engine
Dashboards with advanced interactive visualizations
Prometheus Metrics Database
Time-series storage with powerful query language
Loki Logging Platform
Centralized logging with label-based indexing
AlertManager
Alert aggregation, management, and escalation
Service Architecture
graph TB
subgraph "Observability Stack"
Grafana[Grafana :3000]
Prometheus[Prometheus :9090]
Loki[Loki :3100]
AlertManager[AlertManager :9093]
Pushgateway[Pushgateway :9091]
Blackbox[Blackbox :9115]
end
subgraph "Data Sources"
PG[(PostgreSQL)]
Node[Node Metrics]
Redis[(Redis)]
MinIO[(MinIO)]
end
subgraph "Exporters"
PGExp[pg_exporter]
NodeExp[node_exporter]
RedisExp[redis_exporter]
MinIOExp[minio_exporter]
end
PG --> PGExp
Node --> NodeExp
Redis --> RedisExp
MinIO --> MinIOExp
PGExp --> Prometheus
NodeExp --> Prometheus
RedisExp --> Prometheus
MinIOExp --> Prometheus
Prometheus --> Grafana
Prometheus --> AlertManager
Loki --> Grafana
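The ports in the diagram can be turned into a quick reachability checklist. The sketch below is illustrative only: the host name `infra-1` is hypothetical, and the health paths are the ones these upstream projects commonly expose, so they should be verified against the deployed versions.

```python
# Ports are taken from the diagram above. Health paths are assumptions
# based on upstream defaults and may differ between versions.
SERVICES = {
    "grafana": (3000, "/api/health"),
    "prometheus": (9090, "/-/healthy"),
    "loki": (3100, "/ready"),
    "alertmanager": (9093, "/-/healthy"),
    "pushgateway": (9091, "/-/healthy"),
    "blackbox": (9115, "/-/healthy"),
}


def health_urls(host: str) -> dict[str, str]:
    """Return one health-check URL per service for the given host."""
    return {
        name: f"http://{host}:{port}{path}"
        for name, (port, path) in SERVICES.items()
    }


if __name__ == "__main__":
    for name, url in health_urls("infra-1").items():  # hypothetical host
        print(f"{name:13s} {url}")
```

Each URL can then be fetched with `curl` or any HTTP client to confirm the service is up.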
Monitoring Dashboards
Multi-Level Dashboard Hierarchy
Pigsty's dashboards, including 26+ dedicated to PostgreSQL, are organized in a logical drill-down hierarchy:
# Global overview dashboards
dashboards:
- Home: Global cluster overview and key metrics
- INFRA: Infrastructure services status
- NODES: Node-level resource utilization
- Alert: Active alerts and notification status
Purpose: High-level operational visibility across the entire environment
Audience: Operations teams, management
# Cluster-focused dashboards
dashboards:
- PGSQL Cluster: Cluster health and replication status
- PGSQL Service: Service endpoints and load balancing
- PGSQL Activity: Connection pools and query activity
- PGSQL Replication: Streaming replication metrics
Purpose: Cluster-wide PostgreSQL performance and health
Audience: Database administrators, SRE teams
# Instance-specific dashboards
dashboards:
- PGSQL Instance: Detailed PostgreSQL server metrics
- PGSQL Persist: WAL, checkpoints, and persistence
- PGSQL Proxy: Pgbouncer connection pooling metrics
- PGSQL Session: Active sessions and lock analysis
Purpose: Deep-dive into individual PostgreSQL instances
Audience: Database developers, performance engineers
# Database and object-level dashboards
dashboards:
- PGSQL Database: Database-specific performance metrics
- PGSQL Table: Table statistics and access patterns
- PGSQL Query: Query performance and optimization
- PGSQL Slow: Slow query analysis and tuning
Purpose: Application-level database performance analysis
Audience: Application developers, database analysts
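The four levels above form a simple ordered structure. As a rough illustration (not how Pigsty implements navigation), the following sketch models the hierarchy as a list and looks up which dashboards sit one level below a given one. The level names `global`, `cluster`, `instance`, and `database` are labels chosen for this example.

```python
# Dashboard names are copied from the lists above; level names are
# illustrative labels, not identifiers used by Pigsty.
HIERARCHY = [
    ("global", ["Home", "INFRA", "NODES", "Alert"]),
    ("cluster", ["PGSQL Cluster", "PGSQL Service",
                 "PGSQL Activity", "PGSQL Replication"]),
    ("instance", ["PGSQL Instance", "PGSQL Persist",
                  "PGSQL Proxy", "PGSQL Session"]),
    ("database", ["PGSQL Database", "PGSQL Table",
                  "PGSQL Query", "PGSQL Slow"]),
]


def level_of(dashboard: str) -> str:
    """Return the hierarchy level a dashboard belongs to."""
    for level, names in HIERARCHY:
        if dashboard in names:
            return level
    raise KeyError(dashboard)


def drill_down(dashboard: str) -> list[str]:
    """Dashboards one level below the given one (empty at the bottom)."""
    levels = [level for level, _ in HIERARCHY]
    idx = levels.index(level_of(dashboard))
    return HIERARCHY[idx + 1][1] if idx + 1 < len(HIERARCHY) else []
```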
Dashboard Features
Drill-Down Navigation
Seamless exploration from overview to granular details with contextual linking
Time Range Controls
Flexible time windows from real-time to historical analysis over months
Multi-Dimensional Filtering
Dynamic filtering by cluster, instance, database, or custom labels
Alert Integration
Visual alert correlation with metrics and direct links to alert details
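Drill-down links, time range controls, and filtering all come down to URL parameters on a Grafana dashboard link. The sketch below builds such a link using Grafana's `var-<name>`, `from`, and `to` query parameters. The dashboard UID `pgsql-instance` and the variable names `cls` and `ins` are hypothetical placeholders, not confirmed Pigsty identifiers.

```python
from urllib.parse import urlencode


def dashboard_url(base: str, uid: str, variables: dict[str, str],
                  time_from: str = "now-1h", time_to: str = "now") -> str:
    """Build a Grafana dashboard link with a time range and template variables."""
    params = [("from", time_from), ("to", time_to)]
    params += [(f"var-{name}", value) for name, value in variables.items()]
    return f"{base}/d/{uid}?{urlencode(params)}"


if __name__ == "__main__":
    # UID and variable names below are assumptions for illustration.
    print(dashboard_url("http://grafana:3000", "pgsql-instance",
                        {"cls": "pg-test", "ins": "pg-test-1"},
                        time_from="now-6h"))
```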
Grafana Deployment
Enhanced Grafana Stack
Pigsty extends Grafana with powerful plugins and data sources for advanced analytics:
# Essential Grafana plugins
grafana_plugins:
- grafana-piechart-panel # Pie chart visualizations
- grafana-polystat-panel # Multi-value status panels
- grafana-worldmap-panel # Geographic visualizations
- grafana-clock-panel # Time display widgets
Purpose: Essential visualization capabilities for monitoring dashboards
# Advanced visualization plugins
grafana_plugins:
- echarts-panel # Apache ECharts integration
- volkovlabs-echarts-panel # Enhanced ECharts support
- volkovlabs-form-panel # Interactive forms
- volkovlabs-variable-panel # Dynamic variables
Purpose: Rich, interactive visualizations for complex data analysis
# Extended data source support
grafana_datasources:
- infinity-datasource # REST API and file data sources
- redis-datasource # Redis data source
- clickhouse-datasource # ClickHouse integration
- postgres-datasource # Enhanced PostgreSQL support
Purpose: Connect to diverse data sources beyond traditional metrics
# Pigsty-specific customizations
custom_features:
- pigsty-theme # Custom branding and colors
- dashboard-provisioning # Automated dashboard deployment
- alert-templates # Pre-configured alert rules
- data-link-automation # Context-aware navigation
Purpose: Tailored user experience optimized for PostgreSQL environments
Configuration & Customization
# Advanced Grafana configuration
grafana_config:
  # Authentication
  auth.anonymous.enabled: true
  auth.anonymous.org_role: Viewer
  auth.disable_login_form: false
  # Security
  security.allow_embedding: true
  security.cookie_secure: true
  security.cookie_samesite: strict
  # Performance
  database.max_open_conn: 300
  database.max_idle_conn: 300
  database.conn_max_lifetime: 14400
  # Alerting
  alerting.enabled: false # Legacy alerting; should not be enabled together with unified alerting
  alerting.execute_alerts: true
  unified_alerting.enabled: true
  # Custom panels
  panels.enable_alpha: true
  feature_toggles.enable: ngalert,live,publicDashboards
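Dotted keys like the ones above map naturally onto Grafana's environment-variable overrides, which follow the `GF_<SECTION>_<KEY>` convention with dots and dashes replaced by underscores. The helper below is a small sketch of that mapping. It assumes the dotted key is the `grafana.ini` section followed by the option name, which is how the listing above reads.

```python
def to_env_var(dotted_key: str) -> str:
    """Convert 'section.key' into Grafana's GF_<SECTION>_<KEY> form."""
    return "GF_" + dotted_key.replace(".", "_").replace("-", "_").upper()


def to_env(settings: dict[str, object]) -> dict[str, str]:
    """Render a settings dict as environment variables (booleans lowercased)."""
    env = {}
    for key, value in settings.items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        env[to_env_var(key)] = text
    return env


if __name__ == "__main__":
    for name, value in to_env({"auth.anonymous.enabled": True,
                               "auth.anonymous.org_role": "Viewer"}).items():
        print(f"{name}={value}")
```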
Prometheus Stack
Complete Monitoring Ecosystem
Pigsty deploys the full Prometheus ecosystem for comprehensive observability:
Prometheus Server
Core metrics database with advanced querying and storage capabilities
# Prometheus configuration highlights
prometheus_config:
  global:
    scrape_interval: 15s # Default scrape frequency
    evaluation_interval: 15s # Rule evaluation frequency
    external_labels:
      cluster: '{{ pg_cluster }}'
  rule_files:
    - "/etc/prometheus/rules/*.yml"
  scrape_configs:
    - job_name: 'node' # Node-level metrics
    - job_name: 'postgres' # PostgreSQL metrics
    - job_name: 'redis' # Redis metrics
    - job_name: 'pushgateway' # Batch job metrics
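Once targets are scraped, the data is available through the Prometheus HTTP API. As a minimal sketch, the helpers below build an instant-query URL for `/api/v1/query` and pull values out of a vector result. The base URL and the choice to key results by the `instance` label are assumptions made for this example.

```python
from urllib.parse import urlencode


def instant_query_url(base: str, promql: str) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def vector_values(response: dict) -> dict[str, float]:
    """Map each series' `instance` label to its value in a vector result."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    return {
        item["metric"].get("instance", ""): float(item["value"][1])
        for item in response["data"]["result"]
    }
```

The JSON body returned by the URL can be passed to `vector_values` after decoding with `json.loads`.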
AlertManager
Intelligent alert routing with suppression, grouping, and escalation
# AlertManager routing configuration
alertmanager_routes:
  - match:
      severity: critical
    receiver: pagerduty-critical
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 4h
  - match:
      severity: warning
    receiver: slack-warnings
    group_wait: 1m
    group_interval: 10m
    repeat_interval: 24h
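The routing above can be read as "the first route whose labels all match wins". The sketch below models only that rule. It is a simplification: real Alertmanager routing is a tree with nested routes and a `continue` flag, and the fallback receiver name `default` is hypothetical.

```python
# Receivers and match labels are taken from the routes above.
ROUTES = [
    {"match": {"severity": "critical"}, "receiver": "pagerduty-critical"},
    {"match": {"severity": "warning"}, "receiver": "slack-warnings"},
]


def pick_receiver(labels: dict[str, str], routes=ROUTES,
                  default: str = "default") -> str:
    """Return the receiver of the first route whose labels all match."""
    for route in routes:
        if all(labels.get(k) == v for k, v in route["match"].items()):
            return route["receiver"]
    return default  # hypothetical fallback receiver


if __name__ == "__main__":
    print(pick_receiver({"alertname": "PostgreSQLDown", "severity": "critical"}))
```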
Pushgateway
Batch job metrics collection for ephemeral workloads and cron jobs
# Example: Backup job metrics
start=$(date +%s)
# ... run the backup here ...
echo "backup_duration_seconds $(( $(date +%s) - start ))" | curl --data-binary @- \
  http://pushgateway:9091/metrics/job/pg-backup/instance/pg-test
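The same push can be done from a script without shelling out to `curl`. This is a minimal sketch using only the standard library. It reuses the job and instance names from the example above, and the gauge type and helper names are choices made for illustration.

```python
import urllib.request
from urllib.parse import quote


def metric_payload(name: str, value: float) -> bytes:
    """Render one gauge sample in the Prometheus text exposition format."""
    return f"# TYPE {name} gauge\n{name} {value}\n".encode()


def push_url(base: str, job: str, instance: str) -> str:
    """Pushgateway grouping-key URL for a job/instance pair."""
    return (f"{base}/metrics/job/{quote(job, safe='')}"
            f"/instance/{quote(instance, safe='')}")


def push(base: str, job: str, instance: str, payload: bytes) -> int:
    """POST the payload to the Pushgateway and return the HTTP status."""
    req = urllib.request.Request(push_url(base, job, instance),
                                 data=payload, method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

A backup script would time its work, then call `push("http://pushgateway:9091", "pg-backup", "pg-test", metric_payload("backup_duration_seconds", elapsed))`.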
Blackbox Exporter
Network connectivity monitoring with HTTP, TCP, and ICMP probes
# Blackbox probe configuration
blackbox_probes:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: [200]
  tcp_connect:
    prober: tcp
    timeout: 5s
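Conceptually, the `tcp_connect` probe succeeds when a TCP handshake completes within the timeout. The sketch below shows that idea with Python's `socket` module. It is an illustration of the technique, not the Blackbox Exporter's implementation, and the function name is made up for this example.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Host and port are placeholders; 5432 is the PostgreSQL default.
    print(tcp_probe("127.0.0.1", 5432, timeout=1.0))
```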
Pre-configured Alert Rules
# Sample PostgreSQL alert rules
alert_rules:
  - alert: PostgreSQLDown
    expr: pg_up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "PostgreSQL instance {{ $labels.instance }} is down"
  - alert: PostgreSQLHighConnections
    # numbackends is reported per database, so sum it per instance before dividing
    expr: sum by (instance) (pg_stat_database_numbackends) / on (instance) pg_settings_max_connections > 0.8
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "High connection usage on {{ $labels.instance }}"
  - alert: PostgreSQLReplicationLag
    expr: pg_replication_lag_seconds > 300
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Replication lag > 5 minutes on {{ $labels.instance }}"
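The `for:` clause in these rules means the condition must hold continuously before the alert fires. Until then the alert is only pending. The following sketch simulates that state machine over a series of rule evaluations. With a 15s evaluation interval, `for: 5m` corresponds to `for_steps=20`. The function is a simplified model for illustration, not Prometheus code.

```python
def alert_states(samples: list[bool], for_steps: int) -> list[str]:
    """State per evaluation, given whether the expression matched each time.

    An alert turns 'firing' once the condition has held for `for_steps`
    evaluation intervals after it first became true.
    """
    states, streak = [], 0
    for matched in samples:
        streak = streak + 1 if matched else 0
        if streak == 0:
            states.append("inactive")
        elif streak > for_steps:
            states.append("firing")
        else:
            states.append("pending")
    return states


if __name__ == "__main__":
    print(alert_states([True, True, True, False, True], for_steps=2))
```

A single evaluation where the condition is false resets the timer, which is why brief recoveries keep a flapping instance from paging anyone.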
pg_exporter: Advanced PostgreSQL Monitoring
Custom Metrics Engine
Pigsty's pg_exporter is a highly customizable PostgreSQL metrics collector that supports PostgreSQL 9.6 through 16 with fine-grained metric control:
# pg_exporter key capabilities
features:
- auto_discovery: true # Automatic database discovery
- custom_queries: true # User-defined metric queries
- version_aware: true # PostgreSQL version detection
- rds_compatible: true # Cloud database support
- label_customization: true # Flexible metric labeling
- connection_pooling: true # Efficient connection reuse
Advantages: Flexible, lightweight, and highly configurable
# PostgreSQL version support matrix
supported_versions:
- postgresql_9_6: legacy_metrics_set
- postgresql_10: enhanced_metrics_set
- postgresql_11: advanced_metrics_set
- postgresql_12: modern_metrics_set
- postgresql_13: extended_metrics_set
- postgresql_14: latest_metrics_set
- postgresql_15: cutting_edge_metrics_set
- postgresql_16: next_gen_metrics_set
Benefit: Single exporter for heterogeneous PostgreSQL environments
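# Custom metric definition example
custom_queries:
  pg_custom_business_metrics:
    query: |
      SELECT
        schemaname,
        relname AS tablename,
        n_tup_ins AS inserts_total,
        n_tup_upd AS updates_total,
        n_tup_del AS deletes_total
      FROM pg_stat_user_tables
    metrics:
      - schemaname:
          usage: LABEL
          description: "Schema name"
      - tablename:
          usage: LABEL
          description: "Table name"
      - inserts_total:
          usage: COUNTER
          description: "Total number of inserts"
      - updates_total:
          usage: COUNTER
          description: "Total number of updates"
      - deletes_total:
          usage: COUNTER
          description: "Total number of deletes"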
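To make the mapping from query rows to metrics concrete, here is a rough sketch of how one result row could become exposition lines: label columns become labels, and each counter column becomes its own series. This illustrates the idea only. It is not pg_exporter's code, the naming scheme `<query>_<column>` is an assumption, and label-value escaping is omitted for brevity.

```python
def render_metrics(prefix: str, rows: list[dict],
                   label_cols: list[str], value_cols: list[str]) -> list[str]:
    """Turn query rows into Prometheus text-format sample lines."""
    lines = []
    for row in rows:
        # Label values are not escaped here; real code must escape quotes.
        labels = ",".join(f'{col}="{row[col]}"' for col in label_cols)
        for col in value_cols:
            lines.append(f"{prefix}_{col}{{{labels}}} {row[col]}")
    return lines


if __name__ == "__main__":
    rows = [{"schemaname": "public", "tablename": "orders",
             "inserts_total": 10, "updates_total": 2}]
    for line in render_metrics("pg_custom_business_metrics", rows,
                               ["schemaname", "tablename"],
                               ["inserts_total", "updates_total"]):
        print(line)
```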
# RDS monitoring configuration
rds_monitoring:
  connection_string: "postgres://monitor:password@rds.region.rds.amazonaws.com:5432/postgres"
  metrics_subset: rds_safe # RDS-compatible metrics only
  auto_discovery: false # Manual database specification
  query_timeout: 30s # Conservative timeout
  # RDS-specific metrics
  included_databases: [production, staging]
  excluded_schemas: [information_schema, pg_catalog]
Metrics Configuration
# Comprehensive pg_exporter configuration
pg_exporter_config:
  # Connection settings
  data_source_name: "postgres://dbuser_monitor:password@localhost:5432/postgres"
  # Metric collection
  auto_discover_databases: true
  exclude_databases: [template0, template1]
  include_databases: [postgres, business_db]
  # Query customization
  query_path: "/etc/pg_exporter/queries"
  metric_prefix: "pg"
  # Performance tuning
  parallel_scrape: true
  scrape_timeout: 30s
  max_connections: 3
  # Security
  ssl_mode: require
  ssl_cert: "/etc/ssl/pg-client.crt"
  ssl_key: "/etc/ssl/pg-client.key"
Host & Infrastructure Monitoring
Node-Level Observability
Comprehensive host monitoring with node_exporter and system-level metrics:
System Resources
CPU, Memory, Disk, Network - Complete hardware utilization tracking
Process Monitoring
Process trees, file descriptors - Detailed process-level insights
Network Stack
TCP connections, socket stats - Network performance analysis
Storage Analysis
I/O patterns, filesystem metrics - Storage performance optimization
Multi-Service Monitoring
# Complete service monitoring matrix
monitored_services:
  infrastructure:
    - haproxy_exporter:9101 # Load balancer metrics
    - nginx_exporter:9113 # Web server performance
    - etcd_exporter:2379 # Cluster coordination
  databases:
    - pg_exporter:9630 # PostgreSQL primary metrics
    - pgbouncer_exporter:9631 # Connection pool metrics
    - redis_exporter:9121 # Redis cache performance
    - minio_exporter:9000 # Object storage metrics
  system:
    - node_exporter:9100 # Host system metrics
    - process_exporter:9256 # Process-level monitoring
    - blackbox_exporter:9115 # Connectivity probes
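A matrix like this maps directly onto Prometheus file-based service discovery, where each target group is a JSON object with `targets` and `labels`. The sketch below generates such groups for a list of hosts. It uses only a subset of the exporters above, and the label names `category` and `exporter` as well as the host address are assumptions for this example.

```python
import json

# Subset of the matrix above: exporter name -> port, grouped by category.
MATRIX = {
    "infrastructure": {"haproxy_exporter": 9101, "nginx_exporter": 9113},
    "databases": {"redis_exporter": 9121},
    "system": {"node_exporter": 9100, "blackbox_exporter": 9115},
}


def file_sd_targets(hosts: list[str], matrix=MATRIX) -> list[dict]:
    """Build Prometheus file_sd target groups, one per exporter."""
    groups = []
    for category, exporters in matrix.items():
        for exporter, port in exporters.items():
            groups.append({
                "targets": [f"{host}:{port}" for host in hosts],
                "labels": {"category": category, "exporter": exporter},
            })
    return groups


if __name__ == "__main__":
    # Hypothetical host; write the output to a file referenced by file_sd_configs.
    print(json.dumps(file_sd_targets(["10.10.10.10"]), indent=2))
```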
External Database Monitoring
Cloud RDS Integration
Monitor Amazon RDS, Google Cloud SQL, and Azure Database with minimal configuration:
# AWS RDS monitoring setup
aws_rds_monitoring:
  connection_method: iam_auth # IAM database authentication
  endpoint: mydb.region.rds.amazonaws.com
  port: 5432
  # Required RDS setup
  setup_commands: |
    CREATE USER dbuser_monitor WITH PASSWORD 'secure_password';
    GRANT CONNECT ON DATABASE postgres TO dbuser_monitor;
    GRANT pg_monitor TO dbuser_monitor;
  # Monitoring configuration
  metrics_collection:
    - basic_stats: true
    - query_performance: true
    - connection_stats: true
    - replication_stats: false # Not available in RDS
# Google Cloud SQL monitoring
gcp_cloudsql_monitoring:
  connection_method: ssl_cert # SSL certificate authentication
  endpoint: project:region:instance
  # Cloud SQL specific configuration
  cloud_sql_proxy: true
  proxy_port: 5432
  monitoring_scope:
    - performance_insights: true
    - query_insights: true
    - connection_monitoring: true
# Azure Database for PostgreSQL
azure_postgres_monitoring:
  connection_method: active_directory
  endpoint: myserver.postgres.database.azure.com
  # Azure-specific features
  azure_monitor_integration: true
  diagnostic_settings: enabled
  metrics_subset: azure_compatible
# Generic external PostgreSQL
external_postgres:
  connection_string: "postgres://dbuser_monitor:pass@external-host:5432/postgres"
  # Minimal required permissions
  required_grants: |
    GRANT CONNECT ON DATABASE postgres TO dbuser_monitor;
    GRANT pg_monitor TO dbuser_monitor;
  # Safe metric collection
  metrics_mode: read_only
  custom_queries: disabled
  admin_functions: false
Data Analytics & Visualization Platform
Beyond Traditional Monitoring
Pigsty's observability stack doubles as a powerful data analytics platform:
Business Intelligence
Custom dashboards for business metrics and KPI visualization
Application Analytics
User behavior tracking and application performance insights
Operational Intelligence
Trend analysis and capacity planning with predictive capabilities
Data Exploration
Ad-hoc querying and interactive data exploration tools
Advanced Analytics Features
# Advanced Grafana analytics capabilities
analytics_features:
  data_sources:
    - prometheus: time_series_analysis
    - postgres: business_data_queries
    - infinity: rest_api_integration
    - csv: file_based_data_import
  visualization_types:
    - echarts: interactive_charts
    - worldmap: geographic_analysis
    - heatmap: correlation_analysis
    - table: detailed_data_views
  interactive_features:
    - variable_templating: dynamic_filtering
    - drill_down_links: contextual_navigation
    - annotation_support: event_correlation
    - alert_integration: automated_notifications
Custom Business Dashboards
-- Example: Business metrics integration
SELECT
date_trunc('hour', created_at) as hour,
COUNT(*) as orders_per_hour,
SUM(total_amount) as revenue_per_hour,
AVG(total_amount) as avg_order_value
FROM orders
WHERE created_at >= NOW() - INTERVAL '24 hours'
GROUP BY date_trunc('hour', created_at)
ORDER BY hour;
Low-Code Application Development
Grafana as Development Platform
Transform Grafana into a low-code application platform for operational tools:
// Example: Database maintenance form
{
"type": "volkovlabs-form-panel",
"title": "Database Maintenance",
"targets": [
{
"datasource": "PostgreSQL",
"query": "SELECT pg_size_pretty(pg_database_size('$database'))",
"variables": {
"database": "${form.database}"
}
}
],
"form": {
"elements": [
{
"type": "select",
"name": "database",
"options": ["production", "staging", "development"]
},
{
"type": "button",
"name": "vacuum",
"action": "VACUUM ANALYZE ${database}"
}
]
}
}
# Dynamic dashboard generation
dynamic_dashboards:
  template_based: true
  variable_driven: true
  features:
    - auto_refresh: 30s
    - conditional_panels: true
    - user_personalization: true
    - responsive_layout: true
  data_sources:
    - prometheus: infrastructure_metrics
    - postgres: application_data
    - api: external_integrations
# Workflow automation integration
workflow_features:
  alerting_actions:
    - webhook_triggers: true
    - automated_remediation: true
    - escalation_policies: true
  operational_tasks:
    - scheduled_maintenance: true
    - capacity_planning: true
    - performance_tuning: true
// Example: REST API integration
const infinityQuery = {
datasource: "infinity",
type: "json",
url: "https://api.internal.com/metrics",
parser: "json",
columns: [
{ selector: "data.response_time", type: "number" },
{ selector: "data.error_rate", type: "number" },
{ selector: "timestamp", type: "time" }
]
};
Reusable Infrastructure
Modular Deployment
Pigsty's observability infrastructure is designed for reusability and composability:
Standalone Deployment
Independent monitoring - Deploy observability stack without PostgreSQL
Multi-Tenant Support
Isolated environments - Multiple teams sharing single monitoring infrastructure
Hybrid Integration
Existing systems - Integrate with current monitoring solutions
Cloud Agnostic
Any environment - On-premises, cloud, or hybrid deployments
Configuration Templates
# Reusable monitoring templates
monitoring_templates:
  minimal:
    components: [prometheus, grafana]
    resource_usage: low
    use_case: development
  standard:
    components: [prometheus, grafana, alertmanager, loki]
    resource_usage: medium
    use_case: production
  enterprise:
    components: [prometheus, grafana, alertmanager, loki, pushgateway, blackbox]
    resource_usage: high
    use_case: large_scale_production
    ha_enabled: true
    retention: 90d
  cloud_native:
    components: [prometheus_operator, grafana_operator]
    deployment: kubernetes
    scaling: horizontal
    storage: persistent_volumes
Integration Patterns
# Common integration scenarios
integration_patterns:
  existing_prometheus:
    federation: true
    data_source: remote_prometheus
    dashboard_import: true
  existing_grafana:
    datasource_provisioning: true
    dashboard_provisioning: true
    alert_rules_import: true
  cloud_monitoring:
    cloudwatch_integration: true
    stackdriver_integration: true
    azure_monitor_integration: true
  hybrid_deployment:
    on_premise_prometheus: true
    cloud_grafana: true
    secure_tunneling: true
Best Practices
Performance Optimization
Resource Planning: Monitoring infrastructure can consume significant resources. Plan accordingly for metrics retention and query performance.
- Metric Cardinality: Limit high-cardinality labels to prevent storage explosion
- Retention Policies: Balance historical data needs with storage costs
- Query Optimization: Use recording rules for frequently accessed metrics
- Dashboard Performance: Optimize panel queries and time ranges
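Cardinality and retention can be estimated with simple arithmetic before they become a problem. The sketch below multiplies label cardinalities to get a worst-case series count and applies the common sizing rule of retention time × ingested samples per second × bytes per sample. The figure of 2 bytes per sample is a rough upper estimate for compressed Prometheus data, and the example numbers are made up.

```python
def series_count(label_cardinalities: dict[str, int]) -> int:
    """Worst-case series for one metric: product of distinct label values."""
    total = 1
    for distinct_values in label_cardinalities.values():
        total *= distinct_values
    return total


def disk_bytes(series: int, scrape_interval_s: float,
               retention_days: float, bytes_per_sample: float = 2.0) -> float:
    """Estimated TSDB size: retention * samples per second * bytes per sample."""
    samples_per_second = series / scrape_interval_s
    return retention_days * 86400 * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical metric with three labels.
    print(series_count({"instance": 10, "datname": 5, "mode": 4}))
    # Hypothetical fleet: 150k active series, 15s scrapes, 15 days retention.
    print(f"{disk_bytes(150_000, 15, 15) / 1e9:.1f} GB")
```

Adding one more label with 100 distinct values multiplies the series count by 100, which is why high-cardinality labels such as user IDs or raw query text are best kept out of metrics.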
Security Considerations
# Security best practices
security_config:
  authentication:
    - ldap_integration: true
    - oauth_providers: [github, google, okta]
    - api_key_rotation: automated
  authorization:
    - role_based_access: true
    - team_based_folders: true
    - dashboard_permissions: granular
  network_security:
    - tls_encryption: enforced
    - firewall_rules: restrictive
    - reverse_proxy: nginx
Operational Guidelines
- Regular Backups: Backup Grafana configurations and Prometheus data
- Monitoring the Monitors: Set up meta-monitoring for the observability stack
- Capacity Planning: Monitor storage growth and plan upgrades
- Documentation: Maintain runbooks for common operational tasks
Limitations & Considerations
Scalability Boundaries
- Metrics Volume: High-cardinality metrics can impact performance
- Dashboard Complexity: Overly complex dashboards may have slow load times
- Long-term Storage: Consider external storage for long-term metrics retention
- Network Bandwidth: Factor in metrics collection network overhead
Planning Considerations
- Resource Requirements: Monitoring stack requires dedicated resources
- Backup Strategy: Plan for configuration and data backup/restore
- High Availability: Consider HA deployment for critical environments
- Integration Complexity: Plan integration with existing monitoring systems
Pigsty's observability infrastructure transforms monitoring from a reactive afterthought into a proactive operational advantage. With comprehensive metrics, intuitive dashboards, and flexible architecture, you gain deep insights into every aspect of your database infrastructure.