Monitoring Setup

Complete guide for setting up monitoring and observability for the SBM CRM Platform.

Monitoring Stack

Components

Prometheus - Metrics collection
Grafana - Visualization and dashboards
Alertmanager - Alert management
ELK Stack - Log aggregation (optional)
Sentry - Error tracking

Prometheus Setup

Install Prometheus

# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
sudo mv prometheus-2.45.0.linux-amd64 /opt/prometheus

Configure Prometheus

Edit /opt/prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'sbmcrm-api'
    static_configs:
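      # Note: 9090 is also Prometheus's own default port; set this to the host:port where the API serves /metrics
      - targets: ['localhost:9090']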
  
  - job_name: 'postgresql'
    static_configs:
      - targets: ['localhost:9187']
  
  - job_name: 'redis'
    static_configs:
      - targets: ['localhost:9121']

Start Prometheus

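# Create the service user referenced in the unit file, then the systemd service
sudo useradd --system --no-create-home --shell /usr/sbin/nologin prometheus
sudo chown -R prometheus:prometheus /opt/prometheus
sudo nano /etc/systemd/system/prometheus.service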

Service file:

[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/opt/prometheus/prometheus \
  --config.file=/opt/prometheus/prometheus.yml \
  --storage.tsdb.path=/opt/prometheus/data
Restart=always

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus

Application Metrics

Expose Metrics Endpoint

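// Express.js example
const express = require('express');
const promClient = require('prom-client');

const app = express();
const register = new promClient.Registry();

// Default metrics (process CPU, memory, event loop lag, ...)
promClient.collectDefaultMetrics({ register });

// Custom metrics
const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  registers: [register]
});
const httpRequestsTotal = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status'],
  registers: [register]
});

// Record every request once its response is sent (register before the routes)
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    // req.route is only set when a route matched; fall back to the raw path
    const labels = { method: req.method, route: req.route ? req.route.path : req.path, status: res.statusCode };
    end(labels);
    httpRequestsTotal.inc(labels);
  });
  next();
});

// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});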
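
One caveat with the `route` label: if it is ever filled with raw request paths, every distinct customer or order ID creates a new time series. A minimal sketch of a normalizer, assuming numeric and UUID path segments are the only dynamic parts of a URL. The name `normalizeRoute` is hypothetical and not part of prom-client.

```javascript
// Hypothetical helper: collapse dynamic path segments so the `route`
// label stays low-cardinality. Not part of prom-client.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const NUMERIC_RE = /^\d+$/;

function normalizeRoute(path) {
  // Drop the query string, then replace ID-like segments with a placeholder.
  const pathname = path.split('?')[0];
  return pathname
    .split('/')
    .map((segment) =>
      NUMERIC_RE.test(segment) || UUID_RE.test(segment) ? ':id' : segment
    )
    .join('/');
}
```

With this in place, a request for `/customers/12345/points?limit=10` would be recorded under the label `/customers/:id/points`.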

Key Metrics to Track

Request Rate - Requests per second
Response Time - P50, P95, P99 latencies
Error Rate - 4xx and 5xx errors
Database Connections - Active connections
Cache Hit Rate - Redis cache performance
Queue Length - Message queue depth
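
To make the latency and ratio metrics above concrete, here is a small sketch of how P50/P95/P99 and a hit or error rate are derived from raw numbers. It uses the nearest-rank percentile definition on an in-memory sample. Prometheus instead estimates quantiles from histogram buckets, so treat this as an illustration of the concept only. The function names and sample values are made up.

```javascript
// Nearest-rank percentile over an array of latency samples (seconds).
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Ratio helper used for both error rate and cache hit rate.
function ratio(part, total) {
  return total === 0 ? 0 : part / total;
}

// Made-up sample data for illustration.
const latencies = [0.12, 0.08, 0.25, 0.31, 0.09, 1.4, 0.11, 0.1, 0.2, 0.15];
const summary = {
  p50: percentile(latencies, 50),
  p95: percentile(latencies, 95),
  p99: percentile(latencies, 99),
  errorRate: ratio(3, 200),      // 3 failed requests out of 200
  cacheHitRate: ratio(900, 1000) // 900 hits out of 1000 lookups
};
console.log(summary);
```

Note how a single slow request (1.4 s) dominates P95 and P99 while leaving P50 untouched, which is why the tail percentiles are tracked separately from the median.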

Grafana Setup

Install Grafana

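# Ubuntu/Debian: add the Grafana APT repository and its signing key
sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana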

# Start Grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

Configure Data Source

Access Grafana at http://localhost:3000
Log in with the default credentials admin/admin and set a new password when prompted
Add a Prometheus data source
Set the data source URL to http://localhost:9090

Create Dashboards

Application Dashboard

Key panels:

Request rate (requests/second)
Response time (P50, P95, P99)
Error rate (%)
Active users
Database query time
Cache hit rate
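
The first three panels might be backed by PromQL queries along these lines. This is a sketch, assuming the application exports the `http_request_duration_seconds` histogram from the metrics section and an `http_requests_total` counter with a `status` label (the counter name comes from the alert rules in this guide). The `panelQueries` object and `latencyQuantile` helper are illustrative names. The active users, database, and cache panels depend on metrics this guide does not define, so they are left out.

```javascript
// Build a histogram_quantile query over a Prometheus histogram.
function latencyQuantile(q, metric = 'http_request_duration_seconds', window = '5m') {
  return `histogram_quantile(${q}, sum by (le) (rate(${metric}_bucket[${window}])))`;
}

// PromQL expressions keyed by panel; paste each into a Grafana panel query.
const panelQueries = {
  requestRate: 'sum(rate(http_requests_total[5m]))',
  p50: latencyQuantile(0.5),
  p95: latencyQuantile(0.95),
  p99: latencyQuantile(0.99),
  errorRatePercent:
    '100 * sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))'
};

console.log(panelQueries);
```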

Business Metrics Dashboard

New registrations
Points earned/redeemed
Campaign participation
Revenue metrics
Customer tier distribution

Alerting

Alertmanager Configuration

Edit /opt/alertmanager/alertmanager.yml:

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'web.hook'
receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://localhost:5001/'
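
The `web.hook` receiver posts JSON to `http://localhost:5001/`, so something has to listen there. Below is a minimal sketch of such a listener using only Node's built-in `http` module. The field names follow Alertmanager's webhook payload (`alerts[].status`, `alerts[].labels`, `alerts[].annotations`), while `summarizeAlerts` and `createWebhookServer` are hypothetical names. A real handler would forward the summary to chat, email, or a paging service rather than log it.

```javascript
const http = require('http');

// Turn an Alertmanager webhook payload into one line per alert.
function summarizeAlerts(payload) {
  return (payload.alerts || []).map((alert) => {
    const name = alert.labels?.alertname ?? 'unknown';
    const severity = alert.labels?.severity ?? 'none';
    const summary = alert.annotations?.summary ?? '';
    return `[${alert.status}] ${name} (${severity}): ${summary}`;
  });
}

// Create (but do not start) an HTTP server that accepts webhook POSTs.
function createWebhookServer(onAlerts = (lines) => lines.forEach((l) => console.log(l))) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      res.writeHead(405).end();
      return;
    }
    let body = '';
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      try {
        onAlerts(summarizeAlerts(JSON.parse(body)));
        res.writeHead(200).end('ok');
      } catch (err) {
        // Malformed JSON: reject so the failure is visible to the sender.
        res.writeHead(400).end('invalid payload');
      }
    });
  });
}
```

To match the configuration above, start it with `createWebhookServer().listen(5001)`.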

Alert Rules

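Create /opt/prometheus/alerts.yml with the rules below. Prometheus only loads the file if prometheus.yml lists it under rule_files, and only forwards firing alerts if alerting.alertmanagers points at Alertmanager (port 9093 by default).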

groups:
- name: sbmcrm_alerts
  rules:
  - alert: HighErrorRate
    expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High error rate detected"
  
  - alert: HighResponseTime
    expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High response time detected"
  
  - alert: DatabaseConnectionHigh
    expr: pg_stat_database_numbackends > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High database connections"

Logging

Application Logging

const winston = require('winston');

const logger = winston.createLogger({
  level: 'info',
  // Timestamped JSON lines are easy to ship to a log aggregator
  format: winston.format.combine(winston.format.timestamp(), winston.format.json()),
  transports: [
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    new winston.transports.File({ filename: 'combined.log' })
  ]
});

// Use logger
logger.info('User registered', { userId: '12345' });

try {
  // Database call that may throw
} catch (err) {
  logger.error('Database connection failed', { error: err.message });
}

Log Aggregation (ELK Stack)

Optional: Set up ELK stack for centralized logging:

Install Elasticsearch
Install Logstash
Install Kibana
Configure log shipping

Error Tracking (Sentry)

Install Sentry SDK

npm install @sentry/node

Configure Sentry

const Sentry = require('@sentry/node');

Sentry.init({
  dsn: 'https://your-sentry-dsn@sentry.io/project-id',
  environment: 'production',
  tracesSampleRate: 0.1
});

// Capture exceptions
try {
  // Your code
} catch (error) {
  Sentry.captureException(error);
}

Health Checks

Application Health Endpoint

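// checkDatabase, checkRedis and checkExternalApis are application-specific
// helpers; each is expected to resolve to an object such as { status: 'ok' }.
app.get('/health', async (req, res) => {
  // Run the checks in parallel so one slow dependency does not delay the rest
  const [database, redis, externalApis] = await Promise.all([
    checkDatabase(),
    checkRedis(),
    checkExternalApis()
  ]);
  const checks = { database, redis, externalApis };

  const isHealthy = Object.values(checks).every(check => check.status === 'ok');
  res.status(isHealthy ? 200 : 503).json({
    // Report the real state instead of a hard-coded 'healthy'
    status: isHealthy ? 'healthy' : 'unhealthy',
    timestamp: new Date().toISOString(),
    checks
  });
});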
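
The `checkDatabase`, `checkRedis`, and `checkExternalApis` helpers are not defined in this guide. One way to write them is a shared wrapper that runs a probe with a timeout and always resolves to `{ status: 'ok' }` or `{ status: 'error' }`, so a slow or failing dependency cannot hang or crash the health endpoint. A sketch, where the name `runCheck` and the 2-second default are assumptions:

```javascript
// Run an async probe with a timeout. Always resolves, never rejects.
async function runCheck(probe, timeoutMs = 2000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs}ms`)),
      timeoutMs
    );
  });
  const startedAt = Date.now();
  try {
    // Whichever settles first wins: the probe itself or the timeout.
    await Promise.race([probe(), timeout]);
    return { status: 'ok', latencyMs: Date.now() - startedAt };
  } catch (err) {
    return { status: 'error', error: err.message };
  } finally {
    // Clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}
```

A database check could then look like `const checkDatabase = () => runCheck(() => pool.query('SELECT 1'));`, where `pool` stands for the application's own connection pool.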

Monitoring Health Checks

Set up uptime monitoring:

Pingdom
UptimeRobot
Custom health check endpoint monitoring
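
For the custom option, a small script can poll the `/health` endpoint and treat anything other than HTTP 200 as down. Below is a minimal sketch using Node's built-in `http` module. The name `probeHealth` and the 5-second timeout are illustrative. A real monitor would run on a schedule from a separate host and send a notification on failure.

```javascript
const http = require('http');

// Probe a health URL. Resolves to { up, ... } and never rejects.
function probeHealth(url, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const startedAt = Date.now();
    // agent: false opens a fresh connection for every probe.
    const req = http.get(url, { timeout: timeoutMs, agent: false }, (res) => {
      res.resume(); // Discard the body; only the status code matters here.
      res.on('end', () => {
        resolve({
          up: res.statusCode === 200,
          statusCode: res.statusCode,
          latencyMs: Date.now() - startedAt
        });
      });
    });
    // The timeout event does not abort the request by itself.
    req.on('timeout', () => req.destroy(new Error('timeout')));
    req.on('error', (err) => resolve({ up: false, error: err.message }));
  });
}
```

Usage: `probeHealth('http://localhost:3000/health').then(console.log)`, with the host and port replaced by wherever the application actually listens.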

Performance Monitoring

APM Tools

Consider using Application Performance Monitoring tools:

New Relic
Datadog
AppDynamics

Database Monitoring

Monitor PostgreSQL:

Query performance
Connection pool usage
Slow queries
Table sizes

Cache Monitoring

Monitor Redis:

Memory usage
Hit rate
Eviction rate
Connection count
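
Redis reports these figures through the `INFO` command. The sketch below parses `INFO` output (plain `key:value` lines grouped under `#` section headers) and derives the hit rate from `keyspace_hits` and `keyspace_misses`. The sample text is made up for illustration, and `parseRedisInfo` and `cacheStats` are hypothetical helper names. In practice, the Redis exporter scraped on port 9121 exposes the same counters to Prometheus.

```javascript
// Parse the text returned by the Redis INFO command into a flat object.
function parseRedisInfo(text) {
  const info = {};
  for (const line of text.split(/\r?\n/)) {
    if (line === '' || line.startsWith('#')) continue; // skip section headers
    const idx = line.indexOf(':');
    if (idx > 0) info[line.slice(0, idx)] = line.slice(idx + 1);
  }
  return info;
}

// Pick out the four figures listed above.
function cacheStats(info) {
  const hits = Number(info.keyspace_hits);
  const misses = Number(info.keyspace_misses);
  const lookups = hits + misses;
  return {
    usedMemoryBytes: Number(info.used_memory),
    hitRate: lookups === 0 ? 0 : hits / lookups,
    evictedKeys: Number(info.evicted_keys),
    connectedClients: Number(info.connected_clients)
  };
}

// Made-up sample, shaped like real INFO output.
const sample = [
  '# Clients', 'connected_clients:12',
  '# Memory', 'used_memory:1048576',
  '# Stats', 'keyspace_hits:900', 'keyspace_misses:100', 'evicted_keys:3'
].join('\r\n');

console.log(cacheStats(parseRedisInfo(sample)));
```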

Best Practices

Set Up Alerts Early - Don't wait for issues
Monitor Business Metrics - Not just technical metrics
Regular Review - Review dashboards weekly
Document Runbooks - Clear procedures for alerts
Test Alerts - Ensure alerts work correctly
Retention Policy - Set retention for metrics and logs deliberately (Prometheus keeps 15 days of data by default)
Cost Monitoring - Monitor cloud costs
