Monitoring Setup

Complete guide for setting up monitoring and observability for the SBM CRM Platform.

Monitoring Stack

Components

  • Prometheus - Metrics collection
  • Grafana - Visualization and dashboards
  • Alertmanager - Alert management
  • ELK Stack - Log aggregation (optional)
  • Sentry - Error tracking

Prometheus Setup

Install Prometheus

# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
sudo mv prometheus-2.45.0.linux-amd64 /opt/prometheus

Configure Prometheus

Edit /opt/prometheus/prometheus.yml:

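global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Application metrics. Point this at the host:port where the API serves
  # /metrics. Note that 9090 is also Prometheus's own default listen port,
  # so the API and Prometheus cannot both use it on the same host.
  - job_name: 'sbmcrm-api'
    static_configs:
      - targets: ['localhost:9090']

  # postgres_exporter (default port 9187)
  - job_name: 'postgresql'
    static_configs:
      - targets: ['localhost:9187']

  # redis_exporter (default port 9121)
  - job_name: 'redis'
    static_configs:
      - targets: ['localhost:9121']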

Start Prometheus

# Create systemd service
sudo nano /etc/systemd/system/prometheus.service

Service file (the prometheus account named in User= must already exist and be able to read /opt/prometheus):

[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml
Restart=always

[Install]
WantedBy=multi-user.target
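
Reload systemd so it picks up the new unit, then start Prometheus and enable it at boot:

sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus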

Application Metrics

Expose Metrics Endpoint

// Express.js example
const express = require('express');
const promClient = require('prom-client');

const app = express();

// Create metrics registry
const register = new promClient.Registry();

// Default metrics (process CPU, memory, event loop lag, and similar)
promClient.collectDefaultMetrics({ register });

// Custom metrics
const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  registers: [register]
});

// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
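
The histogram above is declared but nothing records into it yet. One way to populate it is a small Express middleware that observes the elapsed time when each response finishes. This is a sketch, not the platform's actual implementation: the function name createDurationMiddleware is hypothetical, and it only assumes the histogram exposes prom-client's observe(labels, value) method.

```javascript
// Builds Express middleware that records each request's duration.
// `histogram` is any object with prom-client's Histogram#observe(labels, value).
function createDurationMiddleware(histogram) {
  return function durationMiddleware(req, res, next) {
    const start = process.hrtime.bigint();
    // 'finish' fires once the response has been fully handed off.
    res.on('finish', () => {
      const seconds = Number(process.hrtime.bigint() - start) / 1e9;
      histogram.observe(
        {
          method: req.method,
          // Prefer the matched route pattern (e.g. /users/:id) over the raw
          // URL so the number of distinct label values stays bounded.
          route: req.route ? req.route.path : 'unmatched',
          status: String(res.statusCode)
        },
        seconds
      );
    });
    next();
  };
}
```

Register it before the routes, for example with app.use(createDurationMiddleware(httpRequestDuration)), so every request passes through it.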

Key Metrics to Track

  • Request Rate - Requests per second
  • Response Time - P50, P95, P99 latencies
  • Error Rate - 4xx and 5xx errors
  • Database Connections - Active connections
  • Cache Hit Rate - Redis cache performance
  • Queue Length - Message queue depth
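
To make the percentile terminology concrete, the sketch below computes P50, P95 and P99 from a handful of made-up latencies using the nearest-rank method. Prometheus itself estimates quantiles from histogram buckets rather than raw samples, so this only illustrates what the numbers mean; the sample values are hypothetical.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical request latencies in milliseconds, with two slow outliers.
const latenciesMs = [12, 15, 11, 240, 14, 13, 18, 16, 17, 950];

const latencySummary = {
  p50: percentile(latenciesMs, 50),
  p95: percentile(latenciesMs, 95),
  p99: percentile(latenciesMs, 99)
};
console.log(latencySummary); // { p50: 15, p95: 950, p99: 950 }
```

The median stays at 15 ms while the tail percentiles expose the slow requests, which is why tail latencies are tracked alongside the median.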

Grafana Setup

Install Grafana

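# Ubuntu/Debian
sudo apt-get install -y apt-transport-https software-properties-common wget

# Import the Grafana signing key and add the APT repository
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

sudo apt-get update
sudo apt-get install -y grafana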

# Start Grafana
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

Configure Data Source

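  1. Access Grafana at http://localhost:3000
  2. Log in with the default credentials admin/admin, then change the password when prompted
  3. Add a Prometheus data source
  4. Set the data source URL to http://localhost:9090 and click Save & test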

Create Dashboards

Application Dashboard

Key panels:

  • Request rate (requests/second)
  • Response time (P50, P95, P99)
  • Error rate (%)
  • Active users
  • Database query time
  • Cache hit rate
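
Several of these panels can be driven by the http_request_duration_seconds histogram from the Application Metrics section, since a Prometheus histogram also exposes _count and _bucket series. The helper below is a hypothetical sketch that only assembles PromQL strings to paste into Grafana panels. Active users, database query time and cache hit rate need additional metrics that this guide does not define, so they are left out.

```javascript
// Builds PromQL for the core application panels.
// `window` is the rate() lookback window, e.g. '5m'.
function applicationPanelQueries(window = '5m') {
  const count = 'http_request_duration_seconds_count';
  const bucket = 'http_request_duration_seconds_bucket';
  // histogram_quantile expects per-bucket rates aggregated by the le label.
  const quantile = (q) =>
    `histogram_quantile(${q}, sum by (le) (rate(${bucket}[${window}])))`;
  return {
    requestRate: `sum(rate(${count}[${window}]))`,
    p50: quantile(0.5),
    p95: quantile(0.95),
    p99: quantile(0.99),
    // Share of requests answered with a 5xx status, as a percentage.
    errorRatePercent:
      `100 * sum(rate(${count}{status=~"5.."}[${window}])) / sum(rate(${count}[${window}]))`
  };
}

console.log(applicationPanelQueries());
```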

Business Metrics Dashboard

  • New registrations
  • Points earned/redeemed
  • Campaign participation
  • Revenue metrics
  • Customer tier distribution
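
Business panels need business metrics, which the application has to export itself. A minimal sketch using prom-client counters follows; the metric names (sbmcrm_registrations_total, sbmcrm_points_total) and the function createBusinessMetrics are hypothetical and should be adapted to the platform's real events. The prom-client module and registry are passed in as arguments so the example does not depend on how the application wires them up.

```javascript
// Registers business counters.
// `promClient` is the prom-client module, `register` a promClient.Registry.
// Metric names are illustrative assumptions, not part of the platform.
function createBusinessMetrics(promClient, register) {
  return {
    registrations: new promClient.Counter({
      name: 'sbmcrm_registrations_total',
      help: 'Total number of new customer registrations',
      registers: [register]
    }),
    points: new promClient.Counter({
      name: 'sbmcrm_points_total',
      help: 'Total loyalty points, by direction (earned or redeemed)',
      labelNames: ['direction'],
      registers: [register]
    })
  };
}
```

After const metrics = createBusinessMetrics(promClient, register), the application would call metrics.registrations.inc() on each sign-up and metrics.points.inc({ direction: 'earned' }, 150) when points are awarded. In Grafana, a query such as increase(sbmcrm_registrations_total[1d]) then gives new registrations per day.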

Alerting

Alertmanager Configuration

Edit /opt/alertmanager/alertmanager.yml:

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://localhost:5001/'
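
The webhook receiver at http://localhost:5001/ has to be provided separately. Alertmanager POSTs a JSON document whose alerts array carries each alert's status, labels and annotations. The sketch below, built only on Node's http module, shows one possible receiver that logs a line per alert; the function names summarizeAlerts and createReceiver are hypothetical.

```javascript
const http = require('http');

// Turns an Alertmanager webhook payload into one readable line per alert.
function summarizeAlerts(payload) {
  return (payload.alerts || []).map((alert) => {
    const name = alert.labels?.alertname ?? 'unknown';
    const severity = alert.labels?.severity ?? 'none';
    const summary = alert.annotations?.summary ?? '';
    return `[${alert.status}] ${name} (${severity}): ${summary}`;
  });
}

// Minimal receiver. The server is created but not started here.
function createReceiver(log = console.log) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      res.statusCode = 405;
      res.end();
      return;
    }
    let body = '';
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      try {
        summarizeAlerts(JSON.parse(body)).forEach((line) => log(line));
        res.statusCode = 200;
        res.end();
      } catch (err) {
        // Malformed JSON: reject so the sender sees the failure.
        res.statusCode = 400;
        res.end('invalid JSON');
      }
    });
  });
}
```

Calling createReceiver().listen(5001) starts it on the port used in the configuration above. A real deployment would forward the lines to chat, email or a paging service instead of logging them.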

Alert Rules

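Create /opt/prometheus/alerts.yml and list it under rule_files in prometheus.yml so that Prometheus loads it. Prometheus also needs an alerting.alertmanagers entry pointing at Alertmanager (default port 9093) to deliver firing alerts: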

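groups:
  - name: sbmcrm_alerts
    rules:
      # Fires when more than 5% of requests over the last 5 minutes returned 5xx.
      # Assumes the API exports an http_requests_total counter with a status label.
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"

      # histogram_quantile needs per-bucket rates aggregated by the le label.
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"

      # Per-database backend count from postgres_exporter.
      # Tune the threshold against PostgreSQL's max_connections.
      - alert: DatabaseConnectionHigh
        expr: pg_stat_database_numbackends > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High database connections"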

Logging

Application Logging

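const winston = require('winston');

const logger = winston.createLogger({
  level: 'info',
  // Timestamped JSON lines are easier to search and aggregate later
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.json()
  ),
  transports: [
    // Errors only
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    // Everything at 'info' level and above
    new winston.transports.File({ filename: 'combined.log' })
  ]
});

// Use logger
logger.info('User registered', { userId: '12345' });

try {
  // Your database call
} catch (err) {
  logger.error('Database connection failed', { error: err.message });
}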
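
Beyond ad hoc log calls, it is often useful to emit one structured entry per HTTP request. The middleware below is a sketch under the assumption that the logger exposes winston-style info, warn and error methods taking a message and a metadata object; the name requestLogger is hypothetical.

```javascript
// Express middleware that writes one structured log entry per request.
// `logger` is any object with winston-style info/warn/error(message, meta).
function requestLogger(logger) {
  return (req, res, next) => {
    const startedAt = Date.now();
    res.on('finish', () => {
      const meta = {
        method: req.method,
        path: req.originalUrl,
        status: res.statusCode,
        durationMs: Date.now() - startedAt
      };
      // Server errors are logged as errors, client errors as warnings.
      let level = 'info';
      if (res.statusCode >= 500) level = 'error';
      else if (res.statusCode >= 400) level = 'warn';
      logger[level]('HTTP request', meta);
    });
    next();
  };
}
```

With the logger defined above, app.use(requestLogger(logger)) placed before the routes logs every request, and 5xx responses also land in error.log because of that transport's level filter.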

Log Aggregation (ELK Stack)

Optional: Set up ELK stack for centralized logging:

  1. Install Elasticsearch
  2. Install Logstash
  3. Install Kibana
  4. Configure log shipping

Error Tracking (Sentry)

Install Sentry SDK

npm install @sentry/node

Configure Sentry

const Sentry = require('@sentry/node');

Sentry.init({
  dsn: 'https://your-sentry-dsn@sentry.io/project-id',
  environment: 'production',
  tracesSampleRate: 0.1 // sample 10% of transactions for tracing
});

// Capture exceptions
try {
  // Your code
} catch (error) {
  Sentry.captureException(error);
}
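
Wrapping every handler in try/catch does not scale, so errors are usually reported from one central place. The sketch below shows a generic Express error-handling middleware that relies only on Sentry.captureException, as used above. The Sentry SDK also ships its own Express integration whose API differs between major versions, so treat this as an illustration rather than the recommended setup; createErrorHandler is a hypothetical name.

```javascript
// Express error-handling middleware (note the four arguments) that reports
// the error and answers with a generic 500.
// `sentry` is the initialized @sentry/node module.
function createErrorHandler(sentry) {
  return (err, req, res, next) => {
    sentry.captureException(err);
    if (res.headersSent) {
      // The response has already started; defer to Express's default handler.
      return next(err);
    }
    res.status(500).json({ error: 'Internal Server Error' });
  };
}
```

Register it after all routes, for example app.use(createErrorHandler(Sentry)), so that errors passed to next(err) reach it.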

Health Checks

Application Health Endpoint

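// checkDatabase, checkRedis and checkExternalApis are your own helpers;
// each resolves to an object such as { status: 'ok' }.
app.get('/health', async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    externalApis: await checkExternalApis()
  };

  const isHealthy = Object.values(checks).every(check => check.status === 'ok');

  // Report the overall status consistently with the HTTP status code
  res.status(isHealthy ? 200 : 503).json({
    status: isHealthy ? 'healthy' : 'unhealthy',
    timestamp: new Date().toISOString(),
    checks
  });
});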
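
The check helpers called by the endpoint are not defined in this guide. One possible shape, sketched here under that assumption, is a wrapper that runs any async probe with a timeout and converts failures into a status object instead of throwing, so a hung dependency cannot hang the health endpoint. The name runCheck and the 2000 ms default are illustrative.

```javascript
// Runs one dependency probe and never rejects: failures and timeouts are
// reported as { status: 'error' } so the health endpoint can still respond.
async function runCheck(probe, timeoutMs = 2000) {
  const startedAt = Date.now();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  try {
    // Whichever settles first wins: the probe or the timeout.
    await Promise.race([probe(), timeout]);
    return { status: 'ok', latencyMs: Date.now() - startedAt };
  } catch (err) {
    return { status: 'error', error: err.message, latencyMs: Date.now() - startedAt };
  } finally {
    clearTimeout(timer);
  }
}
```

A database check could then be written as const checkDatabase = () => runCheck(() => pool.query('SELECT 1')), where pool stands for whatever database client the application already uses.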

Monitoring Health Checks

Set up uptime monitoring:

  • Pingdom
  • UptimeRobot
  • Custom health check endpoint monitoring
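
For the custom option, a small external script can poll the health endpoint and treat anything other than HTTP 200 as down. The following is a minimal sketch, assuming Node.js 18 or later for the global fetch and AbortSignal.timeout; probeHealth is a hypothetical name, and scheduling and notification are left out.

```javascript
// Probes a health URL once and reports whether it answered with HTTP 200.
// `fetchImpl` defaults to the global fetch available in Node.js 18+.
async function probeHealth(url, { timeoutMs = 5000, fetchImpl = fetch } = {}) {
  const startedAt = Date.now();
  try {
    const response = await fetchImpl(url, {
      signal: AbortSignal.timeout(timeoutMs)
    });
    return {
      up: response.status === 200,
      status: response.status,
      latencyMs: Date.now() - startedAt
    };
  } catch (err) {
    // Network errors and timeouts both count as down.
    return {
      up: false,
      status: null,
      error: err.message,
      latencyMs: Date.now() - startedAt
    };
  }
}
```

Run it from a different host than the application, for example on a timer calling probeHealth('http://your-host/health'), so that a full server outage is still detected.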

Performance Monitoring

APM Tools

Consider using Application Performance Monitoring tools:

  • New Relic
  • Datadog
  • AppDynamics

Database Monitoring

Monitor PostgreSQL:

  • Query performance
  • Connection pool usage
  • Slow queries
  • Table sizes
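
Connection pool usage can also be read from inside the application. The sketch below assumes the API uses the node-postgres (pg) library, whose Pool exposes totalCount, idleCount and waitingCount. This guide does not name the database client, so treat that as an assumption; poolUsage is a hypothetical helper.

```javascript
// Summarizes a node-postgres Pool.
// totalCount, idleCount and waitingCount are properties of pg.Pool;
// `max` is the pool size the application configured.
function poolUsage(pool, max) {
  const active = pool.totalCount - pool.idleCount;
  return {
    active,                        // clients currently checked out
    idle: pool.idleCount,          // clients open but unused
    waiting: pool.waitingCount,    // callers queued for a client
    utilization: max > 0 ? active / max : 0
  };
}
```

A waiting count that stays above zero suggests the pool is too small or queries are holding connections too long. The values can be exported through prom-client gauges so they appear next to the postgres_exporter metrics.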

Cache Monitoring

Monitor Redis:

  • Memory usage
  • Hit rate
  • Eviction rate
  • Connection count
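
All four values are available from the Redis INFO command, which returns key:value lines grouped under section headers starting with #. The sketch below parses that text and derives the hit rate from keyspace_hits and keyspace_misses; the function names are hypothetical, and obtaining the INFO text from your Redis client is left to the application.

```javascript
// Parses the text returned by the Redis INFO command into a flat object.
function parseRedisInfo(text) {
  const info = {};
  for (const line of text.split(/\r?\n/)) {
    if (line === '' || line.startsWith('#')) continue; // skip section headers
    const separator = line.indexOf(':');
    if (separator === -1) continue;
    info[line.slice(0, separator)] = line.slice(separator + 1);
  }
  return info;
}

// Extracts the cache figures listed above from parsed INFO output.
function cacheStats(info) {
  const hits = Number(info.keyspace_hits);
  const misses = Number(info.keyspace_misses);
  const lookups = hits + misses;
  return {
    hitRate: lookups > 0 ? hits / lookups : null, // null until a lookup occurs
    evictedKeys: Number(info.evicted_keys),
    usedMemoryBytes: Number(info.used_memory),
    connectedClients: Number(info.connected_clients)
  };
}
```

The hit, miss and eviction figures are cumulative since the server started, so a rate over time comes from the difference between two samples. The redis_exporter scraped on port 9121 exposes the same counters to Prometheus.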

Best Practices

  1. Set Up Alerts Early - Don't wait for issues
  2. Monitor Business Metrics - Not just technical metrics
  3. Regular Review - Review dashboards weekly
  4. Document Runbooks - Clear procedures for alerts
  5. Test Alerts - Ensure alerts work correctly
  6. Retention Policy - Set metric and log retention to match your storage budget and audit needs (Prometheus keeps 15 days by default)
  7. Cost Monitoring - Monitor cloud costs