Monitoring Setup
Complete guide for setting up monitoring and observability for the SBM CRM Platform.
Monitoring Stack
Components
- Prometheus - Metrics collection
- Grafana - Visualization and dashboards
- Alertmanager - Alert management
- ELK Stack - Log aggregation (optional)
- Sentry - Error tracking
Prometheus Setup
Install Prometheus
# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
sudo mv prometheus-2.45.0.linux-amd64 /opt/prometheus
Configure Prometheus
Edit /opt/prometheus/prometheus.yml:
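global:
  scrape_interval: 15s
  evaluation_interval: 15s
# Load the alert rules from the Alert Rules section
rule_files:
  - /opt/prometheus/alerts.yml
# Send firing alerts to Alertmanager (9093 is its default port)
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
scrape_configs:
  # 9090 is also Prometheus's own default port; point this at the port
  # where the API actually serves /metrics
  - job_name: 'sbmcrm-api'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'postgresql'   # postgres_exporter default port
    static_configs:
      - targets: ['localhost:9187']
  - job_name: 'redis'        # redis_exporter default port
    static_configs:
      - targets: ['localhost:9121']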
Start Prometheus
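# Create a dedicated system user that owns the install directory
sudo useradd --system --no-create-home --shell /bin/false prometheus
sudo chown -R prometheus:prometheus /opt/prometheus
# Create systemd service
sudo nano /etc/systemd/system/prometheus.service
Service file:
[Unit]
Description=Prometheus
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data
Restart=always
[Install]
WantedBy=multi-user.target
Reload systemd, then start Prometheus and enable it at boot:
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus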
Application Metrics
Expose Metrics Endpoint
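// Express.js example
const express = require('express');
const promClient = require('prom-client');

const app = express();

// Create metrics registry
const register = new promClient.Registry();
// Default metrics (CPU, memory, event loop lag, ...)
promClient.collectDefaultMetrics({ register });

// Custom metrics
const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status'],
  registers: [register]
});

// Record the duration of every request once the response has been sent
app.use((req, res, next) => {
  const end = httpRequestDuration.startTimer();
  res.on('finish', () => {
    // Use the matched route pattern to keep label cardinality low
    const route = req.route ? req.route.path : 'unmatched';
    end({ method: req.method, route, status: res.statusCode });
  });
  next();
});

// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});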
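The /metrics endpoint returns plain text in the Prometheus exposition format, so it can be useful to confirm that the expected metric names and labels are present before pointing Prometheus at it. The following is a minimal sketch of a parser for the common cases only; `parseMetrics` and the sample text are illustrative assumptions, not part of prom-client, and the parser ignores escaped quotes in label values.

```javascript
// Parse Prometheus text exposition output into { name, labels, value } samples.
// Covers comment lines and optional label sets; escaped quotes inside label
// values and special values such as +Inf are not handled.
function parseMetrics(text) {
  const samples = [];
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue; // skip HELP/TYPE comments
    const match = line.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/);
    if (!match) continue;
    const [, name, labelText, valueText] = match;
    const labels = {};
    if (labelText) {
      for (const pair of labelText.matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g)) {
        labels[pair[1]] = pair[2];
      }
    }
    samples.push({ name, labels, value: Number(valueText) });
  }
  return samples;
}

// Hypothetical excerpt of what GET /metrics might return
const sample = [
  '# HELP http_request_duration_seconds Duration of HTTP requests in seconds',
  '# TYPE http_request_duration_seconds histogram',
  'http_request_duration_seconds_bucket{le="0.1",method="GET",route="/users",status="200"} 3',
  'http_request_duration_seconds_count{method="GET",route="/users",status="200"} 4',
  'process_cpu_seconds_total 0.42'
].join('\n');

const parsed = parseMetrics(sample);
console.log(parsed.map(s => s.name));
```

In practice the text would come from an HTTP GET against the application's /metrics URL rather than from a hard-coded sample.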
Key Metrics to Track
- Request Rate - Requests per second
- Response Time - P50, P95, P99 latencies
- Error Rate - 4xx and 5xx errors
- Database Connections - Active connections
- Cache Hit Rate - Redis cache performance
- Queue Length - Message queue depth
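To make the latency and error-rate items concrete, here is a small sketch of how P50/P95/P99 and the 4xx/5xx shares can be computed from raw samples. It uses the nearest-rank percentile definition, which is one of several conventions; the function names and sample data are illustrative assumptions. Prometheus itself estimates percentiles from histogram buckets rather than raw samples.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Fraction of responses that were client errors (4xx) and server errors (5xx).
function errorRates(statusCodes) {
  const total = statusCodes.length;
  if (total === 0) return { clientErrors: 0, serverErrors: 0 };
  const clientErrors = statusCodes.filter(s => s >= 400 && s < 500).length;
  const serverErrors = statusCodes.filter(s => s >= 500 && s < 600).length;
  return { clientErrors: clientErrors / total, serverErrors: serverErrors / total };
}

// Made-up samples for ten requests
const latenciesMs = [12, 15, 11, 240, 18, 14, 13, 16, 950, 17];
const statuses = [200, 200, 404, 500, 200, 200, 200, 201, 503, 200];

console.log('P50:', percentile(latenciesMs, 50));
console.log('P95:', percentile(latenciesMs, 95));
console.log('Error rates:', errorRates(statuses));
```

The gap between P50 and P95 in this sample shows why averages alone hide slow requests: two outliers out of ten dominate the tail.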
Grafana Setup
Install Grafana
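# Ubuntu/Debian
sudo apt-get install -y apt-transport-https software-properties-common wget
# Add the Grafana signing key and APT repository
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana
# Start Grafana
sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl enable grafana-server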
Configure Data Source
- Access Grafana at http://localhost:3000
- Log in with admin/admin (the default credentials; change the password at first login)
- Add a Prometheus data source
- URL: http://localhost:9090
Create Dashboards
Application Dashboard
Key panels:
- Request rate (requests/second)
- Response time (P50, P95, P99)
- Error rate (%)
- Active users
- Database query time
- Cache hit rate
Business Metrics Dashboard
- New registrations
- Points earned/redeemed
- Campaign participation
- Revenue metrics
- Customer tier distribution
Alerting
Alertmanager Configuration
Edit /opt/alertmanager/alertmanager.yml:
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 12h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://localhost:5001/'
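The configuration above posts alerts to http://localhost:5001/, so something has to listen there. Below is a minimal sketch of a receiver built on Node's http module. It assumes the standard Alertmanager webhook payload, a JSON object with an `alerts` array whose entries carry `status`, `labels`, and `annotations`; `summarizeAlerts` and `createAlertServer` are hypothetical names, and a real receiver would forward the alerts to chat, email, or a paging service instead of logging them.

```javascript
const http = require('http');

// Turn an Alertmanager webhook payload into one readable line per alert.
function summarizeAlerts(payload) {
  return (payload.alerts || []).map(alert => {
    const labels = alert.labels || {};
    const annotations = alert.annotations || {};
    const name = labels.alertname || 'unknown';
    const severity = labels.severity || 'none';
    return `[${alert.status}] ${name} (${severity}): ${annotations.summary || ''}`;
  });
}

// Build an HTTP server that accepts POSTed alert payloads and logs them.
function createAlertServer() {
  return http.createServer((req, res) => {
    if (req.method !== 'POST') {
      res.statusCode = 405;
      res.end();
      return;
    }
    let body = '';
    req.on('data', chunk => { body += chunk; });
    req.on('end', () => {
      try {
        summarizeAlerts(JSON.parse(body)).forEach(line => console.log(line));
        res.statusCode = 200;
      } catch (err) {
        res.statusCode = 400; // body was not valid JSON
      }
      res.end();
    });
  });
}

// To run the receiver on the port used in alertmanager.yml:
// createAlertServer().listen(5001);
```

Keeping the formatting logic in a separate function makes it easy to unit-test without starting a server.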
Alert Rules
Create /opt/prometheus/alerts.yml:
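groups:
  - name: sbmcrm_alerts
    rules:
      - alert: HighErrorRate
        # Share of requests answered with 5xx over the last 5 minutes (5% threshold).
        # Uses the _count series of the http_request_duration_seconds histogram;
        # substitute http_requests_total if the application exports that counter.
        expr: sum(rate(http_request_duration_seconds_count{status=~"5.."}[5m])) / sum(rate(http_request_duration_seconds_count[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
      - alert: HighResponseTime
        # histogram_quantile needs the per-bucket rates aggregated by the le label
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"
      - alert: DatabaseConnectionHigh
        expr: pg_stat_database_numbackends > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High database connections"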
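The HighResponseTime rule relies on histogram_quantile, which estimates a quantile from cumulative bucket counters (the `_bucket` series with an `le` label) instead of from raw request timings. The sketch below approximates that estimate with linear interpolation inside the bucket that contains the target rank. It is a simplified illustration with made-up bucket counts, not the Prometheus implementation, and `histogramQuantile` is a hypothetical name.

```javascript
// Estimate quantile q (0 < q <= 1) from cumulative histogram buckets.
// buckets: [{ le: upperBound, count: cumulativeCount }], including le = Infinity.
function histogramQuantile(q, buckets) {
  if (buckets.length < 2) return NaN;
  const sorted = [...buckets].sort((a, b) => a.le - b.le);
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  // First bucket whose cumulative count reaches the target rank
  const index = sorted.findIndex(b => b.count >= rank);
  if (index === sorted.length - 1) {
    // The quantile falls into the +Inf bucket: report the highest finite bound
    return sorted[sorted.length - 2].le;
  }
  const lowerBound = index === 0 ? 0 : sorted[index - 1].le;
  const countBelow = index === 0 ? 0 : sorted[index - 1].count;
  const inBucket = sorted[index].count - countBelow;
  if (inBucket === 0) return lowerBound;
  // Assume observations are spread evenly inside the bucket
  return lowerBound + (sorted[index].le - lowerBound) * ((rank - countBelow) / inBucket);
}

// Made-up cumulative counts for 100 requests (bounds in seconds)
const exampleBuckets = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 80 },
  { le: 1, count: 90 },
  { le: 2.5, count: 99 },
  { le: Infinity, count: 100 }
];

const p95 = histogramQuantile(0.95, exampleBuckets);
console.log('Estimated P95 (s):', p95.toFixed(2), 'alert fires:', p95 > 1);
```

Because the result is interpolated, its accuracy depends on the bucket boundaries: a threshold of 1 second is only measured reliably if a bucket boundary sits at or near 1.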
Logging
Application Logging
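const winston = require('winston');

const logger = winston.createLogger({
  level: 'info',
  // Timestamped JSON lines are easier to search once logs are aggregated
  format: winston.format.combine(winston.format.timestamp(), winston.format.json()),
  transports: [
    new winston.transports.File({ filename: 'error.log', level: 'error' }),
    new winston.transports.File({ filename: 'combined.log' })
  ]
});

// Use logger
logger.info('User registered', { userId: '12345' });
try {
  throw new Error('connection refused'); // stand-in for a failing database call
} catch (err) {
  logger.error('Database connection failed', { error: err.message });
}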
Log Aggregation (ELK Stack)
Optional: Set up ELK stack for centralized logging:
- Install Elasticsearch
- Install Logstash
- Install Kibana
- Configure log shipping
Error Tracking (Sentry)
Install Sentry SDK
npm install @sentry/node
Configure Sentry
const Sentry = require('@sentry/node');

Sentry.init({
  dsn: 'https://your-sentry-dsn@sentry.io/project-id',
  environment: 'production',
  tracesSampleRate: 0.1
});

// Capture exceptions
try {
  // Your code
} catch (error) {
  Sentry.captureException(error);
}
Health Checks
Application Health Endpoint
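// checkDatabase, checkRedis and checkExternalApis are application-defined
// helpers that each resolve to an object such as { status: 'ok' }
app.get('/health', async (req, res) => {
  // Run the dependency checks in parallel
  const [database, redis, externalApis] = await Promise.all([
    checkDatabase(),
    checkRedis(),
    checkExternalApis()
  ]);
  const checks = { database, redis, externalApis };
  const isHealthy = Object.values(checks).every(check => check.status === 'ok');
  const health = {
    // Report the aggregate result instead of a hard-coded 'healthy'
    status: isHealthy ? 'healthy' : 'unhealthy',
    timestamp: new Date().toISOString(),
    checks
  };
  res.status(isHealthy ? 200 : 503).json(health);
});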
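The health endpoint depends on helpers such as checkDatabase() that resolve to an object with a `status` field. One possible shape for them is sketched below: a generic `runCheck` wrapper (a hypothetical name) that runs any promise-returning probe, enforces a timeout so a hung dependency cannot stall the endpoint, and always resolves instead of throwing. The 2000 ms default is an arbitrary assumption.

```javascript
// Run one dependency probe and normalise the outcome to { status, ... }.
// `probe` is any function returning a promise, for example a `SELECT 1`
// query or a Redis PING issued through the application's own clients.
async function runCheck(probe, timeoutMs = 2000) {
  const startedAt = Date.now();
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  try {
    // Whichever settles first wins: the probe or the timeout
    await Promise.race([probe(), timeout]);
    return { status: 'ok', latencyMs: Date.now() - startedAt };
  } catch (err) {
    return { status: 'error', error: err.message, latencyMs: Date.now() - startedAt };
  } finally {
    clearTimeout(timer); // do not keep the event loop alive after the check
  }
}

// Hypothetical wiring; `pool` and `redisClient` stand in for the
// application's real database and Redis clients:
// const checkDatabase = () => runCheck(() => pool.query('SELECT 1'));
// const checkRedis = () => runCheck(() => redisClient.ping());
```

Because `runCheck` never rejects, a single failing dependency produces a 503 with details for every check rather than an unhandled exception in the route handler.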
Monitoring Health Checks
Set up uptime monitoring:
- Pingdom
- UptimeRobot
- Custom health check endpoint monitoring
Performance Monitoring
APM Tools
Consider using Application Performance Monitoring tools:
- New Relic
- Datadog
- AppDynamics
Database Monitoring
Monitor PostgreSQL:
- Query performance
- Connection pool usage
- Slow queries
- Table sizes
Cache Monitoring
Monitor Redis:
- Memory usage
- Hit rate
- Eviction rate
- Connection count
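Redis reports all four of these values through its INFO command, using the fields `used_memory`, `keyspace_hits`, `keyspace_misses`, `evicted_keys`, and `connected_clients`. The sketch below parses INFO output and derives the hit rate; the function names and sample text are illustrative assumptions, and in production the redis exporter scraped by Prometheus would normally supply these numbers.

```javascript
// Parse the text returned by the Redis INFO command into a flat key/value object.
function parseRedisInfo(infoText) {
  const info = {};
  for (const line of infoText.split(/\r?\n/)) {
    if (line === '' || line.startsWith('#')) continue; // skip blanks and section headers
    const separator = line.indexOf(':');
    if (separator === -1) continue;
    info[line.slice(0, separator)] = line.slice(separator + 1);
  }
  return info;
}

// Derive the monitoring values listed above from parsed INFO output.
function cacheStats(info) {
  const hits = Number(info.keyspace_hits || 0);
  const misses = Number(info.keyspace_misses || 0);
  const lookups = hits + misses;
  return {
    hitRate: lookups === 0 ? null : hits / lookups, // null until there is traffic
    usedMemoryBytes: Number(info.used_memory || 0),
    evictedKeys: Number(info.evicted_keys || 0), // cumulative since server start
    connectedClients: Number(info.connected_clients || 0)
  };
}

// Made-up excerpt of INFO output (Redis separates lines with CRLF)
const sampleInfo = [
  '# Clients', 'connected_clients:12',
  '# Memory', 'used_memory:1048576',
  '# Stats', 'keyspace_hits:900', 'keyspace_misses:100', 'evicted_keys:5'
].join('\r\n');

const stats = cacheStats(parseRedisInfo(sampleInfo));
console.log(stats);
```

Note that `evicted_keys` is a running total, so an eviction rate comes from the difference between two readings divided by the time between them.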
Best Practices
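- Set Up Alerts Early - Configure alerts before launch rather than after the first incident
- Monitor Business Metrics - Track registrations, points, and revenue alongside technical metrics
- Regular Review - Review dashboards weekly
- Document Runbooks - Give every alert a clear response procedure
- Test Alerts - Trigger each alert deliberately to confirm it fires and reaches its receiver
- Retention Policy - Match metric and log retention to storage and compliance needs (Prometheus keeps 15 days of data by default)
- Cost Monitoring - Monitor cloud costs, including the monitoring stack's own storage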