Monitoring Azure SQL with System Center 2012: Management Pack Best Practices
1) Deployment & discovery
- Use the official Azure SQL Database Management Pack MSI from Microsoft and the accompanying Operations Guide.
- Run discovery with the wizard using Azure Resource Manager (REST API) where possible; fall back to T‑SQL discovery only for legacy cases.
- When monitoring multiple subscriptions and servers, create a separate discovery per subscription to limit the blast radius of a misconfiguration or credential failure.
2) Authentication & Run As
- Prefer Azure AD authentication (service principal) for REST API access. Grant the service principal least privilege: the built‑in Reader role plus only the monitoring roles it needs (for example, Monitoring Reader).
- Configure Run As accounts and Run As profiles securely, and map them only to the management pack objects that need them.
- Store credentials in SCOM Run As accounts and test connectivity after import.
3) Metrics & polling strategy
- Use REST API collection for lightweight, reliable metric pulls; T‑SQL queries add deeper telemetry but increase load.
- Suggested poll intervals: 60–300s for critical health/availability checks; 300–900s for lower‑priority performance metrics, to reduce API quota consumption and SCOM load.
- Stagger discovery and collection schedules across agents to avoid bursts.
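The staggering idea above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the management pack: the `PollSchedule` type, the `stagger` function, and the target names are hypothetical. In SCOM itself, an offset like this would typically be applied through interval and sync‑time overrides where a workflow exposes them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PollSchedule:
    target: str      # hypothetical label for a monitored subscription or server
    interval_s: int  # polling interval in seconds
    offset_s: int    # delay before the first poll, in seconds


def stagger(targets: list[str], interval_s: int) -> list[PollSchedule]:
    """Spread the first poll of each target evenly across one interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if not targets:
        return []
    step = interval_s / len(targets)
    return [
        PollSchedule(target=t, interval_s=interval_s, offset_s=int(i * step))
        for i, t in enumerate(targets)
    ]


if __name__ == "__main__":
    # Four targets polled every 300 s start 75 s apart instead of all at once.
    for schedule in stagger(["sub-a", "sub-b", "sub-c", "sub-d"], interval_s=300):
        print(schedule)
```

Spreading start times this way keeps the total number of calls per interval unchanged while flattening the peak.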
4) Thresholds & alert tuning
- Replace default thresholds with environment‑specific values. Configure separate warning/critical thresholds per database or pool when needed.
- Use overrides to:
  - Exclude known noisy databases or maintenance windows.
  - Disable per‑database storage alerts where many databases share the same storage allocation (for example, an elastic pool), and monitor storage at the shared level instead.
- Leverage alert suppression / dependency model for failover groups and elastic pools to avoid alert storms during planned maintenance.
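To make the tuning logic above concrete, here is a minimal Python sketch of how per‑database thresholds, overrides, and maintenance windows interact. It does not use any SCOM API; the database names, threshold values, and the `evaluate` function are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Thresholds:
    warning: float   # percent utilization that raises a warning
    critical: float  # percent utilization that raises a critical alert


# Illustrative values only; real thresholds should come from your own baselines.
DEFAULT = Thresholds(warning=80.0, critical=95.0)

# Hypothetical per-database override for a latency-sensitive database.
OVERRIDES = {"orders-db": Thresholds(warning=70.0, critical=85.0)}

# Hypothetical planned maintenance window (start inclusive, end exclusive).
MAINTENANCE = {
    "reporting-db": (datetime(2024, 1, 6, 1, 0), datetime(2024, 1, 6, 3, 0)),
}


def evaluate(database: str, value_pct: float, now: datetime) -> str:
    """Return 'suppressed', 'critical', 'warning', or 'healthy' for one sample."""
    window = MAINTENANCE.get(database)
    if window is not None and window[0] <= now < window[1]:
        return "suppressed"
    limits = OVERRIDES.get(database, DEFAULT)
    if value_pct >= limits.critical:
        return "critical"
    if value_pct >= limits.warning:
        return "warning"
    return "healthy"


if __name__ == "__main__":
    when = datetime(2024, 1, 6, 2, 0)
    print(evaluate("orders-db", 86.0, when))     # override applies
    print(evaluate("inventory-db", 86.0, when))  # default applies
    print(evaluate("reporting-db", 99.0, when))  # inside maintenance window
```

The same reading of 86% is critical for the overridden database and only a warning under the defaults, which is the effect environment‑specific thresholds are meant to achieve.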
5) Key monitors to enable
- Availability (server & database)
- DTU/CPU/worker/IO usage and percent thresholds
- Long‑running queries and maximum transaction time
- Failed connections, deadlocks, throttling counts
- Elastic pool and geo‑replication health
- Transaction log usage and growth events
6) Custom queries & app‑specific checks
- Use custom query support for application‑specific availability checks and business‑critical transactions.
- Add exclude lists (application/database/query text) to long‑running query rules to reduce noise.
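The exclude‑list idea can be illustrated with a short Python filter. This is a sketch of the logic only, under stated assumptions: the `RunningQuery` shape, the excluded names, and the regular expression are hypothetical and do not reflect how the management pack stores or evaluates its lists.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningQuery:
    application: str
    database: str
    text: str
    elapsed_s: float


# Hypothetical exclude lists, one per dimension named in the rule.
EXCLUDED_APPS = {"nightly-etl"}
EXCLUDED_DBS = {"staging"}
EXCLUDED_TEXT = [re.compile(r"\bWAITFOR\b", re.IGNORECASE)]


def long_running(queries: list[RunningQuery], threshold_s: float) -> list[RunningQuery]:
    """Return queries over the threshold that no exclude list matches."""
    flagged = []
    for query in queries:
        if query.elapsed_s < threshold_s:
            continue
        if query.application in EXCLUDED_APPS or query.database in EXCLUDED_DBS:
            continue
        if any(pattern.search(query.text) for pattern in EXCLUDED_TEXT):
            continue
        flagged.append(query)
    return flagged


if __name__ == "__main__":
    sample = [
        RunningQuery("web", "orders", "SELECT * FROM big_table", 120.0),
        RunningQuery("nightly-etl", "orders", "INSERT INTO archive SELECT ...", 900.0),
        RunningQuery("web", "orders", "waitfor delay '00:05:00'", 300.0),
    ]
    for query in long_running(sample, threshold_s=60.0):
        print(query.application, query.database, query.elapsed_s)
```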
7) Dashboards & runbooks
- Create SCOM dashboards focused on: availability, performance hotspots, elastic pool utilization, and replication status.
- Integrate alerts with runbooks/automation (Azure Automation / Logic Apps) for automated remediation of common issues (scale up, restart, failover).
8) Capacity planning & cost control
- Monitor CPU, memory, IO, and egress/ingress bandwidth trends for right‑sizing.
- Track elastic pool utilization to optimize DTU/vCore allocation and avoid unnecessary scale costs.
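One simple way to turn utilization trends into a right‑sizing signal is to look at a high percentile rather than the average. The sketch below assumes utilization samples are already exported as percentages; the 40% and 80% cut‑offs and the function names are illustrative assumptions, not recommendations from the management pack.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def sizing_hint(samples: list[float], low: float = 40.0, high: float = 80.0) -> str:
    """Suggest a direction based on 95th-percentile utilization (in percent)."""
    p95 = percentile(samples, 95)
    if p95 > high:
        return "consider scaling up"
    if p95 < low:
        return "consider scaling down"
    return "keep current size"


if __name__ == "__main__":
    pool_dtu_pct = [22.0, 31.0, 28.0, 35.0, 30.0, 27.0, 33.0, 29.0]
    print(percentile(pool_dtu_pct, 95), sizing_hint(pool_dtu_pct))
```

Using a percentile keeps a single short spike from triggering a scale‑up while still reflecting sustained pressure.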
9) Security & governance
- Limit who can change management pack overrides and Run As accounts.
- Audit Run As profile usage and rotate service principal credentials regularly.
- Use least privilege access for monitoring service principals.
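Regular rotation is easier to enforce when something flags credentials that have aged out. The following Python sketch assumes you keep an inventory of secret creation dates; the secret names, the 90‑day limit, and the `secrets_due` function are hypothetical, and the inventory would need to be populated from wherever your credentials are actually tracked.

```python
from datetime import date, timedelta

# Hypothetical inventory: secret name -> date the secret was created.
SECRETS = {
    "scom-azure-sql-reader": date(2024, 1, 10),
    "scom-azure-sql-legacy": date(2023, 6, 1),
}


def secrets_due(secrets: dict[str, date], today: date, max_age_days: int = 90) -> list[str]:
    """Return the names of secrets at least max_age_days old, sorted by name."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, created in secrets.items() if created <= cutoff)


if __name__ == "__main__":
    for name in secrets_due(SECRETS, today=date.today()):
        print(f"rotate: {name}")
```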
10) Maintenance & lifecycle
- Keep the management pack and Operations Guide updated (import updates from Microsoft).
- Test MP changes in a staging SCOM environment before production.
- Review overrides, suppression rules, and alert noise quarterly.
Quick table — Recommended defaults
| Area | Recommended setting |
|---|---|
| Discovery method | Azure Resource Manager (REST API) |
| Auth method | Azure AD service principal (least privilege) |
| Polling (critical) | 60–300 seconds |
| Polling (non‑critical) | 300–900 seconds |
| Alert tuning | Per‑DB thresholds + overrides/exclusions |
| Long‑running queries | Enable with app/db/query exclude lists |
| Update cadence | Quarterly review + apply MP updates from Microsoft |