Monitoring
Overview
AyushBridge includes monitoring for system reliability, performance, and compliance. It tracks health status, usage metrics, security events, and performance indicators in real time.
Health Checks
Health checks provide immediate status information about system components and external dependencies.
Endpoint
GET /health
Response
{
  "status": "healthy",
  "timestamp": "2025-09-30T10:30:00Z",
  "version": "1.0.0",
  "services": {
    "database": {
      "status": "healthy",
      "responseTime": "45ms"
    },
    "terminology-service": {
      "status": "healthy",
      "responseTime": "120ms"
    },
    "authentication": {
      "status": "healthy",
      "responseTime": "30ms"
    }
  }
}
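A client that polls this endpoint usually reduces the response to a simple verdict. The sketch below is one way to do that in Python, assuming the payload shape shown above. The helper names `parse_ms` and `summarize_health` and the 100 ms `slow_ms` threshold are illustrative assumptions, not part of AyushBridge.

```python
def parse_ms(value: str) -> float:
    """Convert a duration string such as "45ms" into milliseconds."""
    if not value.endswith("ms"):
        raise ValueError(f"unexpected duration format: {value!r}")
    return float(value[:-2])


def summarize_health(payload: dict, slow_ms: float = 100.0) -> dict:
    """List unhealthy and slow services in a /health payload.

    slow_ms is a hypothetical threshold chosen for this example.
    """
    unhealthy = []
    slow = []
    for name, info in payload.get("services", {}).items():
        if info.get("status") != "healthy":
            unhealthy.append(name)
        if parse_ms(info.get("responseTime", "0ms")) > slow_ms:
            slow.append(name)
    return {
        "ok": payload.get("status") == "healthy" and not unhealthy,
        "unhealthy": sorted(unhealthy),
        "slow": sorted(slow),
    }


# Sample payload copied from the response shown above.
SAMPLE = {
    "status": "healthy",
    "timestamp": "2025-09-30T10:30:00Z",
    "version": "1.0.0",
    "services": {
        "database": {"status": "healthy", "responseTime": "45ms"},
        "terminology-service": {"status": "healthy", "responseTime": "120ms"},
        "authentication": {"status": "healthy", "responseTime": "30ms"},
    },
}

print(summarize_health(SAMPLE))
```

With the sample payload, every service is healthy, and only `terminology-service` exceeds the example threshold.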
Health Check Types
- Readiness: Whether the system is ready to accept traffic
- Liveness: Whether the system is running and responsive
- Dependency: External service availability
- Custom: Application-specific health indicators
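The four check types can be modelled as named probes grouped by kind. The following is a minimal sketch under that assumption. The `Check` class, `run_checks` function, and probe names are hypothetical and do not describe AyushBridge internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    name: str
    kind: str  # "readiness", "liveness", "dependency", or "custom"
    probe: Callable[[], bool]


def run_checks(checks: List[Check]) -> Dict[str, dict]:
    """Run every probe and group the results by check kind."""
    report: Dict[str, dict] = {}
    for check in checks:
        try:
            passed = bool(check.probe())
        except Exception:
            passed = False  # a probe that raises counts as a failure
        group = report.setdefault(check.kind, {"status": "healthy", "checks": {}})
        group["checks"][check.name] = "healthy" if passed else "unhealthy"
        if not passed:
            group["status"] = "unhealthy"
    return report


# Hypothetical probes; real ones would query the actual components.
checks = [
    Check("process-alive", "liveness", lambda: True),
    Check("migrations-applied", "readiness", lambda: True),
    Check("terminology-service", "dependency", lambda: False),
    Check("codesystem-loaded", "custom", lambda: True),
]
report = run_checks(checks)
print(report["dependency"])
```

Keeping the kinds separate lets an orchestrator restart a process on a liveness failure while merely withholding traffic on a readiness or dependency failure.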
Metrics & Analytics
Comprehensive metrics collection for operational insights and performance optimization.
API Metrics
- Request Count: Total API calls by endpoint
- Response Times: Average, 95th-percentile, and 99th-percentile latency
- Error Rates: HTTP status code distribution
- Throughput: Requests per second
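To make these definitions concrete, the sketch below derives each API metric from a list of request records. The `RequestRecord` shape, the nearest-rank percentile method, and the choice to count any status of 400 or above as an error are assumptions made for illustration. AyushBridge may compute these differently.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class RequestRecord:
    endpoint: str
    status: int
    duration_ms: float


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n)."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records: List[RequestRecord], window_seconds: float) -> dict:
    """Aggregate request count, latency, error rate, and throughput."""
    durations = [r.duration_ms for r in records]
    errors = sum(1 for r in records if r.status >= 400)  # assumption: 4xx and 5xx
    return {
        "request_count": dict(Counter(r.endpoint for r in records)),
        "avg_ms": sum(durations) / len(durations),
        "p95_ms": percentile(durations, 95),
        "p99_ms": percentile(durations, 99),
        "status_codes": dict(Counter(r.status for r in records)),
        "error_rate": errors / len(records),
        "throughput_rps": len(records) / window_seconds,
    }


# Ten synthetic requests over a five-second window; the last one fails.
records = [
    RequestRecord("/Patient", 200 if i < 9 else 500, 10.0 * (i + 1))
    for i in range(10)
]
summary = summarize(records, window_seconds=5.0)
print(summary)
```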
Terminology Metrics
- Code Validation Success Rate: Percentage of successful validations
- Translation Accuracy: Mapping confidence scores
- Cache Hit Rate: Terminology lookup efficiency
- Update Frequency: Code system synchronization status
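The two rate metrics in this list are simple ratios over event counters. A minimal sketch, assuming the service increments counters on every validation and cache lookup (the `TerminologyCounters` class is hypothetical):

```python
from dataclasses import dataclass


@dataclass
class TerminologyCounters:
    validations_ok: int = 0
    validations_failed: int = 0
    cache_hits: int = 0
    cache_misses: int = 0

    def record_validation(self, ok: bool) -> None:
        if ok:
            self.validations_ok += 1
        else:
            self.validations_failed += 1

    def record_lookup(self, hit: bool) -> None:
        if hit:
            self.cache_hits += 1
        else:
            self.cache_misses += 1

    @staticmethod
    def _ratio(part: int, total: int) -> float:
        # Report 0.0 rather than dividing by zero before any events arrive.
        return part / total if total else 0.0

    @property
    def validation_success_rate(self) -> float:
        return self._ratio(self.validations_ok,
                           self.validations_ok + self.validations_failed)

    @property
    def cache_hit_rate(self) -> float:
        return self._ratio(self.cache_hits, self.cache_hits + self.cache_misses)


counters = TerminologyCounters()
for i in range(100):
    counters.record_lookup(hit=i < 87)      # 87 hits, 13 misses
    counters.record_validation(ok=i < 95)   # 95 successes, 5 failures
print(counters.cache_hit_rate, counters.validation_success_rate)
```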
Usage Analytics
{
  "period": "2025-09-30",
  "api_calls": {
    "total": 15420,
    "by_endpoint": {
      "/CodeSystem/$validate-code": 4520,
      "/ConceptMap/$translate": 3890,
      "/Patient": 2340
    }
  },
  "terminology_usage": {
    "namaste_codes_validated": 4520,
    "icd11_translations": 3890,
    "cache_hit_rate": 0.87
  },
  "performance": {
    "avg_response_time": "145ms",
    "p95_response_time": "320ms",
    "error_rate": 0.02
  }
}
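In the sample above, the listed endpoints account for 10,750 of the 15,420 total calls, so the rest belong to endpoints not broken out in `by_endpoint`. The sketch below computes each endpoint's share of traffic and the unlisted remainder. The function name and the `(other)` label are illustrative assumptions.

```python
def endpoint_shares(report: dict) -> dict:
    """Return each endpoint's fraction of total calls, plus an "(other)" bucket."""
    calls = report["api_calls"]
    total = calls["total"]
    if total <= 0:
        return {}
    shares = {ep: count / total for ep, count in calls["by_endpoint"].items()}
    listed = sum(calls["by_endpoint"].values())
    shares["(other)"] = (total - listed) / total
    return shares


# Only the api_calls part of the sample report is needed here.
usage_report = {
    "period": "2025-09-30",
    "api_calls": {
        "total": 15420,
        "by_endpoint": {
            "/CodeSystem/$validate-code": 4520,
            "/ConceptMap/$translate": 3890,
            "/Patient": 2340,
        },
    },
}

for endpoint, share in endpoint_shares(usage_report).items():
    print(f"{endpoint}: {share:.1%}")
```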
Audit Trails
Complete audit logging for compliance, security, and troubleshooting.
AuditEvent Resource
{
  "resourceType": "AuditEvent",
  "type": {
    "system": "http://terminology.hl7.org/CodeSystem/audit-event-type",
    "code": "rest",
    "display": "Restful Operation"
  },
  "subtype": [
    {
      "system": "http://hl7.org/fhir/restful-interaction",
      "code": "create",
      "display": "create"
    }
  ],
  "action": "C",
  "recorded": "2025-09-30T10:30:00Z",
  "outcome": "0",
  "agent": [
    {
      "type": {
        "coding": [
          {
            "system": "http://terminology.hl7.org/CodeSystem/extra-security-role-type",
            "code": "humanuser",
            "display": "human user"
          }
        ]
      },
      "who": {
        "identifier": {
          "system": "https://abha.in",
          "value": "12-3456-7890-1234"
        }
      },
      "requestor": true
    }
  ],
  "source": {
    "site": "AyushBridge Server",
    "observer": {
      "identifier": {
        "system": "https://ayushbridge.in",
        "value": "server-01"
      }
    }
  },
  "entity": [
    {
      "what": {
        "reference": "Patient/example"
      },
      "type": {
        "system": "http://terminology.hl7.org/CodeSystem/audit-entity-type",
        "code": "1",
        "display": "Person"
      },
      "role": {
        "system": "http://terminology.hl7.org/CodeSystem/object-role",
        "code": "1",
        "display": "Patient"
      }
    }
  ]
}
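A resource like the one above can be assembled programmatically for each REST interaction. The sketch below builds the same structure as a Python dictionary. The `build_audit_event` function and its parameters are hypothetical, and the entity type and role are hard-coded to Person and Patient as in the example, so it only suits Patient resources. The C/R/U/D values are the standard FHIR `AuditEvent.action` codes.

```python
from datetime import datetime, timezone
from typing import Optional

# FHIR AuditEvent.action codes for the basic REST interactions.
INTERACTION_TO_ACTION = {"create": "C", "read": "R", "update": "U", "delete": "D"}


def build_audit_event(interaction: str, abha_id: str, patient_ref: str,
                      observer_id: str = "server-01",
                      recorded: Optional[datetime] = None) -> dict:
    """Build an AuditEvent dict for a successful operation on a Patient."""
    if interaction not in INTERACTION_TO_ACTION:
        raise ValueError(f"unsupported interaction: {interaction!r}")
    # Pass a timezone-aware datetime; the timestamp is emitted in UTC.
    when = (recorded or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "resourceType": "AuditEvent",
        "type": {
            "system": "http://terminology.hl7.org/CodeSystem/audit-event-type",
            "code": "rest",
        },
        "subtype": [{
            "system": "http://hl7.org/fhir/restful-interaction",
            "code": interaction,
            "display": interaction,
        }],
        "action": INTERACTION_TO_ACTION[interaction],
        "recorded": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "outcome": "0",  # "0" means success
        "agent": [{
            "who": {"identifier": {"system": "https://abha.in", "value": abha_id}},
            "requestor": True,
        }],
        "source": {
            "site": "AyushBridge Server",
            "observer": {"identifier": {"system": "https://ayushbridge.in",
                                        "value": observer_id}},
        },
        "entity": [{
            "what": {"reference": patient_ref},
            "type": {
                "system": "http://terminology.hl7.org/CodeSystem/audit-entity-type",
                "code": "1",
                "display": "Person",
            },
            "role": {
                "system": "http://terminology.hl7.org/CodeSystem/object-role",
                "code": "1",
                "display": "Patient",
            },
        }],
    }


event = build_audit_event(
    "create", "12-3456-7890-1234", "Patient/example",
    recorded=datetime(2025, 9, 30, 10, 30, tzinfo=timezone.utc),
)
print(event["action"], event["recorded"])
```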
Audit Categories
- Authentication Events: Login, logout, token refresh
- Authorization Events: Access control decisions
- Data Access: Read, create, update, delete operations
- Terminology Operations: Code validation, translation requests
- Administrative Actions: Configuration changes, user management
Performance Monitoring
Real-time performance tracking and optimization insights.
Key Performance Indicators
- API Response Time: End-to-end request processing
- Database Query Performance: SQL execution times
- Memory Usage: Heap and non-heap utilization
- CPU Utilization: Core and thread-level metrics
- Network I/O: Bandwidth and connection metrics
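Timing indicators such as API response time and database query performance come down to measuring elapsed wall-clock time around a block of work. The sketch below shows one generic way to collect such samples in Python. The `timed` helper, the `timings` store, and the labels are illustrative assumptions, not AyushBridge instrumentation.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Collected samples in milliseconds, keyed by a free-form label.
timings = defaultdict(list)


@contextmanager
def timed(label: str):
    """Record how long the enclosed block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label].append((time.perf_counter() - start) * 1000.0)


# The outer timer covers the whole request; the inner one covers the query.
with timed("api.request"):
    with timed("db.query"):
        sum(range(1000))  # stand-in for real work

print({label: len(samples) for label, samples in timings.items()})
```

Nesting the timers this way shows how much of the end-to-end request time is spent in the database.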
Performance Dashboards
- Real-time Metrics: Current system status
- Historical Trends: Performance over time
- Comparative Analysis: Performance across deployments
- Anomaly Detection: Automated performance issue identification
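This page does not specify how anomaly detection is implemented. As an illustration only, the sketch below flags a sample as anomalous when it lies more than a chosen number of standard deviations from the mean of the preceding window (a rolling z-score). The window size, the threshold, and the function name are assumptions.

```python
import statistics
from typing import List


def find_anomalies(samples: List[float], window: int = 5,
                   threshold: float = 3.0) -> List[int]:
    """Return indexes of samples that deviate strongly from the prior window."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mean = statistics.fmean(baseline)
        stdev = statistics.pstdev(baseline)
        if stdev == 0:
            continue  # a flat baseline gives no scale to compare against
        z_score = abs(samples[i] - mean) / stdev
        if z_score > threshold:
            anomalies.append(i)
    return anomalies


# Response times in milliseconds with one obvious spike at index 5.
response_times = [140.0, 150.0, 145.0, 155.0, 148.0, 900.0, 152.0]
print(find_anomalies(response_times))
```

A simple rolling baseline like this is distorted by the spike itself for the next few samples, so it tends to miss anomalies that follow closely after one another.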
Optimization Recommendations
- Query Optimization: Index recommendations
- Caching Strategies: Cache hit rate improvements
- Resource Scaling: Auto-scaling triggers
- Code Profiling: Performance bottleneck identification
Alerting
Automated alerting for critical system events and performance issues.
Alert Types
- System Health: Service availability issues
- Performance: Response time degradation
- Security: Authentication failures, suspicious activity
- Capacity: Resource utilization thresholds
- Data Quality: Terminology synchronization failures
Alert Channels
- Email Notifications: Critical alerts to administrators
- SMS Alerts: High-priority system issues
- Webhook Integration: Custom alerting systems
- Dashboard Alerts: Real-time UI notifications
Alert Configuration
{
  "alerts": [
    {
      "name": "High Error Rate",
      "condition": "error_rate > 0.05",
      "severity": "critical",
      "channels": ["email", "sms"],
      "cooldown": "300s"
    },
    {
      "name": "Slow Response Time",
      "condition": "p95_response_time > 5000ms",
      "severity": "warning",
      "channels": ["webhook"],
      "cooldown": "600s"
    }
  ]
}
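The configuration above implies two behaviours: a condition is compared against current metrics, and an alert that has fired stays quiet until its cooldown elapses. The sketch below shows one way this could work. The condition grammar (`metric operator number` with an optional `ms` or `s` unit), the assumption that time metrics are supplied in milliseconds, and the `AlertEvaluator` class are guesses made for illustration, not AyushBridge's actual evaluator.

```python
import operator
import re

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
CONDITION = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<)\s*([\d.]+)\s*(ms|s)?\s*$")


def parse_condition(text: str):
    """Split e.g. "p95_response_time > 5000ms" into (metric, compare, threshold)."""
    match = CONDITION.match(text)
    if not match:
        raise ValueError(f"cannot parse condition: {text!r}")
    metric, op, number, unit = match.groups()
    value = float(number)
    if unit == "s":
        value *= 1000.0  # normalise seconds to milliseconds
    return metric, OPS[op], value


def parse_seconds(text: str) -> float:
    """Convert a cooldown such as "300s" into seconds."""
    if not text.endswith("s"):
        raise ValueError(f"unexpected cooldown format: {text!r}")
    return float(text[:-1])


class AlertEvaluator:
    def __init__(self, config: dict):
        self.rules = config["alerts"]
        self.last_fired = {}  # rule name -> time of the last firing

    def evaluate(self, metrics: dict, now: float) -> list:
        """Return the alerts that fire for these metrics at time `now` (seconds)."""
        fired = []
        for rule in self.rules:
            metric, compare, threshold = parse_condition(rule["condition"])
            if metric not in metrics or not compare(metrics[metric], threshold):
                continue
            last = self.last_fired.get(rule["name"])
            if last is not None and now - last < parse_seconds(rule["cooldown"]):
                continue  # still cooling down
            self.last_fired[rule["name"]] = now
            fired.append({"name": rule["name"], "severity": rule["severity"],
                          "channels": rule["channels"]})
        return fired


config = {"alerts": [
    {"name": "High Error Rate", "condition": "error_rate > 0.05",
     "severity": "critical", "channels": ["email", "sms"], "cooldown": "300s"},
    {"name": "Slow Response Time", "condition": "p95_response_time > 5000ms",
     "severity": "warning", "channels": ["webhook"], "cooldown": "600s"},
]}
evaluator = AlertEvaluator(config)
metrics = {"error_rate": 0.08, "p95_response_time": 320.0}
print(evaluator.evaluate(metrics, now=0.0))    # High Error Rate fires
print(evaluator.evaluate(metrics, now=100.0))  # suppressed by the cooldown
```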
Escalation Policies
- Immediate Response: Critical system failures
- Scheduled Review: Performance degradation
- Weekly Reports: General health summaries
- Monthly Audits: Compliance and security reviews