MI Architecture

The diagram below (Figure 1) shows the deployment of the current monitoring infrastructure. The monitoring server is responsible for collecting metrics and logs from the different target hosts; this data is then used for visualization and alerting.

Figure 1: Flow of metrics and logs in the monitoring infrastructure

Runtime Metrics

The EFPF Monitoring Infrastructure uses Prometheus to store metrics. Metrics about the Docker containers and services are exposed by cAdvisor, from which Prometheus scrapes them and stores them as time series data. In addition to cAdvisor, many ready-made Prometheus exporters are available for the various components that may be deployed; these exporters are listed in the official Prometheus documentation.
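To illustrate how the stored cAdvisor series can be read back, the following is a minimal Python sketch (standard library only) that runs an instant query against the Prometheus HTTP API endpoint `/api/v1/query`. The server address `http://localhost:9090` (Prometheus' default port) is an assumption, as is the reliance on cAdvisor's `container_memory_usage_bytes` metric and its `name` label for the container name; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical address of the Prometheus server; adjust to the actual deployment.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(response: dict) -> dict:
    """Map each series' 'name' label (container name, assumed set by cAdvisor) to its value."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    return {
        sample["metric"].get("name", "<unnamed>"): float(sample["value"][1])
        for sample in response["data"]["result"]
    }


def container_memory(base_url: str = PROMETHEUS_URL) -> dict:
    """Return current memory usage in bytes per named container."""
    # name!="" skips cgroup series that do not belong to a named container.
    url = build_query_url(base_url, 'container_memory_usage_bytes{name!=""}')
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))
```

Against a reachable server, `container_memory()` returns a dictionary mapping container names to their memory usage in bytes. Keeping URL construction and response parsing separate from the network call makes both parts testable without a running Prometheus.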

The metrics are visualized in Grafana dashboards; Grafana ships with a built-in data source for Prometheus.
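Grafana's built-in Prometheus data source can be added through the UI or, as sketched here, through Grafana's HTTP API (`POST /api/datasources`). This minimal Python sketch only builds the request and does not send it; the Grafana URL, credentials, Prometheus address and helper names are hypothetical, and basic authentication is assumed.

```python
import base64
import json
import urllib.request


def prometheus_datasource(prometheus_url: str, name: str = "Prometheus") -> dict:
    """Payload for Grafana's POST /api/datasources using the built-in 'prometheus' type."""
    return {
        "name": name,
        "type": "prometheus",  # identifier of Grafana's built-in Prometheus data source
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend forwards queries to Prometheus
        "isDefault": True,
    }


def build_request(grafana_url: str, payload: dict, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a basic-auth request that registers the data source."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Sending it is then a matter of passing the request to `urllib.request.urlopen`, with placeholder values such as `http://localhost:3000` for Grafana and `http://prometheus:9090` for Prometheus. Grafana can also provision data sources from configuration files, which may suit a fixed deployment better.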

Distributed Logging

Grafana Loki is used to aggregate and store logs. The logs produced by the Docker containers and services are shipped to Loki using Vector, which handles extracting, transforming and ingesting the logs into the Loki server. Because Vector runs on the source hosts, logs from multiple hosts at different locations can be sent to the same Loki instance; Vector has to be installed on every source host and configured to export its data to Loki. Promtail can be used as an alternative to Vector.
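Vector (or Promtail) takes care of log shipping, but it can help to see the shape of the data that ends up in Loki. The sketch below builds a request body for Loki's JSON push endpoint `/loki/api/v1/push`: logs are grouped into streams identified by a label set, and each entry is a pair of a nanosecond Unix timestamp (as a string) and the log line. It is an illustration under assumptions, not how Vector is implemented; the labels `host` and `container_name`, the Loki address and the helper names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Optional


def build_push_payload(labels: dict, lines: list, ts_ns: Optional[int] = None) -> dict:
    """Build a body for Loki's JSON push endpoint (POST /loki/api/v1/push).

    The stream is identified by its label set; each value is a
    [timestamp-in-nanoseconds-as-string, log line] pair.
    """
    ts_ns = time.time_ns() if ts_ns is None else ts_ns
    return {
        "streams": [
            {
                "stream": labels,
                # Offset each timestamp by 1 ns so lines in one batch keep their order.
                "values": [[str(ts_ns + i), line] for i, line in enumerate(lines)],
            }
        ]
    }


def push(loki_url: str, payload: dict) -> int:
    """Send the payload to Loki and return the HTTP status code."""
    req = urllib.request.Request(
        f"{loki_url.rstrip('/')}/loki/api/v1/push",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

For example, `push("http://localhost:3100", build_push_payload({"host": "node-1"}, ["service started"]))` would send a single line, assuming Loki listens on its default port 3100.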

Grafana queries Loki for the collected logs and for statistics about them. The logs can be browsed and searched through Grafana's Explore view using the Loki data source.
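The queries Grafana issues against Loki are written in LogQL. As a rough sketch of what happens behind the Explore view, the helpers below assemble a LogQL query (a label stream selector plus an optional `|=` line filter) and a URL for Loki's `/loki/api/v1/query_range` endpoint, with `start` and `end` given as nanosecond Unix timestamps. The label names, Loki address and helper names are assumptions for illustration.

```python
import json
import urllib.parse
from typing import Optional


def logql_selector(labels: dict, contains: Optional[str] = None) -> str:
    """Build a LogQL query: a stream selector plus an optional '|=' line filter."""
    # json.dumps double-quotes each value and escapes embedded quotes/backslashes.
    selector = ", ".join(f"{k}={json.dumps(v)}" for k, v in sorted(labels.items()))
    query = "{" + selector + "}"
    if contains is not None:
        query += f" |= {json.dumps(contains)}"
    return query


def query_range_url(loki_url: str, query: str, start_ns: int, end_ns: int, limit: int = 100) -> str:
    """URL for Loki's /loki/api/v1/query_range endpoint."""
    params = {"query": query, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{loki_url.rstrip('/')}/loki/api/v1/query_range?" + urllib.parse.urlencode(params)
```

A call such as `logql_selector({"container_name": "api"}, contains="error")` yields a query that keeps only lines containing `error` from the hypothetical `api` container's stream.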

Alerting

Prometheus and Loki raise alerts according to their configured alerting rules. Left unmanaged, the resulting alerts can quickly become noisy and repetitive. Prometheus Alertmanager therefore deduplicates, groups and routes these alerts before any notification is sent.
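The effect of deduplication and grouping can be illustrated with a deliberately simplified Python model. This is not Alertmanager's implementation: it treats alerts with identical label sets as duplicates and bundles the remaining alerts by the values of a `group_by` label list, similar in spirit to the `group_by` option of an Alertmanager route. The function names are hypothetical.

```python
from collections import defaultdict


def fingerprint(labels: dict) -> tuple:
    """Identify an alert by its full, order-independent label set."""
    return tuple(sorted(labels.items()))


def group_alerts(alerts: list, group_by: list) -> dict:
    """Deduplicate alerts and bundle them into one notification group per key.

    Alerts with identical label sets are kept once; the rest are grouped
    by the values of the `group_by` labels (missing labels count as "").
    """
    groups = defaultdict(dict)
    for labels in alerts:
        key = tuple((name, labels.get(name, "")) for name in group_by)
        groups[key][fingerprint(labels)] = labels  # same fingerprint -> stored once
    return {key: list(members.values()) for key, members in groups.items()}
```

Grouping by `alertname`, for instance, collapses memory alerts from several instances into a single notification. The real Alertmanager additionally applies timing settings (`group_wait`, `group_interval`, `repeat_interval`), inhibition and silences, and routes each group to a configured receiver.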
