The diagram below shows the deployment of the current monitoring infrastructure. The monitoring server collects metrics and logs from the target hosts. This data is then used for visualization and alerting.
Figure 1: Flow of metrics and logs
The EFPF monitoring infrastructure uses Prometheus to store metrics. Metrics for the Docker containers and services are exported to Prometheus using cAdvisor and stored there as time series data. Besides cAdvisor, many other Prometheus exporters are readily available and can be chosen according to the components being deployed. These exporters are listed in the official Prometheus documentation.
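Once cAdvisor metrics are stored as time series, they can be read back through the Prometheus HTTP API. The following is a minimal sketch of how a query URL for the instant-query endpoint (/api/v1/query) might be built. The server address and the helper name build_instant_query_url are assumptions made for illustration and are not part of the EFPF deployment; container_cpu_usage_seconds_total is one of the counters exposed by cAdvisor.

```python
from urllib.parse import urlencode

# Hypothetical address of the Prometheus server; adjust to the real deployment.
PROMETHEUS_URL = "http://localhost:9090"


def build_instant_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """Return the URL of an instant query against the Prometheus HTTP API."""
    # urlencode escapes braces, quotes and brackets in the PromQL expression.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Per-container CPU usage rate over the last five minutes (cAdvisor counter).
cpu_query = 'rate(container_cpu_usage_seconds_total{name!=""}[5m])'
cpu_url = build_instant_query_url(cpu_query)
print(cpu_url)
```

The resulting URL can be fetched with any HTTP client. The sketch only builds the request, so it runs without a Prometheus server being reachable.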
Grafana Loki is used to aggregate and store logs. The logs produced by the Docker containers and services are exported to Loki using Vector, which is responsible for extracting, transforming, and ingesting the logs into the Loki server. Logs from multiple sources in different locations can be exported to the same Loki instance in this way. Vector needs to be installed on each source host and configured to export the data to Loki. Promtail can be used as an alternative to Vector.
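To make the ingestion step more concrete, the sketch below builds the JSON body that Loki's HTTP push endpoint (/loki/api/v1/push) accepts: a list of streams, each with a label set and a list of [timestamp in nanoseconds, log line] pairs. This is roughly the kind of request a log shipper sends on behalf of a host. The helper name build_loki_push_payload and the label names are assumptions for illustration, not the actual Vector configuration used here.

```python
import json
import time
from typing import Dict, List, Optional

# Path of Loki's HTTP push endpoint, relative to the Loki server address.
LOKI_PUSH_PATH = "/loki/api/v1/push"


def build_loki_push_payload(
    labels: Dict[str, str],
    lines: List[str],
    timestamp_ns: Optional[int] = None,
) -> dict:
    """Build a push request body containing one stream with the given labels."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    # Loki expects timestamps as strings of nanoseconds since the epoch.
    # Each line is offset by one nanosecond so the original order is kept.
    values = [[str(timestamp_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": dict(labels), "values": values}]}


# Hypothetical labels identifying the source host and container.
payload = build_loki_push_payload(
    {"host": "host-a", "container": "web"},
    ["service started", "listening on port 8080"],
)
print(json.dumps(payload, indent=2))
```

Sending this body as an HTTP POST with the Content-Type application/json to the push path of a Loki server would ingest the two lines as one stream. In practice Vector or Promtail takes care of batching, retries, and label extraction.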
Prometheus and Loki generate alerts based on their alerting rules. Left unmanaged, the resulting alerts can become noisy and repetitive. Prometheus Alertmanager deduplicates, aggregates, and re-routes these alerts before any notification is sent.
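The idea behind deduplication and aggregation can be illustrated with a small sketch. It is a simplified model written for this explanation and not Alertmanager's actual implementation: alerts are reduced to their label sets, identical label sets are dropped as duplicates, and the remaining alerts are grouped by a chosen subset of labels so that each group could be delivered as a single notification. The function name group_alerts and all alert and label names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # an alert is modelled only by its label set


def group_alerts(
    alerts: List[Alert], group_by: Tuple[str, ...]
) -> Dict[Tuple[str, ...], List[Alert]]:
    """Drop alerts with identical label sets, then group the rest by labels."""
    seen = set()
    groups = defaultdict(list)
    for alert in alerts:
        # Two alerts with exactly the same labels count as duplicates.
        fingerprint = tuple(sorted(alert.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        # Alerts sharing the group_by label values end up in the same group.
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"alertname": "HighMemory", "host": "host-a", "container": "db"},
    {"alertname": "HighMemory", "host": "host-a", "container": "db"},
    {"alertname": "HighMemory", "host": "host-a", "container": "web"},
    {"alertname": "HostDown", "host": "host-b"},
]
grouped = group_alerts(alerts, group_by=("alertname", "host"))
for key, members in grouped.items():
    print(key, len(members))
```

In this example the four incoming alerts collapse into two groups, so a receiver would get two notifications instead of four. Alertmanager configures the equivalent behaviour through the grouping labels of its routing rules.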