metrics



An ansible role which configures performance analysis services for the managed host. This (optionally) includes a list of remote systems to be monitored by the managed host.

Requirements

Performance Co-Pilot (PCP) v5+. All of the packages are available from the standard repositories on Fedora, CentOS 8, and RHEL 8. On RHEL 7 and RHEL 6, you will need to enable the Optional repository/channel on the managed host.

The role can optionally use Grafana v6+ (metrics_graph_service) and Valkey (metrics_query_service) on Fedora, CentOS 10, RHEL 10 and later, or Redis v5+ (metrics_query_service) on CentOS 8 or 9, RHEL 8 or 9.

Collection requirements

The role requires the firewall role and the selinux role from the fedora.linux_system_roles collection, if metrics_manage_firewall and metrics_manage_selinux is set to true, respectively. (Please see also the variables in the Role Variables section.)

If the metrics is a role from the fedora.linux_system_roles collection or from the Fedora RPM package, the requirement is already satisfied.

The role requires additional collections to manage rpm-ostree systems. If you need to manage rpm-ostree systems, run the below command to install the collections.

ansible-galaxy collection install -r meta/collection-requirements.yml

Role Variables

metrics_monitored_hosts: []

List of remote hosts to be analysed by the managed host. These hosts will have metrics recorded on the managed host, so care should be taken to ensure sufficient disk space exists below /var/log for each host.

Example:

metrics_monitored_hosts: ["webserver.example.com", "database.example.com"]

metrics_webhook_endpoint: ''

Webhook endpoint (URL) where notification about any automatically detected performance issues are to be sent. By default, these events are logged to the local system log only.

metrics_retention_days: 14

Retain historical performance data for the specified number of days; after this time it will be removed (day by day).

metrics_graph_service: false

Boolean flag allowing host to be setup with graphing services. Enabling this starts PCP and Grafana servers for visualizing PCP metrics. This option requires Grafana v6+ which is available on Fedora, CentOS 8, RHEL 8, or later versions of these platforms.

metrics_query_service: false

Boolean flag allowing host to be setup with time series query services. Enabling this starts PCP and Valkey or Redis servers for querying any recorded PCP metrics. This option requires either Valkey or Redis v5+ which is available on Fedora, CentOS 8, RHEL 8, or later versions of these platforms (Valkey is the preferred solution on Fedora, Centos 10, RHEL 10 and later).

metrics_into_elasticsearch: false

Boolean flag allowing metric values to be exported into Elasticsearch.

metrics_from_elasticsearch: false

Boolean flag allowing metrics from Elasticsearch to be made available.

metrics_into_spark: false

Boolean flag allowing metric values to be exported into Spark.

metrics_from_spark: false

Boolean flag allowing metrics from Spark to be made available.

metrics_from_postfix: false

Boolean flag allowing metrics from Postfix to be made available.

metrics_from_mssql: false

Boolean flag allowing metrics from SQL Server to be made available. Enabling this flag requires a 'trusted' connection to SQL Server.

metrics_from_bpftrace: false

Boolean flag allowing metrics from bpftrace to be made available.

metrics_username: metrics

An account to establish authenticated access to remote metrics via the PCP pmcd daemon. For more information, see https://pcp.readthedocs.io/en/latest/QG/AuthenticatedConnections.html.

Additionally, if the bpftrace metrics are configured, this user account will be able to register bpftrace scripts.

metrics_password: metrics

Do not use a clear text metrics_password. Use Ansible Vault to encrypt the password.

Mandatory authentication for executing dynamic bpftrace scripts.

metrics_provider: pcp

The metrics collector to use to provide metrics.

Currently Performance Co-Pilot is the only supported metrics provider. When using the PCP provider these TCP ports will be used - 44321 (pmcd, live metric value sampling), 44322 (pmproxy, with metrics_query_service or metrics_graph_service), 6379 (either valkey-server or redis-server for metrics_query_service) and 3000 (grafana-server for metrics_graph_service).

metrics_manage_firewall: false

Boolean flag allowing to configure firewall using the firewall role. Manage the pmcd port, the pmproxy port, the Grafana port and either the Valkey or Redis port depending upon the configuration parameters. If the variable is set to false, the metrics role does not manage the firewall.

NOTE: metrics_manage_firewall is limited to adding ports. It cannot be used for removing ports. If you want to remove ports, you will need to use the firewall system role directly.

NOTE: the firewall management is not supported on RHEL 6.

metrics_manage_selinux: false

Boolean flag allowing to configure selinux using the selinux role. Assign the pmcd port, the pmproxy port, the Grafana port and either the Valkey or Redis port depending upon the configuration parameters. If the variable is set to false, the metrics role does not manage the selinux.

Please note that the pmcd and pmproxy services are in the "ephemeral" range requiring no special setup and the Grafana port is "unregistered". The Valkey or Redis ports are gated by the valkey_port_t or redis_port_t SELinux types respectively, and may need to be further configured if you require direct access (not required if you are accessing it from metrics role tools like Grafana and PCP). Use the selinux system role to manage port access, for SELinux contexts.

NOTE: metrics_manage_selinux is limited to adding policy. It cannot be used for removing policy. If you want to remove policy, you will need to use the selinux system role directly.

metrics_optional_domains: []

List of optional metrics domains to enable. The role will enable the domains for components you are managing. For example, if you use metrics_from_elasticsearch: true, the role will enable the elasticsearch domain automatically. This variable is for additional domains.

metrics_optional_domains: [apache]

metrics_optional_packages: []

Additional metrics packages that should be installed, beyond the default set, to enable additional metrics, export to alternate data sinks, and so on.

metrics_optional_packages: [pcp-pmda-apache]

metrics_grafana_certificates: []

For generating a new certificate for grafana it is recommended to set the metrics_grafana_certificates variable. If you have your own certs/keys, or will create them from your own CA provider, see below metrics_grafana_cert et al. for information about how to pass in and/or use your own certs.

The value of metrics_grafana_certificates is passed on to the certificate_requests variable of the certificate role called internally in the metrics role and it generates the private key and certificate. For the supported parameters of metrics_grafana_certificates, see the certificate_requests role documentation section.

When you set metrics_grafana_certificates, you must not set metrics_grafana_private_key and metrics_grafana_cert variables because they are ignored.

This example installs grafana with an IdM-issued web server certificate assuming your machines are joined to a FreeIPA domain.

    - name: Install grafana with server certificate and key
      include_role:
        name: linux-system-roles.metrics
      vars:
        metrics_graph_service: true
        metrics_grafana_certificates:
          - name: grafana-server
            dns: ['localhost', 'www.example.com']
            ca: ipa

NOTE: The certificate role, unless using IPA and joining the systems to an IPA domain, creates self-signed certificates, so you will need to explicitly configure trust, which is not currently supported by the system roles. To use ca: self-sign or ca: local, depending on your certmonger usage, see the linux-system-roles.certificate documentation for details.

NOTE: Creating a self-signed certificate is not supported on RHEL/CentOS-7.

metrics_grafana_cert: ''

TLS certificate file for the grafana server. This should be the full absolute path on the grafana server machine e.g. /etc/pki/tls/certs/grafana-server.crt. This file should already exist. If you want to copy a local file, see metrics_grafana_cert_src.

metrics_grafana_cert: /etc/pki/tls/certs/grafana-server.crt

metrics_grafana_cert_src: ''

TLS certificate file from the local machine to copy to metrics_grafana_cert on the grafana server machine. If metrics_grafana_cert is not specified, then metrics_grafana_cert_src will be copied to /etc/pki/tls/certs/basename.crt on the grafana server machine, where basename is the basename of metrics_grafana_cert_src.

metrics_grafana_cert_src: /my/local/grafana-server.crt

metrics_grafana_private_key: ''

TLS private key file for the grafana server. This should be the full absolute path on the grafana server machine e.g. /etc/pki/tls/private/grafana-server.key. This file should already exist. If you want to copy a local file, see metrics_grafana_private_key_src.

metrics_grafana_private_key: /etc/pki/tls/private/grafana-server.key

metrics_grafana_private_key_src: ''

TLS private key file from the local machine to copy to metrics_grafana_private_key on the grafana server machine. If metrics_grafana_private_key is not specified, then metrics_grafana_private_key_src will be copied to /etc/pki/tls/private/basename.key on the grafana server machine, where basename is the basename of metrics_grafana_private_key_src.

metrics_grafana_private_key_src: /my/local/grafana-server.key

Example Playbook

Basic metric recording setup for each managed host only, with one weeks worth of data retained before culling.

---
- name: Manage metrics service
  hosts: all
  vars:
    metrics_retention_days: 7
  roles:
    - linux-system-roles.metrics

Scalable metric recording, analysis and visualization setup for the managed hosts, providing a REST API server with an OpenMetrics endpoint, graphs and scalable querying.

---
- name: Manage metrics with graph and query services
  hosts: all
  vars:
    metrics_graph_service: true
    metrics_query_service: true
  roles:
    - linux-system-roles.metrics

Centralized metric recording setup for several remote hosts and scalable metric recording, analysis and visualization setup for the local host, providing a REST API server with an OpenMetrics endpoint, graphs and scalable querying.

---
- name: Manage centralized metrics gathering
  hosts: monitors
  vars:
    metrics_monitored_hosts: [app.example.com, db.example.com, nas.example.com]
    metrics_graph_service: true
    metrics_query_service: true
  roles:
    - linux-system-roles.metrics

rpm-ostree

See README-ostree.md

License

MIT