DevConf2021.cz - Presentation and Demo

A presentation entitled “Managing Standard Operating Envs with Ansible” was given at DevConf2021.cz. Demo files and links to videos can be found at DevConf2021.cz


CI changes - GitHub Actions and tox-lsr

We have recently moved our GitHub CI to use GitHub Actions instead of Travis. Our organization template is here: https://github.com/linux-system-roles/.github

We currently aren’t using any of the more advanced features of GitHub Actions, as we wanted to achieve parity with Travis as soon as possible.

We have also replaced all of the local scripts used for CI testing with tox-lsr. If you are a system roles developer, you will need to modify your workflow in order to use the new plugin. See README.md for more information.


Introduction to Network Role

Introduction

The network role supports two providers: NetworkManager (nm) and initscripts. On CentOS/RHEL 6, only the initscripts provider is used. On CentOS/RHEL 7 and later, both initscripts and nm are used as providers. Networking profiles are configured through a custom Ansible module. The role runs several tasks to set up host networking, including but not limited to installing packages and starting and enabling services.

The network role CI system consists of Tox, which runs the unit tests, and Test-harness, which runs the integration tests. The Tox environments also check code formatting with Python Black, check the formatting of YAML files, and so on. The integration tests run in an internal OpenShift instance. Test-harness watches the configured GitHub repositories for PRs, checks out each new PR, runs all test playbooks against all configured images (using a fresh machine for every test playbook), sets the PR statuses, and uploads the results.

For better testing efficiency, some playbooks call the internal Ansible modules directly instead of the role in order to skip redundant tasks, and Ansible modules can be grouped into blocks for more targeted unit testing. There are also helper scripts to get coverage from integration tests via Ansible, a basic unit test for argument parsing, and additional helper files for assertions, test setup, and logging.

Code structure

The repository is structured as follows:

  • ./defaults/ – Contains the default role configuration.
  • ./examples/ – Contains YAML examples for different configurations.
  • ./library/network_connections.py – Contains the internal Ansible module, which is the main script. It controls the communication between the role and Ansible, imports the YAML configuration and applies the changes to the provider (i.e. NetworkManager, initscripts).
  • ./meta/ – Metadata of the project.
  • ./module_utils/network_lsr/ – Contains other files that are useful for the network role (e.g. the YAML argument validator)
  • ./tasks/ – Declaration of the different tasks that the role is going to execute.
  • ./tests/playbooks/ – Contains the complete tests for the role.
  • ./tests/tests_*.yml are shims to run tests once for every provider.
  • ./tests/tasks/ contains task snippets that are used in multiple tests to avoid having the same code repeated multiple times.
  • Each file matching tests_*.yml is a test playbook which is run by the CI system.

How to run tests

Tox Unit Tests

  • tox -l, list all the available unit testing environments:

    • black
    • pylint
    • flake8
    • yamllint
    • py26
    • py27
    • py36
    • py37
    • py38
    • collection
    • custom
  • tox, run all the tests
  • tox -e py36, run the unit tests with Python 3.6
  • tox -e yamllint, check that the YAML files are correctly formatted
  • tox -e black, check the formatting of the code with Python Black

Integration Test

  • Download CentOS 6, CentOS 7, CentOS 8, Fedora images from
    • https://cloud.centos.org/centos/6/images/CentOS-6-x86_64-GenericCloud-1907.qcow2c
    • https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-2003.qcow2c
    • https://cloud.centos.org/centos/8/x86_64/images/CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2
    • https://kojipkgs.fedoraproject.org/compose/cloud/
  • Install “standard-test-roles-inventory-qemu” package
    dnf install standard-test-roles-inventory-qemu
    
  • [TEST_DEBUG=1] TEST_SUBJECTS=<image> ansible-playbook -v[v] -i <inventory file/script> <tests_….yml>
    TEST_SUBJECTS=CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2 ansible-playbook -v -i /usr/share/ansible/inventory/standard-inventory-qcow2 tests/tests_default.yml
    

Overview

The network role enables users to configure the network on the target machine. This role can be used to configure:

  • Ethernet interfaces
  • Bridge interfaces
  • Bonded interfaces
  • VLAN interfaces
  • MacVLAN interfaces
  • Infiniband interfaces
  • Wireless (WiFi) interfaces
  • IP configuration
  • 802.1x authentication

Examples of Connections

The network role updates or creates all connection profiles on the target system as specified in the network_connections variable, which is a list of dictionaries that include specific options.

Configuring Ethernet:

network_connections:
  - name: eth0
    #persistent_state: present  # default
    type: ethernet
    autoconnect: yes
    mac: 00:00:5e:00:53:5d
    ip:
      dhcp4: yes

Configuring Bridge:

network_connections:
  - name: internal-br0
    interface_name: br0
    type: bridge
    ip:
      dhcp4: no
      auto6: no

Configuring Bonded Interface:

network_connections:
  - name: br0-bond0
    type: bond
    interface_name: bond0
    controller: internal-br0
    port_type: bridge

  - name: br0-bond0-eth1
    type: ethernet
    interface_name: eth1
    controller: br0-bond0
    port_type: bond

Configuring VLANs:

network_connections:
  - name: eth1-profile
    autoconnect: no
    type: ethernet
    interface_name: eth1
    ip:
      dhcp4: no
      auto6: no

  - name: eth1.6
    autoconnect: no
    type: vlan
    parent: eth1-profile
    vlan:
      id: 6
    ip:
      address:
        - 192.0.2.5/24
      auto6: no

Configuring Infiniband:

network_connections:
  - name: ib0
    type: infiniband
    interface_name: ib0

  # Create a simple infiniband profile
  - name: ib0-10
    interface_name: ib0.000a
    type: infiniband
    autoconnect: yes
    infiniband_p_key: 10
    parent: ib0
    state: up
    ip:
      dhcp4: no
      auto6: no
      address:
        - 198.51.100.133/30

Configuring MACVLAN:

network_connections:
  - name: eth0-profile
    type: ethernet
    interface_name: eth0
    ip:
      address:
        - 192.168.0.1/24

  - name: veth0
    type: macvlan
    parent: eth0-profile
    macvlan:
      mode: bridge
      promiscuous: yes
      tap: no
    ip:
      address:
        - 192.168.1.1/24

Configuring a wireless connection:

network_connections:
  - name: wlan0
    type: wireless
    interface_name: wlan0
    wireless:
      ssid: "My WPA2-PSK Network"
      key_mgmt: "wpa-psk"
      # recommend vault encrypting the wireless password
      # see https://docs.ansible.com/ansible/latest/user_guide/vault.html
      password: "p@55w0rD"

Setting the IP configuration:

network_connections:
  - name: eth0
    type: ethernet
    ip:
      route_metric4: 100
      dhcp4: no
      #dhcp4_send_hostname: no
      gateway4: 192.0.2.1

      dns:
        - 192.0.2.2
        - 198.51.100.5
      dns_search:
        - example.com
        - subdomain.example.com

      route_metric6: -1
      auto6: no
      gateway6: 2001:db8::1

      address:
        - 192.0.2.3/24
        - 198.51.100.3/26
        - 2001:db8::80/7

      route:
        - network: 198.51.100.128
          prefix: 26
          gateway: 198.51.100.1
          metric: 2
        - network: 198.51.100.64
          prefix: 26
          gateway: 198.51.100.6
          metric: 4
      route_append_only: no
      rule_append_only: yes

Configuring 802.1x:

network_connections:
  - name: eth0
    type: ethernet
    ieee802_1x:
      identity: myhost
      eap: tls
      private_key: /etc/pki/tls/client.key
      # recommend vault encrypting the private key password
      # see https://docs.ansible.com/ansible/latest/user_guide/vault.html
      private_key_password: "p@55w0rD"
      client_cert: /etc/pki/tls/client.pem
      ca_cert: /etc/pki/tls/cacert.pem
      domain_suffix_match: example.com

Reference

  1. The external landing page for the system roles project, https://linux-system-roles.github.io/
  2. The external network role docs, https://github.com/linux-system-roles/network/

Separate INFO and DEBUG logs

Introduction

Before the logging of the network module was refactored, the module collected all logging statements and returned them at the end as “warnings” so that Ansible would show them. These are not really warnings, but rather debug information.

How to reproduce

We can reproduce this network module bug by running a qemu test.

TEST_SUBJECTS=CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2 ansible-playbook -vv -i /usr/share/ansible/inventory/standard-inventory-qcow2 ./tests/playbooks/tests_ethernet.yml

How to resolve it

The logging messages should be returned in a different json field that is ignored by ansible. Then, the tasks/main.yml should have a follow-up debug task that prints the returned variable. In the failure case, the network_connections task must run ignoring failures to reach the debug statement. Then, a follow up task should check whether the network_connections task failed and abort.
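
The idea can be sketched in plain Python without any Ansible imports. This is only an illustration, not the module's actual code: the RunLog class, the level constants, and the message format are hypothetical. The "stderr" field name is taken from the test assertions in this post, and putting only genuine warnings into "warnings" is an assumption.

```python
import json

# Hypothetical log levels; the real module may define these differently.
DEBUG, INFO, WARN, ERROR = range(4)
LEVEL_NAMES = {DEBUG: "debug", INFO: "info", WARN: "warn", ERROR: "error"}


class RunLog:
    """Collect log lines during a module run instead of emitting them all as warnings."""

    def __init__(self):
        self.entries = []

    def log(self, level, msg):
        self.entries.append((level, msg))

    def result(self, changed=False, failed=False):
        # All log lines go into "stderr", which Ansible does not display by
        # itself; a follow-up debug task can print it.
        res = {
            "changed": changed,
            "failed": failed,
            "stderr": "\n".join(
                f"<{LEVEL_NAMES[lvl]}> {msg}" for lvl, msg in self.entries
            ),
        }
        # Assumption: only real warnings are reported in "warnings".
        warnings = [msg for lvl, msg in self.entries if lvl == WARN]
        if warnings:
            res["warnings"] = warnings
        return res


run_log = RunLog()
run_log.log(INFO, "add connection eth0")
run_log.log(DEBUG, "call provider nm")
print(json.dumps(run_log.result(changed=True), indent=2))
```

With only info and debug entries, the result has no "warnings" key at all, which is the condition a test assertion can check.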

What is the result

After the bug is fixed, we can run the same qemu test to compare the result.

Additional test cases

Beyond that, we also have some assertions to confirm that INFO and DEBUG logs are indeed separated from warnings. In ./tests/tests_default.yml, the following test code asserts that there are no warnings in __network_connections_result.

---
- name: Test executing the role with default parameters
  hosts: all
  roles:
    - linux-system-roles.network
  tasks:
    - name: Test warning and info logs
      assert:
        that:
          - "'warnings' not in __network_connections_result"
        msg: "There are warnings"

In ./tests/tasks/assert_output_in_stderr_without_warnings.yml, we assert that there are no warnings in __network_connections_result and that stderr is present in __network_connections_result.

---
- name: "Assert that warnings is empty"
  assert:
    that:
      - "'warnings' not in __network_connections_result"
    msg: "There are unexpected warnings"
- name: "Assert that there is output in stderr"
  assert:
    that:
      - "'stderr' in __network_connections_result"
    msg: "There are no messages in stderr"

The following Ansible log is extracted from the same qemu test result after the bug was fixed:

Demo video

I made a demo video that shows the bug, the refactored logging of the network module after the fix, and the results of running the additional test cases.

Separate INFO and DEBUG logs

Reference

  1. Refactor logging of network module, https://github.com/linux-system-roles/network/issues/29
  2. Separate debug and info logs from warnings, https://github.com/linux-system-roles/network/pull/207

Conversion to Collection - YAML roundtrip with ruamel

The System Roles team is working on making the roles available as a collection. One of the challenges is that we have to continue to support the old style roles for the foreseeable future due to customers using older versions of Ansible. So rather than just create a github repository for the collection and do a one-time conversion of all of the roles to collection format, we have decided to keep the existing github role structure, and instead use a script to build the collection for publishing in Galaxy.

Using the collections: keyword

One strategy is to use the collections: keyword in the play. For example:

- name: Apply the kernel_settings role
  hosts: all
  roles:
    - kernel_settings
  tasks:
    - name: use the kernel_settings module
      kernel_settings:
        ...

To use this role from a collection fedora.system_roles, you could use the collections: keyword:

- name: Apply the kernel_settings role
  hosts: all
  collections:
    - fedora.system_roles
  roles:
    - kernel_settings
  tasks:
    - name: use the kernel_settings module
      kernel_settings:
        ...

However, the guidance we have received from the Ansible team is that we should use FQRN (Fully Qualified Role Name) and FQCN (Fully Qualified Collection Name) to avoid any naming collisions or ambiguity, and not to rely on the collections: keyword. This means we have a lot of conversion to do. For Ansible YAML files, the two main items are:

  • convert references to role ROLENAME and linux-system-roles.ROLENAME to fedora.system_roles.ROLENAME
  • convert references to modules to use the FQCN e.g. some_module: to fedora.system_roles.some_module:

Using regular expressions to search/replace strings

One solution is to use a regular expression match: just look for references to linux-system-roles.ROLENAME and convert them to fedora.system_roles.ROLENAME. This works pretty well, but there is no guarantee that there isn’t some odd use of linux-system-roles.ROLENAME that is unrelated to a role keyword. It would be much better and safer if we could change only those places where the role name is used in the semantic context of an Ansible role reference. For modules, it is quite tricky to do this search/replace using a regexp. To complicate matters, in the network role, the module name network_connections is also used as a role variable name. I’m not sure how one would write a regexp that could detect the semantic context and replace the string network_connections with fedora.system_roles.network_connections only where it is used as an Ansible module.
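
A small sketch of the problem, using only the Python standard library. The playbook text and both patterns are made up for illustration and are not taken from the conversion script. The role reference converts cleanly, but a line-based pattern for the module name also rewrites the role variable of the same name.

```python
import re

PREFIX = "fedora.system_roles."

# Made-up playbook in which network_connections appears both as a role
# variable (under vars:) and as a module name (under tasks:).
text = """\
- name: Configure the network
  hosts: all
  vars:
    network_connections:
      - name: eth0
        type: ethernet
  roles:
    - linux-system-roles.network
  tasks:
    - name: Call the module directly
      network_connections:
        provider: nm
"""

# Role references: plain search/replace on the old-style role name.
converted = re.sub(r"linux-system-roles\.(\w+)", PREFIX + r"\1", text)

# Module names: a naive pattern cannot tell the variable from the module,
# so it rewrites both occurrences.
naive, count = re.subn(
    r"^(\s*)network_connections:",
    r"\g<1>" + PREFIX + "network_connections:",
    converted,
    flags=re.MULTILINE,
)

print(naive)
print("replacements:", count)
```

The variable under vars: ends up renamed as well, which would break the role, since the role expects the variable to keep its plain name.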

Using the Ansible parser

The next solution was to use the Ansible parser (ansible.parsing.dataloader.DataLoader) to read in the files with the full semantic information. We took inspiration from the ansible-lint code for this, and used similar heuristics to determine the file and node types:

  • file location - files in the vars/ and defaults/ directories are not tasks/ files
  • Ansible type - a tasks file has type AnsibleSequence not AnsibleMapping
  • node type - a play has one of the play keywords like gather_facts, tasks, etc.
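
These heuristics can be sketched roughly as follows. This is a simplified illustration, not the converter's code: it works on plain Python lists and dicts rather than AnsibleSequence and AnsibleMapping, and the function name, its path argument, and the subset of play keywords are hypothetical.

```python
import os

# Illustrative subset of play keywords; Ansible defines many more.
PLAY_KEYS = {"hosts", "gather_facts", "roles", "tasks", "handlers"}


def guess_file_type(path, data):
    """Classify parsed YAML data using file location, container type, and keywords."""
    parent = os.path.basename(os.path.dirname(path))
    # File location: vars/ and defaults/ hold variables, meta/ holds metadata.
    if parent in ("vars", "defaults"):
        return "vars"
    if parent == "meta":
        return "meta"
    # Container type: tasks files and playbooks are sequences, not mappings.
    if isinstance(data, list):
        # Node type: a play carries at least one play keyword.
        if any(
            isinstance(item, dict) and PLAY_KEYS.intersection(item)
            for item in data
        ):
            return "playbook"
        return "tasks"
    return "unknown"


print(guess_file_type("roles/network/defaults/main.yml", {"network_provider": "nm"}))
print(guess_file_type("roles/network/tasks/main.yml", [{"name": "t", "debug": {"msg": "hi"}}]))
print(guess_file_type("tests/tests_default.yml", [{"hosts": "all", "roles": ["network"]}]))
```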

For task nodes, we then use ansible.parsing.mod_args.ModuleArgsParser to parse out the module name (as is done in ansible-lint).

For role references, we look for

  • a task with a module include_role or import_role with a name parameter
  • a play with a roles keyword
  • a meta with a dependencies keyword

A role in a roles or dependencies may be referenced as

roles/dependencies:
  - ROLENAME
# OR
  - name: ROLENAME
    vars: ...
# OR
  - role: ROLENAME
    vars: ...

This allowed us to easily identify where the ROLENAME was referenced as a role rather than something else, and to identify where the role modules were used.
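
As a sketch, handling the three reference forms in a roles or dependencies list might look like this. It operates on plain Python lists and dicts, and the function name is hypothetical; the real code works on the parsed Ansible and ruamel nodes.

```python
PREFIX = "fedora.system_roles."


def convert_role_refs(role_list, rolename):
    """Rewrite references to rolename in a roles: or dependencies: list, in place."""
    old_names = {rolename, "linux-system-roles." + rolename}
    for idx, entry in enumerate(role_list):
        if isinstance(entry, str):
            # Form 1: "- ROLENAME"
            if entry in old_names:
                role_list[idx] = PREFIX + rolename
        else:
            # Forms 2 and 3: "- name: ROLENAME" or "- role: ROLENAME"
            for key in ("name", "role"):
                if entry.get(key) in old_names:
                    entry[key] = PREFIX + rolename
    return role_list


roles = [
    "linux-system-roles.network",
    {"name": "network", "vars": {"network_provider": "nm"}},
    {"role": "some_other_role"},
]
print(convert_role_refs(roles, "network"))
```

References to other roles are left untouched, since only the role being converted is known to exist in the collection.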

The next problem - how to write out these converted files? Just using a plain YAML dump, even if nicely formatted, does not preserve all of our pre/post YAML doc, comments, formatting, etc. We thought it was important to keep this as much as possible:

  • keep license headers in files
  • helps visually determine if the collection conversion was successful
  • when bugs come from customers using the collection, we can much better debug and fix the source role if the line numbers and formatting match
  • we’ll use this code when we eventually convert our repos in github to use the collection format

Using Ansible and ruamel

The ruamel.yaml package has the ability to “round-trip” YAML files, preserving comments, quoting, formatting, etc. We borrowed another technique from ansible-lint which parses and iterates Ansible files using both the Ansible parser and the ruamel parser “in parallel” (ansible-lint is also comment aware). This is an excerpt from the role file parser class:

    def __init__(self, filepath, rolename):
        self.filepath = filepath
        dl = DataLoader()
        self.ans_data = dl.load_from_file(filepath)
        if self.ans_data is None:
            raise LSRException(f"file is empty {filepath}")
        self.file_type = get_file_type(self.ans_data)
        self.rolename = rolename
        self.ruamel_yaml = YAML(typ="rt")
        self.ruamel_yaml.default_flow_style = False
        self.ruamel_yaml.preserve_quotes = True
        self.ruamel_yaml.width = None
        with open(filepath) as f:
            buf = f.read()
        self.ruamel_data = self.ruamel_yaml.load(buf)
        self.ruamel_yaml.indent(mapping=2, sequence=4, offset=2)
        self.outputfile = None
        self.outputstream = sys.stdout

The class uses ans_data for looking at the data using Ansible semantics, and uses ruamel_data for doing the modification and writing.

    def run(self):
        if self.file_type == "vars":
            self.handle_vars(self.ans_data, self.ruamel_data)
        elif self.file_type == "meta":
            self.handle_meta(self.ans_data, self.ruamel_data)
        else:
            for a_item, ru_item in zip(self.ans_data, self.ruamel_data):
                self.handle_item(a_item, ru_item)

    def write(self):
        def xform(thing):
            if self.file_type == "tasks":
                thing = re.sub(LSRFileTransformerBase.INDENT_RE, "", thing)
            return thing
        if self.outputfile:
            outstrm = open(self.outputfile, "w")
        else:
            outstrm = self.outputstream
        self.ruamel_yaml.dump(self.ruamel_data, outstrm, transform=xform)

    def handle_item(self, a_item, ru_item):
        """handle any type of item - call the appropriate handlers"""
        ans_type = get_item_type(a_item)
        self.handle_vars(a_item, ru_item)
        self.handle_other(a_item, ru_item)
        if ans_type == "task":
            self.handle_task(a_item, ru_item)
        self.handle_task_list(a_item, ru_item)

    def handle_task_list(self, a_item, ru_item):
        """item has one or more fields which hold a list of Task objects"""
        for kw in TASK_LIST_KWS:
            if kw in a_item:
                for a_task, ru_task in zip(a_item[kw], ru_item[kw]):
                    self.handle_item(a_task, ru_task)

The concrete class that uses this code provides callbacks for tasks, vars, meta, and other, and the callback can change the data. a_task is the task node from the Ansible parser, and ru_task is the task node from the ruamel parser. role_modules is a set of names of the modules provided by the role. prefix is e.g. fedora.system_roles.

    def task_cb(self, a_task, ru_task, module_name, module_args, delegate_to):
        if module_name == "include_role" or module_name == "import_role":
            rolename = ru_task[module_name]["name"]
            lsr_rolename = "linux-system-roles." + self.rolename
            if rolename == self.rolename or rolename == lsr_rolename:
                ru_task[module_name]["name"] = prefix + self.rolename
        elif module_name in role_modules:
            # assumes ru_task is an ordered dict
            idx = tuple(ru_task).index(module_name)
            val = ru_task.pop(module_name)
            ru_task.insert(idx, prefix + module_name, val)
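
The index/pop/insert sequence in the last branch renames a key while keeping its position in the task. The effect can be illustrated with a plain dict, which has no insert method, by rebuilding the mapping in order. The helper name is hypothetical; the real code mutates the ruamel node in place, presumably so that the comments and formatting attached to it survive.

```python
PREFIX = "fedora.system_roles."


def rename_key_keep_position(mapping, old_key, new_key):
    """Return a new dict in which old_key is renamed to new_key at the same position."""
    return {(new_key if key == old_key else key): val for key, val in mapping.items()}


task = {
    "name": "Configure networking",
    "network_connections": {"provider": "nm"},
    "when": "network_provider is defined",
}
new_task = rename_key_keep_position(
    task, "network_connections", PREFIX + "network_connections"
)
print(list(new_task))
```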

This produces an output file that is very close to the input - but not quite.

Problems with this approach

  • We can’t make ruamel do proper indentation of lists without having it do the indentation at the first level. For example:

- name: first level
  block:
    - name: second level
      something: something

comes out as

  - name: first level
    block:
      - name: second level
        something: something

This is why we have the xform hack in the write method.

  • Even with the hack, comments are not indented correctly

- name: first level
  # comment here
  block:
    # comment here
    - name: second level
      something: something

comes out as

  - name: first level
  # comment here
    block:
    # comment here
      - name: second level
        something: something

One approach would be to have xform skip the removal of the two extra spaces at the beginning of the line if the first non-space character in the line is #. However, if you have shell scripts or embedded config files with comments in them, these will then not be indented correctly, leading to problems. So for now, we just live with improperly indented Ansible comments.
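
The trade-off can be seen in a small standard-library sketch. Both patterns are hypothetical, since the actual value of INDENT_RE is not shown here, and the dumped text is a made-up example of what the dump might look like before the transform.

```python
import re

# Hypothetical patterns: strip the two extra leading spaces from every line,
# or from every line except those whose first non-space character is "#".
INDENT_RE = re.compile(r"^  ", flags=re.MULTILINE)
INDENT_SKIP_COMMENTS_RE = re.compile(r"^  (?! *#)", flags=re.MULTILINE)

# Made-up dump: the YAML comment kept its original indentation, while
# everything else, including the comment inside the script, moved right by two.
dumped = """\
  - name: first level
  # comment here
    block:
      - name: second level
        shell: |
          # comment inside a script
          echo hello
"""

plain = INDENT_RE.sub("", dumped)
skip = INDENT_SKIP_COMMENTS_RE.sub("", dumped)

print(plain)  # the YAML comment ends up at column 0, the script stays consistent
print(skip)   # the YAML comment is right, the script comment keeps two extra spaces
```

In the second output, the comment line inside the script is indented more than the line after it, which changes how the block scalar is parsed.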

  • Line wrapping is not preserved

We use yamllint and have had to use some creative wrapping/folding to abide by the line length restriction e.g.

    - "{{ ansible_facts['distribution'] }}_\
        {{ ansible_facts['distribution_version'] }}.yml"
    - "{{ ansible_facts['distribution'] }}_\
        {{ ansible_facts['distribution_major_version'] }}.yml"

is converted to

    - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version']\
        \ }}.yml"
    - "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_major_version']\
        \ }}.yml"

that is, ruamel imposes its own line length and wrapping convention.

We also didn’t have to worry about how to handle usage of plugins inside of lookup functions, which would seem to be a much more difficult problem.