DevConf2021.cz - Presentation and Demo
There was a presentation entitled “Managing Standard Operating Envs with Ansible” given at DevConf2021.cz. Demo files and links to videos can be found at DevConf2021.cz
CI changes - GitHub Actions and tox-lsr
We have recently moved our GitHub CI to use GitHub Actions instead of Travis. Our organization template is here: https://github.com/linux-system-roles/.github
We currently aren’t using any of the more advanced features of GitHub Actions, as we wanted to achieve parity with Travis as soon as possible.
We have also replaced all of the local scripts used for CI testing with tox-lsr. If you are a system roles developer, you will need to modify your workflow in order to use the new plugin. See README.md for more information.
Introduction to Network Role
Introduction
The network role supports two providers: NetworkManager (nm) and initscripts. On CentOS/RHEL 6, only the initscripts provider is used. On CentOS/RHEL 7 and later, both initscripts and nm are available. Network profiles are configured through a custom Ansible module. The role runs several tasks to set up host networking, including but not limited to installing packages and starting and enabling services.
The network role CI system consists of Tox, which runs the unit tests, and Test-harness, which runs the integration tests. The Tox runs also check the code formatting with Python Black, check the formatting of the YAML files, and so on. The integration tests run in an internal OpenShift instance: Test-harness watches the configured GitHub repositories for PRs, checks out each new PR, runs all test playbooks against all configured images (with a fresh machine for every test playbook), sets the PR statuses, and uploads the results.
For better testing efficiency, some playbooks call the internal Ansible modules directly instead of the role, which skips redundant tasks. Ansible modules can also be grouped into blocks for more targeted unit testing. Furthermore, there are helper scripts to collect coverage from the integration tests via Ansible, basic unit tests for argument parsing, and additional helper files for assertions, test setup, and logging.
Code structure
The repository is structured as follows:
- ./defaults/: Contains the default role configuration.
- ./examples/: Contains YAML examples for different configurations.
- ./library/network_connections.py: Contains the internal Ansible module, which is the main script. It controls the communication between the role and Ansible, imports the YAML configuration and applies the changes to the provider (i.e. NetworkManager, initscripts).
- ./meta/: Metadata of the project.
- ./module_utils/network_lsr/: Contains other files that are useful for the network role (e.g. the YAML argument validator).
- ./tasks/: Declaration of the different tasks that the role is going to execute.
- ./tests/playbooks/: Contains the complete tests for the role.
- ./tests/tests_*.yml: Shims to run the tests once for every provider.
- ./tests/tasks/: Contains task snippets that are used in multiple tests to avoid having the same code repeated multiple times.
- Each file matching tests_*.yml is a test playbook which is run by the CI system.
How to run tests
Tox Unit Tests
- tox -l: list all the unit test environments. The available environments are:
  - black
  - pylint
  - flake8
  - yamllint
  - py26
  - py27
  - py36
  - py37
  - py38
  - collection
  - custom
- tox: run all the tests
- tox -e py36: run the pyunit tests with Python 3.6
- tox -e yamllint: check that the YAML files are correctly formatted
- tox -e black: check the formatting of the code with Python Black
- …
Integration Test
- Download CentOS 6, CentOS 7, CentOS 8, Fedora images from
- https://cloud.centos.org/centos/6/images/CentOS-6-x86_64-GenericCloud-1907.qcow2c
- https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud-2003.qcow2c
- https://cloud.centos.org/centos/8/x86_64/images/CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2
- https://kojipkgs.fedoraproject.org/compose/cloud/
- Install “standard-test-roles-inventory-qemu” package
dnf install standard-test-roles-inventory-qemu
- Run a test playbook with [TEST_DEBUG=1] TEST_SUBJECTS=<image file> ansible-playbook -v[v] -i <inventory file/script> <tests_….yml>, for example:
TEST_SUBJECTS=CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2 ansible-playbook -v -i /usr/share/ansible/inventory/standard-inventory-qcow2 tests/tests_default.yml
Overview
The network role enables users to configure the network on the target machine. This role can be used to configure:
- Ethernet interfaces
- Bridge interfaces
- Bonded interfaces
- VLAN interfaces
- MacVLAN interfaces
- Infiniband interfaces
- Wireless (WiFi) interfaces
- IP configuration
- 802.1x authentication
Examples of Connections
The network role updates or creates all connection profiles on the target system as specified in the network_connections variable, which is a list of dictionaries that include specific options.
Configuring Ethernet:
network_connections:
  - name: eth0
    #persistent_state: present  # default
    type: ethernet
    autoconnect: yes
    mac: 00:00:5e:00:53:5d
    ip:
      dhcp4: yes
Configuring Bridge:
network_connections:
  - name: internal-br0
    interface_name: br0
    type: bridge
    ip:
      dhcp4: no
      auto6: no
Configuring Bonded Interface:
network_connections:
  - name: br0-bond0
    type: bond
    interface_name: bond0
    controller: internal-br0
    port_type: bridge
  - name: br0-bond0-eth1
    type: ethernet
    interface_name: eth1
    controller: br0-bond0
    port_type: bond
Configuring VLANs:
network_connections:
  - name: eth1-profile
    autoconnect: no
    type: ethernet
    interface_name: eth1
    ip:
      dhcp4: no
      auto6: no
  - name: eth1.6
    autoconnect: no
    type: vlan
    parent: eth1-profile
    vlan:
      id: 6
    ip:
      address:
        - 192.0.2.5/24
      auto6: no
Configuring Infiniband:
network_connections:
  - name: ib0
    type: infiniband
    interface_name: ib0
  # Create a simple infiniband profile
  - name: ib0-10
    interface_name: ib0.000a
    type: infiniband
    autoconnect: yes
    infiniband_p_key: 10
    parent: ib0
    state: up
    ip:
      dhcp4: no
      auto6: no
      address:
        - 198.51.100.133/30
Configuring MACVLAN:
network_connections:
  - name: eth0-profile
    type: ethernet
    interface_name: eth0
    ip:
      address:
        - 192.168.0.1/24
  - name: veth0
    type: macvlan
    parent: eth0-profile
    macvlan:
      mode: bridge
      promiscuous: yes
      tap: no
    ip:
      address:
        - 192.168.1.1/24
Configuring a wireless connection:
network_connections:
  - name: wlan0
    type: wireless
    interface_name: wlan0
    wireless:
      ssid: "My WPA2-PSK Network"
      key_mgmt: "wpa-psk"
      # recommend vault encrypting the wireless password
      # see https://docs.ansible.com/ansible/latest/user_guide/vault.html
      password: "p@55w0rD"
Setting the IP configuration:
network_connections:
  - name: eth0
    type: ethernet
    ip:
      route_metric4: 100
      dhcp4: no
      #dhcp4_send_hostname: no
      gateway4: 192.0.2.1
      dns:
        - 192.0.2.2
        - 198.51.100.5
      dns_search:
        - example.com
        - subdomain.example.com
      route_metric6: -1
      auto6: no
      gateway6: 2001:db8::1
      address:
        - 192.0.2.3/24
        - 198.51.100.3/26
        - 2001:db8::80/7
      route:
        - network: 198.51.100.128
          prefix: 26
          gateway: 198.51.100.1
          metric: 2
        - network: 198.51.100.64
          prefix: 26
          gateway: 198.51.100.6
          metric: 4
      route_append_only: no
      rule_append_only: yes
Configuring 802.1x:
network_connections:
  - name: eth0
    type: ethernet
    ieee802_1x:
      identity: myhost
      eap: tls
      private_key: /etc/pki/tls/client.key
      # recommend vault encrypting the private key password
      # see https://docs.ansible.com/ansible/latest/user_guide/vault.html
      private_key_password: "p@55w0rD"
      client_cert: /etc/pki/tls/client.pem
      ca_cert: /etc/pki/tls/cacert.pem
      domain_suffix_match: example.com
Reference
- The external landing page for the system roles project, https://linux-system-roles.github.io/
- The external network role docs, https://github.com/linux-system-roles/network/
Separate INFO and DEBUG logs
Introduction
Before the logging of the network module was refactored, the module collected all logging statements and returned them at the end as “warnings”, so that they were shown by Ansible. Obviously, these are not really warnings, but rather debug information.
How to reproduce
We can reproduce this network module bug by running a qemu test.
TEST_SUBJECTS=CentOS-8-GenericCloud-8.1.1911-20200113.3.x86_64.qcow2 ansible-playbook -vv -i /usr/share/ansible/inventory/standard-inventory-qcow2 ./tests/playbooks/tests_ethernet.yml
How to resolve it
The logging messages should be returned in a different json field that is ignored by ansible. Then, the tasks/main.yml should have a follow-up debug task that prints the returned variable. In the failure case, the network_connections task must run ignoring failures to reach the debug statement. Then, a follow up task should check whether the network_connections task failed and abort.
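The idea of returning log messages in a separate field can be sketched in Python as follows. This is a minimal illustration, not the module's actual code: the RunLog class, the level names, and build_result are hypothetical, and the field name stderr is an assumption taken from the test assertions shown later in this post.

```python
import json

# Hypothetical severity levels; the real module's logging API may differ.
DEBUG, INFO, WARN = "debug", "info", "warn"


class RunLog:
    """Collects log entries during a (hypothetical) module run."""

    def __init__(self):
        self.entries = []

    def log(self, level, message):
        self.entries.append((level, message))

    def build_result(self, changed, failed=False):
        """Build the result dict that a module would return as JSON.

        Only genuine warnings go into "warnings", which Ansible displays.
        Every entry goes into "stderr" (assumed field name), which a
        follow-up debug task can print.
        """
        result = {"changed": changed, "failed": failed}
        warnings = [msg for level, msg in self.entries if level == WARN]
        if warnings:
            result["warnings"] = warnings
        result["stderr"] = "\n".join(
            "[%s] %s" % (level, msg) for level, msg in self.entries
        )
        return result


run_log = RunLog()
run_log.log(INFO, "add connection eth0")
run_log.log(DEBUG, "connection eth0 is up to date")
result = run_log.build_result(changed=True)
print(json.dumps(result, indent=2))
```

With only info and debug entries logged, the result carries no warnings key at all, which is the property the additional test cases check for.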
What is the result
After the bug is fixed, we can use the same qemu test to compare the result:
Additional test cases
Beyond that, we also have some assertions to confirm that we have indeed separated the Info and Debug logs.
In ./tests/tests_default.yml, we have the following test code to assert that there are no warnings in __network_connections_result.
---
- name: Test executing the role with default parameters
  hosts: all
  roles:
    - linux-system-roles.network
  tasks:
    - name: Test warning and info logs
      assert:
        that:
          - "'warnings' not in __network_connections_result"
        msg: "There are warnings"
In ./tests/tasks/assert_output_in_stderr_without_warnings.yml, we assert that there are no warnings in __network_connections_result, and that stderr is present in __network_connections_result.
---
- name: "Assert that warnings is empty"
  assert:
    that:
      - "'warnings' not in __network_connections_result"
    msg: "There are unexpected warnings"
- name: "Assert that there is output in stderr"
  assert:
    that:
      - "'stderr' in __network_connections_result"
    msg: "There are no messages in stderr"
The following Ansible logs are extracted from the same qemu test result after the bug was fixed:
Demo video
I made a demo video to show the bug, the refactored logging of the network module after the fix, and the results of running the additional test cases.
Reference
- Refactor logging of network module, https://github.com/linux-system-roles/network/issues/29
- Separate debug and info logs from warnings, https://github.com/linux-system-roles/network/pull/207
Conversion to Collection - YAML roundtrip with ruamel
The System Roles team is working on making the roles available as a collection. One of the challenges is that we have to continue to support the old style roles for the foreseeable future due to customers using older versions of Ansible. So rather than just create a github repository for the collection and do a one-time conversion of all of the roles to collection format, we have decided to keep the existing github role structure, and instead use a script to build the collection for publishing in Galaxy.
Using the collections: keyword
One strategy is to use the collections: keyword in the play. For example, take the following play:
- name: Apply the kernel_settings role
  hosts: all
  roles:
    - kernel_settings
  tasks:
    - name: use the kernel_settings module
      kernel_settings:
        ...
To use this role from a collection fedora.system_roles, you could use the collections: keyword:
- name: Apply the kernel_settings role
  hosts: all
  collections:
    - fedora.system_roles
  roles:
    - kernel_settings
  tasks:
    - name: use the kernel_settings module
      kernel_settings:
        ...
However, the guidance we have received from the Ansible team is that we should use FQRN (Fully Qualified Role Name) and FQCN (Fully Qualified Collection Name) to avoid any naming collisions or ambiguity, and not to rely on the collections: keyword. This means we have a lot of conversion to do. For Ansible YAML files, the two main items are:
- convert references to role ROLENAME and linux-system-roles.ROLENAME to fedora.system_roles.ROLENAME
- convert references to modules to use the FQCN e.g. some_module: to fedora.system_roles.some_module:
Using regular expressions to search/replace strings
One solution is to use a regular expression match - just look for references to linux-system-roles.ROLENAME and convert them to fedora.system_roles.ROLENAME. This works pretty well, but there is no guarantee that there isn’t some odd use of linux-system-roles.ROLENAME not related to a role keyword. It would be much better and safer if we could only change those places where the role name is used in the semantic context of an Ansible role reference. For modules, it is quite tricky to do this search/replace using a regexp. To complicate matters, in the network role, the module name network_connections is also used as a role variable name. I’m not sure how one would write a regexp that could detect the semantic context and only replace the string network_connections with fedora.system_roles.network_connections in the context of usage as an Ansible module.
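The network_connections ambiguity can be shown with a short Python sketch. The task snippet below is a made-up example written for this illustration, not a file taken from the network role; it only assumes that the same name appears once as a module and once as a variable.

```python
import re

PREFIX = "fedora.system_roles."

# Hypothetical task file: network_connections is used both as the
# module name (a task key) and as a role variable inside a template.
task_file = """\
- name: Configure networking
  network_connections:
    provider: "{{ network_provider }}"
    connections: "{{ network_connections }}"
"""

# Naive approach: replace every whole-word occurrence of the module name.
naive = re.sub(
    r"\bnetwork_connections\b", PREFIX + "network_connections", task_file
)
print(naive)

# The variable reference inside the template was rewritten as well,
# so the converted file now refers to a variable that does not exist.
variable_broken = '"{{ fedora.system_roles.network_connections }}"' in naive
print("variable reference rewritten:", variable_broken)
```

A pattern anchored to a trailing colon at task-key indentation would handle this particular snippet, but it would still be guessing at structure that a parser knows for certain.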
Using the Ansible parser
The next solution was to use the Ansible parser (ansible.parsing.dataloader.DataLoader) to read in the files with the full semantic information. We took inspiration from the ansible-lint code for this, and used similar heuristics to determine the file and node types:
- file location - files in the vars/ and defaults/ directories are not tasks/ files
- Ansible type - a tasks file has type AnsibleSequence not AnsibleMapping
- node type - a play has one of the play keywords like gather_facts, tasks, etc.
For task nodes, we then use ansible.parsing.mod_args.ModuleArgsParser to parse out the module name (as is done in ansible-lint).
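The file and node type heuristics listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the conversion script's code: guess_file_type and guess_item_type are hypothetical names, plain list and dict stand in for AnsibleSequence and AnsibleMapping, and PLAY_KEYWORDS is a small hand-picked subset of the play keywords.

```python
import os

# A few play-level keywords; an assumption for this sketch, not the full list.
PLAY_KEYWORDS = {"hosts", "gather_facts", "roles", "tasks", "handlers"}


def guess_file_type(filepath, data):
    """Guess the kind of Ansible file from its location and parsed data."""
    parent = os.path.basename(os.path.dirname(filepath))
    if parent in ("vars", "defaults"):
        return "vars"
    if parent == "meta":
        return "meta"
    # A tasks file or playbook parses to a sequence, not a mapping.
    if isinstance(data, list):
        return "tasks"
    return "unknown"


def guess_item_type(item):
    """Guess whether an item of a sequence file is a play or a task."""
    if isinstance(item, dict) and PLAY_KEYWORDS.intersection(item):
        return "play"
    return "task"


print(guess_file_type("roles/network/defaults/main.yml", {"network_provider": "nm"}))
print(guess_file_type("roles/network/tasks/main.yml", [{"name": "x", "debug": {}}]))
print(guess_item_type({"hosts": "all", "roles": ["network"]}))
print(guess_item_type({"name": "x", "debug": {"msg": "hi"}}))
```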
For role references, we look for
- a task with a module include_role or import_role with a name parameter
- a play with a roles keyword
- a meta with a dependencies keyword
A role in a roles or dependencies may be referenced as
roles/dependencies:
  - ROLENAME
  # OR
  - name: ROLENAME
    vars: ...
  # OR
  - role: ROLENAME
    vars: ...
This allowed us to easily identify where the ROLENAME was referenced as a role rather than something else, and to identify where the role modules were used.
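As a sketch of how the three reference forms might be handled once the data is parsed, the function below rewrites a roles or dependencies list in place. The function name convert_role_refs and the sample data are hypothetical and written for this illustration; it works on plain Python lists and dicts rather than on the parser's node types.

```python
PREFIX = "fedora.system_roles."


def convert_role_refs(refs, rolename, prefix=PREFIX):
    """Rewrite references to rolename in a roles/dependencies list in place."""
    old_names = (rolename, "linux-system-roles." + rolename)
    for idx, ref in enumerate(refs):
        if isinstance(ref, str):
            # Form 1: a bare role name
            if ref in old_names:
                refs[idx] = prefix + rolename
        elif isinstance(ref, dict):
            # Forms 2 and 3: a mapping with a "name" or "role" key
            for key in ("name", "role"):
                if ref.get(key) in old_names:
                    ref[key] = prefix + rolename
    return refs


roles = [
    "network",
    {"name": "linux-system-roles.network", "vars": {"network_provider": "nm"}},
    {"role": "timesync"},
]
print(convert_role_refs(roles, "network"))
```

Only exact matches on the role name are rewritten, so other roles and unrelated strings are left untouched.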
The next problem - how to write out these converted files? Just using a plain YAML dump, even if nicely formatted, does not preserve all of our pre/post YAML doc, comments, formatting, etc. We thought it was important to keep this as much as possible:
- keep license headers in files
- helps visually determine if the collection conversion was successful
- when bugs come from customers using the collection, we can much better debug and fix the source role if the line numbers and formatting match
- we’ll use this code when we eventually convert our repos in github to use the collection format
Using Ansible and ruamel
The ruamel.yaml package has the ability to “round-trip” YAML files, preserving comments, quoting, formatting, etc. We borrowed another technique from ansible-lint which parses and iterates Ansible files using both the Ansible parser and the ruamel parser “in parallel” (ansible-lint is also comment aware). This is an excerpt from the role file parser class:
    def __init__(self, filepath, rolename):
        self.filepath = filepath
        # Parse once with Ansible to get the semantic view of the file
        dl = DataLoader()
        self.ans_data = dl.load_from_file(filepath)
        if self.ans_data is None:
            raise LSRException(f"file is empty {filepath}")
        self.file_type = get_file_type(self.ans_data)
        self.rolename = rolename
        # Parse again with ruamel in round-trip mode to keep comments and quoting
        self.ruamel_yaml = YAML(typ="rt")
        self.ruamel_yaml.default_flow_style = False
        self.ruamel_yaml.preserve_quotes = True
        self.ruamel_yaml.width = None
        with open(filepath) as role_file:
            buf = role_file.read()
        self.ruamel_data = self.ruamel_yaml.load(buf)
        self.ruamel_yaml.indent(mapping=2, sequence=4, offset=2)
        self.outputfile = None
        self.outputstream = sys.stdout
The class uses ans_data for looking at the data using Ansible semantics, and uses ruamel_data for doing the modification and writing.
    def run(self):
        if self.file_type == "vars":
            self.handle_vars(self.ans_data, self.ruamel_data)
        elif self.file_type == "meta":
            self.handle_meta(self.ans_data, self.ruamel_data)
        else:
            for a_item, ru_item in zip(self.ans_data, self.ruamel_data):
                self.handle_item(a_item, ru_item)
    def write(self):
        def xform(thing):
            if self.file_type == "tasks":
                thing = re.sub(LSRFileTransformerBase.INDENT_RE, "", thing)
            return thing
        if self.outputfile:
            outstrm = open(self.outputfile, "w")
        else:
            outstrm = self.outputstream
        self.ruamel_yaml.dump(self.ruamel_data, outstrm, transform=xform)
    def handle_item(self, a_item, ru_item):
        """handle any type of item - call the appropriate handlers"""
        ans_type = get_item_type(a_item)
        self.handle_vars(a_item, ru_item)
        self.handle_other(a_item, ru_item)
        if ans_type == "task":
            self.handle_task(a_item, ru_item)
        self.handle_task_list(a_item, ru_item)
    def handle_task_list(self, a_item, ru_item):
        """item has one or more fields which hold a list of Task objects"""
        for kw in TASK_LIST_KWS:
            if kw in a_item:
                for a_task, ru_task in zip(a_item[kw], ru_item[kw]):
                    self.handle_item(a_task, ru_task)
The concrete class that uses this code provides callbacks for tasks, vars, meta, and other, and the callback can change the data. a_task is the task node from the Ansible parser, and ru_task is the task node from the ruamel parser. role_modules is a set of names of the modules provided by the role. prefix is e.g. fedora.system_roles.
    def task_cb(self, a_task, ru_task, module_name, module_args, delegate_to):
        if module_name == "include_role" or module_name == "import_role":
            rolename = ru_task[module_name]["name"]
            lsr_rolename = "linux-system-roles." + self.rolename
            if rolename == self.rolename or rolename == lsr_rolename:
                ru_task[module_name]["name"] = prefix + self.rolename
        elif module_name in role_modules:
            # assumes ru_task is an ordereddict
            # replace the key at the same position so the task keeps its key order
            idx = tuple(ru_task).index(module_name)
            val = ru_task.pop(module_name)
            ru_task.insert(idx, prefix + module_name, val)
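The pop-then-insert step in task_cb exists to rename a key without moving it within the task. The sketch below shows the same idea on a plain Python dict (which keeps insertion order in Python 3.7+). It is only an analogy: rename_key_in_place is a hypothetical helper, a plain dict has no positional insert, and unlike the ruamel data it carries no comments.

```python
def rename_key_in_place(mapping, old_key, new_key):
    """Rename old_key to new_key while keeping its position in the mapping."""
    items = [
        (new_key if key == old_key else key, value)
        for key, value in mapping.items()
    ]
    mapping.clear()
    mapping.update(items)
    return mapping


task = {
    "name": "Configure networking",
    "network_connections": {"provider": "nm"},
    "when": "network_connections is defined",
}
rename_key_in_place(
    task, "network_connections", "fedora.system_roles.network_connections"
)
print(list(task))
```

Only the key is renamed; the when condition, which mentions network_connections as a variable, is left as it was.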
This produces an output file that is very close to the input - but not quite.
Problems with this approach
- We can’t make ruamel do proper indentation of lists without having it do the indentation at the first level. For example:
- name: first level
  block:
    - name: second level
      something: something
comes out as
  - name: first level
    block:
      - name: second level
        something: something
This is why we have the xform hack in the write method.
- Even with the hack, comments are not indented correctly
- name: first level
  # comment here
  block:
    # comment here
    - name: second level
      something: something
comes out as
- name: first level
# comment here
  block:
  # comment here
    - name: second level
      something: something
One approach would be to have xform skip the removal of the two extra spaces at the beginning of the line if the first non-space character in the line is #. However, if you have shell scripts or embedded config files with comments in them, these will then not be indented correctly, leading to problems. So for now, we just live with improperly indented Ansible comments.
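The trade-off described above can be sketched in Python. The pattern below is an assumption, since the real INDENT_RE is not shown in this post, and the dumped string is a hand-written stand-in for dumped output rather than text captured from ruamel.

```python
import re

# Assumed pattern: two spaces at the start of every line.
INDENT_RE = re.compile(r"^  ", flags=re.MULTILINE)


def strip_indent(text):
    """Remove the two extra leading spaces from every line."""
    return INDENT_RE.sub("", text)


def strip_indent_keep_comments(text):
    """Same, but leave lines whose first non-space character is '#' alone."""
    lines = []
    for line in text.splitlines(keepends=True):
        if line.lstrip().startswith("#"):
            lines.append(line)
        else:
            lines.append(INDENT_RE.sub("", line))
    return "".join(lines)


# Hand-written stand-in: everything is shifted right by two spaces
# except the YAML comment on the second line.
dumped = (
    "  - name: first level\n"
    "  # comment here\n"
    "    shell: |\n"
    "      # comment inside a script\n"
    "      echo hello\n"
)
print(strip_indent(dumped))
print(strip_indent_keep_comments(dumped))
```

In the second output the YAML comment keeps its indentation, but the comment inside the shell script is left two spaces deeper than the echo line that follows it, which is the embedded-script problem mentioned above.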
- Line wrapping is not preserved
We use yamllint and have had to use some creative wrapping/folding to abide by the line length restriction e.g.
- "{{ ansible_facts['distribution'] }}_\
  {{ ansible_facts['distribution_version'] }}.yml"
- "{{ ansible_facts['distribution'] }}_\
  {{ ansible_facts['distribution_major_version'] }}.yml"
is converted to
- "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version']\
  \ }}.yml"
- "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_major_version']\
  \ }}.yml"
that is, ruamel imposes its own line length and wrapping convention.
We also didn’t have to worry about how to handle usage of plugins inside of lookup functions, which would seem to be a much more difficult problem.