From 536c48b5c36adb432f2fde7fac19828feb85cf9e Mon Sep 17 00:00:00 2001
From: Vladimir Hasko
Date: Sun, 21 May 2023 20:14:17 +0000
Subject: [PATCH] added test scenarios content

---
 .../internal/apimon_training/databases.rst   |   2 +-
 doc/source/internal/apimon_training/logs.rst |   4 +-
 .../apimon_training/test_scenarios.rst       | 194 +++++++++++++++++-
 3 files changed, 195 insertions(+), 5 deletions(-)

diff --git a/doc/source/internal/apimon_training/databases.rst b/doc/source/internal/apimon_training/databases.rst
index 77f6e39..5a7ce20 100644
--- a/doc/source/internal/apimon_training/databases.rst
+++ b/doc/source/internal/apimon_training/databases.rst
@@ -14,7 +14,7 @@ Graphite
 ========
 
-  [Graphite](https://graphiteapp.org/) is an open-source enterprise-ready
+  `Graphite <https://graphiteapp.org/>`_ is an open-source enterprise-ready
   time-series database. ApiMon, EpMon, and CloudMon data are stored in the
   clustered Graphite TSDB. Metrics emitted by the processes are gathered in
   the row of statsd processes which aggregate metrics to 10s precision.

diff --git a/doc/source/internal/apimon_training/logs.rst b/doc/source/internal/apimon_training/logs.rst
index 0cd240a..ef712a6 100644
--- a/doc/source/internal/apimon_training/logs.rst
+++ b/doc/source/internal/apimon_training/logs.rst
@@ -1,10 +1,12 @@
+.. _logs:
+
 ====
 Logs
 ====
 
-  - Every single job run log is stored on object storage
+  - Every single job run log is stored on OpenStack Swift object storage.
   - Each single job log file provides unique URL which can be accessed to see
     log details
   - These URLs are available on all APIMON levels:

diff --git a/doc/source/internal/apimon_training/test_scenarios.rst b/doc/source/internal/apimon_training/test_scenarios.rst
index 2e58095..cb31ce8 100644
--- a/doc/source/internal/apimon_training/test_scenarios.rst
+++ b/doc/source/internal/apimon_training/test_scenarios.rst
@@ -5,7 +5,195 @@ Test Scenarios
 ==============
 
-Test Scenarios playbooks are located at
-https://github.com/opentelekomcloud-infra/apimon-test. (the location will change
-with CloudMon replacement in future).

The Executor role of each API-Monitoring environment is responsible for
executing the individual jobs (scenarios). Scenarios can be defined as Ansible
playbooks (which allows them to be pretty much anything) or in any other
executable form (for example a Python script). With Ansible itself being able
to execute nearly anything, ApiMon can cover almost any use case. The only
expectation is that whatever is being done produces some form of metric for
further analysis and evaluation; otherwise there is no point in monitoring it.
The scenarios are collected in a Git repository and updated in real time.

In general the test jobs do not need to take care of generating metric data
explicitly. Since the API-related tasks in the playbooks rely on the Python
OpenStack SDK (and its OTC extensions), metric data is generated automatically
by the logging interface of the SDK ('openstack_api' metrics). Those metrics
are collected by statsd and stored in the Graphite TSDB.

Additional metric data is generated by the Executor service itself, which
collects the playbook names, results and durations ('ansible_stats' metrics)
and stores them in a PostgreSQL relational database.

The playbooks with the monitoring scenarios are stored in a separate repository
on `GitHub <https://github.com/opentelekomcloud-infra/apimon-test>`_ (the
location will change with the CloudMon replacement in the future). The
playbooks address the most common use cases that end customers perform with
the cloud services.
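
For illustration, even a single task that uses the SDK-backed modules, like the
minimal sketch below (the cloud profile name is illustrative), already emits
'openstack_api' metrics for every API call it performs, without any extra
instrumentation in the playbook::

  - name: List servers via the OpenStack SDK
    openstack.cloud.server_info:
      cloud: "{{ test_cloud }}"   # connection profile from clouds.yaml (illustrative)
    register: servers_result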

The metrics generated by the Executor are described on the Metric Definitions
page.

In addition to the metrics generated and captured by a playbook, ApiMon also
captures the :ref:`stdout of the execution <logs>` and saves this log for
additional analysis to OpenStack Swift object storage, where the logs are kept
with a configurable retention policy.


New test scenario introduction
==============================

As already mentioned, the playbook scenarios are stored in a separate
repository on `GitHub <https://github.com/opentelekomcloud-infra/apimon-test>`_.
Because the various environments differ from each other in location, supported
services, available flavors, etc., a monitoring configuration matrix is
required which defines the monitoring standard and scope for each environment.
Therefore, to enable a playbook in one of the monitored environments (PROD
EU-DE, EU-NL, PREPROD, Swisscloud), a further update is required in the
monitoring matrix. This will also change in the future once StackMon takes
over.


Rules for Test Scenarios
========================

Ansible playbooks need to follow some basic regression-testing principles to
ensure the sustainability of the endless execution of such scenarios:

- **use the OpenTelekomCloud and OpenStack collections**

  - When developing test scenarios, use the available ``opentelekomcloud.cloud``
    or ``openstack.cloud`` collections for native interaction with the cloud in
    Ansible.

  - If a feature is not supported by the collections, you can still use the
    script module and call a Python SDK script directly to invoke the required
    request towards the cloud.

- **unique names of resources**

  - Make sure that resources do not conflict with each other and are easily
    trackable by their unique names.

- **teardown of the resources**

  - Make sure that deletion / cleanup of the resources is triggered even if
    some of the tasks in the playbook fail (a minimal sketch of such a pattern
    follows this list).

  - Make sure that deletion / cleanup is triggered in the right order.

- **simplicity**

  - Do not overcomplicate the test scenario. Use default, auto-filled
    parameters wherever possible.

- **only basic / core functions are in scope of testing**

  - ApiMon is not supposed to validate the full service functionality. Such
    cases are covered by a different team / framework under QA responsibility.

  - Focus only on core functions which are critical for the basic operation /
    lifecycle of the service.

  - The fewer functions you use, the lower the potential failure rate of the
    running scenario.

- **minimize hardcoding**

  - Every hardcoded parameter in a scenario can lead to a potential outage of
    the scenario's run in the future when that parameter changes.

  - Try to obtain all such parameters dynamically from the cloud directly.

- **use special tags for combined metrics**

  - If you want to combine multiple tasks of a playbook into a single custom
    metric, you can do so by using the tags parameter on the tasks.
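
For the teardown rule, a common way to guarantee the cleanup is Ansible's
``block``/``always`` structure. The following is only a minimal sketch of that
pattern (module parameters and variable names are illustrative)::

  - name: Scenario body with guaranteed cleanup
    block:
      - name: Create test server
        openstack.cloud.server:
          name: "{{ test_server_fqdn }}"
          image: "{{ test_image }}"
          flavor: "{{ test_flavor }}"
          network: "{{ test_network_name }}"

      # ... further tasks of the scenario ...

    always:
      # Executed even if a task in the block above failed
      - name: Delete test server
        openstack.cloud.server:
          name: "{{ test_server_fqdn }}"
          state: absent
        ignore_errors: true

Ordering the tasks under ``always`` from the most dependent resource to the
least dependent one also covers the rule about triggering the cleanup in the
right order.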

Custom metrics in Test Scenarios
================================

The OpenStack SDK and otcextensions support metric generation natively for
every single API call, and the ApiMon executor collects Ansible playbook
statistics, so every single scenario and task can store its result, duration
and name in the metric database.

In some cases, however, a measurement across multiple tasks is needed because
together they represent an important aspect of the customer use case, for
example the time and overall result from the VM deployment until a successful
login via SSH. Single task results are stored as metrics in the metric
database, but it would be complicated to move the processing logic for such
metrics into Grafana. Therefore the tags feature on task level introduces the
possibility to define custom metrics.

In the following example the custom metric stores the result of multiple tasks
under the special metric name ``create_server``::

  - name: Create Server in default AZ
    openstack.cloud.server:
      auto_ip: false
      name: "{{ test_server_fqdn }}"
      image: "{{ test_image }}"
      flavor: "{{ test_flavor }}"
      key_name: "{{ test_keypair_name }}"
      network: "{{ test_network_name }}"
      security_groups: "{{ test_security_group_name }}"
    tags:
      - "metric=create_server"
      - "az=default"
    register: server

  - name: Get server id
    set_fact:
      server_id: "{{ server.id }}"

  - name: Attach FIP
    openstack.cloud.floating_ip:
      server: "{{ server_id }}"
    tags:
      - "metric=create_server"
      - "az=default"

  - name: Get server info
    openstack.cloud.server_info:
      server: "{{ server_id }}"
    register: server
    tags:
      - "metric=create_server"
      - "az=default"

  - name: Get server public IP
    set_fact:
      server_ip: "{{ server['openstack_servers'][0]['public_v4'] }}"
    tags:
      - "metric=create_server"
      - "az=default"

  - name: Find servers by name
    openstack.cloud.server_info:
      server: "{{ test_server_fqdn }}"
    register: servers
    tags:
      - "metric=create_server"
      - "az=default"

  - name: Debug server info
    debug:
      var: servers

  # Wait for the server to really start and become accessible
  - name: Wait for SSH port to become active
    wait_for:
      port: 22
      host: "{{ server_ip }}"
      timeout: 600
    tags: ["az=default", "service=compute", "metric=create_server"]

  - name: Try connecting
    command: "ssh -o 'UserKnownHostsFile=/dev/null' -o 'StrictHostKeyChecking=no' linux@{{ server_ip }} -i ~/.ssh/{{ test_keypair_name }}.pem"
    register: ssh_result
    retries: 10
    delay: 1
    until: ssh_result.rc == 0
    tags: ["az=default", "service=compute", "metric=create_server"]
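
The example above omits the teardown. A possible cleanup task, following the
rules described earlier (ideally placed in an ``always`` section as sketched
before; the ``delete_fip`` option is an assumption about the installed
``openstack.cloud`` collection version), could look like this::

  - name: Delete the test server and its floating IP
    openstack.cloud.server:
      name: "{{ test_server_fqdn }}"
      state: absent
      delete_fip: true   # assumed option that also removes the attached floating IP
    ignore_errors: true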