updating EpMon and Test Cases

Hasko, Vladimir 2023-05-18 12:31:37 +00:00
parent fc2f6fd929
commit 8537442f8f
12 changed files with 111 additions and 33 deletions

@ -9,11 +9,16 @@ The authentication is centrally managed by LDAP.
- Alerta is a monitoring tool to integrate alerts from multiple sources.
- The alerts from different sources can be consolidated and de-duplicated.
- On ApiMon, it is hosted on the same instance as Grafana, just listening on a different port.
- The Zulip API was integrated with Alerta to send notifications of errors/alerts to a Zulip stream.
- Alerts displayed on OTC Alerta are generated either by Executor or by Grafana.
- “Executor alerts” focus on playbook results, i.e. whether a playbook has completed or failed.
- “Grafana alerts” focus on breaches of the defined thresholds, for example an API response time higher than the defined threshold.
- On ApiMon, it is hosted on the same instance as Grafana, just listening on a
different port.
- The Zulip API was integrated with Alerta to send notifications of
errors/alerts to a Zulip stream.
- Alerts displayed on OTC Alerta are generated either by Executor or by
Grafana.
- “Executor alerts” focus on playbook results, i.e. whether a playbook has
completed or failed.
- “Grafana alerts” focus on breaches of the defined thresholds, for example
an API response time higher than the defined threshold.
.. image:: training_images/alerta_dashboard.png

@ -8,8 +8,10 @@ The authentication is centrally managed by LDAP.
- The ApiMon Dashboards are segregated based on the type of service.
- The “OTC KPI” dashboard provides a high-level overview of OTC stability and reliability for management.
- The “Endpoint monitoring” dashboard monitors the health of every endpoint URL listed in the endpoint services catalogue.
- The “OTC KPI” dashboard provides a high-level overview of OTC stability and
reliability for management.
- The “Endpoint monitoring” dashboard monitors the health of every endpoint URL
listed in the endpoint services catalogue.
- “Respective service statistics” dashboards provide more detailed overview.
- Dashboards can be replicated/customized for individual Squad needs.
@ -20,3 +22,17 @@ OTC KPI Dashboard
=================
.. image:: training_images/kpi_dashboard.png
24/7 dashboards
===============
Endpoint Monitoring Dashboard
=============================
Common Test Results Dashboard
=============================
Service Based dashboard
=======================

@ -10,21 +10,22 @@ understand what is supported in which mode.
The most important differences are described in the table below:
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| **Differences** | **ApiMon (CMO)** | **ApiMon(FMO)** |
+=======================+============================================================================================================+===============================================================+
+=======================+============================================================================================================+==========================================================================+
| Playbook scenarios | https://github.com/opentelekomcloud-infra/apimon-test | https://github.com/stackmon/apimon-tests/tree/main/playbooks |
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
| Dashboards | https://github.com/opentelekomcloud-infra/system-config/tree/main/playbooks/templates/grafana/apimon | https://github.com/stackmon/apimon-tests/tree/main/dashboards |
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| Dashboards setup | https://github.com/opentelekomcloud-infra/system-config/tree/main/playbooks/templates/grafana/apimon | https://github.com/stackmon/apimon-tests/tree/main/dashboards |
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| Environment setup | https://github.com/opentelekomcloud-infra/system-config/blob/main/inventory/service/group_vars/apimon.yaml | https://github.com/opentelekomcloud-infra/stackmon-config |
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| Implementation mode | standalone app | plugin based |
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| Source of information | opentelekomcloud-infra | stackmon |
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
| Portal | https://dashboard.tsi-dev.otc-service.com/ | https://dashboard.tsi-dev.otc-service.com/ |
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| Dashboards | https://dashboard.tsi-dev.otc-service.com/ | https://dashboard.tsi-dev.otc-service.com/ |
| | https://dashboard.tsi-dev.otc-service.com/dashboards/f/UaB8meoZk/apimon | https://dashboard.tsi-dev.otc-service.com/dashboards/f/CloudMon/cloudmon |
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| Documentation | https://confluence.tsi-dev.otc-service.com/display/ES/API-Monitoring | https://stackmon.github.io/ |
| | | https://stackmon-cloudmon.readthedocs.io/en/latest/index.html |
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+

@ -1,3 +1,38 @@
============================
Endpoint Monitoring overview
============================
EpMon is a standalone Python-based process targeting every OTC service. It
finds services in the service catalogs and sends GET requests to the configured
endpoints.
Extensive tests such as provisioning a server give great coverage, but they
usually cannot be run very often, which leaves gaps on the monitoring
timescale. To cover these gaps, the EpMon component sends GET requests to the
given URLs, relying on the API discovery of the OpenStack cloud (for example, a
GET request to /servers of the compute endpoint). Such requests are cheap and
can be performed in a loop, for example every 5 seconds. The latency of those
calls and their return codes are captured and sent to the metrics storage.
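The probing step described above can be sketched as follows. This is only a minimal illustration using the Python standard library, not the actual EpMon implementation: the function name ``probe``, the ``opener`` parameter and the default timeout are assumptions made for this example.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0, opener=urllib.request.urlopen):
    """Send one GET request and return (status_code, latency_seconds).

    status_code is None when no response was received at all.
    `opener` is a hypothetical hook so the request function can be swapped.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers still carry a return code worth recording.
        status = err.code
    except (urllib.error.URLError, OSError):
        # Connection failures and timeouts: no response at all.
        status = None
    return status, time.monotonic() - start
```

A caller could invoke ``probe`` for each configured URL in a loop with a short sleep between rounds and forward the returned status code and latency to the metrics storage.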
Currently, the EpMon configuration is located in system-config:
https://github.com/opentelekomcloud-infra/system-config/blob/main/inventory/service/group_vars/apimon.yaml
(this will change in the future once CloudMon takes over).
It defines the HTTP query targets for every single OTC service.
The EpMon dashboard provides the general availability status of every service
defined in the service catalog:
.. image:: training_images/epmon_status_dashboard.jpg
Additionally, it provides further details for the endpoints, such as response
times, detected error codes, or missing responses.
.. image:: training_images/epmon_dashboard_details.jpg
EpMon findings are also reported to Alerta, and notifications are sent to the
dedicated Zulip topic "apimon_endpoint_monitoring".

@ -49,6 +49,9 @@ ApiMon Architecture Summary
- Alerta further sends error notification on Zulip #Alerts Stream.
- Log Files are maintained on OTC object storage via swift.
ApiMon features
---------------
ApiMon comes with the following features:
- Support of ansible playbooks for testing scenarios
@ -72,7 +75,9 @@ ApiMon comes with the following features:
- Every execution of ansible playbooks stores the log file for further
investigation/analysis on swift
What ApiMon is NOT:
What ApiMon is NOT
------------------
The following items are out of scope (while some of them are technically
possible):

@ -5,13 +5,13 @@ Logs
- Every single job run log is stored on object storage
- Each job log file has a unique URL that can be opened to see the log details
- Each job log file has a unique URL that can be opened to see the log
details
- These URLs are available on all APIMON levels:
- In Zulip alarm messages
- In Alerta events
- In Grafana Dashboards
- Logs are simple plain text files of the whole playbook output.
- Logs are simple plain text files of the whole playbook output::
2020-07-12 05:54:04.661170 | TASK [List Servers]
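
Because the logs are plain text, a line like the sample above can be split into its timestamp and message with a few lines of Python. This is a minimal sketch that assumes every line follows the same ``timestamp | message`` layout as the single sample shown here; the function name ``parse_log_line`` is made up for this example.

```python
from datetime import datetime


def parse_log_line(line):
    """Split a 'timestamp | message' log line into (datetime, message)."""
    # Assumption: the first " | " separates the timestamp from the message.
    timestamp_text, _, message = line.partition(" | ")
    timestamp = datetime.strptime(
        timestamp_text.strip(), "%Y-%m-%d %H:%M:%S.%f"
    )
    return timestamp, message.strip()
```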

@ -2,7 +2,8 @@
Monitoring coverage
===================
Multiple factors define the monitoring coverage to simulate common customer use cases.
Multiple factors define the monitoring coverage to simulate common customer use
cases.
Monitored locations

@ -2,8 +2,17 @@
Notifications
=============
You will see notifications of errors on OTC Zulip #Alerts Stream.
You will see notifications of errors on OTC Zulip:
If the error has been acknowledged on Alerta, a new notification message for the repeating error won't be posted again on Zulip.
- #Alerts Stream
- #Alerts-Hybrid Stream
- #Alerts-Preprod Stream
Every stream contains topics based on the service type (if represented by a
standalone ansible playbook) and a general apimon_endpoint_monitor topic, which
contains alerts for GET queries towards all services.
If the error has been acknowledged on Alerta, a new notification message for
the repeating error won't be posted again on Zulip.
.. image:: training_images/zulip_notifications.png
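
The acknowledgement rule described above can be illustrated with a small sketch. It is not the actual Alerta or Zulip integration code: the function names, the alert field names (``environment``, ``resource``, ``event``) and the idea of tracking acknowledged errors in a set are assumptions made for this example.

```python
def alert_key(alert):
    """Identify a repeating error (hypothetical key: environment, resource, event)."""
    return (alert["environment"], alert["resource"], alert["event"])


def should_post_to_zulip(alert, acknowledged):
    """Return True unless this repeating error was already acknowledged."""
    # `acknowledged` is a set of keys for errors acknowledged on Alerta.
    return alert_key(alert) not in acknowledged
```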

@ -1,3 +1,9 @@
==============
Test Scenarios
==============
Test scenario playbooks are located at
https://github.com/opentelekomcloud-infra/apimon-test (the location will change
with the future CloudMon replacement).
