updating EpMon and Test Cases

Commit 8537442f8f (parent fc2f6fd929)
@@ -9,11 +9,16 @@ The authentication is centrally managed by LDAP.

- Alerta is a monitoring tool to integrate alerts from multiple sources.
- The alerts from different sources can be consolidated and de-duplicated.
- On ApiMon, Alerta is hosted on the same instance as Grafana, just listening
  on a different port.
- The Zulip API is integrated with Alerta to send notifications of
  errors/alerts to a Zulip stream.
- Alerts displayed on OTC Alerta are generated either by Executor or by
  Grafana.
- “Executor alerts” focus on playbook results, i.e. whether a playbook has
  completed or failed.
- “Grafana alerts” focus on breaches of the defined thresholds, for example an
  API response time higher than the defined threshold.

.. image:: training_images/alerta_dashboard.png
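
The alert handling described above can be illustrated with a short Python
sketch. It is loosely modelled on Alerta's documented de-duplication rule: an
incoming alert with the same environment, resource, event and severity as an
existing one is treated as a duplicate and only increments a counter. The
``Alert`` fields, event names and severities, as well as ``executor_alert``,
``grafana_alert`` and ``AlertStore``, are hypothetical and are not taken from
the ApiMon or Alerta code:

.. code-block:: python

   from dataclasses import dataclass
   from typing import Dict, Optional, Tuple

   AlertKey = Tuple[str, str, str, str]


   @dataclass
   class Alert:
       environment: str
       resource: str
       event: str
       severity: str
       origin: str               # "executor" or "grafana" (hypothetical labels)
       duplicate_count: int = 0


   def executor_alert(environment: str, playbook: str,
                      failed: bool) -> Optional[Alert]:
       """Executor alert: raised only when a playbook run did not complete."""
       if not failed:
           return None
       return Alert(environment, playbook, "PlaybookFailed", "major", "executor")


   def grafana_alert(environment: str, resource: str, metric: str,
                     value: float, threshold: float) -> Optional[Alert]:
       """Grafana alert: raised only when a metric is above its threshold."""
       if value <= threshold:
           return None
       return Alert(environment, resource, f"{metric}High", "minor", "grafana")


   class AlertStore:
       """Keeps one entry per (environment, resource, event, severity)."""

       def __init__(self) -> None:
           self._alerts: Dict[AlertKey, Alert] = {}

       def receive(self, alert: Alert) -> Alert:
           key = (alert.environment, alert.resource, alert.event, alert.severity)
           existing = self._alerts.get(key)
           if existing is not None:
               # duplicate: bump the counter instead of storing a new alert
               existing.duplicate_count += 1
               return existing
           self._alerts[key] = alert
           return alert

       def __len__(self) -> int:
           return len(self._alerts)

Because the severity is part of the key, an alert that escalates from
``minor`` to ``major`` is stored as a new entry in this sketch; Alerta itself
correlates such a change with the existing alert, which is not modelled here.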
@@ -8,8 +8,10 @@ The authentication is centrally managed by LDAP.

- The ApiMon dashboards are segregated based on the type of service.
- The “OTC KPI” dashboard provides a high-level overview of OTC stability and
  reliability for management.
- The “Endpoint monitoring” dashboard monitors the health of every endpoint URL
  listed in the endpoint services catalogue.
- “Respective service statistics” dashboards provide a more detailed overview.
- Dashboards can be replicated/customized for individual Squad needs.
@@ -20,3 +22,17 @@ OTC KPI Dashboard
=================

.. image:: training_images/kpi_dashboard.png

24/7 Dashboards
===============

Endpoint Monitoring Dashboard
=============================

Common Test Results Dashboard
=============================

Service Based Dashboard
=======================
@@ -10,21 +10,22 @@ understand what is supported in which mode.

The most important differences are described in the table below:

+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| **Differences**       | **ApiMon (CMO)**                                                                                           | **ApiMon (FMO)**                                                         |
+=======================+============================================================================================================+==========================================================================+
| Playbook scenarios    | https://github.com/opentelekomcloud-infra/apimon-test                                                      | https://github.com/stackmon/apimon-tests/tree/main/playbooks             |
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| Dashboards setup      | https://github.com/opentelekomcloud-infra/system-config/tree/main/playbooks/templates/grafana/apimon       | https://github.com/stackmon/apimon-tests/tree/main/dashboards            |
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| Environment setup     | https://github.com/opentelekomcloud-infra/system-config/blob/main/inventory/service/group_vars/apimon.yaml | https://github.com/opentelekomcloud-infra/stackmon-config                |
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| Implementation mode   | standalone app                                                                                             | plugin based                                                             |
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| Source of information | opentelekomcloud-infra                                                                                     | stackmon                                                                 |
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| Dashboards            | https://dashboard.tsi-dev.otc-service.com/                                                                 | https://dashboard.tsi-dev.otc-service.com/                               |
|                       | https://dashboard.tsi-dev.otc-service.com/dashboards/f/UaB8meoZk/apimon                                    | https://dashboard.tsi-dev.otc-service.com/dashboards/f/CloudMon/cloudmon |
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
| Documentation         | https://confluence.tsi-dev.otc-service.com/display/ES/API-Monitoring                                       | https://stackmon.github.io/                                              |
|                       |                                                                                                            | https://stackmon-cloudmon.readthedocs.io/en/latest/index.html            |
+-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
@@ -1,3 +1,38 @@

============================
Endpoint Monitoring overview
============================

EpMon is a standalone Python-based process targeting every OTC service. It
finds the services in the service catalog and sends GET requests to the
configured endpoints.
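
A minimal sketch of the discovery step, assuming the catalog has the shape of
a Keystone v3 token catalog (a list of services with ``type`` and
``endpoints``, each endpoint carrying ``interface``, ``region`` and ``url``).
The ``query_paths`` mapping is a hypothetical stand-in for the per-service
targets in the EpMon configuration, whose actual schema is not shown here, and
the function names are illustrative only:

.. code-block:: python

   from typing import Dict, List, Mapping, Optional, Sequence


   def discover_endpoints(catalog: Sequence[Mapping], interface: str = "public",
                          region: Optional[str] = None) -> Dict[str, str]:
       """Map each service type in the catalog to one endpoint URL."""
       found: Dict[str, str] = {}
       for service in catalog:
           for endpoint in service.get("endpoints", []):
               if endpoint.get("interface") != interface:
                   continue
               if region is not None and endpoint.get("region") != region:
                   continue
               # keep the first matching endpoint per service type
               found.setdefault(service["type"], endpoint["url"].rstrip("/"))
       return found


   def build_targets(endpoints: Mapping[str, str],
                     query_paths: Mapping[str, Sequence[str]]) -> List[str]:
       """Join every configured query path with its service's endpoint URL."""
       targets: List[str] = []
       for service_type, paths in query_paths.items():
           base = endpoints.get(service_type)
           if base is None:
               continue  # service type not present in this catalog
           targets.extend(f"{base}/{path.lstrip('/')}" for path in paths)
       return targets

For example, a compute endpoint ``https://ecs.example.com/v2.1`` combined with
a configured path ``/servers`` yields the target
``https://ecs.example.com/v2.1/servers``.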

Performing extensive tests like provisioning a server gives great coverage,
but such tests usually cannot be performed very often, which leaves certain
gaps on the monitoring timescale. To cover these gaps, the EpMon component can
send GET requests to the given URLs, relying on the API discovery of the
OpenStack cloud (for example, a GET request to /servers of the compute
endpoint). Such requests are cheap and can be performed in a loop, e.g. every
5 seconds. The latency of these calls, as well as the return codes, are
captured and sent to the metrics storage.
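
The probe loop can be sketched with the Python standard library alone. This is
a minimal illustration, not EpMon's implementation: ``probe``, ``run_loop``
and ``ProbeResult`` are hypothetical names, and the ``emit`` callback stands
in for whatever client forwards the measurements to the metrics storage. An
HTTP error status is recorded as a return code, while a connection failure or
timeout is recorded as "no response" (``None``):

.. code-block:: python

   import time
   import urllib.error
   import urllib.request
   from dataclasses import dataclass
   from typing import Callable, Iterable, Optional


   @dataclass
   class ProbeResult:
       url: str
       status: Optional[int]  # HTTP status code, or None if nothing answered
       latency_ms: float      # wall-clock duration of the request


   def probe(url: str, timeout: float = 5.0) -> ProbeResult:
       """Send a single GET request and record its status code and latency."""
       start = time.perf_counter()
       status: Optional[int]
       try:
           with urllib.request.urlopen(url, timeout=timeout) as response:
               status = response.status
       except urllib.error.HTTPError as err:
           # 4xx/5xx answers are still answers: keep the return code
           status = err.code
           err.close()
       except OSError:
           # refused connection, DNS failure or timeout (URLError is an OSError)
           status = None
       latency_ms = (time.perf_counter() - start) * 1000.0
       return ProbeResult(url=url, status=status, latency_ms=latency_ms)


   def run_loop(urls: Iterable[str], emit: Callable[[ProbeResult], None],
                interval: float = 5.0, rounds: Optional[int] = None) -> None:
       """Probe every URL once per interval; run forever unless rounds is set."""
       targets = list(urls)
       completed = 0
       while rounds is None or completed < rounds:
           started = time.monotonic()
           for url in targets:
               emit(probe(url))
           completed += 1
           # wait only for the remainder of the interval
           elapsed = time.monotonic() - started
           time.sleep(max(0.0, interval - elapsed))

Subtracting the time spent probing from the interval keeps the rounds roughly
5 seconds apart even when some endpoints answer slowly; with many endpoints, a
real deployment would probably probe them concurrently instead of one by one.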

Currently, the EpMon configuration is located in system-config:
https://github.com/opentelekomcloud-infra/system-config/blob/main/inventory/service/group_vars/apimon.yaml
(this will change in the future once CloudMon takes over). It defines the
HTTP query targets for every single OTC service.

The EpMon dashboard provides the general availability status of every service
definition from the service catalog:

.. image:: training_images/epmon_status_dashboard.jpg

Additionally, it provides further details for the endpoints, such as response
times, detected error codes, or no responses at all.

.. image:: training_images/epmon_dashboard_details.jpg
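
The availability and error-code view can be approximated by aggregating raw
probe samples per service. The sketch below assumes that a response with a
status code below 500 counts as "available" (an unauthenticated GET answered
with ``401`` still shows that the endpoint is reachable) and that ``None``
means no response; this rule and the ``summarize`` helper are illustrative
assumptions, not the queries behind the actual dashboard:

.. code-block:: python

   from collections import defaultdict
   from statistics import median
   from typing import Dict, Iterable, List, Optional, Tuple

   # (service type, HTTP status or None for no response, latency in ms)
   Sample = Tuple[str, Optional[int], float]


   def summarize(samples: Iterable[Sample]) -> Dict[str, dict]:
       """Aggregate probe samples into per-service availability figures."""
       grouped: Dict[str, List[Tuple[Optional[int], float]]] = defaultdict(list)
       for service, status, latency_ms in samples:
           grouped[service].append((status, latency_ms))

       summary: Dict[str, dict] = {}
       for service, rows in grouped.items():
           # assumption: any answer below 500 means the endpoint is available
           ok = [lat for status, lat in rows if status is not None and status < 500]
           summary[service] = {
               "availability": 100.0 * len(ok) / len(rows),
               "median_latency_ms": median(ok) if ok else None,
               "error_codes": sorted({s for s, _ in rows if s is not None and s >= 400}),
               "no_response": sum(1 for s, _ in rows if s is None),
           }
       return summary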

EpMon findings are also reported to Alerta, and notifications are sent to the
dedicated Zulip topic "apimon_endpoint_monitoring".
@@ -49,6 +49,9 @@ ApiMon Architecture Summary

- Alerta further sends error notifications to the Zulip #Alerts stream.
- Log files are maintained on OTC object storage via Swift.

ApiMon features
---------------

ApiMon comes with the following features:

- Support for Ansible playbooks for testing scenarios
@@ -72,7 +75,9 @@ ApiMon comes with the following features:

- Every execution of an Ansible playbook stores its log file on Swift for
  further investigation/analysis

What ApiMon is NOT
------------------

The following items are out of scope (while some of them are technically
possible):
@@ -5,13 +5,13 @@ Logs

- Every single job run log is stored on object storage
- Each job log file has a unique URL that can be accessed to see the log
  details
- These URLs are available on all ApiMon levels:

  - In Zulip alarm messages
  - In Alerta events
  - In Grafana dashboards

- Logs are simple plain text files of the whole playbook output::

    2020-07-12 05:54:04.661170 | TASK [List Servers]
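
The example log line suggests a ``<timestamp> | <message>`` layout. Assuming
that layout holds for every line (inferred from this single example only),
task start times can be extracted for analysis with a few lines of Python;
``parse_line``, ``task_starts`` and ``LogLine`` are hypothetical helpers:

.. code-block:: python

   from datetime import datetime
   from typing import Iterable, Iterator, NamedTuple, Optional, Tuple

   TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
   TASK_PREFIX = "TASK ["


   class LogLine(NamedTuple):
       timestamp: datetime
       message: str


   def parse_line(line: str) -> Optional[LogLine]:
       """Split '<timestamp> | <message>'; return None for any other line."""
       stamp, sep, message = line.partition(" | ")
       if not sep:
           return None
       try:
           ts = datetime.strptime(stamp.strip(), TIMESTAMP_FORMAT)
       except ValueError:
           return None
       return LogLine(ts, message.rstrip("\n"))


   def task_starts(lines: Iterable[str]) -> Iterator[Tuple[datetime, str]]:
       """Yield (timestamp, task name) for every 'TASK [...]' line."""
       for raw in lines:
           parsed = parse_line(raw)
           if parsed is None or not parsed.message.startswith(TASK_PREFIX):
               continue
           end = parsed.message.rfind("]")
           if end > len(TASK_PREFIX):
               yield parsed.timestamp, parsed.message[len(TASK_PREFIX):end]

The difference between consecutive task start times gives a rough per-task
duration when investigating a slow or failed run.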
|
@@ -2,7 +2,8 @@

Monitoring coverage
===================

Multiple factors define the monitoring coverage to simulate common customer
use cases.

Monitored locations
@@ -2,8 +2,17 @@

Notifications
=============

You will see notifications of errors in the following OTC Zulip streams:

- #Alerts Stream
- #Alerts-Hybrid Stream
- #Alerts-Preprod Stream

Every stream contains topics based on the service type (if represented by a
standalone Ansible playbook) and a general apimon_endpoint_monitor topic,
which contains alerts of GET queries towards all services.

If an error has been acknowledged on Alerta, new notification messages for the
repeating error won't be posted again on Zulip.

.. image:: training_images/zulip_notifications.png
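
The routing and acknowledgement behaviour described above can be sketched as
follows. The environment labels used to pick a stream, the ``Alert`` fields
and the ``Notifier`` class are hypothetical; only the stream names and the
``apimon_endpoint_monitor`` topic come from this section. Alerta itself tracks
acknowledgement as an alert status with more rules than a simple set lookup,
and those rules are not modelled here:

.. code-block:: python

   from dataclasses import dataclass
   from typing import Dict, Optional, Set, Tuple

   # Hypothetical environment labels mapped to the streams listed above
   STREAMS: Dict[str, str] = {
       "production": "Alerts",
       "hybrid": "Alerts-Hybrid",
       "preprod": "Alerts-Preprod",
   }
   # Topic name as given in this section
   EPMON_TOPIC = "apimon_endpoint_monitor"


   @dataclass(frozen=True)
   class Alert:
       environment: str  # selects the stream
       service: str      # service type, used as topic for playbook alerts
       source: str       # "playbook" or "epmon"
       event: str


   def route(alert: Alert) -> Tuple[str, str]:
       """Return (stream, topic) for an alert."""
       stream = STREAMS[alert.environment]
       topic = EPMON_TOPIC if alert.source == "epmon" else alert.service
       return stream, topic


   class Notifier:
       """Suppresses repeated notifications for acknowledged alerts."""

       def __init__(self) -> None:
           self._acked: Set[Alert] = set()

       def acknowledge(self, alert: Alert) -> None:
           self._acked.add(alert)

       def notify(self, alert: Alert) -> Optional[Tuple[str, str, str]]:
           """Return (stream, topic, text) to post, or None if suppressed."""
           if alert in self._acked:
               return None
           stream, topic = route(alert)
           return stream, topic, f"{alert.service}: {alert.event}"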
@@ -1,3 +1,9 @@

==============
Test Scenarios
==============

Test scenario playbooks are located at
https://github.com/opentelekomcloud-infra/apimon-test (the location will change
with the CloudMon replacement in the future).