diff --git a/doc/source/internal/apimon_training/alerts.rst b/doc/source/internal/apimon_training/alerts.rst
index 9a23f36..13dfc58 100644
--- a/doc/source/internal/apimon_training/alerts.rst
+++ b/doc/source/internal/apimon_training/alerts.rst
@@ -9,11 +9,16 @@ The authentication is centrally managed by LDAP.

  - Alerta is a monitoring tool to integrate alerts from multiple sources.
  - The alerts from different sources can be consolidated and de-duplicated.
- - On ApiMon it is hosted on same instance as Grafana just listening on different port.
- - The Zulip API was integrated with Alerta, to send notification of errors/alerts on zulip stream.
- - Alerts displayed on OTC Alerta are generated either by Executor or by Grafana.
- - “Executor alerts” focus on playbook results, whether playbook has completed or failed.
- - “Grafana alerts” focus on breaching the defined thresholds. For example API response time is higher than defined threshold.
+ - On ApiMon it is hosted on the same instance as Grafana, just listening on a
+   different port.
+ - The Zulip API was integrated with Alerta to send notifications of
+   errors/alerts to a Zulip stream.
+ - Alerts displayed on OTC Alerta are generated either by Executor or by
+   Grafana.
+ - “Executor alerts” focus on playbook results: whether a playbook has
+   completed or failed.
+ - “Grafana alerts” focus on breaches of the defined thresholds, for example
+   when the API response time is higher than the defined threshold.

 .. image:: training_images/alerta_dashboard.png

diff --git a/doc/source/internal/apimon_training/dashboards.rst b/doc/source/internal/apimon_training/dashboards.rst
index 22a4b22..bee7801 100644
--- a/doc/source/internal/apimon_training/dashboards.rst
+++ b/doc/source/internal/apimon_training/dashboards.rst
@@ -8,8 +8,10 @@ The authentication is centrally managed by LDAP.


  - The ApiMon Dashboards are segregated based on the type of service.
- - The “OTC KPI” dashboard provides high level overview about OTC stability and reliability for management.
- - “Endpoint monitoring” dashboard monitors health of every endpoint url listed by endpoint services catalogue.
+ - The “OTC KPI” dashboard provides a high-level overview of OTC stability and
+   reliability for management.
+ - The “Endpoint monitoring” dashboard monitors the health of every endpoint
+   URL listed in the endpoint services catalogue.
  - “Respective service statistics” dashboards provide more detailed overview.
  - Dashboards can be replicated/customized for individual Squad needs.

@@ -20,3 +22,17 @@ OTC KPI Dashboard
 =================

 .. image:: training_images/kpi_dashboard.png
+
+24/7 Dashboards
+===============
+
+Endpoint Monitoring Dashboard
+=============================
+
+Common Test Results Dashboard
+=============================
+
+Service Based Dashboard
+=======================
+
+
diff --git a/doc/source/internal/apimon_training/difference_cmo_fmo.rst b/doc/source/internal/apimon_training/difference_cmo_fmo.rst
index d812bb1..bfa691e 100644
--- a/doc/source/internal/apimon_training/difference_cmo_fmo.rst
+++ b/doc/source/internal/apimon_training/difference_cmo_fmo.rst
@@ -10,21 +10,22 @@ understand what is supported in which mode.

 The most important differences are described in the table below:

-+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
-| **Differences**       | **ApiMon (CMO)**                                                                                           | **ApiMon(FMO)**                                               |
-+=======================+============================================================================================================+===============================================================+
-| Playbook scenarios    | https://github.com/opentelekomcloud-infra/apimon-test                                                      | https://github.com/stackmon/apimon-tests/tree/main/playbooks  |
-+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
-| Dashboards            | https://github.com/opentelekomcloud-infra/system-config/tree/main/playbooks/templates/grafana/apimon       | https://github.com/stackmon/apimon-tests/tree/main/dashboards |
-+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
-| Environment setup     | https://github.com/opentelekomcloud-infra/system-config/blob/main/inventory/service/group_vars/apimon.yaml | https://github.com/opentelekomcloud-infra/stackmon-config     |
-+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
-| Implementation mode   | standalone app                                                                                             | plugin based                                                  |
-+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
-| Source of information | opentelekomcloud-infra                                                                                     | stackmon                                                      |
-+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
-| Portal                | https://dashboard.tsi-dev.otc-service.com/                                                                 | https://dashboard.tsi-dev.otc-service.com/                    |
-+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
-| Documentation         | https://confluence.tsi-dev.otc-service.com/display/ES/API-Monitoring                                       | https://stackmon.github.io/                                   |
-|                       |                                                                                                            | https://stackmon-cloudmon.readthedocs.io/en/latest/index.html |
-+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
++-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| **Differences**       | **ApiMon (CMO)**                                                                                           | **ApiMon(FMO)**                                                          |
++=======================+============================================================================================================+==========================================================================+
+| Playbook scenarios    | https://github.com/opentelekomcloud-infra/apimon-test                                                      | https://github.com/stackmon/apimon-tests/tree/main/playbooks             |
++-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| Dashboards setup      | https://github.com/opentelekomcloud-infra/system-config/tree/main/playbooks/templates/grafana/apimon       | https://github.com/stackmon/apimon-tests/tree/main/dashboards            |
++-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| Environment setup     | https://github.com/opentelekomcloud-infra/system-config/blob/main/inventory/service/group_vars/apimon.yaml | https://github.com/opentelekomcloud-infra/stackmon-config                |
++-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| Implementation mode   | standalone app                                                                                             | plugin based                                                             |
++-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| Source of information | opentelekomcloud-infra                                                                                     | stackmon                                                                 |
++-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| Dashboards            | https://dashboard.tsi-dev.otc-service.com/                                                                 | https://dashboard.tsi-dev.otc-service.com/                               |
+|                       | https://dashboard.tsi-dev.otc-service.com/dashboards/f/UaB8meoZk/apimon                                    | https://dashboard.tsi-dev.otc-service.com/dashboards/f/CloudMon/cloudmon |
++-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
+| Documentation         | https://confluence.tsi-dev.otc-service.com/display/ES/API-Monitoring                                       | https://stackmon.github.io/                                              |
+|                       |                                                                                                            | https://stackmon-cloudmon.readthedocs.io/en/latest/index.html            |
++-----------------------+------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------+
diff --git a/doc/source/internal/apimon_training/epmon_checks.rst b/doc/source/internal/apimon_training/epmon_checks.rst
index 0f80923..96468c3 100644
--- a/doc/source/internal/apimon_training/epmon_checks.rst
+++ b/doc/source/internal/apimon_training/epmon_checks.rst
@@ -1,3 +1,38 @@
 ============================
 Endpoint Monitoring overview
 ============================
+
+
+EpMon is a standalone Python-based process targeting every OTC service. It
+finds services in the service catalog and sends GET requests to the configured
+endpoints.
+
+Performing extensive tests such as provisioning a server gives great coverage,
+but usually cannot be done very often, which leaves certain gaps on the
+timescale of monitoring. To cover these gaps, the EpMon component can send GET
+requests to the given URLs, relying on the API discovery of the OpenStack cloud
+(perform a GET request to /servers or the compute endpoint). Such requests are
+cheap and can be performed in a loop, e.g. every 5 seconds. The latency of
+those calls and the return codes are captured and sent to the metrics
+storage.
+
+
+
+Currently the EpMon configuration is located in system-config:
+https://github.com/opentelekomcloud-infra/system-config/blob/main/inventory/service/group_vars/apimon.yaml
+(this will change in the future once CloudMon is in place)
+
+It defines the HTTP query targets for every single OTC service.
+
+The EpMon dashboard provides the general availability status of every service
+defined in the service catalog:
+
+.. image:: training_images/epmon_status_dashboard.jpg
+
+Additionally, it provides further details for the endpoints, such as response
+times, detected error codes, or missing responses.
+
+.. image:: training_images/epmon_dashboard_details.jpg
+
+EpMon findings are also reported to Alerta, and notifications are sent to the
+dedicated Zulip topic "apimon_endpoint_monitoring".
\ No newline at end of file
diff --git a/doc/source/internal/apimon_training/introduction.rst b/doc/source/internal/apimon_training/introduction.rst
index 722e608..d570424 100644
--- a/doc/source/internal/apimon_training/introduction.rst
+++ b/doc/source/internal/apimon_training/introduction.rst
@@ -49,6 +49,9 @@ ApiMon Architecture Summary
 - Alerta further sends error notification on Zulip #Alerts Stream.
 - Log Files are maintained on OTC object storage via swift.

+ApiMon features
+---------------
+
 ApiMon comes with the following features:

 - Support of ansible playbooks for testing scenarios
@@ -72,7 +75,9 @@ ApiMon comes with the following features:
 - Every exectution of ansible playbooks stores the log file for further
   investigation/analysis on swift

-What ApiMon is NOT:
+
+What ApiMon is NOT
+------------------

 The following items are out of scope (while some of them are technically
 possible):
diff --git a/doc/source/internal/apimon_training/logs.rst b/doc/source/internal/apimon_training/logs.rst
index 1f80919..ccb98bc 100644
--- a/doc/source/internal/apimon_training/logs.rst
+++ b/doc/source/internal/apimon_training/logs.rst
@@ -5,13 +5,13 @@ Logs


  - Every single job run log is stored on object storage
- - Each single job log file provides unique URL which can be accessed to see log details
+ - Each job log file has a unique URL that can be opened to see the log
+   details
  - These URLs are available on all APIMON levels:
    - In Zulip alarm messages
    - In Alerta events
    - In Grafana Dashboards
- - Logs are simple plain text files of the whole playbook output.
-
+ - Logs are simple plain text files of the whole playbook output::

   2020-07-12 05:54:04.661170 | TASK [List Servers]

diff --git a/doc/source/internal/apimon_training/monitoring_coverage.rst b/doc/source/internal/apimon_training/monitoring_coverage.rst
index b6194fd..89d0848 100644
--- a/doc/source/internal/apimon_training/monitoring_coverage.rst
+++ b/doc/source/internal/apimon_training/monitoring_coverage.rst
@@ -2,7 +2,8 @@
 Monitoring coverage
 ===================

-Multiple factors define the monitoring coverage to simulate common customer use cases.
+Multiple factors define the monitoring coverage to simulate common customer use
+cases.


 Monitored locations
diff --git a/doc/source/internal/apimon_training/notifications.rst b/doc/source/internal/apimon_training/notifications.rst
index e60dd08..052537a 100644
--- a/doc/source/internal/apimon_training/notifications.rst
+++ b/doc/source/internal/apimon_training/notifications.rst
@@ -2,8 +2,17 @@
 Notifications
 =============

-You will see notifications of errors on OTC Zulip #Alerts Stream.
+You will see notifications of errors on OTC Zulip:

-If the error has been acknowledged on Alerta, the new notification message for repeating error wont get posted again on Zulip.
+ - #Alerts Stream
+ - #Alerts-Hybrid Stream
+ - #Alerts-Preprod Stream
+
+Every stream contains topics based on the service type (if represented by a
+standalone Ansible playbook) and a general apimon_endpoint_monitor topic,
+which contains alerts for GET queries towards all services.
+
+If an error has been acknowledged on Alerta, new notification messages for the
+repeating error won't be posted again on Zulip.

 .. image:: training_images/zulip_notifications.png
diff --git a/doc/source/internal/apimon_training/test_scenarios.rst b/doc/source/internal/apimon_training/test_scenarios.rst
index deb9929..aafde84 100644
--- a/doc/source/internal/apimon_training/test_scenarios.rst
+++ b/doc/source/internal/apimon_training/test_scenarios.rst
@@ -1,3 +1,9 @@
 ==============
 Test Scenarios
 ==============
+
+
+Test scenario playbooks are located at
+https://github.com/opentelekomcloud-infra/apimon-test (the location will change
+with the CloudMon replacement in the future).
+
diff --git a/doc/source/internal/apimon_training/training_images/apimon_data_flow.svg b/doc/source/internal/apimon_training/training_images/apimon_data_flow.svg
index e0afe35..b1b7b4e 100644
--- a/doc/source/internal/apimon_training/training_images/apimon_data_flow.svg
+++ b/doc/source/internal/apimon_training/training_images/apimon_data_flow.svg
@@ -1,4 +1,4 @@
-

[apimon_data_flow.svg, removed version: SVG markup not reproduced. Diagram nodes: Scheduler (running 8 parallel threads), Executor (Ansible), Statsd (collects the metrics), Graphite TSDB, Postgresql RDB, Grafana (Dashboard), Alerta (Dashboard), Zulip (running 6 parallel streams), Swift, Github (apimon tests repository), Service Squad, O/M, Management. Flow labels: Pull repository; Fill in playbooks to the queue of threads; Add next playbook to the queue when thread is free; Execute ansible playbooks; Remove completed playbook from the thread; Endless loop; Send metrics to graphite; Metrics; Test results; Store the job logs to object storage; If playbook/thread failed raise alert; Visualize data; Create Alerts based on Thresholds; Send notifications to Zulip.]
\ No newline at end of file
+

[apimon_data_flow.svg, added version: SVG markup not reproduced. The extracted text labels are identical to those of the removed version: Scheduler, Executor, Statsd, Graphite TSDB, Postgresql RDB, Grafana, Alerta, Zulip, Swift, Github, Service Squad, O/M, Management, with the same flow labels.]
\ No newline at end of file
diff --git a/doc/source/internal/apimon_training/training_images/epmon_dashboard_details.jpg b/doc/source/internal/apimon_training/training_images/epmon_dashboard_details.jpg
new file mode 100644
index 0000000..9b61729
Binary files /dev/null and b/doc/source/internal/apimon_training/training_images/epmon_dashboard_details.jpg differ
diff --git a/doc/source/internal/apimon_training/training_images/epmon_status_dashboard.jpg b/doc/source/internal/apimon_training/training_images/epmon_status_dashboard.jpg
new file mode 100644
index 0000000..414b40a
Binary files /dev/null and b/doc/source/internal/apimon_training/training_images/epmon_status_dashboard.jpg differ