diff --git a/doc/source/internal/apimon_training/alerts.rst b/doc/source/internal/apimon_training/alerts.rst
index e2d6c1c..9a23f36 100644
--- a/doc/source/internal/apimon_training/alerts.rst
+++ b/doc/source/internal/apimon_training/alerts.rst
@@ -1,3 +1,19 @@
 ======
 Alerts
 ======
+
+https://alerts.eco.tsi-dev.otc-service.com/
+
+The authentication is centrally managed by LDAP.
+
+
+ - Alerta is a monitoring tool that integrates alerts from multiple sources.
+ - The alerts from different sources can be consolidated and de-duplicated.
+ - On ApiMon it is hosted on the same instance as Grafana, just listening on a different port.
+ - The Zulip API is integrated with Alerta to send notifications of errors/alerts to a Zulip stream.
+ - Alerts displayed on OTC Alerta are generated either by Executor or by Grafana.
+ - “Executor alerts” focus on playbook results, that is, whether a playbook has completed or failed.
+ - “Grafana alerts” focus on breaches of the defined thresholds, for example an API response time higher than the defined threshold.
+
+.. image:: training_images/alerta_dashboard.png
+
diff --git a/doc/source/internal/apimon_training/dashboards.rst b/doc/source/internal/apimon_training/dashboards.rst
index a82487c..22a4b22 100644
--- a/doc/source/internal/apimon_training/dashboards.rst
+++ b/doc/source/internal/apimon_training/dashboards.rst
@@ -1,3 +1,22 @@
 =====================
 Dashboards management
 =====================
+
+https://dashboard.tsi-dev.otc-service.com
+
+The authentication is centrally managed by LDAP.
+
+
+ - The ApiMon Dashboards are segregated based on the type of service.
+ - The “OTC KPI” dashboard provides a high-level overview of OTC stability and reliability for management.
+ - The “Endpoint monitoring” dashboard monitors the health of every endpoint URL listed in the endpoint services catalogue.
+ - The “Respective service statistics” dashboards provide a more detailed overview.
+ - Dashboards can be replicated/customized for individual Squad needs.
+
+.. image:: training_images/dashboards.png
+
+
+OTC KPI Dashboard
+=================
+
+.. image:: training_images/kpi_dashboard.png
diff --git a/doc/source/internal/apimon_training/difference_cmo_fmo.rst b/doc/source/internal/apimon_training/difference_cmo_fmo.rst
index 906250b..d812bb1 100644
--- a/doc/source/internal/apimon_training/difference_cmo_fmo.rst
+++ b/doc/source/internal/apimon_training/difference_cmo_fmo.rst
@@ -21,9 +21,7 @@ The most important differences are described in the table below:
 +-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
 | Implementation mode | standalone app | plugin based |
 +-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
-| Source of information | opentelekomcloud=infra | stackmon |
-+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
-| Form of change | Overwrite | Diff |
+| Source of information | opentelekomcloud-infra | stackmon |
 +-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
 | Portal | https://dashboard.tsi-dev.otc-service.com/ | https://dashboard.tsi-dev.otc-service.com/ |
 +-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
diff --git a/doc/source/internal/apimon_training/logs.rst b/doc/source/internal/apimon_training/logs.rst
index fe8e86b..1f80919 100644
--- a/doc/source/internal/apimon_training/logs.rst
+++ b/doc/source/internal/apimon_training/logs.rst
@@ -1,3 +1,57 @@
 ====
 Logs
 ====
+
+
+
+ - The log of every job run is stored in object storage.
+ - Each job log file has a unique URL which can be opened to see the log details.
+ - These URLs are available on all APIMON levels:
+   - In Zulip alarm messages
+   - In Alerta events
+   - In Grafana Dashboards
+ - Logs are plain text files containing the whole playbook output.
+
+
+ 2020-07-12 05:54:04.661170 | TASK [List Servers]
+
+ 2020-07-12 05:54:09.050491 | localhost | ok
+
+ 2020-07-12 05:54:09.067582 | TASK [Create Server in default AZ]
+
+ 2020-07-12 05:54:46.055650 | localhost | MODULE FAILURE:
+
+ 2020-07-12 05:54:46.055873 | localhost | Traceback (most recent call last):
+
+ 2020-07-12 05:54:46.057441 | localhost |
+
+ 2020-07-12 05:54:46.057499 | localhost | During handling of the above exception, another exception occurred:
+
+ 2020-07-12 05:54:46.057535 | localhost |
+
+ …
+
+ 2020-07-12 05:54:46.063992 | localhost | File "/tmp/ansible_os_server_payload_uz1c7_iw/ansible_os_server_payload.zip/ansible/modules/cloud/openstack/os_server.py", line 500, in _create_server
+
+ 2020-07-12 05:54:46.065152 | localhost | return self._send_request(
+
+ 2020-07-12 05:54:46.065186 | localhost | File "/root/.local/lib/python3.8/site-packages/keystoneauth1/session.py", line 1020, in _send_request
+
+ 2020-07-12 05:54:46.065334 | localhost | raise exceptions.ConnectFailure(msg)
+
+ 2020-07-12 05:54:46.065378 | localhost | keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://ims.eu-de.otctest.t-systems.com/v2/images: ('Connection aborted.', OSError(107, 'Transport endpoint is not connected'))
+
+ 2020-07-12 05:54:46.295035 |
+
+ 2020-07-12 05:54:46.295241 | TASK [Delete server]
+
+ 2020-07-12 05:54:48.481374 | localhost | ok
+
+ 2020-07-12 05:54:48.505761 |
+
+ 2020-07-12 05:54:48.505906 | TASK [Delete SecurityGroup]
+
+ 2020-07-12 05:54:50.727174 | localhost | changed
+
+ 2020-07-12 05:54:50.745541 |
+
diff --git a/doc/source/internal/apimon_training/notifications.rst b/doc/source/internal/apimon_training/notifications.rst
index 822d672..e60dd08 100644
--- a/doc/source/internal/apimon_training/notifications.rst
+++ b/doc/source/internal/apimon_training/notifications.rst
@@ -1,3 +1,9 @@
 =============
 Notifications
 =============
+
+Notifications of errors appear on the OTC Zulip #Alerts stream.
+
+If the error has been acknowledged on Alerta, new notification messages for the repeating error are not posted again on Zulip.
+
+.. image:: training_images/zulip_notifications.png
diff --git a/doc/source/internal/apimon_training/training_images/alerta_alerts.png b/doc/source/internal/apimon_training/training_images/alerta_alerts.png
new file mode 100644
index 0000000..31516de
Binary files /dev/null and b/doc/source/internal/apimon_training/training_images/alerta_alerts.png differ
diff --git a/doc/source/internal/apimon_training/training_images/alerta_alerts_detail.png b/doc/source/internal/apimon_training/training_images/alerta_alerts_detail.png
new file mode 100644
index 0000000..3f40ac2
Binary files /dev/null and b/doc/source/internal/apimon_training/training_images/alerta_alerts_detail.png differ
diff --git a/doc/source/internal/apimon_training/training_images/alerta_dashboard.png b/doc/source/internal/apimon_training/training_images/alerta_dashboard.png
new file mode 100644
index 0000000..c255812
Binary files /dev/null and b/doc/source/internal/apimon_training/training_images/alerta_dashboard.png differ
diff --git a/doc/source/internal/apimon_training/training_images/apimon_data_flow.svg b/doc/source/internal/apimon_training/training_images/apimon_data_flow.svg
new file mode 100644
index 0000000..e0afe35
--- /dev/null
+++ b/doc/source/internal/apimon_training/training_images/apimon_data_flow.svg
@@ -0,0 +1,4 @@
+
+
+
+

[apimon_data_flow.svg: SVG markup not recoverable; only the diagram text labels survive. Nodes: Scheduler (running 8 parallel threads); Executor (Ansible); Statsd (collects the metrics); Graphite TSDB; Postgresql RDB; Swift; Github (apimon tests repository); Grafana (Dashboard); Alerta (Dashboard); Zulip (running 6 parallel streams); Service Squad; O/M; Management. Edge labels: Pull repository; Fill in playbooks to the queue of threads; Execute ansible playbooks; Remove completed playbook from the thread; Add next playbook to the queue when thread is free; Endless loop; Send metrics to graphite; Store the job logs to object storage; If playbook/thread failed raise alert; Create Alerts based on Thresholds; Send notifications to Zulip; Visualize data; Test results; Metrics.]
\ No newline at end of file
diff --git a/doc/source/internal/apimon_training/training_images/dashboards.png b/doc/source/internal/apimon_training/training_images/dashboards.png
new file mode 100644
index 0000000..3237d0a
Binary files /dev/null and b/doc/source/internal/apimon_training/training_images/dashboards.png differ
diff --git a/doc/source/internal/apimon_training/training_images/kpi_dashboard.png b/doc/source/internal/apimon_training/training_images/kpi_dashboard.png
new file mode 100644
index 0000000..e179a98
Binary files /dev/null and b/doc/source/internal/apimon_training/training_images/kpi_dashboard.png differ
diff --git a/doc/source/internal/apimon_training/training_images/zulip_notifications.png b/doc/source/internal/apimon_training/training_images/zulip_notifications.png
new file mode 100644
index 0000000..024f644
Binary files /dev/null and b/doc/source/internal/apimon_training/training_images/zulip_notifications.png differ
diff --git a/doc/source/internal/apimon_training/workflow.rst b/doc/source/internal/apimon_training/workflow.rst
index 3700922..1078a6c 100644
--- a/doc/source/internal/apimon_training/workflow.rst
+++ b/doc/source/internal/apimon_training/workflow.rst
@@ -3,3 +3,7 @@ ApiMon Flow Process
 ===================
+
+.. image:: training_images/apimon_data_flow.svg
+   :target: training_images/apimon_data_flow.svg
+   :alt: apimon_data_flow