adding additional content to apimon training

Hasko, Vladimir 2023-05-12 23:08:43 +00:00
parent 3e2bc53f7e
commit 99edfb302a
13 changed files with 104 additions and 3 deletions

@ -1,3 +1,19 @@
======
Alerts
======
https://alerts.eco.tsi-dev.otc-service.com/

The authentication is centrally managed by LDAP.

- Alerta is a monitoring tool that integrates alerts from multiple sources.
- Alerts from different sources can be consolidated and de-duplicated.
- In ApiMon, Alerta is hosted on the same instance as Grafana but listens on a different port.
- The Zulip API is integrated with Alerta to send notifications of errors and alerts to a Zulip stream.
- Alerts displayed on OTC Alerta are generated either by the Executor or by Grafana.
- “Executor alerts” focus on playbook results, that is, whether a playbook has completed or failed.
- “Grafana alerts” focus on breaches of defined thresholds, for example an API response time that is higher than the defined threshold.

.. image:: training_images/alerta_dashboard.png
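
The consolidation and de-duplication described above can be illustrated with a minimal in-memory sketch. It is not Alerta's implementation: the ``Alert`` class, the ``consolidate`` helper, the sample values, and the choice of environment, resource, event and severity as the de-duplication key are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical alert record; the field names are assumptions for this sketch.
    environment: str
    resource: str
    event: str
    severity: str
    duplicate_count: int = 0


def consolidate(alerts):
    """Merge alerts that share environment, resource, event and severity."""
    seen = {}
    for alert in alerts:
        key = (alert.environment, alert.resource, alert.event, alert.severity)
        if key in seen:
            # Same alert seen again: count it instead of storing a second copy.
            seen[key].duplicate_count += 1
        else:
            seen[key] = Alert(*key)
    return list(seen.values())


incoming = [
    Alert("production", "compute", "PlaybookFailed", "major"),     # e.g. from the Executor
    Alert("production", "compute", "PlaybookFailed", "major"),     # the same failure repeated
    Alert("production", "identity", "ResponseTimeHigh", "minor"),  # e.g. from Grafana
]
consolidated = consolidate(incoming)
```

In this sketch the two identical Executor alerts collapse into one entry with a duplicate counter, while the Grafana alert stays separate.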

@ -1,3 +1,22 @@
=====================
Dashboards management
=====================
https://dashboard.tsi-dev.otc-service.com

The authentication is centrally managed by LDAP.

- The ApiMon dashboards are segregated by type of service.
- The “OTC KPI” dashboard provides management with a high-level overview of OTC stability and reliability.
- The “Endpoint monitoring” dashboard monitors the health of every endpoint URL listed in the endpoint service catalogue.
- The “Respective service statistics” dashboards provide a more detailed overview.
- Dashboards can be replicated and customized for the needs of individual Squads.

.. image:: training_images/dashboards.png

OTC KPI Dashboard
=================

.. image:: training_images/kpi_dashboard.png

@ -21,9 +21,7 @@ The most important differences are described in the table below:
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
| Implementation mode | standalone app | plugin based |
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
| Source of information | opentelekomcloud=infra | stackmon |
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
| Form of change | Overwrite | Diff |
| Source of information | opentelekomcloud-infra | stackmon |
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
| Portal | https://dashboard.tsi-dev.otc-service.com/ | https://dashboard.tsi-dev.otc-service.com/ |
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+

@ -1,3 +1,57 @@
====
Logs
====
- The log of every job run is stored on object storage.
- Each job log file has a unique URL that can be opened to see the log details.
- These URLs are available on all ApiMon levels:

  - In Zulip alarm messages
  - In Alerta events
  - In Grafana dashboards

- Logs are plain-text files containing the whole playbook output.
2020-07-12 05:54:04.661170 | TASK [List Servers]
2020-07-12 05:54:09.050491 | localhost | ok
2020-07-12 05:54:09.067582 | TASK [Create Server in default AZ]
2020-07-12 05:54:46.055650 | localhost | MODULE FAILURE:
2020-07-12 05:54:46.055873 | localhost | Traceback (most recent call last):
2020-07-12 05:54:46.057441 | localhost |
2020-07-12 05:54:46.057499 | localhost | During handling of the above exception, another exception occurred:
2020-07-12 05:54:46.057535 | localhost |
2020-07-12 05:54:46.063992 | localhost | File "/tmp/ansible_os_server_payload_uz1c7_iw/ansible_os_server_payload.zip/ansible/modules/cloud/openstack/os_server.py", line 500, in _create_server
2020-07-12 05:54:46.065152 | localhost | return self._send_request(
2020-07-12 05:54:46.065186 | localhost | File "/root/.local/lib/python3.8/site-packages/keystoneauth1/session.py", line 1020, in _send_request
2020-07-12 05:54:46.065334 | localhost | raise exceptions.ConnectFailure(msg)
2020-07-12 05:54:46.065378 | localhost | keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://ims.eu-de.otctest.t-systems.com/v2/images: ('Connection aborted.', OSError(107, 'Transport endpoint is not connected'))
2020-07-12 05:54:46.295035 |
2020-07-12 05:54:46.295241 | TASK [Delete server]
2020-07-12 05:54:48.481374 | localhost | ok
2020-07-12 05:54:48.505761 |
2020-07-12 05:54:48.505906 | TASK [Delete SecurityGroup]
2020-07-12 05:54:50.727174 | localhost | changed
2020-07-12 05:54:50.745541 |
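
Because the logs are plain text with a ``timestamp | host | result`` layout, failed tasks can be picked out with a few lines of Python. The following is a minimal sketch, assuming the line format shown in the excerpt above and treating ``MODULE FAILURE`` as the only failure marker; the ``failed_tasks`` helper and the regular expressions are hypothetical and not part of ApiMon.

```python
import re

# Assumed line layout, taken from the excerpt: "<timestamp> | <message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \|\s?(.*)$")
TASK_RE = re.compile(r"^TASK \[(.+)\]$")


def failed_tasks(log_text):
    """Return the names of tasks whose output contains 'MODULE FAILURE'."""
    failed = []
    current_task = None
    for raw_line in log_text.splitlines():
        line_match = LINE_RE.match(raw_line.strip())
        if not line_match:
            continue  # skip anything that does not look like a log line
        message = line_match.group(2).strip()
        task_match = TASK_RE.match(message)
        if task_match:
            current_task = task_match.group(1)
        elif "MODULE FAILURE" in message and current_task:
            if current_task not in failed:
                failed.append(current_task)
    return failed


SAMPLE = """\
2020-07-12 05:54:04.661170 | TASK [List Servers]
2020-07-12 05:54:09.050491 | localhost | ok
2020-07-12 05:54:09.067582 | TASK [Create Server in default AZ]
2020-07-12 05:54:46.055650 | localhost | MODULE FAILURE:
2020-07-12 05:54:48.505906 | TASK [Delete SecurityGroup]
2020-07-12 05:54:50.727174 | localhost | changed
"""
result = failed_tasks(SAMPLE)
```

For the sample lines, which are copied from the excerpt, ``result`` contains only ``Create Server in default AZ``. Other failure formats would need additional markers.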

@ -1,3 +1,9 @@
=============
Notifications
=============
Error notifications are posted to the OTC Zulip #Alerts stream.

If an error has been acknowledged in Alerta, no new notification message is posted to Zulip when that error repeats.

.. image:: training_images/zulip_notifications.png
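
The acknowledgement rule can be expressed as a small decision function. This is only a sketch of the behaviour described above, not the actual integration code: the ``should_notify`` name, the ``"ack"`` and ``"open"`` status strings, and the assumption that unacknowledged repeats are posted again are all illustrative.

```python
def should_notify(alert_status, is_repeat):
    """Decide whether a Zulip message should be posted for an alert.

    alert_status: assumed status string, "ack" meaning acknowledged in Alerta.
    is_repeat: True if the same error has already been reported before.
    """
    if is_repeat and alert_status == "ack":
        # Acknowledged errors are not announced again when they repeat.
        return False
    return True


first_occurrence = should_notify("open", is_repeat=False)
repeat_after_ack = should_notify("ack", is_repeat=True)
```

Here ``first_occurrence`` is ``True`` and ``repeat_after_ack`` is ``False``.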

@ -3,3 +3,7 @@
ApiMon Flow Process
===================
.. image:: training_images/apimon_data_flow.svg
   :target: training_images/apimon_data_flow.svg
   :alt: apimon_data_flow