adding additional content to apimon training
@ -1,3 +1,19 @@
|
|||||||
======
|
======
|
||||||
Alerts
|
Alerts
|
||||||
======
|
======
|
||||||
|
|
||||||
|
https://alerts.eco.tsi-dev.otc-service.com/
|
||||||
|
|
||||||
|
The authentication is centrally managed by LDAP.
|
||||||
|
|
||||||
|
|
||||||
|
- Alerta is a monitoring tool to integrate alerts from multiple sources.
|
||||||
|
- The alerts from different sources can be consolidated and de-duplicated.
|
||||||
|
 - On ApiMon, it is hosted on the same instance as Grafana but listens on a different port.
|
||||||
|
 - The Zulip API is integrated with Alerta to send notifications of errors and alerts to a Zulip stream.
|
||||||
|
- Alerts displayed on OTC Alerta are generated either by Executor or by Grafana.
|
||||||
|
 - “Executor alerts” focus on playbook results: whether a playbook has completed or failed.
|
||||||
|
 - “Grafana alerts” focus on breaches of defined thresholds, for example an API response time that is higher than the defined threshold.
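
 As a rough illustration of what consolidation and de-duplication mean here, the sketch below folds repeated alerts into one record and increments a counter. The key used (environment, resource, event, severity), the ``Alert`` class, and all field values are assumptions made for this example; it is not Alerta's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    environment: str
    resource: str
    event: str
    severity: str
    source: str  # hypothetical label: "executor" or "grafana"
    duplicate_count: int = 0


def consolidate(alerts):
    """Fold alerts sharing the same key into one record and count the repeats."""
    seen = {}
    for alert in alerts:
        # Assumed de-duplication key; alerts with an identical key are repeats.
        key = (alert.environment, alert.resource, alert.event, alert.severity)
        if key in seen:
            seen[key].duplicate_count += 1
        else:
            seen[key] = alert
    return list(seen.values())


# Two identical Executor alerts and one Grafana alert (made-up sample data).
incoming = [
    Alert("production", "compute", "PlaybookFailed", "major", "executor"),
    Alert("production", "compute", "PlaybookFailed", "major", "executor"),
    Alert("production", "compute", "ResponseTimeHigh", "minor", "grafana"),
]
unique = consolidate(incoming)
for alert in unique:
    print(alert.source, alert.event, alert.duplicate_count)
```

 The repeated Executor alert collapses into a single entry with a duplicate counter, while the Grafana alert stays separate because its event differs.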
|
||||||
|
|
||||||
|
.. image:: training_images/alerta_dashboard.png
|
||||||
|
|
||||||
|
@ -1,3 +1,22 @@
|
|||||||
=====================
|
=====================
|
||||||
Dashboards management
|
Dashboards management
|
||||||
=====================
|
=====================
|
||||||
|
|
||||||
|
https://dashboard.tsi-dev.otc-service.com
|
||||||
|
|
||||||
|
The authentication is centrally managed by LDAP.
|
||||||
|
|
||||||
|
|
||||||
|
- The ApiMon Dashboards are segregated based on the type of service.
|
||||||
|
 - The “OTC KPI” dashboard provides a high-level overview of OTC stability and reliability for management.
|
||||||
|
 - The “Endpoint monitoring” dashboard monitors the health of every endpoint URL listed in the endpoint services catalogue.
|
||||||
|
 - The respective “service statistics” dashboards provide a more detailed overview of each service.
|
||||||
|
- Dashboards can be replicated/customized for individual Squad needs.
|
||||||
|
|
||||||
|
.. image:: training_images/dashboards.png
|
||||||
|
|
||||||
|
|
||||||
|
OTC KPI Dashboard
|
||||||
|
=================
|
||||||
|
|
||||||
|
.. image:: training_images/kpi_dashboard.png
|
||||||
|
@ -21,9 +21,7 @@ The most important differences are described in the table below:
|
|||||||
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
|
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
|
||||||
| Implementation mode | standalone app | plugin based |
|
| Implementation mode | standalone app | plugin based |
|
||||||
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
|
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
|
||||||
| Source of information | opentelekomcloud=infra | stackmon |
|
| Source of information | opentelekomcloud-infra | stackmon |
|
||||||
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
|
|
||||||
| Form of change | Overwrite | Diff |
|
|
||||||
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
|
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
|
||||||
| Portal | https://dashboard.tsi-dev.otc-service.com/ | https://dashboard.tsi-dev.otc-service.com/ |
|
| Portal | https://dashboard.tsi-dev.otc-service.com/ | https://dashboard.tsi-dev.otc-service.com/ |
|
||||||
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
|
+-----------------------+------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------+
|
||||||
|
@ -1,3 +1,57 @@
|
|||||||
====
|
====
|
||||||
Logs
|
Logs
|
||||||
====
|
====
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
 - The log of every job run is stored in object storage.
|
||||||
|
 - Each job log file has a unique URL that can be opened to see the log details.
|
||||||
|
 - These URLs are available at all ApiMon levels:
|
||||||
|
- In Zulip alarm messages
|
||||||
|
- In Alerta events
|
||||||
|
- In Grafana Dashboards
|
||||||
|
 - Logs are plain-text files containing the whole playbook output.
|
||||||
|
|
||||||
|
|
||||||
|
2020-07-12 05:54:04.661170 | TASK [List Servers]
|
||||||
|
|
||||||
|
2020-07-12 05:54:09.050491 | localhost | ok
|
||||||
|
|
||||||
|
2020-07-12 05:54:09.067582 | TASK [Create Server in default AZ]
|
||||||
|
|
||||||
|
2020-07-12 05:54:46.055650 | localhost | MODULE FAILURE:
|
||||||
|
|
||||||
|
2020-07-12 05:54:46.055873 | localhost | Traceback (most recent call last):
|
||||||
|
|
||||||
|
2020-07-12 05:54:46.057441 | localhost |
|
||||||
|
|
||||||
|
2020-07-12 05:54:46.057499 | localhost | During handling of the above exception, another exception occurred:
|
||||||
|
|
||||||
|
2020-07-12 05:54:46.057535 | localhost |
|
||||||
|
|
||||||
|
…
|
||||||
|
|
||||||
|
2020-07-12 05:54:46.063992 | localhost | File "/tmp/ansible_os_server_payload_uz1c7_iw/ansible_os_server_payload.zip/ansible/modules/cloud/openstack/os_server.py", line 500, in _create_server
|
||||||
|
|
||||||
|
2020-07-12 05:54:46.065152 | localhost | return self._send_request(
|
||||||
|
|
||||||
|
2020-07-12 05:54:46.065186 | localhost | File "/root/.local/lib/python3.8/site-packages/keystoneauth1/session.py", line 1020, in _send_request
|
||||||
|
|
||||||
|
2020-07-12 05:54:46.065334 | localhost | raise exceptions.ConnectFailure(msg)
|
||||||
|
|
||||||
|
2020-07-12 05:54:46.065378 | localhost | keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://ims.eu-de.otctest.t-systems.com/v2/images: ('Connection aborted.', OSError(107, 'Transport endpoint is not connected'))
|
||||||
|
|
||||||
|
2020-07-12 05:54:46.295035 |
|
||||||
|
|
||||||
|
2020-07-12 05:54:46.295241 | TASK [Delete server]
|
||||||
|
|
||||||
|
2020-07-12 05:54:48.481374 | localhost | ok
|
||||||
|
|
||||||
|
2020-07-12 05:54:48.505761 |
|
||||||
|
|
||||||
|
2020-07-12 05:54:48.505906 | TASK [Delete SecurityGroup]
|
||||||
|
|
||||||
|
2020-07-12 05:54:50.727174 | localhost | changed
|
||||||
|
|
||||||
|
2020-07-12 05:54:50.745541 |
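
 Because the logs are plain text with a regular ``timestamp | host | message`` layout, they can be scanned with a few lines of code. The sketch below is only an illustration: the function name ``failed_tasks`` and the rule "a task failed if its output contains ``MODULE FAILURE``" are assumptions drawn from the excerpt above, not part of ApiMon.

```python
import re

# Timestamp, a pipe, then the rest of the line (which may be empty).
LINE_RE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \|\s?(.*)$")
TASK_RE = re.compile(r"^TASK \[(.+)\]$")


def failed_tasks(log_text):
    """Return names of tasks whose output contains a MODULE FAILURE line."""
    current = None
    failed = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw)
        if not match:
            continue  # skip anything that is not a timestamped log line
        rest = match.group(2).strip()
        task = TASK_RE.match(rest)
        if task:
            current = task.group(1)
        elif "MODULE FAILURE" in rest and current and current not in failed:
            failed.append(current)
    return failed


# Shortened copy of the excerpt above.
sample = """\
2020-07-12 05:54:04.661170 | TASK [List Servers]
2020-07-12 05:54:09.050491 | localhost | ok
2020-07-12 05:54:09.067582 | TASK [Create Server in default AZ]
2020-07-12 05:54:46.055650 | localhost | MODULE FAILURE:
2020-07-12 05:54:46.295241 | TASK [Delete server]
2020-07-12 05:54:48.481374 | localhost | ok
"""
print(failed_tasks(sample))
```

 For the shortened sample, only the ``Create Server in default AZ`` task is reported as failed.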
|
||||||
|
|
||||||
|
@ -1,3 +1,9 @@
|
|||||||
=============
|
=============
|
||||||
Notifications
|
Notifications
|
||||||
=============
|
=============
|
||||||
|
|
||||||
|
 Notifications of errors are posted to the OTC Zulip #Alerts stream.
|
||||||
|
|
||||||
|
 If an error has been acknowledged in Alerta, no new notification message is posted to Zulip when the error repeats.
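
 The acknowledgement rule can be pictured as a simple check before posting. This is a minimal sketch, assuming the alert status is available as a string and that ``ack`` marks an acknowledged alert; the function name is hypothetical and the real integration may work differently.

```python
def should_post_to_zulip(status):
    """Return True when a repeated error should produce a new Zulip message."""
    # Assumption: "ack" marks an acknowledged alert; other statuses still notify.
    return status != "ack"


for status in ("open", "ack"):
    print(status, should_post_to_zulip(status))
```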
|
||||||
|
|
||||||
|
.. image:: training_images/zulip_notifications.png
|
||||||
|
@ -3,3 +3,7 @@
|
|||||||
ApiMon Flow Process
|
ApiMon Flow Process
|
||||||
===================
|
===================
|
||||||
|
|
||||||
|
|
||||||
|
.. image:: training_images/apimon_data_flow.svg
|
||||||
|
:target: training_images/apimon_data_flow.svg
|
||||||
|
:alt: apimon_data_flow
|
||||||
|