Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Reviewed-by: Rechenburg, Matthias <matthias.rechenburg@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
ALM-45000 HetuEngine Service Unavailable
Description
The system checks the HetuEngine service status every 300 seconds. This alarm is generated when the HetuEngine service is unavailable.
This alarm is cleared when the HetuEngine service recovers.
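The raise and clear behavior described above can be modeled as a two-state check applied to each 300-second result. The following Python sketch is illustrative only. `AlarmState` and `evaluate` are hypothetical names, and the health check that MRS actually performs is not shown.

```python
from dataclasses import dataclass

# Check period stated in the alarm description; kept for reference only,
# this sketch does not schedule anything itself.
CHECK_INTERVAL_SECONDS = 300


@dataclass
class AlarmState:
    """Hypothetical holder for whether ALM-45000 is currently active."""
    active: bool = False


def evaluate(state: AlarmState, service_available: bool) -> str:
    """Apply one periodic check result and return the resulting transition."""
    if not service_available and not state.active:
        state.active = True  # service unavailable: generate the alarm
        return "raised"
    if service_available and state.active:
        state.active = False  # Auto Clear: the alarm clears once the service recovers
        return "cleared"
    return "unchanged"  # still healthy, or still unavailable with the alarm already active


if __name__ == "__main__":
    state = AlarmState()
    # One entry per check: healthy, down, still down, recovered.
    for available in (True, False, False, True):
        print(evaluate(state, available))
```

Running the example prints one transition per check, so a service that stays unavailable across several checks raises the alarm only once.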
Attribute
Alarm ID | Alarm Severity | Auto Clear |
---|---|---|
45000 | Critical | Yes |
Parameters
Name | Meaning |
---|---|
Source | Specifies the cluster for which the alarm is generated. |
ServiceName | Specifies the service for which the alarm is generated. |
RoleName | Specifies the role for which the alarm is generated. |
HostName | Specifies the host for which the alarm is generated. |
Impact on the System
HetuEngine tasks fail to execute.
Possible Causes
- The KrbServer service is abnormal.
- The ZooKeeper service is abnormal.
- The HDFS service is abnormal.
- The YARN service is abnormal.
- The DBService service is abnormal.
- The Hive service is abnormal.
- The HetuEngine service has no HSBroker instance.
Procedure
Check the KrbServer service status.
1. On MRS Manager, choose O&M > Alarm > Alarm.
2. In the alarm list, check whether the "ALM-25500 KrbServer Service Unavailable" alarm is generated.
   - If yes, go to 3.
   - If no, go to 5.
3. Clear "ALM-25500 KrbServer Service Unavailable" according to the alarm help.
4. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 5.
Check the ZooKeeper service status.
5. In the alarm list, check whether the "ALM-12007 Process Fault" alarm is generated.
   - If yes, go to 6.
   - If no, go to 9.
6. In the alarm list, view the details of the "ALM-12007 Process Fault" alarm and check whether the service name shown in Location Information is ZooKeeper.
   - If yes, go to 7.
   - If no, go to 9.
7. Clear "ALM-12007 Process Fault" according to the alarm help.
8. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 9.
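Because "ALM-12007 Process Fault" can be raised for any service, only occurrences whose ServiceName is ZooKeeper matter for this check. The Python sketch below expresses that filter, assuming the active alarms have been exported as dictionaries keyed by hypothetical field names (`AlarmID`, plus the `ServiceName` and `HostName` parameters listed in the Parameters table). MRS Manager does not necessarily expose alarms in this form, and the host names are made up.

```python
from typing import Dict, List

Alarm = Dict[str, str]


def zookeeper_process_faults(alarms: List[Alarm]) -> List[Alarm]:
    """Keep only ALM-12007 Process Fault alarms whose ServiceName is ZooKeeper."""
    return [
        alarm
        for alarm in alarms
        if alarm.get("AlarmID") == "12007" and alarm.get("ServiceName") == "ZooKeeper"
    ]


if __name__ == "__main__":
    active = [
        {"AlarmID": "12007", "ServiceName": "ZooKeeper", "HostName": "node-master1"},
        {"AlarmID": "12007", "ServiceName": "HDFS", "HostName": "node-core2"},
        {"AlarmID": "14000", "ServiceName": "HDFS", "HostName": "node-master1"},
    ]
    # Print the hosts whose ZooKeeper process fault needs to be cleared first.
    for alarm in zookeeper_process_faults(active):
        print(alarm["HostName"])
```

An empty result means no ZooKeeper process fault is active, so the HDFS check is next.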
Check the HDFS service status.
9. In the alarm list, check whether the "ALM-14000 HDFS Service Unavailable" alarm is generated.
   - If yes, go to 10.
   - If no, go to 12.
10. Clear "ALM-14000 HDFS Service Unavailable" according to the alarm help.
11. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 12.
Check the YARN service status.
12. In the alarm list, check whether the "ALM-18000 YARN Service Unavailable" alarm is generated.
    - If yes, go to 13.
    - If no, go to 15.
13. Clear "ALM-18000 YARN Service Unavailable" according to the alarm help.
14. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 15.
Check the DBService service status.
15. In the alarm list, check whether the "ALM-27001 DBService Service Unavailable" alarm is generated.
    - If yes, go to 16.
    - If no, go to 18.
16. Clear "ALM-27001 DBService Service Unavailable" according to the alarm help.
17. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 18.
Check the Hive service status.
18. In the alarm list, check whether the "ALM-16004 Hive Service Unavailable" alarm is generated.
    - If yes, go to 19.
    - If no, go to 21.
19. Clear "ALM-16004 Hive Service Unavailable" according to the alarm help.
20. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 21.
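The dependency checks above follow a fixed order (KrbServer, ZooKeeper, HDFS, YARN, DBService, Hive) and act on the first alarm found. A minimal Python sketch of that ordering is shown below, assuming you already have the set of active alarm IDs and have counted ALM-12007 only when it was raised for ZooKeeper. `DEPENDENCY_ALARMS` and `first_dependency_alarm` are hypothetical names; MRS Manager provides no such function.

```python
from typing import Iterable, Optional, Tuple

# Dependency alarms in the order the procedure checks them.
# ALM-12007 is relevant only when it was raised for ZooKeeper.
DEPENDENCY_ALARMS: Tuple[Tuple[str, str], ...] = (
    ("25500", "KrbServer Service Unavailable"),
    ("12007", "Process Fault (ZooKeeper)"),
    ("14000", "HDFS Service Unavailable"),
    ("18000", "YARN Service Unavailable"),
    ("27001", "DBService Service Unavailable"),
    ("16004", "Hive Service Unavailable"),
)


def first_dependency_alarm(active_ids: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return the first active dependency alarm in procedure order, or None."""
    active = set(active_ids)
    for alarm_id, name in DEPENDENCY_ALARMS:
        if alarm_id in active:
            return alarm_id, name
    return None
```

If the function returns None, none of the dependency alarms is active and the remaining checks (HSBroker instances, then network connectivity) apply. Note that ALM-45000 itself is not a dependency alarm, so it is ignored.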
Check whether HetuEngine has HSBroker instances.
21. On MRS Manager, choose Cluster > Name of the desired cluster > Services > HetuEngine. On the page that is displayed, click the Instance tab.
22. Check whether the instance list contains no HSBroker instance.
    - If yes, click Add Instance to add an HSBroker instance, and go to 23.
    - If no, go to 24.
23. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 24.
Check the network connection between HetuEngine and ZooKeeper, HDFS, YARN, DBService, and Hive.
24. On MRS Manager, choose Cluster > Name of the desired cluster > Services > HetuEngine. On the page that is displayed, click the Instance tab.
25. Click the host name in the HSBroker row and record the management IP address in the Basic Information area.
26. Use the IP address obtained in 25 to log in to the host where HSBroker resides as user omm.
27. Run the ping command to check whether the network connections between the HSBroker host and the hosts where ZooKeeper, HDFS, YARN, DBService, and Hive reside are normal.
    - If yes, go to 30.
    - If no, go to 28.
28. Contact the network administrator to restore the network.
29. In the alarm list, check whether the "ALM-45000 HetuEngine Service Unavailable" alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 30.
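When several dependency hosts have to be checked from the HSBroker host, the ping step can be scripted. The Python sketch below assumes Linux iputils `ping` (where `-c` sets the number of echo requests) and treats exit status 0 as reachable. `run_command`, `ping_hosts`, and `unreachable` are hypothetical helpers, and the injectable `runner` parameter exists only so the logic can be exercised without a network.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# A runner takes a command line and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run cmd, discard its output, and return the exit status."""
    completed = subprocess.run(
        list(cmd), stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False
    )
    return completed.returncode


def ping_hosts(hosts: List[str], count: int = 3, runner: Runner = run_command) -> Dict[str, bool]:
    """Ping each host; True means ping exited with status 0 (at least one reply received)."""
    results: Dict[str, bool] = {}
    for host in hosts:
        # "-c" limits the number of echo requests (Linux iputils syntax).
        results[host] = runner(["ping", "-c", str(count), host]) == 0
    return results


def unreachable(results: Dict[str, bool]) -> List[str]:
    """Return the hosts whose ping failed, in the order they were checked."""
    return [host for host, ok in results.items() if not ok]
```

For example, `unreachable(ping_hosts(["192.0.2.11", "192.0.2.12"]))` lists the hosts to report to the network administrator; the addresses are placeholders for the management IP addresses of the ZooKeeper, HDFS, YARN, DBService, and Hive hosts.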
Collect fault information.
30. On MRS Manager, choose O&M > Log > Download.
31. Expand the Service drop-down list. In the Services dialog box that is displayed, select HetuEngine under the target cluster name, and click OK.
32. Expand the Hosts drop-down list. In the Select Host dialog box that is displayed, select the hosts to which the role belongs, and click OK.
33. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 30 minutes before and 30 minutes after the alarm generation time, respectively. Then, click Download.
34. Contact O&M personnel and provide the collected logs.
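The log collection window is the alarm generation time minus and plus 30 minutes. A small Python sketch of that calculation follows; `log_window` is a hypothetical helper, and the date used in the example is made up.

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime, margin_minutes: int = 30) -> Tuple[datetime, datetime]:
    """Return (Start Date, End Date) spanning margin_minutes before and after the alarm."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin
```

For an alarm generated at 00:10 on 1 May 2024, `log_window(datetime(2024, 5, 1, 0, 10))` returns a window from 23:40 on 30 April to 00:40 on 1 May, which shows that Start Date can fall on the previous day.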
Alarm Clearing
After the fault is rectified, the system automatically clears this alarm.
Reference
None