Reviewed-by: Wagner, Fabian <fabian.wagner@t-systems.com> Co-authored-by: Ru, Li Yi <liyiru7@huawei.com> Co-committed-by: Ru, Li Yi <liyiru7@huawei.com>
Supported Events
| Source | Namespace | Name | ID | Severity | Description | Handling Suggestion | Impact |
|---|---|---|---|---|---|---|---|
| GaussDB | SYS.GAUSSDBV5 | Process status alarm | ProcessStatusAlarm | Major | Key GaussDB processes exit, including CMS/CMA, ETCD, GTM, CN, and DN processes. | Wait until the process recovers automatically or a primary/standby failover is performed automatically, then check whether services have recovered. If not, contact SRE engineers. | If processes on primary nodes are faulty, services are interrupted and then rolled back. If processes on standby nodes are faulty, services are not affected. |
| | | Component status alarm | ComponentStatusAlarm | Major | Key GaussDB components do not respond, including CMA, ETCD, GTM, CN, and DN components. | Wait until the process recovers automatically or a primary/standby failover is performed automatically, then check whether services have recovered. If not, contact SRE engineers. | If processes on primary nodes do not respond, neither do the services. If processes on standby nodes are faulty, services are not affected. |
| | | Cluster status alarm | ClusterStatusAlarm | Major | The cluster is abnormal. Possible faults: the cluster is read-only; the majority of ETCD members are faulty; the cluster resources are unevenly distributed. | Contact SRE engineers. | If the cluster status is read-only, only read requests are processed. If the majority of ETCD members are faulty, the cluster is unavailable. If resources are unevenly distributed, the instance performance and reliability deteriorate. |
| | | Hardware resource alarm | HardwareResourceAlarm | Major | A major hardware fault occurs in the instance, such as disk damage or a GTM network fault. | Contact SRE engineers. | Some or all services are affected. |
| | | Status transition alarm | StateTransitionAlarm | Major | One of the following events occurs in the instance: DN build attempt, DN build failure, forcible DN promotion, primary/standby DN switchover/failover, or primary/standby GTM switchover/failover. | Wait until the fault is rectified automatically, then check whether services have recovered. If not, contact SRE engineers. | Some services are interrupted. |
| | | Other abnormal alarm | OtherAbnormalAlarm | Major | Disk usage threshold alarm | Monitor workload changes and scale up storage as needed. | If the used space exceeds the threshold, storage cannot be scaled up. |
| | | Instance running status abnormal | TaurusInstanceRunningStatusAbnormal | Major | This is a key alarm event. It is reported when an instance is faulty due to a disaster or a server failure. | Submit a service ticket. | The database service may be unavailable. |
| | | Instance running status recovered | TaurusInstanceRunningStatusRecovered | Major | GaussDB provides an HA tool to automatically or manually rectify the catastrophic fault. After the fault is rectified, this event is reported. | No further action is required. | None |
| | | Node status abnormal | TaurusNodeRunningStatusAbnormal | Major | This is a key alarm event. It is reported when a database node is faulty due to a disaster or a server failure. | Check whether the database service is available and submit a service ticket. | The database service may be unavailable. |
| | | Node recovered | TaurusNodeRunningStatusRecovered | Major | GaussDB provides an HA tool to automatically or manually rectify the catastrophic fault. After the fault is rectified, this event is reported. | No further action is required. | None |
| | | Instance creation failure | GaussDBV5CreateInstanceFailed | Major | Instances fail to be created because the quota is insufficient or underlying resources are exhausted. | Release instances that are no longer used and try to create the instances again, or submit a service ticket to adjust the quota. | Instances fail to be created. |
| | | Node adding failure | GaussDBV5ExpandClusterFailed | Major | The underlying resources are insufficient. | Submit a service ticket to ask O&M personnel to coordinate resources, then delete the node that failed to be added and add a new one. | None |
| | | Storage scale-up failure | GaussDBV5EnlargeVolumeFailed | Major | The underlying resources are insufficient. | Submit a service ticket. After O&M personnel coordinate resources in the background, scale up the storage space again. | Services may be interrupted. |
| | | Reboot failure | GaussDBV5RestartInstanceFailed | Major | The network is abnormal. | Retry the reboot operation or submit a service ticket to O&M personnel. | The database service may be unavailable. |
| | | Full backup failure | GaussDBV5FullBackupFailed | Major | The backup files fail to be exported or uploaded. | Submit a service ticket to O&M personnel. | Data cannot be backed up. |
| | | Differential backup failure | GaussDBV5DifferentialBackupFailed | Major | The backup files fail to be exported or uploaded. | Submit a service ticket to O&M personnel. | Data cannot be backed up. |
| | | Backup deletion failure | GaussDBV5DeleteBackupFailed | Major | Backup files fail to be cleared. | Submit a service ticket to O&M personnel. | There may be residual OBS files. |
| | | EIP binding failure | GaussDBV5BindEIPFailed | Major | The EIP is already in use or EIP resources are insufficient. | Submit a service ticket to O&M personnel. | The instance cannot be accessed from the Internet. |
| | | EIP unbinding failure | GaussDBV5UnbindEIPFailed | Major | The network or the EIP service is faulty. | Unbind the EIP again or submit a service ticket to O&M personnel. | Residual IP resources may be generated. |
| | | Parameter template application failure | GaussDBV5ApplyParamFailed | Major | Changing a parameter group times out. | Change the parameter group again. | None |
| | | Parameter modification failure | GaussDBV5UpdateInstanceParamGroupFailed | Major | Changing a parameter group times out. | Change the parameter group again. | None |
| | | Backup and restoration failure | GaussDBV5RestoreFromBcakupFailed | Major | The underlying resources are insufficient or backup files fail to be downloaded. | Submit a service ticket. | The database service may be unavailable if the restoration fails. |
| | | Hot patch installation failure | GaussDBV5UpgradeHotfixFailed | Major | Generally, this fault is caused by an error reported during kernel upgrade. | View the error information about the workflow and redo or skip the job. | None |
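
The handling suggestions in the table fall into a few recurring patterns (wait for automatic recovery, retry the operation, escalate to SRE or O&M, or take no action), so they can be encoded as a lookup for alert routing. The following is a minimal sketch in Python covering only a subset of the events. The event IDs are taken from the table. The `EventInfo` type, the action labels, and the `triage` function are hypothetical names chosen for illustration and are not part of any GaussDB or Cloud Eye API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventInfo:
    """Illustrative record for one row of the events table."""
    name: str
    action: str  # hypothetical label summarizing the handling suggestion


# Subset of the table; the action labels are assumptions, not official values.
EVENTS = {
    "ProcessStatusAlarm": EventInfo("Process status alarm", "wait_then_escalate"),
    "ClusterStatusAlarm": EventInfo("Cluster status alarm", "escalate"),
    "TaurusInstanceRunningStatusRecovered": EventInfo(
        "Instance running status recovered", "none"
    ),
    "GaussDBV5ApplyParamFailed": EventInfo(
        "Parameter template application failure", "retry"
    ),
    "GaussDBV5FullBackupFailed": EventInfo("Full backup failure", "escalate"),
}


def triage(event_id: str) -> str:
    """Return the action label for a known event ID, or "unknown" otherwise."""
    info = EVENTS.get(event_id)
    return info.action if info is not None else "unknown"
```

For example, `triage("GaussDBV5ApplyParamFailed")` returns `"retry"`, matching the "Change the parameter group again" suggestion, while an ID that is not in the mapping returns `"unknown"` so the caller can fall back to manual review.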