Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
ALM-45637 FlinkServer Task Is Continuously Under Back Pressure
This section applies to MRS 3.1.2-LTS.6 or later.
Description
The system checks the back pressure duration of FlinkServer tasks at the configured alarm checking interval. This alarm is generated when the back pressure duration of a FlinkServer task reaches the configured threshold. This alarm is cleared when the task recovers from back pressure or the job is successfully restarted.
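The raise and clear behavior described above can be pictured as a small state machine. The following is a minimal sketch under stated assumptions, not FlinkServer's actual implementation: the class name `BackPressureAlarm`, the `threshold_s` field, and the `observe` method are hypothetical, and the caller is assumed to sample the task's back pressure state once per checking interval.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackPressureAlarm:
    """Illustrative model of the raise/clear logic (all names are hypothetical)."""

    threshold_s: float                      # configured back pressure duration threshold, in seconds
    pressure_since: Optional[float] = None  # time at which continuous back pressure began
    active: bool = False                    # whether the alarm is currently raised

    def observe(self, now_s: float, under_pressure: bool) -> bool:
        """Record one periodic check and return whether the alarm is active."""
        if not under_pressure:
            # The task recovered (or the job restarted cleanly): clear the alarm.
            self.pressure_since = None
            self.active = False
            return self.active
        if self.pressure_since is None:
            # First check that sees back pressure: start measuring the duration.
            self.pressure_since = now_s
        # Raise the alarm once continuous back pressure reaches the threshold.
        if now_s - self.pressure_since >= self.threshold_s:
            self.active = True
        return self.active
```

With `threshold_s=180`, for example, checks at 0 s and 60 s that both see back pressure leave the alarm inactive, a check at 180 s raises it, and the first later check that sees no back pressure clears it.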
Attribute
Alarm ID | Alarm Severity | Auto Clear |
---|---|---|
45637 | Minor | Yes |
Parameters
Name | Meaning |
---|---|
Source | Specifies the cluster for which the alarm is generated. |
ServiceName | Specifies the service for which the alarm is generated. |
JobName | Specifies the job for which the alarm is generated. |
Impact on the System
This alarm has no impact on the system.
Possible Causes
The specific cause can be found in the JobManager and TaskManager logs of the job.
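Once the job logs have been collected, a first pass over them can be automated. The following is a rough sketch, assuming that lines of interest mention `ERROR`, `WARN`, or a Java exception or error class name; the pattern `SUSPECT` and the function `suspicious_lines` are hypothetical helpers and are not part of MRS or Flink.

```python
import re
from typing import Iterable, List

# Assumption: relevant lines contain ERROR, WARN, or a class name ending in
# "Exception" or "Error" (for example, a Java stack trace header).
SUSPECT = re.compile(r"\b(ERROR|WARN)\b|\b\w+(Exception|Error)\b")


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match SUSPECT, with trailing newlines removed."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]
```

The function accepts any iterable of strings, so an open log file can be passed to it directly. It only narrows down where to look; the surrounding log context is still needed to identify the actual cause.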
Procedure
1. Log in to Manager as a user who has the FlinkServer management permission.
2. Choose Cluster > Services > Yarn and click the link next to ResourceManager WebUI to go to the Yarn page.
3. Locate the failed job based on its name displayed in Location, search for and record the application ID of the failed job, and check whether the job logs are available on the Yarn page.
   - If yes, go to 4.
   - If no, go to 6.
4. Click the application ID of the failed job to go to the job page.
   - Click Logs in the Logs column to view JobManager logs.
   - Click the ID in the Attempt ID column and click Logs in the Logs column to view TaskManager logs.
5. View the logs of the failed job to rectify the fault, or contact O&M personnel and send the collected fault logs. No further action is required.

If logs are unavailable on the Yarn page, download logs from HDFS.

6. On Manager, choose Cluster > Services > HDFS, click the link next to NameNode WebUI to go to the HDFS page, select Utilities > Browse the file system, and download logs from the /tmp/logs/User name/logs/Application ID of the failed job directory.
7. View the logs of the failed job to rectify the fault, or contact O&M personnel and send the collected fault logs.
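The HDFS log directory used in the download step follows a fixed pattern, so it can be derived from the user name and the application ID recorded earlier. The following is a minimal sketch: the function names are hypothetical, and the `hdfs dfs -get` command line is offered as an assumed alternative to the NameNode WebUI download, which requires a host with an installed and authenticated HDFS client. The code only builds the path and the argument list; it does not run anything.

```python
import re
from typing import List

# YARN application IDs have the form application_<cluster timestamp>_<sequence number>.
APP_ID = re.compile(r"application_\d+_\d+")


def check_app_id(app_id: str) -> str:
    """Return app_id unchanged, or raise ValueError if it is not a YARN application ID."""
    if not APP_ID.fullmatch(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return app_id


def aggregated_log_dir(user: str, app_id: str) -> str:
    """Build the HDFS log directory: /tmp/logs/<user>/logs/<application ID>."""
    return f"/tmp/logs/{user}/logs/{check_app_id(app_id)}"


def hdfs_get_command(user: str, app_id: str, dest: str) -> List[str]:
    """Build the argument list that copies the log directory to the local path dest."""
    return ["hdfs", "dfs", "-get", aggregated_log_dir(user, app_id), dest]
```

Validating the application ID before building the path catches copy errors early, for example a job name pasted in place of the ID recorded from the Yarn page.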
Alarm Clearing
This alarm is cleared when the FlinkServer task recovers from back pressure or the job is successfully restarted.
Related Information
None