doc-exports/docs/dli/umn/dli_03_0236.html
Su, Xiaomeng 12dd64efc7 dli_umn_20240430
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
2024-05-15 11:56:22 +00:00

Why Is the Flink Job Abnormal Due to Heartbeat Timeout Between JobManager and TaskManager?

Symptom

The heartbeat between JobManager and TaskManager times out, and the Flink job becomes abnormal.

Figure 1 Error information

Possible Causes

  1. The network is intermittently disconnected or the cluster load is high.
  2. Full GC occurs frequently, which may indicate a memory leak in the job code.
    Figure 2 Full GC

Handling Procedure

  • If Full GC occurs frequently, review the job code for memory leaks.
  • Allocate more resources to each TaskManager.
  • Contact technical support to modify the cluster heartbeat configuration.
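
To judge whether Full GC is occurring frequently, one option is to count Full GC events in a TaskManager GC log. The following is a minimal sketch, not a DLI tool. It assumes the log lines begin with a JVM uptime stamp in seconds, in either the JDK 8 style (`12.345: [Full GC ...`) or the JDK 9+ unified style (`[12.345s][info][gc] ... Pause Full ...`). The function names, the sample lines, and the 60-second window are hypothetical and chosen only for illustration.

```python
import re
from typing import Iterable, List

# Assumed line shapes (uptime stamps only; lines with date stamps are ignored):
#   JDK 8:   "12.345: [Full GC (Ergonomics) ..."
#   JDK 9+:  "[12.345s][info][gc] GC(7) Pause Full (Allocation Failure) ..."
_FULL_GC = re.compile(r"^\[?(\d+(?:\.\d+)?)s?[:\]].*(?:Full GC|Pause Full)")


def full_gc_times(lines: Iterable[str]) -> List[float]:
    """Return the uptime (in seconds) of every Full GC event found."""
    times = []
    for line in lines:
        match = _FULL_GC.match(line.strip())
        if match:
            times.append(float(match.group(1)))
    return times


def max_full_gcs_in_window(times: List[float], window_s: float = 60.0) -> int:
    """Return the largest number of Full GCs inside any window of window_s seconds."""
    ordered = sorted(times)
    best = 0
    start = 0
    for end, t in enumerate(ordered):
        # Shrink the window from the left until it spans at most window_s.
        while t - ordered[start] > window_s:
            start += 1
        best = max(best, end - start + 1)
    return best


# Hypothetical sample lines, for illustration only.
sample = [
    "1.000: [GC (Allocation Failure) ...",
    "10.000: [Full GC (Ergonomics) ...",
    "[20.500s][info][gc] GC(3) Pause Full (Allocation Failure) ...",
    "[30.000s][info][gc] GC(4) Pause Young (Normal) ...",
    "45.000: [Full GC (Ergonomics) ...",
    "200.000: [Full GC (Ergonomics) ...",
]

times = full_gc_times(sample)
print(times, max_full_gcs_in_window(times))
```

Several Full GCs inside a short window, especially when each one reclaims little memory, suggest that the heap is under pressure and that the job code is worth checking for a memory leak.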