forked from docs/doc-exports
Reviewed-by: Hasko, Vladimir <vladimir.hasko@t-systems.com> Co-authored-by: Yang, Tong <yangtong2@huawei.com> Co-committed-by: Yang, Tong <yangtong2@huawei.com>
<a name="ALM-24005"></a><a name="ALM-24005"></a>
<h1 class="topictitle1">ALM-24005 Exception Occurs When Flume Transmits Data</h1>
<div id="body45121442"><div class="section" id="ALM-24005__section6563861"><h4 class="sectiontitle">Description</h4><p id="ALM-24005__p30958160">The alarm module monitors the capacity status of Flume Channel. This alarm is generated immediately when Channel remains fully occupied for longer than the threshold, or when the number of times that Source fails to send data to Channel exceeds the threshold.</p>
<p id="ALM-24005__p10187984">The default threshold is <strong id="ALM-24005__b1552511595325">10</strong>. You can change the threshold by modifying the <strong id="ALM-24005__b81675537178">channelfullcount</strong> parameter of the related channel in the <strong id="ALM-24005__b8824909187">properties.properties</strong> configuration file in the <strong id="ALM-24005__b887734162510">conf</strong> directory.</p>
<p id="ALM-24005__p24582998">The alarm is cleared when the space of Flume Channel is released and the alarm handling is complete.</p>
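<p>As an illustration only, the following Python sketch reads the <strong>channelfullcount</strong> value of each channel from the text of a <strong>properties.properties</strong> file and falls back to the default of 10 when the parameter is absent. The key layout AGENT.channels.CHANNEL.channelfullcount, the agent name <strong>server</strong>, and the channel names <strong>ch1</strong> and <strong>ch2</strong> are assumptions based on the usual Flume property naming, not values taken from this topic.</p>

```python
import re

# Default threshold stated in this topic.
DEFAULT_CHANNEL_FULL_COUNT = 10

# Assumed key layout: AGENT.channels.CHANNEL.channelfullcount = INTEGER
_PATTERN = re.compile(
    r"^\s*[\w-]+\.channels\.([\w-]+)\.channelfullcount\s*=\s*(\d+)\s*$"
)


def channel_full_counts(properties_text):
    """Return {channel_name: threshold} for every channelfullcount entry found."""
    counts = {}
    for line in properties_text.splitlines():
        match = _PATTERN.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def effective_threshold(properties_text, channel):
    """Return the configured threshold for a channel, or the default of 10."""
    return channel_full_counts(properties_text).get(
        channel, DEFAULT_CHANNEL_FULL_COUNT
    )


# Hypothetical configuration used only to exercise the functions above.
SAMPLE = """\
server.channels = ch1 ch2
server.channels.ch1.type = memory
server.channels.ch1.channelfullcount = 20
server.channels.ch2.type = file
"""

if __name__ == "__main__":
    for name in ("ch1", "ch2"):
        print(name, effective_threshold(SAMPLE, name))
```

<p>In this sample, <strong>ch1</strong> overrides the threshold and <strong>ch2</strong> keeps the default, because it has no <strong>channelfullcount</strong> entry.</p>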
</div>
<div class="section" id="ALM-24005__section59074751"><h4 class="sectiontitle">Attribute</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-24005__table45065821" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-24005__row13802373"><th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.1"><p id="ALM-24005__p44250398">Alarm ID</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.2"><p id="ALM-24005__p27512447">Alarm Severity</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="33.33333333333333%" id="mcps1.3.2.2.1.4.1.3"><p id="ALM-24005__p13915729">Auto Clear</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-24005__row53432251"><td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.1 "><p id="ALM-24005__p33045094">24005</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.2 "><p id="ALM-24005__p59406970">Major</p>
</td>
<td class="cellrowborder" valign="top" width="33.33333333333333%" headers="mcps1.3.2.2.1.4.1.3 "><p id="ALM-24005__p47235229">Yes</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-24005__section61910715"><h4 class="sectiontitle">Parameters</h4>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="ALM-24005__table848378" frame="border" border="1" rules="all"><thead align="left"><tr id="ALM-24005__row32681314"><th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.1"><p id="ALM-24005__p29940743">Name</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="50%" id="mcps1.3.3.2.1.3.1.2"><p id="ALM-24005__p9281096">Meaning</p>
</th>
</tr>
</thead>
<tbody><tr id="ALM-24005__row25016533157"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-24005__p17935380415">Source</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-24005__p187931338134115">Specifies the cluster for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-24005__row13571343"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-24005__p25536961">ServiceName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-24005__p55227976">Specifies the service for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-24005__row27289743"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-24005__p62985584">HostName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-24005__p1558681">Specifies the host for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-24005__row390817108323"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-24005__p122238832019">AgentId</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-24005__p5908181018320">Specifies the ID of the agent for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-24005__row14028137"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-24005__p1259352420200">ComponentType</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-24005__p32356376">Specifies the type of the component for which the alarm is generated.</p>
</td>
</tr>
<tr id="ALM-24005__row22771929"><td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.1 "><p id="ALM-24005__p29641231152015">ComponentName</p>
</td>
<td class="cellrowborder" valign="top" width="50%" headers="mcps1.3.3.2.1.3.1.2 "><p id="ALM-24005__p22295840">Specifies the component for which the alarm is generated.</p>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<div class="section" id="ALM-24005__section20325531"><h4 class="sectiontitle">Impact on the System</h4><p id="ALM-24005__p61132612">If the disk usage of Flume Channel increases continuously, importing data to the specified destination takes longer. When the disk usage of Flume Channel reaches 100%, the Flume agent process pauses.</p>
</div>
<div class="section" id="ALM-24005__section48712055"><h4 class="sectiontitle">Possible Causes</h4><ul id="ALM-24005__ul52794569"><li id="ALM-24005__li5389079">Flume Sink is faulty, so the data cannot be sent.</li><li id="ALM-24005__li48501719">The network is faulty, so the data cannot be sent.</li></ul>
</div>
<div class="section" id="ALM-24005__section35755311"><h4 class="sectiontitle">Procedure</h4><p class="tableheading" id="ALM-24005__p36325141"><strong id="ALM-24005__b337865809433">Check whether Flume Sink is faulty.</strong></p>
<ol id="ALM-24005__ol534172111431"><li id="ALM-24005__li1415818911148"><span>Open the <strong id="ALM-24005__b43793138111">properties.properties</strong> configuration file on the local PC, search for <strong id="ALM-24005__b3380413121118">type = hdfs</strong> in the file, and check whether the Flume sink type is HDFS.</span><p><ul class="subitemlist" id="ALM-24005__ul3885583411148"><li id="ALM-24005__li5786450211148">If yes, go to <a href="#ALM-24005__li893062611148">2</a>.</li><li id="ALM-24005__li5651309811148">If no, go to <a href="#ALM-24005__li2804053511148">3</a>.</li></ul>
</p></li><li id="ALM-24005__li893062611148"><a name="ALM-24005__li893062611148"></a><a name="li893062611148"></a><span>On FusionInsight Manager, check whether the <strong id="ALM-24005__b886123213315">HDFS Service Unavailable</strong> alarm is generated in the alarm list and whether the HDFS service is stopped in the service list.</span><p><ul class="subitemlist" id="ALM-24005__ul3827499411148"><li id="ALM-24005__li1321651911148">If the alarm is reported, clear it by following the handling suggestions in ALM-14000 HDFS Service Unavailable. If the HDFS service is stopped, start it. Then, go to <a href="#ALM-24005__li5165783111148">7</a>.</li><li id="ALM-24005__li6390510011148">If neither applies, go to <a href="#ALM-24005__li5165783111148">7</a>.</li></ul>
</p></li><li id="ALM-24005__li2804053511148"><a name="ALM-24005__li2804053511148"></a><a name="li2804053511148"></a><span>Open the <strong id="ALM-24005__b331858123">properties.properties</strong> configuration file on the local PC, search for <strong id="ALM-24005__b23105171212">type = hbase</strong> in the file, and check whether the Flume sink type is HBase.</span><p><ul class="subitemlist" id="ALM-24005__ul311561511148"><li id="ALM-24005__li5229211211148">If yes, go to <a href="#ALM-24005__li5423421711148">4</a>.</li><li id="ALM-24005__li780271911148">If no, go to <a href="#ALM-24005__li3655261711148">5</a>.</li></ul>
</p></li><li id="ALM-24005__li5423421711148"><a name="ALM-24005__li5423421711148"></a><a name="li5423421711148"></a><span>On FusionInsight Manager, check whether the <strong id="ALM-24005__b1312964511312">HBase Service Unavailable</strong> alarm is generated in the alarm list and whether the HBase service is stopped in the service list.</span><p><ul class="subitemlist" id="ALM-24005__ul1348256411148"><li id="ALM-24005__li2855587111148">If the alarm is reported, clear it by following the handling suggestions in ALM-19000 HBase Service Unavailable. If the HBase service is stopped, start it. Then, go to <a href="#ALM-24005__li5165783111148">7</a>.</li><li id="ALM-24005__li3132422411148">If neither applies, go to <a href="#ALM-24005__li5165783111148">7</a>.</li></ul>
</p></li><li id="ALM-24005__li3655261711148"><a name="ALM-24005__li3655261711148"></a><a name="li3655261711148"></a><span>Open the <strong id="ALM-24005__b664134713120">properties.properties</strong> configuration file on the local PC, search for <strong id="ALM-24005__b96512476124">org.apache.flume.sink.kafka.KafkaSink</strong> in the file, and check whether the Flume sink type is Kafka.</span><p><ul class="subitemlist" id="ALM-24005__ul4134410411148"><li id="ALM-24005__li3089549111148">If yes, go to <a href="#ALM-24005__li5047900111148">6</a>.</li><li id="ALM-24005__li1950687011148">If no, go to <a href="#ALM-24005__li3789323111148">9</a>.</li></ul>
</p></li><li id="ALM-24005__li5047900111148"><a name="ALM-24005__li5047900111148"></a><a name="li5047900111148"></a><span>On FusionInsight Manager, check whether the <strong id="ALM-24005__b1191315713111">Kafka Service Unavailable</strong> alarm is generated in the alarm list and whether the Kafka service is stopped in the service list.</span><p><ul class="subitemlist" id="ALM-24005__ul3543493911148"><li id="ALM-24005__li4175000011148">If the alarm is reported, clear it by following the handling suggestions in ALM-38000 Kafka Service Unavailable. If the Kafka service is stopped, start it. Then, go to <a href="#ALM-24005__li5165783111148">7</a>.</li><li id="ALM-24005__li2630683611148">If neither applies, go to <a href="#ALM-24005__li5165783111148">7</a>.</li></ul>
</p></li><li id="ALM-24005__li5165783111148"><a name="ALM-24005__li5165783111148"></a><a name="li5165783111148"></a><span>On FusionInsight Manager, choose <strong id="ALM-24005__b66384016194">Cluster</strong> &gt; <em id="ALM-24005__i1765013071915">Name of the desired cluster</em> &gt; <strong id="ALM-24005__b96521011912">Services</strong> &gt; <strong id="ALM-24005__b1465517021911">Flume</strong> &gt; <strong id="ALM-24005__b1965730131911">Instance</strong>.</span></li><li id="ALM-24005__li6154427911148"><span>On the Flume instance page of the faulty node, check whether the <strong id="ALM-24005__b31339649428">Sink Speed Metrics</strong> indicator is 0.</span><p><ul class="subitemlist" id="ALM-24005__ul2727194911148"><li id="ALM-24005__li2353481611148">If yes, go to <a href="#ALM-24005__li2555818811148">13</a>.</li><li id="ALM-24005__li1048675711148">If no, go to <a href="#ALM-24005__li3789323111148">9</a>.</li></ul>
</p></li></ol>
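<p>The sink-type checks in the preceding steps can be sketched in Python as a text search over the contents of <strong>properties.properties</strong>. This is a rough aid, not part of the product. The search strings and alarm names come from this procedure. Restricting the search to keys that contain <strong>.sinks.</strong> is an added assumption based on the usual Flume property naming, so that a source of the same type is not mistaken for a sink. The agent name <strong>client</strong> and the component names in the sample are hypothetical.</p>

```python
import re

# Search strings taken from the procedure, with flexible spacing around "=".
SINK_MARKERS = [
    ("HDFS", re.compile(r"type\s*=\s*hdfs")),
    ("HBase", re.compile(r"type\s*=\s*hbase")),
    ("Kafka", re.compile(r"org\.apache\.flume\.sink\.kafka\.KafkaSink")),
    ("Avro", re.compile(r"type\s*=\s*avro")),
]

# Service alarms that the procedure asks you to check for each sink type.
RELATED_ALARM = {
    "HDFS": "ALM-14000 HDFS Service Unavailable",
    "HBase": "ALM-19000 HBase Service Unavailable",
    "Kafka": "ALM-38000 Kafka Service Unavailable",
}


def detect_sink_types(properties_text):
    """Return the sink types found, in the order the procedure checks them."""
    # Assumption: sink settings use keys of the form AGENT.sinks.NAME.type
    sink_lines = [
        line
        for line in properties_text.splitlines()
        if ".sinks." in line and not line.lstrip().startswith("#")
    ]
    found = []
    for name, pattern in SINK_MARKERS:
        if any(pattern.search(line) for line in sink_lines):
            found.append(name)
    return found


# Hypothetical configuration used only to exercise the function above.
SAMPLE = """\
client.sources.s1.type = avro
client.sinks.k1.type = hdfs
client.sinks.k2.type = org.apache.flume.sink.kafka.KafkaSink
"""

if __name__ == "__main__":
    for sink_type in detect_sink_types(SAMPLE):
        # Avro has no service alarm in this procedure; it leads to the network check.
        print(sink_type, RELATED_ALARM.get(sink_type, "check the network connection"))
```

<p>In the sample, the Avro source is ignored because its key does not contain <strong>.sinks.</strong>, so only the HDFS and Kafka sinks are reported.</p>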
<p class="tableheading" id="ALM-24005__p4412095511148"><strong id="ALM-24005__b4993647951">Check the network connection between the faulty node and the node that corresponds to the Flume Sink IP address.</strong></p>
<ol start="9" id="ALM-24005__ol16005403111458"><li id="ALM-24005__li3789323111148"><a name="ALM-24005__li3789323111148"></a><a name="li3789323111148"></a><span>Open the <strong id="ALM-24005__b208651324135">properties.properties</strong> configuration file on the local PC, search for <strong id="ALM-24005__b20866163217137">type = avro</strong> in the file, and check whether the Flume sink type is Avro.</span><p><ul class="subitemlist" id="ALM-24005__ul4894960111148"><li id="ALM-24005__li1903069311148">If yes, go to <a href="#ALM-24005__li3657487511148">10</a>.</li><li id="ALM-24005__li6509116811148">If no, go to <a href="#ALM-24005__li2555818811148">13</a>.</li></ul>
</p></li><li id="ALM-24005__li3657487511148"><a name="ALM-24005__li3657487511148"></a><a name="li3657487511148"></a><span>Log in to the faulty node as user <strong id="ALM-24005__b411991389428">root</strong>, and run the <strong id="ALM-24005__b352479279428">ping </strong><em id="ALM-24005__i487958919428">IP address of the Flume sink</em> command to check whether the peer host can be pinged successfully. <span id="ALM-24005__text15127544165217"></span></span><p><ul class="subitemlist" id="ALM-24005__ul6371619811148"><li id="ALM-24005__li3387197411148">If yes, go to <a href="#ALM-24005__li2555818811148">13</a>.</li><li id="ALM-24005__li5927536011148">If no, go to <a href="#ALM-24005__li6073842411148">11</a>.</li></ul>
</p></li><li id="ALM-24005__li6073842411148"><a name="ALM-24005__li6073842411148"></a><a name="li6073842411148"></a><span>Contact the network administrator to restore the network.</span></li><li id="ALM-24005__li6249212211148"><span>In the alarm list, check whether the alarm is cleared after a period.</span><p><ul class="subitemlist" id="ALM-24005__ul1237057011148"><li id="ALM-24005__li977491111148">If yes, no further action is required.</li><li id="ALM-24005__li5357029011148">If no, go to <a href="#ALM-24005__li2555818811148">13</a>.</li></ul>
</p></li></ol>
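<p>The ping check in step 10 could be scripted roughly as follows. The sketch assumes a Linux node whose <strong>ping</strong> command accepts the <strong>-c</strong> (packet count) and <strong>-W</strong> (reply timeout in seconds) options, as the iputils version does. The packet count, the timeout, and the address <strong>192.0.2.10</strong> are placeholder values, not settings from this topic.</p>

```python
import subprocess


def build_ping_command(sink_ip, count=3, timeout_s=2):
    """Build the ping command line (assumes Linux iputils options -c and -W)."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), sink_ip]


def sink_reachable(sink_ip, run=subprocess.run):
    """Return True if ping exits with status 0, meaning a reply was received."""
    result = run(
        build_ping_command(sink_ip),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def next_step(reachable):
    """Map the ping result to the next step of this procedure."""
    # Reachable: collect fault information (step 13).
    # Unreachable: contact the network administrator (step 11).
    return 13 if reachable else 11


if __name__ == "__main__":
    # Placeholder address from the documentation range; replace with the sink IP.
    reachable = sink_reachable("192.0.2.10")
    print("Go to step", next_step(reachable))
```

<p>The <strong>run</strong> parameter exists only so that the check can be exercised without sending real packets.</p>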
<p class="tableheading" id="ALM-24005__p4422627111148"><strong id="ALM-24005__b1897213754">Collect the fault information.</strong></p>
<ol start="13" id="ALM-24005__ol19131698111516"><li id="ALM-24005__li2555818811148"><a name="ALM-24005__li2555818811148"></a><a name="li2555818811148"></a><span>On FusionInsight Manager, choose <strong id="ALM-24005__b191151879199">O&amp;M</strong>. In the navigation pane on the left, choose <strong id="ALM-24005__b19121976193">Log</strong> &gt; <strong id="ALM-24005__b121253731913">Download</strong>.</span></li><li id="ALM-24005__li2869710211148"><span>Expand the <strong id="ALM-24005__b14574258112513">Service</strong> drop-down list, and select <strong id="ALM-24005__b3574175818251">Flume</strong> for the target cluster.</span></li><li id="ALM-24005__li5694732911148"><span>Click <span><img id="ALM-24005__image1945644173117" src="en-us_image_0263895532.png"></span> in the upper right corner, and set <strong id="ALM-24005__b6456941173117">Start Date</strong> and <strong id="ALM-24005__b11456154113318">End Date</strong> for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click <strong id="ALM-24005__b13456164113319">Download</strong>.</span></li><li id="ALM-24005__li4933095111148"><span>Contact <span id="ALM-24005__text1042513457437">O&amp;M personnel</span> and provide the collected logs.</span></li></ol>
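<p>To work out the values for <strong>Start Date</strong> and <strong>End Date</strong>, the one-hour window around the alarm generation time can be computed as in this small Python sketch. The alarm time shown is a made-up example.</p>

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time, margin=timedelta(hours=1)):
    """Return (start, end): one margin before and one margin after the alarm time."""
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    # Hypothetical alarm generation time.
    start, end = log_collection_window(datetime(2024, 1, 1, 12, 30))
    print("Start Date:", start.isoformat(sep=" "))
    print("End Date:", end.isoformat(sep=" "))
```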
</div>
<div class="section" id="ALM-24005__section169311343318"><h4 class="sectiontitle">Alarm Clearing</h4><p id="ALM-24005__p754913417333">This alarm is automatically cleared after the fault is rectified.</p>
</div>
<div class="section" id="ALM-24005__section53362350"><h4 class="sectiontitle">Related Information</h4><p id="ALM-24005__p7522741">None</p>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="mrs_01_1298.html">Alarm Reference (Applicable to MRS 3.x)</a></div>
</div>
</div>