doc-exports/docs/mrs/umn/admin_guide_000409.html
yangtong c285e88a17 MRS UMN 20250806 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: yangtong <yangtong2@huawei.com>
Co-committed-by: yangtong <yangtong2@huawei.com>
2025-09-02 10:43:57 +00:00

Adding an SQL Inspection

Scenario

You can add SQL inspection rules for specific tenants and SQL engines on MRS Manager. The system then displays a hint for, intercepts, or blocks SQL requests that match the rules.

Exercise caution when you add, modify, or enable a SQL inspection rule for a cluster, or when you set its threshold. An improper rule may interrupt upper-layer services.

Adding a Rule

  1. Log in to MRS Manager as a user with the Manager administrator rights.
  2. Click Cluster and choose SQL Inspector. The SQL Inspector page is displayed.

    You can click View Supported Rules to view all SQL inspection rules supported by the current cluster.

  3. Click Add Rule. After the password of the current user is verified, the Add Rule page is displayed.
  4. Set the required parameters and click OK.

    Parameter

    Description

    Name

    Name of a SQL inspection rule

    ID

    Rule ID

    For details about the meaning of the rule that corresponds to each ID, see Table 1.

    Tenant

    Click Add to select the name of the tenant with which the current rule will be associated.

    If you need to add a new tenant, plan and create a cluster tenant by referring to Tenant Resources.

    Services and Actions

    Click Add to specify the SQL engine with which this rule will be associated and set the threshold parameters of the rule.

    Each rule can be associated with only one SQL engine. To apply a rule to other SQL engines, add a separate rule for each engine.

    • Service: Select the SQL engine associated with the current rule.
    • If a SQL request meets the rule, the system performs one of the following operations, depending on the action you select:
      • Hint: Record logs and display a hint for handling the SQL request. If the rule has parameters, you need to configure the threshold.
      • Intercept: Intercept the SQL request that meets the rule. If the rule has parameters, you need to configure the threshold.
      • Block: Block the SQL request that meets the rule. If the rule has parameters, you need to configure the threshold.
        NOTE:

        For static and dynamic interception rules, Hint and Block operations are supported. For blocking rules, only the Block operation is supported.

  5. View the added inspection rule on the SQL Inspector page. The rule takes effect dynamically.

    To adjust the current rule, click Modify in the Operation column of the row that contains the target rule. After the user password is verified, you can modify rule parameters.

    Figure 1 Viewing SQL inspection rules
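
To illustrate how a threshold-based rule relates to its action, the following Python sketch models a rule bound to one SQL engine and applies the static_0001 check (number of count(distinct) functions) described in Table 1. It does not show how MRS Manager implements inspection: the Rule class, the evaluate function, and the regular expression are hypothetical, and the threshold of 1 is chosen only so that the example triggers the rule.

```python
import re
from dataclasses import dataclass


# Hypothetical model of a rule; the field names are illustrative, not an MRS API.
@dataclass(frozen=True)
class Rule:
    rule_id: str    # for example, "static_0001"
    engine: str     # each rule is associated with exactly one SQL engine
    action: str     # "hint", "intercept", or "block"
    threshold: int  # limit configured for the rule


# Naive pattern for count(distinct ...); a real engine would inspect the parsed SQL.
COUNT_DISTINCT = re.compile(r"\bcount\s*\(\s*distinct\b", re.IGNORECASE)


def count_distinct_calls(sql: str) -> int:
    """Count textual occurrences of count(distinct ...) in a statement."""
    return len(COUNT_DISTINCT.findall(sql))


def evaluate(rule: Rule, engine: str, sql: str) -> str:
    """Return the rule's action if the statement exceeds the threshold, else "pass"."""
    if engine != rule.engine:
        return "pass"  # the rule is bound to a different engine
    return rule.action if count_distinct_calls(sql) > rule.threshold else "pass"


rule = Rule("static_0001", "hive", "hint", threshold=1)
sql = (
    "SELECT COUNT(DISTINCT deviceId), COUNT(DISTINCT collDeviceId) "
    "FROM t GROUP BY deviceName"
)
print(evaluate(rule, "hive", sql))   # hint
print(evaluate(rule, "spark", sql))  # pass
```

With the recommended threshold of 10, the same statement would pass, because it contains only two count(distinct) functions.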

MRS SQL Inspection Rules

Table 1 MRS SQL inspection rules

ID

Description

Engine

Threshold

Example SQL Statement

Impact

static_0001

Check whether the number of the count(distinct) functions used in a SQL statement exceeds the preconfigured threshold.

  • Hive
  • Spark
  • HetuEngine
  • Doris

Number of the count(distinct) functions

Recommended value: 10

SELECT COUNT(DISTINCT deviceId), COUNT(DISTINCT collDeviceId)
FROM table
GROUP BY deviceName, collDeviceName, collCurrentVersion;

The select count(distinct) syntax generates only one Reduce task. When a large table is processed, the volume of data to shuffle is large and the execution is slow. If a statement contains multiple count(distinct) functions, each input record is expanded into multiple records for shuffling, which increases the shuffle volume and slows down the job.

static_0002

Check whether not in <subquery> is used in a SQL statement.

  • Hive
  • Spark
  • HetuEngine
  • Doris

N/A

SELECT *
FROM Orders o
WHERE o.Order_ID not in (Select Order_ID
FROM HeldOrders h
where h.order_id = o.order_id);

The performance of a not in subquery is poor. If the values returned for the not in clause contain null, no data is returned in the result.

static_0003

Check whether the number of joins in a SQL statement exceeds the threshold.

  • Hive
  • Spark
  • HetuEngine
  • Doris

Number of joins

Recommended value: 20

N/A

The more tables are joined, the more files, partitions, and data are scanned. As a result, the SQL statement occupies too much memory, affecting cluster stability.

static_0004

Check whether the number of the union all operators in a SQL statement exceeds the threshold.

  • Hive
  • Spark
  • HetuEngine
  • Doris

Number of the union all operators in a statement

Recommended value: 20

select * from tables t1
union all select * from tables t2
union all select * from tables t3
union all select * from tables t4
union all select * from tables t5
union all select * from tables t6
union all select * from tables t7
union all select * from tables t8
union all select * from tables t9;

A large number of union all operations may generate ultra-large result sets. As a result, a large amount of HDFS and YARN resources is occupied during shuffling.

static_0005

Check whether the number of nested subqueries exceeds the threshold.

  • Hive
  • Spark
  • HetuEngine
  • Doris

Number of nested subqueries

Recommended value: 20

select * from (
with temp1 as (select * from tables)
select * from temp1);

If a SQL statement has too many nesting layers, temporary data is generated multiple times, and the statement is difficult to maintain and modify. You are advised to avoid deeply nested queries to improve execution efficiency and SQL maintainability.

static_0006

Check whether the length of a SQL statement exceeds the threshold.

  • Hive
  • Spark
  • HetuEngine
  • Doris

Length of the SQL statement, in KB

Recommended value: 10

N/A

If a SQL string is too long, the SQL statement may be too complex, which may cause memory and performance problems. In addition, the SQL statement is difficult to maintain.

static_0007

Check whether the Cartesian product exists when multiple tables are associated.

  • Hive
  • Spark
  • HetuEngine
  • Doris

N/A

select * from A,B;

The Cartesian product causes data expansion. When a task is running, a large amount of HDFS space and YARN resources may be occupied, affecting the execution of other tasks.

static_0008

Check whether alter table update is performed at the cluster level (on cluster).

ClickHouse

N/A

alter table testtb1 on cluster default_cluster update price=10.0 where id='100'

Updating and deleting data consume a large amount of CPU and memory resources. Cluster-level operations put heavy pressure on the database. As a result, tasks on some nodes may fail, time out, or stop responding for a long time, affecting cluster stability.

static_0009

Check whether alter table delete is performed at the cluster level (on cluster).

ClickHouse

N/A

alter table testtb1 on cluster default_cluster delete where id ='10'

static_0010

Check whether alter table add column is performed at the cluster level (on cluster).

ClickHouse

N/A

alter table testtb1 on cluster default_cluster add column testc String

Adding and deleting columns consume a large amount of CPU and memory resources. Cluster-level operations put heavy pressure on the database. As a result, tasks on some nodes may fail, time out, or stop responding for a long time, causing metadata inconsistency and affecting cluster stability.

static_0011

Check whether alter table drop column is performed at the cluster level (on cluster).

ClickHouse

N/A

alter table testtb1 on cluster default_cluster drop column testc

static_0012

Check whether optimize final is performed at the cluster level (on cluster).

ClickHouse

N/A

optimize table testtb1 on cluster default_cluster final

Manual merging (optimize final) consumes a large amount of CPU, memory, and disk I/O resources when the table data volume is large. Cluster-level operations put heavy pressure on the database. As a result, tasks on some nodes may fail, time out, or stop responding for a long time, affecting cluster stability.

static_0013

Check whether drop is performed at the cluster level (on cluster).

ClickHouse

N/A

drop table/database test on cluster default_cluster;

Dropping tables consumes a large amount of CPU, memory, and disk I/O resources when the metadata volume and data volume are large. Cluster-level operations put heavy pressure on the database. As a result, tasks on some nodes may fail, time out, or stop responding for a long time, affecting cluster stability.

static_0014

Check whether truncate table is performed at the cluster level (on cluster).

ClickHouse

N/A

truncate table testtb1 on cluster default_cluster;

Deleting table data consumes a large amount of CPU, memory, and disk I/O resources when the table data volume is large. Cluster-level operations put heavy pressure on the database. As a result, tasks on some nodes may fail, time out, or stop responding for a long time, affecting cluster stability.

dynamic_0001

Check whether the number of scanned files exceeds the threshold.

  • Hive
  • Spark
  • HetuEngine
  • Doris

Number of files that will be scanned or have been scanned

Recommended value: 100,000

SELECT ss_ticket_number FROM store_sales WHERE ss_ticket_number=72291252 LIMIT 10;

Scanning a large number of files with a SQL statement can generate a large number of splits, overloading HiveServer memory and potentially causing the instance to crash. This can also consume a significant amount of cluster resources, delaying other tasks.

dynamic_0002

Check whether the number of partitions involved in a table operation (select, delete, update, or alter) exceeds the threshold.

  • Hive
  • Spark
  • HetuEngine
  • ClickHouse
  • Doris

Number of partitions involved in the table operation (select, delete, update, or alter)

Recommended value: 10,000

DELETE FROM table_name WHERE column_name = value

Scanning too many partitions can overload the database, causing it to run slowly and consume excessive memory in HiveServer and MetaStore. This can lead to frequent GC, disrupting other tasks and potentially causing the instance to restart unexpectedly.

dynamic_0003

When the right table of a join is a distributed table, check whether the data volume of the right table exceeds the threshold.

ClickHouse

Number of rows in the right table of a join

Recommended value: 100,000,000

SELECT name, text FROM table_1 JOIN table_2 ON table_1.Id = table_2.Id

Large data volumes in the right table can cause the join operation to consume excessive memory, potentially leading to memory insufficiency, service failure, and cluster instability.

dynamic_0004

Check whether a SQL statement overwrites the same table where it reads data.

  • Hive
  • Spark

N/A

N/A

Such SQL statements may cause data loss or inconsistency.

running_0001

Check whether the number of rows returned by a Select statement to the client exceeds the threshold.

  • Hive
  • Spark
  • HetuEngine
  • ClickHouse
  • Doris

Number of rows in the query result

Recommended value: 100,000

select * from table

Large query results can overload server memory, leading to OOM exceptions and instability. Excessively large results also reduce query efficiency.

running_0002

Check whether the peak memory usage of a SQL statement exceeds the threshold (absolute value).

  • Hive
  • Spark
  • HetuEngine
  • ClickHouse
  • Doris

Memory occupied by a SQL statement during runtime, in MB

N/A

Long-running tasks consume cluster resources, slowing down other tasks and overall performance.

running_0003

Check whether the running duration of a SQL statement exceeds the threshold.

  • Hive
  • Spark
  • HetuEngine
  • ClickHouse
  • Doris

Running duration of a SQL statement, in seconds

N/A

Long-running tasks can monopolize cluster resources, reducing overall utilization. They also generate a significant amount of intermediate data. To avoid delays, tasks that fail to meet expectations should be adjusted promptly.

running_0004

Check whether the size of data scanned by a SQL statement exceeds the threshold.

  • Hive
  • Spark
  • HetuEngine
  • ClickHouse

Data scanned by a SQL statement, in GB

Recommended value: 10,240

N/A

Large datasets can consume significant memory resources, impacting other tasks' performance. Intermediate data can also fill disk space, compromising cluster stability.

running_0005

Check whether the amount of shuffle data that has been written by a SQL statement exceeds the threshold.

Spark

Amount of shuffle data written by a SQL statement, in GB

N/A

When executing SQL statements with operators like join and aggregation, a significant amount of data is shuffled, leading to high disk usage and potentially causing disk space exhaustion, which can compromise cluster stability.
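
The static_0002 entry in Table 1 notes that a not in subquery returns no data when its values contain null. The following Python sketch reproduces that behavior with the standard-library sqlite3 module and shows a not exists rewrite that avoids it. SQLite is used only as a convenient stand-in for the engines listed in Table 1, and the table names and data are hypothetical. The behavior follows from standard SQL three-valued logic, but it is worth confirming on the engine you actually use.

```python
import sqlite3

# In-memory database with hypothetical tables; held_orders contains a NULL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER);
    CREATE TABLE held_orders (order_id INTEGER);
    INSERT INTO orders VALUES (1), (2), (3);
    INSERT INTO held_orders VALUES (2), (NULL);
    """
)

# NOT IN: comparing against NULL yields unknown, so no row qualifies.
not_in = conn.execute(
    "SELECT order_id FROM orders "
    "WHERE order_id NOT IN (SELECT order_id FROM held_orders) "
    "ORDER BY order_id"
).fetchall()

# NOT EXISTS: the correlated subquery only tests whether a matching row exists.
not_exists = conn.execute(
    "SELECT o.order_id FROM orders o "
    "WHERE NOT EXISTS ("
    "SELECT 1 FROM held_orders h WHERE h.order_id = o.order_id) "
    "ORDER BY o.order_id"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(1,), (3,)]
conn.close()
```

Whether not exists also performs better than not in depends on the engine and its optimizer, so the rewrite should be measured rather than assumed to be faster.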