diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001295739896.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001295739896.png deleted file mode 100644 index a738660..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001295739896.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001295739900.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001295739900.png deleted file mode 100644 index bcc2564..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001295739900.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001295899860.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001295899860.png deleted file mode 100644 index e1ba5ae..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001295899860.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001295899864.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001295899864.png deleted file mode 100644 index 1b7e755..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001295899864.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296059704.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296059704.png deleted file mode 100644 index f1197f6..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296059704.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219332.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219332.png deleted file mode 100644 index 3b28ad5..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219332.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219336.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219336.png deleted file mode 100644 index 49f804c..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001296219336.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739725.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739725.png deleted file mode 100644 index 284cb79..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348739725.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740045.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740045.png index ae7f920..c7456c8 100644 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740045.png and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001348740045.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059549.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059549.png 
deleted file mode 100644 index 0d135b4..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059549.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059937.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059937.png deleted file mode 100644 index 06737e1..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349059937.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139413.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139413.png deleted file mode 100644 index c84e94e..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139413.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139417.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139417.png deleted file mode 100644 index 9632c84..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349139417.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259001.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259001.png deleted file mode 100644 index 6a06cad..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259001.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259005.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259005.png deleted file mode 100644 index 934505b..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259005.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259429.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259429.png deleted file mode 100644 index d40da22..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001349259429.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001387905484.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001387905484.png deleted file mode 100644 index 72f5360..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001387905484.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438431645.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438431645.png deleted file mode 100644 index 1a7780b..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001438431645.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441091233.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441091233.png deleted file mode 100644 index 7af9c59..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441091233.png and /dev/null differ diff --git 
a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441208981.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441208981.png deleted file mode 100644 index 62de180..0000000 Binary files a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001441208981.png and /dev/null differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001446755301.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001446755301.png new file mode 100644 index 0000000..527a0f2 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001446755301.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001446835121.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001446835121.png new file mode 100644 index 0000000..30e52c7 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001446835121.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472704.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472704.png new file mode 100644 index 0000000..cf4cf0f Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472704.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472712.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472712.png new file mode 100644 index 0000000..8c9e423 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472712.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472724.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472724.png new file mode 100644 index 0000000..702ac83 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472724.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472728.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472728.png new file mode 100644 index 0000000..0364571 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472728.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472732.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472732.png new file mode 100644 index 0000000..019e269 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472732.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472784.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472784.png new file mode 100644 index 0000000..99bd42a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532472784.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532503042.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532503042.png new file mode 100644 index 0000000..5b7b401 Binary files /dev/null and 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532503042.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532516862.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532516862.png new file mode 100644 index 0000000..29c9d74 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532516862.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532549720.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532549720.png new file mode 100644 index 0000000..c3f4283 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532549720.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632184.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632184.png new file mode 100644 index 0000000..8c9e423 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632184.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632196.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632196.png new file mode 100644 index 0000000..b2f4508 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632196.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632200.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632200.png new file mode 100644 index 0000000..0341312 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632200.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632204.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632204.png new file mode 100644 index 0000000..274d903 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632204.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632208.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632208.png new file mode 100644 index 0000000..61c3cf3 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632208.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632212.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632212.png new file mode 100644 index 0000000..00c0937 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532632212.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532676350.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532676350.png new file mode 100644 index 0000000..e3c61e7 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532676350.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532676354.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532676354.png new 
file mode 100644 index 0000000..1d240ed Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532676354.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532677010.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532677010.png new file mode 100644 index 0000000..451fad4 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532677010.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532709204.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532709204.png new file mode 100644 index 0000000..2a9177d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532709204.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791924.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791924.png new file mode 100644 index 0000000..99ca361 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791924.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791932.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791932.png new file mode 100644 index 0000000..8c9e423 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791932.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791944.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791944.png new file mode 100644 index 0000000..a354bf3 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791944.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791948.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791948.png new file mode 100644 index 0000000..340b198 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791948.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791952.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791952.png new file mode 100644 index 0000000..43499c8 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791952.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791956.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791956.png new file mode 100644 index 0000000..fc1933d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791956.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791960.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791960.png new file mode 100644 index 0000000..038267d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532791960.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532792008.png 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532792008.png new file mode 100644 index 0000000..6b5650c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532792008.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532836094.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532836094.png new file mode 100644 index 0000000..6a9c4cf Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532836094.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532836098.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532836098.png new file mode 100644 index 0000000..60cc25a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532836098.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951860.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951860.png new file mode 100644 index 0000000..a354bf3 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951860.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951868.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951868.png new file mode 100644 index 0000000..8c9e423 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951868.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951876.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951876.png new file mode 100644 index 0000000..cb825bb Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951876.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951880.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951880.png new file mode 100644 index 0000000..2ad4615 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951880.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951884.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951884.png new file mode 100644 index 0000000..138accf Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951884.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951888.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951888.png new file mode 100644 index 0000000..d016313 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951888.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951892.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951892.png new file mode 100644 index 0000000..46e9010 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951892.png differ diff --git 
a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951928.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951928.png new file mode 100644 index 0000000..038267d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951928.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951944.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951944.png new file mode 100644 index 0000000..23fd610 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532951944.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532996022.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532996022.png new file mode 100644 index 0000000..cb54d2d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001532996022.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533162146.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533162146.png new file mode 100644 index 0000000..88c4617 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533162146.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533198872.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533198872.png new file mode 100644 index 0000000..44d0fa6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533198872.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533358396.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533358396.png new file mode 100644 index 0000000..7f20a76 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533358396.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533359808.jpg b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533359808.jpg new file mode 100644 index 0000000..ded4813 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533359808.jpg differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533481354.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533481354.png new file mode 100644 index 0000000..8506e84 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533481354.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533544798.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533544798.png new file mode 100644 index 0000000..5361108 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533544798.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533639950.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533639950.png new file mode 100644 index 0000000..b267f51 Binary files /dev/null and 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533639950.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533641294.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533641294.png new file mode 100644 index 0000000..e433406 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533641294.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533678044.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533678044.png new file mode 100644 index 0000000..23905ee Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001533678044.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001536916934.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001536916934.png new file mode 100644 index 0000000..30e52c7 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001536916934.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001537076386.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001537076386.png new file mode 100644 index 0000000..527a0f2 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001537076386.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001537090654.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001537090654.png new file mode 100644 index 0000000..8c2c79e Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001537090654.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001537269552.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001537269552.png new file mode 100644 index 0000000..a060727 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001537269552.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001537413022.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001537413022.png new file mode 100644 index 0000000..f8eebba Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001537413022.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952073.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952073.png new file mode 100644 index 0000000..2931dd6 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952073.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952089.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952089.png new file mode 100644 index 0000000..17e5857 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952089.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952097.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952097.png new 
file mode 100644 index 0000000..8bb092e Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952097.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952105.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952105.png new file mode 100644 index 0000000..bfe756b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952105.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952133.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952133.png new file mode 100644 index 0000000..00c0937 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952133.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952137.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952137.png new file mode 100644 index 0000000..0b1a69f Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001582952137.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151841.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151841.png new file mode 100644 index 0000000..2ad4615 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151841.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151845.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151845.png new file mode 100644 index 0000000..8c9e423 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151845.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151849.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151849.png new file mode 100644 index 0000000..8c9e423 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151849.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151865.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151865.png new file mode 100644 index 0000000..67e41ca Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151865.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151869.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151869.png new file mode 100644 index 0000000..3606e4a Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151869.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151877.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151877.png new file mode 100644 index 0000000..bfe756b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151877.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151917.png 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151917.png new file mode 100644 index 0000000..fc5870c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583151917.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583182157.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583182157.png new file mode 100644 index 0000000..620b8ec Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583182157.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583195981.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583195981.png new file mode 100644 index 0000000..287d3d8 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583195981.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583195985.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583195985.png new file mode 100644 index 0000000..2f2c70c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583195985.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272137.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272137.png new file mode 100644 index 0000000..27e3f4c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272137.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272145.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272145.png new file mode 100644 index 0000000..8c9e423 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272145.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272157.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272157.png new file mode 100644 index 0000000..801fe09 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272157.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272169.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272169.png new file mode 100644 index 0000000..1240554 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272169.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272185.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272185.png new file mode 100644 index 0000000..f37aa66 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272185.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272201.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272201.png new file mode 100644 index 0000000..bfe756b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583272201.png differ diff --git 
a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583316317.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583316317.png new file mode 100644 index 0000000..11e5cde Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583316317.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583316321.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583316321.png new file mode 100644 index 0000000..3924b9c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583316321.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583316949.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583316949.png new file mode 100644 index 0000000..77fddbf Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583316949.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583349121.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583349121.png new file mode 100644 index 0000000..eb375e8 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583349121.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391837.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391837.png new file mode 100644 index 0000000..dcb40c1 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391837.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391841.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391841.png new file mode 100644 index 0000000..8c9e423 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391841.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391845.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391845.png new file mode 100644 index 0000000..8c9e423 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391845.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391853.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391853.png new file mode 100644 index 0000000..ac917e8 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391853.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391861.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391861.png new file mode 100644 index 0000000..3805721 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391861.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391865.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391865.png new file mode 100644 index 0000000..52db820 Binary files /dev/null and 
b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391865.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391869.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391869.png new file mode 100644 index 0000000..038267d Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391869.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391873.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391873.png new file mode 100644 index 0000000..00c0937 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391873.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391913.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391913.png new file mode 100644 index 0000000..60abc9f Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583391913.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583435997.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583435997.png new file mode 100644 index 0000000..f63e57c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583435997.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583436657.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583436657.png new file mode 100644 index 0000000..1ff4bff Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583436657.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583468825.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583468825.png new file mode 100644 index 0000000..defaffc Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583468825.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583504773.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583504773.png new file mode 100644 index 0000000..315a5f9 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583504773.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583757997.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583757997.png new file mode 100644 index 0000000..25985e0 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583757997.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583881265.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583881265.png new file mode 100644 index 0000000..51f1f16 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583881265.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583957937.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583957937.png new 
file mode 100644 index 0000000..f837461 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583957937.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583961513.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583961513.png new file mode 100644 index 0000000..8506e84 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001583961513.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001584077717.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001584077717.png new file mode 100644 index 0000000..03bef95 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001584077717.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001584081289.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001584081289.png new file mode 100644 index 0000000..51f1f16 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001584081289.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001584317997.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001584317997.png new file mode 100644 index 0000000..f05133b Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001584317997.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001587755985.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001587755985.png new file mode 100644 index 0000000..527a0f2 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001587755985.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001587840761.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001587840761.png new file mode 100644 index 0000000..da4b81c Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001587840761.png differ diff --git a/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001587875989.png b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001587875989.png new file mode 100644 index 0000000..30e52c7 Binary files /dev/null and b/doc/component-operation-guide-lts/source/_static/images/en-us_image_0000001587875989.png differ diff --git a/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/using_an_mrs_client_on_nodes_outside_a_mrs_cluster.rst b/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/using_an_mrs_client_on_nodes_outside_a_mrs_cluster.rst index 62f19d1..f993ace 100644 --- a/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/using_an_mrs_client_on_nodes_outside_a_mrs_cluster.rst +++ b/doc/component-operation-guide-lts/source/appendix/using_an_mrs_client/using_an_mrs_client_on_nodes_outside_a_mrs_cluster.rst @@ -26,7 +26,7 @@ Prerequisites +-------------------------+-----------------------+-------------------------------------------------+ | | SLES | SUSE Linux Enterprise Server 12 SP4 (SUSE 12.4) | +-------------------------+-----------------------+-------------------------------------------------+ - 
| | RedHat | Red Hat-7.5-x86_64 (Red Hat 7.5) | + | | Red Hat | Red Hat-7.5-x86_64 (Red Hat 7.5) | +-------------------------+-----------------------+-------------------------------------------------+ | | CentOS | CentOS-7.6 | +-------------------------+-----------------------+-------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/change_history.rst b/doc/component-operation-guide-lts/source/change_history.rst index e21eef5..b52bb7c 100644 --- a/doc/component-operation-guide-lts/source/change_history.rst +++ b/doc/component-operation-guide-lts/source/change_history.rst @@ -1,12 +1,63 @@ -:original_name: en-us_topic_0000001298722056.html +:original_name: mrs_01_17512.html -.. _en-us_topic_0000001298722056: +.. _mrs_01_17512: Change History ============== -=========== ========================================= -Released On What's New -=========== ========================================= -2022-11-01 This issue is the first official release. -=========== ========================================= ++-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Released On | What's New | ++===================================+=====================================================================================================================================================================================================================================================================================================================================================+ +| 2023-11-01 | - Deleted the description of some parameters that are not applicable to Flink. For details, see :ref:`JobManager & TaskManager `. | +| | - Modified the description of the digital display identifier in the Ranger anonymization policy. For details, see :ref:`Adding a Ranger Access Permission Policy for Hive `, :ref:`Adding a Ranger Access Permission Policy for Spark2x `, :ref:`Adding a Ranger Access Permission Policy for HetuEngine `. | +| | - Deleted related content because Hive materialized views are no longer recommended. | +| | - The titles of some sections are optimized to avoid repeated content in the sections where components are not connected. | ++-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| 2023-05-04 | Modified the feature documentation of MRS 3.2.0-LTS.1. 
| +| | | +| | - Added the following content: | +| | | +| | - :ref:`Using CDL ` | +| | - :ref:`Enabling the Read-Only Mode of the ClickHouse Table ` | +| | - :ref:`Configuring the Password of the Default Account of a ClickHouse Cluster ` | +| | - :ref:`ClickHouse Multi-Tenancy ` | +| | - :ref:`Synchronizing Kafka Data to ClickHouse ` | +| | - :ref:`Using the Migration Tool to Quickly Migrate ClickHouse Cluster Data ` | +| | - :ref:`ClickHouse Performance Tuning ` | +| | - :ref:`ClickHouse FAQ ` | +| | - :ref:`Importing and Exporting Jobs ` | +| | - :ref:`Flink Restart Policy ` | +| | - :ref:`Closing HDFS Files ` | +| | - :ref:`Configuring an IoTDB Data Source ` | +| | - :ref:`Using HetuEngine Materialized Views ` | +| | - :ref:`Using HetuEngine SQL Diagnosis ` | +| | - :ref:`Locating Abnormal Hive Files ` | +| | - :ref:`Data Import and Export in Hive ` | +| | - :ref:`Clustering Configuration ` | +| | - :ref:`ARCHIVELOG ` | +| | - :ref:`CLEAN ` | +| | - :ref:`CALL COMMAND ` | +| | - :ref:`Hudi Schema Evolution ` | +| | - :ref:`Using IoTDB ` | +| | - :ref:`Migrating Data Between Kafka Nodes ` | +| | - :ref:`Configuring Intranet and Extranet Access for Kafka ` | +| | - :ref:`Purging Historical Loader Data ` | +| | - :ref:`Adding a Ranger Access Permission Policy for CDL ` | +| | - :ref:`Configuring Ranger Specifications ` | +| | - :ref:`Configuring the Drop Partition Command to Support Batch Deletion ` | +| | - :ref:`Enabling an Executor to Execute Custom Code When Exiting ` | +| | | +| | - Modified the following content: | +| | | +| | - :ref:`Configuring the Password of the Default Account of a ClickHouse Cluster(for MRS 3.1.2) ` | +| | - :ref:`Interconnecting FlinkServer with ClickHouse ` | +| | - :ref:`Interconnecting FlinkServer with Hudi ` | +| | - :ref:`Introduction to Flume Logs ` | +| | - :ref:`Configuring Encrypted Channels ` | +| | - :ref:`Using DBeaver to Access HetuEngine ` | +| | - :ref:`Hive Supporting Transactions ` | +| | - :ref:`Stream Write ` | +| | - :ref:`Introduction to Kafka Logs ` | ++-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| 2022-11-01 | This issue is the first official release. 
| ++-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/index.rst b/doc/component-operation-guide-lts/source/index.rst index 2957bb8..e7c4738 100644 --- a/doc/component-operation-guide-lts/source/index.rst +++ b/doc/component-operation-guide-lts/source/index.rst @@ -6,6 +6,7 @@ MapReduce Service - Component Operation Guide (LTS) :maxdepth: 1 using_carbondata/index + using_cdl/index using_clickhouse/index using_dbservice/index using_flink/index @@ -16,6 +17,7 @@ MapReduce Service - Component Operation Guide (LTS) using_hive/index using_hudi/index using_hue/index + using_iotdb/index using_kafka/index using_loader/index using_mapreduce/index diff --git a/doc/component-operation-guide-lts/source/using_carbondata/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/index.rst index 0950137..e49f39f 100644 --- a/doc/component-operation-guide-lts/source/using_carbondata/index.rst +++ b/doc/component-operation-guide-lts/source/using_carbondata/index.rst @@ -5,7 +5,7 @@ Using CarbonData ================ -- :ref:`Overview ` +- :ref:`Spark CarbonData Overview ` - :ref:`Configuration Reference ` - :ref:`CarbonData Operation Guide ` - :ref:`CarbonData Performance Tuning ` @@ -18,7 +18,7 @@ Using CarbonData :maxdepth: 1 :hidden: - overview/index + spark_carbondata_overview/index configuration_reference carbondata_operation_guide/index carbondata_performance_tuning/index diff --git a/doc/component-operation-guide-lts/source/using_carbondata/overview/carbondata_overview.rst b/doc/component-operation-guide-lts/source/using_carbondata/spark_carbondata_overview/carbondata_overview.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_carbondata/overview/carbondata_overview.rst rename to doc/component-operation-guide-lts/source/using_carbondata/spark_carbondata_overview/carbondata_overview.rst diff --git a/doc/component-operation-guide-lts/source/using_carbondata/overview/index.rst b/doc/component-operation-guide-lts/source/using_carbondata/spark_carbondata_overview/index.rst similarity index 83% rename from doc/component-operation-guide-lts/source/using_carbondata/overview/index.rst rename to doc/component-operation-guide-lts/source/using_carbondata/spark_carbondata_overview/index.rst index e0cc8c5..ac6c97b 100644 --- a/doc/component-operation-guide-lts/source/using_carbondata/overview/index.rst +++ b/doc/component-operation-guide-lts/source/using_carbondata/spark_carbondata_overview/index.rst @@ -2,8 +2,8 @@ .. 
_mrs_01_1401:

-Overview
-========
+Spark CarbonData Overview
+=========================

 - :ref:`CarbonData Overview `
 - :ref:`Main Specifications of CarbonData `
diff --git a/doc/component-operation-guide-lts/source/using_carbondata/overview/main_specifications_of_carbondata.rst b/doc/component-operation-guide-lts/source/using_carbondata/spark_carbondata_overview/main_specifications_of_carbondata.rst
similarity index 100%
rename from doc/component-operation-guide-lts/source/using_carbondata/overview/main_specifications_of_carbondata.rst
rename to doc/component-operation-guide-lts/source/using_carbondata/spark_carbondata_overview/main_specifications_of_carbondata.rst
diff --git a/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/error_104_or_143_is_reported_after_a_cdl_job_runs_for_a_period_of_time.rst b/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/error_104_or_143_is_reported_after_a_cdl_job_runs_for_a_period_of_time.rst
new file mode 100644
index 0000000..81b797b
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/error_104_or_143_is_reported_after_a_cdl_job_runs_for_a_period_of_time.rst
@@ -0,0 +1,23 @@
+:original_name: mrs_01_24794.html
+
+.. _mrs_01_24794:
+
+Error 104 or 143 Is Reported After a CDL Job Runs for a Period of Time
+======================================================================
+
+Symptom
+-------
+
+After a CDL job runs for a period of time, the YARN job fails and the status code **104** or **143** is returned.
+
+Possible Causes
+---------------
+
+A large amount of data is captured to Hudi. As a result, the job does not have enough memory.
+
+Procedure
+---------
+
+#. Log in to FusionInsight Manager and choose **Cluster** > **Services** > **CDL**. Click the hyperlink next to **CDLService UI** to access the CDLService web UI. On the data synchronization job list page, locate the row that contains the target job and choose **More** > **Stop**. After the job is stopped, choose **More** > **Edit**.
+#. Change the value of the Hudi parameter **max.rate.per.partition** to **6000** and click **Save**.
+#. On the data synchronization job list page, locate the row containing the target job and choose **More** > **Restart** to restart the job.
diff --git a/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/error_403_is_reported_when_a_cdl_job_is_stopped.rst b/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/error_403_is_reported_when_a_cdl_job_is_stopped.rst
new file mode 100644
index 0000000..67c0317
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/error_403_is_reported_when_a_cdl_job_is_stopped.rst
@@ -0,0 +1,25 @@
+:original_name: mrs_01_24796.html
+
+.. _mrs_01_24796:
+
+Error 403 Is Reported When a CDL Job Is Stopped
+===============================================
+
+Symptom
+-------
+
+The error message "parameter exception with code: 403" is displayed when a CDL job is stopped on the CDLService web UI.
+
+|image1|
+
+Possible Causes
+---------------
+
+The current user does not have the permission to stop the job.
+
+Procedure
+---------
+
+Stop the job as the user who created the job. To view the job creator, log in to the CDLService web UI, locate the job in the job list, and view the creator in the **Creator** column.
+
+.. |image1| image:: /_static/images/en-us_image_0000001532791956.png
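For the exit code **104** or **143** failure described in the first FAQ above, it can be useful to confirm from the YARN side that memory pressure is really the cause before lowering **max.rate.per.partition**. The following Python sketch is only an illustration under assumptions: it expects the YARN client (the real ``yarn logs -applicationId`` command) to be available on the node, the application ID is a placeholder, and the search strings are typical memory-related markers rather than guaranteed log text.

.. code-block:: python

   import re
   import subprocess
   import sys

   # Placeholder: replace with the application ID of the failed CDL job on YARN.
   APP_ID = "application_1700000000000_0001"

   # Typical memory-related markers in YARN container logs and diagnostics.
   # Treat the exact phrases as assumptions and adjust them for your cluster.
   PATTERNS = [
       r"Exit code is 143",
       r"Container killed",
       r"OutOfMemoryError",
       r"beyond physical memory limits",
   ]


   def main() -> int:
       # 'yarn logs -applicationId <id>' prints the aggregated container logs
       # once the application has finished.
       result = subprocess.run(
           ["yarn", "logs", "-applicationId", APP_ID],
           capture_output=True,
           text=True,
           check=False,
       )
       if result.returncode != 0:
           print(result.stderr, file=sys.stderr)
           return result.returncode

       hits = [
           line
           for line in result.stdout.splitlines()
           if any(re.search(p, line) for p in PATTERNS)
       ]
       print(f"{len(hits)} memory-related log line(s) found for {APP_ID}")
       for line in hits[:20]:  # print only the first few matches
           print(line)
       return 0


   if __name__ == "__main__":
       sys.exit(main())

If such messages appear, reducing **max.rate.per.partition** as described in the procedure, or giving the job more memory, is the usual remedy.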
|image1| image:: /_static/images/en-us_image_0000001532791956.png diff --git a/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/error_is_reported_when_the_job_of_capturing_data_from_pgsql_to_hudi_is_started.rst b/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/error_is_reported_when_the_job_of_capturing_data_from_pgsql_to_hudi_is_started.rst new file mode 100644 index 0000000..da97049 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/error_is_reported_when_the_job_of_capturing_data_from_pgsql_to_hudi_is_started.rst @@ -0,0 +1,27 @@ +:original_name: mrs_01_24795.html + +.. _mrs_01_24795: + +Error Is Reported When the Job of Capturing Data From PgSQL to Hudi Is Started +============================================================================== + +Symptom +------- + +The error message "Record key is empty" is displayed when the job of capturing data from PgSQL to Hudi is started. + +|image1| + +Possible Causes +--------------- + +The primary key parameter **table.primarykey.mapping** of the Hudi table is not configured. + +Procedure +--------- + +#. Log in to FusionInsight Manager and choose **Cluster** > **Services** > **CDL**. Click the hyperlink next to **CDLService UI** to access the CDLService web UI. On the data synchronization job list page, locate the row that contains the target job and choose **More** > **Stop**. After the job is stopped, choose **More** > **Edit**. +#. Configure the **table.primarykey.mapping** parameter on the Hudi table attribute configuration page and click **Save**. For details about the parameter, see :ref:`Table 6 `. +#. On the data synchronization job list page, locate the row containing the target job and click **Start** to restart the job. + +.. |image1| image:: /_static/images/en-us_image_0000001532632208.png diff --git a/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/hudi_does_not_receive_data_after_a_cdl_job_is_executed.rst b/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/hudi_does_not_receive_data_after_a_cdl_job_is_executed.rst new file mode 100644 index 0000000..4aca974 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/hudi_does_not_receive_data_after_a_cdl_job_is_executed.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_24793.html + +.. _mrs_01_24793: + +Hudi Does Not Receive Data After a CDL Job Is Executed +====================================================== + +Symptom +------- + +After the CDL job for capturing data to Hudi is executed, related data exists in Kafka, but no record exists in Spark RDD, no related data exists in Hudi, and the error message "TopicAuthorizationException: No authorized to access topics" is displayed. + +Possible Causes +--------------- + +The current user does not have the permission to consume Kafka data. + +Procedure +--------- + +#. Log in to FusionInsight Manager and choose **System** > **Permission** > **User**. Locate the row containing the user who submitted the CDL job, click **Modify**, add the **kafkaadmin** user group, and click **OK**. +#. On FusionInsight Manager, choose **Cluster** > **Services** > **CDL**. Click the hyperlink next to **CDLService UI** to access the CDLService web UI and restart the job. 
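+
+The following commands are a minimal verification sketch, assuming the cluster client is installed in **/opt/client**, the CDL job was submitted by a user named **cdluser**, and the job topic is **cdl_test_topic**; replace these values and the Kafka broker address with the actual ones in your environment. If the user can consume records from the topic without a "TopicAuthorizationException" error, the permission change has taken effect.
+
+.. code-block::
+
+   # Load the client environment variables and authenticate as the user who submitted the CDL job.
+   cd /opt/client
+   source bigdata_env
+   kinit cdluser
+
+   # Consume a few records from the topic used by the CDL job. Port 21007 is assumed to be
+   # the Kafka port in security mode; adjust the port and broker address as needed.
+   kafka-console-consumer.sh --topic cdl_test_topic --bootstrap-server <Kafka_broker_IP>:21007 --consumer.config /opt/client/Kafka/kafka/config/consumer.properties --from-beginning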
diff --git a/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/index.rst b/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/index.rst new file mode 100644 index 0000000..b7883aa --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/cdl_faqs/index.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_24288.html + +.. _mrs_01_24288: + +CDL FAQs +======== + +- :ref:`Hudi Does Not Receive Data After a CDL Job Is Executed ` +- :ref:`Error 104 or 143 Is Reported After a CDL Job Runs for a Period of Time ` +- :ref:`Error Is Reported When the Job of Capturing Data From PgSQL to Hudi Is Started ` +- :ref:`Error 403 Is Reported When a CDL Job Is Stopped ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + hudi_does_not_receive_data_after_a_cdl_job_is_executed + error_104_or_143_is_reported_after_a_cdl_job_runs_for_a_period_of_time + error_is_reported_when_the_job_of_capturing_data_from_pgsql_to_hudi_is_started + error_403_is_reported_when_a_cdl_job_is_stopped diff --git a/doc/component-operation-guide-lts/source/using_cdl/cdl_log_overview.rst b/doc/component-operation-guide-lts/source/using_cdl/cdl_log_overview.rst new file mode 100644 index 0000000..b1439a9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/cdl_log_overview.rst @@ -0,0 +1,109 @@ +:original_name: mrs_01_24129.html + +.. _mrs_01_24129: + +CDL Log Overview +================ + +Log Description +--------------- + +**Log path**: The default log storage path of CDL is **/var/log/Bigdata/cdl/**\ *Role name abbreviation*. + +- CDLService: **/var/log/Bigdata/cdl/service** (run logs) and **/var/log/Bigdata/audit/cdl/service** (audit logs). +- CDLConnector: **/var/log/Bigdata/cdl/connector** (run logs). + +.. table:: **Table 1** Log list + + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | Type | File | Description | + +===========+=================================+=================================================================================+ + | Run logs | connect.log | CDLConnector run log. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | prestartDetail.log | Log that records cluster initialization before service startup. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | startDetail.log | Service startup log. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | stopDetail.log | Service stop log. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | cleanupDetail.log | Log that records the cleanup execution of services. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | check-serviceDetail.log | Log that records the verification of service status after service installation. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | cdl-db-operation.log | Log that records database initialization during service startup. 
| + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | cdl-app-launcher.log | Spark application startup log of CDL data synchronization tasks. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | cdl-dc-app-launcher.log | Spark application startup log of CDL data comparison tasks. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | serviceInstanceCheck.log | Instance check log of CDLService. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | connectorInstanceCheck.log | Instance check log of CDLConnector. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | ModifyDBPasswd.log | Log that records the resetting of the service database password. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | ranger-cdl-plugin-enable.log | Log that records the enabling or disabling of Ranger authentication. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | postinstallDetail.log | Service installation log. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | cdl_connector_pidxxx_gc.log.x | CDLConnector garbage collection (GC) log. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | cdl_service_pidxxx_gc.log.x | CDLService GC log. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | threadDump-CDLConnector-xxx.log | CDLConnector stack log. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | | threadDump-CDLService-xxx.log | CDLService stack log. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + | Audit log | cdl-audit.log | Service audit log. | + +-----------+---------------------------------+---------------------------------------------------------------------------------+ + +Log Level +--------- + +:ref:`Table 2 ` describes the log levels supported by CDL. + +Levels of run logs are FATAL, ERROR, WARN, INFO, and DEBUG from the highest to the lowest priority. Run logs of equal or higher levels are recorded. The higher the specified log level, the fewer the logs recorded. + +.. _mrs_01_24129__tc09b739e3eb34797a6da936a37654e97: + +.. table:: **Table 2** Log levels + + +-----------------------+-------+------------------------------------------------------------------------------------------+ + | Type | Level | Description | + +=======================+=======+==========================================================================================+ + | Run log and audit log | FATAL | Logs of this level record fatal information about system. 
| + +-----------------------+-------+------------------------------------------------------------------------------------------+ + | | ERROR | Logs of this level record error information about system running. | + +-----------------------+-------+------------------------------------------------------------------------------------------+ + | | WARN | Logs of this level record exception information about the current event processing. | + +-----------------------+-------+------------------------------------------------------------------------------------------+ + | | INFO | Logs of this level record normal running status information about the system and events. | + +-----------------------+-------+------------------------------------------------------------------------------------------+ + | | DEBUG | Logs of this level record system running and debugging information. | + +-----------------------+-------+------------------------------------------------------------------------------------------+ + +To modify log levels, perform the following operations: + +#. Go to the **All Configurations** page of CDL. For details, see :ref:`Modifying Cluster Service Configuration Parameters `. +#. On the menu bar on the left, select the log menu of the target role. +#. Select a desired log level. +#. Save the configuration. In the displayed dialog box, click **OK** to make the configurations take effect. + + .. note:: + + The configurations take effect immediately without the need to restart the service. + +Log Format +---------- + +The following table lists the CDL log formats: + +.. table:: **Table 3** Log formats + + +-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Type | Format | Example | + +===========+========================================================================================================================================================+===============================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | Run log | <*yyyy-MM-dd HH:mm:ss,SSS*>|<*Log 
level*>|<*Name of the thread that generates the log*>|<*Message in the log*>|<*Location where the log event occurs*> | 2021-06-15 17:25:19,658 \| DEBUG \| qtp2009591182-1754 \| >fill SslConnection@5d04c5a0::SocketChannelEndPoint@7c011c24{l=/10.244.224.65:21495,r=/10.244.224.83:53724,OPEN,fill=-,flush=-,to=1/30000}{io=0/0,kio=0,kro=1}->SslConnection@5d04c5a0{NOT_HANDSHAKING,eio=-1/-1,di=-1,fill=IDLE,flush=IDLE}~>DecryptedEndPoint@771f2f77{l=/10.244.224.65:21495,r=/10.244.224.83:53724,OPEN,fill=-,flush=-,to=19398/30000}=>HttpConnection@68c5859b[p=HttpParser{s=CONTENT,0 of -1},g=HttpGenerator@536e2de0{s=END}]=>HttpChannelOverHttp@7bf252bd{s=HttpChannelState@38be31e{s=IDLE rs=COMPLETED os=COMPLETED is=IDLE awp=false se=false i=false al=0},r=1,c=true/true,a=IDLE,uri=https://10.244.224.65:21495/api/v1/cdl/monitor/jobs/metrics,age=19382} \| SslConnection.java:614 | + +-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Audit log | <*yyyy-MM-dd HH:mm:ss,SSS*>|<*Log level*>|<*Name of the thread that generates the log*>|<*Message in the log*>|<*Location where the log event occurs*> | 2021-06-15 11:07:00,262 \| INFO \| qtp1846345504-30 \| STARTTIME=2021-06-15 11:06:47.912 ENDTIME=2021-06-15 11:07:00.261 USERIP=10.144.116.198 USER=CDL User INSTANCE=10-244-224-65 OPERATION=Start CDL Job TARGET=CDCJobExecutionResource RESULT=SUCCESS \| CDCAuditLogger.java:93 | + +-----------+--------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_cdl/cdl_usage_instructions.rst b/doc/component-operation-guide-lts/source/using_cdl/cdl_usage_instructions.rst new file mode 100644 index 0000000..9dd1f99 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/cdl_usage_instructions.rst @@ -0,0 +1,275 @@ +:original_name: mrs_01_24124.html + +.. 
_mrs_01_24124: + +CDL Usage Instructions +====================== + +CDL is a simple and efficient real-time data integration service. It captures data change events from various OLTP databases and pushes them to Kafka. The Sink Connector consumes data in topics and imports the data to the software applications of big data ecosystems. In this way, data is imported to the data lake in real time. + +The CDL service contains two roles: CDLConnector and CDLService. CDLConnector is the instance for executing a data capture job, and CDLService is the instance for managing and creating a job. + +You can create data synchronization and comparison tasks on the CDLService WebUI. + +Data synchronization task +------------------------- + +- The CDL supports the following types of data synchronization tasks: + + .. table:: **Table 1** Data synchronization task types supported by the CDL + + +-------------+-----------------+-------------------------------------------------------------------+ + | Data source | Destination end | Description | + +=============+=================+===================================================================+ + | MySQL | Hudi | This task synchronizes data from the MySQL database to Hudi. | + +-------------+-----------------+-------------------------------------------------------------------+ + | | Kafka | This task synchronizes data from the MySQL database to Kafka. | + +-------------+-----------------+-------------------------------------------------------------------+ + | PgSQL | Hudi | This task synchronizes data from the PgSQL database to Hudi. | + +-------------+-----------------+-------------------------------------------------------------------+ + | | Kafka | This task synchronizes data from the PgSQL database to Kafka. | + +-------------+-----------------+-------------------------------------------------------------------+ + | Hudi | DWS | This task synchronizes data from the Hudi database to DWS. | + +-------------+-----------------+-------------------------------------------------------------------+ + | | ClickHouse | This task synchronizes data from the Hudi database to ClickHouse. | + +-------------+-----------------+-------------------------------------------------------------------+ + | ThirdKafka | Hudi | This task synchronizes data from the ThirdKafka database to Hudi. | + +-------------+-----------------+-------------------------------------------------------------------+ + +- Usage Constraints: + + - If CDL is required, the value of **log.cleanup.policy** of Kafka must be **delete**. + + - The CDL service has been installed in the MRS cluster. + + - CDL can capture incremental data only from non-system tables, but not from built-in databases of databases such as MySQL, and PostgreSQL. + + - .. _mrs_01_24124__li268123915168: + + Binary logging (enabled by default) and GTID have been enabled for the MySQL database. CDL cannot fetch tables whose names contain special characters such as the dollar sign ($) character. + + **To check whether binary logging is enabled for the MySQL database:** + + Use a tool (Navicat is used in this example) or CLI to connect to the MySQL database and run the **show variables like 'log_%'** command to view the configuration. + + For example, in Navicat, choose **File** > **New Query** to create a query, enter the following SQL statement, and click **Run**. If **log_bin** is displayed as **ON** in the result, the function is enabled successfully. 
+ + **show variables like 'log_%'** + + |image1| + + **If the bin log function of the MySQL database is not enabled, perform the following operations:** + + Modify the MySQL configuration file **my.cnf** (**my.ini** for Windows) as follows: + + .. code-block:: + + server-id = 223344 + log_bin = mysql-bin + binlog_format = ROW + binlog_row_image = FULL + expire_logs_days = 10 + + After the modification, restart MySQL for the configurations to take effect. + + **To check whether GTID is enabled for the MySQL database:** + + Run the **show global variables like '%gtid%'** command to check whether GTID is enabled. For details, see the official documentation of the corresponding MySQL version. (For details about how to enable the function in MySQL 8.x, see https://dev.mysql.com/doc/refman/8.0/en/replication-mode-change-online-enable-gtids.html.) + + |image2| + + **Set user permissions:** + + To execute MySQL tasks, users must have the **SELECT**, **RELOAD**, **SHOW DATABASES**, **REPLICATION SLAVE** and **REPLICATION CLIENT** permissions. + + Run the following command to grant the permissions: + + **GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON \*.\* TO** '*Username*' **IDENTIFIED BY** '*Password*'; + + Run the following command to update the permissions: + + **FLUSH PRIVILEGES;** + + - .. _mrs_01_24124__li1868193914169: + + The write-ahead log policy is modified for the PostgreSQL database. + + .. note:: + + - The user for connecting to the PostgreSQL database must have the replication permission, the CREATE permission on the database, and is the owner of tables. + + - CDL cannot fetch tables whose names contain special characters such as the dollar sign ($) character. + + - For PostgreSQL databases, you must have the permission to set the **statement_timeout** and **lock_timeout** parameters and the permission to query and delete slots and publications. + + - You are advised to set **max_wal_senders** to 1.5 or 2 times the value of **Slot**. + + - If the replication identifier of a PostgreSQL table is **default**, enable the full field completion function in the following scenarios: + + - Scenario 1: + + When the **delete** operation is performed on the source database, a **delete** event contains only the primary key information. In this case, for the **delete** data written to Hudi, only the primary key has values, and the values of other service fields are **null**. + + - Scenario 2: + + When the size of a single piece of data in the database exceeds 8 KB (including 8 KB), an **update** event contains only changed fields. In this case, the values of some fields in the Hudi data are **\__debezium_unavailable_value**. + + The related commands are as follows: + + - Command for querying the replication identifier of a PostgreSQL table: + + **SELECT CASE relreplident WHEN 'd' THEN 'default' WHEN 'n' THEN 'nothing' WHEN 'f' THEN 'full' WHEN 'i' THEN 'index' END AS replica_identity FROM pg_class WHERE oid = '**\ *tablename*\ **'::regclass;** + + - Command for enabling the full field completion function for a table: + + **ALTER TABLE** *tablename* **REPLICA IDENTITY FULL;** + + #. Modify **wal_level = logical** in the database configuration file **postgresql.conf** (which is stored in the **data** folder in the PostgreSQL installation directory by default). + + .. 
code-block:: + + #------------------------------------------------ + #WRITE-AHEAD LOG + #------------------------------------------------ + + # - Settings - + wal_level = logical # minimal, replica, or logical + # (change requires restart) + #fsync = on #flush data to disk for crash safety + ... + + #. Restart the database service. + + .. code-block:: + + # Stop + pg_ctl stop + # Start + pg_ctl start + + - .. _mrs_01_24124__li6127144552014: + + Prerequisites for the DWS database + + Before a synchronization task is started, both the source and target tables exist and have the same table structure. The value of **ads_last_update_date** in the DWS table is the current system time. + + - .. _mrs_01_24124__li347115587209: + + Prerequisites for ThirdPartyKafka + + The upper-layer source supports openGauss and OGG. Kafka topics at the source end can be consumed by Kafka in the MRS cluster. + + - Prerequisites for ClickHouse + + You have the permissions to operate ClickHouse. For details, see :ref:`ClickHouse User and Permission Management `. + +Data Types and Mapping Supported by CDL Synchronization Tasks +------------------------------------------------------------- + +This section describes the data types supported by CDL synchronization tasks and the mapping between data types of the source database and Spark data types. + +.. table:: **Table 2** Mapping between PostgreSQL and Spark data types + + ==================== ====================== + PostgreSQL Data Type Spark (Hudi) Data Type + ==================== ====================== + int2 int + int4 int + int8 bigint + numeric(p, s) decimal[p,s] + bool boolean + char string + varchar string + text string + timestamptz timestamp + timestamp timestamp + date date + json, jsonb string + float4 float + float8 double + ==================== ====================== + +.. table:: **Table 3** Mapping between MySQL and Spark data types + + =============== ====================== + MySQL Data Type Spark (Hudi) Data Type + =============== ====================== + int int + integer int + bigint bigint + double double + decimal[p,s] decimal[p,s] + varchar string + char string + text string + timestamp timestamp + datetime timestamp + date date + json string + float double + =============== ====================== + +.. table:: **Table 4** Mapping between Ogg and Spark data types + + ======================== ====================== + Oracle Data Type Spark (Hudi) Data Type + ======================== ====================== + NUMBER(3), NUMBER(5) bigint + INTEGER decimal + NUMBER(20) decimal + NUMBER decimal + BINARY_DOUBLE double + CHAR string + VARCHAR string + TIMESTAMP, DATETIME timestamp + timestamp with time zone timestamp + DATE timestamp + ======================== ====================== + +.. table:: **Table 5** Mapping between Spark (Hudi) and DWS data types + + ====================== ============= + Spark (Hudi) Data Type DWS Data Type + ====================== ============= + int int + long bigint + float float + double double + decimal[p,s] decimal[p,s] + boolean boolean + string varchar + date date + timestamp timestamp + ====================== ============= + +.. 
table:: **Table 6** Mapping between Spark (Hudi) and ClickHouse data types + + +------------------------+----------------------------------------------------------------------------------------------------+ + | Spark (Hudi) Data Type | ClickHouse Data Type | + +========================+====================================================================================================+ + | int | Int32 | + +------------------------+----------------------------------------------------------------------------------------------------+ + | long | Int64 (bigint) | + +------------------------+----------------------------------------------------------------------------------------------------+ + | float | Float32 (float) | + +------------------------+----------------------------------------------------------------------------------------------------+ + | double | Float64 (double) | + +------------------------+----------------------------------------------------------------------------------------------------+ + | decimal[p,s] | Decimal(P,S) | + +------------------------+----------------------------------------------------------------------------------------------------+ + | boolean | bool | + +------------------------+----------------------------------------------------------------------------------------------------+ + | string | String (LONGTEXT, MEDIUMTEXT, TINYTEXT, TEXT, LONGBLOB, MEDIUMBLOB, TINYBLOB, BLOB, VARCHAR, CHAR) | + +------------------------+----------------------------------------------------------------------------------------------------+ + | date | Date | + +------------------------+----------------------------------------------------------------------------------------------------+ + | timestamp | DateTime | + +------------------------+----------------------------------------------------------------------------------------------------+ + +Data comparison task +-------------------- + +Data comparison checks the consistency between data in the source database and that in the target Hive. If the data is inconsistent, CDL can attempt to repair the inconsistent data. For detail, see :ref:`Creating a CDL Data Comparison Job `. + +.. |image1| image:: /_static/images/en-us_image_0000001532472704.png +.. |image2| image:: /_static/images/en-us_image_0000001532791924.png diff --git a/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/index.rst b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/index.rst new file mode 100644 index 0000000..dfb94ba --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/index.rst @@ -0,0 +1,24 @@ +:original_name: mrs_01_24240.html + +.. _mrs_01_24240: + +Common CDL Jobs +=============== + +- :ref:`Synchronizing Data from PgSQL to Kafka ` +- :ref:`Synchronizing Data from MySQL to Hudi ` +- :ref:`Synchronizing Data from PgSQL to Hudi ` +- :ref:`Synchronizing Data from ThirdKafka to Hudi ` +- :ref:`Synchronizing Data from Hudi to DWS ` +- :ref:`Synchronizing Data from Hudi to ClickHouse ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + synchronizing_data_from_pgsql_to_kafka + synchronizing_data_from_mysql_to_hudi + synchronizing_data_from_pgsql_to_hudi + synchronizing_data_from_thirdkafka_to_hudi + synchronizing_data_from_hudi_to_dws + synchronizing_data_from_hudi_to_clickhouse diff --git a/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_hudi_to_clickhouse.rst b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_hudi_to_clickhouse.rst new file mode 100644 index 0000000..92efd19 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_hudi_to_clickhouse.rst @@ -0,0 +1,136 @@ +:original_name: mrs_01_24754.html + +.. _mrs_01_24754: + +Synchronizing Data from Hudi to ClickHouse +========================================== + +Scenario +-------- + +This section describes how to import data from Hudi to ClickHouse by using the CDLService web UI of a cluster with Kerberos authentication enabled. + +Prerequisites +------------- + +- The CDL, Hudi, and ClickHouse services have been installed in a cluster and are running properly. +- You have operation permissions on ClickHouse. For details, see :ref:`ClickHouse User and Permission Management `. +- You have created a human-machine user with the ClickHouse administrator permissions (for details, see :ref:`ClickHouse User and Permission Management `), for example, **cdluser**, added the user to user groups **cdladmin** (primary group), **hadoop**, **kafka**, and **supergroup**, and associated the user with the **System_administrator** role on FusionInsight Manager. +- You have manually created a local table and distributed table on ClickHouse. The local table uses the ReplicatedReplacingMergeTree engine. For details, see :ref:`Creating a ClickHouse Table `. + +Procedure +--------- + +#. Log in to FusionInsight Manager as user **cdluser** (change the password upon your first login), choose **Cluster** > **Services** > **CDL**, and click the link next to **CDLService UI** to go to the CDLService web UI. + +#. Choose **Link Management** and click **Add Link**. In the displayed dialog box, set parameters for adding the **clickhouse** and **hudi** links by referring to the following tables. + + .. table:: **Table 1** ClickHouse data link parameters + + =========== ==================================== + Parameter Example + =========== ==================================== + Link Type clickhouse + Name cklink + Host 10.10.10.10:21428 + User *cdluser* + Password *Password of the* **cdluser** *user* + Description ``-`` + =========== ==================================== + + .. table:: **Table 2** Hudi data link parameters + + =============== =================================================== + Parameter Example + =============== =================================================== + Link Type hudi + Name hudilink + Storage Type hdfs + Auth KeytabFile /opt/Bigdata/third_lib/CDL/user_libs/cdluser.keytab + Principal cdluser + Description ``-`` + =============== =================================================== + +#. After the parameters are configured, click **Test** to check whether the data link is normal. + + After the test is successful, click **OK**. + +#. (Optional) Choose **ENV Management** and click **Add Env**. In the displayed dialog box, configure the parameters based on the following table. + + .. 
table:: **Table 3** Parameters for adding an ENV + + ================ ============= + Parameter Example Value + ================ ============= + Name test-env + Driver Memory 1 GB + Type spark + Executor Memory 1 GB + Executor Cores 1 + Number Executors 1 + Queue ``-`` + Description ``-`` + ================ ============= + + Click **OK**. + +#. Choose **Job Management** > **Data synchronization task** and click **Add Job**. In the displayed dialog box, set parameters and click **Next**. + + Parameters are as follows. + + ========= ============ + Parameter Example + ========= ============ + Name job_huditock + Desc ``-`` + ========= ============ + +#. Configure Hudi job parameters. + + a. On the **Job Management** page, drag the **hudi** icon in the Source area on the left to the editing area on the right and double-click the icon to go to the Hudi job configuration page. Configure parameters based on the following table. + + .. table:: **Table 4** Source Hudi job parameters + + +------------+-------------------------------------------------------------------------------------------------------------+ + | Parameter | Example | + +============+=============================================================================================================+ + | Link | hudilink | + +------------+-------------------------------------------------------------------------------------------------------------+ + | Interval | 10 | + +------------+-------------------------------------------------------------------------------------------------------------+ + | Table Info | {"table1":[{"source.database":"db","source.tablename":"tabletest","target.tablename":"default.tabletest"}]} | + +------------+-------------------------------------------------------------------------------------------------------------+ + + |image1| + + b. Click **OK**. The Hudi job parameters are configured. + +#. Configure ClickHouse job parameters. + + a. On the **Job Management** page, drag the **clickhouse** icon on the left to the editing area on the right and double-click the icon to go to the ClickHouse job configuration page. Configure parameters based on the following table. + + .. table:: **Table 5** ClickHouse job parameters + + ============= ======= + Parameter Example + ============= ======= + Link cklink + Query Timeout 60000 + Batch Size 100 + ============= ======= + + |image2| + + b. Click **OK**. + +#. Drag the two icons to associate the job parameters and click **Save**. The job configuration is complete. + + |image3| + +#. In the job list on the **Job Management** page, locate the created job, click **Start** in the **Operation** column, and wait until the job is started. + + Check whether the data transmission takes effect, for example, insert data into the Hudi table and view the content of the file imported to ClickHouse. + +.. |image1| image:: /_static/images/en-us_image_0000001532951888.png +.. |image2| image:: /_static/images/en-us_image_0000001583151869.png +.. 
|image3| image:: /_static/images/en-us_image_0000001583391865.png diff --git a/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_hudi_to_dws.rst b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_hudi_to_dws.rst new file mode 100644 index 0000000..bcbf123 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_hudi_to_dws.rst @@ -0,0 +1,145 @@ +:original_name: mrs_01_24753.html + +.. _mrs_01_24753: + +Synchronizing Data from Hudi to DWS +=================================== + +Scenario +-------- + +This section describes how to import data from Hudi to DWS by using the CDLService web UI of a cluster with Kerberos authentication enabled. + +Prerequisites +------------- + +- The CDL and Hudi services have been installed in a cluster and are running properly. +- The prerequisites for the DWS database have been met. For details, see :ref:`Prerequisites for the DWS database `. +- You have created a human-machine user, for example, **cdluser**, added the user to user groups **cdladmin** (primary group), **hadoop**, **kafka**, and **supergroup**, and associated the user with the **System_administrator** role on FusionInsight Manager. + +Procedure +--------- + +#. Log in to FusionInsight Manager as user **cdluser** (change the password upon your first login), choose **Cluster** > **Services** > **CDL**, and click the link next to **CDLService UI** to go to the CDLService web UI. + +#. Choose **Link Management** and click **Add Link**. In the displayed dialog box, set parameters for adding the **dws** and **hudi** links by referring to the following tables. + + .. table:: **Table 1** DWS data link parameters + + =========== =================================== + Parameter Example + =========== =================================== + Link Type dws + Name dwstest + Host 10.10.10.10 + Port 8000 + DB Name dwsdb + User dbuser + Password *Password of the* **dbuser** *user* + Description ``-`` + =========== =================================== + + .. table:: **Table 2** Hudi data link parameters + + =============== =================================================== + Parameter Example + =============== =================================================== + Link Type hudi + Name hudilink + Storage Type hdfs + Auth KeytabFile /opt/Bigdata/third_lib/CDL/user_libs/cdluser.keytab + Principal cdluser + Description xxx + =============== =================================================== + +#. After the parameters are configured, click **Test** to check whether the data link is normal. + + After the test is successful, click **OK**. + +#. (Optional) Choose **ENV Management** and click **Add Env**. In the displayed dialog box, configure the parameters based on the following table. + + .. table:: **Table 3** Parameters for adding an ENV + + ================ ============= + Parameter Example Value + ================ ============= + Name test-env + Driver Memory 1 GB + Type spark + Executor Memory 1 GB + Executor Cores 1 + Number Executors 1 + Queue ``-`` + Description ``-`` + ================ ============= + + Click **OK**. + +#. Choose **Job Management** > **Data synchronization task** and click **Add Job**. In the displayed dialog box, set parameters and click **Next**. + + Parameters are as follows. 
+ + ========= ============= + Parameter Example + ========= ============= + Name job_huditodws + Desc ``-`` + ========= ============= + +#. Configure Hudi job parameters. + + a. On the **Job Management** page, drag the **hudi** icon in the Source area on the left to the editing area on the right and double-click the icon to go to the Hudi job configuration page. Configure parameters based on the following table. + + .. table:: **Table 4** Source Hudi job parameters + + +------------+--------------------------------------------------------------------------------------------------------+ + | Parameter | Example | + +============+========================================================================================================+ + | Link | hudilink | + +------------+--------------------------------------------------------------------------------------------------------+ + | Interval | 10 | + +------------+--------------------------------------------------------------------------------------------------------+ + | Table Info | {"table1":[{"source.database":"dwsdb","source.tablename":"tabletest","target.tablename":"tabletest"}]} | + +------------+--------------------------------------------------------------------------------------------------------+ + + |image1| + + b. Click **OK**. The Hudi job parameters are configured. + +#. Configure DWS job parameters. + + a. On the **Job Management** page, drag the **dws** icon on the left to the editing area on the right and double-click the icon to go to the DWS job configuration page. Configure parameters based on the following table. + + .. table:: **Table 5** DWS job parameters + + ============= ======= + Parameter Example + ============= ======= + Link dwstest + Query Timeout 180000 + Batch Size 10 + ============= ======= + + .. note:: + + - Data can be synchronized from Hudi to GaussDB(DWS) only when both Hudi and GaussDB(DWS) contain the **precombine** field. + + - GaussDB(DWS) tables must contain the **precombine** field and the primary key. + + - By default, the Hudi built-in field **\_hoodie_event_time** is used. If this field is not used, **enable.sink.precombine** must be specified. An example is as follows: + + |image2| + + b. Click **OK**. + +#. Drag the two icons to associate the job parameters and click **Save**. The job configuration is complete. + + |image3| + +#. In the job list on the **Job Management** page, locate the created job, click **Start** in the **Operation** column, and wait until the job is started. + + Check whether the data transmission takes effect, for example, insert data into the Hudi table and view the content of the file imported to DWS. + +.. |image1| image:: /_static/images/en-us_image_0000001532632204.png +.. |image2| image:: /_static/images/en-us_image_0000001532472732.png +.. |image3| image:: /_static/images/en-us_image_0000001532791952.png diff --git a/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_mysql_to_hudi.rst b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_mysql_to_hudi.rst new file mode 100644 index 0000000..78bc998 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_mysql_to_hudi.rst @@ -0,0 +1,185 @@ +:original_name: mrs_01_24751.html + +.. 
_mrs_01_24751: + +Synchronizing Data from MySQL to Hudi +===================================== + +Scenario +-------- + +This section describes how to import data from MySQL to Hudi by using the CDLService web UI of a cluster with Kerberos authentication enabled. + +Prerequisites +------------- + +- The CDL and Hudi services have been installed in a cluster and are running properly. +- The prerequisites for the MySQL database have been met. For details, see :ref:`Prerequisites for Enabling the MySQL Database `. +- You have created a human-machine user, for example, **cdluser**, added the user to user groups **cdladmin** (primary group), **hadoop**, **kafka**, and **supergroup**, and associated the user with the **System_administrator** role on FusionInsight Manager. + +Procedure +--------- + +#. Log in to FusionInsight Manager as user **cdluser** (change the password upon your first login), choose **Cluster** > **Services** > **CDL**, and click the link next to **CDLService UI** to go to the CDLService web UI. + +#. Choose **Driver Management** and click **Upload Driver** to upload the driver file **mysql-connector-java-8.0.24.jar** of MySQL. For details, see :ref:`Uploading a Driver File `. + +#. Choose **Link Management** and click **Add Link**. In the displayed dialog box, set parameters for adding the **mysql** and **hudi** links by referring to the following tables. + + .. table:: **Table 1** MySQL data link parameters + + =========== =============================== + Parameter Example + =========== =============================== + Link Type mysql + Name mysqllink + DB driver mysql-connector-java-8.0.24.jar + Host 10.10.10.10 + Port 3306 + User dbuser + Password *Password of the dbuser user* + Description ``-`` + =========== =============================== + + .. table:: **Table 2** Hudi data link parameters + + =============== =================================================== + Parameter Example + =============== =================================================== + Link Type hudi + Name hudilink + Storage Type hdfs + Auth KeytabFile /opt/Bigdata/third_lib/CDL/user_libs/cdluser.keytab + Principal cdluser + Description ``-`` + =============== =================================================== + +#. After the parameters are configured, click **Test** to check whether the data link is normal. + + After the test is successful, click **OK**. + +#. (Optional) Choose **ENV Management** and click **Add Env**. In the displayed dialog box, configure the parameters based on the following table. + + .. table:: **Table 3** Parameters for adding an ENV + + ================ ============= + Parameter Example Value + ================ ============= + Name test-env + Driver Memory 1 GB + Type spark + Executor Memory 1 GB + Executor Cores 1 + Number Executors 1 + Queue ``-`` + Description ``-`` + ================ ============= + + Click **OK**. + +#. Choose **Job Management** > **Data synchronization task** and click **Add Job**. In the displayed dialog box, set parameters and click **Next**. + + Parameters are as follows. + + ========= =============== + Parameter Example + ========= =============== + Name job_mysqltohudi + Desc ``-`` + ========= =============== + +#. Configure MySQL job parameters. + + a. On the **Job Management** page, drag the **mysql** icon on the left to the editing area on the right and double-click the icon to go to the MySQL job configuration page. Configure parameters based on the following table. + + .. 
table:: **Table 4** MySQL job parameters + + ================== ========================== + Parameter Example + ================== ========================== + Link mysqllink + Tasks Max 1 + Mode insert, update, and delete + DB Name MYSQLDBA + Schema Auto Create Yes + Connect With Hudi Yes + ================== ========================== + + b. Click the plus sign (+) to display more parameters. + + |image1| + + .. note:: + + - **PosFromBeginning**: whether to capture CDC events from the start position + - **DBZ Snapshot Locking Mode**: lock mode when the connector is performing a snapshot. **none** indicates that no lock is held during the snapshot. + - **WhiteList**: Enter the database table to receive data, for example, **myclass**. + - **Blacklist**: Enter the database table that does not need to capture data. + - **Multi Partition**: whether to enable Kafka partitioning. + - **Topic Table Mapping** + + - This parameter is mandatory if **Connect With Hudi** is set to **Yes**. + - Enter the table name in the first text box, for example, **test**. Enter a topic name in the second text box, for example, **test_topic**. The topic name must match the table name in the first text box. + + c. Click **OK**. The MySQL job parameters are configured. + +#. Configure Hudi job parameters. + + a. On the **Job Management** page, drag the **hudi** icon in the Sink area on the left to the editing area on the right and double-click the icon to go to the Hudi job configuration page. Configure parameters based on the following table: + + .. table:: **Table 5** Sink Hudi job parameters + + +-------------------------------------------------------------------------+---------------+ + | Parameter | Example Value | + +=========================================================================+===============+ + | Link | hudilink | + +-------------------------------------------------------------------------+---------------+ + | Path | /cdl/test | + +-------------------------------------------------------------------------+---------------+ + | Interval | 10 | + +-------------------------------------------------------------------------+---------------+ + | Max Rate Per Partition | 0 | + +-------------------------------------------------------------------------+---------------+ + | Parallelism | 10 | + +-------------------------------------------------------------------------+---------------+ + | Target Hive Database | default | + +-------------------------------------------------------------------------+---------------+ + | Configuring Hudi Table Attributes | Visual View | + +-------------------------------------------------------------------------+---------------+ + | Global Configuration of Hudi Table Attributes | ``-`` | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Table Name | test | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Table Type Opt Key | COPY_ON_WRITE | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Hudi TableName Mapping | ``-`` | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Hive TableName Mapping | ``-`` | + +-------------------------------------------------------------------------+---------------+ + | 
Configuring the Attributes of the Hudi Table: Table Primarykey Mapping | id | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Table Hudi Partition Type | ``-`` | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Custom Config | ``-`` | + +-------------------------------------------------------------------------+---------------+ + + |image2| + + b. (Optional) Click the plus sign (+) to display the **Execution Env** parameter. Select a created environment for it. The default value is **defaultEnv**. + + |image3| + + c. Click **OK**. + +#. Drag the two icons to associate the job parameters and click **Save**. The job configuration is complete. + + |image4| + +#. In the job list on the **Job Management** page, locate the created job, click **Start** in the **Operation** column, and wait until the job is started. + + Check whether the data transmission takes effect, for example, insert data into the table in the MySQL database and view the content of the file imported to Hudi. + +.. |image1| image:: /_static/images/en-us_image_0000001532632200.png +.. |image2| image:: /_static/images/en-us_image_0000001537076386.png +.. |image3| image:: /_static/images/en-us_image_0000001587875989.png +.. |image4| image:: /_static/images/en-us_image_0000001532472728.png diff --git a/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_pgsql_to_hudi.rst b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_pgsql_to_hudi.rst new file mode 100644 index 0000000..0e4d2fe --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_pgsql_to_hudi.rst @@ -0,0 +1,186 @@ +:original_name: mrs_01_24752.html + +.. _mrs_01_24752: + +Synchronizing Data from PgSQL to Hudi +===================================== + +Scenario +-------- + +This section describes how to import data from PgSQL to Hudi by using the CDLService web UI of a cluster with Kerberos authentication enabled. + +Prerequisites +------------- + +- The CDL and Hudi services have been installed in a cluster and are running properly. +- The prerequisites for the PgSQL database have been met. For details, see :ref:`Policy for Modifying Write-Ahead Logs in PostgreSQL Databases `. +- You have created a human-machine user, for example, **cdluser**, added the user to user groups **cdladmin** (primary group), **hadoop**, **kafka**, and **supergroup**, and associated the user with the **System_administrator** role on FusionInsight Manager. + +Procedure +--------- + +#. Log in to FusionInsight Manager as user **cdluser** (change the password upon your first login), choose **Cluster** > **Services** > **CDL**, and click the link next to **CDLService UI** to go to the CDLService web UI. + +#. Choose **Link Management** and click **Add Link**. In the displayed dialog box, set parameters for adding the **pgsql** and **hudi** links by referring to the following tables. + + .. 
table:: **Table 1** PgSQL data link parameters + + =========== ================================= + Parameter Example + =========== ================================= + Link Type pgsql + Name pgsqllink + Host 10.10.10.10 + Port 5432 + DB Name testDB + User user + Password *Password of the* **user** *user* + Description ``-`` + =========== ================================= + + .. table:: **Table 2** Hudi data link parameters + + =============== =================================================== + Parameter Example + =============== =================================================== + Link Type hudi + Name hudilink + Storage Type hdfs + Auth KeytabFile /opt/Bigdata/third_lib/CDL/user_libs/cdluser.keytab + Principal cdluser + Description xxx + =============== =================================================== + +#. After the parameters are configured, click **Test** to check whether the data link is normal. + + After the test is successful, click **OK**. + +#. (Optional) Choose **ENV Management** and click **Add Env**. In the displayed dialog box, configure the parameters based on the following table. + + .. table:: **Table 3** Parameters for adding an ENV + + ================ ============= + Parameter Example Value + ================ ============= + Name test-env + Driver Memory 1 GB + Type spark + Executor Memory 1 GB + Executor Cores 1 + Number Executors 1 + Queue ``-`` + Description ``-`` + ================ ============= + + Click **OK**. + +#. Choose **Job Management** > **Data synchronization task** and click **Add Job**. In the displayed dialog box, set parameters and click **Next**. + + Parameters are as follows. + + ========= ============ + Parameter Example + ========= ============ + Name job_pgtohudi + Desc ``-`` + ========= ============ + +#. Configure PgSQL job parameters. + + a. On the **Job Management** page, drag the **pgsql** icon on the left to the editing area on the right and double-click the icon to go to the PgSQL job configuration page. Configure parameters based on the following table. + + .. table:: **Table 4** PgSQL job parameters + + ===================== ========================== + Parameter Example + ===================== ========================== + Link pgsqllink + Tasks Max 1 + Mode insert, update, and delete + dbName Alias pgsqldb + Schema pgschema + Slot Name pg_slot + Enable FailOver Slot No + Slot Drop No + Connect With Hudi Yes + Use Exist Publication No + Publication Name publicationtest + ===================== ========================== + + b. Click the plus sign (+) to display more parameters. + + |image1| + + .. note:: + + - **Start Time**: indicates the start time of table synchronization. + - **WhiteList**: Enter a table in the database. + - **Blacklist**: Enter the database table that does not need to capture data. + - **Topic Table Mapping** + + - This parameter is mandatory if **Connect With Hudi** is set to **Yes**. + - Enter the table name in the first text box, for example, **test**. Enter a topic name in the second text box, for example, **test_topic**. The topic name must match the table name in the first text box. + + c. Click **OK**. The PgSQL job parameters are configured. + +#. Configure Hudi job parameters. + + a. On the **Job Management** page, drag the **hudi** icon in the Sink area on the left to the editing area on the right and double-click the icon to go to the Hudi job configuration page. Configure parameters based on the following table: + + .. 
table:: **Table 5** Sink Hudi job parameters + + +-------------------------------------------------------------------------+---------------+ + | Parameter | Example Value | + +=========================================================================+===============+ + | Link | hudilink | + +-------------------------------------------------------------------------+---------------+ + | Path | /cdl/test | + +-------------------------------------------------------------------------+---------------+ + | Interval | 10 | + +-------------------------------------------------------------------------+---------------+ + | Max Rate Per Partition | 0 | + +-------------------------------------------------------------------------+---------------+ + | Parallelism | 10 | + +-------------------------------------------------------------------------+---------------+ + | Target Hive Database | default | + +-------------------------------------------------------------------------+---------------+ + | Configuring Hudi Table Attributes | Visual View | + +-------------------------------------------------------------------------+---------------+ + | Global Configuration of Hudi Table Attributes | ``-`` | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Table Name | test | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Table Type Opt Key | COPY_ON_WRITE | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Hudi TableName Mapping | ``-`` | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Hive TableName Mapping | ``-`` | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Table Primarykey Mapping | id | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Table Hudi Partition Type | ``-`` | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Custom Config | ``-`` | + +-------------------------------------------------------------------------+---------------+ + + |image2| + + b. (Optional) Click the plus sign (+) to display the **Execution Env** parameter. Select a created environment for it. The default value is **defaultEnv**. + + |image3| + + c. Click **OK**. + +#. Drag the two icons to associate the job parameters and click **Save**. The job configuration is complete. + + |image4| + +#. In the job list on the **Job Management** page, locate the created job, click **Start** in the **Operation** column, and wait until the job is started. + + Check whether the data transmission takes effect, for example, insert data into the table in the PgSQL database and view the content of the file imported to Hudi. + +.. |image1| image:: /_static/images/en-us_image_0000001532791948.png +.. |image2| image:: /_static/images/en-us_image_0000001446755301.png +.. |image3| image:: /_static/images/en-us_image_0000001446835121.png +.. 
|image4| image:: /_static/images/en-us_image_0000001532951884.png diff --git a/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_pgsql_to_kafka.rst b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_pgsql_to_kafka.rst new file mode 100644 index 0000000..dc84d81 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_pgsql_to_kafka.rst @@ -0,0 +1,126 @@ +:original_name: mrs_01_24254.html + +.. _mrs_01_24254: + +Synchronizing Data from PgSQL to Kafka +====================================== + +Scenario +-------- + +This section describes how to import data from PgSQL to Kafka by using the CDLService web UI of a cluster with Kerberos authentication enabled. + +Prerequisites +------------- + +- The CDL and Kafka services have been installed in a cluster and are running properly. +- Write-ahead logging is enabled for the PostgreSQL database. For details, see :ref:`Policy for Modifying Write-Ahead Logs in PostgreSQL Databases `. +- You have created a human-machine user, for example, **cdluser**, added the user to user groups **cdladmin** (primary group), **hadoop**, and **kafka**, and associated the user with the **System_administrator** role on FusionInsight Manager. + +Procedure +--------- + +#. Log in to FusionInsight Manager as user **cdluser** (change the password upon the first login) and choose **Cluster** > **Services** > **CDL**. On the **Dashboard** page, click the hyperlink next to **CDLService UI** to go to the native CDL page. + +#. Choose **Link Management** and click **Add Link**. On the displayed dialog box, set parameters for adding the **pgsql** and **kafka** links by referring to the following tables. + + .. table:: **Table 1** PgSQL data link parameters + + =========== =========================== + Parameter Example Value + =========== =========================== + Link Type pgsql + Name pgsqllink + Host 10.10.10.10 + Port 5432 + DB Name testDB + User user + Password *Password of the user user* + Description ``-`` + =========== =========================== + + .. table:: **Table 2** Kafka data link parameters + + =========== ============= + Parameter Example Value + =========== ============= + Link Type kafka + Name kafkalink + Description ``-`` + =========== ============= + +#. After the parameters are configured, click **Test** to check whether the data link is normal. + + After the test is successful, click **OK**. + +#. .. _mrs_01_24254__li8419191320242: + + On the **Job Management** page, click **Add Job**. In the displayed dialog box, configure the parameters and click **Next**. + + Specifically: + + ========= ================ + Parameter Example Value + ========= ================ + Name job_pgsqltokafka + Desc xxx + ========= ================ + +#. Configure PgSQL job parameters. + + a. On the **Job Management** page, drag the **pgsql** icon on the left to the editing area on the right and double-click the icon to go to the PgSQL job configuration page. + + .. 
table:: **Table 3** PgSQL job parameters + + ===================== ========================== + Parameter Example Value + ===================== ========================== + Link pgsqllink + Tasks Max 1 + Mode insert, update, and delete + Schema public + dbName Alias cdc + Slot Name a4545sad + Slot Drop No + Connect With Hudi No + Use Exist Publication Yes + Publication Name test + ===================== ========================== + + b. Click the plus sign (+) to display more parameters. + + |image1| + + .. note:: + + - **WhiteList**: Enter the name of the table in the database, for example, **myclass**. + - **Topic Table Mapping**: In the first text box, enter a topic name (the value must be different from that of **Name** in :ref:`4 `), for example, **myclass_topic**. In the second text box, enter a table name, for example, **myclass**. The value must be in one-to-one relationship with the topic name entered in the first text box.) + + c. Click **OK**. The PgSQL job parameters are configured. + +#. Configure Kafka job parameters. + + a. On the **Job Management** page, drag the **kafka** icon on the left to the editing area on the right and double-click the icon to go to the Kafka job configuration page. Configure parameters based on :ref:`Table 4 `. + + .. _mrs_01_24254__table8128935153416: + + .. table:: **Table 4** Kafka job parameter + + ========= ============= + Parameter Example Value + ========= ============= + Link kafkalink + ========= ============= + + b. Click **OK**. + +#. After the job parameters are configured, drag the two icons to associate the job parameters and click **Save**. The job configuration is complete. + + |image2| + +#. In the job list on the **Job Management** page, locate the created jobs, click **Start** in the **Operation** column, and wait until the jobs are started. + + Check whether the data transmission takes effect. For example, insert data into the table in the PgSQL database, go to the Kafka UI to check whether data is generated in the Kafka topic by referring to :ref:`Managing Topics on Kafka UI `. + +.. |image1| image:: /_static/images/en-us_image_0000001532791944.png +.. |image2| image:: /_static/images/en-us_image_0000001532951880.png diff --git a/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_thirdkafka_to_hudi.rst b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_thirdkafka_to_hudi.rst new file mode 100644 index 0000000..3519b66 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/common_cdl_jobs/synchronizing_data_from_thirdkafka_to_hudi.rst @@ -0,0 +1,206 @@ +:original_name: mrs_01_24763.html + +.. _mrs_01_24763: + +Synchronizing Data from ThirdKafka to Hudi +========================================== + +Scenario +-------- + +This section describes how to import data from ThirdKafka to Hudi by using the CDLService web UI of a cluster with Kerberos authentication enabled. + +Prerequisites +------------- + +- The CDL and Hudi services have been installed in a cluster and are running properly. +- Topics of the ThirdKafka database can be consumed by the MRS cluster. For details, see :ref:`PrerequisitesforThirdPartyKafka `. +- You have created a human-machine user, for example, **cdluser**, added the user to user groups **cdladmin** (primary group), **hadoop**, **kafka**, and **supergroup**, and associated the user with the **System_administrator** role on FusionInsight Manager. 
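For reference, the procedure below ends with an end-to-end check: after the job is started, data is inserted into the source openGauss table and the result is viewed in Hudi. The following SQL is a minimal sketch of that check, assuming the example values used in this section (source schema **opengaussschema**, source table **test**, target Hive database **default**, primary key column **id**) and a hypothetical data column **col1**; adjust the object names to match your environment.

.. code-block:: sql

   -- On the source openGauss database: generate change events after the CDL job is started.
   INSERT INTO opengaussschema.test (id, col1) VALUES (1001, 'cdl_smoke_test');
   UPDATE opengaussschema.test SET col1 = 'cdl_smoke_test_updated' WHERE id = 1001;

   -- On the MRS cluster (for example, in spark-sql or spark-beeline): the Hudi table
   -- is registered in the target Hive database under the configured table name.
   SELECT id, col1, _hoodie_commit_time FROM default.test WHERE id = 1001;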
+ +Procedure +--------- + +#. Log in to FusionInsight Manager as user **cdluser** (change the password upon your first login), choose **Cluster** > **Services** > **CDL**, and click the link next to **CDLService UI** to go to the CDLService web UI. + +#. Choose **Link Management** and click **Add Link**. In the displayed dialog box, set parameters for adding the **thirdparty-kafka** and **hudi** links by referring to the following tables. + + .. table:: **Table 1** thirdparty-kafka data link parameters + + +-------------------------+-----------------------------------------------------+ + | Parameter | Example | + +=========================+=====================================================+ + | Name | opengausslink | + +-------------------------+-----------------------------------------------------+ + | Link Type | thirdparty-kafka | + +-------------------------+-----------------------------------------------------+ + | Bootstrap Servers | 10.10.10.10:9093 | + +-------------------------+-----------------------------------------------------+ + | Security Protocol | SASL_SSL | + +-------------------------+-----------------------------------------------------+ + | Username | testuser | + +-------------------------+-----------------------------------------------------+ + | Password | *Password of the* **testuser** *user* | + +-------------------------+-----------------------------------------------------+ + | SSL Truststore Location | Click **Upload** to upload the authentication file. | + +-------------------------+-----------------------------------------------------+ + | SSL Truststore Password | ``-`` | + +-------------------------+-----------------------------------------------------+ + | Datastore Type | opengauss | + +-------------------------+-----------------------------------------------------+ + | Host | 11.11.xxx.xxx,12.12.xxx.xxx | + +-------------------------+-----------------------------------------------------+ + | Port | 8000 | + +-------------------------+-----------------------------------------------------+ + | DB Name | opengaussdb | + +-------------------------+-----------------------------------------------------+ + | User | opengaussuser | + +-------------------------+-----------------------------------------------------+ + | DB Password | *Password of the* **opengaussuser** *user* | + +-------------------------+-----------------------------------------------------+ + | Description | ``-`` | + +-------------------------+-----------------------------------------------------+ + + .. note:: + + MRS Kafka can also be used as the source of thirdparty-kafka. If the username and password are used for login authentication, log in to FusionInsight Manager, choose **Cluster** > **Services** > **Kafka**, click **Configuration**, search for the **sasl.enabled.mechanisms** parameter in the search box, add **PLAIN** as the parameter value, click **Save** to save the configuration, and restart the Kafka service for the configuration to take effect. + + |image1| + + On the CDL web UI, configure the thirdparty-kafka link that uses MRS Kafka as the source. For example, the data link configuration is as follows: + + |image2| + + .. 
table:: **Table 2** Hudi data link parameters + + =============== =================================================== + Parameter Example + =============== =================================================== + Link Type hudi + Name hudilink + Storage Type hdfs + Auth KeytabFile /opt/Bigdata/third_lib/CDL/user_libs/cdluser.keytab + Principal cdluser + Description ``-`` + =============== =================================================== + +#. After the parameters are configured, click **Test** to check whether the data link is normal. + + After the test is successful, click **OK**. + +#. (Optional) Choose **ENV Management** and click **Add Env**. In the displayed dialog box, configure the parameters based on the following table. + + .. table:: **Table 3** Parameters for adding an ENV + + ================ ============= + Parameter Example Value + ================ ============= + Name test-env + Driver Memory 1 GB + Type spark + Executor Memory 1 GB + Executor Cores 1 + Number Executors 1 + Queue ``-`` + Description ``-`` + ================ ============= + + Click **OK**. + +#. Choose **Job Management** > **Data synchronization task** and click **Add Job**. In the displayed dialog box, set parameters and click **Next**. + + Parameters are as follows. + + ========= =================== + Parameter Example + ========= =================== + Name job_opengausstohudi + Desc New CDL Job + ========= =================== + +#. Configure ThirdKafka job parameters. + + a. On the **Job Management** page, drag the **thirdparty-kafka** icon on the left to the editing area on the right and double-click the icon to go to the ThirdpartyKafka job configuration page. Configure parameters based on the following table. + + .. table:: **Table 4** thirdparty-kafka job parameters + + =================== =============== + Parameter Example + =================== =============== + Link opengausslink + DB Name opengaussdb + Schema opengaussschema + Datastore Type opengauss + Source Topics source_topic + Tasks Max 1 + Tolerance none + Start Time ``-`` + Multi Partition No + Topic Table Mapping test/hudi_topic + =================== =============== + + |image3| + + b. Click **OK**. The ThirdpartyKafka job parameters are configured. + +#. Configure Hudi job parameters. + + a. On the **Job Management** page, drag the **hudi** icon in the Sink area on the left to the editing area on the right and double-click the icon to go to the Hudi job configuration page. Configure parameters based on the following table: + + .. 
table:: **Table 5** Sink Hudi job parameters + + +-------------------------------------------------------------------------+---------------+ + | Parameter | Example Value | + +=========================================================================+===============+ + | Link | hudilink | + +-------------------------------------------------------------------------+---------------+ + | Path | /cdl/test | + +-------------------------------------------------------------------------+---------------+ + | Interval | 10 | + +-------------------------------------------------------------------------+---------------+ + | Max Rate Per Partition | 0 | + +-------------------------------------------------------------------------+---------------+ + | Parallelism | 10 | + +-------------------------------------------------------------------------+---------------+ + | Target Hive Database | default | + +-------------------------------------------------------------------------+---------------+ + | Configuring Hudi Table Attributes | Visual View | + +-------------------------------------------------------------------------+---------------+ + | Global Configuration of Hudi Table Attributes | ``-`` | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Table Name | test | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Table Type Opt Key | COPY_ON_WRITE | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Hudi TableName Mapping | ``-`` | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Hive TableName Mapping | ``-`` | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Table Primarykey Mapping | id | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Table Hudi Partition Type | ``-`` | + +-------------------------------------------------------------------------+---------------+ + | Configuring the Attributes of the Hudi Table: Custom Config | ``-`` | + +-------------------------------------------------------------------------+---------------+ + + |image4| + + b. (Optional) Click the plus sign (+) to display the **Execution Env** parameter. Select a created environment for it. The default value is **defaultEnv**. + + |image5| + + c. Click **OK**. + +#. Drag the two icons to associate the job parameters and click **Save**. The job configuration is complete. + + |image6| + +#. In the job list on the **Job Management** page, locate the created job, click **Start** in the **Operation** column, and wait until the job is started. + + Check whether the data transmission takes effect, for example, insert data into the table in the openGauss database and view the content of the file imported to Hudi. + +.. |image1| image:: /_static/images/en-us_image_0000001583151865.png +.. |image2| image:: /_static/images/en-us_image_0000001583391861.png +.. |image3| image:: /_static/images/en-us_image_0000001583272169.png +.. |image4| image:: /_static/images/en-us_image_0000001587755985.png +.. |image5| image:: /_static/images/en-us_image_0000001536916934.png +.. 
|image6| image:: /_static/images/en-us_image_0000001582952097.png diff --git a/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/creating_a_cdl_data_comparison_job.rst b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/creating_a_cdl_data_comparison_job.rst new file mode 100644 index 0000000..dadba96 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/creating_a_cdl_data_comparison_job.rst @@ -0,0 +1,105 @@ +:original_name: mrs_01_24775.html + +.. _mrs_01_24775: + +Creating a CDL Data Comparison Job +================================== + +Scenario +-------- + +Data comparison checks the consistency between data in the source database and that in the target Hive. If the data is inconsistent, CDL can attempt to repair the inconsistent data. + +The data comparison task currently supports manual full comparison. The data comparison task runs in On Yarn mode, and the comparison result is uploaded to HDFS directories. + +.. note:: + + - Currently, only basic data types can be compared. Special data types such as date, timestamp, decimal, numeric, and JSON cannot be compared. + - Data cannot be compared for tables whose field names contain database keywords. + - A data comparison task for a single table supports comparison of a maximum of 100 fields. If a table contains more than 100 fields, you can specify two whitelists of different comparison fields for data comparison. + - Currently, only the data captured from PgSQL to Hudi can be compared. If the comparison result is inconsistent, a report address is generated only when there are no more than 2000 inconsistent data records. If there are more than 2000 inconsistent data records, no report address is generated and data cannot be repaired. + - If the Kafka lag of the CDL task involved in the comparison is not 0, the comparison result is inconsistent. + +Prerequisites +------------- + +#. You have prepared the Hive UDF JAR package, copied **${BIGDATA_HOME}/FusionInsight_CDL_*/install/FusionInsight-CDL-*/cdl/hive-checksum/cdl-dc-hive-checksum-*.jar** from the CDL installation directory to the **${BIGDATA_HOME}/third_lib/Hive** directory of Hive, and set the permission on the JAR package to **750** or higher. + +#. A user with the CDL management permission has been created for the cluster with Kerberos authentication enabled. If Ranger authentication is enabled for the current cluster, grant the Hive administrator permission and UDF operation permission to the user by referring to :ref:`Adding a Ranger Access Permission Policy for Hive `. + +#. You have created a global UDF algorithm on the Hive client as a user with the Hive administrator permission. + + Run the following command to create the **CheckSum** function in the default database: + + **create function checksum_aggregate as 'com.xxx.hive.checksum.ChecksumUdaf'** + +#. A CDL synchronization task exists. The comparison task determines the data to be compared based on the synchronization task status and data synchronization status. + +#. The database user in the data synchronization task associated with the data comparison task must have the **create function** permission on the current schema. + +Procedure +--------- + +#. Log in to the CDLService web UI as the created user or the **admin** user (for the cluster where Kerberos authentication is not enabled). For details, see :ref:`Logging In to the CDLService WebUI `. + +#. Choose **Job Management** > **Data comparison task** and click **Add Job**. 
In the displayed dialog box, set related job parameters and click **Next**. + + +---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+ + | Parameter | Description | Example | + +===============+=================================================================================================================================================================+==============+ + | Name | Name of the data comparison task. | job_dc_test | + +---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+ + | CDL Job Name | Name of the associated synchronization task. (Note: The user who runs the comparison task is the user of the Hudi Link in the associated synchronization task.) | pg2hudi_test | + +---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+ + | Execution Env | Environment variable required for running Spark tasks. If no ENV is available, create one by referring to :ref:`Managing ENV `. | dc_env | + +---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+ + | Desc | Description of the task. | ``-`` | + +---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+ + +#. On the **Create Compare-Pair** page, set related parameters and click **Create**. + + +-------------------+------------------------------------------------+-----------+ + | Parameter | Description | Example | + +-------------------+------------------------------------------------+-----------+ + | Name | Name of the current comparison task. | test | + +-------------------+------------------------------------------------+-----------+ + | Source Table | Source table name. | tabletest | + +-------------------+------------------------------------------------+-----------+ + | Target Table | Target table name. | tabletest | + +-------------------+------------------------------------------------+-----------+ + | WhiteList Columns | Column family involved in data comparison. | ``-`` | + +-------------------+------------------------------------------------+-----------+ + | BlackList Columns | Column family not involved in data comparison. | ``-`` | + +-------------------+------------------------------------------------+-----------+ + | Where Condition | User-defined comparison conditions. | ``-`` | + +-------------------+------------------------------------------------+-----------+ + + To compare multiple tables, click **Add**. + +#. In the data comparison task list, click **Start** in the row of the task to start data comparison. + +#. After the execution is complete, view the result in the **Comparing Result** column. + + |image1| + +#. If the result is **Inconsistent**, click **More** and select **view records**. + + |image2| + +#. In the **Task Run Log** window, locate the target task and click **View Results** in the **Operation** column. + + |image3| + +#. Click **Repair** to repair the data. + + |image4| + +#. 
After the repair is complete, check whether the value of **Comparing Result** is **Consistent**. If yes, data repair is successful. If not, the repair fails. In this case, obtain the report from the corresponding HDFS directory based on the value of **Report Path** and manually repair the data. + + |image5| + +.. |image1| image:: /_static/images/en-us_image_0000001583391853.png +.. |image2| image:: /_static/images/en-us_image_0000001583272157.png +.. |image3| image:: /_static/images/en-us_image_0000001582952089.png +.. |image4| image:: /_static/images/en-us_image_0000001532632196.png +.. |image5| image:: /_static/images/en-us_image_0000001532472724.png diff --git a/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/creating_a_cdl_data_synchronization_job.rst b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/creating_a_cdl_data_synchronization_job.rst new file mode 100644 index 0000000..98e0cfc --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/creating_a_cdl_data_synchronization_job.rst @@ -0,0 +1,344 @@ +:original_name: mrs_01_24239.html + +.. _mrs_01_24239: + +Creating a CDL Data Synchronization Job +======================================= + +Scenario +-------- + +The CDLService web UI provides a visualized page for users to quickly create CDL jobs and import real-time data into the data lake. + +Prerequisites +------------- + +A user with the CDL management permission has been created for the cluster with Kerberos authentication enabled. + +Procedure +--------- + +#. Access the CDLService web UI as a user with the CDL management permissions or the **admin** user (for the cluster where Kerberos authentication is not enabled). For details, see :ref:`Logging In to the CDLService WebUI `. + +#. Choose **Job Management** > **Data synchronization task** and click **Add Job**. In the displayed dialog box, set related job parameters and click **Next**. + + ========= =============== ================ + Parameter Description Example Value + ========= =============== ================ + Name Job name job_pgsqltokafka + Desc Job description xxx + ========= =============== ================ + +#. On the **Job Management** page, select and drag the target element from **Source** and **Sink** to the GUI on the right. + + |image1| + + Double-click the two elements to connect them and set related parameters as required. + + To delete an element, select the element to be deleted and click **Delete** in the lower right corner of the page. + + .. table:: **Table 1** MySQL job parameters + + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Parameter | Description | Example Value | + +===========================+==========================================================================================================================================================================+============================+ + | Link | Created MySQL link | mysqllink | + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Tasks Max | Maximum number of tasks that can be created by a connector. For a connector of the database type, this parameter must be set to **1**. 
| 1 | + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Mode | Type of the CDC event to be captured by the job. Value options are as follows: | insert, update, and delete | + | | | | + | | - **insert** | | + | | - **update** | | + | | - **delete** | | + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | DB Name | MySQL database name | cdl-test | + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Schema Auto Create | Whether to create table schemas after the job is started | No | + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Connect With Hudi | Whether to connect to Hudi | Yes | + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | DBZ Snapshot Locking Mode | Lock mode used when a task starts to execute a snapshot. Value options are as follows: | none | + | | | | + | | - **minimal**: A global read lock is held only when the database schema and other metadata are obtained. | | + | | | | + | | - **extend**: A global read lock is held during the entire snapshot execution process, blocking all write operations. | | + | | | | + | | - **none**: No lock mode. The schema cannot be changed when a CDL task is started. | | + | | | | + | | Optional. Click |image2| to display this parameter. | | + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | WhiteList | Whitelisted tables to be captured. | testtable | + | | | | + | | Separate multiple tables using commas (,). Wildcards are supported. | | + | | | | + | | (Optional) This parameter is displayed when you click |image3|. | | + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | BlackList | Whitelisted tables not to be captured. | ``-`` | + | | | | + | | Separate multiple tables using commas (,). Wildcards are supported. | | + | | | | + | | (Optional) This parameter is displayed when you click |image4|. | | + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Multi Partition | Whether to enable multi-partition mode for topics. 
| No | + | | | | + | | If enabled, you need to set **Topic Table Mapping** and specify the number of topic partitions, and the data of a single table will be scattered in multiple partitions. | | + | | | | + | | (Optional) This parameter is displayed when you click |image5|. | | + | | | | + | | .. note:: | | + | | | | + | | The data receiving sequence cannot be ensured. Exercise caution when setting this parameter. | | + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Topic Table Mapping | Mapping between topics and tables. | testtable | + | | | | + | | If configured, table data can be sent to the specified topic. If multi-partitioning is enabled, you need to set the number of partitions, which must be greater than 1. | testtable_topic | + | | | | + | | This parameter is displayed when you click |image6|. This parameter is mandatory if **Connect With Hudi** is set to **Yes**. | | + +---------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + + .. table:: **Table 2** PgSQL job parameters + + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Parameter | Description | Example Value | + +=======================+=============================================================================================================================================================================================================================================================================================================================================================================+============================+ + | Link | Created PgSQL link. | pgsqllink | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Tasks Max | Maximum number of tasks that can be created by a connector. For a connector of the database type, this parameter must be set to **1**. | 1 | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Mode | Type of the CDC event to be captured by the job. 
The options are as follows: | insert, update, and delete | + | | | | + | | - **insert** | | + | | - **update** | | + | | - **delete** | | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | dbName Alias | Database name. | test | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Schema | Schema of the database to be connected to. | public | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Slot Name | Name of the PostgreSQL logical replication slot. | test_solt_1 | + | | | | + | | The value can contain lowercase letters, digits, and underscores (_), and cannot be the same in any other job. | | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Enable FailOver Slot | Whether to enable the failover slot function. After it is enabled, the information about the logical replication slot specified as the failover slot is synchronized from the active instance to the standby instance. In this manner, logical subscription can continue even upon an active/standby switchover, implementing the failover of the logical replication slot. | No | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Slot Drop | Whether to delete the slot when a task is stopped | No | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Connect With Hudi | Whether to connect to Hudi. 
| Yes | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Use Exist Publication | Use a created publication | Yes | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Publication Name | Name of a created publication | test | + | | | | + | | This parameter is available when **Use Exist Publication** is set to **Yes**. | | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Start Time | Start time for synchronizing tables | 2022/03/16 11:33:37 | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | WhiteList | Whitelisted tables to be captured. | testtable | + | | | | + | | Separate multiple tables using commas (,). Wildcards are supported. | | + | | | | + | | (Optional) This parameter is displayed when you click |image7|. | | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | BlackList | Whitelisted tables not to be captured. | ``-`` | + | | | | + | | Separate multiple tables using commas (,). Wildcards are supported. | | + | | | | + | | (Optional) This parameter is displayed when you click |image8|. 
| | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Start Position | Start LSN of the data captured by a task | ``-`` | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Start Txid | Start TXID of the data captured by a task | ``-`` | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Multi Partition | Whether to enable multi-partition mode for topics. | No | + | | | | + | | If enabled, you need to set **Topic Table Mapping** and specify the number of topic partitions, and the data of a single table will be scattered in multiple partitions. | | + | | | | + | | (Optional) This parameter is displayed when you click |image9|. | | + | | | | + | | .. note:: | | + | | | | + | | The data receiving sequence cannot be ensured. Exercise caution when setting this parameter. | | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + | Topic Table Mapping | Mapping between topics and tables. | testtable | + | | | | + | | If configured, table data can be sent to the specified topic. If multi-partitioning is enabled, you need to set the number of partitions, which must be greater than 1. | testtable_topic | + | | | | + | | This parameter is displayed when you click |image10|. This parameter is mandatory if **Connect With Hudi** is set to **Yes**. | | + +-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+ + + .. 
table:: **Table 3** Source Hudi job parameters + + +--------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +====================+==================================================================================================================================+============================================================================================================================================================================================================+ + | Link | Link used by the Hudi app | hudilink | + +--------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Interval | Interval for synchronizing the Hudi table, in seconds | 10 | + +--------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Start Time | Start time for synchronizing tables | 2022/03/16 11:40:52 | + +--------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Max Commit Number | Maximum number of commits that can be pulled from an incremental view at a time. | 10 | + +--------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Hudi Custom Config | Customized configuration related to Hudi. | ``-`` | + +--------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Table Info | Detailed configuration information about the synchronization table. Hudi and DWS must have the same table names and field types. 
| {"table1":[{"source.database":"base1","source.tablename":"table1"}],"table2":[{"source.database":"base2","source.tablename":"table2"}],"table3":[{"source.database":"base3","source.tablename":"table3"}]} | + +--------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Execution Env | Environment variable required for running the Hudi App. If no ENV is available, manually create one. | defaultEnv | + +--------------------+----------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + .. table:: **Table 4** Source Kafka job parameters + + ========= ================== ============= + Parameter Description Example Value + ========= ================== ============= + Link Created Kafka link kafkalink + ========= ================== ============= + + .. table:: **Table 5** thirdparty-kafka job parameters + + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +=======================+=================================================================================================================================================================================================================================+=======================+ + | Link | Created thirdparty-kafka link | thirdparty-kafkalink | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | DB Name | Name of the database to be connected to. | opengaussdb | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Schema | Schema of the database to be checked | oprngaussschema | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Datastore Type | Type of the upper-layer source. Value options are as follows: | opengauss | + | | | | + | | - opengauss | | + | | - ogg | | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Avro Schema Topic | Schema topic used by OGG Kafka to store table schemas in JSON format. 
| ogg_topic | + | | | | + | | .. note:: | | + | | | | + | | This parameter is available when **Datastore Type** is set to **ogg**. | | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Source Topics | Source topics can contain letters, digits, and special characters (-,_). Topics must be separated by commas (,). | topic1 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Tasks Max | Maximum number of tasks that can be created by a connector. For a connector of the database type, this parameter must be set to **1**. | 10 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Tolerance | Fault tolerance policy. | all | + | | | | + | | - **none**: indicates low tolerance and the Connector task will fail if an error occurs. | | + | | - **all**: indicates high tolerance and all failed records will be ignored if an error occurs. | | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Start Time | Start time for synchronizing tables | 2022/03/16 14:14:50 | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Multi Partition | Whether to enable multi-partitioning for topics. If it is enabled, you need to set **Topic Table Mapping** and specify the number of topic partitions, and the data of a single table will be scattered in multiple partitions. | No | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Topic Table Mapping | Mapping between topics and tables. | testtable | + | | | | + | | If configured, table data can be sent to the specified topic. If multi-partitioning is enabled, you need to set the number of partitions, which must be greater than 1. | testtable_topic | + +-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + + .. _mrs_01_24239__table12483144010172: + + .. 
table:: **Table 6** Sink Hudi job parameters + + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +=========================================================================+=============================================================================================================================================================================================================================+=======================+ + | Link | Created Hudi link. | hudilink | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Path | Path for storing data. | /cdldata | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Interval | Spark RDD execution interval, in seconds. | 1 | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Max Rate Per Partition | Maximum rate for reading data from each Kafka partition using the Kafka direct stream API. It is the number of records per second. **0** indicates that the rate is not limited. | 0 | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parallelism | Parallelism for writing data to Hudi. | 100 | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Target Hive Database | Database of the target Hive | default | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Configuring Hudi Table Attributes | View for configuring attributes of the Hudi table. 
The value can be: | Visual View | + | | | | + | | - Visual View | | + | | - JSON View | | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Global Configuration of Hudi Table Attributes | Global parameters on Hudi. | ``-`` | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Configuring the Attributes of the Hudi Table | Configuration of the Hudi table attributes. | ``-`` | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Configuring the Attributes of the Hudi Table: Table Name | Hudi table name, which must be the same as the source table name. | ``-`` | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Configuring the Attributes of the Hudi Table: Table Type Opt Key | Hudi table type. The options are as follows: | MERGE_ON_READ | + | | | | + | | - COPY_ON_WRITE | | + | | - MERGE_ON_READ | | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Configuring the Attributes of the Hudi Table: Hudi TableName Mapping | Hudi table name. If this parameter is not set, the name of the Hudi table is the same as that of the source table by default. | ``-`` | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Configuring the Attributes of the Hudi Table: Hive TableName Mapping | Mapping between Hudi tables and Hive tables. 
| ``-`` | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Configuring the Attributes of the Hudi Table: Table Primarykey Mapping | Primary key mapping of the Hudi table | id | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Configuring the Attributes of the Hudi Table: Table Hudi Partition Type | Mapping between the Hudi table and partition fields. If the Hudi table uses partitioned tables, you need to configure the mapping between the table name and partition fields. The value can be **time** or **customized**. | time | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Configuring the Attributes of the Hudi Table: Custom Config | Custom configuration | ``-`` | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Execution Env | Environment variable required for running the Hudi App. If no ENV is available, create one by referring to :ref:`Managing ENV `. | defaultEnv | + +-------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + + .. table:: **Table 7** Sink Kafka job parameters + + ========= ================== ============= + Parameter Description Example Value + ========= ================== ============= + Link Created Kafka link kafkalink + ========= ================== ============= + + .. table:: **Table 8** DWS job parameters + + +-------------------+-------------------------------------------------------------------+---------------+ + | Parameter | Description | Example Value | + +===================+===================================================================+===============+ + | Link | Link used by Connector | dwslink | + +-------------------+-------------------------------------------------------------------+---------------+ + | Query Timeout | Timeout interval for connecting to DWS, in milliseconds | 180000 | + +-------------------+-------------------------------------------------------------------+---------------+ + | Batch Size | Amount of data batch written to DWS | 50 | + +-------------------+-------------------------------------------------------------------+---------------+ + | Sink Task Number | Maximum number of concurrent jobs when a table is written to DWS. 
| ``-`` | + +-------------------+-------------------------------------------------------------------+---------------+ + | DWS Custom Config | Custom configuration | ``-`` | + +-------------------+-------------------------------------------------------------------+---------------+ + + Table 11 ClickHouse job parameters + + +-----------------------+-------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +=======================+=============================================================================================================+=======================+ + | Link | Link used by Connector | dwslink | + +-----------------------+-------------------------------------------------------------------------------------------------------------+-----------------------+ + | Query Timeout | Timeout interval for connecting to ClickHouse, in milliseconds | 60000 | + +-----------------------+-------------------------------------------------------------------------------------------------------------+-----------------------+ + | Batch Size | Amount of data batch written to ClickHouse | 100000 | + | | | | + | | .. note:: | | + | | | | + | | It is best practice to set this parameter to a large value. The recommended value range is 10000-100000. | | + +-----------------------+-------------------------------------------------------------------------------------------------------------+-----------------------+ + +#. After the job parameters are configured, drag the two icons to associate the job parameters and click **Save**. The job configuration is complete. + + |image11| + +#. In the job list on the **Job Management** page, locate the created jobs, click **Start** in the **Operation** column, and wait until the jobs are started. + + Check whether the data transmission takes effect, for example, insert data into the table in the MySQL database and view the content of the file imported to Hudi. + +.. |image1| image:: /_static/images/en-us_image_0000001582952073.png +.. |image2| image:: /_static/images/en-us_image_0000001583151845.png +.. |image3| image:: /_static/images/en-us_image_0000001583391841.png +.. |image4| image:: /_static/images/en-us_image_0000001583272145.png +.. |image5| image:: /_static/images/en-us_image_0000001532632184.png +.. |image6| image:: /_static/images/en-us_image_0000001532472712.png +.. |image7| image:: /_static/images/en-us_image_0000001532791932.png +.. |image8| image:: /_static/images/en-us_image_0000001532951868.png +.. |image9| image:: /_static/images/en-us_image_0000001583151849.png +.. |image10| image:: /_static/images/en-us_image_0000001583391845.png +.. |image11| image:: /_static/images/en-us_image_0000001532951876.png diff --git a/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/index.rst b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/index.rst new file mode 100644 index 0000000..3e2733a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_job/index.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_24774.html + +.. _mrs_01_24774: + +Creating a CDL Job +================== + +- :ref:`Creating a CDL Data Synchronization Job ` +- :ref:`Creating a CDL Data Comparison Job ` +- :ref:`Common CDL Jobs ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + creating_a_cdl_data_synchronization_job + creating_a_cdl_data_comparison_job + common_cdl_jobs/index diff --git a/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_user.rst b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_user.rst new file mode 100644 index 0000000..0edb810 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/creating_a_cdl_user.rst @@ -0,0 +1,39 @@ +:original_name: mrs_01_24234.html + +.. _mrs_01_24234: + +Creating a CDL User +=================== + +Scenario +-------- + +Before using the CDL service, a cluster administrator needs to create a user and grant operation permissions to the user to meet service requirements. + +CDL users are classified into administrators and common users. The default CDL user group for administrators and common users is **cdladmin** and **cdl**, respectively. + +- Users associated with the **cdladmin** user group can perform any CDL operations. +- Users associated with the **cdl** user group can perform creation and query operations on CDL. + +If Ranger authentication is enabled and you need to configure the creation, execution, query, or deletion permission for CDL users, see :ref:`Adding a Ranger Access Permission Policy for CDL `. + +If Ranger authentication is manually disabled for a cluster, enable Ranger authentication by referring to :ref:`Enabling Ranger Authentication `. + +.. note:: + + This section applies only to clusters with Kerberos authentication enabled. + +Procedure +--------- + +#. Log in to FusionInsight Manager. +#. Choose **System**. On the navigation pane on the left, choose **Permission** > **User** and click **Create**. +#. Enter a username, for example, **cdl_admin**. +#. Set **User Type** to **Human-Machine**. +#. Set **Password** and confirm your password. +#. Set **User Group** and **Primary Group**. + + - CDL administrator permissions: Add the **cdladmin** user group and set it to the primary group. + - Common CDL user permissions: Add the **cdl** user group and set it to the primary group. + +#. Click **OK**. diff --git a/doc/component-operation-guide-lts/source/using_cdl/index.rst b/doc/component-operation-guide-lts/source/using_cdl/index.rst new file mode 100644 index 0000000..cb81aac --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/index.rst @@ -0,0 +1,26 @@ +:original_name: mrs_01_24123.html + +.. _mrs_01_24123: + +Using CDL +========= + +- :ref:`CDL Usage Instructions ` +- :ref:`Using CDL from Scratch ` +- :ref:`Creating a CDL User ` +- :ref:`Preparing for Creating a CDL Job ` +- :ref:`Creating a CDL Job ` +- :ref:`CDL Log Overview ` +- :ref:`CDL FAQs ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + cdl_usage_instructions + using_cdl_from_scratch + creating_a_cdl_user + preparing_for_creating_a_cdl_job/index + creating_a_cdl_job/index + cdl_log_overview + cdl_faqs/index diff --git a/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/configuring_heartbeat_and_data_consistency_check_for_a_synchronization_task.rst b/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/configuring_heartbeat_and_data_consistency_check_for_a_synchronization_task.rst new file mode 100644 index 0000000..6f945ac --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/configuring_heartbeat_and_data_consistency_check_for_a_synchronization_task.rst @@ -0,0 +1,130 @@ +:original_name: mrs_01_24811.html + +.. 
_mrs_01_24811: + +Configuring Heartbeat and Data Consistency Check for a Synchronization Task +=========================================================================== + +Scenario +-------- + +The heartbeat and data consistency check function is used to collect full-link information about CDL synchronization tasks, including the time required for sending data from the database management system RDBMS to Kafka, the time required for writing data from Kafka to Hudi, and the number of data records, and writes the data to a specific topic (cdl_snapshot_topic). You can consume the data in the topic and write the data to a specific Hudi table for data consistency check. The heartbeat data can be used not only to determine whether data before the heartbeat time has been synchronized to the data lake, but also to determine the data latency based on the transaction time, Kafka write time, data import start time, and data import end time. + +In addition, for PgSQL tasks, configuring a heartbeat table can periodically push forward the LSN information recorded by the slot in the PgSQL database. This prevents database log stacking caused by the configuration of some tables with little changes in a task. + +Configuring the Heartbeat Table for Capturing Data from Oracle GoldenGate (OGG) to a Hudi Job +--------------------------------------------------------------------------------------------- + +#. Run the following commands in the Oracle database where data needs to be synchronized to create a heartbeat table. The heartbeat table belongs to the **CDC_CDL** schema, the table name is **CDC_HEARTBEAT**, and the primary key is **CDL_JOB_ID**. + + **CREATE TABLE "CDC_CDL"."CDC_HEARTBEAT" (** + + **"CDL_JOB_ID" VARCHAR(22) PRIMARY KEY,** + + **"CDL_LAST_HEARTBEAT" TIMESTAMP,** + + **SUPPLEMENTAL LOG DATA (ALL) COLUMNS** + + **);** + +#. Add the **CDC_HEARTBEAT** table to the OGG job to ensure that heartbeat data can be properly sent to Kafka. + +#. Configure the thirdparty-kafka (ogg) link on the CDL web UI and add the Oracle link information. + + |image1| + +#. After the configuration is complete, create a job for capturing data from OGG to Hudi on the CDL web UI and start the job to receive heartbeat data. + +Configuring the Heartbeat Table for Capturing Data from PostgreSQL to a Hudi Job +-------------------------------------------------------------------------------- + +#. Run the following commands in the PostgreSQL database to be synchronized to create a heartbeat table. The heartbeat table belongs to the **cdc_cdl** schema, the table name is **cdc_heartbeat**, and the primary key is **cdl_job_id**. + + **DROP TABLE IF EXISTS cdc_cdl.cdc_heartbeat;** + + **CREATE TABLE cdc_cdl.cdc_heartbeat (** + + **cdl_job_id int8 NOT NULL,** + + **cdl_last_heartbeat timestamp(6)** + + **);** + + **ALTER TABLE cdc_cdl.cdc_heartbeat ADD CONSTRAINT cdc_heartbeat_pkey PRIMARY KEY (cdl_job_id);** + +#. After the heartbeat table is created, create a job for capturing data from PostgreSQL to Hudi on the CDL web UI and start the job to receive heartbeat data. + +Configuring the Heartbeat Table from openGauss to a Hudi Job +------------------------------------------------------------ + +#. Run the following commands in the openGauss database to be synchronized to create a heartbeat table. The heartbeat table belongs to the **cdc_cdl** schema, the table name is **cdc_heartbeat**, and the primary key is **cdl_job_id**. 
+ + **DROP TABLE IF EXISTS cdc_cdl.cdc_heartbeat;** + + **CREATE TABLE cdc_cdl.cdc_heartbeat (** + + **cdl_job_id int8 NOT NULL,** + + **cdl_last_heartbeat timestamp(6)** + + **);** + + **ALTER TABLE cdc_cdl.cdc_heartbeat ADD CONSTRAINT cdc_heartbeat_pkey PRIMARY KEY (cdl_job_id);** + +#. Add the heartbeat table to the DRS job to ensure that the heartbeat table data is properly sent to the DRS Kafka. + +#. On the CDL web UI, add the openGauss link information when configuring the thirdparty-kafka link of openGauss. If one primary openGauss node and multiple standby openGauss nodes are deployed, enter all IP addresses in **Host**. + + |image2| + +#. After the configuration is complete, create a job for capturing data from thirdparty-kafka to Hudi on the CDL web UI and start the job to receive heartbeat data. + +Fields in a Data Consistency Check Message +------------------------------------------ + +.. table:: **Table 1** Fields in a data consistency check message + + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | Field | Description | + +========================+===============================================================================================================================================+ + | cdl_job_name | The name of the synchronization task to which the data in this batch belongs. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | target_table_schema | The name of the schema to which the data in this batch is written. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | target_table_name | The name of the Hudi table to which the data in this batch is written. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | target_table_path | The path of the Hudi table to which the data in this batch is saved. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | total_num | The total number of data records in this batch. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | cdl_original_heartbeat | The maximum duration of heartbeat data in this batch. If this batch does not contain heartbeat data, the value is empty. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | cdl_last_heartbeat | The minimum duration of heartbeat data in this batch. If this batch does not contain heartbeat data, the value of **event_time_min** is used. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | insert_num | The total number of data **insert** events in this batch. 
| + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | update_num | The total number of data **update** events in this batch. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | delete_num | The total number of data **delete** events in this batch. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | event_time_min | The minimum transaction submission time of the data source in this batch. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | event_time_max | The maximum transaction submission time of the data source in this batch. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | event_time_avg | The average transaction submission time of the data source in this batch. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka_timestamp_min | The minimum time for sending data in this batch to Kafka. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka_timestamp_max | The maximum time for sending data in this batch to Kafka. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | begin_time | The time when the data in this batch starts to be written to Hudi. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | end_time | The time when the data in this batch stops to be written to Hudi. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | cdc_partitioned_time | The time partition field in the heartbeat table. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | cdc_last_update_date | The time when the check record is written. | + +------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + +.. |image1| image:: /_static/images/en-us_image_0000001583391837.png +.. 
|image2| image:: /_static/images/en-us_image_0000001583272137.png diff --git a/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/creating_a_database_link.rst b/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/creating_a_database_link.rst new file mode 100644 index 0000000..17365ae --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/creating_a_database_link.rst @@ -0,0 +1,255 @@ +:original_name: mrs_01_24238.html + +.. _mrs_01_24238: + +Creating a Database Link +======================== + +Scenario +-------- + +Create a database link on the CDLService web UI. + +Prerequisites +------------- + +- You have obtained the driver JAR package of the data to be connected. +- A user with the CDL management permission has been created for the cluster with Kerberos authentication enabled. + +Procedure +--------- + +#. Access the CDLService web UI as a user with the CDL management permissions or the **admin** user (for the cluster where Kerberos authentication is not enabled). For details, see :ref:`Logging In to the CDLService WebUI `. + +#. Choose **Link Management** and click **Add Link**. In the displayed dialog box, enter the link name (cannot be the same as an existing one) and select the link type. + +#. Set other link parameters based on the link type. + + .. table:: **Table 1** MySQL data link parameters + + +-------------+----------------------------------------------------------------+---------------------------------+ + | Parameter | Description | Example Value | + +=============+================================================================+=================================+ + | Link Type | Link type | mysql | + +-------------+----------------------------------------------------------------+---------------------------------+ + | Name | Link name | mysqllink | + +-------------+----------------------------------------------------------------+---------------------------------+ + | DB driver | Uploaded MySQL driver file **mysql-connector-java-8.0.24.jar** | mysql-connector-java-8.0.24.jar | + +-------------+----------------------------------------------------------------+---------------------------------+ + | Host | IP address of the MySQL database | 10.10.10.10 | + +-------------+----------------------------------------------------------------+---------------------------------+ + | Port | MySQL database port | 3306 | + +-------------+----------------------------------------------------------------+---------------------------------+ + | User | User for accessing the MySQL database | user | + +-------------+----------------------------------------------------------------+---------------------------------+ + | Password | Password for accessing the MySQL database | *Password of the user user* | + +-------------+----------------------------------------------------------------+---------------------------------+ + | Description | Data link description. | xxx | + +-------------+----------------------------------------------------------------+---------------------------------+ + + .. 
table:: **Table 2** PgSQL data link parameters + + +-------------+-------------------------------------------+-----------------------------+ + | Parameter | Description | Example Value | + +=============+===========================================+=============================+ + | Link Type | Link type | pgsql | + +-------------+-------------------------------------------+-----------------------------+ + | Name | Link name | pgsqllink | + +-------------+-------------------------------------------+-----------------------------+ + | Host | IP address of the PgSQL database | 10.10.10.10 | + +-------------+-------------------------------------------+-----------------------------+ + | Port | PgSQL database port | 5432 | + +-------------+-------------------------------------------+-----------------------------+ + | DB Name | PgSQL database name | testDB | + +-------------+-------------------------------------------+-----------------------------+ + | User | User for accessing the PgSQL database | user | + +-------------+-------------------------------------------+-----------------------------+ + | Password | Password for accessing the PgSQL database | *Password of the user user* | + +-------------+-------------------------------------------+-----------------------------+ + | Description | Data link description | xxx | + +-------------+-------------------------------------------+-----------------------------+ + + .. table:: **Table 3** Kafka data link parameters + + =========== ====================== ============= + Parameter Description Example Value + =========== ====================== ============= + Link Type Link type. kafka + Name Link name. kafkalink + Description Data link description. ``-`` + =========== ====================== ============= + + .. table:: **Table 4** Hudi data link parameters + + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=======================+=====================================================================================================================================================================================================================================+=======================================================================================================+ + | Link Type | Link type. | hudi | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+ + | Name | Link name. | hudilink | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+ + | Storage Type | Storage type, which can be either of the following: | hdfs | + | | | | + | | **hdfs**: Data is stored in HDFS. 
| | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+ + | Auth KeytabFile | Keytab file of a user. You can click **Upload File** to upload the keytab file. | ${BIGDATA_HOME}/FusionInsight_CDL\_\ *X.X.X*/install/FusionInsight-CDL-*X.X.X*/cdl/keytabs/cdl.keytab | + | | | | + | | Set this parameter only for a cluster in security mode. | | + | | | | + | | .. note:: | | + | | | | + | | To obtain this file, log in to FusionInsight Manager and choose **System**. On the navigation pane on the left, choose **Permission** > **User** and choose **More** > **Download User Credential** in the **Operation** column. | | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+ + | Principal | Domain name of the user who accesses HDFS. | cdl/test.com@HADOOP.COM | + | | | | + | | Set this parameter only for a cluster in security mode. | | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+ + | Description | Data link description. | xxx | + +-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+ + + .. 
table:: **Table 5** thirdparty-kafka data link parameters + + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | Parameter | Description | Example Value | + +=========================+=========================================================================================================================================================================================================================================================================+======================================+ + | Link Type | Link type | thirdparty-kafka | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | Name | Link name | thirdparty-kafkalink | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | Bootstrap Servers | Kafka proxy instance, which can be set to a value in the format of *Service IP address of the Kafka Broker instance*\ **:**\ *Kafka port number*. | 10.10.10.10:21005 | + | | | | + | | .. note:: | | + | | | | + | | If MRS Kafka is used as the source of thirdparty-kafka, log in to FusionInsight Manager, choose **Cluster** > **Services** > **Kafka**, click **Configuration**, search for the port in the search box, and obtain the port number based on the encryption protocol. | | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | Security Protocol | Encryption protocol. Value options are as follows: | SASL_SSL | + | | | | + | | - **SASL_PLAINTEXT** | | + | | | | + | | - **PLAINTEXT** | | + | | - **SASL_SSL** | | + | | - **SSL** | | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | Username | Username specified when **SASL_SSL** is enabled during instance creation | test | + | | | | + | | .. note:: | | + | | | | + | | IThis parameter is available only when **Security Protocol** is set to **SASL_PLAINTEXT** or **SASL_SSL**. 
| | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | Password | Password configured when **SASL_SSL** is enabled during instance creation | xxx | + | | | | + | | .. note:: | | + | | | | + | | This parameter is available only when **Security Protocol** is set to **SASL_PLAINTEXT** or **SASL_SSL**. | | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | SSL Truststore Location | Path where the **client.truststore.jks** authentication file is stored | ``-`` | + | | | | + | | .. note:: | | + | | | | + | | This parameter is available only when **Security Protocol** is set to **SASL_SSL** or **SSL**. | | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | SSL Truststore Password | Password of the **client.truststore.jks** certificate file | xxx | + | | | | + | | .. note:: | | + | | | | + | | This parameter is available only when **Security Protocol** is set to **SASL_SSL** or **SSL**. | | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | Datastore Type | Type of the upper-layer source. Value options are as follows: | opengauss | + | | | | + | | - opengauss | | + | | | | + | | - ogg | | + | | - oracle | | + | | - drs-avro-oracle | | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | DB driver | Uploaded thirdparty-kafka driver file | ``-`` | + | | | | + | | .. note:: | | + | | | | + | | This parameter is displayed when **Datastore Type** is set to **ogg**. 
| | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | Host | IP address of the thirdparty-kafka database | 11.11.xxx.xxx,12.12.xxx.xxx | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | Port | thirdparty-kafka database port | 8000 | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | DB Name | thirdparty-kafka database name | opengaussdb | + | | | | + | | .. note:: | | + | | | | + | | This parameter is displayed when **Datastore Type** is set to **opengauss**. | | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | User | thirdparty-kafka database access user | opengaussuser | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | DB Password | Password for accessing the thirdparty-kafka database | *Password of the opengaussuser user* | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | Sid | Service ID of Oracle | ``-`` | + | | | | + | | .. note:: | | + | | | | + | | This parameter is displayed when **Datastore Type** is set to **ogg**. | | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + | Description | Data link description. | ``-`` | + +-------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+ + + .. 
table:: **Table 6** DWS data link parameters + + =========== ============================================== ============= + Parameter Description Example Value + =========== ============================================== ============= + Link Type Link type dws + Name Link name dwslink + Host IP address of the DWS database to be connected 10.10.10.10 + Port Database port 8000 + DB Name Name of the database to be connected to default + User Database access user test + Password Password for accessing the database xxx + Description Data link description. ``-`` + =========== ============================================== ============= + + .. table:: **Table 7** opengauss data link parameters + + +-------------+------------------------------------------------------+---------------+ + | Parameter | Description | Example Value | + +=============+======================================================+===============+ + | Link Type | Link type | opengauss | + +-------------+------------------------------------------------------+---------------+ + | Name | Link name | opengausslink | + +-------------+------------------------------------------------------+---------------+ + | Host | IP address of the opengauss database to be connected | 10.10.10.10 | + +-------------+------------------------------------------------------+---------------+ + | Port | Database port | 8000 | + +-------------+------------------------------------------------------+---------------+ + | DB Name | Name of the database to be connected to | default | + +-------------+------------------------------------------------------+---------------+ + | User | Database access user | test | + +-------------+------------------------------------------------------+---------------+ + | Password | Password for accessing the database | xxx | + +-------------+------------------------------------------------------+---------------+ + | Description | Data link description. | ``-`` | + +-------------+------------------------------------------------------+---------------+ + + .. table:: **Table 8** ClickHouse data link parameters + + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Parameter | Description | Example Value | + +=======================+====================================================================================================================================================================================================================================+=======================+ + | Link Type | Link type | dws | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Name | Link name | clickhouselink | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Host | *Service IP address of the ClickHouseBalancer instance of ClickHouse:HTTP balancer port number*. Multiple ClickHouse instances can be connected using commas (,). 
| 10.10.10.10:21428 | + | | | | + | | .. note:: | | + | | | | + | | To obtain the HTTP balancer port number, log in to Manager, choose **Cluster** > **Services** > **ClickHouse**, click **Logic Cluster**, and obtain the port number from the **HTTP Balancer Port** column in the cluster list. | | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | User | Database access user | test | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Password | Password for accessing the database | xxx | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | Description | Data link description. | ``-`` | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + +#. After the parameters are configured, click **Test** to check whether the data link is normal. + + After the test is successful, click **OK**. diff --git a/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/index.rst b/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/index.rst new file mode 100644 index 0000000..2e9e0ea --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/index.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_24235.html + +.. _mrs_01_24235: + +Preparing for Creating a CDL Job +================================ + +- :ref:`Logging In to the CDLService WebUI ` +- :ref:`Uploading a Driver File ` +- :ref:`Creating a Database Link ` +- :ref:`Managing ENV ` +- :ref:`Configuring Heartbeat and Data Consistency Check for a Synchronization Task ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + logging_in_to_the_cdlservice_webui + uploading_a_driver_file + creating_a_database_link + managing_env + configuring_heartbeat_and_data_consistency_check_for_a_synchronization_task diff --git a/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/logging_in_to_the_cdlservice_webui.rst b/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/logging_in_to_the_cdlservice_webui.rst new file mode 100644 index 0000000..8390506 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/logging_in_to_the_cdlservice_webui.rst @@ -0,0 +1,39 @@ +:original_name: mrs_01_24236.html + +.. _mrs_01_24236: + +Logging In to the CDLService WebUI +================================== + +Scenario +-------- + +After CDL is installed in an MRS cluster, you can manage data connections and visualized jobs using the CDL web UI. + +This section describes how to access the CDL web UI in an MRS cluster. + +.. 
note:: + + You are advised to use Google Chrome to access the CDLService web UI because it is incompatible with Internet Explorer. + + CDL cannot fetch tables whose names contain the dollar sign ($) and special characters. + +Prerequisites +------------- + +- The CDL component has been installed in an MRS cluster and is running properly. +- A user with the CDL management permission has been created for the cluster with Kerberos authentication enabled. + +Procedure +--------- + +#. Log in to FusionInsight Manager as a user with the CDL management permissions or the **admin** user (for the cluster where Kerberos authentication is not enabled), and choose **Cluster** > **Services** > **CDL**. + +#. On the right of **CDLService UI**, click the link to access the CDLService web UI. + + You can perform the following operations on the CDL web UI: + + - **Driver Management**: You can upload, view, and delete a driver file corresponding to the connected database. + - **Link Management**: You can create, view, edit, and delete a data connection. + - **Job Management**: You can create, view, start, pause, restore, stop, restart, delete, or edit a job. + - **ENV Management**: You can create, view, edit, and delete Hudi environment variables. diff --git a/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/managing_env.rst b/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/managing_env.rst new file mode 100644 index 0000000..6c80ab8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/managing_env.rst @@ -0,0 +1,49 @@ +:original_name: mrs_01_24255.html + +.. _mrs_01_24255: + +Managing ENV +============ + +Scenario +-------- + +To capture data to or from Hudi, create and manage Hudi environment variables by performing the operations in this section. + +Prerequisites +------------- + +A user with the CDL management permission has been created for the cluster with Kerberos authentication enabled. + +Procedure +--------- + +#. Access the CDLService web UI as a user with the CDL management permissions or the **admin** user (for the cluster where Kerberos authentication is not enabled). For details, see :ref:`Logging In to the CDLService WebUI `. + +#. Choose **ENV Management** and click **Add Env**. In the displayed dialog box, set related parameters. + + .. table:: **Table 1** Parameters for adding an ENV + + +------------------+------------------------------------------------------------------------------------------------------------+---------------+ + | Parameter | Description | Example Value | + +==================+============================================================================================================+===============+ + | Name | ENV name | spark-env | + +------------------+------------------------------------------------------------------------------------------------------------+---------------+ + | Type | ENV type | spark | + +------------------+------------------------------------------------------------------------------------------------------------+---------------+ + | Driver Memory | Memory for the driver process, in GB ( by default). | 1 GB | + +------------------+------------------------------------------------------------------------------------------------------------+---------------+ + | Executor Memory | Memory size for each Executor process, in GB by default. Its string format is the same as that of JVM. 
| 1 GB | + +------------------+------------------------------------------------------------------------------------------------------------+---------------+ + | Executor Cores | Number of CPU cores occupied by each Executor | 1 | + +------------------+------------------------------------------------------------------------------------------------------------+---------------+ + | Number Executors | Number of Executors | 1 | + +------------------+------------------------------------------------------------------------------------------------------------+---------------+ + | Queue | Name of the Yarn tenant queue. Jobs are submitted to the default queue if this parameter is not specified. | ``-`` | + +------------------+------------------------------------------------------------------------------------------------------------+---------------+ + | Description | ENV description | ``-`` | + +------------------+------------------------------------------------------------------------------------------------------------+---------------+ + +#. Click **OK**. + + After the ENV is created, you can click **Edit** or **Delete** in the **Operation** column to edit or delete the ENV, respectively. diff --git a/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/uploading_a_driver_file.rst b/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/uploading_a_driver_file.rst new file mode 100644 index 0000000..66ef637 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/preparing_for_creating_a_cdl_job/uploading_a_driver_file.rst @@ -0,0 +1,30 @@ +:original_name: mrs_01_24237.html + +.. _mrs_01_24237: + +Uploading a Driver File +======================= + +Scenario +-------- + +CDL is a simple and efficient real-time data integration service. It captures events from various OLTP databases and pushes them to Kafka. When creating a database connection on the CDLService Web UI, you can upload the database's driver file to the Web UI for unified management. + +Prerequisites +------------- + +- You have obtained the driver JAR package of the database to be connected. The driver JAR package of the MySQL database is **mysql-connector-java-8.0.24.jar**. +- Drivers need to be uploaded only for the MySQL data sources. +- A user with the CDL management permission has been created for the cluster with Kerberos authentication enabled. + +Procedure +--------- + +#. Access the CDLService web UI as a user with the CDL management permissions or the **admin** user (for the cluster where Kerberos authentication is not enabled). For details, see :ref:`Logging In to the CDLService WebUI `. +#. Choose **Driver Management** and click **Upload Driver**. In the displayed dialog box, select the prepared database driver file and click **Open**. +#. On the **Driver Management** page, check whether the list of driver file names is displayed properly. + + .. note:: + + - If a driver is no longer used or is mistakenly uploaded, click **Delete** to delete its driver file. + - If there are a large number of driver files, you can enter a driver file name in the search box to quickly search for the desired driver file. 
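+
+For reference, the driver file can be staged on any node before it is uploaded on the **Driver Management** page. The sketch below is illustrative only and is not part of the CDL procedure: the Maven Central URL and the checksum step are assumptions and depend on how your site distributes the driver file.
+
+.. code-block::
+
+   # Illustrative only: download MySQL Connector/J 8.0.24 and record its checksum
+   # before uploading the file on the Driver Management page.
+   wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.24/mysql-connector-java-8.0.24.jar
+   sha256sum mysql-connector-java-8.0.24.jar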
diff --git a/doc/component-operation-guide-lts/source/using_cdl/using_cdl_from_scratch.rst b/doc/component-operation-guide-lts/source/using_cdl/using_cdl_from_scratch.rst new file mode 100644 index 0000000..199ef61 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_cdl/using_cdl_from_scratch.rst @@ -0,0 +1,123 @@ +:original_name: mrs_01_24232.html + +.. _mrs_01_24232: + +Using CDL from Scratch +====================== + +CDL supports data synchronization or comparison tasks in multiple scenarios. This section describes how to import data from PgSQL to Kafka on the CDLService WebUI of a cluster with Kerberos authentication enabled. + +Prerequisites +------------- + +- The CDL and Kafka services have been installed in a cluster and are running properly. +- Write-ahead logging is enabled for the PostgreSQL database. For details, see :ref:`Policy for Modifying Write-Ahead Logs in PostgreSQL Databases `. +- You have created a human-machine user, for example, **cdluser**, added the user to user groups **cdladmin** (primary group), **hadoop**, and **kafka**, and associated the user with the **System_administrator** role on FusionInsight Manager. + +Procedure +--------- + +#. Log in to FusionInsight Manager as user **cdluser** (change the password upon the first login) and choose **Cluster** > **Services** > **CDL**. On the **Dashboard** page, click the hyperlink next to **CDLService UI** to go to the native CDL page. + +#. Choose **Link Management** and click **Add Link**. On the displayed dialog box, set parameters for adding the **pgsql** and **kafka** links by referring to the following tables. + + .. table:: **Table 1** PgSQL data link parameters + + =========== =========================== + Parameter Example Value + =========== =========================== + Link Type pgsql + Name pgsqllink + Host 10.10.10.10 + Port 5432 + DB Name testDB + User user + Password *Password of the user user* + Description ``-`` + =========== =========================== + + .. table:: **Table 2** Kafka data link parameters + + =========== ============= + Parameter Example Value + =========== ============= + Link Type kafka + Name kafkalink + Description ``-`` + =========== ============= + +#. After the parameters are configured, click **Test** to check whether the data link is normal. + + After the test is successful, click **OK**. + +#. .. _mrs_01_24232__li8419191320242: + + On the **Job Management** page, click **Add Job**. In the displayed dialog box, configure the parameters and click **Next**. + + Specifically: + + ========= ================ + Parameter Example Value + ========= ================ + Name job_pgsqltokafka + Desc xxx + ========= ================ + +#. Configure PgSQL job parameters. + + a. On the **Job Management** page, drag the **pgsql** icon on the left to the editing area on the right and double-click the icon to go to the PgSQL job configuration page. + + .. table:: **Table 3** PgSQL job parameters + + ===================== ========================== + Parameter Example Value + ===================== ========================== + Link pgsqllink + Tasks Max 1 + Mode insert, update, and delete + Schema public + dbName Alias cdc + Slot Name a4545sad + Slot Drop No + Connect With Hudi No + Use Exist Publication Yes + Publication Name test + ===================== ========================== + + b. Click the plus sign (+) to display more parameters. + + |image1| + + .. note:: + + - **WhiteList**: Enter the name of the table in the database, for example, **myclass**. 
+ - **Topic Table Mapping**: In the first text box, enter a topic name (the value must be different from that of **Name** in :ref:`4 `), for example, **myclass_topic**. In the second text box, enter a table name, for example, **myclass**. The table name must be in a one-to-one mapping with the topic name entered in the first text box. + + c. Click **OK**. The PgSQL job parameters are configured. + +#. Configure Kafka job parameters. + + a. On the **Job Management** page, drag the **kafka** icon on the left to the editing area on the right and double-click the icon to go to the Kafka job configuration page. Configure parameters based on :ref:`Table 4 `. + + .. _mrs_01_24232__table8128935153416: + + .. table:: **Table 4** Kafka job parameters + + ========= ============= + Parameter Example Value + ========= ============= + Link kafkalink + ========= ============= + + b. Click **OK**. + +#. After the job parameters are configured, drag the two icons to associate the job parameters and click **Save**. The job configuration is complete. + + |image2| + +#. In the job list on the **Job Management** page, locate the created jobs, click **Start** in the **Operation** column, and wait until the jobs are started. + + To check whether the data transmission takes effect, insert data into the table in the PgSQL database and then go to the Kafka UI to check whether data is generated in the Kafka topic by referring to :ref:`Managing Topics on Kafka UI `. + +.. |image1| image:: /_static/images/en-us_image_0000001532951860.png +.. |image2| image:: /_static/images/en-us_image_0000001583151841.png diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/an_error_is_reported_in_logs_when_the_auxiliary_zookeeper_or_replica_data_is_used_to_synchronize_table_data.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/an_error_is_reported_in_logs_when_the_auxiliary_zookeeper_or_replica_data_is_used_to_synchronize_table_data.rst new file mode 100644 index 0000000..84bb32b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/an_error_is_reported_in_logs_when_the_auxiliary_zookeeper_or_replica_data_is_used_to_synchronize_table_data.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_24837.html + +.. _mrs_01_24837: + +An Error Is Reported in Logs When the Auxiliary ZooKeeper or Replica Data Is Used to Synchronize Table Data +=========================================================================================================== + +Question +-------- + +An error is reported in logs when the auxiliary ZooKeeper or replica data is used to synchronize table data. + +.. code-block:: + + DB::Exception: Cannot parse input: expected 'quorum:' before: 'merge_type: 2'…" and "Too many parts (315). Merges are processing significantly slower than inserts… + +Answer +------ + +The replicas of the replicated table run different ClickHouse versions, which causes compatibility issues. The table schema contains TTL statements, and TTL_DELETE is introduced in ClickHouse versions later than 20.9 and cannot be identified by earlier versions. This issue occurs when a replica running a later version is elected as the leader. + +To avoid this issue, modify the **config.xml** file of the later-version ClickHouse and ensure that all replicas of the replicated table run the same ClickHouse version. 
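+
+To check whether the replica versions differ, you can query the server version reported by each replica, for example, by running the following command on a node where the ClickHouse client is installed. This is a simple sketch: the logical cluster name **default_cluster** and the connection parameters are examples and must be adapted to the actual environment (for a cluster in security mode, use the corresponding port and authentication settings).
+
+.. code-block::
+
+   # Query the ClickHouse version of every replica in the logical cluster.
+   clickhouse client --host <IP address of a ClickHouseServer instance> --port 9000 \
+     --query "SELECT hostName() AS host, version() AS server_version FROM clusterAllReplicas('default_cluster', system.one);"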
diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/how_do_i_do_if_the_disk_status_displayed_in_the_system.disks_table_is_fault_or_abnormal.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/how_do_i_do_if_the_disk_status_displayed_in_the_system.disks_table_is_fault_or_abnormal.rst new file mode 100644 index 0000000..5525a80 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/how_do_i_do_if_the_disk_status_displayed_in_the_system.disks_table_is_fault_or_abnormal.rst @@ -0,0 +1,23 @@ +:original_name: mrs_01_24778.html + +.. _mrs_01_24778: + +How Do I Do If the Disk Status Displayed in the System.disks Table Is fault or abnormal? +======================================================================================== + +Question +-------- + +How do I do if the disk status displayed in the System.disks table is fault or abnormal? + +Answer +------ + +This problem is caused by I/O errors on the disk. To rectify the fault, use either of the following methods: + +- Method 1: Log in to FusionInsight Manager and check whether an alarm indicating abnormal disk I/O is generated. If yes, replace the faulty disk by referring to the alarm help. +- Method 2: Log in to FusionInsight Manager and restart the ClickHouse instance to restore the disk status. + + .. note:: + + If an I/O error occurs but the disk is not replaced, the disk status will change back to fault or abnormal. diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/how_do_i_grant_the_select_permission_at_the_database_level_to_clickhouse_users.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/how_do_i_grant_the_select_permission_at_the_database_level_to_clickhouse_users.rst new file mode 100644 index 0000000..b721050 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/how_do_i_grant_the_select_permission_at_the_database_level_to_clickhouse_users.rst @@ -0,0 +1,65 @@ +:original_name: mrs_01_24846.html + +.. _mrs_01_24846: + +How Do I Grant the Select Permission at the Database Level to ClickHouse Users? +=============================================================================== + +Procedure +--------- + +#. Log in to the node where the ClickHouse client is installed in the MRS cluster and run the following commands: + + **su - omm** + + **source** *{Client installation directory}*\ **/bigdata_env** + + **kinit** *Component user* (You do not need to run the **kinit** command for normal clusters.) + + **clickhouse client --host** *IP address of the ClickHouse node* **--port 9000 -m --user clickhouse --password '**\ *Password of the ClickHouse user*\ **'** + + .. note:: + + To view the password of the ClickHouse user: + + Log in to FusionInsight Manager and choose **Cluster** > **Services** > **ClickHouse**. Click **Instance** and click any ClickHouseServer role name. Go to the **Dashboard** tab page of ClickHouseServer, click the **users.xml** file in the **Configuration File** area, and view the password of the ClickHouse user. + +#. You can use either of the following methods to grant the read-only permission on a specified database: + + Method 1 + + a. Creating a role with the read-only permission for a specified database (the **default** database is used as an example) + + **create role ck_role on cluster default_cluster;** + + **GRANT SELECT ON default.\* TO ck_role on cluster default_cluster;** + + b. 
Creating a common user + + **CREATE USER user_01 on cluster default_cluster IDENTIFIED WITH PLAINTEXT_PASSWORD BY 'password';** + + c. Granting the read-only permission role to a common user + + **GRANT ck_role to user_01 on cluster default_cluster;** + + d. Viewing user permissions + + **show grants for user_01;** + + **select \* from system.grants where role_name = 'ck_role';** + + Method 2 + + Creating a user with the read-only permission for a specified database + + a. Creating a user: + + **CREATE USER user_01 on cluster default_cluster IDENTIFIED WITH PLAINTEXT_PASSWORD BY 'password';** + + b. Granting the query permission on a specified database to the created user: + + **grant select on default.\* to user_01 on cluster default_cluster;** + + c. Querying user permissions: + + **select \* from system.grants where user_name = 'user_01';** diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/how_do_i_migrate_data_from_hive_hdfs_to_clickhouse.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/how_do_i_migrate_data_from_hive_hdfs_to_clickhouse.rst new file mode 100644 index 0000000..2c68acd --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/how_do_i_migrate_data_from_hive_hdfs_to_clickhouse.rst @@ -0,0 +1,24 @@ +:original_name: mrs_01_24831.html + +.. _mrs_01_24831: + +How Do I Migrate Data from Hive/HDFS to ClickHouse? +=================================================== + +Question +-------- + +How do I migrate Hive/HDFS data to ClickHouse? + +Answer +------ + +You can export data from Hive as CSV files and import the CSV files to ClickHouse. + +#. Export data from Hive as CSV files. + + **hive -e "select \* from db_hive.student limit 1000"\| tr "\\t" "," > /data/bigdata/hive/student.csv;** + +#. Import the CSV files to the **student_hive** table in the default database of ClickHouse. + + **clickhouse --client --port 9002 --password password -m --query='INSERT INTO default.student_hive FORMAT CSV' < /data/bigdata/hive/student.csv** diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/index.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/index.rst new file mode 100644 index 0000000..b725fdd --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_faq/index.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_24777.html + +.. _mrs_01_24777: + +ClickHouse FAQ +============== + +- :ref:`How Do I Do If the Disk Status Displayed in the System.disks Table Is fault or abnormal? ` +- :ref:`How Do I Migrate Data from Hive/HDFS to ClickHouse? ` +- :ref:`An Error Is Reported in Logs When the Auxiliary ZooKeeper or Replica Data Is Used to Synchronize Table Data ` +- :ref:`How Do I Grant the Select Permission at the Database Level to ClickHouse Users? ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + how_do_i_do_if_the_disk_status_displayed_in_the_system.disks_table_is_fault_or_abnormal + how_do_i_migrate_data_from_hive_hdfs_to_clickhouse + an_error_is_reported_in_logs_when_the_auxiliary_zookeeper_or_replica_data_is_used_to_synchronize_table_data + how_do_i_grant_the_select_permission_at_the_database_level_to_clickhouse_users diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/clickhouse_multi-tenancy_overview.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/clickhouse_multi-tenancy_overview.rst new file mode 100644 index 0000000..87b770c --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/clickhouse_multi-tenancy_overview.rst @@ -0,0 +1,36 @@ +:original_name: mrs_01_24790.html + +.. _mrs_01_24790: + +ClickHouse Multi-Tenancy Overview +================================= + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +ClickHouse Multi-Tenancy +------------------------ + +The ClickHouse multi-tenancy feature enables you to manage cluster resources through the user > tenant role > resource profile management model. Currently, memory and CPU priority management is supported. The following figure shows a multi-tenancy model. + +|image1| + +On the service configuration and tenant management pages of FusionInsight Manager, you can configure memory quotas for services, create tenants, associate ClickHouse services, bind logical clusters, set available memory and CPU priorities for tenants, and associate tenants with users. The following figure illustrates the role association between Manager and ClickHouse. + +|image2| + +The following table lists the resource configurations supported by the current version. + ++-------------------------------------+-------------+------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Resource | Value Range | Description | Remarks | ++=====================================+=============+==================================================================================================================+==================================================================================================================================================================================================================================+ +| Service-level memory resource limit | 0-1 | Percentage of available ClickHouse memory to total server memory | For example, if the physical memory of the server is 10 GB and the limit is set to **0.9**, the available memory of the ClickHouse service on the current server is 9 GB (10 GB x 0.9). 
| ++-------------------------------------+-------------+------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Tenant-level memory resource limit | 0%-100% | Percentage of the available memory of the current tenant in ClickHouseServer | If this limit is set to **80**, the total memory that can be used by the current tenant is calculated as follows: Total memory that can be used by the service x 80% | ++-------------------------------------+-------------+------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Tenant-level CPU priority | -20 to 19 | NICE value of the OS associated with this value. A smaller value indicates a higher CPU priority of the process. | This feature depends on **CAP_SYS_NICE** of the OS. By default, this feature is disabled after the cluster is installed. To use this feature, enable it by referring to :ref:`Enabling the CPU Priority Feature `. | ++-------------------------------------+-------------+------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. |image1| image:: /_static/images/en-us_image_0000001532836094.png +.. |image2| image:: /_static/images/en-us_image_0000001532996022.png diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/enabling_the_cpu_priority_feature.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/enabling_the_cpu_priority_feature.rst new file mode 100644 index 0000000..f4d00fd --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/enabling_the_cpu_priority_feature.rst @@ -0,0 +1,42 @@ +:original_name: mrs_01_24789.html + +.. _mrs_01_24789: + +Enabling the CPU Priority Feature +================================= + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +Scenario +-------- + +ClickHouse tenants support CPU priorities. This feature depends on CAP_SYS_NICE of the OS and takes effect only after being enabled. + +Procedure +--------- + +#. Log in to the ClickHouseServer node as user **root** and run the following command: + + **setcap cap_sys_nice=+ep /opt/Bigdata/FusionInsight_ClickHouse_*/install/FusionInsight-ClickHouse-*/clickhouse/bin/clickhouse** + + .. note:: + + You need to run this command on all ClickHouseServer nodes. + +#. Log in to FusionInsight Manager and choose **Cluster** > **Services** > **ClickHouse**. Cick the **Instance** tab. Select all ClickHouseServer instances, and choose **More** > **Restart Instance**. + +#. 
Run the following command to check whether the CPU priority feature is enabled: + + **getcap /opt/Bigdata/FusionInsight_ClickHouse_*/install/FusionInsight-ClickHouse-*/clickhouse/bin/clickhouse** + + The following command output indicates that the feature has been enabled: + + .. code-block:: + + /opt/Bigdata/FusionInsight_ClickHouse_*/install/FusionInsight-ClickHouse*/clickhouse/bin/clickhouse = cap_sys_nice+ep + +#. (Optional) If the current cluster runs SUSE, run the following command on each ClickHouseServer node: + + **sudo zypper install libcap-progs** diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/index.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/index.rst new file mode 100644 index 0000000..e35f9ed --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/index.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_24784.html + +.. _mrs_01_24784: + +ClickHouse Multi-Tenancy +======================== + +- :ref:`ClickHouse Multi-Tenancy Overview ` +- :ref:`Enabling the CPU Priority Feature ` +- :ref:`Managing ClickHouse Tenants ` +- :ref:`Modifying the Memory Limit of ClickHouse on a ClickHouseServer Node ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + clickhouse_multi-tenancy_overview + enabling_the_cpu_priority_feature + managing_clickhouse_tenants + modifying_the_memory_limit_of_clickhouse_on_a_clickhouseserver_node diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/managing_clickhouse_tenants.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/managing_clickhouse_tenants.rst new file mode 100644 index 0000000..3b87934 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/managing_clickhouse_tenants.rst @@ -0,0 +1,147 @@ +:original_name: mrs_01_24791.html + +.. _mrs_01_24791: + +Managing ClickHouse Tenants +=========================== + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +Scenario +-------- + +On FusionInsight Manager, cluster administrators can create a ClickHouse tenant and associate it with a logical cluster. After a system user is bound to the tenant, the system user has the permissions on the logical cluster of the tenant. + +.. _mrs_01_24791__section552518236108: + +Creating a ClickHouse Tenant +---------------------------- + +#. Log in to FusionInsight Manager and choose **Tenant Resources**. + +#. Click |image1|. On the page that is displayed, configure tenant attributes according to :ref:`Table 1 `. + + .. _mrs_01_24791__table6423224172911: + + .. 
table:: **Table 1** Tenant parameters + + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +====================================+================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | Cluster | Cluster for which you want to create a tenant | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Name | - Name of the current tenant. The value consists of 3 to 50 characters, including numbers, letters, and underscores (_). | + | | - Plan a tenant name based on service requirements. The name cannot be the same as that of a role, HDFS directory, or Yarn queue that exists in the current cluster. | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Tenant Resource Type | Select **Leaf Tenant** **Resource**. | + | | | + | | .. note:: | + | | | + | | Create a ClickHouse tenant. **Tenant** **Resource** **Type** can only be **Leaf Tenant**. | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Computing Resource | Dynamic compute resources for the current tenant | + | | | + | | - When **Yarn** is selected, the system automatically creates a queue in Yarn and the queue is named the same as the tenant name. | + | | - If **Yarn** is not selected, the system does not automatically create a queue. 
| + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration Mode | Configuration mode of compute resource parameters | + | | | + | | - If you select **Basic**, you only need to configure **Default Resource Pool Capacity (%)**. | + | | - If you select **Advanced**, you can manually configure the resource allocation weight and the minimum, maximum, and reserved resources of the tenant. | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Default Resource Pool Capacity (%) | Percentage of compute resources used by the current tenant in the default resource pool. The value ranges from 0 to 100%. | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Weight | Resource allocation weight. The value ranges from 0 to 100. | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Minimum Resource | Resources guaranteed for the tenant (preemption supported). The value can be a percentage or an absolute value of the parent tenant's resources. When a tenant has a light workload, the resources of the tenant are automatically allocated to other tenants. When the available tenant resources are less than the value of **Minimum Resource**, the tenant can preempt the resources that have been lent to other tenants. | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Maximum Resource | Maximum resources that can be used by the tenant. The tenant cannot obtain more resources than the value configured. The value can be a percentage or an absolute value of the parent tenant's resources. 
| + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Reserved Resource | Resources reserved for the tenant resource. The reserved resources cannot be used by other tenants even if no job is running in the current tenant resources. The value can be a percentage or an absolute value of the parent tenant's resources. | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Storage Resource | Storage resources for the current tenant. | + | | | + | | - When **HDFS** is selected, the system automatically allocates storage resources. | + | | - If **HDFS** is not selected, the system does not automatically allocate storage resources. | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Quota | Quota for files and directories | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Space Quota | Quota for the HDFS storage space used by the current tenant | + | | | + | | - If the unit is set to **MB**, the value ranges from 1 to 8796093022208. If the unit is set to **GB**, the value ranges from 1 to 8589934592. | + | | - This parameter indicates the maximum HDFS storage space that can be used by the tenant, but not the actual space used. | + | | - If its value is greater than the size of the HDFS physical disk, the maximum space available is the full space of the HDFS physical disk. 
| + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Storage Path | HDFS storage directory for the tenant | + | | | + | | - The system automatically creates a folder named after the tenant name in the **/tenant** directory by default. For example, the default HDFS storage directory for tenant **ta1** is **/tenant/ta1**. | + | | - When a tenant is created for the first time, the system automatically creates the **/tenant** directory in the HDFS root directory. The storage path is customizable. | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Service | - Select **ClickHouse** for **Service**. | + | | | + | | - **Association Type**: When **Service** is set to **ClickHouse**, **Association Type** can only be set to **Shared**. | + | | - **Associate Logical Cluster**: If the logical cluster function is not enabled for ClickHouse, **default_cluster** is selected by default. If the function is enabled, select the logical cluster to which you want to associate. | + | | - **CPU Priority**: The CPU priority ranges from -20 to 19. This value is associated with the NICE value of the OS. A smaller value indicates a higher CPU priority. For details about how to enable the CPU priority, see :ref:`Enabling the CPU Priority Feature `. | + | | - **Memory**: The maximum value of this parameter is **100**, in percentage. For example, if this parameter is set to **80**, the total memory that can be used by the current tenant is calculated as follows: Available memory x 80%. | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Description | Description of the current tenant | + +------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Click **OK**. Wait until the tenant is created. + +#. After the ClickHouse tenant is created, you can view and modify tenant resources on the **Tenant Resources** page. + + a. On FusionInsight Manager, choose **Tenant Resources**. 
In the tenant list, select the ClickHouse tenant whose information you want to view and view the tenant overview and resource quota. + b. Choose **Resources** and click |image2| next to **Resource Details** to modify tenant resources. + c. After the modification is complete, click **OK**. The modified resource details are displayed on the **Resources** page. + + .. note:: + + After modifying the resource quota of the ClickHouse tenant, you need to log in to the ClickHouse client again for the modification to take effect. + +Adding a User and Binding the User to a Tenant +---------------------------------------------- + +- To create a user and bind the user to a tenant, log in to FusionInsight Manager, choose **System** > **Permission** > **User**, click **Create User** to add a human-machine user, and add the tenant created by referring to :ref:`Creating a ClickHouse Tenant `. Then, the user has the permissions on the ClickHouse logical cluster. + +- To bind an existing user to a tenant, log in to FusionInsight Manager, choose **System** > **Permission** > **User**, click **Modify** in the **Operation** column of the user, and add the tenant created by referring to :ref:`Creating a ClickHouse Tenant `. Delete the ClickHouse tenant from the role if the tenant is no longer needed. + + .. note:: + + - After a user is bound to a ClickHouse tenant, the user has the permission to modify the logical cluster of the tenant. + - When multiple users are bound to the same tenant, the tenant-level memory limit of the current version does not support real-time total memory limit. For example, user1 and user2 are bound to tenant1, the memory limit for tenant1 is 10 GB, and the query performed by user1 uses 5 GB memory. When user2 initiates a query, the memory that can be used by user2 is limited to 5 GB. During the query, the service does not dynamically update this restriction. + - In the current version, a user cannot be bound to multiple ClickHouse tenants. If user1 has been associated with tenant1, no error message is displayed when user1 is associated with tenant2, but the information is recorded in background logs, indicating that the user has been associated with a tenant and this association operation is invalid. + +Associating an Existing Tenant with the ClickHouse Service +---------------------------------------------------------- + +#. On FusionInsight Manager, choose **Tenant Resources**, select the tenant to which you want to associate a service, click the **Service Association** tab, and click **Associate Service**. The following table describes the parameters. + + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===========================+=================================================================================================================================================================================================================================================================+ + | Service | Select **ClickHouse**. 
| + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Association Type | Select **Shared**. | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Associate Logical Cluster | If the logical cluster function is not enabled for ClickHouse, **default_cluster** is selected by default. If the function is enabled, select the logical cluster to which you want to associate. | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | CPU Priority | The CPU priority ranges from -20 to 19. This value is associated with the NICE value of the OS. A smaller value indicates a higher CPU priority. For details about how to enable the CPU priority, see :ref:`Enabling the CPU Priority Feature `. | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Memory | The maximum value of this parameter is **100**, in percentage. For example, if this parameter is set to **80**, the total memory that can be used by the current tenant is calculated as follows: Available memory x 80%. | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. On the displayed tab page, configure the tenant based on service requirements and click **OK**. The tenant is associated with the service. + +#. To disassociate the ClikHouse service, perform the following operations: + + On FusionInsight Manager, choose **Tenant Resources**, select the tenant whose ClickHouse service is to be disassociated, and click **Delete** in the **Operation** column. In the dialog box that is displayed, click **OK**. + + .. note:: + + After the ClickHouse service is disassociated from a tenant, the tenant and its users no longer have the permissions on the ClickHouse logical cluster. + +.. |image1| image:: /_static/images/en-us_image_0000001583435997.png +.. 
|image2| image:: /_static/images/en-us_image_0000001583195981.png diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/modifying_the_memory_limit_of_clickhouse_on_a_clickhouseserver_node.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/modifying_the_memory_limit_of_clickhouse_on_a_clickhouseserver_node.rst new file mode 100644 index 0000000..2d7ae41 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_multi-tenancy/modifying_the_memory_limit_of_clickhouse_on_a_clickhouseserver_node.rst @@ -0,0 +1,31 @@ +:original_name: mrs_01_24786.html + +.. _mrs_01_24786: + +Modifying the Memory Limit of ClickHouse on a ClickHouseServer Node +=================================================================== + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +Scenario +-------- + +Modify the maximum memory allowed for ClickHouse on a ClickHouseServer node to ensure the normal use of other service instances on the node. + +Procedure +--------- + +#. Log in to FusionInsight Manager and choose **Cluster** > **Services** > **ClickHouse**. Click **Configuration** then **All Configurations**, click **ClickHouseServer(Role)**, and select **Performance**. + + |image1| + +#. Change the value of **max_server_memory_usage_to_ram_ratio** as required and save the configuration. + + .. note:: + + - Restart is not required for the modification to take effect. + - The value ranges from 0 to 1, indicating the ratio of the total physical RAM of the server that can be used for ClickHouse. For example, if the physical memory of the server is 10 GB and the value of this parameter is **0.9**, the available memory of the ClickHouse service on the current server is 9 GB (10 GB x 0.9). If the value of this parameter is **0**, it indicates that the memory is not limited and the ClickHouse service can use all the physical memory of the server. The value of this parameter can contain a maximum of two decimal places. + +.. |image1| image:: /_static/images/en-us_image_0000001583316317.png diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_performance_tuning/accelerating_merge_operations.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_performance_tuning/accelerating_merge_operations.rst new file mode 100644 index 0000000..e3c8be3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_performance_tuning/accelerating_merge_operations.rst @@ -0,0 +1,43 @@ +:original_name: mrs_01_24853.html + +.. _mrs_01_24853: + +Accelerating Merge Operations +============================= + +To accelerate background tasks, adjust the ZooKeeper service configuration first. Otherwise, the ClickHouse service and background tasks will be abnormal due to insufficient ZooKeeper resources such as znodes. + +#. Adjust the ZooKeeper configuration. Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Zookeeper**. Click **Configurations** then **All Configurations**. Click **quorumpeer** > **System**, change the value of **GC_OPTS** according to the following table, save the configuration, and roll restart the ZooKeeper service. 
+ + +-----------------------+------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration Item | Reference Value | Description | + +=======================+==============================================================================+===============================================================================================================================================================================================================================+ + | GC_OPTS | Xmx = (Memory size of master nodes - 16 GB) x 0.65 (conservative estimation) | JVM parameter used for garbage collection (GC). This parameter is valid only when **GC_PROFILE** is set to **custom**. Ensure that the **GC_OPT** parameter is set correctly. Otherwise, the process will fail to be started. | + | | | | + | | | .. caution:: | + | | | | + | | | CAUTION: | + | | | Exercise caution when modifying this item. If this parameter is set incorrectly, the service will be unavailable. | + +-----------------------+------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Adjust the ClickHouse configuration. On FusionInsight Manager, choose **Cluster** > **Services** > **ClickHouse**. Click **Configurations** then **All Configurations**. Click **ClickHouse** > **Zookeeper**, modify the following parameters, and save the configuration. You do not need to restart the service. + + +---------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration Item | Reference Value | Description | + +=======================================+=======================+=========================================================================================================================================================================================+ + | clickhouse.zookeeper.quota.node.conut | Xmx/4 GB x 1,500,000 | Node quantity quota of ClickHouse in the top directory on ZooKeeper. | + | | | | + | | | This parameter cannot be set to **0**, but can be set to a minimum value of **-1**. **-1** indicates that there is no limit on the number of ClickHouse nodes in the top directory. | + | | | | + | | | .. caution:: | + | | | | + | | | CAUTION: | + | | | If the quantity quota is less than the actual value of the current ZooKeeper directory, the configuration can be saved but does not take effect and an alarm is reported on the GUI. | + +---------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | clickhouse.zookeeper.quota.size | Xmx/4 GB x 1 GB | Capacity quota of ClickHouse in the top directory on ZooKeeper. | + | | | | + | | | .. 
caution:: | + | | | | + | | | CAUTION: | + | | | If the quantity quota is less than the actual value of the current ZooKeeper directory, the configuration can be saved but does not take effect and an alarm is reported on the GUI. | + +---------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_performance_tuning/accelerating_ttl_operations.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_performance_tuning/accelerating_ttl_operations.rst new file mode 100644 index 0000000..0d0be69 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_performance_tuning/accelerating_ttl_operations.rst @@ -0,0 +1,22 @@ +:original_name: mrs_01_24855.html + +.. _mrs_01_24855: + +Accelerating TTL Operations +=========================== + +When TTL is triggered in ClickHouse, a large amount of CPU and memory are consumed. + +Log in to FusionInsight Manager and choose **Cluster** > **Services** > **ClickHouse**. Click **Configurations** then **All Configurations**. Click **ClickHouseServer** > **Customization**, find the **clickhouse-config-customize** parameter, add the following parameters, save the configuration, and restart the service. + ++----------------------------------------------------+-----------------------------+----------------------------------------------------------------------------------------------+ +| Configuration Item | Reference Value | Description | ++====================================================+=============================+==============================================================================================+ +| merge_tree.max_replicated_merges_with_ttl_in_queue | Half of number of CPU cores | Number of tasks that allow TTL to merge parts concurrently in the ReplicatedMergeTree queue. | ++----------------------------------------------------+-----------------------------+----------------------------------------------------------------------------------------------+ +| merge_tree.max_number_of_merges_with_ttl_in_pool | Number of CPU cores | The thread pool that allows TTL to merge parts in the ReplicatedMergeTree queue. | ++----------------------------------------------------+-----------------------------+----------------------------------------------------------------------------------------------+ + +.. note:: + + Do not modify these configurations when the cluster writes heavily. Idle threads need to be reserved for regular Merge operations to avoid the "Too many parts" issue. diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_performance_tuning/index.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_performance_tuning/index.rst new file mode 100644 index 0000000..18d9656 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_performance_tuning/index.rst @@ -0,0 +1,18 @@ +:original_name: mrs_01_24848.html + +.. _mrs_01_24848: + +ClickHouse Performance Tuning +============================= + +- :ref:`Solution to the "Too many parts" Error in Data Tables ` +- :ref:`Accelerating Merge Operations ` +- :ref:`Accelerating TTL Operations ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + solution_to_the_too_many_parts_error_in_data_tables + accelerating_merge_operations + accelerating_ttl_operations diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_performance_tuning/solution_to_the_too_many_parts_error_in_data_tables.rst b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_performance_tuning/solution_to_the_too_many_parts_error_in_data_tables.rst new file mode 100644 index 0000000..dac214b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/clickhouse_performance_tuning/solution_to_the_too_many_parts_error_in_data_tables.rst @@ -0,0 +1,52 @@ +:original_name: mrs_01_24849.html + +.. _mrs_01_24849: + +Solution to the "Too many parts" Error in Data Tables +===================================================== + +Troubleshooting +--------------- + +#. Log in to the ClickHouse client and check whether abnormal merge exists. + + **select database, table, elapsed, progress, merge_type from system.merges;** + +#. Do not perform the INSERT operation too frequently, do not insert a small amount of data, and increase the interval for inserting data. + +#. The data table partitions are not properly allocated. As a result, too many partitions are generated and data tables need to be re-partitioned. + +#. If the MERGE operations are not triggered or slow, adjust the following parameters to accelerate them. + + For details, see :ref:`Accelerating Merge Operations `. + + +----------------------+-------------------------------------------------------------------------------------------------+ + | Configuration Item | Reference Value | + +======================+=================================================================================================+ + | max_threads | Number of CPU cores x 2 | + +----------------------+-------------------------------------------------------------------------------------------------+ + | background_pool_size | Number of CPU cores | + +----------------------+-------------------------------------------------------------------------------------------------+ + | merge_max_block_size | The value is an integer multiple of 8192 and is adjusted based on the CPU and memory resources. | + +----------------------+-------------------------------------------------------------------------------------------------+ + | cleanup_delay_period | Set this parameter to a value that is appropriately less than the default value 30. | + +----------------------+-------------------------------------------------------------------------------------------------+ + +Changing the Value of parts_to_throw_insert +------------------------------------------- + +.. caution:: + + Increase the value of this parameter only in special scenarios. This configuration acts as a warning for potential issues to some extent. If the cluster hardware resources are insufficient and this configuration is not adjusted properly, potential service issues cannot be detected in a timely manner, which may cause other faults and increase the difficulty of fault recovery. + +Log in to FusionInsight Manager and choose **Cluster** > **Services** > **ClickHouse**. Click **Configurations** then **All Configurations**. Click **ClickHouseServer** > **Customization**, find the **clickhouse-config-customize** parameter, add the following configuration, save it, and restart the service. 
+ ++----------------------------------+----------------------------------------------------------------------+ +| Name | Value | ++==================================+======================================================================+ +| merge_tree.parts_to_throw_insert | Memory of ClickHouse instances/32 GB x 300 (conservative estimation) | ++----------------------------------+----------------------------------------------------------------------+ + +Verify the modification. + +Log in to the ClickHouse client and run the **select \* from system.merge_tree_settings where name = 'parts_to_throw_insert';** command. diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/index.rst b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/index.rst index 8cb7011..96c0e58 100644 --- a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/index.rst +++ b/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/index.rst @@ -13,7 +13,6 @@ Common ClickHouse SQL Syntax - :ref:`DESC: Querying a Table Structure ` - :ref:`DROP: Deleting a Table ` - :ref:`SHOW: Displaying Information About Databases and Tables ` -- :ref:`Importing and Exporting File Data ` .. toctree:: :maxdepth: 1 @@ -27,4 +26,3 @@ Common ClickHouse SQL Syntax desc_querying_a_table_structure drop_deleting_a_table show_displaying_information_about_databases_and_tables - importing_and_exporting_file_data diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/enabling_the_read-only_mode_of_the_clickhouse_table.rst b/doc/component-operation-guide-lts/source/using_clickhouse/enabling_the_read-only_mode_of_the_clickhouse_table.rst new file mode 100644 index 0000000..8551f56 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/enabling_the_read-only_mode_of_the_clickhouse_table.rst @@ -0,0 +1,66 @@ +:original_name: mrs_01_24451.html + +.. _mrs_01_24451: + +Enabling the Read-Only Mode of the ClickHouse Table +=================================================== + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +Scenario +-------- + +During data migration, one-click balancing, decommissioning and capacity reduction, ClickHouse allows you to set the **only_allow_select_statement** parameter for MergeTree series tables to enable SELECT operations instead of ALTER, RENAME, DROP, and INSERT operations. + +Procedure for Enabling the Read-Only Mode of the ClickHouse Table +----------------------------------------------------------------- + +#. .. _mrs_01_24451__li10269200102512: + + Run the following commands to log in to the node where the client is installed as user **root**: + + **cd** *Client installation directory* + + **source bigdata_env** + +#. Run the following command to authenticate the user if the cluster is in security mode (with Kerberos authentication enabled). Otherwise, skip this step. + + **kinit** *Component service user* + + .. note:: + + The user must have the ClickHouse administrator permissions. + +#. .. _mrs_01_24451__li4140746493: + + Run the proper client command to connect to the ClickHouse server. + + - Normal mode + + **clickhouse client --host** *IP address of the ClickHouse instance*\ **--user** *Username* **--password** **--port** 9440 **--secure** + + *Enter the user password.* + + - Security mode + + **clickhouse client --host** *IP address of the ClickHouse instance*\ **--port** 9440 **--secure** + + .. 
note:: + + - The user in normal mode is the default user, or you can create an administrator using the open source capability provided by the ClickHouse community. You cannot use the users created on FusionInsight Manager. + - To obtain the IP address of the ClickHouseServer instance, log in to FusionInsight Manager, choose **Cluster** > **Services** > **ClickHouse**, and click the **Instance** tab. + +#. Run the following statement to set the table to read-only: + + **ALTER TABLE** *{table_name}* **MODIFY SETTING only_allow_select_statement = true;** + +Disabling the Read-Only Mode of the Table +----------------------------------------- + +#. Log in to the ClickHouse client by referring to :ref:`1 ` to :ref:`3 `. + +#. Run the following statement to disable the read-only mode of the table: + + **ALTER TABLE** *{table_name}* **MODIFY SETTING only_allow_select_statement = false settings hw_internal_operation = true;** diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/index.rst b/doc/component-operation-guide-lts/source/using_clickhouse/index.rst index fe6f5e6..8952ea9 100644 --- a/doc/component-operation-guide-lts/source/using_clickhouse/index.rst +++ b/doc/component-operation-guide-lts/source/using_clickhouse/index.rst @@ -6,25 +6,33 @@ Using ClickHouse ================ - :ref:`Using ClickHouse from Scratch ` +- :ref:`Enabling the Read-Only Mode of the ClickHouse Table ` - :ref:`Common ClickHouse SQL Syntax ` - :ref:`User Management and Authentication ` +- :ref:`ClickHouse Multi-Tenancy ` - :ref:`ClickHouse Table Engine Overview ` - :ref:`Creating a ClickHouse Table ` -- :ref:`Using the ClickHouse Data Migration Tool ` +- :ref:`Migrating ClickHouse Data ` - :ref:`Monitoring of Slow ClickHouse Query Statements and Replication Table Data Synchronization ` - :ref:`Adaptive MV Usage in ClickHouse ` - :ref:`ClickHouse Log Overview ` +- :ref:`ClickHouse Performance Tuning ` +- :ref:`ClickHouse FAQ ` .. toctree:: :maxdepth: 1 :hidden: using_clickhouse_from_scratch + enabling_the_read-only_mode_of_the_clickhouse_table common_clickhouse_sql_syntax/index user_management_and_authentication/index + clickhouse_multi-tenancy/index clickhouse_table_engine_overview creating_a_clickhouse_table - using_the_clickhouse_data_migration_tool + migrating_clickhouse_data/index monitoring_of_slow_clickhouse_query_statements_and_replication_table_data_synchronization/index adaptive_mv_usage_in_clickhouse clickhouse_log_overview + clickhouse_performance_tuning/index + clickhouse_faq/index diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/index.rst b/doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/index.rst new file mode 100644 index 0000000..60d5e15 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/index.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_24250.html + +.. _mrs_01_24250: + +Migrating ClickHouse Data +========================= + +- :ref:`Using ClickHouse to Import and Export Data ` +- :ref:`Synchronizing Kafka Data to ClickHouse ` +- :ref:`Using the ClickHouse Data Migration Tool ` +- :ref:`Using the Migration Tool to Quickly Migrate ClickHouse Cluster Data ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + using_clickhouse_to_import_and_export_data + synchronizing_kafka_data_to_clickhouse + using_the_clickhouse_data_migration_tool + using_the_migration_tool_to_quickly_migrate_clickhouse_cluster_data diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/synchronizing_kafka_data_to_clickhouse.rst b/doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/synchronizing_kafka_data_to_clickhouse.rst new file mode 100644 index 0000000..4d2829d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/synchronizing_kafka_data_to_clickhouse.rst @@ -0,0 +1,188 @@ +:original_name: mrs_01_24377.html + +.. _mrs_01_24377: + +Synchronizing Kafka Data to ClickHouse +====================================== + +This section describes how to create a Kafka table to automatically synchronize Kafka data to the ClickHouse cluster. + +Prerequisites +------------- + +- You have created a Kafka cluster. The Kafka client has been installed. +- You have created a ClickHouse cluster and installed the ClickHouse client. The ClickHouse and Kafka clusters are in the same VPC and can communicate with each other. + +Constraints +----------- + +Currently, ClickHouse cannot interconnect with Kafka clusters with security mode enabled. + +.. _mrs_01_24377__section10908164973416: + +Syntax of the Kafka Table +------------------------- + +- **Syntax** + + .. code-block:: + + CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster] + ( + name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1], + name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2], + ... + ) ENGINE = Kafka() + SETTINGS + kafka_broker_list = 'host1:port1,host2:port2', + kafka_topic_list = 'topic1,topic2,...', + kafka_group_name = 'group_name', + kafka_format = 'data_format'; + [kafka_row_delimiter = 'delimiter_symbol',] + [kafka_schema = '',] + [kafka_num_consumers = N] + +- **Parameter description** + + .. table:: **Table 1** Kafka table parameters + + +-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Mandatory | Description | + +=======================+=======================+===========================================================================================================================================================================================================================================================================================+ + | kafka_broker_list | Yes | A list of Kafka broker instances, separated by comma (,). For example, *IP address 1 of the Kafka broker instance*:**9092**,\ *IP address 2 of the Kafka broker instance*:**9092**,\ *IP address 3 of the Kafka broker instance*:**9092**. | + | | | | + | | | .. note:: | + | | | | + | | | If the Kerberos authentication is enabled, parameter **allow.everyone.if.no.acl.found** must be set to **true** if port **21005** is used. Otherwise, an error will be reported. | + | | | | + | | | To obtain the IP address of the Kafka broker instance, perform the following steps: | + | | | | + | | | Log in to FusionInsight Manager and choose **Cluster** > *Name of the desired cluster* > **Services** > **Kafka**. 
Click **Instances** to query the IP addresses of the Kafka instances. | + +-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka_topic_list | Yes | A list of Kafka topics. | + +-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka_group_name | Yes | A group of Kafka consumers, which can be customized. | + +-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka_format | Yes | Kafka message format, for example, JSONEachRow, CSV, and XML. | + +-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka_row_delimiter | No | Delimiter character, which ends a message. | + +-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka_schema | No | Parameter that must be used if the format requires a schema definition. | + +-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | kafka_num_consumers | No | Number of consumers in per table. The default value is **1**. If the throughput of a consumer is insufficient, more consumers are required. The total number of consumers cannot exceed the number of partitions in a topic because only one consumer can be allocated to each partition. | + +-----------------------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +How to Synchronize Kafka Data to ClickHouse +------------------------------------------- + +#. .. _mrs_01_24377__li58847364569: + + Switch to the Kafka client installation directory. For details, see :ref:`Using the Kafka Client `. + + a. Log in to the node where the Kafka client is installed as the Kafka client installation user. + + b. 
Run the following command to go to the client installation directory: + + **cd /opt/client** + + c. Run the following command to configure environment variables: + + **source bigdata_env** + + d. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the current user. If Kerberos authentication is disabled for the current cluster, skip this step. + + **kinit** *Component service user* + +#. .. _mrs_01_24377__li133267241488: + + Run the following command to create a Kafka topic. For details, see :ref:`Managing Kafka Topics `. + + **kafka-topics.sh --topic** *kafkacktest2* **--create --zookeeper** *IP address of the Zookeeper role instance:Port used by ZooKeeper to listen to client*\ **/kafka --partitions** *2* **--replication-factor** *1* + + .. note:: + + - **--topic** is the name of the topic to be created, for example, **kafkacktest2**. + + - **--zookeeper** is the IP address of the node where the ZooKeeper role instances are located, which can be the IP address of any of the three role instances. You can obtain the IP address of the node by performing the following steps: + + Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > *Name of the desired cluster* > **Services** > **ZooKeeper** > **Instance**. View the IP addresses of the ZooKeeper role instances. + + - **--partitions** and **--replication-factor** are the topic partitions and topic backup replicas, respectively. The number of the two parameters cannot exceed the number of Kafka role instances. + + - To obtain the *Port used by ZooKeeper to listen to client*, log in to FusionInsight Manager, click **Cluster**, choose **Services** > **ZooKeeper**, and view the value of **clientPort** on the **Configuration** tab page. The default value is **24002**. + +#. .. _mrs_01_24377__li64680261586: + + Log in to the ClickHouse client by referring to :ref:`Using ClickHouse from Scratch `. + + a. Run the following command to go to the client installation directory: + + **cd /opt/client** + + b. Run the following command to configure environment variables: + + **source bigdata_env** + + c. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the current user. The user must have the permission to create ClickHouse tables. Therefore, you need to bind the corresponding role to the user. For details, see :ref:`ClickHouse User and Permission Management `. If Kerberos authentication is disabled for the current cluster, skip this step. + + **kinit** *Component service user* + + Example: **kinit clickhouseuser** + + d. Run the following command to connect to the ClickHouse instance node to which data is to be imported: + + **clickhouse client --host** *IP address of the ClickHouse instance* **--user** *Login username* **--password** **--port** *ClickHouse port number* **--database** *Database name* **--multiline** + + *Enter the user password.* + +#. Create a Kafka table in ClickHouse by referring to :ref:`Syntax of the Kafka Table `. For example, the following table creation statement is used to create a Kafka table whose name is **kafka_src_tbl3**, topic name is **kafkacktest2**, and message format is **JSONEachRow** in the default database. + + .. 
code-block:: + + create table kafka_src_tbl3 on cluster default_cluster + (id UInt32, age UInt32, msg String) + ENGINE=Kafka() + SETTINGS + kafka_broker_list='IP address 1 of the Kafka broker instance:9092,IP address 2 of the Kafka broker instance:9092,IP address 3 of the Kafka broker instance:9092', + kafka_topic_list='kafkacktest2', + kafka_group_name='cg12', + kafka_format='JSONEachRow'; + +#. Create a ClickHouse replicated table, for example, the ReplicatedMergeTree table named **kafka_dest_tbl3**. + + .. code-block:: + + create table kafka_dest_tbl3 on cluster default_cluster + ( id UInt32, age UInt32, msg String ) + engine = ReplicatedMergeTree('/clickhouse/tables/{shard}/default/kafka_dest_tbl3', '{replica}') + partition by age + order by id; + +#. Create a materialized view, which converts data in Kafka in the background and saves the data to the created ClickHouse table. + + .. code-block:: + + create materialized view consumer3 on cluster default_cluster to kafka_dest_tbl3 as select * from kafka_src_tbl3; + +#. Perform :ref:`1 ` again to go to the Kafka client installation directory. + +#. Run the following command to send a message to the topic created in :ref:`2 `: + + **kafka-console-producer.sh --broker-list** *IP address 1 of the kafka broker instance*\ **:9092,**\ *IP address 2 of the kafka broker instance*\ **:9092,**\ *IP address 3 of the kafka broker instance*\ **:9092** **--topic** *kafkacktest2* + + .. code-block:: + + >{"id":31, "age":30, "msg":"31 years old"} + >{"id":32, "age":30, "msg":"31 years old"} + >{"id":33, "age":30, "msg":"31 years old"} + >{"id":35, "age":30, "msg":"31 years old"} + +#. Use the ClickHouse client to log in to the ClickHouse instance node in :ref:`3 ` and query the ClickHouse table data, for example, to query the replicated table **kafka_dest_tbl3**. It shows that the data in the Kafka message has been synchronized to this table. + + .. code-block:: + + select * from kafka_dest_tbl3; + + |image1| + +.. |image1| image:: /_static/images/en-us_image_0000001532676350.png diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/importing_and_exporting_file_data.rst b/doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/using_clickhouse_to_import_and_export_data.rst similarity index 73% rename from doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/importing_and_exporting_file_data.rst rename to doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/using_clickhouse_to_import_and_export_data.rst index eb5eb30..71cf547 100644 --- a/doc/component-operation-guide-lts/source/using_clickhouse/common_clickhouse_sql_syntax/importing_and_exporting_file_data.rst +++ b/doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/using_clickhouse_to_import_and_export_data.rst @@ -2,10 +2,13 @@ .. _mrs_01_24206: -Importing and Exporting File Data -================================= +Using ClickHouse to Import and Export Data +========================================== -This section describes the basic syntax and usage of the SQL statement for importing and exporting file data in ClickHouse. +Using the ClickHouse Client to Import and Export Data +----------------------------------------------------- + +Use the ClickHouse client to import and export data. - Importing data in CSV format @@ -15,19 +18,23 @@ This section describes the basic syntax and usage of the SQL statement for impor .. 
code-block:: - clickhouse client --host 10.5.208.5 --database testdb --port 21427 --secure --format_csv_delimiter="," --query="INSERT INTO testdb.csv_table FORMAT CSV" < /opt/data.csv + clickhouse client --host 10.5.208.5 --database testdb --port 9440 --secure --format_csv_delimiter="," --query="INSERT INTO testdb.csv_table FORMAT CSV" < /opt/data You need to create a table in advance. - Exporting data in CSV format + .. caution:: + + Exporting data files in CSV format may cause CSV injection. Exercise caution when performing this operation. + **clickhouse client --host** *Host name or IP address of the ClickHouse instance* **--database** *Database name* **--port** *Port number* **-m --secure --query=**"SELECT \* **FROM** *Table name*" > *CSV file export path* Example .. code-block:: - clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="SELECT * FROM test_table" > /opt/test.csv + clickhouse client --host 10.5.208.5 --database testdb --port 9440 -m --secure --query="SELECT * FROM test_table" > /opt/test - Importing data in Parquet format @@ -37,7 +44,7 @@ This section describes the basic syntax and usage of the SQL statement for impor .. code-block:: - cat /opt/student.parquet | clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="INSERT INTO parquet_tab001 FORMAT Parquet" + cat /opt/student.parquet | clickhouse client --host 10.5.208.5 --database testdb --port 9440 -m --secure --query="INSERT INTO parquet_tab001 FORMAT Parquet" - Exporting data in Parquet format @@ -47,7 +54,7 @@ This section describes the basic syntax and usage of the SQL statement for impor .. code-block:: - clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="select * from test_table FORMAT Parquet" > /opt/student.parquet + clickhouse client --host 10.5.208.5 --database testdb --port 9440 -m --secure --query="select * from test_table FORMAT Parquet" > /opt/student.parquet - Importing data in ORC format @@ -57,9 +64,9 @@ This section describes the basic syntax and usage of the SQL statement for impor .. code-block:: - cat /opt/student.orc | clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="INSERT INTO orc_tab001 FORMAT ORC" + cat /opt/student.orc | clickhouse client --host 10.5.208.5 --database testdb --port 9440 -m --secure --query="INSERT INTO orc_tab001 FORMAT ORC" # Data in the ORC file can be exported from HDFS. For example: - hdfs dfs -cat /user/hive/warehouse/hivedb.db/emp_orc/000000_0_copy_1 | clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="INSERT INTO orc_tab001 FORMAT ORC" + hdfs dfs -cat /user/hive/warehouse/hivedb.db/emp_orc/000000_0_copy_1 | clickhouse client --host 10.5.208.5 --database testdb --port 9440 -m --secure --query="INSERT INTO orc_tab001 FORMAT ORC" - Exporting data in ORC format @@ -69,7 +76,7 @@ This section describes the basic syntax and usage of the SQL statement for impor .. code-block:: - clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="select * from csv_tab001 FORMAT ORC" > /opt/student.orc + clickhouse client --host 10.5.208.5 --database testdb --port 9440 -m --secure --query="select * from csv_tab001 FORMAT ORC" > /opt/student.orc - Importing data in JSON format @@ -90,10 +97,10 @@ This section describes the basic syntax and usage of the SQL statement for impor .. code-block:: # Export JSON file. 
- clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="SELECT * FROM test_table FORMAT JSON" > /opt/test.json + clickhouse client --host 10.5.208.5 --database testdb --port 9440 -m --secure --query="SELECT * FROM test_table FORMAT JSON" > /opt/test.json # Export json(JSONEachRow). - clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="SELECT * FROM test_table FORMAT JSONEachRow" > /opt/test_jsoneachrow.json + clickhouse client --host 10.5.208.5 --database testdb --port 9440 -m --secure --query="SELECT * FROM test_table FORMAT JSONEachRow" > /opt/test_jsoneachrow.json # Export json(JSONCompact). - clickhouse client --host 10.5.208.5 --database testdb --port 21427 -m --secure --query="SELECT * FROM test_table FORMAT JSONCompact" > /opt/test_jsoncompact.json + clickhouse client --host 10.5.208.5 --database testdb --port 9440 -m --secure --query="SELECT * FROM test_table FORMAT JSONCompact" > /opt/test_jsoncompact.json diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/using_the_clickhouse_data_migration_tool.rst b/doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/using_the_clickhouse_data_migration_tool.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_clickhouse/using_the_clickhouse_data_migration_tool.rst rename to doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/using_the_clickhouse_data_migration_tool.rst diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/using_the_migration_tool_to_quickly_migrate_clickhouse_cluster_data.rst b/doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/using_the_migration_tool_to_quickly_migrate_clickhouse_cluster_data.rst new file mode 100644 index 0000000..2f5b4ea --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_clickhouse/migrating_clickhouse_data/using_the_migration_tool_to_quickly_migrate_clickhouse_cluster_data.rst @@ -0,0 +1,530 @@ +:original_name: mrs_01_24508.html + +.. _mrs_01_24508: + +Using the Migration Tool to Quickly Migrate ClickHouse Cluster Data +=================================================================== + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +Scenario +-------- + +Scenario 1: As the number of MRS ClickHouse services increases, the storage and compute resources of clusters cannot meet service requirements. Thus, the clusters need to be split so that part of user service and database data can be migrated to new clusters. + +Scenario 2: The data center where the backend hosts of MRS ClickHouse clusters are located needs to be migrated, so does the ClickHouse cluster data. + +To meet the migration requirements, MRS provides a one-click data migration tool for migrating the databases, table objects (DDL), and service data of ClickHouse from a source cluster to the new cluster. + +Migration Mechanism +------------------- + +- Replicated*MergeTree table migration + + In this migration solution, ClickHouse uses ZooKeeper to automatically synchronize the data of Replicated*MergeTree tables of different replicas in the same shard. The logical procedure is as follows: + + First, add the ZooKeeper information of the source cluster to the configuration file of the destination cluster as the auxiliary ZooKeeper. 
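+  The remaining steps, which the rest of this paragraph describes, can be pictured with the following minimal sketch. The migration tool performs these operations automatically; the auxiliary ZooKeeper name **zookeeper2**, the table and column names, and the use of **INSERT ... SELECT** for the final copy are illustrative assumptions rather than commands taken from the tool.
+
+  .. code-block::
+
+     -- Temporary table in the destination cluster that reuses the source table's
+     -- ZooKeeper path through the auxiliary ZooKeeper, so the source replicas
+     -- automatically synchronize their data into it.
+     CREATE TABLE db1.demo_table_tmp
+     (
+         id UInt32,
+         name String
+     )
+     ENGINE = ReplicatedMergeTree('zookeeper2:/clickhouse/tables/{shard}/db1/demo_table', '{replica}')
+     ORDER BY id;
+
+     -- After synchronization completes, copy the data into the formal table
+     -- created in the destination cluster.
+     INSERT INTO db1.demo_table SELECT * FROM db1.demo_table_tmp;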
Then, in the destination cluster, create a temporary table that has the same ZooKeeper path and structure as the source cluster but different replicas from it. After the temporary table is created, data in the source cluster will be automatically synchronized to the temporary table. Once data synchronization is complete, copy data from the temporary table to the formal table. + + + .. figure:: /_static/images/en-us_image_0000001587840761.png + :alt: **Figure 1** Replicated*MergeTree table migration architecture + + **Figure 1** Replicated*MergeTree table migration architecture + +- Distributed table migration + + During the migration of a replicated table, its metadata in the source cluster is exported and changed to the ZooKeeper path and replica of the destination cluster. Then, you can create a table in the destination cluster based on the modified metadata. + +- Non-replicated table and materialized view migration + + To migrate data in the non-replicated tables and materialized views, you can call the **remote** function. + +The preceding migration operations are encapsulated using the migration tool script. In this way, you can modify the related configuration files and run the migration scripts to complete the migration by one click. For details, see the procedure description. + +Prerequisites +------------- + +- The status of the source ClickHouse cluster to be migrated is normal and the cluster in the security mode. +- The destination ClickHouse cluster of MRS 3.1.3 or later has been created for data migration. The cluster must be in the security mode. The number of ClickHouserver instances in the ClickHouse cluster is greater than or equal to that of the source cluster. +- Currently, logical clusters support only data migration to clusters with the same number of replicas. + +Constraints +----------- + +- Only table data and table object metadata (DDL) can be migrated. SQL statements like ETL need to be manually migrated. +- To ensure data consistency before and after the migration, stop the ClickHouse service of the source cluster before the migration. For details, see the procedure description. +- If the table of the source cluster is deleted during the migration, this issue can only be solved manually. + +Procedure +--------- + +The overall migration procedure is as follows: + + +.. figure:: /_static/images/en-us_image_0000001532516862.png + :alt: **Figure 2** Migration flowchart + + **Figure 2** Migration flowchart + +.. table:: **Table 1** Migration description + + +---------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Step | Description | + +===============================================================================================================================================================+=============================================================================================================================================================================================================+ + | :ref:`Step 1: Connect the source cluster to the destination cluster. ` | This step ensures that the source and target ClickHouse clusters as well as their nodes can communicate with each other. 
| + +---------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Step 2: Add the ZooKeeper information of the source cluster to the configuration file of the destination cluster. ` | By doing so, ZooKeeper in the source cluster can be used as the auxiliary ZooKeeper during data migration. | + +---------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Step 3: Migrate metadata of databases and tables in the source ClickHouse cluster to the destination cluster. ` | You can run the corresponding script to migrate metadata such as the database name, table name, and table structure of the ClickHouse database and tables in the source cluster to the destination cluster. | + +---------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | :ref:`Step 4: Migrate data of the databases and tables in the source ClickHouse cluster to the destination cluster. ` | You can run the corresponding script to migrate the ClickHouse database and table data from the source cluster to the destination cluster. | + +---------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. _mrs_01_24508__section132661749155412: + +Connecting the Source Cluster to the Destination Cluster +-------------------------------------------------------- + +#. Connect the source cluster to the destination cluster so that the ClickHouse instance nodes in the two clusters can communicate with each other. +#. Add the host configurations of the source cluster to all nodes in the destination cluster. + + a. Log in to FusionInsight Manager of the source ClickHouse cluster, choose **Cluster** > **ClickHouse**, click the **Instance** tab, and view the service IP address of the ClickHouseServer instance node. + + b. .. _mrs_01_24508__li1990635315277: + + Log in to any ClickHouseServer node using SSH and run the following command to check the host configurations of the ClickHouse instance in the source cluster: + + **cat /etc/hosts** + + The following figure shows the host configurations of the ClickHouse instance: + + |image1| + + c. Log in to FusionInsight Manager of the destination ClickHouse cluster, choose **Cluster** > **ClickHouse**, click the **Instance** tab, and view the service IP address of the ClickHouseServer instance node in the destination cluster. + + d. 
Log in to all ClickHouse nodes in the destination cluster as user **root** and run the following command to modify the **/etc/hosts** configuration of the nodes: + + **vi** **/etc/hosts** + + Copy the host information of the ClickHouse instance of the source cluster obtained in :ref:`2.b ` to the **hosts** file. + +#. Configure mutual trust between the source and destination clusters. + +.. _mrs_01_24508__section1669382893718: + +Adding the ZooKeeper Information of the Source Cluster to the Configuration File of the Destination Cluster +----------------------------------------------------------------------------------------------------------- + +#. .. _mrs_01_24508__li13474131194310: + + Log in to FusionInsight Manager of the source cluster, choose **Cluster** > **Services** > **ZooKeeper**, and click the **Instance** tab. On the displayed page, view the service IP addresses of the ZooKeeper quorumpeer instance , as shown in :ref:`Figure 3 `. + + .. _mrs_01_24508__fig39818132540: + + .. figure:: /_static/images/en-us_image_0000001537090654.png + :alt: **Figure 3** Addresses of the source Zookeeper quorumpeer instance + + **Figure 3** Addresses of the source Zookeeper quorumpeer instance + +#. .. _mrs_01_24508__li1315311694620: + + Log in to FusionInsight Manager of the destination cluster, choose **Cluster** > **Services** > **ClickHouse** and click the **Configurations** tab and then **All Configurations**. On the displayed page, search for the **clickhouse-config-customize** parameter. + +#. .. _mrs_01_24508__li16763133318475: + + Add ZooKeeper instance information of the source cluster to the **clickhouse-config-customize** parameter by referring to :ref:`Table 2 `. + + .. _mrs_01_24508__table6772947124819: + + .. table:: **Table 2** Configurations of **clickhouse-config-customize** + + +----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Value | + +==============================================+=========================================================================================================================================================================================================================+ + | auxiliary_zookeepers.zookeeper2.node[1].host | Service IP address of the first ZooKeeper quorumpeer instance in the source cluster obtained in :ref:`1 `. Currently, only the IP address of the ZooKeeper instance can be configured. | + +----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | auxiliary_zookeepers.zookeeper2.node[1].port | 2181 | + +----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | auxiliary_zookeepers.zookeeper2.node[2].host | Service IP address of the second ZooKeeper quorumpeer instance in the source cluster obtained in :ref:`1 `. Currently, only the IP address of the ZooKeeper instance can be configured. 
| + +----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | auxiliary_zookeepers.zookeeper2.node[2].port | 2181 | + +----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | auxiliary_zookeepers.zookeeper2.node[3].host | Service IP address of the third ZooKeeper quorumpeer instance in the source cluster obtained in :ref:`1 `. Currently, only the IP address of the ZooKeeper instance can be configured. | + +----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | auxiliary_zookeepers.zookeeper2.node[3].port | 2181 | + +----------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. After the configuration is complete, click **Save**. In the displayed dialog box, click **OK**. + +#. .. _mrs_01_24508__li17679261131: + + Log in to any ClickHouseServer node of the destination cluster as user **root**. Run the following command to view the ClickHouseServer instance information: + + **ps -ef \|grep clickhouse** + + Obtain the value of **--config-file**, that is, the configuration file directory of the ClickHouseServer, from the query result. + + + .. figure:: /_static/images/en-us_image_0000001537413022.png + :alt: **Figure 4** Obtaining the configuration file directory of ClickHouseServer + + **Figure 4** Obtaining the configuration file directory of ClickHouseServer + +#. Run the corresponding command to check whether the information about **** is added in the ClickHouse configuration file **config.xml**. + + **cat** *Directory of the config.xml file obtained in :ref:`5 `* + + + .. figure:: /_static/images/en-us_image_0000001583195985.png + :alt: **Figure 5** Viewing the added ZooKeeper information of the source cluster + + **Figure 5** Viewing the added ZooKeeper information of the source cluster + +#. .. _mrs_01_24508__li11758429194015: + + In the configuration file directory obtained in :ref:`5 `, run the following command to obtain the ZooKeeper authentication information of the source cluster: + + **cat ENV_VARS \| grep ZK** + + |image2| + + Obtain the values of **ZK_SERVER_FQDN**, **ZK_USER_PRINCIPAL** and **ZK_USER_REALM**. + +#. Log in to FusionInsight Manager of the destination cluster, choose **Cluster** > **Services** > **ClickHouse**, click the **Configurations** tab and then **All Configurations**. In the navigation pane on the left, choose **ClickHouseServer(Role)** > **Backup** and set the parameters by referring to the following table. + + .. 
table:: **Table 3** Configuring the cluster authentication information + + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Value | + +===================================+=============================================================================================================================+ + | AUXILIARY_ZK_SERVER_FQDN | Value of **ZK_SERVER_FQDN** obtained in :ref:`7 ` | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + | AUXILIARY_ZK_SERVER_PRINCIPAL | Value of **ZK_USER_PRINCIPAL** obtained in :ref:`7 ` | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + | AUXILIARY_ZK_SERVER_REALM | Value of **ZK_USER_REALM** obtained in :ref:`7 ` | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + | METADATA_COLLECTION_TIMEOUT | **180**. | + | | | + | | This parameter specifies the timeout interval for waiting for the completion of metadata backup on other nodes, in seconds. | + +-----------------------------------+-----------------------------------------------------------------------------------------------------------------------------+ + + + .. figure:: /_static/images/en-us_image_0000001532676354.png + :alt: **Figure 6** Configuring the cluster authentication information + + **Figure 6** Configuring the cluster authentication information + +#. Click **Save**. In the dialog box that is displayed, click **OK**. + +#. On the ClickHouse service page, click the **Instance** tab. On this tab page, select the ClickHouseServer instance from the instance list, and choose **More** > **Restart Instance** in the **Operation** column to restart the ClickHouseServer instance. + +.. _mrs_01_24508__section16918183315128: + +Migrating Metadata of Databases and Tables in the Source ClickHouse Cluster to the Destination Cluster +------------------------------------------------------------------------------------------------------ + +#. .. _mrs_01_24508__li1162843314242: + + Log in to FusionInsight Manager of the source and destination clusters, and create the username and password required for the migration. The procedure is as follows: + + a. Log in to Manager and choose **System** > **Permission** > **Role**. On the displayed page, click **Create Role**. + + b. .. _mrs_01_24508__li11670123433010: + + Specify **Role Name**, for example, **ckrole**. In the **Configure Resource Permission** area, click the cluster name. On the displayed service list page, click the ClickHouse service. + + c. Select **SUPER_USER_GROUP** and click **OK**. + + d. Choose **System**. On the navigation pane on the left, choose **Permission** > **User** and click **Create**. + + e. Select **Human-Machine** for **User Type** and set **Password** and **Confirm Password** to the password of the user. + + .. note:: + + - Username: The username cannot contain hyphens (-). Otherwise, the authentication will fail. + - Password: The password cannot contain special characters $, ., and #. Otherwise, the authentication will fail. + + f. In the **Role** area, click **Add** . 
In the displayed dialog box, select the role name in :ref:`1.b ` and click **OK** to add the role. Then, click **OK**. + + g. After the user is created, click the user name in the upper right corner to log out of the system. Log in to FusionInsight Manager as the new user and change its password as prompted. + +#. Download the ClickHouse client and install it as user **omm** to the destination cluster. + +#. Log in to the client node as user **omm**, go to the *Client installation directory*\ **/ClickHouse/clickhouse_migration_tool/clickhouse-metadata-migration** directory, and configure migration information. Run the following command to modify the **example_config.yaml** configuration file by referring to :ref:`Table 4 `: + + **cd** *Client installation directory*\ **/ClickHouse/clickhouse_migration_tool/clickhouse-metadata-migration** + + **vi example_config.yaml** + + After the configuration is modified, you must delete all comment with number sign(#) and retain only valid configurations. Otherwise, an error may occur during script migration. + + .. _mrs_01_24508__table16816744680: + + .. table:: **Table 4** Parameters in **the example_config.yaml** file + + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration Item | Sub-item | Value and Description | + +=======================+=======================+=================================================================================================================================================================================================================================================================================================================================+ + | source_cluster | host | IP address of any ClickHouseServer node in the source cluster. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | cluster_name | Name of the source ClickHouse cluster. You can log in to the ClickHouse client by referring to :ref:`Using ClickHouse from Scratch ` and run the following command to obtain the value. If the source cluster name has not been changed, the default value is **default_cluster**. | + | | | | + | | | **select cluster,shard_num,replica_num,host_name from system.clusters;** | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | https_port | To obtain the port number, log in to FusionInsight Manager of the source cluster, choose **Cluster** > **Services** > **ClickHouse**, and click the **Configurations** tab and then **All Configurations**. In the displayed page, search for **https_port**. 
| + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | zookeeper_root_path | To obtain the value, log in to FusionInsight Manager of the source cluster, choose **Cluster** > **Services** > **ClickHouse**, and click the **Configurations** tab and then **All Configurations**. In the displayed page, search for **clickhouse.zookeeper.root.path**. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | system | System parameter. Retain the default value. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | databases | Optional. | + | | | | + | | | - If this parameter is specified, data in the specified database of the source ClickHouse cluster is migrated. You can specify multiple databases. The following configuration is for your reference: | + | | | | + | | | .. code-block:: | + | | | | + | | | databases: | + | | | - "database" | + | | | - "database_1" | + | | | | + | | | Data in the **database** and **database_1** databases of the source cluster is migrated. | + | | | | + | | | - If this parameter is not specified, table data of all databases in the source ClickHouse cluster is migrated. Leave the **databases** parameter empty. The following is an example: | + | | | | + | | | .. code-block:: | + | | | | + | | | databases: | + | | | | + | | | Table information of all databases in the source ClickHouse cluster is migrated. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | tables | Optional. The value is in the format of *Database name.Table name*. The database name must be in the databases parameter list. | + | | | | + | | | - If this parameter is specified, data in specified tables in the source ClickHouse cluster database is migrated. You can configure multiple tables. The following configuration is for your reference: | + | | | | + | | | .. code-block:: | + | | | | + | | | tables: | + | | | - "database.table_1" | + | | | - "database_1.table_2" | + | | | | + | | | Data in **table_1** of **database** and **table_2** of database_1 of the source cluster is migrated. | + | | | | + | | | - If this parameter is not specified and the **databases** parameter is specified, all table data in the **databases** database is migrated. 
If the **databases** parameter is not specified, all table data in all databases of the source ClickHouse cluster is migrated. The following configuration is for your reference: | + | | | | + | | | .. code-block:: | + | | | | + | | | tables: | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | destination_cluster | host | IP address of any ClickHouseServer node in the destination cluster. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | cluster_name | Name of the destination ClickHouse cluster. You can log in to the ClickHouse client by referring to :ref:`Using ClickHouse from Scratch ` and run the following command to obtain the value. If the destination cluster name has not been changed, the default value is **default_cluster**. | + | | | | + | | | **select cluster,shard_num,replica_num,host_name from system.clusters;** | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | user | Username created in :ref:`1 ` for logging in to FusionInsight Manager of the destination ClickHouse cluster. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | https_port | To obtain the port number, log in to FusionInsight Manager of the destination cluster, choose **Cluster** > **Services** > **ClickHouse**, and click the **Configurations** tab and then **All Configurations**. In the displayed page, search for **https_port**. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | zookeeper_root_path | To obtain the value, log in to FusionInsight Manager of the destination cluster, choose **Cluster** > **Services** > **ClickHouse**, and click the **Configurations** tab and then **All Configurations**. In the displayed page, search for **clickhouse.zookeeper.root.path**. 
| + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | system | System parameter. Retain the default value. | + +-----------------------+-----------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Run the following command to migrate data and wait until the script execution is complete: + + **./clickhouse_migrate_metadata.sh -f yaml_file** + + Enter the usernames and passwords of the source and destination clusters. + + |image3| + +.. note:: + + If metadata migration fails, perform the following steps: + + #. Locate the failure cause. Specifically, check whether any parameters in the configuration file are incorrectly configured. + + - If yes, reconfigure the parameters and perform metadata migration. + - If no, go to :ref:`2 `. + + #. .. _mrs_01_24508__li19763810133612: + + Set the names of the tables that fail to be migrated in the metadata migration configuration file based on the **databases** and **tables** parameters in :ref:`Table 4 ` and run the metadata migration command again. If the migration fails, contact O&M personnel. + +.. _mrs_01_24508__section18047266428: + +Migrating Data of the Databases and Tables in the Source ClickHouse Cluster to the Destination Cluster +------------------------------------------------------------------------------------------------------ + +#. Log in to the ClickHouse client node in the destination cluster as user **omm** and go to *Client installation directory*\ **/ClickHouse/clickhouse_migration_tool/clickhouse-data-migration**. + + **cd** *Client installation directory*\ **/ClickHouse/clickhouse_migration_tool/clickhouse-data-migration** + +#. Run the following command to modify the **example_config.yaml** configuration file by referring to :ref:`Table 5 `: + + **vi example_config.yaml** + + After the configuration is modified, you must delete all comment with number sign(#) and retain only valid configurations. Otherwise, an error may occur during script migration. + + .. _mrs_01_24508__table993412163478: + + .. 
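+   For orientation, a completed **example_config.yaml** might look roughly like the following minimal sketch once the parameters described in Table 5 below are filled in. All addresses, ports, and names here are illustrative placeholders, and the exact key layout is an assumption derived from Table 5; always edit the template file shipped with the tool rather than writing the file from scratch.
+
+   .. code-block::
+
+      source_cluster:
+        host: "192.168.0.10"              # any ClickHouseServer node of the source cluster
+        cluster_name: "default_cluster"
+        user: "ckuser_src"                # user created on FusionInsight Manager of the source cluster
+        https_port: "xxxxx"               # https_port value of the source ClickHouse service
+        tcp_port: "xxxxx"                 # tcp_port_secure (security mode) or tcp_port
+        zookeeper_root_path: "/clickhouse"
+        databases:
+          - "database"
+        tables:
+          - "database.table_1"
+      destination_cluster:
+        host: "192.168.0.20"              # any ClickHouseServer node of the destination cluster
+        cluster_name: "default_cluster"
+        user: "ckuser_dest"               # user created on FusionInsight Manager of the destination cluster
+        https_port: "xxxxx"
+        tcp_port: "xxxxx"
+        zookeeper_root_path: "/clickhouse"
+      # Keep the remaining items from the template, such as system and auxiliary_zookeepers,
+      # at the values described in the rest of Table 5.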
table:: **Table 5** Parameters in **example_config.yaml** + + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuration Item | Sub-item | Value and Description | + +===================================+=======================+============================================================================================================================================================================================================================================================================================================================================================================================+ + | source_cluster | host | IP address of any ClickHouseServer node in the source cluster. | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | cluster_name | Name of the source ClickHouse cluster. You can log in to the ClickHouse client by referring to :ref:`Using ClickHouse from Scratch ` and run the following command to obtain the value. If the source cluster name has not been changed, the default value is **default_cluster**. | + | | | | + | | | **select cluster,shard_num,replica_num,host_name from system.clusters;** | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | user | Username created in :ref:`1 ` for logging in to FusionInsight Manager of the source ClickHouse cluster. | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | https_port | To obtain the port number, log in to FusionInsight Manager of the source cluster, choose **Cluster** > **Services** > **ClickHouse**, and click the **Configurations** tab and then **All Configurations**. In the displayed page, search for **https_port**. 
| + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | tcp_port | To obtain the value, log in to FusionInsight Manager of the source cluster, choose **Cluster** > **Services** > **ClickHouse**, and click the **Configurations** tab and then **All Configurations**. In the displayed page, search for **tcp_port_secure** if the cluster is in security mode. Otherwise, search for **tcp_port**. | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | zookeeper_root_path | To obtain the value, log in to FusionInsight Manager of the source cluster, choose **Cluster** > **Services** > **ClickHouse**, and click the **Configurations** tab and then **All Configurations**. In the displayed page, search for **clickhouse.zookeeper.root.path**. | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | system | System parameter. Retain the default value. | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | databases | Optional. | + | | | | + | | | - If this parameter is specified, data in the specified database of the source ClickHouse cluster is migrated. You can specify multiple databases. The following configuration is for your reference: | + | | | | + | | | .. code-block:: | + | | | | + | | | databases: | + | | | - "database" | + | | | - "database_1" | + | | | | + | | | Data in the **database** and **database_1** databases of the source cluster is migrated. | + | | | | + | | | - If this parameter is not specified, table data of all databases in the source ClickHouse cluster is migrated. Leave the **databases** parameter empty. The following is an example: | + | | | | + | | | .. code-block:: | + | | | | + | | | databases: | + | | | | + | | | Table information of all databases in the source ClickHouse cluster is migrated. 
| + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | tables | Optional. The value is in the format of *Database name.Table name*. The database name must be in the databases parameter list. | + | | | | + | | | - If this parameter is specified, data in specified tables in the source ClickHouse cluster database is migrated. You can configure multiple tables. The following configuration is for your reference: | + | | | | + | | | .. code-block:: | + | | | | + | | | tables: | + | | | - "database.table_1" | + | | | - "database_1.table_2" | + | | | | + | | | Data in **table_1** of **database** and **table_2** of database_1 of the source cluster is migrated. | + | | | | + | | | - If this parameter is not specified and the **databases** parameter is specified, all table data in the **databases** database is migrated. If the **databases** parameter is not specified, all table data in all databases of the source ClickHouse cluster is migrated. The following configuration is for your reference: | + | | | | + | | | .. code-block:: | + | | | | + | | | tables: | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | destination_cluster | host | IP address of any ClickHouseServer node in the destination cluster. | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | cluster_name | Name of the destination ClickHouse cluster. You can log in to the ClickHouse client by referring to :ref:`Using ClickHouse from Scratch ` and run the following command to obtain the value. If the destination cluster name has not been changed, the default value is **default_cluster**. | + | | | | + | | | **select cluster,shard_num,replica_num,host_name from system.clusters;** | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | user | Username created in :ref:`1 ` for logging in to FusionInsight Manager of the destination ClickHouse cluster. 
| + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | https_port | To obtain the port number, log in to FusionInsight Manager of the destination cluster, choose **Cluster** > **Services** > **ClickHouse**, and click the **Configurations** tab and then **All Configurations**. In the displayed page, search for **https_port**. | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | tcp_port | To obtain the value, log in to FusionInsight Manager of the destination cluster, choose **Cluster** > **Services** > **ClickHouse**, and click the **Configurations** tab and then **All Configurations**. In the displayed page, search for **tcp_port_secure** if the cluster is in security mode. Otherwise, search for **tcp_port**. | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | zookeeper_root_path | To obtain the value, log in to FusionInsight Manager of the destination cluster, choose **Cluster** > **Services** > **ClickHouse**, and click the **Configurations** tab and then **All Configurations**. In the displayed page, search for **clickhouse.zookeeper.root.path**. | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | system | System parameter. Retain the default value. | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | auxiliary_zookeepers | name | ZooKeeper name of the source ClickHouse cluster configured in :ref:`3 `, for example, **zookeeper2**. 
| + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | hosts | IP address of the ZooKeeper instance of the source ClickHouse. To obtain the IP address, log in to FusionInsight Manager of the source cluster, choose **Cluster** > **Services** > **ZooKeeper**, and click the **Instance** tab. On the displayed page, view the service IP addresses of the ZooKeeper quorumpeer instance , as shown in :ref:`Figure 3 `. | + | | | | + | | | The format is as follows: | + | | | | + | | | .. code-block:: | + | | | | + | | | hosts: | + | | | - "192.168.1.2" | + | | | - "192.168.1.3" | + | | | - "192.168.1.4" | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | port | 2181 | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | execution_procedure | ``-`` | This parameter is left blank by default, indicating that the script is executed once to synchronize service data. Value options are **firststep** and **secondstep**. | + | | | | + | | | - **firststep**: Only the temporary replication table is created. The auxiliary ZooKeeper can synchronize data from the original cluster to the temporary table in real time. | + | | | - **secondstep**: data in the temporary replication table is attached to the local table of the destination cluster. | + | | | | + | | | .. caution:: | + | | | | + | | | CAUTION: | + | | | If this parameter is set to **secondstep**, O&M personnel and users need to confirm that ClickHouse-related services have been stopped before script execution. | + +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | onereplica_use_auxiliaryzookeeper | ``-`` | - If this parameter is set to **1**, temporary tables are created for only one replica of each shard. | + | | | - If this parameter is set to **0**, temporary tables are created for two replicas of each shard. 
|
+   +-----------------------------------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+
+#. Stop the ClickHouse service of the source cluster.
+
+#. Run the following command to migrate data and wait until the script execution is complete:
+
+   **./clickhouse_migrate_data.sh -f yaml_file**
+
+   Enter the usernames and passwords of the source and destination clusters.
+
+#. After the script is executed successfully, perform the following steps to check whether the migrated data in the source cluster is consistent with that in the destination cluster based on the migration result logs:
+
+   Log in to the ClickHouse client node in the destination cluster and go to the *Client installation directory*\ **/ClickHouse/clickhouse_migration_tool/clickhouse-data-migration/comparison_result** directory.
+
+   Compare the following result file information to check the data consistency between the source cluster and the destination cluster:
+
+   - **source_cluster_table_info**: statistics of data migrated from the source cluster
+   - **destination_cluster_table_info**: statistics of data migrated to the destination cluster
+   - **compare_result_file.txt**: data consistency comparison result before and after migration
+
+   If the data is inconsistent before and after the migration, clear the data in the affected table of the destination cluster and migrate the data in that table separately or manually.
+
+   In addition, you can log in to the ClickHouse databases of the source and destination clusters to manually check whether the numbers of table data records and partitions are consistent.
+
+#. Log in to FusionInsight Manager of the destination cluster and delete the ZooKeeper information added to **clickhouse-config-customize** in :ref:`2 `.
+
+   Click **Save**. In the displayed dialog box, click **OK**.
+
+#. After data migration is complete, switch services to the destination ClickHouse cluster.
+
+#. Go to *Client installation directory*\ **/ClickHouse/clickhouse_migration_tool/clickhouse-data-migration** and *Client installation directory*\ **/ClickHouse/clickhouse_migration_tool/clickhouse-metadata-migration** on the ClickHouse node in the destination cluster.
+
+   **vi example_config.yaml**
+
+   Delete the password from the configuration file to prevent password leakage.
+
+.. note::
+
+   If service data migration fails, perform the following steps:
+
+   #. Locate the failure cause. Specifically, check whether any parameters in the configuration file are incorrectly configured.
+
+      - If yes, reconfigure the parameters and perform service data migration again.
+      - If no, go to :ref:`2 `.
+
+   #. .. _mrs_01_24508__li2481155164115:
+
+      On the destination cluster node where the migration failed, run the **drop table** *table_name* command to delete the data tables related to the failed table.
+
+   #. Run the **show create table** *table_name* command to query the table creation statements related to the table in the source cluster and create the table in the destination cluster again.
+
+   #. In the service data migration configuration file, set the names of the tables that failed to be migrated based on the **databases** and **tables** parameters in :ref:`Table 5 ` and run the service data migration command again. If the command still fails to execute, contact O&M personnel.
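+For reference, the following is a minimal sketch of how the parameters described in Table 5 fit together in **example_config.yaml**. It is an illustration only: all IP addresses, port numbers, user names, and cluster names are placeholders, and the exact key layout of the template delivered in *Client installation directory*\ **/ClickHouse/clickhouse_migration_tool/clickhouse-data-migration** prevails.
+
+.. code-block::
+
+   source_cluster:
+     host: "192.168.1.10"                 # IP address of any ClickHouseServer node in the source cluster
+     cluster_name: "default_cluster"      # queried from system.clusters
+     user: "src_mgr_user"                 # user created for FusionInsight Manager of the source cluster
+     https_port: "21426"                  # https_port of the source cluster (placeholder)
+     tcp_port: "21427"                    # tcp_port_secure (security mode) or tcp_port (placeholder)
+     zookeeper_root_path: "/clickhouse"   # clickhouse.zookeeper.root.path of the source cluster
+     system: ""                           # system parameter; retain the default value from the delivered template
+     databases:                           # optional; leave empty to migrate all databases
+       - "database"
+     tables:                              # optional; leave empty to migrate all tables of the listed databases
+       - "database.table_1"
+   destination_cluster:
+     host: "192.168.2.10"                 # IP address of any ClickHouseServer node in the destination cluster
+     cluster_name: "default_cluster"
+     user: "dest_mgr_user"
+     https_port: "21426"
+     tcp_port: "21427"
+     zookeeper_root_path: "/clickhouse"
+     system: ""
+   auxiliary_zookeepers:
+     name: "zookeeper2"                   # ZooKeeper name of the source cluster configured in clickhouse-config-customize
+     hosts:
+       - "192.168.1.2"
+       - "192.168.1.3"
+       - "192.168.1.4"
+     port: "2181"
+   execution_procedure: ""                # leave empty to run both steps at once; or firststep / secondstep
+   onereplica_use_auxiliaryzookeeper: "1" # 1: temporary tables for one replica per shard; 0: for two replicas
+
+The completed file is the *yaml_file* passed to **./clickhouse_migrate_data.sh -f** in the migration step above.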
+
+.. |image1| image:: /_static/images/en-us_image_0000001532836098.png
+.. |image2| image:: /_static/images/en-us_image_0000001583316321.png
+.. |image3| image:: /_static/images/en-us_image_0000001537269552.png
diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/configuring_the_password_of_the_default_account_of_a_clickhouse_cluster.rst b/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/configuring_the_password_of_the_default_account_of_a_clickhouse_cluster.rst
new file mode 100644
index 0000000..4a1f6fc
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/configuring_the_password_of_the_default_account_of_a_clickhouse_cluster.rst
@@ -0,0 +1,118 @@
+:original_name: mrs_01_24575.html
+
+.. _mrs_01_24575:
+
+Configuring the Password of the Default Account of a ClickHouse Cluster
+=======================================================================
+
+After a ClickHouse cluster is created, you can use the ClickHouse client to connect to the ClickHouse server.
+
+This section describes how to configure the passwords of the default accounts **default** and **clickhouse** of a ClickHouse cluster.
+
+.. note::
+
+   - This section applies to MRS 3.2.0 or later.
+   - **default** and **clickhouse** are the default internal administrators of a ClickHouse cluster in normal mode (with Kerberos authentication disabled).
+
+Configuring the Password of the Default Account of a ClickHouse Cluster
+-----------------------------------------------------------------------
+
+#. Install the ClickHouse client.
+
+#. Log in to the ClickHouse client node as user **root**, go to *Client installation directory*\ **/ClickHouse/clickhouse/config**, and check whether the **metrika.xml** file contains information about all ClickHouseServer nodes. If information about any node is missing, add it.
+
+   For example, there are four ClickHouseServer nodes: **192-168-43-125**, **192-168-43-165**, **192-168-43-175**, and **192-168-43-249**.
+
+   .. code-block::
+
+      <shard>
+          <internal_replication>true</internal_replication>
+          <replica>
+              <host>192-168-43-125</host>
+              <port>21423</port>
+              <user>clickhouse</user>
+          </replica>
+          <replica>
+              <host>192-168-43-165</host>
+              <port>21423</port>
+              <user>clickhouse</user>
+          </replica>
+      </shard>
+      <shard>
+          <internal_replication>true</internal_replication>
+          <replica>
+              <host>192-168-43-175</host>
+              <port>21423</port>
+              <user>clickhouse</user>
+          </replica>
+          <replica>
+              <host>192-168-43-249</host>
+              <port>21423</port>
+              <user>clickhouse</user>
+          </replica>
+      </shard>
+
+#. Go to *Client installation directory*\ **/ClickHouse/clickhouse/config** and check whether the values of **CLICKHOUSE_CONF_DIR** and **CLICKHOUSE_INSTALL_HOME** in the **clickhouse-env.sh** file are **$BIGDATA_HOME/FusionInsight_ClickHouse_xxx/*_ClickHouseServer/etc** and **$BIGDATA_HOME/FusionInsight_ClickHouse_xxx/install/FusionInsight-ClickHouse-*-lts/**, respectively.
+
+   |image1|
+
+#. Switch to user **omm** and go to the *Client installation directory*\ **/ClickHouse/clickhouse_change_password** directory.
+
+   **su - omm**
+
+   **cd** *Client installation directory*\ **/ClickHouse/clickhouse_change_password**
+
+#. Run the following command to change the password of the **default** or **clickhouse** user:
+
+   **./change_password.sh**
+
+   In the following figure, user **clickhouse** is used as an example. Enter **clickhouse** and its password as prompted, and wait until the password is changed.
+
+   |image2|
+
+   .. note::
+
+      The password complexity requirements are as follows:
+
+      - The password must contain 8 to 64 characters.
+      - The password must contain at least one lowercase letter, one uppercase letter, one digit, and one special character. The following special characters are supported: ``-%;[]{}@_``
+
+#. Check the password change result.
+
+   a. Run the following commands to check the value of **CLICKHOUSE_CONF_DIR** in the *Client installation directory*\ **/ClickHouse/clickhouse/config/clickhouse-env.sh** file:
+
+      **cd** *Client installation directory*\ **/ClickHouse/clickhouse/config/**
+
+      **vi clickhouse-env.sh**
+
+      The following is an example:
+
+      .. code-block::
+
+         CLICKHOUSE_CONF_DIR="${BIGDATA_HOME}/FusionInsight_ClickHouse_*/*_ClickHouseServer/etc"
+         CLICKHOUSE_SECURITY_ENABLED="true"
+         CLICKHOUSE_BALANCER_LIST="192.168.42.14,192.168.67.89"
+         CLICKHOUSE_STARTUP_PRINCIPAL="clickhouse/hadoop.hadoop.com@HADOOP.COM"
+         USER_REALM="HADOOP.COM"
+         OM_DECOMMISSION_HOSTNAME_LIST=""
+         CLICKHOUSE_INSTALL_HOME="${BIGDATA_HOME}/FusionInsight_ClickHouse_8.2.0/install/FusionInsight-ClickHouse-v22.3.2.2-lts"
+         CK_BALANCER_LIST="server-2110081635-0003,server-2110082001-0019"
+
+   b. Log in to the ClickHouseServer node and check the value of **password_sha256_hex** in the **${BIGDATA_HOME}/FusionInsight_ClickHouse_*/*_ClickHouseServer/etc/users.xml** file. The value is the SHA-256 hash of the new password.
+
+      **cd ${BIGDATA_HOME}/**\ ``FusionInsight_ClickHouse_*``\ **/**\ ``*_ClickHouseServer``\ **/etc/**
+
+      **vi** **users.xml**
+
+      As shown in the following figure, the new password is stored in the **password_sha256_hex** parameter in the **users.xml** file.
+
+      |image3|
+
+.. |image1| image:: /_static/images/en-us_image_0000001583316949.png
+.. |image2| image:: /_static/images/en-us_image_0000001532677010.png
+.. |image3| image:: /_static/images/en-us_image_0000001583436657.png
diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/setting_the_clickhouse_username_and_password.rst b/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/configuring_the_password_of_the_default_account_of_a_clickhouse_clusterfor_mrs_3.1.2.rst similarity index 80% rename from doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/setting_the_clickhouse_username_and_password.rst rename to doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/configuring_the_password_of_the_default_account_of_a_clickhouse_clusterfor_mrs_3.1.2.rst index 3117910..355aa5a 100644 --- a/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/setting_the_clickhouse_username_and_password.rst +++ b/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/configuring_the_password_of_the_default_account_of_a_clickhouse_clusterfor_mrs_3.1.2.rst @@ -2,8 +2,8 @@ .. _mrs_01_2395: -Setting the ClickHouse Username and Password -============================================ +Configuring the Password of the Default Account of a ClickHouse Cluster(for MRS 3.1.2) +====================================================================================== After a ClickHouse cluster is created, you can use the ClickHouse client to connect to the ClickHouse server. The default username is **default**. @@ -11,8 +11,8 @@ This section describes how to set ClickHouse username and password after a Click .. note:: - **default** is the default internal user of ClickHouse.
It is an administrator user available only in normal mode (kerberos authentication disabled). - + - This section applies to MRS 3.1.2. + - **default** is the default internal user of ClickHouse. It is an administrator user available only in normal mode (kerberos authentication disabled). Setting the ClickHouse Username and Password -------------------------------------------- diff --git a/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/index.rst b/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/index.rst index fa531e9..1cb84ec 100644 --- a/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/index.rst +++ b/doc/component-operation-guide-lts/source/using_clickhouse/user_management_and_authentication/index.rst @@ -6,11 +6,13 @@ User Management and Authentication ================================== - :ref:`ClickHouse User and Permission Management ` -- :ref:`Setting the ClickHouse Username and Password ` +- :ref:`Configuring the Password of the Default Account of a ClickHouse Cluster(for MRS 3.1.2) ` +- :ref:`Configuring the Password of the Default Account of a ClickHouse Cluster ` .. toctree:: :maxdepth: 1 :hidden: clickhouse_user_and_permission_management - setting_the_clickhouse_username_and_password + configuring_the_password_of_the_default_account_of_a_clickhouse_clusterfor_mrs_3.1.2 + configuring_the_password_of_the_default_account_of_a_clickhouse_cluster diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/jobmanager_and_taskmanager.rst b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/jobmanager_and_taskmanager.rst index 42e5051..87e68d6 100644 --- a/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/jobmanager_and_taskmanager.rst +++ b/doc/component-operation-guide-lts/source/using_flink/flink_configuration_management/jobmanager_and_taskmanager.rst @@ -38,14 +38,8 @@ Main configuration items include communication port, memory management, connecti +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ | taskmanager.network.numberOfBuffers | Number of TaskManager network transmission buffer stacks. If an error indicates insufficient system buffer, increase the parameter value. 
| 2048 | No | +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ - | taskmanager.memory.fraction | Ratio of JVM heap memory that TaskManager reserves for sorting, hash tables, and caching of intermediate results. | 0.7 | No | - +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ - | taskmanager.memory.off-heap | Whether TaskManager uses off-heap memory for sorting, hash tables and intermediate status. You are advised to enable this item for large memory needs to improve memory operation efficiency. | false | Yes | - +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ | taskmanager.memory.segment-size | Size of the memory buffer used by the memory manager and network stack The unit is bytes. 
| 32768 | No | +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ - | taskmanager.memory.preallocate | Whether TaskManager allocates reserved memory space upon startup. You are advised to enable this item when off-heap memory is used. | false | No | - +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ | taskmanager.debug.memory.startLogThread | Enable this item for debugging Flink memory and garbage collection (GC)-related problems. TaskManager periodically collects memory and GC statistics, including the current utilization of heap and off-heap memory pools and GC time. | false | No | +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+-----------------+ | taskmanager.debug.memory.logIntervalMs | Interval at which TaskManager periodically collects memory and GC statistics. 
| 0 | No | diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_process_parameters.rst b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_flink_process_parameters.rst similarity index 96% rename from doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_process_parameters.rst rename to doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_flink_process_parameters.rst index 5e37b84..1df80af 100644 --- a/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_process_parameters.rst +++ b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/configuring_flink_process_parameters.rst @@ -2,8 +2,8 @@ .. _mrs_01_1590: -Configuring Process Parameters -============================== +Configuring Flink Process Parameters +==================================== Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/index.rst b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/index.rst index 03ff67a..8b37be7 100644 --- a/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/index.rst +++ b/doc/component-operation-guide-lts/source/using_flink/flink_performance_tuning/optimization_datastream/index.rst @@ -7,7 +7,7 @@ Optimization DataStream - :ref:`Memory Configuration Optimization ` - :ref:`Configuring DOP ` -- :ref:`Configuring Process Parameters ` +- :ref:`Configuring Flink Process Parameters ` - :ref:`Optimizing the Design of Partitioning Method ` - :ref:`Configuring the Netty Network Communication ` - :ref:`Summarization ` @@ -18,7 +18,7 @@ Optimization DataStream memory_configuration_optimization configuring_dop - configuring_process_parameters + configuring_flink_process_parameters optimizing_the_design_of_partitioning_method configuring_the_netty_network_communication summarization diff --git a/doc/component-operation-guide-lts/source/using_flink/flink_restart_policy.rst b/doc/component-operation-guide-lts/source/using_flink/flink_restart_policy.rst new file mode 100644 index 0000000..93ba253 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_flink/flink_restart_policy.rst @@ -0,0 +1,70 @@ +:original_name: mrs_01_24779.html + +.. _mrs_01_24779: + +Flink Restart Policy +==================== + +Overview +-------- + +Flink supports different restart policies to control whether and how to restart a job when a fault occurs. If no restart policy is specified, the cluster uses the default restart policy. You can also specify a restart policy when submitting a job. For details about how to configure such a policy on the job development page of MRS 3.1.0 or later, see :ref:`Managing Jobs on the Flink Web UI `. + +The restart policy can be specified by configuring the **restart-strategy** parameter in the Flink configuration file *Client installation directory*\ **/Flink/flink/conf/flink-conf.yaml** or can be dynamically specified in the application code. The configuration takes effect globally. Restart policies include **failure-rate** and the following two default policies: + +- **No restart**: If CheckPoint is not enabled, this policy is used by default. 
+- **Fixed-delay**: If CheckPoint is enabled but no restart policy is configured, this policy is used by default.
+
+No restart Policy
+-----------------
+
+When a fault occurs, the job fails and does not attempt to restart.
+
+Configure the parameter as follows:
+
+.. code-block::
+
+   restart-strategy: none
+
+fixed-delay Policy
+------------------
+
+When a fault occurs, the job attempts to restart a fixed number of times. If the maximum number of attempts is exceeded, the job fails. The restart policy waits for a fixed period of time between two consecutive restart attempts.
+
+In the following example, the job is restarted at most three times, with a 10-second interval between two consecutive attempts; if it still fails, the job fails. Configure the parameters as follows:
+
+.. code-block::
+
+   restart-strategy: fixed-delay
+   restart-strategy.fixed-delay.attempts: 3
+   restart-strategy.fixed-delay.delay: 10 s
+
+failure-rate Policy
+-------------------
+
+When a fault occurs, the job restarts immediately. If the number of failures within the configured time interval exceeds the configured maximum, the job is considered failed. The restart policy waits for a fixed period of time between two consecutive restart attempts.
+
+In the following example, the job is considered failed if it fails more than three times within a 10-minute interval; the interval between two consecutive restart attempts is 10 seconds. Configure the parameters as follows:
+
+.. code-block::
+
+   restart-strategy: failure-rate
+   restart-strategy.failure-rate.max-failures-per-interval: 3
+   restart-strategy.failure-rate.failure-rate-interval: 10 min
+   restart-strategy.failure-rate.delay: 10 s
+
+Choosing a Restart Policy
+-------------------------
+
+- If you do not want to retry a failed job, select the **No restart** policy.
+
+- To retry a failed job, select the **failure-rate** policy. If the **fixed-delay** policy is used, transient faults such as network or memory faults may exhaust the maximum number of retries, and the job then fails permanently.
+
+  To prevent endless restarts when the **failure-rate** policy is used, configure the parameters as follows:
+
+  .. code-block::
+
+     restart-strategy: failure-rate
+     restart-strategy.failure-rate.max-failures-per-interval: 3
+     restart-strategy.failure-rate.failure-rate-interval: 10 min
+     restart-strategy.failure-rate.delay: 10 s
diff --git a/doc/component-operation-guide-lts/source/using_flink/index.rst b/doc/component-operation-guide-lts/source/using_flink/index.rst index 3603e35..46f57fb 100644 --- a/doc/component-operation-guide-lts/source/using_flink/index.rst +++ b/doc/component-operation-guide-lts/source/using_flink/index.rst @@ -17,6 +17,7 @@ Using Flink - :ref:`Flink Performance Tuning ` - :ref:`Common Flink Shell Commands ` - :ref:`Reference ` +- :ref:`Flink Restart Policy ` + + ..
toctree:: :maxdepth: 1 @@ -34,3 +35,4 @@ Using Flink flink_performance_tuning/index common_flink_shell_commands reference/index + flink_restart_policy diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/flink_web_ui_application_process.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flink_web_ui_overview/flink_web_ui_application_process.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/flink_web_ui_application_process.rst rename to doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flink_web_ui_overview/flink_web_ui_application_process.rst diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/index.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flink_web_ui_overview/index.rst similarity index 86% rename from doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/index.rst rename to doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flink_web_ui_overview/index.rst index 4766f91..d631df3 100644 --- a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/index.rst +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flink_web_ui_overview/index.rst @@ -2,8 +2,8 @@ .. _mrs_01_24015: -Overview -======== +Flink Web UI Overview +===================== - :ref:`Introduction to Flink Web UI ` - :ref:`Flink Web UI Application Process ` diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/introduction_to_flink_web_ui.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flink_web_ui_overview/introduction_to_flink_web_ui.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/overview/introduction_to_flink_web_ui.rst rename to doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flink_web_ui_overview/introduction_to_flink_web_ui.rst diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/overview.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/flinkserver_permissions_overview.rst similarity index 98% rename from doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/overview.rst rename to doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/flinkserver_permissions_overview.rst index 18af61c..99bd878 100644 --- a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/overview.rst +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/flinkserver_permissions_overview.rst @@ -2,8 +2,8 @@ .. _mrs_01_24048: -Overview -======== +FlinkServer Permissions Overview +================================ User **admin** of Manager does not have the FlinkServer service operation permission. To perform FlinkServer service operations, you need to grant related permission to the user. 
diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/index.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/index.rst index cad4bc7..64b08b7 100644 --- a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/index.rst +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/flinkserver_permissions_management/index.rst @@ -5,12 +5,12 @@ FlinkServer Permissions Management ================================== -- :ref:`Overview ` +- :ref:`FlinkServer Permissions Overview ` - :ref:`Authentication Based on Users and Roles ` .. toctree:: :maxdepth: 1 :hidden: - overview + flinkserver_permissions_overview authentication_based_on_users_and_roles
diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/importing_and_exporting_jobs.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/importing_and_exporting_jobs.rst
new file mode 100644
index 0000000..ba5301f
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/importing_and_exporting_jobs.rst
@@ -0,0 +1,44 @@
+:original_name: mrs_01_24481.html
+
+.. _mrs_01_24481:
+
+Importing and Exporting Jobs
+============================
+
+Scenario
+--------
+
+On the FlinkServer web UI, you can import and export only jobs, UDFs, and stream tables.
+
+- Jobs, stream tables, and UDFs with the same name cannot be imported to the same cluster.
+- When exporting a job, you need to manually select the stream tables and UDFs on which the job depends. Otherwise, a dialog box indicating that the dependent data is not selected will be displayed. The application information of a job will not be exported.
+- When you export a stream table, the application information on which the stream table depends will not be exported.
+- When you export UDFs, the application information on which the UDFs depend and information about the jobs that use the UDFs will not be exported.
+- Data import and export between different applications are supported.
+
+Importing a Job
+---------------
+
+#. Access the Flink web UI as a user with **FlinkServer Admin Privilege**. For details, see :ref:`Accessing the Flink Web UI `.
+#. Choose **System Management** > **Import Jobs**.
+#. Click **Select** to select a local TAR file and click **OK**. Wait until the file is imported.
+
+   .. note::
+
+      The maximum size of a local TAR file to be uploaded is 200 MB.
+
+Exporting a Job
+---------------
+
+#. Access the Flink web UI as a user with **FlinkServer Admin Privilege**. For details, see :ref:`Accessing the Flink Web UI `.
+#. Choose **System Management** > **Export Jobs**.
+#. Select the data to be exported in either of the following ways. To deselect the content, click **Clear Selected Node**.
+
+   - Select the data to be exported as required.
+   - Click **Query Regular Expression**. On the displayed page, select the type of the data to be exported (**Table Management**, **Job Management**, or **UDF Management**), enter the keyword, and click **Query**. After the data is successfully matched, click **Synchronize**.
+
+   .. note::
+
+      All matched data will be synchronized after you click **Synchronize**. Currently, you cannot select only part of the matched data for synchronization.
+
+#. Click **Verify**. After the verification is complete, click **OK**. Wait until the data is exported.
diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/index.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/index.rst index c661bd5..9ea8ba3 100644 --- a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/index.rst +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/index.rst @@ -5,7 +5,7 @@ Using the Flink Web UI ====================== -- :ref:`Overview ` +- :ref:`Flink Web UI Overview ` - :ref:`FlinkServer Permissions Management ` - :ref:`Accessing the Flink Web UI ` - :ref:`Creating an Application on the Flink Web UI ` @@ -13,14 +13,15 @@ Using the Flink Web UI - :ref:`Creating a Data Connection on the Flink Web UI ` - :ref:`Managing Tables on the Flink Web UI ` - :ref:`Managing Jobs on the Flink Web UI ` -- :ref:`Managing UDFs on the Flink Web UI ` +- :ref:`Importing and Exporting Jobs ` +- :ref:`Managing UDFs ` - :ref:`Interconnecting FlinkServer with External Components ` .. toctree:: :maxdepth: 1 :hidden: - overview/index + flink_web_ui_overview/index flinkserver_permissions_management/index accessing_the_flink_web_ui creating_an_application_on_the_flink_web_ui @@ -28,5 +29,6 @@ Using the Flink Web UI creating_a_data_connection_on_the_flink_web_ui managing_tables_on_the_flink_web_ui managing_jobs_on_the_flink_web_ui - managing_udfs_on_the_flink_web_ui/index + importing_and_exporting_jobs + managing_udfs/index interconnecting_flinkserver_with_external_components/index diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_clickhouse.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_clickhouse.rst index dbdc4c1..7b85014 100644 --- a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_clickhouse.rst +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_clickhouse.rst @@ -10,6 +10,10 @@ Scenario Flink interconnects with the ClickHouseBalancer instance of ClickHouse to read and write data, preventing ClickHouse traffic distribution problems. +.. important:: + + When "FlinkSQL" is displayed in the command output on the FlinkServer web UI in MRS 3.2.0 or later clusters, the **password** field in the SQL statement is left blank to meet security requirements. Before you submit a job, manually enter the password. + Prerequisites ------------- diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hudi.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hudi.rst index 0fb1c0d..cff8f8d 100644 --- a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hudi.rst +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/interconnecting_flinkserver_with_external_components/interconnecting_flinkserver_with_hudi.rst @@ -183,3 +183,51 @@ Procedure .. 
important:: Ensure that no partition is added before the synchronization. After the synchronization, new partitions cannot be read. + +Synchronizing Metadata from Flink On Hudi to Hive +------------------------------------------------- + +This section applies to MRS 3.2.0 or later. + +- Synchronizing metadata to Hive in JDBC mode + + .. code-block:: + + CREATE TABLE stream_mor( + uuid VARCHAR(20), + name VARCHAR(10), + age INT, + ts INT, + `p` VARCHAR(20) + ) PARTITIONED BY (`p`) WITH ( + 'connector' = 'hudi', + 'path' = 'hdfs://hacluster/tmp/hudi/stream_mor', + 'table.type' = 'MERGE_ON_READ', + 'hive_sync.enable' = 'true', + 'hive_sync.table' = 'Name of the table to be synchronized to Hive', + 'hive_sync.db' = 'Name of the database to be synchronized to Hive', + 'hive_sync.metastore.uris' = 'Value of hive.metastore.uris in the hive-site.xml file on the Hive client', + 'hive_sync.jdbc_url' = 'Value of CLIENT_HIVE_URI in the component_env file on the Hive client' + ); + +- Synchronizing metadata to Hive in HMS mode + + .. code-block:: + + CREATE TABLE stream_mor( + uuid VARCHAR(20), + name VARCHAR(10), + age INT, + ts INT, + `p` VARCHAR(20) + ) PARTITIONED BY (`p`) WITH ( + 'connector' = 'hudi', + 'path' = 'hdfs://hacluster/tmp/hudi/stream_mor', + 'table.type' = 'MERGE_ON_READ', + 'hive_sync.enable' = 'true', + 'hive_sync.table' = 'Name of the table to be synchronized to Hive', + 'hive_sync.db' = 'Name of the database to be synchronized to Hive', + 'hive_sync.mode' = 'hms', + 'hive_sync.metastore.uris' = 'Value of hive.metastore.uris in the hive-site.xml file on the Hive client', + 'properties.hive.metastore.kerberos.principal' = 'Value of hive.metastore.kerberos.principal in the hive-site.xml file on the Hive client' + ); diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_jobs_on_the_flink_web_ui.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_jobs_on_the_flink_web_ui.rst index d33634c..6a92052 100644 --- a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_jobs_on_the_flink_web_ui.rst +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_jobs_on_the_flink_web_ui.rst @@ -47,8 +47,6 @@ Creating a Job a. Develop the job on the job development page. - |image1| - b. Click **Check Semantic** to check the input content and click **Format SQL** to format SQL statements. c. After the job SQL statements are developed, set running parameters by referring to :ref:`Table 2 ` and click **Save**. @@ -104,12 +102,6 @@ Creating a Job a. Click **Select**, upload a local JAR file, and set parameters by referring to :ref:`Table 3 `. - - .. figure:: /_static/images/en-us_image_0000001349059937.png - :alt: **Figure 1** Creating a Flink JAR job - - **Figure 1** Creating a Flink JAR job - .. _mrs_01_24024__en-us_topic_0000001173470782_table1388311381402: .. table:: **Table 3** Parameter configuration @@ -205,5 +197,3 @@ Deleting a Job #. Access the Flink web UI. For details, see :ref:`Accessing the Flink Web UI `. #. Click **Job Management**. The job management page is displayed. #. In the **Operation** column of the item to be deleted, click **Delete**, and click **OK** in the displayed page. Jobs in the **Draft**, **Saved**, **Submission failed**, **Running succeeded**, **Running failed**, or **Stop** state can be deleted. - -.. 
|image1| image:: /_static/images/en-us_image_0000001387905484.png diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/index.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs/index.rst similarity index 86% rename from doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/index.rst rename to doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs/index.rst index bcccb93..923fb1b 100644 --- a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/index.rst +++ b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs/index.rst @@ -2,8 +2,8 @@ .. _mrs_01_24223: -Managing UDFs on the Flink Web UI -================================= +Managing UDFs +============= - :ref:`Managing UDFs on the Flink Web UI ` - :ref:`UDF Java and SQL Examples ` diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/managing_udfs_on_the_flink_web_ui.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs/managing_udfs_on_the_flink_web_ui.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/managing_udfs_on_the_flink_web_ui.rst rename to doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs/managing_udfs_on_the_flink_web_ui.rst diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udaf_java_and_sql_examples.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs/udaf_java_and_sql_examples.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udaf_java_and_sql_examples.rst rename to doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs/udaf_java_and_sql_examples.rst diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udf_java_and_sql_examples.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs/udf_java_and_sql_examples.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udf_java_and_sql_examples.rst rename to doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs/udf_java_and_sql_examples.rst diff --git a/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udtf_java_and_sql_examples.rst b/doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs/udtf_java_and_sql_examples.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs_on_the_flink_web_ui/udtf_java_and_sql_examples.rst rename to doc/component-operation-guide-lts/source/using_flink/using_the_flink_web_ui/managing_udfs/udtf_java_and_sql_examples.rst diff --git a/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/overview.rst 
b/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/flume_service_model_overview.rst similarity index 85% rename from doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/overview.rst rename to doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/flume_service_model_overview.rst index f76e1ba..9c3ba83 100644 --- a/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/overview.rst +++ b/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/flume_service_model_overview.rst @@ -2,8 +2,8 @@ .. _mrs_01_1074: -Overview -======== +Flume Service Model Overview +============================ Guide a reasonable Flume service configuration by providing performance differences between Flume common modules, to avoid a nonstandard overall service performance caused when a frontend Source and a backend Sink do not match in performance. diff --git a/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/index.rst b/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/index.rst index 39976b5..3020349 100644 --- a/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/index.rst +++ b/doc/component-operation-guide-lts/source/using_flume/configuring_the_flume_service_model/index.rst @@ -5,12 +5,12 @@ Configuring the Flume Service Model =================================== -- :ref:`Overview ` +- :ref:`Flume Service Model Overview ` - :ref:`Service Model Configuration Guide ` .. toctree:: :maxdepth: 1 :hidden: - overview + flume_service_model_overview service_model_configuration_guide diff --git a/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/index.rst b/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/index.rst index dff45c0..fb20182 100644 --- a/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/index.rst +++ b/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/index.rst @@ -6,11 +6,11 @@ Encrypted Transmission ====================== - :ref:`Configuring the Encrypted Transmission ` -- :ref:`Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS ` +- :ref:`Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS Encrypted Transmission ` .. 
toctree:: :maxdepth: 1 :hidden: configuring_the_encrypted_transmission - typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs + typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs_encrypted_transmission diff --git a/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs.rst b/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs_encrypted_transmission.rst similarity index 99% rename from doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs.rst rename to doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs_encrypted_transmission.rst index 0c811fb..c63cd8a 100644 --- a/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs.rst +++ b/doc/component-operation-guide-lts/source/using_flume/encrypted_transmission/typical_scenario_collecting_local_static_logs_and_uploading_them_to_hdfs_encrypted_transmission.rst @@ -2,8 +2,8 @@ .. _mrs_01_1070: -Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS -========================================================================= +Typical Scenario: Collecting Local Static Logs and Uploading Them to HDFS Encrypted Transmission +================================================================================================ Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_flume/overview.rst b/doc/component-operation-guide-lts/source/using_flume/flume_overview.rst similarity index 99% rename from doc/component-operation-guide-lts/source/using_flume/overview.rst rename to doc/component-operation-guide-lts/source/using_flume/flume_overview.rst index 47e69f0..fc56f10 100644 --- a/doc/component-operation-guide-lts/source/using_flume/overview.rst +++ b/doc/component-operation-guide-lts/source/using_flume/flume_overview.rst @@ -2,8 +2,8 @@ .. _mrs_01_0391: -Overview -======== +Flume Overview +============== Flume is a distributed, reliable, and highly available system for aggregating massive logs, which can efficiently collect, aggregate, and move massive log data from different data sources and store the data in a centralized data storage system. Various data senders can be customized in the system to collect data. Additionally, Flume provides simple data processes capabilities and writes data to data receivers (which is customizable). 
diff --git a/doc/component-operation-guide-lts/source/using_flume/index.rst b/doc/component-operation-guide-lts/source/using_flume/index.rst index a396eb9..76510cc 100644 --- a/doc/component-operation-guide-lts/source/using_flume/index.rst +++ b/doc/component-operation-guide-lts/source/using_flume/index.rst @@ -6,7 +6,7 @@ Using Flume =========== - :ref:`Using Flume from Scratch ` -- :ref:`Overview ` +- :ref:`Flume Overview ` - :ref:`Installing the Flume Client on Clusters ` - :ref:`Viewing Flume Client Logs ` - :ref:`Stopping or Uninstalling the Flume Client ` @@ -30,7 +30,7 @@ Using Flume :hidden: using_flume_from_scratch - overview + flume_overview installing_the_flume_client_on_clusters viewing_flume_client_logs stopping_or_uninstalling_the_flume_client diff --git a/doc/component-operation-guide-lts/source/using_flume/introduction_to_flume_logs.rst b/doc/component-operation-guide-lts/source/using_flume/introduction_to_flume_logs.rst index 60e8549..a8acfd0 100644 --- a/doc/component-operation-guide-lts/source/using_flume/introduction_to_flume_logs.rst +++ b/doc/component-operation-guide-lts/source/using_flume/introduction_to_flume_logs.rst @@ -18,31 +18,33 @@ Log Description .. table:: **Table 1** Flume log list - +----------+-------------------------------------+--------------------------------------------------------------------------------------+ - | Type | Name | Description | - +==========+=====================================+======================================================================================+ - | Run logs | /flume/flumeServer.log | Log file that records FlumeServer running environment information. | - +----------+-------------------------------------+--------------------------------------------------------------------------------------+ - | | /flume/install.log | FlumeServer installation log file | - +----------+-------------------------------------+--------------------------------------------------------------------------------------+ - | | /flume/flumeServer-gc.log.\ ** | GC log file of the FlumeServer process | - +----------+-------------------------------------+--------------------------------------------------------------------------------------+ - | | /flume/prestartDvietail.log | Work log file before the FlumeServer startup | - +----------+-------------------------------------+--------------------------------------------------------------------------------------+ - | | /flume/startDetail.log | Startup log file of the Flume process | - +----------+-------------------------------------+--------------------------------------------------------------------------------------+ - | | /flume/stopDetail.log | Shutdown log file of the Flume process | - +----------+-------------------------------------+--------------------------------------------------------------------------------------+ - | | /monitor/monitorServer.log | Log file that records MonitorServer running environment information | - +----------+-------------------------------------+--------------------------------------------------------------------------------------+ - | | /monitor/startDetail.log | Startup log file of the MonitorServer process | - +----------+-------------------------------------+--------------------------------------------------------------------------------------+ - | | /monitor/stopDetail.log | Shutdown log file of the MonitorServer process | - +----------+-------------------------------------+--------------------------------------------------------------------------------------+ - | 
| function.log | External function invoking log file | - +----------+-------------------------------------+--------------------------------------------------------------------------------------+ - | | threadDump-.log | The jstack log file to be printed when the NodeAgent delivers a service stop command | - +----------+-------------------------------------+--------------------------------------------------------------------------------------+ + +--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ + | Type | Name | Description | + +============================================+=====================================+======================================================================================+ + | Run logs | /flume/flumeServer.log | Log file that records FlumeServer running environment information. | + +--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /flume/install.log | FlumeServer installation log file | + +--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /flume/flumeServer-gc.log.\ ** | GC log file of the FlumeServer process | + +--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /flume/prestartDvietail.log | Work log file before the FlumeServer startup | + +--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /flume/startDetail.log | Startup log file of the Flume process | + +--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /flume/stopDetail.log | Shutdown log file of the Flume process | + +--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /monitor/monitorServer.log | Log file that records MonitorServer running environment information | + +--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /monitor/startDetail.log | Startup log file of the MonitorServer process | + +--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | /monitor/stopDetail.log | Shutdown log file of the MonitorServer process | + +--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | function.log | External function invoking log file | + +--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ + | | threadDump-.log | The jstack log file to be printed when the NodeAgent delivers a service stop command | + 
+--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ + | Stack information log (MRS 3.2.0 or later) | threadDump-.log | The jstack log file to be printed when the NodeAgent delivers a service stop command | + +--------------------------------------------+-------------------------------------+--------------------------------------------------------------------------------------+ Log Level --------- diff --git a/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_data_compression_and_encoding.rst b/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_data_compression_and_encoding.rst index 6da6c3b..5fb02cf 100644 --- a/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_data_compression_and_encoding.rst +++ b/doc/component-operation-guide-lts/source/using_hbase/configuring_hbase_data_compression_and_encoding.rst @@ -1,6 +1,6 @@ -:original_name: en-us_topic_0000001295898904.html +:original_name: mrs_01_24112.html -.. _en-us_topic_0000001295898904: +.. _mrs_01_24112: Configuring HBase Data Compression and Encoding =============================================== diff --git a/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/index.rst b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/index.rst index d8dad88..edeb608 100644 --- a/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/index.rst +++ b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/index.rst @@ -10,7 +10,7 @@ HBase Performance Tuning - :ref:`Optimizing Put and Scan Performance ` - :ref:`Improving Real-time Data Write Performance ` - :ref:`Improving Real-time Data Read Performance ` -- :ref:`Optimizing JVM Parameters ` +- :ref:`Optimizing HBase JVM Parameters ` .. toctree:: :maxdepth: 1 @@ -21,4 +21,4 @@ HBase Performance Tuning optimizing_put_and_scan_performance improving_real-time_data_write_performance improving_real-time_data_read_performance - optimizing_jvm_parameters + optimizing_hbase_jvm_parameters diff --git a/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/optimizing_jvm_parameters.rst b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/optimizing_hbase_jvm_parameters.rst similarity index 97% rename from doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/optimizing_jvm_parameters.rst rename to doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/optimizing_hbase_jvm_parameters.rst index 247ac48..aa59970 100644 --- a/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/optimizing_jvm_parameters.rst +++ b/doc/component-operation-guide-lts/source/using_hbase/hbase_performance_tuning/optimizing_hbase_jvm_parameters.rst @@ -2,8 +2,8 @@ .. 
_mrs_01_1019: -Optimizing JVM Parameters -========================= +Optimizing HBase JVM Parameters +=============================== Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_hbase/in-house_enhanced_phoenix/csvbulkloadtool_supports_parsing_user-defined_delimiters_in_data_files.rst b/doc/component-operation-guide-lts/source/using_hbase/in-house_enhanced_phoenix/csvbulkloadtool_supports_parsing_user-defined_delimiters_in_data_files.rst new file mode 100644 index 0000000..9731275 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/in-house_enhanced_phoenix/csvbulkloadtool_supports_parsing_user-defined_delimiters_in_data_files.rst @@ -0,0 +1,106 @@ +:original_name: mrs_01_24580.html + +.. _mrs_01_24580: + +CsvBulkloadTool Supports Parsing User-Defined Delimiters in Data Files +====================================================================== + +Scenario +-------- + +Phoenix provides CsvBulkloadTool, a batch data import tool. This tool supports user-defined delimiters. Specifically, users can use any visible characters within the specified length as delimiters to import data files. + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +Constraints +----------- + +- User-defined delimiters cannot be an empty string. +- A user-defined delimiter can contain a maximum of 16 characters. + + .. note:: + + A long delimiter affects parsing efficiency, slows down data import, reduces the proportion of valid data, and results in large files. Use delimiters that are as short as possible. + +- User-defined delimiters must be visible characters. + + .. note:: + + A whitelist of user-defined delimiters can be configured to avoid possible injection issues. Currently, the following delimiters are supported: letters, numbers, and special characters (:literal:`\`~!@#$%^&*()\\\\-_=+\\\\[\\\\]{}\\\\\\\\|;:'\\",<>./?`). + +- The first and last characters of a user-defined delimiter cannot be the same. + +Description of New Parameters +----------------------------- + +The following two parameters are added based on the open source CsvBulkloadTool: + +- **--multiple-delimiter(-md)** + + This parameter specifies the user-defined delimiter. If this parameter is specified, it takes effect preferentially and overwrites the **-d** parameter in the original command. + +- **--multiple-delimiter-skip-check(-mdsc)** + + This parameter is used to skip the delimiter length and whitelist verification. Using it is not recommended. + +Procedure +--------- + +#. .. _mrs_01_24580__li1033704115320: + + Upload the data file to the node where the client is deployed. For example, upload the **data.csv** file to the **/opt/test** directory on the target node. The delimiter is **\|^[**. The file content is as follows: + + |image1| + +#. Log in to the node where the client is installed as the client installation user. + +#. Run the following command to go to the client directory: + + **cd** *Client installation directory* + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. Run the following command to authenticate the current user if Kerberos authentication is enabled for the current cluster. The current user must have the permissions to create HBase tables and operate HDFS. + + **kinit** *Component service user* + + Run the following command to set the Hadoop username if Kerberos authentication is not enabled for the current cluster: + + **export HADOOP_USER_NAME=hbase** + +#.
Run the following command to upload the data file **data.csv** in :ref:`1 ` to an HDFS directory, for example, **/tmp**: + + **hdfs dfs -put /opt/test/data.csv /tmp** + +#. Run the Phoenix client command. + + **sqlline.py** + +#. Run the following command to create the **TEST** table: + + **CREATE TABLE TEST ( ID INTEGER NOT NULL PRIMARY KEY, NAME VARCHAR, AGE INTEGER, ADDRESS VARCHAR, GENDER BOOLEAN, A DECIMAL, B DECIMAL ) split on (1, 2, 3,4,5,6,7,8,9);** + + After the table is created, run the **!quit** command to exit the Phoenix CLI. + +#. Run the following import command: + + **hbase org.apache.phoenix.mapreduce.CsvBulkLoadTool -md '**\ *User-defined delimiter*\ **' -t** *Table name* **-i** *Data path* + + For example, to import the **data.csv** file to the **TEST** table, run the following command: + + **hbase org.apache.phoenix.mapreduce.CsvBulkLoadTool -md '\|^[' -t** **TEST** **-i** **/tmp/data.csv** + +#. Run the following command to view data imported to the **TEST** table: + + **sqlline.py** + + **SELECT \* FROM TEST LIMIT 10;** + + |image2| + +.. |image1| image:: /_static/images/en-us_image_0000001532503042.png +.. |image2| image:: /_static/images/en-us_image_0000001583182157.png diff --git a/doc/component-operation-guide-lts/source/using_hbase/in-house_enhanced_phoenix/index.rst b/doc/component-operation-guide-lts/source/using_hbase/in-house_enhanced_phoenix/index.rst new file mode 100644 index 0000000..bee905a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hbase/in-house_enhanced_phoenix/index.rst @@ -0,0 +1,14 @@ +:original_name: mrs_01_24579.html + +.. _mrs_01_24579: + +In-House Enhanced Phoenix +========================= + +- :ref:`CsvBulkloadTool Supports Parsing User-Defined Delimiters in Data Files ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + csvbulkloadtool_supports_parsing_user-defined_delimiters_in_data_files diff --git a/doc/component-operation-guide-lts/source/using_hbase/index.rst b/doc/component-operation-guide-lts/source/using_hbase/index.rst index 72f2e2e..7a1cfa5 100644 --- a/doc/component-operation-guide-lts/source/using_hbase/index.rst +++ b/doc/component-operation-guide-lts/source/using_hbase/index.rst @@ -12,9 +12,10 @@ Using HBase - :ref:`Enabling Cross-Cluster Copy ` - :ref:`Supporting Full-Text Index ` - :ref:`Using the ReplicationSyncUp Tool ` +- :ref:`In-House Enhanced Phoenix ` - :ref:`Configuring HBase DR ` - :ref:`Performing an HBase DR Service Switchover ` -- :ref:`Configuring HBase Data Compression and Encoding ` +- :ref:`Configuring HBase Data Compression and Encoding ` - :ref:`Performing an HBase DR Active/Standby Cluster Switchover ` - :ref:`Community BulkLoad Tool ` - :ref:`Configuring the MOB ` @@ -36,6 +37,7 @@ Using HBase enabling_cross-cluster_copy supporting_full-text_index using_the_replicationsyncup_tool + in-house_enhanced_phoenix/index configuring_hbase_dr performing_an_hbase_dr_service_switchover configuring_hbase_data_compression_and_encoding diff --git a/doc/component-operation-guide-lts/source/using_hdfs/closing_hdfs_files.rst b/doc/component-operation-guide-lts/source/using_hdfs/closing_hdfs_files.rst new file mode 100644 index 0000000..8875701 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hdfs/closing_hdfs_files.rst @@ -0,0 +1,35 @@ +:original_name: mrs_01_24485.html + +.. _mrs_01_24485: + +Closing HDFS Files +================== + +Scenario +-------- + +By default, an HDFS file can be closed only if all blocks are reported (in the **COMPLETED** state). 
Therefore, HDFS write performance is affected by the time spent waiting for DataNodes to report blocks and for the NameNode to process the block reports. For a heavily loaded cluster, this waiting has a significant impact on the cluster. You can configure the **dfs.namenode.file.close.num-committed-allowed** parameter of HDFS to close files in advance to improve data write performance. However, data may fail to be read because the block cannot be found or the data block information recorded in the NameNode metadata is inconsistent with that stored in the DataNode. Therefore, this feature does not apply to the scenario where data is read immediately after being written. Exercise caution when using this feature. + +.. note:: + + This section applies to MRS 3.2.0 or later. + +Procedure +--------- + +#. Log in to FusionInsight Manager. +#. Choose **Cluster** > **Services** > **HDFS**, click the **Configurations** tab, and then click **All Configurations**. +#. Search for and modify the **dfs.namenode.file.close.num-committed-allowed** parameter. For more information, see the following table. + + +-----------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===============================================+========================================================================================================================================================+ + | dfs.namenode.file.close.num-committed-allowed | Maximum number of blocks in the **COMMITTED** state in the file to be closed. | + | | | + | | The default value is 0, indicating that this feature is disabled. If this feature is enabled, the recommended value is **1** or **2**. | + | | | + | | For example, if this parameter is set to **1**, a file can be closed without waiting for the status of the last block to change to **COMPLETED**. | + +-----------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Save the configuration. +#. On the **Instance** page of HDFS, select the active and standby NameNode instances, choose **More** > **Instance Rolling Restart**, and wait until the rolling restart is complete.
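+For reference, the following is a minimal sketch showing how to check the parameter from an HDFS client and how the documented read-after-write caveat can surface. It assumes a node with the HDFS client installed, environment variables sourced, and a valid Kerberos ticket in a security cluster; the file paths are placeholders.
+
+.. code-block:: bash
+
+   # Print the parameter value visible to the client configuration
+   # (assumes the client configuration files have been refreshed after the change).
+   hdfs getconf -confKey dfs.namenode.file.close.num-committed-allowed
+
+   # With the feature enabled, a file may be closed before its last block is COMPLETED,
+   # so reading it back immediately after writing (as below) can transiently fail.
+   hdfs dfs -put /opt/test/sample.txt /tmp/sample.txt
+   hdfs dfs -cat /tmp/sample.txt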
diff --git a/doc/component-operation-guide-lts/source/using_hdfs/index.rst b/doc/component-operation-guide-lts/source/using_hdfs/index.rst index 33c434b..50d85b9 100644 --- a/doc/component-operation-guide-lts/source/using_hdfs/index.rst +++ b/doc/component-operation-guide-lts/source/using_hdfs/index.rst @@ -31,6 +31,7 @@ Using HDFS - :ref:`Configuring HDFS NodeLabel ` - :ref:`Configuring HDFS DiskBalancer ` - :ref:`Performing Concurrent Operations on HDFS Files ` +- :ref:`Closing HDFS Files ` - :ref:`Introduction to HDFS Logs ` - :ref:`HDFS Performance Tuning ` - :ref:`FAQ ` @@ -65,6 +66,7 @@ Using HDFS configuring_hdfs_nodelabel configuring_hdfs_diskbalancer performing_concurrent_operations_on_hdfs_files + closing_hdfs_files introduction_to_hdfs_logs hdfs_performance_tuning/index faq/index diff --git a/doc/component-operation-guide-lts/source/using_hdfs/overview_of_hdfs_file_system_directories.rst b/doc/component-operation-guide-lts/source/using_hdfs/overview_of_hdfs_file_system_directories.rst index 9e964a8..7656d4e 100644 --- a/doc/component-operation-guide-lts/source/using_hdfs/overview_of_hdfs_file_system_directories.rst +++ b/doc/component-operation-guide-lts/source/using_hdfs/overview_of_hdfs_file_system_directories.rst @@ -50,7 +50,7 @@ This section describes the directory structure in HDFS, as shown in the followin +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ | /user/oozie | Fixed directory | Stores dependent libraries required for Oozie running, which needs to be manually uploaded. | No | Failed to schedule Oozie. | +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ - | /user/mapred/hadoop-mapreduce-*3.1.1*.tar.gz | Fixed files | Stores JAR files used by the distributed MR cache. | No | The MR distributed cache function is unavailable. | + | /user/mapred/hadoop-mapreduce-xxx.tar.gz | Fixed files | Stores JAR files used by the distributed MR cache. | No | The MR distributed cache function is unavailable. 
| +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ | /user/hive | Fixed directory | Stores Hive-related data by default, including the depended Spark lib package and default table data storage path. | No | User data is lost. | +----------------------------------------------------+---------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/before_you_start.rst b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/before_you_start.rst index 6c785d2..2f37222 100644 --- a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/before_you_start.rst +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/before_you_start.rst @@ -26,7 +26,7 @@ HetuEngine supports quick joint query of multiple data sources and GUI-based dat +-----------------+---------------+------------------+---------------------------------------------------------------+ | | GaussDB | | GaussDB 200 and GaussDB A 8.0.0 | +-----------------+---------------+------------------+---------------------------------------------------------------+ - | | Hudi | | MRS 3.1.1 or later | + | | Hudi | | MRS 3.1.2 or later | +-----------------+---------------+------------------+---------------------------------------------------------------+ | | ClickHouse | | MRS 3.1.1 or later | +-----------------+---------------+------------------+---------------------------------------------------------------+ @@ -36,7 +36,7 @@ HetuEngine supports quick joint query of multiple data sources and GUI-based dat +-----------------+---------------+------------------+---------------------------------------------------------------+ | | Elasticsearch | | Elasticsearch of the current cluster | +-----------------+---------------+------------------+---------------------------------------------------------------+ - | | Hudi | | MRS 3.1.1 or later | + | | Hudi | | MRS 3.1.2 or later | +-----------------+---------------+------------------+---------------------------------------------------------------+ | | ClickHouse | | MRS 3.1.1 or later | +-----------------+---------------+------------------+---------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_an_iotdb_data_source.rst 
b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_an_iotdb_data_source.rst new file mode 100644 index 0000000..8c1c688 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/configuring_an_iotdb_data_source.rst @@ -0,0 +1,120 @@ +:original_name: mrs_01_24743.html + +.. _mrs_01_24743: + +Configuring an IoTDB Data Source +================================ + +This section applies to MRS 3.2.0 or later. + +Scenario +-------- + +Add an IoTDB JDBC data source on HSConsole of a cluster in security mode. + +Prerequisites +------------- + +- The domain name of the cluster where the data source is located must be different from that of the HetuEngine cluster. +- The cluster where the data source is located and the HetuEngine cluster nodes can communicate with each other. +- A HetuEngine compute instance has been created. +- By default, SSL is enabled for the IoTDB service in a security cluster. After SSL is enabled, you need to upload the **truststore.jks** file. For details about how to obtain the file, see :ref:`Using the IoTDB Client `. + +Procedure +--------- + +#. Log in to FusionInsight Manager as a HetuEngine administrator and choose **Cluster** > **Services** > **HetuEngine**. + +#. On the **Dashboard** tab page that is displayed, find the **Basic Information** area, and click the link next to **HSConsole WebUI**. + +#. Choose **Data Source** and click **Add Data Source**. Configure parameters on the **Add Data Source** page. + + a. In the **Basic Configuration** area, configure **Name** and choose **JDBC** > **IoTDB** for **Data Source Type**. + + b. Configure parameters in the **IoTDB Configuration** area by referring to :ref:`Table 1 `. + + .. _mrs_01_24743__en-us_topic_0000001521279689_table102190549122: + + .. table:: **Table 1** IoTDB configuration parameters + + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Example Value | + +=======================+============================================================================================================================================================+==============================================================================================================================================================+ + | Driver | The default value is **iotdb**. | iotdb | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | JDBC URL | JDBC URL for connecting to IoTDB. 
| jdbc:iotdb://10.10.10.11:22260,10.10.10.12:22260 | + | | | | + | | Format: **jdbc:iotdb://**\ *Service IP address 1 of IoTDBServer*\ **:**\ *Port number*\ **,**\ *Service IP address 2 of IoTDBServer*\ **:**\ *Port number* | | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Username | IoTDB username for connecting to the IoTDB data source | .. note:: | + | | | | + | | | If the cluster where IoTDB resides is in non-security mode, set this parameter to the default IoTDB user **root**. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Password | Password of the IoTDB username for connecting to the IoTDB data source | .. note:: | + | | | | + | | | If the cluster where the IoTDB service is installed is in non-security mode, obtain the password of user **root** from the administrator of this cluster. | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Enable SSL | Whether SSL is enabled for the IoTDB service. SSL is enabled by default in a security cluster. | Yes | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | truststore File | After SSL is enabled for IoTDB, upload the **truststore.jks** file. | ``-`` | + +-----------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + .. note:: + + - Service IP addresses of IoTDBServer: + + Log in to FusionInsight Manager, choose **Cluster** > **Services** > **IoTDB**. On the page that is displayed, click the **Instance** tab. On this tab page, check **Service IP Address** of IoTDBServer. + + - Port number: + + Log in to FusionInsight Manager, choose **Cluster** > **Services** > **IoTDB**. On the page that is displayed, click the **Configurations** tab. On this tab page, search for and check the value of **IOTDB_SERVER_RPC_PORT**. The default value is **22260**. + + c. (Optional) Add custom configurations as needed. + + d. Click **OK**. + +#. 
Log in to the node where the cluster client is located and run the following commands to switch to the client installation directory and authenticate the user: + + **cd /opt/client** + + **source bigdata_env** + + **kinit** *User performing HetuEngine operations* (If the cluster is in normal mode, skip this step.) + +#. Run the following command to log in to the catalog of the data source: + + **hetu-cli --catalog** *Data source name* **--schema** *Database name* + + For example, run the following command: + + **hetu-cli --catalog** **iotdb_1** **--schema root.ln** + +#. Run the following command. If the database table information can be viewed or no error is reported, the connection is successful. + + **show tables;** + +Data Type Mapping +----------------- + +=============== ==================== +IoTDB Data Type HetuEngine Data Type +=============== ==================== +BOOLEAN         BOOLEAN +INT32           BIGINT +INT64           BIGINT +FLOAT           DOUBLE +DOUBLE          DOUBLE +TEXT            VARCHAR +=============== ==================== + +Function Enhancement +-------------------- + +- IoTDB allows you to configure any label fields for time series. These IoTDB label fields can be jointly queried with other data sources through HetuEngine. +- Any node stored in an IoTDB time series can be used as a table for data query on HetuEngine. + +Constraints +----------- + +- IoTDB data cannot be created but can be queried. +- The IoTDB user who uses HetuEngine for query must at least be configured with the read permission on the root directory. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/index.rst index eff1440..d315f8b 100644 --- a/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/index.rst +++ b/doc/component-operation-guide-lts/source/using_hetuengine/configuring_data_sources/index.rst @@ -11,6 +11,7 @@ Configuring Data Sources - :ref:`Configuring a GaussDB Data Source ` - :ref:`Configuring a HetuEngine Data Source ` - :ref:`Configuring a ClickHouse Data Source ` +- :ref:`Configuring an IoTDB Data Source ` .. toctree:: :maxdepth: 1 @@ -22,3 +23,4 @@ Configuring Data Sources configuring_a_gaussdb_data_source configuring_a_hetuengine_data_source configuring_a_clickhouse_data_source + configuring_an_iotdb_data_source diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/configuring_permissions_for_tables_columns_and_databases.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/configuring_permissions_for_tables_columns_and_databases.rst index cd3354c..b1d1c64 100644 --- a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/configuring_permissions_for_tables_columns_and_databases.rst +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/configuring_permissions_for_tables_columns_and_databases.rst @@ -27,38 +27,34 @@ Concepts ..
table:: **Table 1** Using HetuEngine tables, columns, or data - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | Scenario | Required Permission | - +===================================+=============================================================================================================================================+ - | DESCRIBE TABLE | Select | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | ANALYZE TABLE | Select and Insert | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | SHOW COLUMNS | Select | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | SHOW TABLE STATUS | Select | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | SHOW TABLE PROPERTIES | Select | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | SELECT | Select | - | | | - | | .. note:: | - | | | - | | To perform the SELECT operation on a view, you must have the **Select** permission on the view and the tables corresponding to the view. | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | EXPLAIN | Select | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | CREATE VIEW | Select, Grant Of Select, and Create | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | CREATE TABLE | Create | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | ALTER TABLE ADD PARTITION | Insert | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | INSERT | Insert | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | INSERT OVERWRITE | Insert and Delete | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | ALTER TABLE DROP PARTITION | The table-level Alter and Delete, and column-level Select permissions need to be granted. 
| - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ - | ALTER DATABASE | Hive Admin Privilege | - +-----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + +----------------------------+-------------------------------------------------------------------------------------------+ + | Scenario | Required Permission | + +============================+===========================================================================================+ + | DESCRIBE TABLE | Select | + +----------------------------+-------------------------------------------------------------------------------------------+ + | ANALYZE TABLE | Select and Insert | + +----------------------------+-------------------------------------------------------------------------------------------+ + | SHOW COLUMNS | Select | + +----------------------------+-------------------------------------------------------------------------------------------+ + | SHOW TABLE STATUS | Select | + +----------------------------+-------------------------------------------------------------------------------------------+ + | SHOW TABLE PROPERTIES | Select | + +----------------------------+-------------------------------------------------------------------------------------------+ + | SELECT | Select | + +----------------------------+-------------------------------------------------------------------------------------------+ + | EXPLAIN | Select | + +----------------------------+-------------------------------------------------------------------------------------------+ + | CREATE VIEW | Select, Grant Of Select, and Create | + +----------------------------+-------------------------------------------------------------------------------------------+ + | CREATE TABLE | Create | + +----------------------------+-------------------------------------------------------------------------------------------+ + | ALTER TABLE ADD PARTITION | Insert | + +----------------------------+-------------------------------------------------------------------------------------------+ + | INSERT | Insert | + +----------------------------+-------------------------------------------------------------------------------------------+ + | INSERT OVERWRITE | Insert and Delete | + +----------------------------+-------------------------------------------------------------------------------------------+ + | ALTER TABLE DROP PARTITION | The table-level Alter and Delete, and column-level Select permissions need to be granted. 
| + +----------------------------+-------------------------------------------------------------------------------------------+ + | ALTER DATABASE | Hive Admin Privilege | + +----------------------------+-------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/index.rst index 1e48dce..a0aa9a9 100644 --- a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/index.rst +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/index.rst @@ -5,7 +5,7 @@ HetuEngine MetaStore-based Permission Control ============================================= -- :ref:`Overview ` +- :ref:`MetaStore Permission Overview ` - :ref:`Creating a HetuEngine Role ` - :ref:`Configuring Permissions for Tables, Columns, and Databases ` @@ -13,6 +13,6 @@ HetuEngine MetaStore-based Permission Control :maxdepth: 1 :hidden: - overview + metastore_permission_overview creating_a_hetuengine_role configuring_permissions_for_tables_columns_and_databases diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/overview.rst b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/metastore_permission_overview.rst similarity index 99% rename from doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/overview.rst rename to doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/metastore_permission_overview.rst index 28e7245..3465b53 100644 --- a/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/overview.rst +++ b/doc/component-operation-guide-lts/source/using_hetuengine/hetuengine_permission_management/hetuengine_metastore-based_permission_control/metastore_permission_overview.rst @@ -2,8 +2,8 @@ .. _mrs_01_1725: -Overview -======== +MetaStore Permission Overview +============================= Constraints: This parameter applies only to the Hive data source. 
diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/index.rst index 95bea5f..1ec4097 100644 --- a/doc/component-operation-guide-lts/source/using_hetuengine/index.rst +++ b/doc/component-operation-guide-lts/source/using_hetuengine/index.rst @@ -14,6 +14,8 @@ Using HetuEngine - :ref:`Using the HetuEngine Client ` - :ref:`Using the HetuEngine Cross-Source Function ` - :ref:`Using HetuEngine Cross-Domain Function ` +- :ref:`Using HetuEngine Materialized Views ` +- :ref:`Using HetuEngine SQL Diagnosis ` - :ref:`Using a Third-Party Visualization Tool to Access HetuEngine ` - :ref:`Function & UDF Development and Application ` - :ref:`Introduction to HetuEngine Logs ` @@ -33,6 +35,8 @@ Using HetuEngine using_the_hetuengine_client using_the_hetuengine_cross-source_function/index using_hetuengine_cross-domain_function/index + using_hetuengine_materialized_views/index + using_hetuengine_sql_diagnosis using_a_third-party_visualization_tool_to_access_hetuengine/index function_and_udf_development_and_application/index introduction_to_hetuengine_logs diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/index.rst index 2e848aa..4f51f3d 100644 --- a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/index.rst +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/index.rst @@ -8,7 +8,6 @@ Using a Third-Party Visualization Tool to Access HetuEngine - :ref:`Usage Instruction ` - :ref:`Using DBeaver to Access HetuEngine ` - :ref:`Using Tableau to Access HetuEngine ` -- :ref:`Using PowerBI to Access HetuEngine ` - :ref:`Using Yonghong BI to Access HetuEngine ` .. toctree:: @@ -18,5 +17,4 @@ Using a Third-Party Visualization Tool to Access HetuEngine usage_instruction using_dbeaver_to_access_hetuengine using_tableau_to_access_hetuengine - using_powerbi_to_access_hetuengine using_yonghong_bi_to_access_hetuengine diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_dbeaver_to_access_hetuengine.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_dbeaver_to_access_hetuengine.rst index 8047d6a..8907aaa 100644 --- a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_dbeaver_to_access_hetuengine.rst +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_dbeaver_to_access_hetuengine.rst @@ -5,237 +5,130 @@ Using DBeaver to Access HetuEngine ================================== -This section uses DBeaver 6.3.5 as an example to describe how to perform operations on HetuEngine. +Use DBeaver 7.2.0 as an example to describe how to access HetuEngine. + +.. _mrs_01_2337__section10331142641215: Prerequisites ------------- -- The DBeaver has been installed properly. Download the DBeaver software from https://dbeaver.io/files/6.3.5/. - - .. note:: - - Currently, DBeaver 5.\ *x* and 6.\ *x* are supported. - -- A human-machine user has been created in the cluster. 
For details about how to create a user, see :ref:`Creating a HetuEngine User `. +- The DBeaver has been installed properly. Download the DBeaver software from https://dbeaver.io/files/7.2.0/. +- A human-machine user, for example, **hetu_user**, has been created in the cluster. For details, see :ref:`Creating a HetuEngine User `. For clusters with Ranger authentication enabled, the Ranger permission must be added to user **hetu_user** based on service requirements. For details, see :ref:`Adding a Ranger Access Permission Policy for HetuEngine `. +- A compute instance has been created and is running properly. For details, see :ref:`Creating HetuEngine Compute Instances `. Procedure --------- -Method 1: Using ZooKeeper to access HetuEngine +#. .. _mrs_01_2337__li599475416716: -#. .. _mrs_01_2337__en-us_topic_0000001219029577_li1747527125: - - Download the HetuEngine client. + Download the HetuEngine client to obtain the JDBC JAR package. a. Log in to FusionInsight Manager. - b. Choose **Cluster** > **Services** > **HetuEngine** > **Dashboard**. - c. In the upper right corner of the page, choose **More** > **Download Client** and download the **Complete Client** to the local PC as prompted. - - d. .. _mrs_01_2337__en-us_topic_0000001219029577_li1727232161619: - - Decompress the HetuEngine client package **FusionInsight_Cluster\_**\ *Cluster ID*\ **\_ HetuEngine\_Client.tar** to obtain the JDBC file and save it to a local directory, for example, **D:\\test**. + d. Decompress the HetuEngine client package **FusionInsight_Cluster\_**\ *Cluster ID*\ **\_ HetuEngine\_Client.tar** to obtain the JDBC file and save it to a local directory, for example, **D:\\test**. .. note:: Obtaining the JDBC file: - Obtain the **hetu-jdbc-*.jar** file from the **FusionInsight_Cluster\_**\ *Cluster ID*\ **\_HetuEngine\_ClientConfig\\HetuEngine\\xxx\\** directory. + Decompress the package in the **FusionInsight_Cluster\_**\ *Cluster ID*\ **\_HetuEngine\_ClientConfig\\HetuEngine\\xxx\\** directory to obtain the **hetu-jdbc-*.jar** file. Note: **xxx** can be **arm** or **x86**. -#. Download the Kerberos authentication file of the HetuEngine user. +#. Add the host mapping to the local **hosts** file. - a. Log in to FusionInsight Manager. - b. Choose **System** > **Permission** > **User**. - c. Locate the row that contains the target HetuEngine user, click **More** in the **Operation** column, and select **Download Authentication Credential**. - d. Decompress the downloaded package to obtain the **user.keytab** and **krb5.conf** files. + Add the mapping of the host where the instance is located in the HSFabric or HSBroker mode. The format is *Host IP address* *Host name*. -#. Log in to the node where the HSBroker role is deployed in the cluster as user **omm**, go to the **${BIGDATA_HOME}/FusionInsight_Hetu\_8.1.2.2/xxx\ \_HSBroker/etc/** directory, and download the **jaas-zk.conf** and **hetuserver.jks** files to the local PC. - - .. note:: - - The version 8.1.2.2 is used as an example. Replace it with the actual version number. - - Modify the **jaas-zk.conf** file as follows. **keyTab** is the keytab file path of the user who accesses HetuEngine, and **principal** is *Username for accessing HetuEngine*\ **@Domain name in uppercase.COM**. - - .. code-block:: - - Client { - com.sun.security.auth.module.Krb5LoginModule required - useKeyTab=true - keyTab="D:\\tmp\\user.keytab" - principal="admintest@HADOOP.COM" - useTicketCache=false - storeKey=true - debug=true; - }; - -#. Add the host mapping to the local **hosts** file. 
The content format is as follows: - - *Host IP address Host name* - - Example: 192.168.23.221 192-168-23-221 + Example: **192.168.42.90 server-2110081635-0001** .. note:: The local **hosts** file in a Windows environment is stored in, for example, **C:\\Windows\\System32\\drivers\\etc**. -#. Configure the DBeaver startup file **dbeaver.ini**. - - a. Add the Java path to the file. - - .. code-block:: - - -VM - C:\Program Files\Java\jdk1.8.0_131\bin - - b. Set the ZooKeeper and Kerberos parameters by referring to the following parameters. Replace the file paths with the actual paths. - - .. code-block:: - - -Dsun.security.krb5.debug=true - -Djava.security.auth.login.config=D:\tmp\jaas-zk.conf - -Dzookeeper.sasl.clientconfig=Client - -Dzookeeper.auth.type=kerberos - -Djava.security.krb5.conf=D:\tmp\krb5.conf - - .. note:: - - - The Greenwich Mean Time (GMT) is not supported. If the current time zone is GMT+, add **-Duser.timezone=UTC** to the **dbeaver.ini** file to change the time zone to UTC. - - If DBeaver is started, restart the DBeaver software for the new configuration items in the **dbeaver.ini** file to take effect. - -#. Start the DBeaver, right-click **Database Navigator**, and click **Create New Connection**. - -#. Search for **Presto** in the search box and double-click the Presto icon. - -#. Click **Edit Driver Settings**. - -#. Set **Class Name** to **io.prestosql.jdbc.PrestoDriver**. - -#. Enter the URL of HetuEngine in the **URL Template** text box. - - URL format: jdbc:presto://*IP address of node 1 where the ZooKeeper service resides*:2181,\ *IP address of node 2 where the ZooKeeper service resides*:2181,\ *IP address of node 3 where the ZooKeeper service resides*:2181/hive/default?serviceDiscoveryMode=zooKeeper&zooKeeperNamespace=hsbroker&zooKeeperServerPrincipal=zookeeper/hadoop.hadoop.com - - Example: **jdbc:presto://192.168.8.37:**\ 2181\ **,192.168.8.38:**\ 2181\ **,192.168.8.39:**\ 2181\ **/hive/default?serviceDiscoveryMode=zooKeeper&zooKeeperNamespace=hsbroker&zooKeeperServerPrincipal=zookeeper/hadoop.hadoop.com** - -#. Click **Add File** and select the obtained JDBC file obtained in :ref:`1.d `. - -#. Click **Connection properties**. On the **Connection properties** tab page, right-click and select **Add new property**. Set parameters by referring to :ref:`Table 1 `. - - .. _mrs_01_2337__en-us_topic_0000001219029577_table1173517153344: - - .. table:: **Table 1** Property information - - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ - | Parameter | Example Value | - +===================================+===========================================================================================================================================+ - | KerberosPrincipal | zhangsan | - | | | - | | .. note:: | - | | | - | | Human-machine user created in the cluster. For details, see :ref:`Creating a HetuEngine User `. | - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ - | KerberosKeytabPath | D:\\\\user.keytab | - | | | - | | .. note:: | - | | | - | | You need to configure this parameter when using the keytab mode for access. 
| - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ - | KerberosRemoteServiceName | HTTP | - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ - | SSL | true | - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ - | deploymentMode | on_yarn | - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ - | tenant | default | - | | | - | | .. note:: | - | | | - | | The tenant to which the user belongs needs to be configured in the cluster. | - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ - | user | zhangsan | - | | | - | | .. note:: | - | | | - | | Human-machine user created in the cluster. For details, see :ref:`Creating a HetuEngine User `. | - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ - | password | zhangsan@##65331853 | - | | | - | | .. note:: | - | | | - | | - Password set when a human-machine user is created in the cluster. For details, see :ref:`Creating a HetuEngine User `. | - | | - You need to configure this parameter when using username and password for access. | - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ - | KerberosConfigPath | D:\\\\krb5.conf | - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ - | SSLTrustStorePath | D:\\\\hetuserver.jks | - +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+ - - :ref:`Figure 1 ` shows an example of the parameter settings. - - .. _mrs_01_2337__en-us_topic_0000001219029577_fig16912205184112: - - .. figure:: /_static/images/en-us_image_0000001438431645.png - :alt: **Figure 1** Example of parameter settings - - **Figure 1** Example of parameter settings - -#. Click **OK**. - -#. Click **Finish**. The HetuEngine is successfully connected. - - .. note:: - - If a message is displayed indicating that you do not have the permission to view the table, configure the permission by referring to :ref:`Configuring Permissions for Tables, Columns, and Databases `. - -Method 2: Using HSBroker to access HetuEngine - -#. .. _mrs_01_2337__en-us_topic_0000001219029577_li29221671357: - - Obtain the JDBC JAR file by referring to :ref:`1 `. - -#. Open DBeaver, choose **Database** > **New Database Connection**, search for PrestoSQL, and open it. +#. Open DBeaver, choose **Database** > **New Database Connection**, search for **PrestoSQL** in **ALL**, and open PrestoSQL. #. Click **Edit Driver Settings** and set parameters by referring to the following table. - .. table:: **Table 2** Driver settings + .. 
table:: **Table 1** Driver settings - +-----------------------+--------------------------------+----------------------------------------------------------------------------------------------------------------------------+ - | Parameter | Value | Remarks | - +=======================+================================+============================================================================================================================+ - | Class Name | io.prestosql.jdbc.PrestoDriver | / | - +-----------------------+--------------------------------+----------------------------------------------------------------------------------------------------------------------------+ - | URL Template | URL of HetuEngine | URL format: | - | | | | - | | | jdbc:presto://<*HSBrokerIP1:port1*>,<*HSBrokerIP2:port2*>,<*HSBrokerIP3:port3*>/hive/default?serviceDiscoveryMode=hsbroker | - +-----------------------+--------------------------------+----------------------------------------------------------------------------------------------------------------------------+ + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Value | + +===================================+===============================================================================================================================+ + | Class Name | io.prestosql.jdbc.PrestoDriver | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | URL Template | - Accessing HetuEngine using HSFabric | + | | | + | | jdbc:presto://<*HSFabricIP1:port1*>,<*HSFabricIP2:port2*>,<*HSFabricIP3:port3*>/hive/default?serviceDiscoveryMode=hsfabric | + | | | + | | Example: | + | | | + | | jdbc:presto://192.168.42.90:29902,192.168.42.91:29902,192.168.42.92:29902/hive/default?serviceDiscoveryMode=hsfabric | + | | | + | | - Accessing HetuEngine using HSBroker | + | | | + | | jdbc:presto://<*HSBrokerIP1:port1*>,<*HSBrokerIP2:port2*>,<*HSBrokerIP3:port3*>/hive/default?serviceDiscoveryMode=hsbroker | + | | | + | | Example: | + | | | + | | jdbc:presto://192.168.42.90:29860,192.168.42.91:29860,192.168.42.92:29860/hive/default?serviceDiscoveryMode=hsbroker | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ -#. Click **Add File** and upload the JDBC driver package obtained in :ref:`1 `. + .. note:: -#. Click **Find Class**. The driver class is automatically obtained. Click **OK** to complete the driver setting, as shown in :ref:`Figure 2 `. + - To obtain the IP addresses and port numbers of the HSFabric and HSBroker nodes, perform the following operations: - .. _mrs_01_2337__en-us_topic_0000001219029577_fig7280201602711: + a. Log in to FusionInsight Manager. - .. figure:: /_static/images/en-us_image_0000001441091233.png - :alt: **Figure 2** Driver settings + b. Choose **Cluster** > **Services** > **HetuEngine**. Click the **Instance** tab to obtain the service IP addresses of all HSFabric or HSBroker instances. You can select one or more normal instances for connection. - **Figure 2** Driver settings + c. To obtain the port numbers, choose **Cluster** > **Services** > **HetuEngine**. Click **Configurations** then **All Configurations**. -#. 
On the **Main** tab page for creating a connection, enter the user name and password, and click **Test Connection**. After the connection is successful, click **OK**, and then click **Finish**. + Search for **gateway.port** to obtain the HSFabric port number. The default port number is **29902** in security mode and **29903** in normal mode. + + Search for **server.port** to obtain the HSBroker port number. The default port number is **29860** in security mode and **29861** in normal mode. + + - If the connection fails, disable the proxy and try again. + +#. Click **Add File** and upload the JDBC driver package obtained in :ref:`1 `. + +#. Click **Find Class**. The driver class is automatically obtained. Click **OK** to complete the driver setting. If **io.prestosql:presto-jdbc:RELEASE** exists in **Libraries**, delete it before clicking **Find Class**. - .. figure:: /_static/images/en-us_image_0000001349259429.png - :alt: **Figure 3** Creating a connection + .. figure:: /_static/images/en-us_image_0000001584317997.png + :alt: **Figure 1** Configuring the driver in security mode - **Figure 3** Creating a connection + **Figure 1** Configuring the driver in security mode -#. After the connection is successful, the page shown in :ref:`Figure 4 ` is displayed. +#. Configure the connection. - .. _mrs_01_2337__en-us_topic_0000001219029577_fig18372036443: + - Security mode (clusters with Kerberos authentication enabled): - .. figure:: /_static/images/en-us_image_0000001441208981.png + On the **Main** tab page for creating a connection, enter the user name and password created in :ref:`Prerequisites `, and click **Test Connection**. After the connection is successful, click **OK** then **Finish**. You can click **Connection details (name, type, ... )** to change the connection name. + + + .. figure:: /_static/images/en-us_image_0000001533678044.png + :alt: **Figure 2** Configuring parameters on the Main tab in security mode + + **Figure 2** Configuring parameters on the Main tab in security mode + + - Normal mode (clusters with Kerberos authentication disabled): + + On the **Main** tab page for creating a connection, set JDBC URL and leave Password blank. + + On the page for creating a connection, configure the parameters on the **Driver properties** tab. Set **user** to the user created in :ref:`Prerequisites `. Click **Test Connection**. After the connection is successful, click **OK** then **Finish**. You can click **Connection details (name, type, ... )** to change the connection name. + + + .. figure:: /_static/images/en-us_image_0000001533198872.png + :alt: **Figure 3** Configuring parameters on the Driver properties tab in normal mode + + **Figure 3** Configuring parameters on the Driver properties tab in normal mode + +#. After the connection is successful, the page shown in :ref:`Figure 4 ` is displayed. + + .. _mrs_01_2337__fig296125555813: + + .. 
figure:: /_static/images/en-us_image_0000001533358396.png :alt: **Figure 4** Successful connection **Figure 4** Successful connection diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_powerbi_to_access_hetuengine.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_powerbi_to_access_hetuengine.rst deleted file mode 100644 index 38b80ef..0000000 --- a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_powerbi_to_access_hetuengine.rst +++ /dev/null @@ -1,136 +0,0 @@ -:original_name: mrs_01_24012.html - -.. _mrs_01_24012: - -Using PowerBI to Access HetuEngine -================================== - -Prerequisites -------------- - -- PowerBI has been installed. -- The JDBC JAR file has been obtained. For details, see :ref:`1 `. -- A human-machine user has been created in the cluster. For details about how to create a user, see :ref:`Creating a HetuEngine User `. - -Procedure ---------- - -#. Use the default configuration to install **hetu-odbc-win64.msi**. Download link: `https://openlookeng.io/download.html `__. - - - .. figure:: /_static/images/en-us_image_0000001349259001.png - :alt: **Figure 1** Downloading the driver - - **Figure 1** Downloading the driver - -#. Configure data source driver. - - a. Run the following commands in the local command prompt to stop the ODBC service that is automatically started. - - **cd C:\\Program Files\\openLooKeng\\openLooKeng ODBC Driver 64-bit\\odbc_gateway\\mycat\\bin** - - **mycat.bat stop** - - |image1| - - b. Replace the JDBC driver. - - Copy the JDBC JAR file obtained in :ref:`1 ` to the **C:\\Program Files\\openLooKeng\\openLooKeng ODBC Driver 64-bit\\odbc_gateway\\mycat\\lib** directory and delete the original **hetu-jdbc-1.0.1.jar** file from the directory. - - c. Edit the protocol prefix of the ODBC **server.xml** file. - - Change the property value of **server.xml** in the **C:\\Program Files\\openLooKeng\\openLooKeng ODBC Driver 64-bit\\odbc_gateway\\mycat\\conf** directory from **jdbc:lk://** to - - **jdbc:presto://**. - - d. .. _mrs_01_24012__en-us_topic_0000001173470764_li13423101229: - - Configure the connection mode of using the user name and password. - - Create a **jdbc_param.properties** file in a user-defined path, for example, **C:\\hetu**, and add the following content to the file: - - .. code-block:: - - user=admintest - password=admintest@##65331853 - - .. note:: - - **user**: indicates the username of the created human-machine user, for example, **admintest**. - - **password**: indicates the password of the created human-machine user, for example, **admintest@##65331853**. - - e. Run the following commands to restart the ODBC service: - - **cd C:\\Program Files\\openLooKeng\\openLooKeng ODBC Driver 64-bit\\odbc_gateway\\mycat\\bin** - - **mycat.bat restart** - - .. note:: - - The ODBC service must be stopped each time the configuration is modified. After the modification is complete, restart the ODBC service. - -#. On the Windows **Control Panel**, enter **odbc** to search for the ODBC management program. - - |image2| - -#. Choose **Add** > **openLookeng ODBC 1.1 Driver** > **Finish**. - - |image3| - -#. Enter the name and description as shown in the following figure and click **Next**. - - |image4| - -#. Configure parameters by referring to the following figure. 
Obtain **,,/hive/default?serviceDiscoveryMode=hsbroker** for **Connect URL** by referring to :ref:`2 `. Select the **jdbc_param.properties** file prepared in :ref:`2.d ` for **Connect Config**. Set **User name** to the user name that is used to download the credential. - - |image5| - -#. Click **Test DSN** to test the connection. If the connection is successful and both **Catalog** and **Schema** contain content, the connection is successful. Click **Next**. - - |image6| - - |image7| - -#. Click **Finish**. - - |image8| - -#. To use PowerBI for interconnection, choose **Get data** > **All** > **ODBC** > **Connect**. - - |image9| - -#. Select the data source to be added and click **OK**. - - - .. figure:: /_static/images/en-us_image_0000001349259005.png - :alt: **Figure 2** Adding a data source - - **Figure 2** Adding a data source - -#. (Optional) Enter **User name** and **Password** of the user who downloads the credential, and click **Connect**. - - - .. figure:: /_static/images/en-us_image_0000001296219336.png - :alt: **Figure 3** Entering the database username and password - - **Figure 3** Entering the database username and password - -#. After the connection is successful, all table information is displayed, as shown in :ref:`Figure 4 `. - - .. _mrs_01_24012__en-us_topic_0000001173470764_fig5250802327: - - .. figure:: /_static/images/en-us_image_0000001349059549.png - :alt: **Figure 4** Successful connection - - **Figure 4** Successful connection - -.. |image1| image:: /_static/images/en-us_image_0000001295899864.png -.. |image2| image:: /_static/images/en-us_image_0000001348739725.png -.. |image3| image:: /_static/images/en-us_image_0000001349139417.png -.. |image4| image:: /_static/images/en-us_image_0000001296059704.png -.. |image5| image:: /_static/images/en-us_image_0000001295739900.png -.. |image6| image:: /_static/images/en-us_image_0000001296219332.png -.. |image7| image:: /_static/images/en-us_image_0000001295899860.png -.. |image8| image:: /_static/images/en-us_image_0000001349139413.png -.. |image9| image:: /_static/images/en-us_image_0000001295739896.png diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_tableau_to_access_hetuengine.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_tableau_to_access_hetuengine.rst index 61f89b3..4b2abe2 100644 --- a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_tableau_to_access_hetuengine.rst +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_tableau_to_access_hetuengine.rst @@ -9,7 +9,7 @@ Prerequisites ------------- - Tableau has been installed. -- The JDBC JAR file has been obtained. For details, see :ref:`1 `. +- The JDBC JAR file has been obtained. For details, see :ref:`1 `. - A human-machine user has been created in the cluster. For details about how to create a user, see :ref:`Creating a HetuEngine User `. 
Procedure diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_yonghong_bi_to_access_hetuengine.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_yonghong_bi_to_access_hetuengine.rst index d3cb21b..600e72d 100644 --- a/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_yonghong_bi_to_access_hetuengine.rst +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_a_third-party_visualization_tool_to_access_hetuengine/using_yonghong_bi_to_access_hetuengine.rst @@ -9,7 +9,7 @@ Prerequisites ------------- - Yonghong BI has been installed. -- The JDBC JAR file has been obtained. For details, see :ref:`1 `. +- The JDBC JAR file has been obtained. For details, see :ref:`1 `. - A human-machine user has been created in the cluster. For details about how to create a user, see :ref:`Creating a HetuEngine User `. Procedure diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/index.rst index ebd3047..6914c4a 100644 --- a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/index.rst +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/index.rst @@ -5,7 +5,7 @@ Using HetuEngine Cross-Domain Function ====================================== -- :ref:`Introduction to HetuEngine Cross-Source Function ` +- :ref:`Introduction to HetuEngine Cross-Domain Function ` - :ref:`HetuEngine Cross-Domain Function Usage ` - :ref:`HetuEngine Cross-Domain Rate Limit Function ` @@ -13,6 +13,6 @@ Using HetuEngine Cross-Domain Function :maxdepth: 1 :hidden: - introduction_to_hetuengine_cross-source_function + introduction_to_hetuengine_cross-domain_function hetuengine_cross-domain_function_usage hetuengine_cross-domain_rate_limit_function diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/introduction_to_hetuengine_cross-source_function.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/introduction_to_hetuengine_cross-domain_function.rst similarity index 97% rename from doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/introduction_to_hetuengine_cross-source_function.rst rename to doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/introduction_to_hetuengine_cross-domain_function.rst index 960e3e7..5f3720d 100644 --- a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/introduction_to_hetuengine_cross-source_function.rst +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_cross-domain_function/introduction_to_hetuengine_cross-domain_function.rst @@ -2,7 +2,7 @@ .. 
_mrs_01_2334: -Introduction to HetuEngine Cross-Source Function +Introduction to HetuEngine Cross-Domain Function ================================================ HetuEngine provides unified standard SQL to implement efficient access to multiple data sources distributed in multiple regions (or data centers), shields differences in data structure, storage, and region, and decouples data from applications. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_a_hetuengine_maintenance_instance.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_a_hetuengine_maintenance_instance.rst new file mode 100644 index 0000000..b29fe31 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_a_hetuengine_maintenance_instance.rst @@ -0,0 +1,29 @@ +:original_name: mrs_01_24535.html + +.. _mrs_01_24535: + +Configuring a HetuEngine Maintenance Instance +============================================= + +Scenario +-------- + +A maintenance instance is a special compute instance that performs automatic tasks. Maintenance instances are used to automatically refresh, create, and delete materialized views. + +Only one compute instance can be set as a maintenance instance, and the maintenance instance can also carry its original computing services at the same time. + +Prerequisites +------------- + +- You have created a user for accessing the HetuEngine web UI. For details, see :ref:`Creating a HetuEngine User `. +- The compute instance to be configured must be in the stopped state. + +Procedure +--------- + +#. Log in to FusionInsight Manager as a user who can access the HetuEngine web UI. +#. Choose **Cluster** > **Services** > **HetuEngine** to go to its service page. +#. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. +#. Locate the row that contains the target instance, and click **Configure** in the **Operation** column. +#. Check whether **Maintenance Instance** in **Advanced Configuration** is set to **ON**. If not, change the value to **ON**. +#. After the modification is complete, select **Start Now** and click **OK**. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_caching_of_materialized_views.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_caching_of_materialized_views.rst new file mode 100644 index 0000000..a856e30 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_caching_of_materialized_views.rst @@ -0,0 +1,46 @@ +:original_name: mrs_01_24544.html + +.. _mrs_01_24544: + +Configuring Caching of Materialized Views +========================================= + +After a materialized view is created for an SQL statement, the SQL statement is rewritten to query the materialized view when it is executed. If the rewrite cache function is enabled for materialized views, the rewritten SQL statements will be saved to the cache (a maximum of 10,000 records can be saved by default) after the SQL statement has been executed multiple times.
When the SQL statement is executed within the cache validity period (24 hours by default), the system obtains the rewritten SQL statement from the cache instead of rewriting the SQL statement. + +You can add user-defined parameters **rewrite.cache.timeout** and **rewrite.cache.limit** to a compute instance to set the cache validity period and the maximum number of rewritten SQL statements that can be saved. + +- When a new materialized view is created or an existing materialized view is deleted, the cache becomes invalid. +- If the materialized view associated with a rewritten SQL query in the cache becomes invalid or is in the **Refreshing** status, the rewritten SQL query will not be used. +- When the cache is used, the executed SQL query cannot be changed. Otherwise, it will be treated as a new SQL query. +- A maximum of 500 materialized views can be rewritten for SQL queries. That is, if the materialized views used during SQL rewriting are included in the 500 materialized views, the views will be rewritten. Otherwise, the views will be executed as common SQL statements. You can refer to :ref:`System level ` to add user-defined parameter **hetu.select.top.materialized.view** to compute instances to change the number of materialized views that can be used. + +Enabling Rewrite Cache for Materialized Views +--------------------------------------------- + +- Session level: + + Run the **set session rewrite_cache_enabled=true** command on the HetuEngine client by referring to :ref:`Using the HetuEngine Client `. + +- .. _mrs_01_24544__en-us_topic_0000001470000412_li2891647173015: + + Enabling the materialized view rewriting capability at the system level: + + #. Log in to FusionInsight Manager as a user who can access the HetuEngine web UI. + #. Choose **Cluster** > **Services** > **HetuEngine** to go its service page. + #. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. + #. Check whether the status of the instance to be operated is **STOPPED**. If not, change the status to **STOPPED**. + #. Locate the row that contains the target instance, click **Configure** in the **Operation** column, and add the following customized parameters: + + +-----------------------+-----------------+-------------------------------+--------------------------------------------------------------------------------------+ + | Parameter | Value | Parameter File | Description | + +=======================+=================+===============================+======================================================================================+ + | rewrite.cache.enabled | true | coordinator.config.properties | Enable the rewrite cache function for materialized views. | + +-----------------------+-----------------+-------------------------------+--------------------------------------------------------------------------------------+ + | rewrite.cache.timeout | 86400000 | coordinator.config.properties | - Change the validity period of the rewrite cache. | + | | | | - If this parameter is left blank, **86400000** is used by default. The unit is ms. | + +-----------------------+-----------------+-------------------------------+--------------------------------------------------------------------------------------+ + | rewrite.cache.limit | 10000 | coordinator.config.properties | - Modify the upper limit of the rewrite cache. | + | | | | - If this parameter is left blank, **10000** is used by default. 
| + +-----------------------+-----------------+-------------------------------+--------------------------------------------------------------------------------------+ + + #. After the parameters are added, select **Start Now** and click **OK**. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_intelligent_materialized_views.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_intelligent_materialized_views.rst new file mode 100644 index 0000000..78d41c8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_intelligent_materialized_views.rst @@ -0,0 +1,46 @@ +:original_name: mrs_01_24798.html + +.. _mrs_01_24798: + +Configuring Intelligent Materialized Views +========================================== + +Overview +-------- + +HetuEngine intelligent materialized views provide intelligent precalculation and cache acceleration. The HetuEngine QAS role can automatically extract historical SQL statements for analysis and learning, and automatically generate candidate SQL statements for high-value materialized views based on the revenue maximization principle. In practice, HetuEngine administrators can enable automatic creation and refresh of materialized views by configuring maintenance instances. Service users can configure client sessions to implement automatic rewriting and acceleration based on automatically created materialized views. + +This capability significantly simplifies the use of materialized views and accelerates analysis without interrupting services. HetuEngine administrators can intelligently accelerate high-frequency SQL services by using a small amount of compute and storage resources. In addition, this capability reduces the overall load (such as CPU, memory, and I/O) of the data platform and improves system stability. + +The intelligent materialized view provides the following functions: + +- Automatic recommendation of materialized views +- Automatic creation of materialized views +- Automatic refresh of materialized views +- Automatic deletion of materialized views + +Prerequisites +------------- + +The cluster is running properly and at least one QAS instance has been installed. + +Application Process +------------------- + + +.. figure:: /_static/images/en-us_image_0000001533639950.png + :alt: **Figure 1** Application process of HetuEngine intelligent materialized views + + **Figure 1** Application process of HetuEngine intelligent materialized views + +.. 
table:: **Table 1** Process description + + +-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+ + | Step | Description | Reference | + +=======================================================+==================================================================================================================================================================================================================================================================================================================================================================================================================================+===================================================================================================================+ + | Enable the materialized view recommendation function. | After this function is enabled, QAS instances automatically recommend SQL statements of high-value materialized views based on users' SQL execution records. You can view the recommended materialized view statements on the materialized view recommendation page on the HSConsole. For details, see :ref:`Viewing Materialized View Recommendation Results `. | :ref:`Enabling Materialized View Recommendation ` | + +-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+ + | Configure a maintenance instance. | After a compute instance is set as a maintenance instance, the maintenance instance automatically creates, refreshes, and deletes the materialized view SQL statements recommended by the materialized view recommendation function. You can view the generated automatic task records on the HetuEngine automation task page. For details, see :ref:`Viewing Automatic Tasks of Materialized Views `. | :ref:`Configuring a HetuEngine Maintenance Instance ` | + +-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+ + | Enable rewriting of materialized views. 
| After rewriting is enabled for materialized views, HetuEngine determines whether the materialized view rewriting requirements are met based on the SQL statements entered by users and converts queries or subqueries that match materialized views into materialized views, avoiding repeated data calculation. | :ref:`Configuring Rewriting of Materialized Views ` | + +-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_recommendation_of_materialized_views.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_recommendation_of_materialized_views.rst new file mode 100644 index 0000000..aca4d17 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_recommendation_of_materialized_views.rst @@ -0,0 +1,73 @@ +:original_name: mrs_01_24776.html + +.. _mrs_01_24776: + +Configuring Recommendation of Materialized Views +================================================ + +Scenario +-------- + +HetuEngine QAS module provides automatic detection, learning, and diagnosis of historical SQL execution records. After the materialized view recommendation function is enabled, the system can automatically learn and recommend the most valuable materialized view SQL statements, enabling HetuEngine to have the automatic precomputation acceleration capability. In related scenarios, the online query efficiency is improved by multiple times, and the system load pressure is effectively reduced. + +Prerequisites +------------- + +- The cluster is running properly and at least one QAS instance has been installed. +- You have created a user for accessing the HetuEngine web UI, for example, **Hetu_user**. For details, see :ref:`Creating a HetuEngine User `. + +.. _mrs_01_24776__en-us_topic_0000001521080921_section109434212018: + +Enabling Materialized View Recommendation +----------------------------------------- + +#. Log in to FusionInsight Manager as user **Hetu_user**. + +#. Choose **Cluster** > **Services** > **HetuEngine** and then choose **Configurations** > **All Configurations**. In the navigation tree, choose **QAS(Role)** > **Materialized View Recommendation**. Set materialized view recommendation parameters by referring to :ref:`Table 1 ` and retain the default values for other parameters. + + .. _mrs_01_24776__en-us_topic_0000001521080921_table49551729155011: + + .. 
table:: **Table 1** Materialized view recommendation parameters + + +--------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Example Value | Description | + +================================+===============+========================================================================================================================================================================+ + | qas.enable.auto.recommendation | true | Whether to enable materialized view recommendation. The default value is **false**. | + +--------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | qas.sql.submitter | default,zuhu1 | Name of the tenant for which the materialized view recommendation function is enabled. Use commas (,) to separate multiple tenants. | + +--------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | qas.schedule.fixed.delay | 1d | Interval for recommending materialized views. Once a day is recommended. | + +--------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | qas.threshold.for.mv.recommend | 0.05 | Filtering threshold of materialized view recommendation. The value ranges from **0.001** to **1**. You are advised to adjust the value based on the site requirements. | + +--------------------------------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Click **Save**. + +#. Click **Instance**, select all QAS instances, click **More**, and select **Restart Instance**. In the displayed dialog box, enter the password to restart all QAS instances for the parameters to take effect. + +.. _mrs_01_24776__en-us_topic_0000001521080921_section051233712276: + +Viewing Materialized View Recommendation Results +------------------------------------------------ + +#. Log in to FusionInsight Manager as user **Hetu_user**. + +#. Choose **Cluster** > **Services** > **HetuEngine** to go its service page. + +#. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. + +#. Choose **SQL O&M** > **Automatic MV Recommendation**. You can search for materialized views by tenant, status, recommendation period, and materialized view name. Fuzzy search is supported. You can export the recommendation result of a specified materialized view. + + The status of a materialized view task can be: + + .. 
table:: **Table 2** Status of a materialized view task + + ============= =============== =========== ======================= + Status Name Description Status Name Description + ============= =============== =========== ======================= + To Be Created To be created Deleting Terminating + Creating Creating Deleted Terminated + Created Created Planning Being planned + Failed Creation failed Aborted Aborted + Updating Updating Duplicated Repeated recommendation + ============= =============== =========== ======================= diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_rewriting_of_materialized_views.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_rewriting_of_materialized_views.rst new file mode 100644 index 0000000..e1b08f2 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_rewriting_of_materialized_views.rst @@ -0,0 +1,172 @@ +:original_name: mrs_01_24543.html + +.. _mrs_01_24543: + +Configuring Rewriting of Materialized Views +=========================================== + +Enabling Rewriting of Materialized Views +---------------------------------------- + +HetuEngine provides the materialized view rewriting capability at the system or session level. + +- Enabling the materialized view rewriting capability at the session level: + + Run the **set session materialized_view_rewrite_enabled=true** command on the HetuEngine client by referring to :ref:`Using the HetuEngine Client `. + +- Enabling the materialized view rewriting capability at the system level: + + #. Log in to FusionInsight Manager as a user who can access the HetuEngine web UI. + #. Choose **Cluster** > **Services** > **HetuEngine** to go to its service page. + #. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. + #. Check whether the status of the instance to be operated is **STOPPED**. If not, change the status to **STOPPED**. + #. Locate the row that contains the target instance, click **Configure** in the **Operation** column, and add the following customized parameters: + + ================================= ===== ============================= + Parameter Value Parameter File + ================================= ===== ============================= + materialized.view.rewrite.enabled true coordinator.config.properties + materialized.view.rewrite.timeout 5 coordinator.config.properties + ================================= ===== ============================= + + .. note:: + + This step applies to MRS 3.2.0 or later. + + - **materialized.view.rewrite.timeout**: timeout interval for rewriting a query against a materialized view, in seconds. The recommended value is 5 seconds. Materialized view rewriting takes some time, so this parameter can be added to limit the performance loss it may cause. If the rewriting times out, the original SQL statement is executed. + - If the materialized view function is enabled at the session level, run the **set session materialized_view_rewrite_timeout = 5** command first to enable timeout control for materialized view rewriting. + + #. After the parameters are added, select **Start Now** and click **OK**.
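+
+The following is a minimal usage sketch that ties the procedure above to a concrete query. The table **tb_a**, its column **col1**, and the view name **mv_tb_a** are illustrative assumptions only, and the **WITH(**\ *xxx*\ **)** storage properties are omitted, as they are in the scenario examples later in this topic.
+
+.. code-block:: sql
+
+   -- Assumed example objects: a table tb_a with an integer column col1; mv_tb_a is a hypothetical view name.
+   -- The WITH(...) storage properties of the full creation template are omitted here.
+   CREATE MATERIALIZED VIEW mv_tb_a AS select col1 from tb_a where col1 < 50;
+
+   -- Enable materialized view rewriting for the current session.
+   set session materialized_view_rewrite_enabled=true;
+
+   -- Because the filter col1 < 45 is logically contained in col1 < 50, this query can be
+   -- rewritten to read from mv_tb_a instead of recomputing the result from tb_a.
+   select col1 from tb_a where col1 < 45;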
+ +Scope of Materialized View Rewriting +------------------------------------ + +- Data types supported by materialized views + + BOOLEAN, DECIMAL, DOUBLE, REAL/FLOAT, INT, BIGINT, SMALLINT, TINYINT, CHAR/VARCHAR, DATE, TIME, TIMESTAMP, INTERVAL YEAR TO MONTH, INTERVAL DAY TO SECOND, BINARY/VARBINARY, and UUID. + +- Supported functions for materialized view rewriting + + - Conversion function: Only the CAST function is supported. + - String function: All string functions are supported, including char_length, character_length, chr, codepoint, decode, encode, find_in_set, format_number, locate, hamming_distance, instr, levenshtein, levenshtein_distance, ltrim, lpad, octet_length, position, quote, and repeat2. + - Mathematical operator: All mathematical operators are supported. + - Aggregate function: **COUNT**, **SUM**, **MIN**, **MAX**, **AVG**, **LEAD**, **LAG**, **FIRST_VALUE**, **LAST_VALUE**, **COVAR_POP**, **COVAR_SAMP**, **REGR_SXX**, **REGR_SYY**, **STDDEV_POP**, **STDDEV_SAMP**, **VAR_POP**, **VAR_SAMP**, **ROW_NUMBER**, **RANK**, **PERCENT_RANK**, **DENSE_RANK**, and **CUME_DIST** are supported. + + .. important:: + + In the following scenarios, materialized views cannot be used to rewrite SQL queries that contain functions: + + - SQL queries contain parameterless functions. + - SQL queries contain functions supported by HetuEngine that return different types of values depending on the parameter types. + - SQL queries contain nested functions or contain functions that throw exceptions and cause rewrite failures. + +Example of Materialized View Rewriting Scenarios +------------------------------------------------ + +The core principle of materialized view rewriting is that the data in the materialized view must logically contain the data that future queries, or their subqueries, will read. It is recommended that you create materialized views by enabling automatic creation. The following table shows examples of common scenarios: + +In the SQL statement examples for creating a materialized view, **CREATE MATERIALIZED VIEW** *xxx* **WITH(**\ *xxx*\ **) AS** is omitted. For details about the complete statement template, see :ref:`Table 1 `. + + ..
table:: **Table 1** Example of materialized view rewriting scenarios + + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Scenario | Description | | SQL Statement Example for Creating a Materialized View | SQL Statement Example for a User Query | SQL Statement Rewritable | Remarks | + +=======================================================================================================================+==================================================================================================+==================================================================================+=======================================================================================================================================================================+=======================================================================================================================================================================+==========================+==========================================================================================================================================================================================================================================================+ + | Full table query | Basic full table query scenario | | select \* from tb_a; | select \* from tb_a; | No | Creating a materialized view for full table scanning is meaningless and is not supported. 
| + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Column query | Basic column query scenario | | select col1,col2,col3 from tb_a; | select col1,col2,col3 from tb_a; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | User query renaming | | select col1 from tb_a; | select col1 as a from tb_a; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col1,col2,col3 from tb_a; | select col1 as a,col2 as b,col3 as c from tb_a; | Yes | ``-`` | + 
+-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | Mathematical expression | | select col1*col2 from tb_a; | select col2*col1 from tb_a; | Yes | The two columns must have the same type. | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | Source column used by a materialized view; and **cast** is used for user query. | | select col1,col2 from tb_a; | select cast(col1 as varchar),col2 from tb_a; | No | Original data columns used by a materialized view, which are not rewritten if no filter criteria are configured in the functions used for user query. | + | | | | | | | | + | | | | | | | Original data columns used by a materialized view, which can be rewritten if the original data columns and filter criteria are used for user query. 
| + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | case when scenario | | select col1, (case col2 when 1 then 'b' when 2 'a' end) as col from tb_a; | select col1, (case col2 when 1 then 'b' when 2 'a' end) as col from tb_a; | No | The case when scenario is not supported in query columns. | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | String function | | select col13 from tb_a; | select length(col13) from tb_a; | No | All string functions use the original table data to create materialized views. The materialized views are not rewritten when queries without filter criteria configured. 
| + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select length(col13) from tb_a; | select length(col13) from tb_a; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Aggregate function column query | count | Materialized views and user queries use **count**. | select count(col1) from tb_a; | select count(col1) from tb_a; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | Source data used by a materialized view, and **count** is used for user queries. 
| select col1 from tb_a; | select count(col1) from tb_a; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | sum | **sum** is used for materialized views and user queries. | select sum(col1) from tb_a; | select sum(col1) from tb_a; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | Source data used by a materialized view, and **sum** is used for user queries. | select col1 from tb_a; | select sum(col1) from tb_a; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Querying information by specifying filter criteria | where filtering | Maximum range of materialized views (<) | select col1 from tb_a; | select col1 from tb_a where col1<11; | Yes | ``-`` | + | | | | | | | | + | (The core is that the data in materialized views is logically the same as or more than that in query SQL statements.) 
| | | | | | | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | The materialized view range is greater than the user query range (<). | select col1 from tb_a where col1<50; | select col1 from tb_a where col1<45; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col1 from tb_a where col1<50; | select col1 from tb_a where col1<=45; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col1 from tb_a where col1<50; | select col1 from tb_a where col1 between 21 and 29; | Yes | ``-`` | + 
+-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | The materialized view range is equal to the user query range (>). | select col1 from tb_a where col1<50; | select col1 from tb_a where col1<50; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | The materialized view range is greater than the user query range (and). 
| select col1 from tb_a where col1<60 and col1>30; | select col1 from tb_a where col1<55 and col1>30; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col1 from tb_a where col1<60 and col1>30; | select col1 from tb_a where col1 between 35 and 55; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col1 from tb_a where col1<60 and col1>30; | select col1 from tb_a where (col1<55 and col1>30) and col1 = 56; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | where nested subquery | Subquery source table as a materialized view | select col1 from tb_a; | select count(col1) from tb_a where col1=(select min(col1) from tb_a); | Yes | ``-`` | + 
+-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | Subquery as a materialized view | select min(col1) from tb_a; | select count(col1) from tb_a where col1=(select min(col1) from tb_a); | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | Parent query source table as a materialized view | select col1 from tb_a where col1=(select min(col1) from tb_a); | select count(col1) from tb_a where col1=(select min(col1) from tb_a); | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | Parent query as a materialized view | select count(col1) from tb_a where col1=(select min(col1) from tb_a); | select count(col1) from tb_a where col1=(select min(col1) from tb_a); | Yes | ``-`` | + 
+-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | limit | limit in a query | select col1 from tb_a; | select col1 from tb_a limit 5; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col1 from tb_a limit 5; | select col1 from tb_a limit 5; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col1 from tb_a limit 5; | select col1 from tb_a; | No | ``-`` | + 
+-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | limit combined with order by | select col1 from tb_a; | select col1 from tb_a order by col1 limit 5; | Yes | Do not use **order by** when creating a materialized view. If the query SQL contains **order by** or **limit**, remove it from the SQL statements for creating a materialized view. | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col1 from tb_a order by col1; | select col1 from tb_a order by col1 limit 5; | Yes | | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col1 from tb_a order by col1 limit 5; | select col1 from tb_a order by col1 limit 5; | No | | + 
+-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | having filtering | Maximum range of materialized views (<) | select col1 from tb_a; | select col1 from tb_a group by col1 having col1 <11; | Yes | group by + having: The scenario of having is different from that of where. The having condition cannot be compensated. The materialized view SQL statements must not have the having condition or must be the same as that of user query SQL statements. | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | The materialized view range is greater than the user query range (<). 
| select col1 from tb_a group by col1 having col1<50; | select col1 from tb_a group by col1 having col1<45; | No | | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col1 from tb_a group by col1 having col1<50; | select col1 from tb_a group by col1 having col1<=45; | No | | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col1 from tb_a group by col1 having col1<50; | select col1 from tb_a group by col1 having col1=45; | No | | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col1 from tb_a group by col1 having col1<50; | select col1 from tb_a group by col1 having col1 between 21 and 29; | No | | + 
+-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | The materialized view range is greater than the user query range (<). | select col1 from tb_a group by col1 having col1<50; | select col1 from tb_a group by col1 having col1<50; | Yes | | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | JOIN association query | Two subqueries as a materialized view | | select col1,col3 from tb_a where col1<11; | with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select cast(col2 as varchar) col2,col3 from tb_b; | with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3; | Yes | ``-`` | + 
+-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | Parent query as a materialized view | | with t1 as (select col1,col3 from tb_a),t2 as (select col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3; | with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Aggregate + JOIN query | Source table data as a materialized view | | select col1,col3 from tb_a; | with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select col2,col3 from tb_b; | with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) 
select count(col1) from t1 join t2 on t1.col3=t2.col3; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | Subquery as a materialized view | | select col1,col3 from tb_a where col1<11; | with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | | | select cast(col2 as varchar) col2,col3 from tb_b; | with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | Parent query (whose subqueries use the source table, non-aggregate query) as a materialized view | | with t1 as (select col1,col3 from tb_a),t2 as (select col2,col3 from tb_b) select col1,col2 from t1 join 
t2 on t1.col3=t2.col3; | with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | Parent query (non-aggregate query) as a materialized view | | with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select col1,col2 from t1 join t2 on t1.col3=t2.col3; | with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3; | Yes | ``-`` | + +-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | Parent query as a materialized view | | with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3; | with t1 as (select col1,col3 from tb_a where col1<11),t2 as (select cast(col2 as varchar) col2,col3 from tb_b) select count(col1) from t1 join t2 on t1.col3=t2.col3; | Yes | ``-`` | + 
+-----------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_the_validity_period_and_data_update_of_materialized_views.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_the_validity_period_and_data_update_of_materialized_views.rst new file mode 100644 index 0000000..c9137e9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/configuring_the_validity_period_and_data_update_of_materialized_views.rst @@ -0,0 +1,30 @@ +:original_name: mrs_01_24546.html + +.. _mrs_01_24546: + +Configuring the Validity Period and Data Update of Materialized Views +===================================================================== + +Validity Period of Materialized Views +------------------------------------- + +The **mv_validity** field for creating a materialized view indicates the validity period of the materialized view. HetuEngine allows you to rewrite the SQL statements using only the materialized views within the validity period. + +Refreshing Materialized View Data +--------------------------------- + +If data needs to be updated periodically, you can use either of the following methods to periodically refresh the materialized views: + +- Manually refreshing a materialized view + + Run the **refresh materialized view** ** command on the HetuEngine client by referring to HetuEngine, or run the **refresh materialized view** ** command using JDBC in the service program to manually update the database. + +- Automatically refreshing a materialized view + + Use **create materialized view** to create a materialized view that can be automatically refreshed. + + .. note:: + + - To enable the automatic refresh capability of the materialized views, you must set a computing instance as the maintenance instance must exist. For details, see :ref:`Configuring a HetuEngine Maintenance Instance `. + - If there are too many materialized views, some materialized views may expire due to too long waiting time. + - The automatic refresh function does not automatically refresh materialized views in the **disable** status. 
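+
+ For reference, a minimal sketch of the two refresh approaches described above. The view names **mv.default.mv1** and **mv.default.mv2** and the storage table **hive.default.mv2** are placeholders, and the sketch assumes the **refresh materialized view** command takes the materialized view name; only the core properties are shown here, and the remaining refresh-related properties are described in the SQL statement examples for materialized views.
+
+ .. code-block::
+
+    -- Manually refresh an existing materialized view (on the HetuEngine client or through JDBC)
+    refresh materialized view mv.default.mv1;
+
+    -- Create a materialized view that is refreshed automatically within its validity period
+    create materialized view mv.default.mv2 with(mv_storage_table='hive.default.mv2', need_autorefresh = true, mv_validity = '24h') AS select id from hive.mvschema.t1;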
diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/index.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/index.rst
new file mode 100644
index 0000000..250cf94
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/index.rst
@@ -0,0 +1,30 @@
+:original_name: mrs_01_24533.html
+
+.. _mrs_01_24533:
+
+Using HetuEngine Materialized Views
+===================================
+
+- :ref:`Overview of Materialized Views `
+- :ref:`SQL Statement Example of Materialized Views `
+- :ref:`Configuring a HetuEngine Maintenance Instance `
+- :ref:`Configuring Rewriting of Materialized Views `
+- :ref:`Configuring Recommendation of Materialized Views `
+- :ref:`Configuring Caching of Materialized Views `
+- :ref:`Configuring the Validity Period and Data Update of Materialized Views `
+- :ref:`Configuring Intelligent Materialized Views `
+- :ref:`Viewing Automatic Tasks of Materialized Views `
+
+.. toctree::
+   :maxdepth: 1
+   :hidden:
+
+   overview_of_materialized_views
+   sql_statement_example_of_materialized_views
+   configuring_a_hetuengine_maintenance_instance
+   configuring_rewriting_of_materialized_views
+   configuring_recommendation_of_materialized_views
+   configuring_caching_of_materialized_views
+   configuring_the_validity_period_and_data_update_of_materialized_views
+   configuring_intelligent_materialized_views
+   viewing_automatic_tasks_of_materialized_views
diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/overview_of_materialized_views.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/overview_of_materialized_views.rst
new file mode 100644
index 0000000..84b0951
--- /dev/null
+++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/overview_of_materialized_views.rst
@@ -0,0 +1,88 @@
+:original_name: mrs_01_24541.html
+
+.. _mrs_01_24541:
+
+Overview of Materialized Views
+==============================
+
+The materialized view feature applies to MRS 3.2.0 or later.
+
+Background
+----------
+
+HetuEngine provides the materialized view capability, which allows you to pre-compute the results of frequently accessed and time-consuming operators (such as join and aggregation operators) and store them as materialized views. Queries or subqueries that match a materialized view are then converted into queries against that view, which avoids repeated computation and improves query response efficiency.
+
+A materialized view is typically created based on the results of queries that aggregate and join multiple data tables.
+
+Materialized views support query rewriting, an optimization technique that converts a query written against the original tables into an equivalent query against one or more materialized views. The following is an example of the SQL statement for creating a materialized view:
+
+.. code-block::
+
+   create materialized view mv.default.mv1 with(mv_storage_table='hive.default.mv1') AS select id from hive.mvschema.t1;
+
+The actual data of the materialized view is stored in the **hive.default.mv1** table. During query rewriting, the SQL statement **select id from hive.mvschema.t1** is rewritten to query the table that stores the materialized view data, that is, **select id from hive.default.mv1**.
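+
+For illustration, a hypothetical session showing the effect of query rewriting for the **mv1** example above. The rewriting is performed automatically by the engine when it is enabled; the user does not change the query:
+
+.. code-block::
+
+   -- Query as written by the user, against the original table
+   select id from hive.mvschema.t1;
+
+   -- Equivalent query that the engine is expected to execute after rewriting,
+   -- reading the physical table that stores the materialized view data
+   select id from hive.default.mv1;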
+ +Scenario +-------- + +Compared with common views, materialized views occupy storage resources and cause data delay because of actual data storage and pre-computation. Therefore, materialized views are recommended in the following scenarios: + +- Frequently executed queries are required. +- Queries involve time-consuming operations like aggregation and join operations. +- A certain delay is allowed for the query result data. +- Materialized views can only be connected to co-deployed Hive and external Hive data sources. Data source tables are stored in ORC or PARQUET format. Cross-source and cross-domain scenarios are not supported. + +Permission Introduction +----------------------- + +:ref:`Table 1 ` lists materialized view permissions. Permission control for materialized views depends on the Ranger. If Ranger authentication is disabled, permissions may become invalid. + +.. _mrs_01_24541__en-us_topic_0000001520999729_table12606432191615: + +.. table:: **Table 1** HetuEngine materialized view permissions + + +--------------------------------------------------------------------------------------+---------------------------+------------------------------------+---------------------------------------+ + | Operation | Permission on catalog mv | Permission on Tables Stored in MVs | Permission on Original Physical Table | + +======================================================================================+===========================+====================================+=======================================+ + | Creating a materialized view | Table creation permission | Table creation permission | Column query permission | + +--------------------------------------------------------------------------------------+---------------------------+------------------------------------+---------------------------------------+ + | Deleting a materialized view | Table deletion permission | N/A | N/A | + +--------------------------------------------------------------------------------------+---------------------------+------------------------------------+---------------------------------------+ + | Refreshing a materialized view | Table update permission | N/A | Column query permission | + +--------------------------------------------------------------------------------------+---------------------------+------------------------------------+---------------------------------------+ + | Overwriting query statements using materialized views | N/A | N/A | Column query permission | + +--------------------------------------------------------------------------------------+---------------------------+------------------------------------+---------------------------------------+ + | Using materialized views to rewrite the execution plan of query statements (EXPLAIN) | N/A | Column query permission | Column query permission | + +--------------------------------------------------------------------------------------+---------------------------+------------------------------------+---------------------------------------+ + | Querying a materialized view | Column query permission | N/A | N/A | + +--------------------------------------------------------------------------------------+---------------------------+------------------------------------+---------------------------------------+ + | Querying physical tables of materialized and non-materialized views | Column query permission | N/A | Column query permission | + 
+--------------------------------------------------------------------------------------+---------------------------+------------------------------------+---------------------------------------+ + | Viewing a materialized view | N/A | N/A | N/A | + +--------------------------------------------------------------------------------------+---------------------------+------------------------------------+---------------------------------------+ + | Viewing the statement for creating a materialized view | N/A | N/A | N/A | + +--------------------------------------------------------------------------------------+---------------------------+------------------------------------+---------------------------------------+ + +How to Use +---------- + +.. table:: **Table 2** Introduction to materialized views + + +-----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------+ + | Phase | Description | Reference | + +=======================================================================+===========================================================================================================================================================================================================================================================================================+=============================================================================================+ + | SQL statement example of materialized views | This section describes the operations supported by materialized views, including creating, listing, and querying materialized views. | :ref:`SQL Statement Example of Materialized Views ` | + +-----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------+ + | Configuring rewriting of materialized views | Enables the materialized view capability for faster query response. | :ref:`Configuring Rewriting of Materialized Views ` | + +-----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------+ + | Configuring recommendation of materialized views | Automatically learns and recommends materialized view SQL statements that are most valuable to services, improving online query efficiency and reducing system load pressure. 
| :ref:`Configuring Recommendation of Materialized Views ` | + +-----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------+ + | Configuring caching of materialized views | The SQL statements that have been executed and rewritten for multiple times can be saved to the cache. When the SQL statements are executed again, the rewritten SQL statements are directly obtained from the cache instead of rewriting the SQL statements, improving query efficiency. | :ref:`Configuring Caching of Materialized Views ` | + +-----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------+ + | Configuring the validity period and data update of materialized views | - Configures the validity period of the materialized view. Currently, only the materialized view within the validity period is automatically overwritten. | :ref:`Configuring the Validity Period and Data Update of Materialized Views ` | + | | - Configures periodic data update. Materialized views can be refreshed manually or automatically. | | + +-----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------+ + | Configuring intelligent materialized views | Provides automatic creation of materialized views. You do not need to manually execute SQL statements to create materialized views (recommended). | :ref:`Configuring Intelligent Materialized Views ` | + +-----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------+ + | Viewing automatic tasks of materialized views | Views the task execution status to evaluate the cluster health status. 
| :ref:`Viewing Automatic Tasks of Materialized Views ` | + +-----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/sql_statement_example_of_materialized_views.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/sql_statement_example_of_materialized_views.rst new file mode 100644 index 0000000..bd8d814 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/sql_statement_example_of_materialized_views.rst @@ -0,0 +1,104 @@ +:original_name: mrs_01_24545.html + +.. _mrs_01_24545: + +SQL Statement Example of Materialized Views +=========================================== + +For details about the SQL statements for materialized views, see :ref:`Table 1 `. + +.. _mrs_01_24545__en-us_topic_0000001521198793_table167251859399: + +.. table:: **Table 1** Operations on materialized views + + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Operation | Function | SQL Statement Example of Materialized View | Remarks | + +===============================================================================================+=================================================================================================+========================================================================================================================================================================================================================================================+=====================================================================================================================================================================================================================================+ + | Creating a materialized view | Creat a materialized view that never expires. | create materialized view mv.default.mv1 with(mv_storage_table='hive.default.mv11') AS select id from hive.mvschema.t1; | - **mv_storage_table** specifies the location where the materialized view data is materialized into a physical table. | + | | | | - When creating a materialized view, you must specify **mv** for the catalog. You can also create a schema. | + | | | | - For the **AS SELECT** clause, pay attention to the items listed in :ref:`Creating the AS SELECT Clause for a Materialized View `. 
| + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | Create a materialized view that is valid for one day and cannot automatically refresh. | create materialized view mv.default.mv1 with(mv_storage_table='hive.default.mv11', mv_validity = '24h') AS select id from hive.mvschema.t1; | **mv_validity** specifies the validity of a materialized view. | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | Create a materialized view that automatically refreshes data every hour. | create materialized view mv.default.mv1 with(mv_storage_table='hive.default.mv1', need_autorefresh = true, mv_validity = '1h', start_refresh_ahead_of_expiry = 0.2, refresh_priority = 3, refresh_duration = '5m') AS select id from hive.mvschema.t1; | - **need_autorefresh**: indicates whether to enable automatic refresh. | + | | | | - **start_refresh_ahead_of_expiry**: a refresh task is triggered for the materialized view at the time specified by **mv_validity\*** (**1-start_refresh_ahead_of_expiry**) so that the task status is changed to **Refreshable**. | + | | | | - **refresh_priority** specifies the priority of refreshing tasks. | + | | | | - **refresh_duration** specifies the maximum duration of a refreshing task. | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Showing materialized views | Show all MVs whose catalog name is **mv** and schema name is **mvschema**. | show materialized views from mvschema; | **mvschema** indicates the schema name. The value of **catalog** is fixed to **mv**. 
| + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | | Use the LIKE clause to filter the materialized views whose names meet the rule expression. | show MATERIALIZED VIEWs in mvschema tables like ``*`` mvtb_0001; | **mvschema** indicates the schema name. | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Querying the statement for creating a materialized view | Query the statement for creating the the materialized view of **mv.default.mv1**. | show create materialized view mv.default.mv1; | **mv1** indicates the name of the materialized view. | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Querying a materialized view | Query data in **mv.default.mv1**. | select \* from mv.default.mv1; | **mv1** indicates the name of the materialized view. | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Refreshing a materialized view | Refresh the materialized view of **mv.default.mv1**. 
| refresh materialized view mv.default.mv1; | ``-`` | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Modifying the properties of materialized views | Modifying the properties of the **mv.default.mv1** materialized view | Alter materialized view mv.mvtestprop.pepa_ss set PROPERTIES(refresh_priority = 2); | **refresh_priority = 2** is the property of the materialized view. | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Changing the status of materialized views | Changing the status of the **mv.default.mv1** materialized view | alter materialized view mv.default.mv1 set status SUSPEND; | **SUSPEND** is the status of the materialized view. The status can be: | + | | | | | + | | | | - **SUSPEND**: The materialized view is suspended. The suspended materialized view is not rewritten. | + | | | | - **ENABLED**: The materialized view is available. | + | | | | - **Refreshing**: The materialized view data is being refreshed and cannot be rewritten. | + | | | | - **DISABLED**: The materialized view has been disabled. | + | | | | - **UNKNOWN**: The cache is inconsistent with the database. You are advised to run the **refresh catalog mv** command. | + | | | | | + | | | | Manual refresh supports only the conversion between **ENABLED** and **SUSPEND**. | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Deleting a materialized view | Delete the materialized view of **mv.default.mv1**. 
| drop materialized view mv.default.mv1; | ``-`` | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Enabling materialized view rewriting capability to optimize SQL statements | Enabling materialized view rewriting capability at the session level to optimize SQL statements | set session materialized_view_rewrite_enabled=true; | ``-`` | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Verifying whether SQL statements can be optimized by rewriting a query to a materialized view | Verify whether the SELECT statement can be rewritten and optimized by **mv.default.mv1**. | verify materialized view mvname(mv.default.mv1) originalsql select id from hive.mvschema.t1; | ``-`` | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Enabling the specified materialized view at the SQL level to optimize the SQL statements | Forcibly use **mv.default.mv1** for SQL statement optimization in queries. 
| /``*+`` REWRITE(mv.default.mv1) \*/ select id from hive.mvschema.t1; | ``-`` | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Disabling materialized views at the SQL level to optimize the SQL statements | Do not use materialized views for SQL statement optimization in queries. | /``*+`` NOREWRITE \*/ select id from hive.mvschema.t1; | ``-`` | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Refreshing the metadata cache of materialized views | Synchronize the metadata cache of materialized views between tenants. | refresh catalog mv; | ``-`` | + +-----------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +.. _mrs_01_24545__en-us_topic_0000001521198793_section1215122434112: + +Creating the AS SELECT Clause for a Materialized View +----------------------------------------------------- + +The **AS SELECT** clause for creating materialized views cannot contain reserved keywords in Calcite SQL parsing and rewriting functions, such as **default**. To use reserved keywords in the **AS SELECT** clause, use either of the following solutions: + +- When creating MVs and executing original queries, you need to add double quotes to the default schema name. + + The following uses reserved keyword **default** in the **AS SELECT** clause as an example: + + Creating a materialized view + + .. code-block:: + + CREATE MATERIALIZED VIEW mv.default.mv1 WITH(mv_storage_table='hive.default.mv11') AS SELECT id FROM hive."default".t1; + + SELECT query + + .. code-block:: + + SELECT id FROM hive."default".t1; + +- Set the corresponding catalog and schema at the Session level, rather than passing fully qualified names in the query. 
+ + For example, set **catalogname** to **hive** and **schemaname** to **default**. + + .. code-block:: + + USE hive.default; + + Creating a materialized view + + .. code-block:: + + CREATE MATERIALIZED VIEW mv.default.mv1 WITH(mv_storage_table='hive.default.mv11') AS SELECT id FROM t1; + + SELECT query + + .. code-block:: + + SELECT id FROM t1; diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/viewing_automatic_tasks_of_materialized_views.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/viewing_automatic_tasks_of_materialized_views.rst new file mode 100644 index 0000000..be5094e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_materialized_views/viewing_automatic_tasks_of_materialized_views.rst @@ -0,0 +1,51 @@ +:original_name: mrs_01_24505.html + +.. _mrs_01_24505: + +Viewing Automatic Tasks of Materialized Views +============================================= + +Scenario +-------- + +View the status and execution results of automatic HetuEngine tasks on HSConsole. You can periodically check the task execution status to evaluate the cluster health status. + +Prerequisites +------------- + +You have created a user for accessing the HetuEngine web UI. For details, see :ref:`Creating a HetuEngine User `. + +Procedure +--------- + +#. Log in to FusionInsight Manager as a user who can access the HetuEngine web UI and choose **Cluster** > **Services** > **HetuEngine**. The **HetuEngine** service page is displayed. +#. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. +#. Choose **Automated Tasks**. On the displayed page, you can search for tasks by **Task Type**, **Status**, **Additional Info**, **Start Time**, or **End Time**. Fuzzy search is supported. + + +------------------+----------------------------------------------------------------------------------------------+ + | Search Criterion | Description | + +==================+==============================================================================================+ + | Task Type | **Refresh of materialized view**: refreshes materialized views. | + +------------------+----------------------------------------------------------------------------------------------+ + | | **Recommendation of materialized view**: recommends materialized views. | + +------------------+----------------------------------------------------------------------------------------------+ + | | **Auto create materialized view**: automatically creates materialized views. | + +------------------+----------------------------------------------------------------------------------------------+ + | | **Drop auto created and stale materialized view**: automatically deletes materialized views. 
| + +------------------+----------------------------------------------------------------------------------------------+ + | Status | success | + +------------------+----------------------------------------------------------------------------------------------+ + | | failed | + +------------------+----------------------------------------------------------------------------------------------+ + | | waiting | + +------------------+----------------------------------------------------------------------------------------------+ + | | running | + +------------------+----------------------------------------------------------------------------------------------+ + | | skip_execute | + +------------------+----------------------------------------------------------------------------------------------+ + | | time out | + +------------------+----------------------------------------------------------------------------------------------+ + | | unknown | + +------------------+----------------------------------------------------------------------------------------------+ + +#. Click **Search**. The tasks that match the search condition are displayed. You can click the **Link** on the **Task Details** column to show the specified task information. diff --git a/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_sql_diagnosis.rst b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_sql_diagnosis.rst new file mode 100644 index 0000000..c2f7e5d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hetuengine/using_hetuengine_sql_diagnosis.rst @@ -0,0 +1,52 @@ +:original_name: mrs_01_24838.html + +.. _mrs_01_24838: + +Using HetuEngine SQL Diagnosis +============================== + +This section applies to MRS 3.2.0 or later. + +Scenario +-------- + +The HetuEngine QAS module provides automatic detection, learning, and diagnosis of historical SQL execution records for more efficient online SQL O&M and faster online SQL analysis. After SQL diagnosis is enabled, the system provides the following capabilities: + +- Automatically detects and displays tenant-level and user-level SQL execution statistics in different time periods to cluster administrators, helping them quickly predict service running status and potential risks. +- Automatically diagnoses large SQL statements, slow SQL statements, and related submission information, displays the information in multiple dimensions for cluster administrators, and provides diagnosis and optimization suggestions for these statements. + +Prerequisites +------------- + +- The cluster is running properly and at least one QAS instance has been installed. +- You have created a user for accessing the HetuEngine web UI, for example, **Hetu_user**. For details, see :ref:`Creating a HetuEngine User `. + +Enabling SQL Diagnosis +---------------------- + +The SQL diagnosis function of HetuEngine is enabled by default. You can perform the following steps to configure other common parameters or retain the default settings: + +#. Log in to FusionInsight Manager as user **Hetu_user**. +#. Choose **Cluster** > **Services** > **HetuEngine**. Click **Configurations** then **All Configurations**, click **QAS(Role)**, and select **SQL Diagnosis**. If **qas.sql.auto.diagnosis.enabled** is set to **true**, the SQL diagnosis function is enabled. In this case, you can configure recommended SQL diagnosis parameters based on service requirements. +#. Click **Save**. +#. 
Click **Instance**, select all QAS instances, click **More**, and select **Restart Instance**. In the displayed dialog box, enter the password to restart all QAS instances for the parameters to take effect. + +Viewing SQL Diagnosis Results +----------------------------- + +#. Log in to FusionInsight Manager as user **Hetu_user**. +#. Choose **Cluster** > **Services** > **HetuEngine** to go its service page. +#. In the **Basic Information** area on the **Dashboard** page, click the link next to **HSConsole WebUI**. The HSConsole page is displayed. +#. Choose **SQL O&M** to view SQL diagnosis results. + + - On the **Overview** page, you can view the overall running status of historical tasks, including the query duration distribution chart by segment, query user distribution chart, total submitted SQL queries, SQL execution success rate, average SQL query response time, number of queries, average execution time, and average waiting time. + - Choose **SQL Query Diagnostics** > **Slow Query Distribution** to view the slow query distribution of historical tasks, including: + + - Slow SQL statistics: collects statistics on the number of slow queries (the query time is greater than the slow query threshold) submitted by each tenant. + - Top users with the maximum slow query requests: collects statistics on slow query statistics of each user. The statistics can be sorted in a list and exported. + + - Choose **SQL Query Diagnostics** > **Slow Queries** to view the slow query list, diagnosis results, and optimization suggestions of historical tasks. Query results can be exported. + + .. note:: + + The validity period of historical statistics depends on the JVM memory size of HSConsole instances and cannot exceed 60 days. diff --git a/doc/component-operation-guide-lts/source/using_hive/connecting_hive_with_external_rds.rst b/doc/component-operation-guide-lts/source/using_hive/connecting_hive_with_external_rds.rst index 0ce8b46..7768053 100644 --- a/doc/component-operation-guide-lts/source/using_hive/connecting_hive_with_external_rds.rst +++ b/doc/component-operation-guide-lts/source/using_hive/connecting_hive_with_external_rds.rst @@ -1,6 +1,6 @@ -:original_name: mrs_01_1751.html +:original_name: mrs_01_17511.html -.. _mrs_01_1751: +.. _mrs_01_17511: Connecting Hive with External RDS ================================= @@ -105,7 +105,7 @@ Connecting Hive with External RDS #. Log in to each MetaStore background node and check whether the local directory **/opt/Bigdata/tmp** exists. - - If yes, go to :ref:`8 `. + - If yes, go to :ref:`8 `. - If no, run the following commands to create one: @@ -113,6 +113,6 @@ Connecting Hive with External RDS **chmod 755 /opt/Bigdata/tmp** -#. .. _mrs_01_1751__en-us_topic_0000001219350615_li24241321154318: +#. .. _mrs_01_17511__en-us_topic_0000001219350615_li24241321154318: Save the configuration. Choose **Dashboard** > **More** > **Restart Service**, and enter the password to restart the Hive service. diff --git a/doc/component-operation-guide-lts/source/using_hive/data_import_and_export_in_hive/importing_and_exporting_hive_databases.rst b/doc/component-operation-guide-lts/source/using_hive/data_import_and_export_in_hive/importing_and_exporting_hive_databases.rst new file mode 100644 index 0000000..809f21b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/data_import_and_export_in_hive/importing_and_exporting_hive_databases.rst @@ -0,0 +1,112 @@ +:original_name: mrs_01_24742.html + +.. 
_mrs_01_24742: + +Importing and Exporting Hive Databases +====================================== + +Scenario +-------- + +In big data application scenarios, Hive databases and all tables in these databases are usually migrated to another cluster. You can run the Hive database **export** and **import** commands to migrate a complete database. + +.. note:: + + This section applies to MRS 3.2.0 or later. + + The Hive database import and export function does not support importing or exporting encrypted tables, transaction tables, HBase external tables, Hudi tables, view tables, and materialized view tables. + +Prerequisites +------------- + +- If Hive databases are imported or exported across clusters and Kerberos authentication is enabled for both the source and destination clusters, configure cross-cluster mutual trust. + +- If you want to run the **dump** or **load** command to import or export databases created by other users, grant the corresponding database permission to the users. + + - If Ranger authentication is not enabled for the cluster, log in to FusionInsight Manager to grant the administrator rights of the role to which the user belongs. For details, see section :ref:`Creating a Hive Role `. + - If Ranger authentication is enabled for the cluster, grant users the permission to dump and load databases. For details, see :ref:`Adding a Ranger Access Permission Policy for Hive `. + +- Enable the inter-cluster copy function in the source cluster and destination cluster. + +- Configure the HDFS service address parameter for the source cluster to access the destination cluster. + + Log in to FusionInsight Manager of the source cluster, click **Cluster**, choose **Services** > **Hive**, and click **Configuration**. On the displayed page, search for **hdfs.site.customized.configs**, add custom parameter **dfs.namenode.rpc-address.haclusterX**, and set its value to *Service IP address of the active NameNode instance node in the destination cluster*:*RPC port*. Add custom parameter **dfs.namenode.rpc-address.haclusterX1** and set its value to *Service IP address of the standby NameNode instance node in the destination cluster*:*RPC port*. The RPC port of NameNode is **25000** by default. After saving the configuration, roll-restart the Hive service. + +Procedure +--------- + +#. Log in to the node where the client is installed in the source cluster as the Hive client installation user. + +#. .. _mrs_01_24742__li2282114610113: + + Run the following command to switch to the client installation directory, for example, **/opt/client**: + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. If Kerberos authentication is enabled for the cluster, run the following command to authenticate the user. Otherwise, skip this step. + + **kinit** *Hive service user* + +#. .. _mrs_01_24742__li9282204611111: + + Run the following command to log in to the Hive client: + + **beeline** + +#. Run the following command to create the **dump_db** database: + + **create database** *dump_db*\ **;** + +#. Run the following command to switch to the **dump_db** database: + + **use** *dump_db*\ **;** + +#. Run the following command to create the **test** table in the **dump_db** database: + + **create table** *test*\ **(**\ *id int*\ **);** + +#. Run the following command to insert data to the **test** table: + + **insert into** *test* **values(**\ *123*\ **);** + +#. 
Run the following command to set the **dump_db** database as the source of the replication policy: + + **alter database** *dump_db* **set dbproperties** **('**\ *repl.source.for*\ **'='**\ replpolicy1\ **');** + + .. note:: + + - Perform the following steps to set permissions for users when the **alter** command is used to modify database attributes: + + - If Ranger authentication is not enabled for the cluster, log in to FusionInsight Manager to grant the administrator rights of the role to which the user belongs. For details, see section :ref:`Creating a Hive Role `. + - If Ranger authentication is enabled for the cluster, grant users the permission to dump and load databases. For details, see :ref:`Adding a Ranger Access Permission Policy for Hive `. + + - Databases with replication policy sources configured can be deleted only after their replication policy sources are set to null. To do so, run the following command: + + **alter database** *dump_db* **set dbproperties** **('**\ *repl.source.for*\ **'='');** + +#. Run the following command to export the **dump_db** database to the **/user/hive/test** directory of the destination cluster: + + **repl dump** *dump_db* **with ('hive.repl.rootdir'='hdfs://hacluster**\ *X/user/hive/test*\ **');** + + .. note:: + + - **hacluster X** is the value of **haclusterX** in new custom parameter\ **dfs.namenode.rpc-address.haclusterX**. + - Ensure that the current user has the read and write permissions on the export directory to be specified. + +#. Log in to the node where the client is installed in the destination cluster as the Hive client installation user, and perform :ref:`2 ` to :ref:`5 `. + +#. Run the following command to import data from the **dump_db** database in the **/user/hive/test** directory to the **load_db** database: + + **repl load** *load_db* **from '**\ */user/hive/repl*\ **';** + + .. note:: + + When the **repl load** command is used to import a database, pay attention to the following points when specifying the database name: + + - If the specified database does not exist, the database will be created during the import. + - If the specified database exists and the value of **hive.repl.ckpt.key** of the database is the same as the imported path, skip the import operation. + - If the specified database already exists and no table or function exists in this database, only the tables in the source database are imported to the current database during the import. Otherwise, the import fails. diff --git a/doc/component-operation-guide-lts/source/using_hive/data_import_and_export_in_hive/importing_and_exporting_table_partition_data_in_hive.rst b/doc/component-operation-guide-lts/source/using_hive/data_import_and_export_in_hive/importing_and_exporting_table_partition_data_in_hive.rst new file mode 100644 index 0000000..e205011 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/data_import_and_export_in_hive/importing_and_exporting_table_partition_data_in_hive.rst @@ -0,0 +1,172 @@ +:original_name: mrs_01_24741.html + +.. _mrs_01_24741: + +Importing and Exporting Table/Partition Data in Hive +==================================================== + +Scenario +-------- + +In big data application scenarios, data tables in Hive usually need to be migrated to another cluster. You can run the Hive **import** and **export** commands to migrate data in tables. 
That is, you can run the **export** command to export Hive tables from the source cluster to the HDFS of the target cluster, run the **import** command in the target cluster to import the exported data to the corresponding Hive table. + +.. note:: + + This section applies to MRS 3.2.0 or later. + + The Hive table import and export function does not support importing or exporting encrypted tables, HBase external tables, transaction tables, Hudi tables, view tables, and materialized view tables. + +Prerequisites +------------- + +- If Hive tables or partition data is imported or exported across clusters and Kerberos authentication is enabled for both the source and destination clusters, configure cross-cluster mutual trust. + +- If you want to run the **import** or **export** command to import or export tables or partitions created by other users, grant the corresponding table permission to the users. + + - If Ranger authentication is not enabled for the cluster, log in to FusionInsight Manager to grant the **Select Authorization** permission of the table corresponding to the role to which the user belongs. For details, see section :ref:`Configuring Permissions for Hive Tables, Columns, or Databases `. + - If Ranger authentication is enabled for the cluster, grant users the permission to import and export tables. For details, see :ref:`Adding a Ranger Access Permission Policy for Hive `. + +- Enable the inter-cluster copy function in the source cluster and destination cluster. + +- Configure the HDFS service address parameter for the source cluster to access the destination cluster. + + Log in to FusionInsight Manager of the source cluster, click **Cluster**, choose **Services** > **Hive**, and click **Configuration**. On the displayed page, search for **hdfs.site.customized.configs**, add custom parameter **dfs.namenode.rpc-address.haclusterX**, and set its value to *Service IP address of the active NameNode instance node in the destination cluster*:*RPC port*. Add custom parameter **dfs.namenode.rpc-address.haclusterX1** and set its value to *Service IP address of the standby NameNode instance node in the destination cluster*:*RPC port*. The RPC port of NameNode is **25000** by default. After saving the configuration, roll-restart the Hive service. + +Procedure +--------- + +#. .. _mrs_01_24741__li1793444135814: + + Log in to the node where the client is installed in the destination cluster as the Hive client installation user. + +#. Run the following command to switch to the client installation directory, for example, **/opt/client**: + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. .. _mrs_01_24741__li610182915598: + + If Kerberos authentication is enabled for the cluster, run the following command to authenticate the user. Otherwise, skip this step. + + **kinit** *Hive service user* + +#. Run the following command to log in to the Hive client in the destination cluster: + + **beeline** + +#. Run the following command to create the **export_test** table: + + **create table** *export_test(id int)* **;** + +#. Run the following command to insert data to the **export_test** table: + + **insert into** *export_test values(123)*\ **;** + +#. .. _mrs_01_24741__li89938161394: + + Repeat :ref:`1 ` to :ref:`4 ` in the destination cluster and run the following command to create an HDFS path for storing the exported **export_test** table: + + **dfs -mkdir** */tmp/export* + +#. 
Run the following command to log in to the Hive client: + + **beeline** + +#. Import and export the **export_test** table. + + The Hive **import** and **export** commands can be used to migrate table data in the following modes. Select a proper data migration mode as required. + + - Mode 1: Table export and import + + a. .. _mrs_01_24741__li422755591415: + + Run the following command in the source cluster to export the metadata and service data of the **export_test** table to the directory created in :ref:`8 `: + + **export table** *export_test* **to** **'hdfs://hacluster**\ *X/tmp/export*\ **';** + + b. Run the following command in the destination cluster to import the table data exported in :ref:`10.a ` to the **export_test** table: + + **import from '**\ */tmp/export*\ **';** + + - Mode 2: Renaming a table during the import + + a. .. _mrs_01_24741__li207111134162118: + + Run the following command in the source cluster to export the metadata and service data of the **export_test** table to the directory created in :ref:`8 `: + + **export table** *export_test* **to** **'hdfs://hacluster**\ *X/tmp/export*\ **';** + + b. Run the following command in the destination cluster to import the table data exported in :ref:`10.a ` to the **import_test** table: + + **import table** *import_test* **from '**\ */tmp/export*\ **';** + + - Mode 3: Partition export and import + + a. .. _mrs_01_24741__li77435347346: + + Run the following commands in the source cluster to export the **pt1** and **pt2** partitions of the **export_test** table to the directory created in :ref:`8 `: + + **export table** *export_test* **partition** **(**\ *pt1*\ **="**\ *in*\ **"**, *pt2*\ **="**\ *ka*\ **")** **to** **'hdfs://hacluster**\ **X**\ */tmp/export*\ **';** + + b. Run the following command in the destination cluster to import the table data exported in :ref:`10.a ` to the **export_test** table: + + **import from '**\ */tmp/export*\ **';** + + - Mode 4: Exporting table data to a Partition + + a. .. _mrs_01_24741__li19785214114715: + + Run the following command in the source cluster to export the metadata and service data of the **export_test** table to the directory created in :ref:`8 `: + + **export table** *export_test* **to 'hdfs://hacluster**\ *X/tmp/export*\ **';** + + b. Run the following command in the destination cluster to import the table data exported in :ref:`10.a ` to the **pt1** and **pt2** partitions of the **import_test** table: + + **import table** *import_test* **partition (**\ *pt1*\ **="**\ *us*\ **",** *pt2*\ **="**\ *tn*\ **") from '**\ */tmp/export*\ **';** + + - Mode 5: Specifying the table location during the import + + a. .. _mrs_01_24741__li11635456135510: + + Run the following command in the source cluster to export the metadata and service data of the **export_test** table to the directory created in :ref:`8 `: + + **export table** *export_test* **to 'hdfs://hacluster**\ *X/tmp/export*\ **';** + + b. Run the following command in the destination cluster to import the table data exported in :ref:`10.a ` to the **import_test** table and specify its location as **tmp/export**: + + **import table** *import_test* **from '**\ */tmp*' **location** **'**/*tmp/export*\ **';** + + - Mode 6: Exporting data to an external table + + a. .. 
_mrs_01_24741__li437611737: + + Run the following command in the source cluster to export the metadata and service data of the **export_test** table to the directory created in :ref:`8 `: + + **export table** *export_test* **to 'hdfs://hacluster**\ *X/tmp/export*\ **';** + + b. Run the following command in the destination cluster to import the table data exported in :ref:`10.a ` to external table **import_test**: + + **import external table** *import_test* **from '**\ */tmp/export*\ **';** + + .. note:: + + Before exporting table or partition data, ensure that the HDFS path for storage has been created and is empty. Otherwise, the export fails. + + When partitions are exported or imported, the exported or imported table must be a partitioned table, and data of multiple partition values of the same partition field cannot be exported. + + During the data import: + + - If the **import from '**\ */tmp/export*\ **';** statement is used to import a table, the table name is not specified, and the imported data is saved to the table path with the same name as the source table. Pay attention to the following points: + + - If there is no table with the same name as that in the source cluster in the destination cluster, such a table will be created during the table import. + - Otherwise, the HDFS directory of the table must be empty, or the import fails. + + - If the **import external table** *import_test* **from '**\ */tmp/export*\ **';** statement is used to import a table, the exported table is imported to the specified table. Pay attention to the following points: + + - If there is no table with the same name as the specified table exists in the destination cluster, such a table will be created during the table import. + - Otherwise, the HDFS directory of the table must be empty, or the import fails. + + **hacluster X** is the value of **haclusterX** in new custom parameter\ **dfs.namenode.rpc-address.haclusterX**. diff --git a/doc/component-operation-guide-lts/source/using_hive/data_import_and_export_in_hive/index.rst b/doc/component-operation-guide-lts/source/using_hive/data_import_and_export_in_hive/index.rst new file mode 100644 index 0000000..266c998 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/data_import_and_export_in_hive/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_24744.html + +.. _mrs_01_24744: + +Data Import and Export in Hive +============================== + +- :ref:`Importing and Exporting Table/Partition Data in Hive ` +- :ref:`Importing and Exporting Hive Databases ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + importing_and_exporting_table_partition_data_in_hive + importing_and_exporting_hive_databases diff --git a/doc/component-operation-guide-lts/source/using_hive/enabling_the_function_of_creating_a_foreign_table_in_a_directory_that_can_only_be_read.rst b/doc/component-operation-guide-lts/source/using_hive/enabling_the_function_of_creating_a_foreign_table_in_a_directory_that_can_only_be_read.rst index 09f99cd..ba9fdde 100644 --- a/doc/component-operation-guide-lts/source/using_hive/enabling_the_function_of_creating_a_foreign_table_in_a_directory_that_can_only_be_read.rst +++ b/doc/component-operation-guide-lts/source/using_hive/enabling_the_function_of_creating_a_foreign_table_in_a_directory_that_can_only_be_read.rst @@ -20,9 +20,17 @@ Procedure --------- #. Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > **Services** > **Hive** > **Configurations** > **All Configurations**. -#. 
Choose **HiveServer(Role)** > **Customization**, add a customized parameter to the **hive-site.xml** parameter file, set **Name** to **hive.restrict.create.grant.external.table**, and set **Value** to **true**. -#. Choose **MetaStore(Role)** > **Customization**, add a customized parameter to the **hivemetastore-site.xml** parameter file, set **Name** to **hive.restrict.create.grant.external.table**, and set **Value** to **true**. Restart all Hive instances after the modification. -#. Determine whether to enable this function on the Spark2x client. +#. Modify parameters and restart related instances: - - If yes, download and install the Spark2x client again. - - If no, no further action is required. + - For versions earlier than MRS 3.2.0: + + a. Choose **MetaStore(Role)** > **Customization**, add a custom parameter to the **hivemetastore-site.xml** parameter file, and set **Name** to **hive.restrict.create.grant.external.table** and **Value** to **true**. + b. Choose **HiveServer(Role)** > **Customization**, add a custom parameter to the **hive-site.xml** parameter file, set **Name** to **hive.restrict.create.grant.external.table**, and set **Value** to **true**. + c. Click **Save** to save the configuration. + d. Click **Instance**, select all Hive instances, and choose **More** > **Restart Instance** to restart all Hive instances. + + - For MRS 3.2.0 or later: + + a. Choose **MetaStore(Role)** > **Customization**, add a custom parameter to the **hivemetastore-site.xml** parameter file, and set **Name** to **hive.restrict.create.grant.external.table** and **Value** to **true**. + b. Click **Save** to save the configuration. + c. Click **Instance**, select all MetaStore instances, and choose **More** > **Restart Instance** to restart all MetaStore instances. diff --git a/doc/component-operation-guide-lts/source/using_hive/hive_materialized_view.rst b/doc/component-operation-guide-lts/source/using_hive/hive_materialized_view.rst deleted file mode 100644 index 0c39b76..0000000 --- a/doc/component-operation-guide-lts/source/using_hive/hive_materialized_view.rst +++ /dev/null @@ -1,154 +0,0 @@ -:original_name: mrs_01_2311.html - -.. _mrs_01_2311: - -Hive Materialized View -====================== - -Introduction ------------- - -A Hive materialized view is a special table obtained based on the query results of Hive internal tables. A materialized view can be considered as an intermediate table that stores actual data and occupies physical space. The tables on which a materialized view depends are called the base tables of the materialized view. - -Materialized views are used to pre-compute and save the results of time-consuming operations such as table joining or aggregation. When executing a query, you can rewrite the query statement based on the base tables to the query statement based on materialized views. In this way, you do not need to perform time-consuming operations such as join and group by, thereby quickly obtaining the query result. - -.. note:: - - - A materialized view is a special table that stores actual data and occupies physical space. - - Before deleting a base table, you must delete the materialized view created based on the base table. - - The materialized view creation statement is atomic, which means that other users cannot see the materialized view until all query results are populated. - - A materialized view cannot be created based on the query results of another materialized view. - - A materialized view cannot be created based on the results of a tableless query. 
- - You cannot insert, update, delete, load, or merge materialized views. - - You can perform complex query operations on materialized views, because they are special tables in nature. - - When the data of a base table is updated, you need to manually update the materialized view. Otherwise, the materialized view will retain the old data. That is, the materialized view expires. - - You can use the describe syntax to check whether the materialized view created based on ACID tables has expired. - - The describe statement cannot be used to check whether a materialized view created based on non-ACID tables has expired. - -Creating a Materialized View ----------------------------- - -**Syntax** - -.. code-block:: - - CREATE MATERIALIZED VIEW [IF NOT EXISTS] [db_name.]materialized_view_name - [COMMENT materialized_view_comment] - DISABLE REWRITE - [ROW FORMAT row_format] - [STORED AS file_format] - | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] - ] - [LOCATION hdfs_path] - [TBLPROPERTIES (property_name=property_value, ...)] - AS - ; - -.. note:: - - - Currently, the following materialized view file formats are supported: PARQUET, TextFile, SequenceFile, RCfile, and ORC. If **STORED AS** is not specified in the creation statement, the default file format is ORC. - - Names of materialized views must be unique in the same database. Otherwise, you cannot create a new materialized view, and data files of the original materialized view will be overwritten by the data files queried based on the base table in the new one. As a result, data may be tampered with. (After being tampered with, the materialized view can be restored by re-creating the materialized view.). - -**Cases** - -#. Log in to the Hive client and run the following command to enable the following parameters. For details, see :ref:`Using a Hive Client `. - - **set hive.support.concurrency=true;** - - **set hive.exec.dynamic.partition.mode=nonstrict;** - - **set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;** - -#. Create a base table and insert data. - - .. code-block:: - - create table tb_emp( - empno int,ename string,job string,mgr int,hiredate TIMESTAMP,sal float,comm float,deptno int - )stored as orc - tblproperties('transactional'='true'); - - insert into tb_emp values(7369, 'SMITH', 'CLERK',7902, '1980-12-17 08:30:09',800.00,NULL,20), - (7499, 'ALLEN', 'SALESMAN',7698, '1981-02-20 17:12:00',1600.00,300.00,30), - (7521, 'WARD', 'SALESMAN',7698, '1981-02-22 09:05:34',1250.00,500.00,30), - (7566, 'JONES', 'MANAGER', 7839, '1981-04-02 10:14:13',2975.00,NULL,20), - (7654, 'MARTIN', 'SALESMAN',7698, '1981-09-28 08:36:17',1250.00,1400.00,30), - (7698, 'BLAKE', 'MANAGER',7839, '1981-05-01 11:12:55',2850.00,NULL,30), - (7782, 'CLARK', 'MANAGER',7839, '1981-06-09 15:45:28',2450.00,NULL,10), - (7788, 'SCOTT', 'ANALYST',7566, '1987-04-19 14:05:34',3000.00,NULL,20), - (7839, 'KING', 'PRESIDENT',NULL, '1981-11-17 10:18:25',5000.00,NULL,10), - (7844, 'TURNER', 'SALESMAN',7698, '1981-09-08 09:05:34',1500.00,0.00,30), - (7876, 'ADAMS', 'CLERK',7788, '1987-05-23 15:07:44',1100.00,NULL,20), - (7900, 'JAMES', 'CLERK',7698, '1981-12-03 16:23:56',950.00,NULL,30), - (7902, 'FORD', 'ANALYST',7566, '1981-12-03 08:48:17',3000.00,NULL,20), - (7934, 'MILLER', 'CLERK',7782, '1982-01-23 11:45:29',1300.00,NULL,10); - -#. Create a materialized view based on the results of the **tb_emp** query. - - .. 
code-block:: - - create materialized view group_mv disable rewrite - row format serde 'org.apache.hadoop.hive.serde2.JsonSerDe' - stored as textfile - tblproperties('mv_content'='Total compensation of each department') - as select deptno,sum(sal) sum_sal from tb_emp group by deptno; - -Applying a Materialized View ----------------------------- - -Rewrite the query statement based on base tables to the query statement based on materialized views to improve the query efficiency. - -**Cases** - -Execute the following query statement: - -**select deptno,sum(sal) from tb_emp group by deptno having sum(sal)>10000;** - -Based on the created materialized view, rewrite the query statement: - -**select deptno, sum_sal from group_mv where sum_sal>10000;** - -Checking a Materialized View ----------------------------- - -**Syntax** - -**SHOW MATERIALIZED VIEWS [IN database_name] ['identifier_with_wildcards'];** - -**DESCRIBE [EXTENDED \| FORMATTED] [db_name.]materialized_view_name;** - -**Cases** - -**show materialized views;** - -**describe formatted group_mv;** - -Deleting a Materialized View ----------------------------- - -**Syntax** - -**DROP MATERIALIZED VIEW [db_name.]materialized_view_name;** - -**Cases** - -**drop materialized view group_mv;** - -Rebuilding a Materialized View ------------------------------- - -When a materialized view is created, the base table data is filled in the materialized view. However, the data that is added, deleted, or modified in the base table is not automatically synchronized to the materialized view. Therefore, you need to manually rebuild the view after updating the data. - -**Syntax** - -**ALTER MATERIALIZED VIEW [db_name.]materialized_view_name REBUILD;** - -**Cases** - -**alter materialized view group_mv rebuild;** - -.. note:: - - When the base table data is updated but the materialized view data is not updated, the materialized view is in the expired state by default. - - The describe statement can be used to check whether a materialized view created based on transaction tables has expired. If the value of **Outdated for Rewriting** is **Yes**, the license has expired. If the value of **Outdated for Rewriting** is **No**, the license has not expired. diff --git a/doc/component-operation-guide-lts/source/using_hive/hive_supporting_transactions.rst b/doc/component-operation-guide-lts/source/using_hive/hive_supporting_transactions.rst index 470ee4a..6e2d54f 100644 --- a/doc/component-operation-guide-lts/source/using_hive/hive_supporting_transactions.rst +++ b/doc/component-operation-guide-lts/source/using_hive/hive_supporting_transactions.rst @@ -215,6 +215,97 @@ Procedure of Automatic Compression After compression, small files are not deleted immediately. After the cleaner thread performs cleaning, the files are deleted in batches. +Manual Compression Procedure +---------------------------- + +If you do not want the system to automatically determine when to compress a table, configure the table attribute **NO_AUTO_Compaction** to disable automatic compression. After automatic compression is disabled, you can still use the **ALTER Table /Partition Compact** statement to perform manual compression. + +.. note:: + + This operation applies only to MRS 8.2.0 and later versions. + +#. Log in to the Hive client by referring to :ref:`Using a Hive Client ` and run the following commands to disable automatic compression when creating a table: + + .. 
code-block:: + + CREATE TABLE table_name ( + id int, name string + ) + CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC + TBLPROPERTIES ("transactional"="true", + "NO_AUTO_COMPACTION"="true" + ); + + .. note:: + + You can also run the following command to disable automatic compression after a table is created: + + **ALTER TABLE** *table_name* **set TBLPROPERTIES ("NO_AUTO_COMPACTION"="true");** + +#. Run the following command to set the compression type of the table. **compaction_type** indicates the compression type, which can be **minor** or **major**. + + **ALTER TABLE** *table_name* **COMPACT 'compaction_type';** + +Procedure for Specifying a Queue for Running a Compression Task +--------------------------------------------------------------- + +This operation applies only to MRS 8.2.0 and later versions. + +#. .. _mrs_01_0975__li1157469141717: + + Create a queue. + +#. Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Hive**. Click **Configuration** then **All Configurations**, click **MetaStore(Role)**, and select **Transaction**. + +#. Set the following parameters as required: + + .. table:: **Table 2** Parameter description + + +-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +=====================================+==============================================================================================================================================================================================================================================================================================================+ + | hive.compactor.job.queue | The name of the Hadoop queue to which the compression job is submitted, that is, the name of the queue created in :ref:`1 `. | + +-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hive.compactor.check.interval | The interval for executing the compression thread, in seconds. The default value is **300**. | + +-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hive.compactor.cleaner.run.interval | The interval for executing the clearance thread, in milliseconds. The default value is **5000**. | + +-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hive.compactor.delta.num.threshold | The threshold of the number of incremental files that triggers minor compression. The default value is **10**. 
| + +-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hive.compactor.delta.pct.threshold | The ratio threshold of the total size of incremental files (delta) that trigger major compression to the size of base files. The value **0.1** indicates that major compression is triggered when the ratio of the total size of delta files to the size of base files is 10%. The default value is **0.1**. | + +-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | hive.compactor.max.num.delta | The maximum number of incremental files that the compressor will attempt to process in a single job. The default value is **500**. | + +-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | metastore.compactor.initiator.on | Whether to run the startup program thread and cleanup program thread on the MetaStore instance. To start a transaction, set this parameter to **true**. The default value is **false**. | + +-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | metastore.compactor.worker.threads | The number of compression program work threads running on MetaStore. If this parameter is set to **0**, no compression is performed. To use a transaction, set this parameter to a positive number on one or more instances of the MetaStore service. The unit is second. The default value is **0**. | + +-------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Log in to the Hive client and perform compression. For details, see :ref:`Using a Hive Client `. + + .. code-block:: + + CREATE TABLE table_name ( + id int, name string + ) + CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC + TBLPROPERTIES ("transactional"="true", + "compactor.mapreduce.map.memory.mb"="2048", -- Specify the properties of a compression map job. + "compactorthreshold.hive.compactor.delta.num.threshold"="4", -- If there are more than four incremental directories, slight compression is triggered. + "compactorthreshold.hive.compactor.delta.pct.threshold"="0.5" -- If the ratio of the incremental file size to the basic file size is greater than 50%, deep compression is triggered. + ); + + or + + .. 
code-block:: + + ALTER TABLE table_name COMPACT 'minor' WITH OVERWRITE TBLPROPERTIES ("compactor.mapreduce.map.memory.mb"="3072"); -- Specify the properties of a compression map job. + ALTER TABLE table_name COMPACT 'major' WITH OVERWRITE TBLPROPERTIES ("tblprops.orc.compress.size"="8192"); -- Modify any other Hive table attributes. + + .. note:: + + After compression, small files are not deleted immediately. After the cleaner thread performs cleaning, the files are deleted in batches. + .. |image1| image:: /_static/images/en-us_image_0000001295739904.jpg .. |image2| image:: /_static/images/en-us_image_0000001296059708.jpg .. |image3| image:: /_static/images/en-us_image_0000001348739733.jpg diff --git a/doc/component-operation-guide-lts/source/using_hive/hive_supports_isolation_of_metastore_instances_based_on_components.rst b/doc/component-operation-guide-lts/source/using_hive/hive_supports_isolation_of_metastore_instances_based_on_components.rst new file mode 100644 index 0000000..1faaa30 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/hive_supports_isolation_of_metastore_instances_based_on_components.rst @@ -0,0 +1,52 @@ +:original_name: mrs_01_24467.html + +.. _mrs_01_24467: + +Hive Supports Isolation of Metastore instances Based on Components +================================================================== + +Scenario +-------- + +This function restricts components in a cluster to connect to specified Hive Metastore instances. By default, components can connect to all Metastore instances. This function applies only to clusters whose version is MRS 3.2.0 or later. + +Currently, only HetuEngine, Hive, Loader, Metadata, Spark2x and Flink can connect to Metastore in a cluster. The Metastore instances can be allocated in a unified manner. + +.. note:: + + - This function only limits the Metastore instances accessed by component servers. Metadata is not isolated. + - Currently, Flink tasks can only connect to Metastore instances through the client. + - When spark-sql is used to execute tasks, the client is directly connected to Metastore. The client needs to be updated for the isolation to take effect. + - This function supports only isolation in the same cluster. If HetuEngine is deployed in different clusters, unified isolation configuration is not supported. You need to modify the HetuEngine configuration to connect to the specified Metastore instance. + - You are advised to configure at least two Metastore instances for each component to ensure availability during isolation configuration. + +Prerequisites +------------- + +The Hive service has been installed in the cluster and is running properly. + +Procedure +--------- + +#. Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Hive**. On the displayed page, click the **Configurations** tab and then **All Configurations**, and search for the **HIVE_METASTORE_URI** parameter. + +#. .. _mrs_01_24467__li31543502585: + + Set the value of **HIVE_METASTORE_URI_DEFAULT** to the URI connection string of all Metastore instances. + + |image1| + +#. Connect a component to a specified Metastore instance. Copy the value in :ref:`2 `, modify the configuration items based on the component name, save the modification, and restart the component. + + The following example shows how Spark2x connects to only two Metastore instances of Hive. + + a. Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Hive**. 
On the displayed page, click the **Configurations** tab and then **All Configurations**, and search for the **HIVE_METASTORE_URI** parameter. + + b. Copy the default configuration of **HIVE_METASTORE_URI_DEFAULT** to the URI configuration item of Spark2x. If Spark2x needs to connect to only two Metastore instances, retain two nodes as required. Click **Save**. + + |image2| + + c. Choose **Cluster** > **Services** > **Spark2x**. On the displayed page, click the **Instance** tab, select the instances whose configuration has expired, and choose **More** > **Restart Instance**. In the dialog box that is displayed, enter the password and click **OK** to restart the instances. + +.. |image1| image:: /_static/images/en-us_image_0000001533544798.png +.. |image2| image:: /_static/images/en-us_image_0000001583504773.png diff --git a/doc/component-operation-guide-lts/source/using_hive/index.rst b/doc/component-operation-guide-lts/source/using_hive/index.rst index 99ef6d0..e1f261b 100644 --- a/doc/component-operation-guide-lts/source/using_hive/index.rst +++ b/doc/component-operation-guide-lts/source/using_hive/index.rst @@ -31,14 +31,19 @@ Using Hive - :ref:`Authorizing Over 32 Roles in Hive ` - :ref:`Restricting the Maximum Number of Maps for Hive Tasks ` - :ref:`HiveServer Lease Isolation ` +- :ref:`Hive Supports Isolation of Metastore instances Based on Components ` - :ref:`Hive Supporting Transactions ` - :ref:`Switching the Hive Execution Engine to Tez ` -- :ref:`Connecting Hive with External RDS ` +- :ref:`Connecting Hive with External RDS ` +- :ref:`Interconnecting Hive with External Self-Built Relational Databases ` - :ref:`Redis-based CacheStore of HiveMetaStore ` -- :ref:`Hive Materialized View ` - :ref:`Hive Supporting Reading Hudi Tables ` - :ref:`Hive Supporting Cold and Hot Storage of Partitioned Metadata ` - :ref:`Hive Supporting ZSTD Compression Formats ` +- :ref:`Locating Abnormal Hive Files ` +- :ref:`Using the ZSTD_JNI Compression Algorithm to Compress Hive ORC Tables ` +- :ref:`Load Balancing for Hive MetaStore Client Connection ` +- :ref:`Data Import and Export in Hive ` - :ref:`Hive Log Overview ` - :ref:`Hive Performance Tuning ` - :ref:`Common Issues About Hive ` @@ -73,14 +78,19 @@ Using Hive authorizing_over_32_roles_in_hive restricting_the_maximum_number_of_maps_for_hive_tasks hiveserver_lease_isolation + hive_supports_isolation_of_metastore_instances_based_on_components hive_supporting_transactions switching_the_hive_execution_engine_to_tez connecting_hive_with_external_rds + interconnecting_hive_with_external_self-built_relational_databases redis-based_cachestore_of_hivemetastore - hive_materialized_view hive_supporting_reading_hudi_tables hive_supporting_cold_and_hot_storage_of_partitioned_metadata hive_supporting_zstd_compression_formats + locating_abnormal_hive_files + using_the_zstd_jni_compression_algorithm_to_compress_hive_orc_tables + load_balancing_for_hive_metastore_client_connection + data_import_and_export_in_hive/index hive_log_overview hive_performance_tuning/index common_issues_about_hive/index diff --git a/doc/component-operation-guide-lts/source/using_hive/interconnecting_hive_with_external_self-built_relational_databases.rst b/doc/component-operation-guide-lts/source/using_hive/interconnecting_hive_with_external_self-built_relational_databases.rst new file mode 100644 index 0000000..39cab41 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/interconnecting_hive_with_external_self-built_relational_databases.rst @@ -0,0 +1,194 @@ 
+:original_name: mrs_01_1751.html + +.. _mrs_01_1751: + +Interconnecting Hive with External Self-Built Relational Databases +================================================================== + +.. note:: + + - This section describes how to interconnect Hive with external, self-built open-source MySQL and PostgreSQL databases. + - After an external metadata database is deployed in a cluster that already contains Hive data, the original metadata tables are not automatically synchronized. Before installing Hive, determine whether to store metadata in an external database or in DBService. For the former, deploy the external database when installing Hive or before any Hive data exists. After Hive is installed, the metadata storage location cannot be changed; otherwise, the original metadata will be lost. + +**Hive supports access to open-source MySQL and PostgreSQL metadata databases.** + +#. Install the open-source MySQL or PostgreSQL database. + + .. note:: + + The node where the database is installed must be in the same network segment as the cluster, so that they can access each other. + +#. Upload the driver package. + + - PostgreSQL: + + Use the open-source driver package to replace the cluster's existing one. Download the open-source PostgreSQL driver package **postgresql-42.2.5.jar** at https://repo1.maven.org/maven2/org/postgresql/postgresql/42.2.5/ and upload it to the **${BIGDATA_HOME}/third_lib/Hive** directory on all MetaStore nodes. + + Run the following commands on all MetaStore nodes to modify the permission on the driver package: + + **cd ${BIGDATA_HOME}/third_lib/Hive** + + **chown omm:wheel** **postgresql-42.2.5.jar** + + **chmod 600** **postgresql-42.2.5.jar** + + - MySQL: + + Visit the MySQL official website at https://www.mysql.com/, choose **DOWNLOADS** > **MySQL Community(GPL) DownLoads** > **Connector/J**, and download the driver package of the required version. + + - For versions earlier than MRS 8.2.0, upload the driver package to the **/opt/Bigdata/FusionInsight_HD_*/install/FusionInsight-Hive-*/hive-*/lib/** directory on all RDSMetastore nodes. + - For MRS 8.2.0 and later versions, upload the driver package to the **${BIGDATA_HOME}/third_lib/Hive** directory on all RDSMetastore nodes. + + Run the following commands on all MetaStore nodes to modify the permission on the driver package: + + **cd /opt/Bigdata/FusionInsight_HD_*/install/FusionInsight-Hive-*/hive-*/lib/** + + **chown omm:wheel** **mysql-connector-java-*.jar** + + **chmod 600** **mysql-connector-java-*.jar** + +#. .. _mrs_01_1751__li1742162711619: + + Create a user and a metadata database in the self-built database and assign all permissions on the database to the user. For example: + + - Run the following commands in the PostgreSQL database to create the database **hivemeta** and the user **testuser**, and assign all permissions on **hivemeta** to **testuser**: + + **create user testuser with password 'password';** + + **create database hivemeta owner testuser;** + + **grant all privileges on database hivemeta to testuser;** + + - Run the following commands in the MySQL database to create the database **hivemeta** and the user **testuser**, and assign all permissions on **hivemeta** to **testuser**: + + **create database hivemeta;** + + **create user 'testuser'@'%' identified by 'password';** + + **grant all privileges on hivemeta.\* to 'testuser';** + + **flush privileges;**
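+
+   Optionally, verify the result before continuing. The following statements are a quick check based on the **testuser** and **hivemeta** names used in the example above; adjust them to the names you actually created:
+
+   .. code-block::
+
+      -- MySQL: confirm that testuser holds privileges on hivemeta
+      show grants for 'testuser'@'%';
+
+      -- PostgreSQL: confirm that hivemeta exists and is owned by testuser
+      select datname, pg_get_userbyid(datdba) as owner from pg_database where datname = 'hivemeta';
+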
+#. Import the SQL statements for creating metadata tables. + + - SQL script path in the PostgreSQL database: **${BIGDATA_HOME}/FusionInsight_HD_*/install/FusionInsight-Hive-*/hive-*/scripts/metastore/upgrade/postgres/hive-schema-3.1.0.postgres.sql** + + Run the following command to import the SQL file to PostgreSQL: + + **./bin/psql -U** *username* **-d** *databasename* **-f hive-schema-3.1.0.postgres.sql** + + Specifically: + + **./bin/psql** is in the PostgreSQL installation directory. + + **username** indicates the username for logging in to PostgreSQL. + + **databasename** indicates the database name. + + - SQL script path in the MySQL database: **${BIGDATA_HOME}/FusionInsight_HD_*/install/FusionInsight-Hive-*/hive-*/scripts/metastore/upgrade/mysql/hive-schema-3.1.0.mysql.sql** + + Run the following command to import the SQL file to the MySQL database: + + **./bin/mysql -u** *username* **-p** *password* **-D** *databasename* **< hive-schema-3.1.0.mysql.sql** + +#. Log in to FusionInsight Manager and choose **Cluster** > *Name of the desired cluster* > **Services** > **Hive**. Click **Configurations** then **All Configurations**, click **Hive(Service)**, select **MetaDB**, modify the following parameters, and click **Save**: + + .. table:: **Table 1** Parameters + + +---------------------------------------+-------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +=======================================+=================================================================================================+=============================================================================================================================================+ + | javax.jdo.option.ConnectionDriverName | org.postgresql.Driver | Driver class used by MetaStore to connect to the metadata database | + | | | | + | | | - If an external MySQL database is used, the value is: | + | | | | + | | | **com.mysql.jdbc.Driver** | + | | | | + | | | - If an external PostgreSQL database is used, the value is: | + | | | | + | | | **org.postgresql.Driver** | + +---------------------------------------+-------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | javax.jdo.option.ConnectionURL | jdbc:postgresql://%{DBSERVICE_FLOAT_IP}%{DBServer}:%{DBSERVICE_CPORT}/hivemeta?socketTimeout=60 | JDBC URL used by MetaStore to connect to the metadata database | + | | | | + | | | - If an external MySQL database is used, the value is: | + | | | | + | | | **jdbc:mysql://**\ *IP address of the MySQL database*\ **:**\ *Port number of the MySQL database*\ **/hivemeta?characterEncoding=utf-8** | + | | | | + | | | - If an external PostgreSQL database is used, the value is: | + | | | | + | | | **jdbc:postgresql://**\ *IP address of the PostgreSQL database*\ **:**\ *Port number of the PostgreSQL database*\ **/hivemeta** | + +---------------------------------------+-------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + | javax.jdo.option.ConnectionUserName | hive${SERVICE_INDEX}${SERVICE_INDEX} | Username for connecting to the metadata database on MetaStore | + 
+---------------------------------------+-------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Change the Postgres database password in MetaStore. Choose **Cluster** > **Name of the desired cluster** > **Services** > **Hive** > **Configurations** > **All Configurations** > **MetaStore(Role)** > **MetaDB**, modify the following parameters, and click **Save**. + + .. table:: **Table 2** Parameter + + +--------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Default Value | Description | + +============================================+===============+===========================================================================================================================+ + | javax.jdo.option.extend.ConnectionPassword | \*****\* | User password for connecting to the external metadata database on Metastore. The password is encrypted in the background. | + +--------------------------------------------+---------------+---------------------------------------------------------------------------------------------------------------------------+ + +#. Log in to each MetaStore background node and check whether the local directory **/opt/Bigdata/tmp** exists. + + - If yes, go to :ref:`8 `. + + - If no, run the following commands to create one: + + **mkdir -p /opt/Bigdata/tmp** + + **chmod 755 /opt/Bigdata/tmp** + +#. .. _mrs_01_1751__li24241321154318: + + Save the configuration. Choose **Dashboard** > **More** > **Restart Service**, and enter the password to restart the Hive service. + +#. Log in to the MySQL or PostgreSQL database and view metadata tables generated in the metadata database created in :ref:`3 `. + + |image1| + +#. Check whether the metadata database is successfully deployed. + + a. Log in to the node where the Hive client is installed as the client installation user. + + **cd** *Client installation directory* + + **source bigdata_env** + + **kinit** *Component service user* (Skip this step for clusters with Kerberos authentication disabled.) + + b. Run the following command to log in to the Hive client CLI: + + **beeline** + + c. Run the following command to create the **test** table: + + **create table** *test*\ **(id int,str1 string,str2 string);** + + d. Run the following command in the **hivemeta** database of the MySQL or PostgreSQL database to check whether there is any information about the **test** table: + + **select \* from TBLS;** + + If information about the **test** table is displayed, the external database is successfully deployed. For example: + + - The result in the MySQL database is as follows: + + |image2| + + - The result in the PostgreSQL database is as follows: + + |image3| + +.. |image1| image:: /_static/images/en-us_image_0000001584077717.png +.. |image2| image:: /_static/images/en-us_image_0000001583757997.png +.. 
|image3| image:: /_static/images/en-us_image_0000001583957937.png diff --git a/doc/component-operation-guide-lts/source/using_hive/load_balancing_for_hive_metastore_client_connection.rst b/doc/component-operation-guide-lts/source/using_hive/load_balancing_for_hive_metastore_client_connection.rst new file mode 100644 index 0000000..aaab048 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/load_balancing_for_hive_metastore_client_connection.rst @@ -0,0 +1,34 @@ +:original_name: mrs_01_24738.html + +.. _mrs_01_24738: + +Load Balancing for Hive MetaStore Client Connection +=================================================== + +Scenario +-------- + +The client connection of Hive MetaStore supports load balancing. That is, heavy load of a single MetaStore node during heavy service traffic can be avoided by connecting to the node with the least connections based on the connection number recorded in ZooKeeper. Enabling this function does not affect the original connection mode. + +.. note:: + + This section applies to MRS 3.2.0 or later. + +Procedure +--------- + +#. Log in to FusionInsight Manager, click **Cluster**, choose **Services** > **Hive**, click **Configurations**, and then **All Configurations**. + +#. Search for the **hive.metastore-ext.balance.connection.enable** parameter and set its value to **true**. + +#. Click **Save**. + +#. Click **Instance**, select all instances, choose **More** > **Restart Instance**, enter the password, and click **OK** to restart all Hive instances. + +#. For other components that connect to MetaStore, add the **hive.metastore-ext.balance.connection.enable** parameter and set its value to **true**. + + The following example shows how to add this parameter if Spark2x needs to be connected to MetaStore: + + a. Log in to FusionInsight Manager, click **Cluster**, choose **Services** > **Spark2x**, and click **Configurations**. + b. Click **Customization**, add a custom parameter **hive.metastore-ext.balance.connection.enable** to all **hive-site.xml** parameter files, set its value to **true**, and click **Save**. + c. Click **Instance**, select all configuration-expired instances, choose **More** > **Restart Instance**, enter the password, and click **OK** to restart them. diff --git a/doc/component-operation-guide-lts/source/using_hive/locating_abnormal_hive_files.rst b/doc/component-operation-guide-lts/source/using_hive/locating_abnormal_hive_files.rst new file mode 100644 index 0000000..3a6fd56 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/locating_abnormal_hive_files.rst @@ -0,0 +1,63 @@ +:original_name: mrs_01_24480.html + +.. _mrs_01_24480: + +Locating Abnormal Hive Files +============================ + +Scenario +-------- + +- Data files stored in Hive are abnormal due to misoperations or disk damage, thereby causing task execution failures or incorrect data results. +- Common non-text data files can be located using the specified tool. + + .. note:: + + This section applies only to MRS 3.2.0 or later. + +Procedure +--------- + +#. Log in to the node where the Hive service is installed as user **omm** and run the following command to go to the Hive installation directory: + + **cd ${BIGDATA_HOME}/FusionInsight_HD_*/install/FusionInsight-Hive-*/hive-*/bin** + +#. Run the following tool to locate abnormal Hive files: + + **sh hive_parser_file.sh [--help] ** + + :ref:`Table 1 ` describes the related parameters. + + Note: You can run only one command at a time. + + .. _mrs_01_24480__table11352804551: + + .. 
table:: **Table 1** Parameter description + + +-----------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Remarks | + +=================+==============================================================================================================================+================================================================================================================================================================================================+ + | filetype | Specifies the format of the data file to be parsed. Currently, only the ORC, RC (RCFile), and Parquet formats are supported. | Currently, data files in the RC format can only be viewed. | + +-----------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -c | Prints the column information in the current metadata. | The column information includes the class name, file format, and sequence number. | + +-----------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -d | Prints data in a data file. You can limit the data volume using the **limit** parameter. | The data is the content of the specified data file. Note that only one value can be specified for the **limit** parameter at a time. | + +-----------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -t | Prints the time zone to which the data is written. | The time zone is the zone to which the file is written. | + +-----------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -h | Prints the help information. | Help information. | + +-----------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -m | Prints information about various storage formats. | The information varies based on the storage format. For example, if the file format is ORC, information such as strip and block size will be printed. 
| + +-----------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | -a | Prints detailed information. | The detailed information, including the preceding parameters, is displayed. | + +-----------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | input-file | Specifies the data files to be input. | If the input directory contains a file of the supported formats, the file will be parsed. Otherwise, this operation is omitted. You can specify a local file or an HDFS/OBS file or directory. | + +-----------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | input-directory | Specifies the directory where the input data file is located. This parameter is used when there are multiple subfiles. | | + +-----------------+------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Example: + + **sh hive_parser_file.sh orc -d limit=100 hdfs://hacluster/user/hive/warehouse/orc_test** + + If the file name does not contain a prefix similar to **hdfs://hacluster**, the local file is read by default. diff --git a/doc/component-operation-guide-lts/source/using_hive/switching_the_hive_execution_engine_to_tez.rst b/doc/component-operation-guide-lts/source/using_hive/switching_the_hive_execution_engine_to_tez.rst index b95174c..a26cf97 100644 --- a/doc/component-operation-guide-lts/source/using_hive/switching_the_hive_execution_engine_to_tez.rst +++ b/doc/component-operation-guide-lts/source/using_hive/switching_the_hive_execution_engine_to_tez.rst @@ -20,7 +20,7 @@ Switching the Execution Engine on the Client to Tez #. Install and log in to the Hive client. For details, see :ref:`Using a Hive Client `. -#. Run the following commands to switch the engine and enable the **yarn.timeline-service.enabled** parameter: +#. Run the following commands for MRS 3.1.2 to switch the engine and enable the **yarn.timeline-service.enabled** parameter: **set hive.execution.engine=tez**; @@ -33,6 +33,14 @@ Switching the Execution Engine on the Client to Tez - When the execution engine needs to be switched to another engine, you need to run the **set yarn.timeline-service.enabled=false** command on the client to disable the **yarn.timeline-service.enabled** parameter. - To specify a Yarn running queue, run the **set tez.queue.name=default** command on the client. +#. For MRS 3.2.0 and later versions, run the following command to switch the engine: + + **set hive.execution.engine=tez**; + + .. 
note:: + + To specify a running Yarn queue, run the **set tez.queue.name=default** command on the client. + +#. Submit and execute the Tez tasks. #. Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > *Name of the desired cluster* > **Services** > **Tez** > **TezUI**\ *(host name)* to view the task execution status on the TezUI page. @@ -42,8 +50,8 @@ Switching the Default Execution Engine of Hive to Tez #. Log in to FusionInsight Manager. For details, see :ref:`Accessing FusionInsight Manager `. Choose **Cluster** > *Name of the desired cluster* > **Services** > **Hive** > **Configurations** > **All Configurations** > **HiveServer(Role)**, and search for **hive.execution.engine**. #. Set **hive.execution.engine** to **tez**. -#. Choose **Hive(Service)** > **Customization** and search for **yarn.site.customized.configs**. -#. Add a customized parameter **yarn.timeline-service.enabled** next to **yarn.site.customized.configs** and set its value to **true**. +#. For MRS 3.1.2, choose **Hive(Service)** > **Customization** and search for **yarn.site.customized.configs**. +#. For MRS 3.1.2, add a customized parameter **yarn.timeline-service.enabled** next to **yarn.site.customized.configs** and set its value to **true**. .. note:: diff --git a/doc/component-operation-guide-lts/source/using_hive/using_the_zstd_jni_compression_algorithm_to_compress_hive_orc_tables.rst b/doc/component-operation-guide-lts/source/using_hive/using_the_zstd_jni_compression_algorithm_to_compress_hive_orc_tables.rst new file mode 100644 index 0000000..139a566 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hive/using_the_zstd_jni_compression_algorithm_to_compress_hive_orc_tables.rst @@ -0,0 +1,64 @@ +:original_name: mrs_01_24507.html + +.. _mrs_01_24507: + +Using the ZSTD_JNI Compression Algorithm to Compress Hive ORC Tables +==================================================================== + +Scenario +-------- + +ZSTD_JNI is a native implementation of the ZSTD compression algorithm. Compared with ZSTD, ZSTD_JNI provides higher compression read/write efficiency and a higher compression ratio, and allows you to specify the compression level as well as the compression mode for data columns in a specific format. + +Currently, only ORC tables can be compressed using ZSTD_JNI, whereas ZSTD can compress tables in all storage formats. Therefore, you are advised to use this feature only when you have high requirements on data compression. + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +Example +------- + +#. Log in to the node where the client is installed as the Hive client installation user. + +#. Run the following command to switch to the client installation directory, for example, **/opt/client**: + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. Check whether the cluster authentication mode is in security mode. + + - If yes, run the following command to perform user authentication and then go to :ref:`5 `. + + **kinit** *Hive service user* + + - If no, go to :ref:`5 `. + +#. .. _mrs_01_24507__li333142945916: + + Run the following command to log in to the Hive client: + + **beeline** + +#. Create a table in ZSTD_JNI compression format as follows: + + - Run the following example command to set the **orc.compress** parameter to **ZSTD_JNI** when using this compression algorithm to create an ORC table: + + **create table tab_1(...) stored as orc TBLPROPERTIES("orc.compress"="ZSTD_JNI");** + + - The compression level of ZSTD_JNI ranges from 1 to 19. A larger value indicates a higher compression ratio but a slower read/write speed; a smaller value indicates a lower compression ratio but a faster read/write speed. The default value is **6**. You can set the compression level through the **orc.global.compress.level** parameter, as shown in the following example. + + **create table tab_1(...) stored as orc TBLPROPERTIES("orc.compress"="ZSTD_JNI", 'orc.global.compress.level'='3');** + + - This compression algorithm allows you to compress service data and columns in a specific data format. Currently, data in the following formats is supported: JSON data columns, Base64 data columns, timestamp data columns, and UUID data columns. You can achieve this function by setting the **orc.column.compress** parameter during table creation. + + The following example code shows how to use ZSTD_JNI to compress data in the JSON, Base64, timestamp, and UUID formats. + + **create table test_orc_zstd_jni(f1 int, f2 string, f3 string, f4 string, f5 string) stored as orc** + + **TBLPROPERTIES('orc.compress'='ZSTD_JNI', 'orc.column.compress'='[{"type":"cjson","columns":"f2"},{"type":"base64","columns":"f3"},{"type":"gorilla","columns":{"format": "yyyy-MM-dd HH:mm:ss.SSS", "columns": "f4"}},{"type":"uuid","columns":"f5"}]');** + + You can insert data in the corresponding format based on the site requirements to further compress the data.
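+
+     For example, a row whose columns follow these formats could look as follows. The values are only illustrative samples for the **test_orc_zstd_jni** table defined above:
+
+     .. code-block::
+
+        insert into test_orc_zstd_jni values
+        (1,
+        '{"name":"tester","dept":"d01"}',            -- f2: JSON text
+        'SGVsbG8gd29ybGQ=',                          -- f3: Base64 text
+        '2023-01-01 10:00:00.000',                   -- f4: timestamp in the yyyy-MM-dd HH:mm:ss.SSS format
+        '550e8400-e29b-41d4-a716-446655440000');     -- f5: UUID
+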
diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/cleaning.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/cleaning.rst index c8e3659..c4fbbfe 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/cleaning.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/cleaning.rst @@ -19,4 +19,4 @@ You can use either of the following methods to perform cleaning: **spark-submit --master yarn --jars /opt/client/Hudi/hudi/lib/hudi-client-common-**\ *xxx*\ **.jar --class org.apache.hudi.utilities.HoodieCleaner /opt/client/Hudi/hudi/lib/hudi-utilities\_**\ *xxx*\ **.jar --target-base-path /tmp/default/tb_test_mor** -For details about more cleaning parameters, see :ref:`Configuration Reference `. +For details about more cleaning parameters, see :ref:`Hudi Configuration Reference `. diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/clustering.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/clustering.rst index 350fb73..3556f97 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/clustering.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/clustering.rst @@ -82,7 +82,7 @@ How to Execute Clustering hoodie.clustering.inline.max.commits=4 -For details, see :ref:`Configuration Reference `. +For details, see :ref:`Hudi Configuration Reference `. .. 
caution:: diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/metadata_table.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/metadata_table.rst index 03b8ba8..bd5021a 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/metadata_table.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/metadata_table.rst @@ -31,7 +31,7 @@ Metadata Table When using Spark to write data, set **hoodie.metadata.enable** in the **option** parameter to **true**. - For details about more parameters, see :ref:`Configuration Reference ` or visit Hudi official website http://hudi.apache.org/docs/configurations.html#metadata-config. + For details about more parameters, see :ref:`Hudi Configuration Reference ` or visit Hudi official website http://hudi.apache.org/docs/configurations.html#metadata-config. - **Performance improvement** diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/single-table_concurrent_write.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/single-table_concurrent_write.rst index bfc8804..109e7d1 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/single-table_concurrent_write.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/data_management_and_maintenance/single-table_concurrent_write.rst @@ -54,7 +54,7 @@ How to Use the Concurrency Mechanism **hoodie.write.lock.zookeeper.base_path**\ =\ ** -For details about more parameters, see :ref:`Configuration Reference `. +For details about more parameters, see :ref:`Hudi Configuration Reference `. .. caution:: diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/clustering_configuration.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/clustering_configuration.rst new file mode 100644 index 0000000..9f88f1e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/clustering_configuration.rst @@ -0,0 +1,40 @@ +:original_name: mrs_01_24804.html + +.. _mrs_01_24804: + +Clustering Configuration +======================== + +.. note:: + + This section applies only to MRS 3.2.0 or later. + + Clustering has two strategies: **hoodie.clustering.plan.strategy.class** and **hoodie.clustering.execution.strategy.class**. Typically, if **hoodie.clustering.plan.strategy.class** is set to **SparkRecentDaysClusteringPlanStrategy** or **SparkSizeBasedClusteringPlanStrategy**, **hoodie.clustering.execution.strategy.class** does not need to be specified. However, if **hoodie.clustering.plan.strategy.class** is set to **SparkSingleFileSortPlanStrategy**, **hoodie.clustering.execution.strategy.class** must be set to **SparkSingleFileSortExecutionStrategy**. 
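+
+For example, if writes are submitted through **spark-sql**, one possible way to enable inline clustering is to set the following session configurations before inserting data. This is only a sketch: the parameter names are the ones described in the table below, and the values and the sort column **col0** are illustrative and should be adapted to your tables:
+
+.. code-block::
+
+   -- generate and execute a clustering plan every 4 commits
+   set hoodie.clustering.inline=true;
+   set hoodie.clustering.inline.max.commits=4;
+   -- rewrite files smaller than 300 MB into files of at most 1 GB, sorted by col0
+   set hoodie.clustering.plan.strategy.small.file.limit=314572800;
+   set hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824;
+   set hoodie.clustering.plan.strategy.sort.columns=col0;
+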
+ ++-------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+ +| Parameter | Description | Default Value | ++=======================================================+====================================================================================================================================================================================+======================================================================================+ +| hoodie.clustering.inline | Whether to execute clustering synchronously | false | ++-------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+ +| hoodie.clustering.inline.max.commits | Number of commits that trigger clustering | 4 | ++-------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+ +| hoodie.clustering.plan.strategy.target.file.max.bytes | Maximum size of each file after clustering | 1024 \* 1024 \* 1024 byte | ++-------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+ +| hoodie.clustering.plan.strategy.small.file.limit | Files smaller than this size will be clustered. | 300 \* 1024 \* 1024 byte | ++-------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+ +| hoodie.clustering.plan.strategy.sort.columns | Columns used for sorting in clustering | None | ++-------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+ +| hoodie.layout.optimize.strategy | Clustering execution strategy. Three sorting modes are available: **linear**, **z-order**, and **hilbert**. | linear | ++-------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+ +| hoodie.layout.optimize.enable | Set this parameter to **true** when **z-order** or **hilbert** is used. 
| false | ++-------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+ +| hoodie.clustering.plan.strategy.class | Strategy class for filtering file groups for clustering. By default, files whose size is less than the value of **hoodie.clustering.plan.strategy.small.file.limit** are filtered. | org.apache.hudi.client.clustering.plan.strategy.SparkSizeBasedClusteringPlanStrategy | ++-------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+ +| hoodie.clustering.execution.strategy.class | Strategy class for executing clustering (subclass of RunClusteringStrategy), which is used to define the execution mode of a cluster plan. | org.apache.hudi.client.clustering.run.strategy.SparkSortAndSizeExecutionStrategy | +| | | | +| | The default classes sort the file groups in the plan by the specified column and meets the configured target file size. | | ++-------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+ +| hoodie.clustering.plan.strategy.max.num.groups | Maximum number of file groups that can be selected during clustering. A larger value indicates a higher concurrency. 
| 30 | ++-------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+ +| hoodie.clustering.plan.strategy.max.bytes.per.group | Maximum number of data records in each file group involved in clustering | 2 \* 1024 \* 1024 \* 1024 byte | ++-------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/compaction_and_cleaning_configurations.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/compaction_and_cleaning_configurations.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/compaction_and_cleaning_configurations.rst rename to doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/compaction_and_cleaning_configurations.rst diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/configuration_of_hive_table_synchronization.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/configuration_of_hive_table_synchronization.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/configuration_of_hive_table_synchronization.rst rename to doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/configuration_of_hive_table_synchronization.rst diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/index.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/index.rst similarity index 86% rename from doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/index.rst rename to doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/index.rst index 42b2d4b..2e9dd4f 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/index.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/index.rst @@ -2,8 +2,8 @@ .. _mrs_01_24032: -Configuration Reference -======================= +Hudi Configuration Reference +============================ This section describes important Hudi configurations. For details, visit the Hudi official website https://hudi.apache.org/docs/configurations.html. @@ -14,6 +14,7 @@ This section describes important Hudi configurations. For details, visit the Hud - :ref:`Compaction and Cleaning Configurations ` - :ref:`Metadata Table Configuration ` - :ref:`Single-Table Concurrent Write Configuration ` +- :ref:`Clustering Configuration ` .. toctree:: :maxdepth: 1 @@ -26,3 +27,4 @@ This section describes important Hudi configurations. 
For details, visit the Hud compaction_and_cleaning_configurations metadata_table_configuration single-table_concurrent_write_configuration + clustering_configuration diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/index_configuration.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/index_configuration.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/index_configuration.rst rename to doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/index_configuration.rst diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/metadata_table_configuration.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/metadata_table_configuration.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/metadata_table_configuration.rst rename to doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/metadata_table_configuration.rst diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/single-table_concurrent_write_configuration.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/single-table_concurrent_write_configuration.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/single-table_concurrent_write_configuration.rst rename to doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/single-table_concurrent_write_configuration.rst diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/storage_configuration.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/storage_configuration.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/storage_configuration.rst rename to doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/storage_configuration.rst diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/write_configuration.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/write_configuration.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_hudi/basic_operations/configuration_reference/write_configuration.rst rename to doc/component-operation-guide-lts/source/using_hudi/basic_operations/hudi_configuration_reference/write_configuration.rst diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/index.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/index.rst index 34a7b76..b513086 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/index.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/index.rst @@ -10,7 +10,7 @@ Basic Operations - :ref:`Read ` - :ref:`Data Management and Maintenance ` - :ref:`Using the Hudi Client ` -- :ref:`Configuration Reference ` +- :ref:`Hudi Configuration 
Reference ` .. toctree:: :maxdepth: 1 @@ -21,4 +21,4 @@ Basic Operations read/index data_management_and_maintenance/index using_the_hudi_client/index - configuration_reference/index + hudi_configuration_reference/index diff --git a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/write/stream_write.rst b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/write/stream_write.rst index 78f8646..3a70ca3 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/basic_operations/write/stream_write.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/basic_operations/write/stream_write.rst @@ -72,3 +72,182 @@ Run the following commands to specify the HoodieDeltaStreamer execution paramete **--enable-hive-sync** // Enable Hive synchronization to synchronize the Hudi table to Hive. **--continuous** // Set the stream processing mode to **continuous**. + +Stream Write Using HoodieMultiTableDeltaStreamer +------------------------------------------------ + +.. note:: + + HoodieMultiTableDeltaStreamer streaming write applies only to MRS 3.2.0 or later. + +HoodieDeltaStreamer allows you to capture data from multiple types of source tables and write the data to Hudi tables. However, you can only write data in one source table to one destination table. By contrast, HoodieMultiTableDeltaStreamer supports data write from multiple source tables to one or multiple destination tables. + +- **The following example describes how to write data in two Kafka source tables to two Hudi tables.** + + .. note:: + + Set the following parameters: + + .. code-block:: + + // Specify the target table. + hoodie.deltastreamer.ingestion.tablesToBeIngested=Directory name.target table + //Specify all source tables to specific destination tables. + hoodie.deltastreamer.source.sourcesBoundTo.Destination table=Directory name.Source table 1,Directory name.Source table 2 + // Specify the configuration file path of each source table. + Hoodie.deltastreamer.Source.directory name.Source table 1.configFile=Path 1 + Hoodie.deltastreamer.source.Directory name.Source table 2.configFile=Path 2 + // Specify the check point of each source table. The format of the recovery point varies according to the source table type. For example, the recovery point format of Kafka source is "Topic name,Partition name:offset". + hoodie.deltastreamer.current.source.checkpoint=Topic name,Partition name:offset + // Specify the associated table (Hudi table) of each source table. If there are multiple associated tables, separate them with commas (,). + hoodie.deltastreamer.source.associated.tables=hdfs://hacluster/....., hdfs://hacluster/..... + // Specify the transform operation before the data in each source table is written to Hudi. Note that the columns to be written must be listed. Do not use select *. + // indicates the current source table and cannot be changed. + hoodie.deltastreamer.transformer.sql=select field1,field2,field3,... from + + Spark submission command: + + .. 
code-block:: + + spark-submit \ + --master yarn \ + --driver-memory 1g \ + --executor-memory 1g \ + --executor-cores 1 \ + --num-executors 5 \ + --conf spark.driver.extraClassPath=/opt/client/Hudi/hudi/conf:/opt/client/Hudi/hudi/lib/*:/opt/client/Spark2x/spark/jars/* \ + --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer /opt/client/Hudi/hudi/lib/hudi-utilities_2.12-*.jar \ + --props file:///opt/hudi/testconf/sourceCommon.properties \ + --config-folder file:///opt/hudi/testconf/ \ + --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \ + --schemaprovider-class org.apache.hudi.examples.common.HoodieMultiTableDeltaStreamerSchemaProvider \ + --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \ + --source-ordering-field col6 \ + --base-path-prefix hdfs://hacluster/tmp/ \ + --table-type COPY_ON_WRITE \ + --target-table KafkaToHudi \ + --enable-hive-sync \ + --allow-fetch-from-multiple-sources \ + --allow-continuous-when-multiple-sources + + .. note:: + + #. When the **source** type is **kafka source**, the schema provider class specified by **--schemaprovider-class** needs to be developed by users. + #. **--allow-fetch-from-multiple-sources** indicates that multi-source table writing is enabled. + #. **--allow-continuous-when-multiple-sources** indicates that multi-source table continuous write is enabled. If this parameter is not set, the task ends after all source tables are written once. + + sourceCommon.properties: + + .. code-block:: + + hoodie.deltastreamer.ingestion.tablesToBeIngested=testdb.KafkaToHudi + hoodie.deltastreamer.source.sourcesBoundTo.KafkaToHudi=source1,source2 + hoodie.deltastreamer.source.default.source1.configFile=file:///opt/hudi/testconf/source1.properties + hoodie.deltastreamer.source.default.source2.configFile=file:///opt/hudi/testconf/source2.properties + + hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator + hoodie.datasource.write.partitionpath.field=col0 + hoodie.datasource.write.recordkey.field=primary_key + hoodie.datasource.write.precombine.field=col6 + + hoodie.datasource.hive_sync.table=kafkatohudisync + hoodie.datasource.hive_sync.partition_fields=col0 + hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor + + bootstrap.servers=192.168.34.221:21005,192.168.34.136:21005,192.168.34.175:21005 + auto.offset.reset=latest + group.id=hoodie-test + + source1.properties: + + .. code-block:: + + hoodie.deltastreamer.current.source.name=source1 // Specify the name of a Kafka source table. + hoodie.deltastreamer.source.kafka.topic=s1 + hoodie.deltastreamer.current.source.checkpoint=s1,0:0,1:0 // Checkpoint of the source table when the task is started. The deltastreamer tasks resume from offset 0 of partition 0 and offset 0 of partition 1. + // Specify the Hudi table to be combined with the source1 table. If the Hudi table has been synchronized to Hive, skip this step and use the table name in the SQL statement. + hoodie.deltastreamer.source.associated.tables=hdfs://hacluster/tmp/huditest/tb_test_cow_par + // indicates the current source table, that is, source1. The value is fixed. + hoodie.deltastreamer.transformer.sql=select A.primary_key, A.col0, B.col1, B.col2, A.col3, A.col4, B.col5, B.col6, B.col7 from as A join tb_test_cow_par as B on A.primary_key = B.primary_key + + source2.properties + + .. 
code-block:: + + hoodie.deltastreamer.current.source.name=source2 + hoodie.deltastreamer.source.kafka.topic=s2 + hoodie.deltastreamer.current.source.checkpoint=s2,0:0,1:0 + hoodie.deltastreamer.source.associated.tables=hdfs://hacluster/tmp/huditest/tb_test_cow_par + hoodie.deltastreamer.transformer.sql=select A.primary_key, A.col0, B.col1, B.col2, A.col3, A.col4, B.col5, B.col6, B.col7 from <SRC> as A join tb_test_cow_par as B on A.primary_key = B.primary_key + +- **The following example describes how to write data in two Hudi tables to one Hudi table.** + + Spark submission command: + + .. code-block:: + + spark-submit \ + --master yarn \ + --driver-memory 1g \ + --executor-memory 1g \ + --executor-cores 1 \ + --num-executors 2 \ + --conf spark.driver.extraClassPath=/opt/client/Hudi/hudi/conf:/opt/client/Hudi/hudi/lib/*:/opt/client/Spark2x/spark/jars/* \ + --class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer /opt/client/Hudi/hudi/lib/hudi-utilities_2.12-*.jar \ + --props file:///opt/testconf/sourceCommon.properties \ + --config-folder file:///opt/testconf/ \ + --source-class org.apache.hudi.utilities.sources.HoodieIncrSource \ // Specify that the source table is a Hudi table, which can only be COW. + --payload-class org.apache.hudi.common.model.OverwriteNonDefaultsWithLatestAvroPayload \ // Specify a payload, which determines how the original value is changed to a new value. + --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \ // Specify a transformer class. If the schema of the source table is different from that of the target table, the source table data can be written to the target table only after being transformed. + --source-ordering-field col6 \ + --base-path-prefix hdfs://hacluster/tmp/ \ // Path for saving the destination tables + --table-type MERGE_ON_READ \ // Type of the destination table, which can be COW or MOR. + --target-table tb_test_mor_par_300 \ // Specify the name of the target table. When you write data in multiple source tables to a target table, the name of the target table must be specified. + --checkpoint 000 \ // Specify a checkpoint (commit timestamp), which indicates that DeltaStreamer is restored from this checkpoint. 000 indicates that DeltaStreamer is restored from the beginning. + --enable-hive-sync \ + --allow-fetch-from-multiple-sources \ + --allow-continuous-when-multiple-sources \ + --op UPSERT // Specify the write type. + + .. note:: + + - If the **source** type is **HoodieIncrSource**, **--schemaprovider-class** does not need to be specified. + - If **transformer-class** is set to **SqlQueryBasedTransformer**, you can use SQL queries to convert the data structure of the source table to that of the destination table. + + file:///opt/testconf/sourceCommon.properties: + + .. code-block:: + + # Common properties of source tables + hoodie.deltastreamer.ingestion.tablesToBeIngested=testdb.tb_test_mor_par_300 // Specify a target table (common property) to which multiple source tables are written. + hoodie.deltastreamer.source.sourcesBoundTo.tb_test_mor_par_300=testdb.tb_test_mor_par_100,testdb.tb_test_mor_par_200 // Specify multiple source tables.
+ hoodie.deltastreamer.source.testdb.tb_test_mor_par_100.configFile=file:///opt/testconf/tb_test_mor_par_100.properties // Property file path of the source table tb_test_mor_par_100 + hoodie.deltastreamer.source.testdb.tb_test_mor_par_200.configFile=file:///opt/testconf/tb_test_mor_par_200.properties // Property file path of the source table tb_test_mor_par_200 + + # Hudi write configurations shared by all source tables. The independent configurations of a source table need to be written to its property file. + hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator + hoodie.datasource.write.partitionpath.field=col0 + hoodie.datasource.write.recordkey.field=primary_key + hoodie.datasource.write.precombine.field=col6 + + file:///opt/testconf/tb_test_mor_par_100.properties: + + .. code-block:: + + # Configurations of the source table tb_test_mor_par_100 + hoodie.deltastreamer.source.hoodieincr.path=hdfs://hacluster/tmp/testdb/tb_test_mor_par_100 // Path of the source table + hoodie.deltastreamer.source.hoodieincr.partition.fields=col0 // Partitioning key of the source table + hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt=false + hoodie.deltastreamer.source.associated.tables=hdfs://hacluster/tmp/testdb/tb_test_mor_par_400 // Specify the table to be associated with the source table. + hoodie.deltastreamer.transformer.sql=select A.primary_key, A.col0, B.col1, B.col2, A.col3, A.col4, B.col5, A.col6, B.col7 from <SRC> as A join tb_test_mor_par_400 as B on A.primary_key = B.primary_key // This configuration takes effect only when transformer-class is set to SqlQueryBasedTransformer. + + file:///opt/testconf/tb_test_mor_par_200.properties: + + .. code-block:: + + # Configurations of the source table tb_test_mor_par_200 + hoodie.deltastreamer.source.hoodieincr.path=hdfs://hacluster/tmp/testdb/tb_test_mor_par_200 + hoodie.deltastreamer.source.hoodieincr.partition.fields=col0 + hoodie.deltastreamer.source.hoodieincr.read_latest_on_missing_ckpt=false + hoodie.deltastreamer.source.associated.tables=hdfs://hacluster/tmp/testdb/tb_test_mor_par_400 + hoodie.deltastreamer.transformer.sql=select A.primary_key, A.col0, B.col1, B.col2, A.col3, A.col4, B.col5, A.col6, B.col7 from <SRC> as A join tb_test_mor_par_400 as B on A.primary_key = B.primary_key // Convert the data structure of the source table to that of the destination table. If the source table needs to be associated with Hive, you can use the table name in the SQL query for association. If the source table needs to be associated with a Hudi table, you need to specify the path of the Hudi table first and then use the table name in the SQL query for association. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/concurrency_for_schema_evolution.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/concurrency_for_schema_evolution.rst new file mode 100644 index 0000000..adecec5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/concurrency_for_schema_evolution.rst @@ -0,0 +1,51 @@ +:original_name: mrs_01_24550.html + +.. _mrs_01_24550: + +Concurrency for Schema Evolution +================================ + +.. caution:: + + When creating a table, you need to set **hoodie.cleaner.policy.failed.writes** to **LAZY**. Otherwise, rollback will be triggered when concurrent submission operations are performed. + +DDL Concurrency +--------------- + +..
table:: **Table 1** Concurrent DDL operations + + ============== === ====== =========== ============== ==== + DDL Operation add rename change type change comment drop + ============== === ====== =========== ============== ==== + add Y Y Y Y Y + rename Y Y Y Y Y + change type Y Y Y Y Y + change comment Y Y Y Y Y + drop Y Y Y Y N + ============== === ====== =========== ============== ==== + +.. note:: + + When performing DDL operations on the same column concurrently, pay attention to the following: + + - Multiple drop operations cannot be concurrently performed on the same column. Otherwise, only the first drop operation can be successfully performed and then exception message "java.lang.UnsupportedOperationException: cannot evolution schema implicitly, the column for which the update operation is performed does not exist." is thrown. + - When drop, rename, change type, and change comment operations are concurrently performed , drop operations must be executed last. Otherwise, only drop and operations before drop can be performed, and exception message "java.lang.UnsupportedOperationException: cannot evolution schema implicitly, the column for which the update operation is performed does not exist." is thrown when operations after drop are performed. + +DDL and DML Concurrency +----------------------- + +.. table:: **Table 2** Concurrent DDL and DML operations + + ============== =========== ====== ====== ========= + DDL Operation insert into update delete set/reset + ============== =========== ====== ====== ========= + add Y Y Y Y + rename N N Y N + change type N N Y N + change comment Y Y Y Y + drop N N Y N + ============== =========== ====== ====== ========= + +.. note:: + + Exception message "cannot evolution schema implicitly, actions such as rename, delete, and type change were found" is thrown when unsupported DDL or DML operations are performed concurrently. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/evolution_introduction.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/evolution_introduction.rst new file mode 100644 index 0000000..57e535d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/evolution_introduction.rst @@ -0,0 +1,12 @@ +:original_name: mrs_01_24493.html + +.. _mrs_01_24493: + +Evolution Introduction +====================== + +Schema evolution allows users to easily change the current schema of a Hudi table to adapt to the data that is changing over time. + +.. note:: + + This section applies only to MRS 3.1.3 or later. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/index.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/index.rst new file mode 100644 index 0000000..078b09a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/index.rst @@ -0,0 +1,20 @@ +:original_name: mrs_01_24492.html + +.. _mrs_01_24492: + +Hudi Schema Evolution +===================== + +- :ref:`Evolution Introduction ` +- :ref:`Schema Evolution Scenarios ` +- :ref:`SparkSQL Schema Evolution and Syntax Description ` +- :ref:`Concurrency for Schema Evolution ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + evolution_introduction + schema_evolution_scenarios + sparksql_schema_evolution_and_syntax_description/index + concurrency_for_schema_evolution diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/schema_evolution_scenarios.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/schema_evolution_scenarios.rst new file mode 100644 index 0000000..6ffcd1d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/schema_evolution_scenarios.rst @@ -0,0 +1,26 @@ +:original_name: mrs_01_24494.html + +.. _mrs_01_24494: + +Schema Evolution Scenarios +========================== + +Schema evolution scenarios + +- Columns (including nested columns) can be added, deleted, modified, and moved. +- Partition columns cannot be evolved. +- You cannot add, delete, or perform operations on nested columns of the Array type. + +.. table:: **Table 1** Engines supported + + +------------+---------------+----------------------------+---------------------------+---------------------------------+ + | Component | DDL Operation | Hudi Table Write Operation | Hudi Table Read Operation | Hudi Table Compaction Operation | + +============+===============+============================+===========================+=================================+ + | SparkSQL | Y | Y | Y | Y | + +------------+---------------+----------------------------+---------------------------+---------------------------------+ + | Flink | N | Y | Y | Y | + +------------+---------------+----------------------------+---------------------------+---------------------------------+ + | HetuEngine | N | N | Y | N | + +------------+---------------+----------------------------+---------------------------+---------------------------------+ + | Hive | N | N | Y | N | + +------------+---------------+----------------------------+---------------------------+---------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/adding_a_column.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/adding_a_column.rst new file mode 100644 index 0000000..35dea0d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/adding_a_column.rst @@ -0,0 +1,59 @@ +:original_name: mrs_01_24498.html + +.. _mrs_01_24498: + +Adding a Column +=============== + +Function +-------- + +The **ADD COLUMNS** command is used to add a column to an existing table. + +Syntax +------ + +**ALTER TABLE** *Table name* **ADD COLUMNS**\ *(col_spec[, col_spec ...])* + +Parameter Description +--------------------- + +.. 
table:: **Table 1** ADD COLUMNS parameters + + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===================================+==================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | tableName | Table name. | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | col_spec | Column specifications, consisting of five fields, **col_name**, **col_type**, **nullable**, **comment**, and **col_position**. | + | | | + | | - **col_name**: name of the new column. It is mandatory. | + | | | + | | To add a sub-column to a nested column, specify the full name of the sub-column in this field. For example: | + | | | + | | - To add sub-column **col1** to a nested struct type column **column users struct**, set this field to **users.col1**. | + | | - To add sub-column **col1** to a nested map type column **member map>**, set this field to **member.value.col1**. | + | | - To add sub-column **col2** to a nested array type column **arraylike array>**, set this field to **arraylike.element.col2**. | + | | | + | | - **col_type**: type of the new column. It is mandatory. | + | | | + | | - **nullable**: whether the new column can be null. The value can be left empty. | + | | | + | | - **comment**: comment of the new column. The value can be left empty. | + | | | + | | - **col_position**: position where the new column is added. The value can be **FIRST** or **AFTER origin_col**. If it is set to **FIRST**, the new column will be added to the first column of the table. If it is set to **AFTER origin_col**, the new column will be added after original column **origin_col**. The value can be left empty. **FIRST** can be used only when new sub-columns are added to nested columns. Do not use **FIRST** in top-level columns. There are no restrictions about the usage of **AFTER**. 
| + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Example +------- + +.. code-block:: + + alter table h0 add columns(ext0 string); + alter table h0 add columns(new_col int not null comment 'add new column' after col1); + alter table complex_table add columns(col_struct.col_name string comment 'add new column to a struct col' after col_from_col_struct); + +Response +-------- + +You can run the **DESCRIBE** command to view the new column. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/altering_a_column.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/altering_a_column.rst new file mode 100644 index 0000000..33248bf --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/altering_a_column.rst @@ -0,0 +1,78 @@ +:original_name: mrs_01_24499.html + +.. _mrs_01_24499: + +Altering a Column +================= + +Function +-------- + +The **ALTER TABLE ... ALTER COLUMN** command is used to change the attributes of a column, such as the column type, position, and comment. + +Syntax +------ + +**ALTER TABLE** *Table name* **ALTER** + +**[COLUMN]** *col_old_name* **TYPE** *column_type* + +**[COMMENT]** *col_comment* + +**[FIRST|AFTER]** *column_name* + +Parameter Description +--------------------- + +.. table:: **Table 1** ALTER COLUMN parameters + + +--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +==============+===============================================================================================================================================+ + | tableName | Table name. | + +--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | col_old_name | Name of the column to be altered. | + +--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | column_type | Type of the target column. | + +--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | col_comment | Column comment. | + +--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | column_name | New position to place the target column. For example, **AFTER column_name** indicates that the target column is placed after **column_name**. 
| + +--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + +Example +------- + +- Changing the column type + + .. code-block:: + + ALTER TABLE table1 ALTER COLUMN a.b.c TYPE bigint + + **a.b.c** indicates the full path of a nested column. For details about the nested column rules, see :ref:`Adding a Column `. + + The following changes on column types are supported: + + - int => long/float/double/string/decimal + - long => float/double/string/decimal + - float => double/String/decimal + - From double to string or decimal + - From decimal to decimal or string + - From string to date or decimal + - From date to string + +- Altering other attributes + + .. code-block:: + + ALTER TABLE table1 ALTER COLUMN a.b.c DROP NOT NULL + ALTER TABLE table1 ALTER COLUMN a.b.c COMMENT 'new comment' + ALTER TABLE table1 ALTER COLUMN a.b.c FIRST + ALTER TABLE table1 ALTER COLUMN a.b.c AFTER x + + **a.b.c** indicates the full path of a nested column. For details about the nested column rules, see :ref:`Adding a Column `. + +Response +-------- + +You can run the **DESCRIBE** command to view the modified column. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/changing_a_table_name.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/changing_a_table_name.rst new file mode 100644 index 0000000..bf8e3a7 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/changing_a_table_name.rst @@ -0,0 +1,40 @@ +:original_name: mrs_01_24501.html + +.. _mrs_01_24501: + +Changing a Table Name +===================== + +Function +-------- + +The **ALTER TABLE ... RENAME** command is used to change the table name. + +Syntax +------ + +**ALTER TABLE** *tableName* **RENAME TO** *newTableName* + +Parameter Description +--------------------- + +.. table:: **Table 1** RENAME parameters + + ============ =============== + Parameter Description + ============ =============== + tableName Table name. + newTableName New table name. + ============ =============== + +Example +------- + +.. code-block:: + + ALTER TABLE table1 RENAME TO table2 + +Response +-------- + +You can run the **SHOW TABLES** command to view the new table name. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/changing_the_column_name.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/changing_the_column_name.rst new file mode 100644 index 0000000..ea05933 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/changing_the_column_name.rst @@ -0,0 +1,47 @@ +:original_name: mrs_01_24503.html + +.. _mrs_01_24503: + +Changing the Column Name +======================== + +Function +-------- + +The **ALTER TABLE ... RENAME COLUMN** command is used to change the column name. + +Syntax +------ + +**ALTER TABLE** *tableName* **RENAME COLUMN** *old_columnName* **TO** *new_columnName* + +Parameter Description +--------------------- + +.. 
table:: **Table 1** RENAME COLUMN parameters + + ============== ================ + Parameter Description + ============== ================ + tableName Table name. + old_columnName Old column name. + new_columnName New column name. + ============== ================ + +Example +------- + +.. code-block:: + + ALTER TABLE table1 RENAME COLUMN a.b.c TO x + +**a.b.c** indicates the full path of a nested column. For details about the nested column rules, see :ref:`Adding a Column `. + +.. note:: + + After the column name is changed, the change is automatically synchronized to the column comment. The comment is in **rename oldName to newName** format. + +Response +-------- + +You can run the **DESCRIBE** command to view the new column name. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/deleting_a_column.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/deleting_a_column.rst new file mode 100644 index 0000000..8fdfea9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/deleting_a_column.rst @@ -0,0 +1,43 @@ +:original_name: mrs_01_24500.html + +.. _mrs_01_24500: + +Deleting a Column +================= + +Function +-------- + +The **ALTER TABLE ... DROP COLUMN** command is used to delete a column. + +Syntax +------ + +**ALTER TABLE** *tableName* **DROP COLUMN|COLUMNS** *cols* + +Parameter Description +--------------------- + +.. table:: **Table 1** DROP COLUMN parameters + + ========= ======================================================== + Parameter Description + ========= ======================================================== + tableName Table name. + cols Columns to be deleted. You can specify multiple columns. + ========= ======================================================== + +Example +------- + +.. code-block:: + + ALTER TABLE table1 DROP COLUMN a.b.c + ALTER TABLE table1 DROP COLUMNS a.b.c, x, y + +**a.b.c** indicates the full path of a nested column. For details about the nested column rules, see :ref:`Adding a Column `. + +Response +-------- + +You can run the **DESCRIBE** command to check which column is deleted. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/enabling_schema_evolution.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/enabling_schema_evolution.rst new file mode 100644 index 0000000..ca71cde --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/enabling_schema_evolution.rst @@ -0,0 +1,26 @@ +:original_name: mrs_01_24496.html + +.. _mrs_01_24496: + +Enabling Schema Evolution +========================= + +.. caution:: + + Schema evolution cannot be disabled once being enabled. + +- To use spark-beeline, log in to FusionInsight Manager, choose **Cluster** > **Services** > **Spark2x**, and click the **Configurations** tab then the **All Configurations** sub-tab. + + Search for **spark.sql.extensions** in the search box and change its value of JDBCServer to **org.apache.spark.sql.hive.FISparkSessionExtension,org.apache.spark.sql.hudi.HoodieSparkSessionExtension,org.apache.spark.sql.hive.CarbonInternalExtensions**. 
+ +- For SQL operations, run the following command before running any SQL statements: + + .. code-block:: + + set hoodie.schema.evolution.enable=true + +- For API calls, specify the following parameter in DataFrame options: + + .. code-block:: + + hoodie.schema.evolution.enable -> true diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/index.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/index.rst new file mode 100644 index 0000000..36d0976 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/index.rst @@ -0,0 +1,26 @@ +:original_name: mrs_01_24495.html + +.. _mrs_01_24495: + +SparkSQL Schema Evolution and Syntax Description +================================================ + +- :ref:`Enabling Schema Evolution ` +- :ref:`Adding a Column ` +- :ref:`Altering a Column ` +- :ref:`Deleting a Column ` +- :ref:`Changing a Table Name ` +- :ref:`Modifying Table Properties ` +- :ref:`Changing the Column Name ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + enabling_schema_evolution + adding_a_column + altering_a_column + deleting_a_column + changing_a_table_name + modifying_table_properties + changing_the_column_name diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/modifying_table_properties.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/modifying_table_properties.rst new file mode 100644 index 0000000..3628acd --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_schema_evolution/sparksql_schema_evolution_and_syntax_description/modifying_table_properties.rst @@ -0,0 +1,41 @@ +:original_name: mrs_01_24502.html + +.. _mrs_01_24502: + +Modifying Table Properties +========================== + +Function +-------- + +The **ALTER TABLE ... SET|UNSET** command is used to modify table properties. + +Syntax +------ + +**ALTER TABLE** *Table name* **SET|UNSET** *tblproperties* + +Parameter Description +--------------------- + +.. table:: **Table 1** SET|UNSET parameters + + ============= ================= + Parameter Description + ============= ================= + tableName Table name. + tblproperties Table properties. + ============= ================= + +Example +------- + +.. code-block:: + + ALTER TABLE table SET TBLPROPERTIES ('table_property' = 'property_value') + ALTER TABLE table UNSET TBLPROPERTIES [IF EXISTS] ('comment', 'key') + +Response +-------- + +You can run the **DESCRIBE** command to view new table properties. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/change_table.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/change_table.rst new file mode 100644 index 0000000..0264258 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/change_table.rst @@ -0,0 +1,52 @@ +:original_name: mrs_01_24740.html + +.. _mrs_01_24740: + +CHANGE_TABLE +============ + +Function +-------- + +The **CHANGE_TABLE** command can be used to modify the type and index of a table. Key parameters such as the type and index of Hudi tables cannot be modified. Therefore, this command is actually used to rewrite Hudi tables. 
+ +Syntax +------ + +.. code-block:: + + call change_table(table => '[table_name]', hoodie.index.type => '[index_type]', hoodie.datasource.write.table.type => '[table_type]'); + +Parameter Description +--------------------- + +.. table:: **Table 1** Parameters + + ========== ================================ + Parameter Description + ========== ================================ + table_name Name of the table to be modified + table_type Type of the table to be modified + index_type Type of the index to be modified + ========== ================================ + +Precautions +----------- + +If the index type to be modified has other configuration parameters, the parameters must be transferred to the SQL statement in the **key =>'value'** format. + +For example, to change the index type to bucket, run the following command: + +.. code-block:: + + call change_table(table => 'hudi_table1', hoodie.index.type => 'BUCKET', hoodie.bucket.index.num.buckets => '3'); + +Example +------- + +call change_table(table => 'hudi_table1', hoodie.index.type => 'SIMPLE', hoodie.datasource.write.table.type => 'MERGE_ON_READ'); + +System Response +--------------- + +After the execution is complete, you can run the **desc formatted table** command to view the table properties. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/clean_file.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/clean_file.rst new file mode 100644 index 0000000..52ff247 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/clean_file.rst @@ -0,0 +1,63 @@ +:original_name: mrs_01_24781.html + +.. _mrs_01_24781: + +CLEAN_FILE +========== + +Function +-------- + +Cleans invalid data files from the Hudi table directory. + +Syntax +------ + +**call clean_file**\ (table => '[table_name]', mode=>'[op_type]', backup_path=>'[backup_path]', start_instant_time=>'[start_time]', end_instant_time=>'[end_time]'); + +Parameter Description +--------------------- + +.. table:: **Table 1** Parameters + + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===================================+========================================================================================================================================================================================+ + | table_name | Mandatory. Name of the Hudi table from which invalid data files are to be deleted. | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | op_type | Optional. Command running mode. The default value is **dry_run**. Value options are **dry_run**, **repair**, **undo**, and **query**. | + | | | + | | **dry_run**: displays invalid data files to be cleaned. | + | | | + | | **repair**: displays and cleans invalid data files. | + | | | + | | **undo**: restores deleted data files. | + | | | + | | **query**: displays the backup directories that have been cleaned. 
| + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | backup_path | Mandatory. Backup directory of the data files to be restored. This parameter is available only when the running mode is **undo**. | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | start_time | Optional. Start time for generating invalid data files. This parameter is available only when the running mode is **dry_run** or **repair**. The start time is not limited by default. | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | end_time | Optional. End time for generating invalid data files. This parameter is available only when the running mode is **dry_run** or **repair**. The end time is not limited by default. | + +-----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Example +------- + +.. code-block:: + + call clean_file(table => 'h1', mode=>'repair'); + call clean_file(table => 'h1', mode=>'dry_run'); + call clean_file(table => 'h1', mode=>'query'); + call clean_file(table => 'h1', mode=>'undo', backup_path=>'/tmp/hudi/h1/.hoodie/.cleanbackup/hoodie_repair_backup_20220222222222'); + +Precautions +----------- + +The command cleans only invalid Parquet files. + +System Response +--------------- + +You can view command execution results in the driver log or on the client. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/hudi_clustering.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/hudi_clustering.rst new file mode 100644 index 0000000..2a75018 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/hudi_clustering.rst @@ -0,0 +1,60 @@ +:original_name: mrs_01_24802.html + +.. _mrs_01_24802: + +Hudi CLUSTERING +=============== + +Function +-------- + +Performs the clustering operation on Hudi tables. For details, see :ref:`Clustering `. + +Syntax +------ + +- Creating a savepoint: + + **call run_clustering(table=>'[table]', path=>'[path]', predicate=>'[predicate]', order=>'[order]');** + +- Performing clustering: + + **call show_clustering(table=>'[table]', path=>'[path]', limit=>'[limit]');** + +Parameter Description +--------------------- + +.. table:: **Table 1** Parameters + + +-----------+-----------------------------------------------------------------------------------------+-----------+ + | Parameter | Description | Mandatory | + +===========+=========================================================================================+===========+ + | table | Name of the table to be queried. The value can be in the **database.tablename** format. 
| No | + +-----------+-----------------------------------------------------------------------------------------+-----------+ + | path | Path of the table to be queried | No | + +-----------+-----------------------------------------------------------------------------------------+-----------+ + | predicate | Predicate sentence to be defined | No | + +-----------+-----------------------------------------------------------------------------------------+-----------+ + | order | Sorting field for clustering | No | + +-----------+-----------------------------------------------------------------------------------------+-----------+ + | limit | Number of query results to display | No | + +-----------+-----------------------------------------------------------------------------------------+-----------+ + +Example +------- + +.. code-block:: + + call show_clustering(table => 'hudi_table1'); + + call run_clustering(table => 'hudi_table1', predicate => '(ts >= 1006L and ts < 1008L) or ts >= 1009L', order => 'ts'); + +Precautions +----------- + +Either **table** or **path** must exist. Otherwise, the Hudi table to be clustered cannot be determined. + +System Response +--------------- + +You can view query results on the client. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/index.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/index.rst new file mode 100644 index 0000000..936cd6d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/index.rst @@ -0,0 +1,30 @@ +:original_name: mrs_01_24739.html + +.. _mrs_01_24739: + +CALL COMMAND +============ + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +- :ref:`CHANGE_TABLE ` +- :ref:`CLEAN_FILE ` +- :ref:`SHOW_TIME_LINE ` +- :ref:`SHOW_HOODIE_PROPERTIES ` +- :ref:`SAVE_POINT ` +- :ref:`ROLL_BACK ` +- :ref:`Hudi CLUSTERING ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + change_table + clean_file + show_time_line + show_hoodie_properties + save_point + roll_back + hudi_clustering diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/roll_back.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/roll_back.rst new file mode 100644 index 0000000..49131ea --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/roll_back.rst @@ -0,0 +1,46 @@ +:original_name: mrs_01_24803.html + +.. _mrs_01_24803: + +ROLL_BACK +========= + +Function +-------- + +Rolls back a specified commit. + +Syntax +------ + +**call rollback_to_instant(table => '[table_name]', instant_time => '[instant]');** + +Parameter Description +--------------------- + +.. table:: **Table 1** Parameters + + +------------+--------------------------------------------------------------------------+ + | Parameter | Description | + +============+==========================================================================+ + | table_name | Mandatory. Name of the Hudi table to be rolled back. | + +------------+--------------------------------------------------------------------------+ + | instant | Mandatory. Commit instant timestamp of the Hudi table to be rolled back. | + +------------+--------------------------------------------------------------------------+ + +Example +------- + +.. 
code-block:: + + call rollback_to_instant(table => 'h1', instant_time=>'20220915113127525'); + +Precautions +----------- + +Only the latest commit timestamps can be rolled back in sequence. + +System Response +--------------- + +You can view command execution results in the driver log or on the client. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/save_point.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/save_point.rst new file mode 100644 index 0000000..e7e5426 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/save_point.rst @@ -0,0 +1,64 @@ +:original_name: mrs_01_24800.html + +.. _mrs_01_24800: + +SAVE_POINT +========== + +Function +-------- + +Manages savepoints of Hudi tables. + +Syntax +------ + +- Creating a savepoint: + + **call create_savepoints('[table_name]', '[commit_Time]', '[user]', '[comments]');** + +- Viewing all existing savepoints + + **call show_savepoints(table =>'[table_name]');** + +- Rolling back a savepoint: + + **call rollback_savepoints('[table_name]', '[commit_Time]');** + +Parameter Description +--------------------- + +.. table:: **Table 1** Parameters + + +-------------+-----------------------------------------------------------------------------------------+-----------+ + | Parameter | Description | Mandatory | + +=============+=========================================================================================+===========+ + | table_name | Name of the table to be queried. The value can be in the **database.tablename** format. | Yes | + +-------------+-----------------------------------------------------------------------------------------+-----------+ + | commit_Time | Specified creation or rollback timestamp | Yes | + +-------------+-----------------------------------------------------------------------------------------+-----------+ + | user | User who creates a savepoint | No | + +-------------+-----------------------------------------------------------------------------------------+-----------+ + | comments | Description of the savepoint | No | + +-------------+-----------------------------------------------------------------------------------------+-----------+ + +Example +------- + +.. code-block:: + + call create_savepoints('hudi_test1', '20220908155421949'); + call show_savepoints(table =>'hudi_test1'); + call rollback_savepoints('hudi_test1', '20220908155421949'); + +Precautions +----------- + +- MOR tables do not support savepoints. +- The commit-related files before the latest savepoint are not cleaned. +- If there are multiple savepoints, perform the rollback from the latest savepoint. The logic is as follows: roll back the latest savepoint; delete the savepoint; and roll back the next savepoint. + +System Response +--------------- + +You can view query results on the client. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/show_hoodie_properties.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/show_hoodie_properties.rst new file mode 100644 index 0000000..abc96d5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/show_hoodie_properties.rst @@ -0,0 +1,39 @@ +:original_name: mrs_01_24799.html + +.. 
_mrs_01_24799: + +SHOW_HOODIE_PROPERTIES +====================== + +Function +-------- + +Displays the configuration in the **hoodie.properties** file of a specified Hudi table. + +Syntax +------ + +**call show_hoodie_properties(table => '[table_name]');** + +Parameter Description +--------------------- + +.. table:: **Table 1** Parameters + + +------------+-----------------------------------------------------------------------------------------+ + | Parameter | Description | + +============+=========================================================================================+ + | table_name | Name of the table to be queried. The value can be in the **database.tablename** format. | + +------------+-----------------------------------------------------------------------------------------+ + +Example +------- + +.. code-block:: + + call show_hoodie_properties(table => "hudi_table5"); + +System Response +--------------- + +You can view query results on the client. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/show_time_line.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/show_time_line.rst new file mode 100644 index 0000000..54053b3 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/call_command/show_time_line.rst @@ -0,0 +1,63 @@ +:original_name: mrs_01_24782.html + +.. _mrs_01_24782: + +SHOW_TIME_LINE +============== + +Function +-------- + +Displays the effective or archived Hudi timelines and details of a specified instant time. + +Syntax +------ + +- Viewing the list of effective timelines of a table: + + **call show_active_instant_list(table => '[table_name]');** + +- Viewing the list of effective timelines after a timestamp in a table: + + **call show_active_instant_list(table => '[table_name]', instant => '[instant]');** + +- Viewing information about an instant that takes effect in a table: + + **call show_active_instant_detail(table => '[table_name]', instant => '[instant]');** + +- Viewing the list of archived instant timelines in a table: + + **call show_archived_instant_list(table => '[table_name]');** + +- Viewing the list of archived instant timelines after a timestamp in a table: + + **call show_archived_instant_list(table => '[table_name]', instant => '[instant]');** + +- Viewing information about archived instants in a table: + + **call show_archived_instant_detail(table => '[table_name], instant => '[instant]');** + +Parameter Description +--------------------- + +.. table:: **Table 1** Parameters + + +------------+-----------------------------------------------------------------------------------------+ + | Parameter | Description | + +============+=========================================================================================+ + | table_name | Name of the table to be queried. The value can be in the **database.tablename** format. | + +------------+-----------------------------------------------------------------------------------------+ + | instant | Instant timestamp to be queried | + +------------+-----------------------------------------------------------------------------------------+ + +Example +------- + +.. code-block:: + + call show_active_instant_detail(table => 'hudi_table1', instant => '20220913144936897'"); + +System Response +--------------- + +You can view query results on the client. 
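+
+.. note::
+
+   For reference, the call forms above can be combined on the Spark SQL client as shown in the following sketch. The table name **hudi_table1** and the instant timestamp **20220913144936897** are placeholders reused from the example; replace them with your own values.
+
+   .. code-block::
+
+      call show_active_instant_list(table => 'hudi_table1');
+      call show_active_instant_detail(table => 'hudi_table1', instant => '20220913144936897');
+      call show_archived_instant_list(table => 'hudi_table1', instant => '20220913144936897');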
diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/index.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/index.rst deleted file mode 100644 index de720bf..0000000 --- a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/index.rst +++ /dev/null @@ -1,26 +0,0 @@ -:original_name: mrs_01_24263.html - -.. _mrs_01_24263: - -DDL -=== - -- :ref:`CREATE TABLE ` -- :ref:`CREATE TABLE AS SELECT ` -- :ref:`DROP TABLE ` -- :ref:`SHOW TABLE ` -- :ref:`ALTER RENAME TABLE ` -- :ref:`ALTER ADD COLUMNS ` -- :ref:`TRUNCATE TABLE ` - -.. toctree:: - :maxdepth: 1 - :hidden: - - create_table - create_table_as_select - drop_table - show_table - alter_rename_table - alter_add_columns - truncate_table diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/index.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/index.rst deleted file mode 100644 index 52abe9e..0000000 --- a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/index.rst +++ /dev/null @@ -1,24 +0,0 @@ -:original_name: mrs_01_24272.html - -.. _mrs_01_24272: - -DML -=== - -- :ref:`INSERT INTO ` -- :ref:`MERGE INTO ` -- :ref:`UPDATE ` -- :ref:`DELETE ` -- :ref:`COMPACTION ` -- :ref:`SET/RESET ` - -.. toctree:: - :maxdepth: 1 - :hidden: - - insert_into - merge_into - update - delete - compaction - set_reset diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/alter_add_columns.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/alter_add_columns.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/alter_add_columns.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/alter_add_columns.rst diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/alter_rename_table.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/alter_rename_table.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/alter_rename_table.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/alter_rename_table.rst diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/create_table.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/create_hudi_table.rst similarity index 99% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/create_table.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/create_hudi_table.rst index 0d68624..ed15d08 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/create_table.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/create_hudi_table.rst @@ -2,8 +2,8 @@ .. 
_mrs_01_24264: -CREATE TABLE -============ +CREATE Hudi TABLE +================= Function -------- diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/create_table_as_select.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/create_hudi_table_as_select.rst similarity index 98% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/create_table_as_select.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/create_hudi_table_as_select.rst index 61c2c84..d60be98 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/create_table_as_select.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/create_hudi_table_as_select.rst @@ -2,8 +2,8 @@ .. _mrs_01_24265: -CREATE TABLE AS SELECT -====================== +CREATE Hudi TABLE AS SELECT +=========================== Function -------- diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/drop_table.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/drop_hudi_table.rst similarity index 97% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/drop_table.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/drop_hudi_table.rst index f789d29..6f51398 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/drop_table.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/drop_hudi_table.rst @@ -2,8 +2,8 @@ .. _mrs_01_24266: -DROP TABLE -========== +DROP Hudi TABLE +=============== Function -------- diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/index.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/index.rst new file mode 100644 index 0000000..1ab2da1 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/index.rst @@ -0,0 +1,26 @@ +:original_name: mrs_01_24263.html + +.. _mrs_01_24263: + +Hudi DDL +======== + +- :ref:`CREATE Hudi TABLE ` +- :ref:`CREATE Hudi TABLE AS SELECT ` +- :ref:`DROP Hudi TABLE ` +- :ref:`SHOW TABLE ` +- :ref:`ALTER RENAME TABLE ` +- :ref:`ALTER ADD COLUMNS ` +- :ref:`TRUNCATE Hudi TABLE ` + +.. 
toctree:: + :maxdepth: 1 + :hidden: + + create_hudi_table + create_hudi_table_as_select + drop_hudi_table + show_table + alter_rename_table + alter_add_columns + truncate_hudi_table diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/show_table.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/show_table.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/show_table.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/show_table.rst diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/truncate_table.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/truncate_hudi_table.rst similarity index 94% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/truncate_table.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/truncate_hudi_table.rst index cf9fdbc..5d10857 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/ddl/truncate_table.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_ddl/truncate_hudi_table.rst @@ -2,8 +2,8 @@ .. _mrs_01_24271: -TRUNCATE TABLE -============== +TRUNCATE Hudi TABLE +=================== Function -------- diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/archivelog.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/archivelog.rst new file mode 100644 index 0000000..aae5b77 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/archivelog.rst @@ -0,0 +1,53 @@ +:original_name: mrs_01_24783.html + +.. _mrs_01_24783: + +ARCHIVELOG +========== + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +Function +-------- + +Archives instants on the Timeline based on configurations and deletes archived instants from the Timeline to reduce the operation pressure on the Timeline. + +Syntax +------ + +**RUN ARCHIVELOG ON** tableIdentifier; + +**RUN ARCHIVELOG ON** tablelocation; + +Parameter Description +--------------------- + +.. table:: **Table 1** Parameters + + =============== ============================== + Parameter Description + =============== ============================== + tableIdentifier Name of the Hudi table + tablelocation Storage path of the Hudi table + =============== ============================== + +Example +------- + +.. code-block:: + + run archivelog on h1; + run archivelog on "/tmp/hudi/h1"; + +Precautions +----------- + +- Only instants that are not cleaned can be archived. +- No matter whether the compaction operation is performed, at least *x* (*x* indicates the value of **hoodie.compact.inline.max.delta.commits**) instants are retained and not archived to ensure that there are enough instants to trigger the compaction schedule. + +System Response +--------------- + +You can view command execution results in the driver log or on the client. 
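+
+.. note::
+
+   For reference, a minimal sketch that adjusts the retention threshold mentioned in the precautions before archiving is shown below. It assumes the parameter can be set with the **set** command on the Spark SQL client; the value **4** and the table name **h1** are placeholders.
+
+   .. code-block::
+
+      set hoodie.compact.inline.max.delta.commits=4;
+      run archivelog on h1;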
diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/clean.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/clean.rst new file mode 100644 index 0000000..3233c92 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/clean.rst @@ -0,0 +1,54 @@ +:original_name: mrs_01_24801.html + +.. _mrs_01_24801: + +CLEAN +===== + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +Function +-------- + +Cleans instants on the Timeline based on configurations and deletes historical version files to reduce the data storage and read/write pressure of Hudi tables. + +Syntax +------ + +**RUN CLEAN ON** tableIdentifier; + +**RUN CLEAN ON** tablelocation; + +Parameter Description +--------------------- + +.. table:: **Table 1** Parameters + + =============== ============================== + Parameter Description + =============== ============================== + tableIdentifier Name of the Hudi table + tablelocation Storage path of the Hudi table + =============== ============================== + +Example +------- + +.. code-block:: + + run clean on h1; + run clean on "/tmp/hudi/h1"; + +Precautions +----------- + +Only the table owner can perform the clean operation on a table. + +To modify the default cleaning parameters, run set commands to configure the parameters such as the number of commits to be retained. + +System Response +--------------- + +You can view command execution results in the driver log or on the client. diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/compaction.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/compaction_hudi_data.rst similarity index 98% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/compaction.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/compaction_hudi_data.rst index 386f38a..2deeca5 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/compaction.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/compaction_hudi_data.rst @@ -2,8 +2,8 @@ .. _mrs_01_24277: -COMPACTION -========== +COMPACTION Hudi Data +==================== Function -------- diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/delete.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/delete_hudi_data.rst similarity index 97% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/delete.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/delete_hudi_data.rst index c80428a..7b40ec5 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/delete.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/delete_hudi_data.rst @@ -2,8 +2,8 @@ .. 
_mrs_01_24276: -DELETE -====== +DELETE Hudi Data +================ Function -------- diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/index.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/index.rst new file mode 100644 index 0000000..23f078b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/index.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_24272.html + +.. _mrs_01_24272: + +Hudi DML +======== + +- :ref:`INSERT INTO ` +- :ref:`MERGE INTO ` +- :ref:`UPDATE Hudi Data ` +- :ref:`DELETE Hudi Data ` +- :ref:`COMPACTION Hudi Data ` +- :ref:`SET/RESET Hudi Data ` +- :ref:`ARCHIVELOG ` +- :ref:`CLEAN ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + insert_into + merge_into + update_hudi_data + delete_hudi_data + compaction_hudi_data + set_reset_hudi_data + archivelog + clean diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/insert_into.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/insert_into.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/insert_into.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/insert_into.rst diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/merge_into.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/merge_into.rst similarity index 100% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/merge_into.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/merge_into.rst diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/set_reset.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/set_reset_hudi_data.rst similarity index 99% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/set_reset.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/set_reset_hudi_data.rst index a8e20f4..861a220 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/set_reset.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/set_reset_hudi_data.rst @@ -2,8 +2,8 @@ .. _mrs_01_24278: -SET/RESET -========= +SET/RESET Hudi Data +=================== Function -------- diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/update.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/update_hudi_data.rst similarity index 98% rename from doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/update.rst rename to doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/update_hudi_data.rst index 90a7aaa..62e9f30 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/dml/update.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/hudi_dml/update_hudi_data.rst @@ -2,8 +2,8 @@ .. 
_mrs_01_24275: -UPDATE -====== +UPDATE Hudi Data +================ Function -------- diff --git a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/index.rst b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/index.rst index c03ebfd..1b33af8 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/index.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/hudi_sql_syntax_reference/index.rst @@ -6,13 +6,15 @@ Hudi SQL Syntax Reference ========================= - :ref:`Constraints ` -- :ref:`DDL ` -- :ref:`DML ` +- :ref:`Hudi DDL ` +- :ref:`Hudi DML ` +- :ref:`CALL COMMAND ` .. toctree:: :maxdepth: 1 :hidden: constraints - ddl/index - dml/index + hudi_ddl/index + hudi_dml/index + call_command/index diff --git a/doc/component-operation-guide-lts/source/using_hudi/index.rst b/doc/component-operation-guide-lts/source/using_hudi/index.rst index 08a712a..2e27446 100644 --- a/doc/component-operation-guide-lts/source/using_hudi/index.rst +++ b/doc/component-operation-guide-lts/source/using_hudi/index.rst @@ -9,6 +9,7 @@ Using Hudi - :ref:`Basic Operations ` - :ref:`Hudi Performance Tuning ` - :ref:`Hudi SQL Syntax Reference ` +- :ref:`Hudi Schema Evolution ` - :ref:`Common Issues About Hudi ` .. toctree:: @@ -19,4 +20,5 @@ Using Hudi basic_operations/index hudi_performance_tuning/index hudi_sql_syntax_reference/index + hudi_schema_evolution/index common_issues_about_hudi/index diff --git a/doc/component-operation-guide-lts/source/using_iotdb/configuring_iotdb_parameters.rst b/doc/component-operation-guide-lts/source/using_iotdb/configuring_iotdb_parameters.rst new file mode 100644 index 0000000..b8b65a8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/configuring_iotdb_parameters.rst @@ -0,0 +1,54 @@ +:original_name: mrs_01_24159.html + +.. _mrs_01_24159: + +Configuring IoTDB Parameters +============================ + +Scenario +-------- + +IoTDB uses the multi-replica deployment architecture to implement cluster high availability. Each region (DataRegion and SchemaRegion) has three replicas by default. You can also configure more replicas. If a node is faulty, replicas on other nodes of the region replica can take over services from the faulty node, ensuring service continuity and improving cluster stability. + +Procedure +--------- + +#. Log in to Manager, choose **Cluster** > **Services** > **IoTDB** > **Configurations** > **All Configurations** to go to the IoTDB configuration page, and modify the parameters. + +#. Modify the ConfigNode and IoTDBServer configurations. + + - Modifying the ConfigNode configuration: + + - Click **ConfigNode(Role)**. You can modify the existing configuration according to :ref:`Table 1 `. + - Choose **ConfigNode(Role)** > **Customization**. You can customize ConfigNode configurations in the **confignode.customized.configs** parameter according to :ref:`Table 1 `. + + - Modifying the IoTDBServer configuration: + + - Click **IoTDBServer(Role)**. You can modify the existing configuration according to :ref:`Table 1 `. + - Choose **IoTDBServer(Role)** > **Customization**. You can customize IoTDBServer configurations in the **engine.customized.configs** parameter according to :ref:`Table 1 `. + + .. _mrs_01_24159__table3406106165317: + + .. 
table:: **Table 1** Common parameters + + +--------------------------------+-------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ | Parameter | Role | Example Value | Description | +================================+=============+===============+==================================================================================================================================================+ | read_consistency_level | ConfigNode | strong | Read consistency level. Add this configuration through the custom parameter **confignode.customized.configs**. Currently, the value can only be **strong** or **weak**. | +--------------------------------+-------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ | flush_proportion | IoTDBServer | 0.4 | Proportion of the write memory that triggers flushing to disk. If the write load is high (for example, a batch size of 1000), reduce the value. | +--------------------------------+-------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ | replica_affinity_policy | IoTDBServer | random | Policy for selecting the region replica node that serves a query when **read_consistency_level** is set to **weak**. | +--------------------------------+-------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ | coordinator_read_executor_size | IoTDBServer | 20 | Number of read threads of the IoTDBServer coordinator. Add this configuration through the custom parameter **engine.customized.configs**. | +--------------------------------+-------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ | rpc_thrift_compression_enable | ALL | false | Whether to compress data during transmission. Data is not compressed by default. | +--------------------------------+-------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ | root.log.level | ALL | INFO | IoTDB log level. Modifications to this parameter take effect without restarting the related instances. | +--------------------------------+-------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ | SSL_ENABLE | ALL | true | Whether to use SSL to encrypt the channel between the client and the server. | +--------------------------------+-------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. Click **Save**. + +#. Click the **Instance** tab. Select the corresponding instance and choose **More** > **Restart Instance** to make the configuration take effect.
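+
+As an optional check rather than a documented step, you can confirm cluster-level values from the IoTDB client after the restart. This sketch assumes the IoTDB version in use supports the **SHOW CLUSTER** and **SHOW VARIABLES** statements; role-level parameters such as **flush_proportion** are not listed there and should be verified on the Manager configuration page instead.
+
+.. code-block::
+
+   SHOW CLUSTER;
+   SHOW VARIABLES;
+
+**SHOW CLUSTER** reports the status of the ConfigNode and IoTDBServer (DataNode) instances, and **SHOW VARIABLES** lists cluster variables, which in recent IoTDB releases include the read consistency level. The exact output columns depend on the IoTDB version.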
diff --git a/doc/component-operation-guide-lts/source/using_iotdb/data_types_and_encodings_supported_by_iotdb.rst b/doc/component-operation-guide-lts/source/using_iotdb/data_types_and_encodings_supported_by_iotdb.rst new file mode 100644 index 0000000..a39f395 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/data_types_and_encodings_supported_by_iotdb.rst @@ -0,0 +1,23 @@ +:original_name: mrs_01_24764.html + +.. _mrs_01_24764: + +Data Types and Encodings Supported by IoTDB +=========================================== + +IoTDB supports the following data types and encodings. For details, see :ref:`Table 1 `. + +.. _mrs_01_24764__table4767452502: + +.. table:: **Table 1** Data types and encodings supported by IoTDB + + ======= ============ =========================================== + Type Description Supported Encoding + ======= ============ =========================================== + BOOLEAN Boolean PLAIN, RLE + INT32 Integer PLAIN, RLE, TS_2DIFF, GORILLA, FREQ, ZIGZAG + INT64 Long integer PLAIN, RLE, TS_2DIFF, GORILLA, FREQ, ZIGZAG + FLOAT Float PLAIN, RLE, TS_2DIFF, GORILLA, FREQ + DOUBLE Double PLAIN, RLE, TS_2DIFF, GORILLA, FREQ + TEXT String PLAIN, DICTIONARY + ======= ============ =========================================== diff --git a/doc/component-operation-guide-lts/source/using_iotdb/index.rst b/doc/component-operation-guide-lts/source/using_iotdb/index.rst new file mode 100644 index 0000000..cd0cdf4 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/index.rst @@ -0,0 +1,32 @@ +:original_name: mrs_01_24144.html + +.. _mrs_01_24144: + +Using IoTDB +=========== + +- :ref:`Using IoTDB from Scratch ` +- :ref:`Using the IoTDB Client ` +- :ref:`Configuring IoTDB Parameters ` +- :ref:`Data Types and Encodings Supported by IoTDB ` +- :ref:`IoTDB Permission Management ` +- :ref:`IoTDB Log Overview ` +- :ref:`UDFs ` +- :ref:`IoTDB Data Import and Export ` +- :ref:`Planning IoTDB Capacity ` +- :ref:`IoTDB Performance Tuning ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + using_iotdb_from_scratch + using_the_iotdb_client + configuring_iotdb_parameters + data_types_and_encodings_supported_by_iotdb + iotdb_permission_management/index + iotdb_log_overview + udfs/index + iotdb_data_import_and_export/index + planning_iotdb_capacity + iotdb_performance_tuning diff --git a/doc/component-operation-guide-lts/source/using_iotdb/iotdb_data_import_and_export/exporting_iotdb_data.rst b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_data_import_and_export/exporting_iotdb_data.rst new file mode 100644 index 0000000..24a8509 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_data_import_and_export/exporting_iotdb_data.rst @@ -0,0 +1,109 @@ +:original_name: mrs_01_24511.html + +.. _mrs_01_24511: + +Exporting IoTDB Data +==================== + +Scenario +-------- + +This section describes how to use **export-csv.sh** to export data from IoTDB to a CSV file. + +.. important:: + + Exporting data to CSV files may cause injection risks. Exercise caution when performing this operation. + +Prerequisites +------------- + +- The client has been installed. For details, see . For example, the installation directory is **/opt/client**. The client directory in the following operations is only an example. Change it based on the actual installation directory onsite. +- Service component users have been created by the MRS cluster administrator by referring to . In security mode, machine-machine users need to download the keytab file. 
For details, see . A human-machine user must change the password upon the first login. +- By default, SSL is enabled on the server. You have generated the **truststore.jks** certificate by following the instructions provided in :ref:`Using the IoTDB Client ` and copied it to the **Client installation directory/IoTDB/iotdb/conf** directory. + +Procedure +--------- + +#. Log in to the node where the client is installed as the client installation user. + +#. Run the following command to switch to the client installation directory: + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. (Optional) Perform this step to authenticate the current user if Kerberos authentication is enabled for the cluster. If Kerberos authentication is not enabled, skip this step. + + **kinit** *Component service user* + +#. Run the following command to switch to the directory where the **export-csv.sh** script is stored: + + **cd /opt/client/IoTDB/iotdb/tools** + +#. Before executing the export script, prepare the SQL statements to be exported. You can enter them interactively after the script starts, or save them to an SQL file and specify the file with the **-s** parameter. If an SQL file contains multiple SQL statements, separate them with newline characters. For example: + + .. code-block:: + + select * from root.fit.d1 + select * from root.sg1.d1 + +#. .. _mrs_01_24511__li87641654151912: + + Execute **export-csv.sh** to export data. + + **./export-csv.sh -h** *Service IP address of the IoTDBServer instance* **-p** *IoTDBServer RPC port* **-td [-tf -s ]** + + Example: + + .. code-block:: + + ./export-csv.sh -h x.x.x.x -p 22260 -td ./ + # Or + ./export-csv.sh -h x.x.x.x -p 22260 -td ./ -tf yyyy-MM-dd\ HH:mm:ss + # Or + ./export-csv.sh -h x.x.x.x -p 22260 -td ./ -s sql.txt + # Or + ./export-csv.sh -h x.x.x.x -p 22260 -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt + + .. note:: + + - You can log in to FusionInsight Manager, choose **Cluster** > **Services** > **IoTDB**, and click the **Instance** tab to view the service IP address of the IoTDBServer instance node. + - The default RPC port number is **22260**. To obtain the port number, choose **Cluster** > **Services** > **IoTDB**, click the **Configurations** tab and then **All Configurations**, and search for **rpc_port**. + - If the exported data contains a comma (,), it will be enclosed in double quotation marks (""). For example, **hello,world** is exported as **"hello,world"**. + - If the exported data contains double quotation marks (""), it will be enclosed in double quotation marks ("") and the original double quotation marks are replaced with **\\"**. For example, **"world"** is exported as **"\\"world\\""**. + +#. When you run the command in :ref:`7 `, a message is displayed indicating that CSV injection may occur. Enter **yes** to continue the command. If you enter another value, the data export operation will be canceled. + + |image1| + + For example, after you enter **yes**, enter the service username and password as prompted. If information in the following figure is displayed, the data is exported: + + |image2| + + .. note:: + + - To prevent security risks, you are advised to export CSV files in interactive mode. + + - You can also export CSV files by running the **./export-csv.sh -h** *Service IP address of the IoTDBServer instance* **-p** *IoTDBServer RPC port* **-u** *Service username* **-pw** *Service user password* **-td [-tf -s ]** command. + + Example: + + ..
code-block:: + + ./export-csv.sh -h x.x.x.x -p 22260 -u test -pw Password -td ./ + # Or + ./export-csv.sh -h x.x.x.x -p 22260 -u test -pw Password -td ./ -tf yyyy-MM-dd\ HH:mm:ss + # Or + ./export-csv.sh -h x.x.x.x -p 22260 -u test -pw Password -td ./ -s sql.txt + # Or + ./export-csv.sh -h x.x.x.x -p 22260 -u test -pw Password -td ./ -tf yyyy-MM-dd\ HH:mm:ss -s sql.txt + + If information in the following figure is displayed, the CSV file is exported: + + |image3| + +.. |image1| image:: /_static/images/en-us_image_0000001583391913.png +.. |image2| image:: /_static/images/en-us_image_0000001582952137.png +.. |image3| image:: /_static/images/en-us_image_0000001532472784.png diff --git a/doc/component-operation-guide-lts/source/using_iotdb/iotdb_data_import_and_export/importing_iotdb_data.rst b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_data_import_and_export/importing_iotdb_data.rst new file mode 100644 index 0000000..d847d27 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_data_import_and_export/importing_iotdb_data.rst @@ -0,0 +1,155 @@ +:original_name: mrs_01_24510.html + +.. _mrs_01_24510: + +Importing IoTDB Data +==================== + +Scenario +-------- + +This section describes how to use **import-csv.sh** to import data in CSV format to IoTDB. + +Prerequisites +------------- + +- The client has been installed. For details, see . For example, the installation directory is **/opt/client**. The client directory in the following operations is only an example. Change it based on the actual installation directory onsite. +- Service component users have been created by the MRS cluster administrator by referring to . In security mode, machine-machine users need to download the keytab file. For details, see . A human-machine user must change the password upon the first login. +- By default, SSL is enabled on the server. You have generated the **truststore.jks** certificate by following the instructions provided in :ref:`Using the IoTDB Client ` and copied it to the **Client installation directory/IoTDB/iotdb/conf** directory. + +Procedure +--------- + +#. .. _mrs_01_24510__li4604150164718: + + Prepare a CSV file named **example-filename.csv** on the local PC with the following content: + + .. code-block:: + + Time,root.fit.d1.s1,root.fit.d1.s2,root.fit.d2.s1,root.fit.d2.s3,root.fit.p.s1 + 1,100,hello,200,300,400 + 2,500,world,600,700,800 + 3,900,"hello, \"world\"",1000,1100,1200 + + .. important:: + + Before importing data, pay attention to the following: + + - The data to be imported cannot contain spaces. Otherwise, importing that line of data fails and is skipped, but subsequent import operations are not affected. + - Data that contains commas (,) must be enclosed in single or double quotation marks. For example, **hello,world** is changed to **"hello,world"**. + - Quotation marks ("") in the data must be replaced with the escape character **\\"**. For example, **"world"** is changed to **\\"world\\"**. + - Single quotation marks (') in the data must be replaced with the escape character **\\'**. For example, **'world'** is changed to **\\'world\\'**. + - If the data to be imported is a time value, the format is **yyyy-MM-dd'T'HH:mm:ss**, **yyyy-MM-dd HH:mm:ss**, or **yyyy-MM-dd'T'HH:mm:ss.SSSZ**, for example, **2022-02-28T11:07:00**, **2022-02-28 11:07:00**, or **2022-02-28T11:07:00.000Z**. + +#. Use WinSCP to upload the CSV file to a directory on the node where the client is installed, for example, **/opt/client/IoTDB/iotdb/sbin**. + +#.
Log in to the node where the client is installed as the client installation user. + +#. Run the following command to switch to the client installation directory: + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. .. _mrs_01_24510__li16611562035: + + (Optional) Perform this step to authenticate the current user if Kerberos authentication is enabled for the cluster. If Kerberos authentication is not enabled, skip this step. + + **kinit** *Component service user* + +#. Run the following command to switch to the directory where the IoTDB client running script is stored: + + **cd /opt/client/IoTDB/iotdb/sbin** + +#. .. _mrs_01_24510__li05599513462: + + Run the following command to log in to the client: + + **./start-cli.sh** **-h** *Service IP address of the IoTDBServer instance node* **-p** *IoTDBServer RPC port* + + .. note:: + + - You can log in to FusionInsight Manager and choose **Cluster** > **Services** > **IoTDB** > **Instance** to view the service IP address of the IoTDBServer instance node. + - The default RPC port number is **22260**. To obtain the port number, choose **Cluster** > **Services** > **IoTDB**, choose **Configurations** > **All Configurations**, and search for **rpc_port**. + + After you run this command, specify the service username as required. + + - To specify the service username, enter **yes** and enter the service username and password as prompted. + + |image1| + + - If you will not specify the service username, enter **no**. In this case, you will perform subsequent operations as the user in :ref:`6 `. + + |image2| + + - If you enter other information, you will log out. + + |image3| + +#. (Optional) Create metadata. + + IoTDB has the capability of type inference, so it is not necessary to create metadata before data import. However, it is recommended that you create metadata before using the CSV tool to import data, because this avoids unnecessary type conversion errors. The commands are as follows: + + .. code-block:: + + SET STORAGE GROUP TO root.fit.d1; + SET STORAGE GROUP TO root.fit.d2; + SET STORAGE GROUP TO root.fit.p; + CREATE TIMESERIES root.fit.d1.s1 WITH DATATYPE=INT32,ENCODING=RLE; + CREATE TIMESERIES root.fit.d1.s2 WITH DATATYPE=TEXT,ENCODING=PLAIN; + CREATE TIMESERIES root.fit.d2.s1 WITH DATATYPE=INT32,ENCODING=RLE; + CREATE TIMESERIES root.fit.d2.s3 WITH DATATYPE=INT32,ENCODING=RLE; + CREATE TIMESERIES root.fit.p.s1 WITH DATATYPE=INT32,ENCODING=RLE; + +#. Run the following command to exit the client: + + **quit;** + +#. Run the following command to switch to the directory where the **import-csv.sh** script is stored: + + **cd /opt/client/IoTDB/iotdb/tools** + +#. Run the following command to run **import-csv.sh** and import the **example-filename.csv** file: + + **./import-csv.sh -h** *Service IP address of the IoTDBServer instance* **-p**\ *IoTDBServer RPC port* **-f** *example-filename.csv* + + Enter the service username and password in interactive mode as prompted. If information in the following figure is displayed, the CSV file is imported: + + |image4| + +#. Verify data consistency. + + a. Run the following command to switch to the directory where the IoTDB client running script is stored: + + **cd /opt/client/IoTDB/iotdb/sbin** + + b. Log in to the IoTDB client by referring to :ref:`8 `. Run SQL statements to query data and compare the data with that in the :ref:`1 ` file. + + c. Check whether the imported data is consistent with the data in the :ref:`1 `. 
If they are, the import is successful. + + Run the following command to check the imported data: + + **SELECT \* FROM root.fit.**;** + + |image5| + + .. note:: + + - To prevent security risks, you are advised to import CSV files in interactive mode. + + - You can also import CSV files by running the **./import-csv.sh -h** *Service IP address of the IoTDBServer instance* **-p** *IoTDBServer RPC port* **-u** *Service username* **-pw** *Service user password*\ **-f** *example-filename.csv* command. + + If information in the following figure is displayed, the CSV file is imported. + + |image6| + + - If nanosecond (ns) time precision is enabled for the IoTDB on the server, the **-tp ns** parameter needs to be added when the client imports data with the nanosecond timestamp. To check whether nanosecond time precision is enabled for a cluster, log in to FusionInsight Manager, choose **Cluster** > **Configurations** > **All Non-default Values**, and search for **timestamp_precision**. + +.. |image1| image:: /_static/images/en-us_image_0000001532951928.png +.. |image2| image:: /_static/images/en-us_image_0000001583272201.png +.. |image3| image:: /_static/images/en-us_image_0000001582952133.png +.. |image4| image:: /_static/images/en-us_image_0000001532792008.png +.. |image5| image:: /_static/images/en-us_image_0000001532951944.png +.. |image6| image:: /_static/images/en-us_image_0000001583151917.png diff --git a/doc/component-operation-guide-lts/source/using_iotdb/iotdb_data_import_and_export/index.rst b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_data_import_and_export/index.rst new file mode 100644 index 0000000..279576f --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_data_import_and_export/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_24509.html + +.. _mrs_01_24509: + +IoTDB Data Import and Export +============================ + +- :ref:`Importing IoTDB Data ` +- :ref:`Exporting IoTDB Data ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + importing_iotdb_data + exporting_iotdb_data diff --git a/doc/component-operation-guide-lts/source/using_iotdb/iotdb_log_overview.rst b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_log_overview.rst new file mode 100644 index 0000000..fdeae16 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_log_overview.rst @@ -0,0 +1,98 @@ +:original_name: mrs_01_24161.html + +.. _mrs_01_24161: + +IoTDB Log Overview +================== + +**Description** + +Log Description +--------------- + +**Log paths**: The default paths of IoTDB logs are **/var/log/Bigdata/iotdb/iotdbserver** (for storing run logs) and **/var/log/Bigdata/audit/iotdb/iotdbserver** (for storing audit logs). + +**Log archive rule**: The automatic compression and archiving function of IoTDB is enabled. By default, when the size of a log file exceeds 20 MB (which is adjustable), the log file is automatically compressed. The naming rule of the compressed log file is as follows: <*Original log file name*>-<*yyyymmdd*>.\ *ID*.\ **log.gz**. A maximum of 10 latest compressed files are reserved. The number of compressed files and compression threshold can be configured. + +.. 
table:: **Table 1** IoTDB log list + + +-----------+----------------------------------------------------+----------------------------------------------------+ + | Type | Name | Description | + +===========+====================================================+====================================================+ + | Run logs | log-all.log | IoTDB service log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | log-error.log | IoTDB service error log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | log-measure.log | IoTDB service monitoring log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | log-query-debug.log | IoTDB query debug log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | log-query-frequency.log | IoTDB query frequency log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | log-sync.log | IoTDB synchronization log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | log-slow-sql.log | IoTDB slow SQL log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | server.out | Log that records IoTDB service startup exceptions. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | postinstall.log | IoTDB process startup log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | prestart.log | Log that records IoTDB process startup exceptions. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | service-healthcheck.log | IoTDB database initialization log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | start.log | IoTDBServer service startup log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | stop.log | IoTDBServer service stop log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | | IoTDBServer-omm---gc.log.0.current | IoTDBServer service GC log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + | Audit log | log_audit.log | IoTDB audit log. | + +-----------+----------------------------------------------------+----------------------------------------------------+ + +**Log levels** + +:ref:`Table 2 ` describes the log levels supported by IoTDB. + +Levels of logs are ERROR, WARN, INFO, and DEBUG from the highest to the lowest priority. Run logs of equal or higher levels are recorded. The higher the specified log level, the fewer the logs recorded. + +.. _mrs_01_24161__table135218393114: + +.. 
table:: **Table 2** Log levels + + +-------+------------------------------------------------------------------------------------------+ + | Level | Description | + +=======+==========================================================================================+ + | ERROR | Logs of this level record error information about system running. | + +-------+------------------------------------------------------------------------------------------+ + | WARN | Logs of this level record exception information about the current event processing. | + +-------+------------------------------------------------------------------------------------------+ + | INFO | Logs of this level record normal running status information about the system and events. | + +-------+------------------------------------------------------------------------------------------+ + | DEBUG | Logs of this level record the system information and system debugging information. | + +-------+------------------------------------------------------------------------------------------+ + +To modify log levels, perform the following operations: + +#. Go to the **All Configurations** page of the IoTDB service by referring to :ref:`Modifying Cluster Service Configuration Parameters `. +#. In the navigation tree on the left, select **Log** corresponding to the role to be modified. +#. Select a desired log level and save the configuration. + +.. note:: + + The IoTDB log level takes effect 60 seconds after being configured. You do not need to restart the service. + +Log Formats +----------- + +The following table lists the IoTDB log formats: + +.. table:: **Table 3** Log formats + + +-----------+-----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Type | Format | Example | + +===========+===================================================================================================================================+============================================================================================================================================================================================================================+ + | Run log | <*yyyy-MM-dd HH:mm:ss,SSS*> \| *Log level* \| [*Thread name*] \| *Log information* \| *Log printing class* (*File*:*Line number*) | 2021-06-08 10:08:41,221 \| ERROR \| [main] \| Client failed to open SaslClientTransport to interact with a server during session initiation: \| org.apache.iotdb.rpc.sasl.TFastSaslTransport (TFastSaslTransport.java:257) | + +-----------+-----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Audit log | <*yyyy-MM-dd HH:mm:ss,SSS*> \| *Log level* \| [*Thread name*] \| *Log information* \| *Log printing class* (*File*:*Line number*) | 2021-06-08 11:03:49,365 \| INFO \| [ClusterClient-1] \| Session-1 is closing \| IoTDB_AUDIT_LOGGER (TSServiceImpl.java:326) | + 
+-----------+-----------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_iotdb/iotdb_performance_tuning.rst b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_performance_tuning.rst new file mode 100644 index 0000000..74c2239 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_performance_tuning.rst @@ -0,0 +1,41 @@ +:original_name: mrs_01_24162.html + +.. _mrs_01_24162: + +IoTDB Performance Tuning +======================== + +Scenario +-------- + +You can increase IoTDB memory to improve IoTDB performance because read and write operations are performed in HBase memory. + +Configuration +------------- + +Log in to Manager, choose **Cluster** > **Services** > **IoTDB**, and click the **Configurations** tab and then **All Configurations**. Search the parameters and modify their values. + +For details, see :ref:`Table 1 `. + +.. _mrs_01_24162__table3095993: + +.. table:: **Table 1** Description + + +------------------------------------------+---------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Default Value | Optimization Suggestion | + +==========================================+=================================================================================+==========================================================================================================================================================================================================+===========================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+ + | SSL_ENABLE | Whether to encrypt the channel between the client and server using SSL | true | **true** indicates that SSL encryption is enabled, and **false** indicates that SSL encryption is disabled. Data encryption and decryption during transmission have a great impact on performance. The test result shows that the performance gap is 200%. 
Therefore, you are advised to disable SSL encryption during the performance test. The parameter for the ConfigNode and IoTDBServer roles must be both modified. | + +------------------------------------------+---------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | iotdb_server_kerberos_qop | Encrypted data transmission of each IoTDBServer instance in the cluster | auth-int | **auth-int** indicates that data transmission is encrypted, and **auth** indicates that data is authenticated only without being encrypted. Therefore, you are advised to set this parameter to **auth**. The parameter for the ConfigNode and IoTDBServer roles must be both modified. | + +------------------------------------------+---------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | GC_OPTS | Memory and garbage collection (GC) configuration parameters used by IoTDBServer | **-Xms2G -Xmx2G -XX:MaxDirectMemorySize=512M -XX:+UseG1GC -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -Djdk.tls.ephemeralDHKeySize=3072** (set this parameter as required.) | - **-Xms2G -Xmx2G** indicates the IoTDB JVM heap memory. Set this parameter to a large value in the scenarios with a large number of time series and concurrent writes. You can adjust the parameter value based on the GC duration threshold alarm or heap memory threshold alarm. If an alarm is generated, increase the parameter value by 0.5 times. If this alarm is frequently generated, double the value. When you adjust **HeapSize**, set **Xms** and **Xmx** to the same value to avoid performance deterioration during dynamic heap size adjustment by JVM. | + | | | | - **-XX:MaxDirectMemorySize** indicates the IoTDB JVM direct memory. The recommended value is 1/4 of the heap memory. This parameter mainly affects the write performance. If the write performance deteriorates significantly, you can increase the parameter value by 0.5 times. 
Note that the sum of the heap memory and direct memory must be less than or equal to 80% of the available system memory. Otherwise, IoTDB fails to be started. | + | | | | - Query scenario optimization example: If the query range is large, for example, a single time series contains more than 10,000 data points, the quotient of 20% of the JVM memory allocated divided by the number of time series is recommended to be bigger than 160 KB for better performance of the storage engine in the default configuration. | + +------------------------------------------+---------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | write_read_schema_free_memory_proportion | Memory allocation ratio: write, read, schema, and free | 4:3:1:2 | You can adjust the memory based on the load. | + | | | | | + | | | | - A larger write memory means the better write throughput and single query performance. | + | | | | - A larger read memory means more supported concurrent queries. | + | | | | - A larger metadata memory means a lower probability of error message "IoTDB system load is too large". | + | | | | - A larger free memory means a lower probability of memory exhaustion. | + +------------------------------------------+---------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_iotdb/iotdb_permission_management/creating_an_iotdb_role.rst b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_permission_management/creating_an_iotdb_role.rst new file mode 100644 index 0000000..69e2404 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_permission_management/creating_an_iotdb_role.rst @@ -0,0 +1,66 @@ +:original_name: mrs_01_24142.html + +.. _mrs_01_24142: + +Creating an IoTDB Role +====================== + +Create and configure an IoTDB role on Manager as an MRS cluster administrator. 
An IoTDB role can be configured with IoTDB administrator permissions or a common user's permissions to read, write, or delete data. + +Prerequisites +------------- + +- The MRS cluster administrator has understood service requirements. +- You have installed the IoTDB client. + +Procedure +--------- + +#. On Manager, choose **System** > **Permission** > **Role**. + +#. On the displayed page, click **Create Role** and specify **Role Name** and **Description**. + +#. Configure **Configure Resource Permission**. For details, see :ref:`Table 1 `. + + IoTDB permissions: + + - **Common User Privileges**: includes data operation permissions. Permissions on the IoTDB **root** directory, storage group, and any node path from a storage group to a time series can be granted selectively. The minimum permissions are read, write, modify, and delete permissions on the time series. + - **IoTDB Admin Privilege**: includes all permissions in :ref:`Table 1 `. + + .. _mrs_01_24142__t873a9c44357b40cd98cb948ce9438d93: + + .. table:: **Table 1** Configuring a role + + +----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Scenario | Role Authorization | + +======================================================================+===========================================================================================================================================================================================================================================+ + | Configuring the IoTDB administrator permission | In the **Configure Resource Permission** table, choose *Name of the desired cluster* > **IoTDB** and select **IoTDB Admin Privilege**. | + +----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuring the permission for users to create storage groups | a. In the **Configure Resource Permission** table, choose *Name of the desired cluster* > **IoTDB** > **Common User Privileges**. | + | | b. Select **Set StorageGroup** for the **root** directory. | + | | c. A user with this permission can create storage groups in the **root** directory. | + +----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuring the permission for users to create time series | a. In the **Configure Resource Permission** table, choose *Name of the desired cluster* > **IoTDB** > **Common User Privileges**. | + | | b. Select **Create** for the **root** directory. You will have the permission to create time series in all recursive paths in the **root** directory. | + | | c. Click **root** to go to the storage group page and select the **Create** permission for the corresponding storage group. You will have the permission to create time series in all recursive paths in the storage group directory. 
| + +----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuring the permission for users to modify time series | a. In the **Configure Resource Permission** table, choose *Name of the desired cluster* > **IoTDB** > **Common User Privileges**. | + | | b. Select **Alter** for the **root** directory. You will have the permission to modify time series in all recursive paths in the **root** directory. | + | | c. Click **root** to go to the storage group page and select the **Alter** permission for the corresponding storage group. You will have the permission to modify time series in all recursive paths of the storage group. | + | | d. Click the specified storage group to go the time series page and select the **Alter** permission for the corresponding time series. You will have the permission to modify the time series. | + +----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuring the permission for users to insert data into time series | a. In the **Configure Resource Permission** table, choose *Name of the desired cluster* > **IoTDB** > **Common User Privileges**. | + | | b. Select **Insert** for the **root** directory. You will have the permission to insert data into the time series in all recursive paths in the **root** directory. | + | | c. Click **root** to go to the storage group page and select the **Insert** permission for the corresponding storage group. You will have the permission to insert data into the time series in all recursive paths of the storage group. | + | | d. Click the specified storage group to go the time series page and select the **Insert** permission for the corresponding time series. You will have the permission to insert data into the time series. | + +----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuring the permission for users to read data from time series | a. In the **Configure Resource Permission** table, choose *Name of the desired cluster* > **IoTDB** > **Common User Privileges**. | + | | b. Select **Read** for the **root** directory. You will have the permission to read data from the time series in all recursive paths in the **root** directory. | + | | c. Click **root** to go to the storage group page and select the **Read** permission for the corresponding storage group. You will have the permission to read data from the time series in all recursive paths of the storage group. | + | | d. Click the specified storage group to go the time series page and select the **Read** permission for the corresponding time series. You will have the permission to read data from the time series. 
| + +----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Configuring the permission for users to delete time series | a. In the **Configure Resource Permission** table, choose *Name of the desired cluster* > **IoTDB** > **Common User Privileges**. | + | | b. Select **Delete** for the **root** directory. You will have the permission to delete data or time series in all recursive paths in the **root** directory. | + | | c. Click **root** to go to the storage group page and select the **Delete** permission for the corresponding storage group. You will have the permission to delete data or time series in all recursive paths of the storage group. | + | | d. Click the specified storage group to go the time series page and select the **Delete** permission for the corresponding time series. You will have the permission to delete data from the time series or delete the time series. | + +----------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_iotdb/iotdb_permission_management/index.rst b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_permission_management/index.rst new file mode 100644 index 0000000..d5162f6 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_permission_management/index.rst @@ -0,0 +1,16 @@ +:original_name: mrs_01_24140.html + +.. _mrs_01_24140: + +IoTDB Permission Management +=========================== + +- :ref:`IoTDB Permissions ` +- :ref:`Creating an IoTDB Role ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + iotdb_permissions + creating_an_iotdb_role diff --git a/doc/component-operation-guide-lts/source/using_iotdb/iotdb_permission_management/iotdb_permissions.rst b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_permission_management/iotdb_permissions.rst new file mode 100644 index 0000000..ced8599 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/iotdb_permission_management/iotdb_permissions.rst @@ -0,0 +1,101 @@ +:original_name: mrs_01_24141.html + +.. _mrs_01_24141: + +IoTDB Permissions +================= + +MRS supports users, user groups, and roles. Permissions must be assigned to roles and then roles are bound to users or user groups. Users can obtain permissions only by binding a role or joining a group that is bound with a role. + +.. note:: + + In security mode, you need to manage IoTDB permissions and add the created user to the **iotdbgroup** user group. In normal mode, IoTDB permission management is not required. + +IoTDB Permission List +--------------------- + +The **Name** column in :ref:`Table 1 ` lists the permissions supported by open-source IoTDB. If an MRS user needs to use corresponding permissions to perform operations, grant the permissions to the user on Manager by referring to the **Required Permission** column. For details, see :ref:`Creating an IoTDB Role `. + +.. _mrs_01_24141__table1392557124016: + +.. 
table:: **Table 1** IoTDB permissions + + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | Name | Description | Required Permission | Example | + +===========================+=========================================================================================================================================+=======================+===============================================================================================================================================+ + | SET_STORAGE_GROUP | Used for creating a storage group, including setting permissions for the storage group and setting or canceling its time to live (TTL). | Set StorageGroup | Example 1: set storage group to root.ln; | + | | | | | + | | | | Example 2: set ttl to root.ln 3600000; | + | | | | | + | | | | Example 3: unset ttl to root.ln; | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | CREATE_TIMESERIES | Used for creating a time series. | Create | Example 1: Creating a time series | + | | | | | + | | | | create timeseries root.ln.wf02.status with datatype=BOOLEAN,encoding=PLAIN; | + | | | | | + | | | | Example 2: Creating an aligned time series | + | | | | | + | | | | create aligned timeseries root.ln.device1(latitude FLOAT encoding=PLAIN compressor=SNAPPY, longitude FLOAT encoding=PLAIN compressor=SNAPPY); | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | INSERT_TIMESERIES | Used for inserting data. | Write | Example 1: insert into root.ln.wf02(timestamp,status) values(1,true); | + | | | | | + | | | | Example 2: insert into root.sg1.d1(time, s1, s2) aligned values(1, 1, 1); | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | ALTER_TIMESERIES | Used for modifying a time series, and adding attributes and tags. | Alter | Example 1: alter timeseries root.turbine.d1.s1 ADD TAGS tag3=v3, tag4=v4; | + | | | | | + | | | | Example 2: ALTER timeseries root.turbine.d1.s1 UPSERT ALIAS=newAlias TAGS(tag2=newV2, tag3=v3) ATTRIBUTES(attr3=v3, attr4=v4); | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | READ_TIMESERIES | Used for querying data. 
| Read | Example 1: show storage group; | + | | | | | + | | | | Example 2: show child paths root.ln, show child nodes root.ln; | + | | | | | + | | | | Example 3: show devices; | + | | | | | + | | | | Example 4: show timeseries root.**; | + | | | | | + | | | | Example 5: show all ttl; | + | | | | | + | | | | Example 6: Querying data | + | | | | | + | | | | select \* from root.ln.**; | + | | | | | + | | | | Example 7: Querying performance tracing | + | | | | | + | | | | tracing select \* from root.**; | + | | | | | + | | | | Example 8: Querying the UDF | + | | | | | + | | | | select example(``*``) from root.sg.d1; | + | | | | | + | | | | Example 9: Querying statistics | + | | | | | + | | | | count devices; | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | DELETE_TIMESERIES | Used for deleting data or time series. | Delete | Example 1: Deleting a time series | + | | | | | + | | | | delete timeseries root.ln.wf01.wt01.status; | + | | | | | + | | | | Example 2: Deleting data | + | | | | | + | | | | delete from root.ln.wf02.wt02.status where time < 10; | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | DELETE_STORAGE_GROUP | Used for deleting a storage group. | IoTDB Admin Privilege | Example: delete storage group root.ln; | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | CREATE_FUNCTION | Used for registering a UDF. | IoTDB Admin Privilege | Example: create function example AS 'org.apache.iotdb.udf.UDTFExample'; | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | DROP_FUNCTION | Used for deregistering a UDF. | IoTDB Admin Privilege | Example: drop function example; | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | UPDATE_TEMPLATE | Used for creating, deleting, and modifying metadata templates. 
| IoTDB Admin Privilege | Example 1: create schema template t1(s1 int32); | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | READ_TEMPLATE | Used for viewing all metadata templates and metadata template content. | IoTDB Admin Privilege | Example 1: show schema templates; | + | | | | | + | | | | Example 2: show nodes in template t1; | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | APPLY_TEMPLATE | Used for attaching, detaching, and activating a metadata template. | IoTDB Admin Privilege | Example 1: set schema template t1 to root.sg.d; | + | | | | | + | | | | Example 2: create timeseries of schema template on root.sg.d; | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ + | READ_TEMPLATE_APPLICATION | Used for viewing the path for attaching or activating the metadata template. | IoTDB Admin Privilege | Example 1: show paths set schema template t1; | + | | | | | + | | | | Example 2: show paths using schema template t1; | + +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+ diff --git a/doc/component-operation-guide-lts/source/using_iotdb/planning_iotdb_capacity.rst b/doc/component-operation-guide-lts/source/using_iotdb/planning_iotdb_capacity.rst new file mode 100644 index 0000000..5e8312a --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/planning_iotdb_capacity.rst @@ -0,0 +1,31 @@ +:original_name: mrs_01_24765.html + +.. _mrs_01_24765: + +Planning IoTDB Capacity +======================= + +IoTDB has the multi-replica mechanism. By default, both schema regions and data regions have three replicas. The ConfigNode stores the mapping between regions and the IoTDBServer. The IoTDBServer stores region data and uses the file system of the OS to manage metadata and data files. + +Capacity Specifications +----------------------- + +- ConfigNode capacity specifications + + When a new storage group is created, IoTDB allocates 10,000 slots to it by default. When data is written, IoTDB allocates or creates a data region and mounts it to a slot based on the device name and time value. Therefore, the memory usage of a ConfigNode is related to the number of storage groups and the continuous write time of the storage groups. 
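+
+   The estimate in this section reduces to a single multiplication: slots x years x time partitions per year x bytes per slot entry. As a rough sanity check only, the sketch below expresses that formula as code; the class and method names are illustrative and not part of IoTDB, and the per-object sizes should be taken from the table and worked example that follow.
+
+   .. code-block:: java
+
+      // Hypothetical sizing helper (not an IoTDB API): rough ConfigNode memory estimate
+      // for the slot metadata of one storage group. bytesPerSlotEntry is the sum of the
+      // object sizes listed in the table below.
+      public class ConfigNodeMemorySketch {
+          public static double estimateGB(long slotsPerStorageGroup, long years,
+                                          long partitionsPerYear, long bytesPerSlotEntry) {
+              long totalBytes = slotsPerStorageGroup * years * partitionsPerYear * bytesPerSlotEntry;
+              // Binary gigabytes, matching the arithmetic of the worked example below.
+              return totalBytes / (1024.0 * 1024.0 * 1024.0);
+          }
+      }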
+ + ========================== ================== + Objects of Slot Allocation Object Size (Byte) + ========================== ================== + TTimePartitionSlot 4 + TSeriesPartitionSlot 8 + TConsensusGroupId 4 + ========================== ================== + + According to the preceding table, creating one storage group that keeps running for 10 years requires about 0.68 GB of memory on a ConfigNode. + + 10,000 (slots) x 10 (years) x 365 (partitions) x (TTimePartitionSlot size + TSeriesPartitionSlot size + TConsensusGroupId size) = 0.68 GB + +- IoTDBServer capacity specifications + + Data in IoTDB is allocated to IoTDBServers by region. By default, a region stores data as three replicas, and therefore three files are stored in the IoTDBServer file system. The upper limit of the IoTDBServer capacity is the maximum number of files that can be stored in the OS. For Linux, the upper limit is the number of inodes. diff --git a/doc/component-operation-guide-lts/source/using_iotdb/udfs/index.rst b/doc/component-operation-guide-lts/source/using_iotdb/udfs/index.rst new file mode 100644 index 0000000..77fd70b --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/udfs/index.rst @@ -0,0 +1,14 @@ +:original_name: mrs_01_24512.html + +.. _mrs_01_24512: + +UDFs +==== + +- :ref:`UDF Overview ` + +.. toctree:: + :maxdepth: 1 + :hidden: + + udf_overview diff --git a/doc/component-operation-guide-lts/source/using_iotdb/udfs/udf_overview.rst b/doc/component-operation-guide-lts/source/using_iotdb/udfs/udf_overview.rst new file mode 100644 index 0000000..9a3d2a8 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/udfs/udf_overview.rst @@ -0,0 +1,338 @@ +:original_name: mrs_01_24513.html + +.. _mrs_01_24513: + +UDF Overview +============ + +IoTDB provides multiple built-in functions and user-defined functions (UDFs) to meet users' computing requirements. + +UDF Types +--------- + +:ref:`Table 1 ` lists the UDF types supported by IoTDB. + +.. _mrs_01_24513__table869011383477: + +.. table:: **Table 1** UDF types + + +----------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ + | Type | Description | + +====================================================+=========================================================================================================================================+ + | User-defined timeseries generating function (UDTF) | This type of function can take multiple time series as input and generate one time series, which can contain any number of data points. | + +----------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------+ + +UDTF +---- + +To write a UDTF, you need to inherit the **org.apache.iotdb.db.query.udf.api.UDTF** class and implement at least the **beforeStart** method and one **transform** method. + +:ref:`Table 2 ` describes all interfaces that can be implemented by users. + +.. _mrs_01_24513__table13622155265515: + +.. 
table:: **Table 2** Interface description + + +------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+ + | Interface Definition | Description | Mandatory | + +================================================================================================+=============================================================================================================================================================================================================================================================================================================================================================================================================================================================+=========================================================================================+ + | void validate(UDFParameterValidator validator) throws Exception | This method is used to validate **UDFParameters** and is executed before **beforeStart** is called. | No | + +------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+ + | void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception | This is an initialization method used to call the user-defined initialization behavior before the UDTF processes the input data. Each time a user executes a UDTF query, the framework constructs a new UDF instance, and this method is called. It is called only once in the lifecycle of each UDF instance. | Yes | + +------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+ + | void transform(Row row, PointCollector collector) throws Exception | This method is called by the framework. When you choose to use the **RowByRowAccessStrategy** strategy in **beforeStart** to consume raw data, this data processing method is called. 
The input data is passed in by **Row**, and the result is output by **PointCollector**. You need to call the data collection method provided by **collector** in this method to determine the output data. | Use either this method or **transform(RowWindow rowWindow, PointCollector collector)**. | + +------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+ + | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | This method is called by the framework. When you choose to use the **SlidingSizeWindowAccessStrategy** or **SlidingTimeWindowAccessStrategy** strategy in **beforeStart** to consume raw data, this data processing method will be called. The input data is passed in by **RowWindow**, and the result is output by **PointCollector**. You need to call the data collection method provided by **collector** in this method to determine the output data. | Use either this method or **transform(Row row, PointCollector collector)**. | + +------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+ + | void terminate(PointCollector collector) throws Exception | This method is called by the framework. This method is called after all **transform** calls have been executed and before **beforeDestory** is called. In a single UDF query, this method will be called only once. You need to call the data collection method provided by **collector** in this method to determine the output data. | No | + +------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+ + | void beforeDestroy() | This method is called by the framework after the last input data is processed, and will be called only once in the lifecycle of each UDF instance. 
| No | + +------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+ + +**Calling sequence of each method:** + +#. **void validate(UDFParameterValidator validator) throws Exception** +#. **void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception** +#. **void transform(Row row, PointCollector collector) throws Exception** or **void transform(RowWindow rowWindow, PointCollector collector) throws Exception** +#. **void terminate(PointCollector collector) throws Exception** +#. **void beforeDestroy()** + +.. important:: + + Each time the framework executes a UDTF query, a new UDF instance will be constructed. When the query ends, this UDF instance will be destroyed. Therefore, the internal data of the instances in different UDTF queries (even in the same SQL statement) is isolated. You can maintain some state data in the UDTF without considering the impact of concurrency and other factors. + +**Interface usage:** + +- void validate(UDFParameterValidator validator) throws Exception + + The **validate** method is used to validate the parameters entered by users. + + In this method, you can limit the number and types of input time series, check the attributes of user input, or perform any custom logic verification. + +- void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception + + Using this method, you can do the following things: + + - Use **UDFParameters** to get the time series paths and parse the entered key-value pair attributes. + - Set information required for running the UDF. That is, set the strategy to access the raw data and set the output data type in **UDTFConfigurations**. + - Create resources, such as creating external connections and opening files. + +UDFParameters +------------- + +**UDFParameters** is used to parse the UDF parameters in SQL statements (the part in the parentheses following the UDF name in the SQL statements). The parameters include two parts. The first part is the path and its data type of the time series to be processed by the UDF. The second part is the key-value pair attributes for customization. + +Example: + +.. code-block:: + + SELECT UDF(s1, s2, 'key1'='iotdb', 'key2'='123.45') FROM root.sg.d; + +Usage: + +.. code-block:: + + void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // parameters + for (PartialPath path : parameters.getPaths()) { + TSDataType dataType = parameters.getDataType(path); + // do something + } + String stringValue = parameters.getString("key1"); // iotdb + Float floatValue = parameters.getFloat("key2"); // 123.45 + Double doubleValue = parameters.getDouble("key3"); // null + int intValue = parameters.getIntOrDefault("key4", 678); // 678 + // do something + + // configurations + // ... 
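+     // The "configurations" part is where this UDF sets its raw-data access strategy and
+     // its output data type; the UDTFConfigurations section below shows the actual calls.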
+ } + +UDTFConfigurations +------------------ + +You can use **UDTFConfigurations** to specify the strategy used by the UDF to access raw data and the type of the output time series. + +Usage: + +.. code-block:: + + void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // parameters + // ... + + // configurations + configurations + .setAccessStrategy(new RowByRowAccessStrategy()) + .setOutputDataType(TSDataType.INT32); + } + +The **setAccessStrategy** method is used to set the strategy used by the UDF to access raw data. The **setOutputDataType** method is used to set the data type of the output time series. + +- setAccessStrategy + + Note that the raw data access strategy you set here determines which **transform** method the framework will call. Implement the **transform** method corresponding to the raw data access strategy. You can also dynamically decide which strategy to set based on the attribute parameters parsed by **UDFParameters**. Therefore, the two **transform** methods are also allowed to be implemented in one UDF. + + The following are the strategies you can set. + + +---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------+ + | Interface Definition | Description | transform Method to Call | + +=================================+=====================================================================================================================================================================================================================================================================================================================================================================================================================================================================================+================================================================================+ + | RowByRowAccessStrategy | Processes raw data row by row. The framework calls the **transform** method once for each row of raw data input. When a UDF has only one input time series, a row of input is a data point in the input time series. When a UDF has multiple input time series, a row of input is a result record of the raw query (aligned by time) on these input time series. (In a row, there may be a column with a value of **null**, but not all of them are **null**.) 
| void transform(Row row, PointCollector collector) throws Exception | + +---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------+ + | SlidingTimeWindowAccessStrategy | Processes a batch of data in a fixed time interval each time. A data batch is called a window. The framework calls the **transform** method once for each raw data input window. A window may contain multiple rows of data. Each row of data is a result record of the raw query (aligned by time) on these input time series. (In a row, there may be a column with a value of **null**, but not all of them are **null**.) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | + +---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------+ + | SlidingSizeWindowAccessStrategy | Processes raw data batch by batch, and each batch contains a fixed number of raw data rows (except the last batch). A data batch is called a window. The framework calls the **transform** method once for each raw data input window. A window may contain multiple rows of data. Each row of data is a result record of the raw query (aligned by time) on these input time series. (In a row, there may be a column with a value of **null**, but not all of them are **null**.) | void transform(RowWindow rowWindow, PointCollector collector) throws Exception | + +---------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------+ + + The construction of **RowByRowAccessStrategy** does not require any parameters. + + **SlidingTimeWindowAccessStrategy** has multiple constructors, and you can pass the following types of parameters to the constructors: + + - Start time and end time of the display window on the time axis + - Time interval for dividing the time axis (must be positive) + - Time sliding step (not required to be greater than or equal to the time interval, but must be a positive number) + + The display window on the time axis is optional. 
If these parameters are not provided, the start time of the display window will be set to the same as the minimum timestamp of the query result set, and the end time of the display window will be set to the same as the maximum timestamp of the query result set. + + The sliding step parameter is also optional. If the parameter is not provided, the sliding step will be set to the same as the time interval for dividing the time axis. + + The following figure shows the relationship between the three types of parameters. + + |image1| + + Note that the actual time interval of some of the last time windows may be less than the specified time interval parameter. In addition, the number of data rows in some time windows may be 0. In this case, the framework will also call the **transform** method for the empty windows. + + **SlidingSizeWindowAccessStrategy** has multiple constructors, and you can pass the following types of parameters to the constructors: + + - Window size, that is, the number of data rows in a data processing window. Note that the number of data rows in some of the last time windows may be less than the specified number of data rows. + - Sliding step, that is, the number of rows between the first point of the next window and the first point of the current window. (This parameter is not required to be greater than or equal to the window size, but must be a positive number.) + + The sliding step parameter is optional. If this parameter is not provided, the sliding step will be set to the same as the window size. + + Note that the type of output time series you set here determines the type of data that **PointCollector** in the **transform** method can actually receive. The relationship between the output data type set in **setOutputDataType** and the actual data output type that **PointCollector** can receive is as follows. + + +-------------------------------------------+-----------------------------------------------------------+ + | Output Data Type Set in setOutputDataType | Data Type That PointCollector Can Receive | + +===========================================+===========================================================+ + | INT32 | int | + +-------------------------------------------+-----------------------------------------------------------+ + | INT64 | long | + +-------------------------------------------+-----------------------------------------------------------+ + | FLOAT | float | + +-------------------------------------------+-----------------------------------------------------------+ + | DOUBLE | double | + +-------------------------------------------+-----------------------------------------------------------+ + | BOOLEAN | boolean | + +-------------------------------------------+-----------------------------------------------------------+ + | TEXT | java.lang.String and org.apache.iotdb.tsfile.utils.Binary | + +-------------------------------------------+-----------------------------------------------------------+ + +- The type of the output time series of a UDTF is determined at runtime. The UDTF can dynamically determine the type of the output time series according to the type of the input time series. + + Example: + + .. code-block:: + + void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) throws Exception { + // do something + // ... 
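+     // Deriving the output type from the first input series (index 0) lets the same UDF
+     // adapt to whatever data type the query supplies, instead of hard-coding one type.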
+ + configurations + .setAccessStrategy(new RowByRowAccessStrategy()) + .setOutputDataType(parameters.getDataType(0)); + } + + - void transform(Row row, PointCollector collector) throws Exception + + You need to implement this method when you specify the strategy for the UDF to read raw data as **RowByRowAccessStrategy** in **beforeStart**. + + This method processes one row of raw data at a time. The raw data is input from **Row** and output by **PointCollector**. You can choose to output any number of data points in one **transform** call. Note that the type of the output data points must be the same as you set in the **beforeStart** method, and the timestamp of the output data points must be strictly monotonically increasing. + + The following is a complete UDF example that implements the **void transform(Row row, PointCollector collector) throws Exception** method. It is an adder that receives two columns of time series as input. When two data points in a row are not **null**, this UDF will output the algebraic sum of these two data points. + + .. code-block:: + + import org.apache.iotdb.db.query.udf.api.UDTF; + import org.apache.iotdb.db.query.udf.api.access.Row; + import org.apache.iotdb.db.query.udf.api.collector.PointCollector; + import org.apache.iotdb.db.query.udf.api.customizer.config.UDTFConfigurations; + import org.apache.iotdb.db.query.udf.api.customizer.parameter.UDFParameters; + import org.apache.iotdb.db.query.udf.api.customizer.strategy.RowByRowAccessStrategy; + import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; + + public class Adder implements UDTF { + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + configurations + .setOutputDataType(TSDataType.INT64) + .setAccessStrategy(new RowByRowAccessStrategy()); + } + + @Override + public void transform(Row row, PointCollector collector) throws Exception { + if (row.isNull(0) || row.isNull(1)) { + return; + } + collector.putLong(row.getTime(), row.getLong(0) + row.getLong(1)); + } + } + + - void transform(RowWindow rowWindow, PointCollector collector) throws Exception + + You need to implement this method when you specify the strategy for the UDF to read raw data as **SlidingTimeWindowAccessStrategy** or **SlidingSizeWindowAccessStrategy**. + + This method processes a batch of data in a fixed number of rows or a fixed time interval each time, and the container containing this batch of data is called a window. The raw data is input from **RowWindow** and output by **PointCollector**. **RowWindow** can help you access a batch of rows, and it provides a set of interfaces for random access and iterative access to this batch of rows. You can choose to output any number of data points in one **transform** call. Note that the type of output data points must be the same as you set in the **beforeStart** method, and the timestamps of output data points must be strictly monotonically increasing. + + The following is a complete UDF example that implements the **void transform(RowWindow rowWindow, PointCollector collector) throws Exception** method. It is a counter that receives any number of time series as input, and its function is to count and output the number of data rows in each time window within a specified time range. + + .. 
code-block:: + + import java.io.IOException; + import org.apache.iotdb.db.query.udf.api.UDTF; + import org.apache.iotdb.db.query.udf.api.access.RowWindow; + import org.apache.iotdb.db.query.udf.api.collector.PointCollector; + import org.apache.iotdb.db.query.udf.api.customizer.config.UDTFConfigurations; + import org.apache.iotdb.db.query.udf.api.customizer.parameter.UDFParameters; + import org.apache.iotdb.db.query.udf.api.customizer.strategy.SlidingTimeWindowAccessStrategy; + import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; + + public class Counter implements UDTF { + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + configurations + .setOutputDataType(TSDataType.INT32) + .setAccessStrategy(new SlidingTimeWindowAccessStrategy( + parameters.getLong("time_interval"), + parameters.getLong("sliding_step"), + parameters.getLong("display_window_begin"), + parameters.getLong("display_window_end"))); + } + + @Override + public void transform(RowWindow rowWindow, PointCollector collector) throws Exception { + if (rowWindow.windowSize() != 0) { + collector.putInt(rowWindow.getRow(0).getTime(), rowWindow.windowSize()); + } + } + } + + - void terminate(PointCollector collector) throws Exception + + In some scenarios, a UDF needs to traverse all the raw data to calculate the final output data points. The **terminate** interface provides support for those scenarios. + + This method is called after all **transform** calls have been executed and before **beforeDestory** is called. You can implement the **transform** method to perform pure data processing, and implement the **terminate** method to output the processing results. + + The processing results need to be output by **PointCollector**. You can choose to output any number of data points in one **terminate** call. Note that the type of the output data points must be the same as you set in the **beforeStart** method, and the timestamp of the output data points must be strictly monotonically increasing. + + The following is a complete UDF example that implements the **void terminate(PointCollector collector) throws Exception** method. It takes one time series whose data type is **INT32** as input, and outputs the maximum value point of the series. + + .. 
code-block:: + + import java.io.IOException; + import org.apache.iotdb.db.query.udf.api.UDTF; + import org.apache.iotdb.db.query.udf.api.access.Row; + import org.apache.iotdb.db.query.udf.api.collector.PointCollector; + import org.apache.iotdb.db.query.udf.api.customizer.config.UDTFConfigurations; + import org.apache.iotdb.db.query.udf.api.customizer.parameter.UDFParameters; + import org.apache.iotdb.db.query.udf.api.customizer.strategy.RowByRowAccessStrategy; + import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType; + + public class Max implements UDTF { + + private Long time; + private int value; + + @Override + public void beforeStart(UDFParameters parameters, UDTFConfigurations configurations) { + configurations + .setOutputDataType(TSDataType.INT32) + .setAccessStrategy(new RowByRowAccessStrategy()); + } + + @Override + public void transform(Row row, PointCollector collector) { + int candidateValue = row.getInt(0); + if (time == null || value < candidateValue) { + time = row.getTime(); + value = candidateValue; + } + } + + @Override + public void terminate(PointCollector collector) throws IOException { + if (time != null) { + collector.putInt(time, value); + } + } + } + + - void beforeDestroy() + + This method is used to terminate a UDF. + + This method is called by the framework. For a UDF instance, **beforeDestroy** will be called after the last record is processed. In the entire lifecycle of the instance, **beforeDestroy** will be called only once. + +.. |image1| image:: /_static/images/en-us_image_0000001583272185.png diff --git a/doc/component-operation-guide-lts/source/using_iotdb/using_iotdb_from_scratch.rst b/doc/component-operation-guide-lts/source/using_iotdb/using_iotdb_from_scratch.rst new file mode 100644 index 0000000..1ab4a22 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/using_iotdb_from_scratch.rst @@ -0,0 +1,148 @@ +:original_name: mrs_01_24157.html + +.. _mrs_01_24157: + +Using IoTDB from Scratch +======================== + +IoTDB is a data management engine that integrates collection, storage, and analysis of time series data. It features lightweight, high performance, and ease of use. It perfectly interconnects with the Hadoop and Spark ecosystems and meets the requirements of high-speed write and complex analysis and query on massive time series data in industrial IoT applications. + +Background +---------- + +Assume that a group has three production lines with five devices on each. Sensors collect indicators (such as temperature, speed, and running status) of these devices in real time, as shown in :ref:`Figure 1 `. The service process of storing and managing data using IoTDB is as follows: + +#. Create a storage group named **root.**\ *Group name* to represent the group. +#. Create time series to store the device indicators. +#. Simulate sensors and record indicators. +#. Run SQL statements to query indicators. +#. After the service is complete, delete the stored data. + +.. _mrs_01_24157__fig129001748528: + +.. figure:: /_static/images/en-us_image_0000001532951892.png + :alt: **Figure 1** Data structure + + **Figure 1** Data structure + +Procedure +--------- + +#. Log in to the client. + + a. Log in to the node where the client is installed as the client installation user and run the following command to switch to the client installation directory, for example, **/opt/client**. + + **cd /opt/client** + + b. Run the following command to configure environment variables: + + **source bigdata_env** + + c. .. 
_mrs_01_24157__li10203101511564: + + If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the current user. The current user must have the permission to create IoTDB tables. For details, see :ref:`IoTDB Permission Management `. If Kerberos authentication is disabled for the current cluster, skip this step. + + **kinit** *MRS cluster user* + + Example: + + **kinit iotdbuser** + +#. Run the following command to switch to the directory where the script for running IoTDB client is stored: + + **cd /opt/client/IoTDB/iotdb/sbin** + +#. Run the following command to log in to the client: + + **./start-cli.sh -h** *IP address of the IoTDBServer instance node* **-p** *IoTDBServer RPC port* + + The default IoTDBServer RPC port number is **22260**, which can be configured in the **rpc_port** parameter. + + After you run this command, specify the service username as required. + + - To specify the service username, enter **yes** and enter the service username and password as prompted. + + |image1| + + - If you will not specify the service username, enter **no**. In this case, you will perform subsequent operations as the user in :ref:`1.c `. + + |image2| + + - If you enter other information, you will log out. + + |image3| + +#. Create a storage group named **root.company** based on :ref:`Figure 1 `. + + **set storage group to root.company;** + +#. Create corresponding time series for sensors of the devices on the production line. + + **create timeseries root.company.line1.device1.spin WITH DATATYPE=FLOAT, ENCODING=RLE;** + + **create timeseries root.company.line1.device1.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN;** + + **create timeseries root.company.line1.device2.temperature WITH DATATYPE=FLOAT, ENCODING=RLE;** + + **create timeseries root.company.line1.device2.power WITH DATATYPE=FLOAT, ENCODING=RLE;** + + **create timeseries root.company.line2.device1.temperature WITH DATATYPE=FLOAT, ENCODING=RLE;** + + **create timeseries root.company.line2.device1.speed WITH DATATYPE=FLOAT, ENCODING=RLE;** + + **create timeseries root.company.line2.device2.speed WITH DATATYPE=FLOAT, ENCODING=RLE;** + + **create timeseries root.company.line2.device2.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN;** + +#. Adds data to time series. + + **insert into root.company.line1.device1(timestamp, spin) values (now(), 6684.0);** + + **insert into root.company.line1.device1(timestamp, status) values (now(), false);** + + **insert into root.company.line1.device2(timestamp, temperature) values (now(), 66.7);** + + **insert into root.company.line1.device2(timestamp, power) values (now(), 996.4);** + + **insert into root.company.line2.device1(timestamp, temperature) values (now(), 2684.0);** + + **insert into root.company.line2.device1(timestamp, speed) values (now(), 120.23);** + + **insert into root.company.line2.device2(timestamp, speed) values (now(), 130.56);** + + **insert into root.company.line2.device2(timestamp, status) values (now(), false);** + +#. Query indicators of all devices on the production line 1. + + **select \* from root.company.line1.**;** + + .. 
code-block:: + + +-----------------------------+-------------------------------+---------------------------------+--------------------------------------+--------------------------------+ + | Time|root.company.line1.device1.spin|root.company.line1.device1.status|root.company.line1.device2.temperature|root.company.line1.device2.power| + +-----------------------------+-------------------------------+---------------------------------+--------------------------------------+--------------------------------+ + |2021-06-17T11:29:08.131+08:00| 6684.0| null| null| null| + |2021-06-17T11:29:08.220+08:00| null| false| null| null| + |2021-06-17T11:29:08.249+08:00| null| null| 66.7| null| + |2021-06-17T11:29:08.282+08:00| null| null| null| 996.4| + +-----------------------------+-------------------------------+---------------------------------+--------------------------------------+--------------------------------+ + +#. Delete all device indicators on the production line 2. + + **delete timeseries root.company.line2.*;** + + Query the indicator data on production line 2. The result shows no indicator data exists. + + **select \* from root.company.line2.**;** + + .. code-block:: + + +----+ + |Time| + +----+ + +----+ + Empty set. + +.. |image1| image:: /_static/images/en-us_image_0000001583391869.png +.. |image2| image:: /_static/images/en-us_image_0000001582952105.png +.. |image3| image:: /_static/images/en-us_image_0000001532632212.png diff --git a/doc/component-operation-guide-lts/source/using_iotdb/using_the_iotdb_client.rst b/doc/component-operation-guide-lts/source/using_iotdb/using_the_iotdb_client.rst new file mode 100644 index 0000000..dd1b3a5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_iotdb/using_the_iotdb_client.rst @@ -0,0 +1,94 @@ +:original_name: mrs_01_24158.html + +.. _mrs_01_24158: + +Using the IoTDB Client +====================== + +Scenario +-------- + +This section describes how to use the IoTDB client in the O&M or service scenario. + +Prerequisites +------------- + +- The client has been installed. For example, the installation directory is **/opt/client**. The client directory in the following operations is only an example. Change it based on the actual installation directory onsite. +- Service component users have been created by the MRS cluster administrator. In security mode, machine-machine users need to download the keytab file. A human-machine user must change the password upon the first login. + +Procedure +--------- + +#. Log in to the node where the client is installed as the client installation user. + +#. Run the following command to go to the client installation directory: + + **cd /opt/client** + +#. Run the following command to configure environment variables: + + **source bigdata_env** + +#. Run the following command to generate a client SSL certificate: + + **keytool -noprompt -import -alias myservercert -file ca.crt -keystore truststore.jks** + + After running this command, you are required to set a password. + +#. Copy the generated **truststore.jks** file to the **Client installation directory/IoTDB/iotdb/conf** directory. + + **cp truststore.jks** *Client installation directory*\ **/IoTDB/iotdb/conf** + +#. .. _mrs_01_24158__li15263191081118: + + Log in to the IoTDB client based on the cluster authentication mode. + + - In security mode, run the following command to authenticate the user and log in to the IoTDB client: + + **kinit** *Component service user* + + - Skip this step in normal mode. + +#. 
Run the following command to switch to the directory where the IoTDB client running script is stored: + + **cd /opt/client/IoTDB/iotdb/sbin** + +#. Run the following command to log in to the client: + + **./start-cli.sh** **-h** *IP address of the IoTDBServer instance node* **-p** *IoTDBServer RPC port* + + After you run this command, specify the service username as required. + + - To specify the service username, enter **yes** and enter the service username and password as prompted. + + |image1| + + - If you will not specify the service username, enter **no**. In this case, you will perform subsequent operations as the user in :ref:`6 `. + + |image2| + + - If you enter other information, you will log out. + + |image3| + + .. note:: + + - When you log in to the client, you can configure the **-maxRPC** parameter to control the number of lines of execution results to be printed at a time. The default value is **1000**. If the value of **-maxRPC** is less than or equal to 0, all results are printed at a time. This parameter is typically used to redirect SQL execution results. + + - Meanwhile, you can optionally use the **-disableISO8601** parameter to control the display format of the time column in the query result. If this parameter is not specified, the time is displayed in YYYYMMDDHHMMSS format. If this parameter is specified, the timestamp is displayed. + + - If the SSL configuration is disabled on the server, you need to disable it on the client as follows: + + **cd** *Client installation directory*\ **/IoTDB/iotdb/conf** + + **vi iotdb-client.env** + + Change the value of **iotdb_ssl_enable** to **false**, save the configuration, and exit. + + To check the SSL configuration of the server, log in to FusionInsight Manager, choose **Cluster** > **Services** > **IoTDB** > **Configurations**, and search for **SSL_ENABLE**. Value **true** indicates that SSL is enabled, and value **false** indicates that it is disabled. + +#. After logging in to the client, you can run SQL statements. + +.. |image1| image:: /_static/images/en-us_image_0000001532791960.png +.. |image2| image:: /_static/images/en-us_image_0000001583151877.png +.. |image3| image:: /_static/images/en-us_image_0000001583391873.png diff --git a/doc/component-operation-guide-lts/source/using_kafka/configuring_intranet_and_extranet_access_for_kafka.rst b/doc/component-operation-guide-lts/source/using_kafka/configuring_intranet_and_extranet_access_for_kafka.rst new file mode 100644 index 0000000..a517dd9 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_kafka/configuring_intranet_and_extranet_access_for_kafka.rst @@ -0,0 +1,78 @@ +:original_name: mrs_01_24576.html + +.. _mrs_01_24576: + +Configuring Intranet and Extranet Access for Kafka +================================================== + +This section applies to MRS 3.2.0 or later. + +Scenario +-------- + +To access Kafka Broker deployed on a private network from the Kafka client via the Internet, enable the Kafka private and public network traffic distribution function. + +Prerequisites +------------- + +- The node where Broker resides has both private and public IP addresses. Broker is bound to the private IP address and cannot be accessed via the Internet. Alternatively, the node where Broker resides has only private IP addresses, and external services access the private network through gateway mapping. +- The ZooKeeper service is running properly. +- The Kafka instance status and disk status are normal. + +Procedure +--------- + +#. 
Log in to FusionInsight Manager. + +#. .. _mrs_01_24576__li718292175116: + + Choose **Cluster** > **Services** > **Kafka**. On the page that is displayed, click the **Instance** tab. On this tab page, select the target broker instance in the instance list. On the displayed page, click the **Instance Configurations** tab and then the **All Configurations** sub-tab. Enter **broker.id** in the search box to view and record the broker ID of the current broker instance. + +#. Repeat :ref:`2 ` to view and record the broker ID of each broker instance. + +#. Choose **Cluster** > **Services** > **Kafka**. On the page that is displayed, click the **Configurations** tab then the **All Configurations** sub-tab. On this sub-tab page, click **Broker(Role)** and select **Server**. Enter **advertised** and **actual** in the search box. The five configuration items shown in the following figure are displayed. Configure the parameters according to :ref:`Table 1 `. + + |image1| + + |image2| + + .. _mrs_01_24576__table25971414778: + + .. table:: **Table 1** Parameter description + + +-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | Remarks | + +===============================+============================================================================================================================================================================================+============================================================================================================================================================================================================================================================================+ + | enable.advertised.listener | Whether to enable the advertised.listeners configuration. The default value is **false**. | Set **enable.advertised.listener** to **true**. | + | | | | + | | | .. note:: | + | | | | + | | | When you install the Kafka service, do not set this parameter to **true**. You can set this parameter to **true** only after broker instances and ZooKeeper are running properly. | + +-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | advertised.broker.id.ip.map | IP address released by Kafka. This parameter is left blank by default. | Map the broker ID of each broker instance recorded in :ref:`2 ` to the IP address to which the broker instance to bound. | + | | | | + | | Format: *Broker ID:IP*. Multiple brokers can use the same IP address. If multiple mappings are configured, use commas (,) to separate them. 
| For example, if there are three broker instances and one IP address, the broker IDs are **1**, **2**, and **3**, and the IP address is **10.**\ *xxx.xxx.xxx*, the configuration format is **1:10.**\ *xxx.xxx.xxx*\ **,2:10.**\ *xxx.xxx.xxx*\ **,3:10.**\ *xxx.xxx.xxx*. | + +-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | advertised.broker.id.port.map | Port released by Kafka. This parameter is left blank by default. | Map the broker ID of each broker instance recorded in :ref:`2 ` to the port to which the broker instance to bound. | + | | | | + | | Format: *Broker ID:Port*. **Port** indicates the port to be bound. This port is a custom port and must be available. If multiple mappings are configured, use commas (,) to separate them. | For example, if there are three broker instances and three ports, the broker IDs are **1**, **2**, and **3**, and the ports are **3307**, **3308**, and **3309**, the configuration format is **1:3307,2:3308,3:3309**. | + +-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | actual.broker.id.ip.map | IP address bound to Kafka. This parameter is left blank by default. | Map the broker ID of each broker instance recorded in :ref:`2 ` to the IP address to which the broker instance to bound. | + | | | | + | | Format: *Broker ID:IP*. If multiple mappings are configured, use commas (,) to separate them. | For example, if there are three broker instances and one IP address, the broker IDs are **1**, **2**, and **3**, and the IP address is **10.**\ *xxx.xxx.xxx*, the configuration format is **1:10.**\ *xxx.xxx.xxx*\ **,2:10.**\ *xxx.xxx.xxx*\ **,3:10.**\ *xxx.xxx.xxx*. | + +-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | actual.broker.id.port.map | Port bound to Kafka. This parameter is left blank by default. | Map the broker ID of each broker instance recorded in :ref:`2 ` to the port to which the broker instance to bound. | + | | | | + | | Format: *Broker ID:Port*. **Port** indicates the port to be bound. This port is a custom port and must be available. If multiple mappings are configured, use commas (,) to separate them. 
| For example, if there are three broker instances and three ports, the broker IDs are **1**, **2**, and **3**, and the ports are **3307**, **3308**, and **3309**, the configuration format is **1:3307,2:3308,3:3309**. | + +-------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. After the configuration is complete, click **Save** in the upper left corner. On the **Instance** tab, select the target broker instances and choose **More** > **Instance Rolling Restart**. Wait until the rolling restart is complete. + +#. (Optional) To disable this configuration, set **enable.advertised.listener** to **false** and click **Save**. On the **Instance** page of Kafka, select Kafka instances, choose **More** > **Instance Rolling Restart**, and wait until the rolling restart is complete. + +.. note:: + + In a cluster with Kerberos authentication enabled, after **enable.advertised.listener** is configured, the client supports only Kerberos authentication, but not PLAIN authentication. + +.. |image1| image:: /_static/images/en-us_image_0000001583349121.png +.. |image2| image:: /_static/images/en-us_image_0000001532709204.png diff --git a/doc/component-operation-guide-lts/source/using_kafka/index.rst b/doc/component-operation-guide-lts/source/using_kafka/index.rst index 1f1213e..57dd885 100644 --- a/doc/component-operation-guide-lts/source/using_kafka/index.rst +++ b/doc/component-operation-guide-lts/source/using_kafka/index.rst @@ -24,6 +24,8 @@ Using Kafka - :ref:`Using Kafka UI ` - :ref:`Introduction to Kafka Logs ` - :ref:`Performance Tuning ` +- :ref:`Migrating Data Between Kafka Nodes ` +- :ref:`Configuring Intranet and Extranet Access for Kafka ` - :ref:`Common Issues About Kafka ` .. toctree:: @@ -49,4 +51,6 @@ Using Kafka using_kafka_ui/index introduction_to_kafka_logs performance_tuning/index + migrating_data_between_kafka_nodes + configuring_intranet_and_extranet_access_for_kafka common_issues_about_kafka/index diff --git a/doc/component-operation-guide-lts/source/using_kafka/introduction_to_kafka_logs.rst b/doc/component-operation-guide-lts/source/using_kafka/introduction_to_kafka_logs.rst index db0270a..0ca268d 100644 --- a/doc/component-operation-guide-lts/source/using_kafka/introduction_to_kafka_logs.rst +++ b/doc/component-operation-guide-lts/source/using_kafka/introduction_to_kafka_logs.rst @@ -10,7 +10,8 @@ Log Description **Log paths**: The default storage path of Kafka logs is **/var/log/Bigdata/kafka**. The default storage path of audit logs is **/var/log/Bigdata/audit/kafka**. -Broker: **/var/log/Bigdata/kafka/broker** (run logs) +- Broker: **/var/log/Bigdata/kafka/broker** (run logs) +- Kafka UI: **/var/log/Bigdata/kafka/ui** (run logs) Log archive rule: The automatic Kafka log compression function is enabled. By default, when the size of logs exceeds 30 MB, logs are automatically compressed into a log file named in the following format: *-.[ID].*\ **log.zip**. A maximum of 20 latest compressed files are retained by default. You can configure the number of compressed files and the compression threshold. 
@@ -60,16 +61,46 @@ Log archive rule: The automatic Kafka log compression function is enabled. By de | | audit.log | Authentication log of the Ranger authentication plug-in. This log is archived in the **/var/log/Bigdata/audit/kafka** directory. | +---------+--------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------+ +.. table:: **Table 2** Kafka UI log list + + +-----------------------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | Type | Log File Name | Description | + +=======================+================================+===============================================================================================================================+ + | Run log | kafka-ui.log | Run log of the Kafka UI process | + +-----------------------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | postinstall.log | Work log after Kafka UI installation | + +-----------------------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | cleanup.log | Cleanup log of Kafka UI uninstallation | + +-----------------------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | prestart.log | Work log before Kafka UI startup | + +-----------------------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | ranger-kafka-plugin-enable.log | Log that records the Ranger plug-ins enabled by Kafka UI | + +-----------------------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | start.log | Startup log of the Kafka UI process | + +-----------------------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | stop.log | Stop log of the Kafka UI process | + +-----------------------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | start.out | Kafka UI process startup information | + +-----------------------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | Audit log | audit.log | Audit log of the KafkaUI service | + +-----------------------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | Authentication log | kafka-authorizer.log | Run log file of the open-source authentication plug-in of Kafka. | + | | | | + | | | This log is archived in the **/var/log/Bigdata/audit/kafka/kafkaui** directory. 
| + +-----------------------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + | | ranger-authorizer.log | Run log of the Ranger authentication plug-in. This log is archived in the **/var/log/Bigdata/audit/kafka/kafkaui** directory. | + +-----------------------+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+ + Log Level --------- -:ref:`Table 2 ` describes the log levels supported by Kafka. +:ref:`Table 3 ` describes the log levels supported by Kafka. Levels of run logs are ERROR, WARN, INFO, and DEBUG from the highest to the lowest priority. Run logs of equal or higher levels are recorded. The higher the specified log level, the fewer the logs recorded. .. _mrs_01_1042__en-us_topic_0000001219029335_tdd8d04c16fc9471c90e233547cf6579c: -.. table:: **Table 2** Log levels +.. table:: **Table 3** Log levels +-------+------------------------------------------------------------------------------------------+ | Level | Description | @@ -95,7 +126,7 @@ Log Format The following table describes the Kafka log format. -.. table:: **Table 3** Log formats +.. table:: **Table 4** Log formats +---------+------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------+ | Type | Format | Example | diff --git a/doc/component-operation-guide-lts/source/using_kafka/migrating_data_between_kafka_nodes.rst b/doc/component-operation-guide-lts/source/using_kafka/migrating_data_between_kafka_nodes.rst new file mode 100644 index 0000000..618c5f5 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_kafka/migrating_data_between_kafka_nodes.rst @@ -0,0 +1,111 @@ +:original_name: mrs_01_24534.html + +.. _mrs_01_24534: + +Migrating Data Between Kafka Nodes +================================== + +This section applies to MRS 3.2.0 or later. + +Scenario +-------- + +This section describes how to use Kafka client commands to migrate partition data between disks on a node without stopping the Kafka service. + +Prerequisites +------------- + +- The MRS cluster administrator has understood service requirements and prepared a Kafka user (belonging to the **kafkaadmin** group. It is not required for the normal mode.). +- The Kafka client has been installed. +- The Kafka instance status and disk status are normal. +- Based on the current disk space usage of the partition to be migrated, ensure that the disk space will be sufficient after the migration. + +Procedure +--------- + +#. Log in as a client installation user to the node on which the Kafka client is installed. + +#. Run the following command to switch to the Kafka client installation directory, for example, **/opt/kafkaclient**: + + **cd /opt/kafkaclient** + +#. Run the following command to set environment variables: + + **source bigdata_env** + +#. Run the following command to authenticate the user (skip this step in normal mode): + + **kinit** *Component service user* + +#. Run the following command to switch to the Kafka client directory: + + **cd Kafka/kafka/bin** + +#. .. 
_mrs_01_24534__li420725319552: + + Run the following command to view the topic details of the partition to be migrated: + + **Security mode:** + + **./kafka-topics.sh --describe --bootstrap-server** *IP address of the Kafka cluster:21007* **--command-config ../config/client.properties** **--topic** *Topic name* + + **Normal mode:** + + **./kafka-topics.sh --describe --bootstrap-server** *IP address of the Kafka cluster:21005* **--command-config ../config/client.properties** **--topic** *Topic name* + + |image1| + +#. .. _mrs_01_24534__li1824951465613: + + Run the following command to query the mapping between **Broker_ID** and the IP address: + + **./kafka-broker-info.sh --zookeeper** *IP address of the ZooKeeper quorumpeer instance*:*ZooKeeper port number*\ **/kafka** + + .. code-block:: + + Broker_ID IP_Address + -------------------------- + 4 192.168.0.100 + 5 192.168.0.101 + 6 192.168.0.102 + + .. note:: + + - IP address of the ZooKeeper quorumpeer instance + + To obtain IP addresses of all ZooKeeper quorumpeer instances, log in to FusionInsight Manager and choose **Cluster** > **Services** > **ZooKeeper**. On the displayed page, click **Instance** and view the IP addresses of all the hosts where the quorumpeer instances are located. + + - Port number of the ZooKeeper client + + Log in to FusionInsight Manager and choose **Cluster** > **Services** > **ZooKeeper**. On the displayed page, click **Configurations** and check the value of **clientPort**. The default value is **24002**. + +#. .. _mrs_01_24534__li1230081019282: + + Obtain the partition distribution and node information from the command output in :ref:`6 ` and :ref:`7 `, and create the JSON file for reallocation in the current directory. + + To migrate data in the partition whose **Broker_ID** is **6** to the **/srv/BigData/hadoop/data1/kafka-logs** directory, the required JSON configuration file is as follows: + + .. code-block:: + + {"partitions":[{"topic": "testws","partition": 2,"replicas": [6,5],"log_dirs": ["/srv/BigData/hadoop/data1/kafka-logs","any"]}],"version":1} + + .. note:: + + - **topic** indicates the topic name, for example, **testws**. + - **partition** indicates the topic partition. + - The number in **replicas** corresponds to **Broker_ID**. + - **log_dirs** indicates the path of the disk to be migrated. In this example, **log_dirs** of the node whose **Broker_ID** is **5** is set to **any**, and that of the node whose **Broker_ID** is **6** is set to **/srv/BigData/hadoop/data1/kafka-logs**. Note that the path must correspond to the node. + +#. Run the following command to perform reallocation: + + **Security mode:** + + **./kafka-reassign-partitions.sh** **--bootstrap-server** *Service IP address of Broker*\ **:21007** **--command-config ../config/client.properties** **--zookeeper** *{zk_host}:{port}*\ **/kafka** **--reassignment-json-file** *Path of the JSON file compiled in :ref:`8 `* **--execute** + + **Normal mode:** + + **./kafka-reassign-partitions.sh** **--bootstrap-server** *Service IP address of Broker*\ **:21005** **--command-config ../config/client.properties** **--zookeeper** *{zk_host}:{port}*\ **/kafka** **--reassignment-json-file** *Path of the JSON file compiled in :ref:`8 `* **--execute** + + If the message "Successfully started reassignment of partitions" is displayed, the execution is successful. + +..
|image1| image:: /_static/images/en-us_image_0000001583468825.png diff --git a/doc/component-operation-guide-lts/source/using_loader/exporting_data/index.rst b/doc/component-operation-guide-lts/source/using_loader/exporting_data/index.rst index 9e18339..54ab502 100644 --- a/doc/component-operation-guide-lts/source/using_loader/exporting_data/index.rst +++ b/doc/component-operation-guide-lts/source/using_loader/exporting_data/index.rst @@ -5,7 +5,7 @@ Exporting Data ============== -- :ref:`Overview ` +- :ref:`Loader Exporting Data Overview ` - :ref:`Using Loader to Export Data ` - :ref:`Typical Scenario: Exporting Data from HDFS/OBS to an SFTP Server ` - :ref:`Typical Scenario: Exporting Data from HBase to an SFTP Server ` @@ -21,7 +21,7 @@ Exporting Data :maxdepth: 1 :hidden: - overview + loader_exporting_data_overview using_loader_to_export_data typical_scenario_exporting_data_from_hdfs_obs_to_an_sftp_server typical_scenario_exporting_data_from_hbase_to_an_sftp_server diff --git a/doc/component-operation-guide-lts/source/using_loader/exporting_data/overview.rst b/doc/component-operation-guide-lts/source/using_loader/exporting_data/loader_exporting_data_overview.rst similarity index 98% rename from doc/component-operation-guide-lts/source/using_loader/exporting_data/overview.rst rename to doc/component-operation-guide-lts/source/using_loader/exporting_data/loader_exporting_data_overview.rst index 5e6e984..889cf60 100644 --- a/doc/component-operation-guide-lts/source/using_loader/exporting_data/overview.rst +++ b/doc/component-operation-guide-lts/source/using_loader/exporting_data/loader_exporting_data_overview.rst @@ -2,8 +2,8 @@ .. _mrs_01_1101: -Overview -======== +Loader Exporting Data Overview +============================== Description ----------- diff --git a/doc/component-operation-guide-lts/source/using_loader/importing_data/index.rst b/doc/component-operation-guide-lts/source/using_loader/importing_data/index.rst index 1b1d995..ace00d3 100644 --- a/doc/component-operation-guide-lts/source/using_loader/importing_data/index.rst +++ b/doc/component-operation-guide-lts/source/using_loader/importing_data/index.rst @@ -5,7 +5,7 @@ Importing Data ============== -- :ref:`Overview ` +- :ref:`Loader Importing Data Overview ` - :ref:`Importing Data Using Loader ` - :ref:`Typical Scenario: Importing Data from an SFTP Server to HDFS or OBS ` - :ref:`Typical Scenario: Importing Data from an SFTP Server to HBase ` @@ -24,7 +24,7 @@ Importing Data :maxdepth: 1 :hidden: - overview + loader_importing_data_overview importing_data_using_loader typical_scenario_importing_data_from_an_sftp_server_to_hdfs_or_obs typical_scenario_importing_data_from_an_sftp_server_to_hbase diff --git a/doc/component-operation-guide-lts/source/using_loader/importing_data/overview.rst b/doc/component-operation-guide-lts/source/using_loader/importing_data/loader_importing_data_overview.rst similarity index 98% rename from doc/component-operation-guide-lts/source/using_loader/importing_data/overview.rst rename to doc/component-operation-guide-lts/source/using_loader/importing_data/loader_importing_data_overview.rst index 7219f57..be2b6ea 100644 --- a/doc/component-operation-guide-lts/source/using_loader/importing_data/overview.rst +++ b/doc/component-operation-guide-lts/source/using_loader/importing_data/loader_importing_data_overview.rst @@ -2,8 +2,8 @@ .. 
_mrs_01_1087: -Overview -======== +Loader Importing Data Overview +============================== Description ----------- diff --git a/doc/component-operation-guide-lts/source/using_loader/job_management/index.rst b/doc/component-operation-guide-lts/source/using_loader/job_management/index.rst index 6e10836..c992e52 100644 --- a/doc/component-operation-guide-lts/source/using_loader/job_management/index.rst +++ b/doc/component-operation-guide-lts/source/using_loader/job_management/index.rst @@ -10,6 +10,7 @@ Job Management - :ref:`Importing Loader Jobs in Batches ` - :ref:`Exporting Loader Jobs in Batches ` - :ref:`Viewing Historical Job Information ` +- :ref:`Purging Historical Loader Data ` .. toctree:: :maxdepth: 1 @@ -20,3 +21,4 @@ Job Management importing_loader_jobs_in_batches exporting_loader_jobs_in_batches viewing_historical_job_information + purging_historical_loader_data diff --git a/doc/component-operation-guide-lts/source/using_loader/job_management/purging_historical_loader_data.rst b/doc/component-operation-guide-lts/source/using_loader/job_management/purging_historical_loader_data.rst new file mode 100644 index 0000000..49eb047 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_loader/job_management/purging_historical_loader_data.rst @@ -0,0 +1,44 @@ +:original_name: mrs_01_24813.html + +.. _mrs_01_24813: + +Purging Historical Loader Data +============================== + +This section applies to MRS 3.2.0 or later. + +Scenario +-------- + +Loader accumulates a large amount of historical data during service running. The historical data may affect job submission, running, and status query, and even cause page freezing and job running failures. Therefore, you need to properly configure the historical data purge policy based on the Loader service data volume. + +Procedure +--------- + +#. Log in to FusionInsight Manager. + +#. Choose **Cluster** > **Services** > **Loader** and click the **Configurations** tab and then **All Configurations**. In the navigation pane on the left, choose **LoaderServer(Role)** > **Purge**. Then adjust the parameter settings shown in the following figure by referring to :ref:`Table 1 `. + + |image1| + + .. _mrs_01_24813__table48037266219: + + .. table:: **Table 1** Parameters for purging historical Loader data + + +------------------------------------+---------------------------------------------------------------------------------------------------------------------------+-------------------+ + | Parameter | Description                                                                                                                 | Recommended Value | + +====================================+=============================================================================================================================+===================+ + | loader.submission.purge.interval | Interval for invoking the purge task, in minutes.                                                                            | 60 | + +------------------------------------+---------------------------------------------------------------------------------------------------------------------------+-------------------+ + | loader.submission.purge.limited | Number of submissions that are retained during the purge. This prevents historical job records from being totally purged.    | 0 | + +------------------------------------+---------------------------------------------------------------------------------------------------------------------------+-------------------+ + | loader.submission.purge.record.max | Maximum number of records that can be retained in a Loader job. Value **0** indicates that the number is not limited. 
| 7 | + +------------------------------------+---------------------------------------------------------------------------------------------------------------------------+-------------------+ + | loader.submission.purge.threshold | Duration for retaining historical records, in hours. | 24 | + +------------------------------------+---------------------------------------------------------------------------------------------------------------------------+-------------------+ + +#. Click **Save**. + +#. Click **Dashboard** to go to the Loader service page. Click **More** and select **Restart Service**. Verify the identity and click **OK**. Wait until the restart is successful. + +.. |image1| image:: /_static/images/en-us_image_0000001532549720.png diff --git a/doc/component-operation-guide-lts/source/using_loader/operator_help/index.rst b/doc/component-operation-guide-lts/source/using_loader/operator_help/index.rst index 40d21b0..99aef79 100644 --- a/doc/component-operation-guide-lts/source/using_loader/operator_help/index.rst +++ b/doc/component-operation-guide-lts/source/using_loader/operator_help/index.rst @@ -5,7 +5,7 @@ Operator Help ============= -- :ref:`Overview ` +- :ref:`Loader Operator Overview ` - :ref:`Input Operators ` - :ref:`Conversion Operators ` - :ref:`Output Operators ` @@ -17,7 +17,7 @@ Operator Help :maxdepth: 1 :hidden: - overview + loader_operator_overview input_operators/index conversion_operators/index output_operators/index diff --git a/doc/component-operation-guide-lts/source/using_loader/operator_help/overview.rst b/doc/component-operation-guide-lts/source/using_loader/operator_help/loader_operator_overview.rst similarity index 99% rename from doc/component-operation-guide-lts/source/using_loader/operator_help/overview.rst rename to doc/component-operation-guide-lts/source/using_loader/operator_help/loader_operator_overview.rst index f2012e2..bdaa435 100644 --- a/doc/component-operation-guide-lts/source/using_loader/operator_help/overview.rst +++ b/doc/component-operation-guide-lts/source/using_loader/operator_help/loader_operator_overview.rst @@ -2,8 +2,8 @@ .. _mrs_01_1120: -Overview -======== +Loader Operator Overview +======================== Conversion Process ------------------ diff --git a/doc/component-operation-guide-lts/source/using_mapreduce/common_issues_about_mapreduce/why_does_it_take_a_long_time_to_run_a_task_upon_resourcemanager_active_standby_switchover.rst b/doc/component-operation-guide-lts/source/using_mapreduce/common_issues_about_mapreduce/why_does_it_take_a_long_time_to_run_a_task_upon_resourcemanager_active_standby_switchover.rst index af2fd68..1bbad16 100644 --- a/doc/component-operation-guide-lts/source/using_mapreduce/common_issues_about_mapreduce/why_does_it_take_a_long_time_to_run_a_task_upon_resourcemanager_active_standby_switchover.rst +++ b/doc/component-operation-guide-lts/source/using_mapreduce/common_issues_about_mapreduce/why_does_it_take_a_long_time_to_run_a_task_upon_resourcemanager_active_standby_switchover.rst @@ -15,7 +15,11 @@ Answer This is because, ResorceManager HA is enabled but the ResourceManager work preserving restart is not enabled. -If ResorceManager work preserving restart is not enabled, then ResorceManager switch containers are killed which causes the ResorceManager to timeout the ApplicationMaster. For ResorceManager work preserving restart feature details, see http://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html. 
+If ResourceManager work-preserving restart is not enabled, containers are killed during a ResourceManager switchover, which causes the ResourceManager to time out the ApplicationMaster. For details about the Work-preserving RM restart function, visit the following website: + +Versions earlier than MRS 3.2.0: http://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html + +MRS 3.2.0 or later: https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerRestart.html The following method can be used to solve the issue: diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/creating_a_workflow.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/creating_a_workflow.rst index 77e7e07..9b8e1d7 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/creating_a_workflow.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/creating_a_workflow.rst @@ -26,14 +26,14 @@ Procedure For submitting different job types, follow instructions in the following sections: - - :ref:`Submitting a Hive2 Job ` - - :ref:`Submitting a Spark2x Job ` - - :ref:`Submitting a Java Job ` - - :ref:`Submitting a Loader Job ` - - :ref:`Submitting a MapReduce Job ` - - :ref:`Submitting a Sub-workflow Job ` - - :ref:`Submitting a Shell Job ` - - :ref:`Submitting an HDFS Job ` - - :ref:`Submitting a DistCp Job ` + - :ref:`Submitting a Hive2 Job in Hue ` + - :ref:`Submitting a Spark2x Job in Hue ` + - :ref:`Submitting a Java Job in Hue ` + - :ref:`Submitting a Loader Job in Hue ` + - :ref:`Submitting a MapReduce Job in Hue ` + - :ref:`Submitting a Sub-workflow Job in Hue ` + - :ref:`Submitting a Shell Job in Hue ` + - :ref:`Submitting an HDFS Job in Hue ` + - :ref:`Submitting a DistCp Job in Hue ` .. |image1| image:: /_static/images/en-us_image_0000001296059856.png diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/example_of_mutual_trust_operations.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/example_of_mutual_trust_operations_in_hue.rst similarity index 96% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/example_of_mutual_trust_operations.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/example_of_mutual_trust_operations_in_hue.rst index 166be74..0f4c96c 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/example_of_mutual_trust_operations.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/example_of_mutual_trust_operations_in_hue.rst @@ -2,8 +2,8 @@ ..
_mrs_01_1830: -Example of Mutual Trust Operations -================================== +Example of Mutual Trust Operations in Hue +========================================= Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/index.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/index.rst index 6286aa5..1f247fc 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/index.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/index.rst @@ -5,34 +5,34 @@ Submitting a Workflow Job ========================= -- :ref:`Submitting a Hive2 Job ` -- :ref:`Submitting a Spark2x Job ` -- :ref:`Submitting a Java Job ` -- :ref:`Submitting a Loader Job ` -- :ref:`Submitting a MapReduce Job ` -- :ref:`Submitting a Sub-workflow Job ` -- :ref:`Submitting a Shell Job ` -- :ref:`Submitting an HDFS Job ` -- :ref:`Submitting a DistCp Job ` -- :ref:`Example of Mutual Trust Operations ` -- :ref:`Submitting an SSH Job ` -- :ref:`Submitting a Hive Script ` -- :ref:`Submitting an Email Job ` +- :ref:`Submitting a Hive2 Job in Hue ` +- :ref:`Submitting a Spark2x Job in Hue ` +- :ref:`Submitting a Java Job in Hue ` +- :ref:`Submitting a Loader Job in Hue ` +- :ref:`Submitting a MapReduce Job in Hue ` +- :ref:`Submitting a Sub-workflow Job in Hue ` +- :ref:`Submitting a Shell Job in Hue ` +- :ref:`Submitting an HDFS Job in Hue ` +- :ref:`Submitting a DistCp Job in Hue ` +- :ref:`Example of Mutual Trust Operations in Hue ` +- :ref:`Submitting an SSH Job in Hue ` +- :ref:`Submitting a Hive Script in Hue ` +- :ref:`Submitting an Email Job in Hue ` .. 
toctree:: :maxdepth: 1 :hidden: - submitting_a_hive2_job - submitting_a_spark2x_job - submitting_a_java_job - submitting_a_loader_job - submitting_a_mapreduce_job - submitting_a_sub-workflow_job - submitting_a_shell_job - submitting_an_hdfs_job - submitting_a_distcp_job - example_of_mutual_trust_operations - submitting_an_ssh_job - submitting_a_hive_script - submitting_an_email_job + submitting_a_hive2_job_in_hue + submitting_a_spark2x_job_in_hue + submitting_a_java_job_in_hue + submitting_a_loader_job_in_hue + submitting_a_mapreduce_job_in_hue + submitting_a_sub-workflow_job_in_hue + submitting_a_shell_job_in_hue + submitting_an_hdfs_job_in_hue + submitting_a_distcp_job_in_hue + example_of_mutual_trust_operations_in_hue + submitting_an_ssh_job_in_hue + submitting_a_hive_script_in_hue + submitting_an_email_job_in_hue diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_distcp_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_distcp_job_in_hue.rst similarity index 98% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_distcp_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_distcp_job_in_hue.rst index ebface2..afe2341 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_distcp_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_distcp_job_in_hue.rst @@ -2,8 +2,8 @@ .. _mrs_01_1829: -Submitting a DistCp Job -======================= +Submitting a DistCp Job in Hue +============================== Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_hive2_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_hive2_job_in_hue.rst similarity index 97% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_hive2_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_hive2_job_in_hue.rst index 0f15f4e..319025e 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_hive2_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_hive2_job_in_hue.rst @@ -2,8 +2,8 @@ .. 
_mrs_01_1820: -Submitting a Hive2 Job -====================== +Submitting a Hive2 Job in Hue +============================= Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_hive_script.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_hive_script_in_hue.rst similarity index 93% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_hive_script.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_hive_script_in_hue.rst index 323c2bf..4753620 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_hive_script.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_hive_script_in_hue.rst @@ -2,8 +2,8 @@ .. _mrs_01_2372: -Submitting a Hive Script -======================== +Submitting a Hive Script in Hue +=============================== Scenario -------- @@ -23,7 +23,7 @@ Procedure |image3| -#. Configure the Job XML, for example, to the HDFS path **/user/admin/examples/apps/hive2/hive-site.xml**. For details, see :ref:`Submitting a Hive2 Job `. +#. Configure the Job XML, for example, to the HDFS path **/user/admin/examples/apps/hive2/hive-site.xml**. For details, see :ref:`Submitting a Hive2 Job in Hue `. #. Click |image4| in the upper right corner of the Oozie editor. diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_java_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_java_job_in_hue.rst similarity index 95% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_java_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_java_job_in_hue.rst index 8fbe333..962128f 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_java_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_java_job_in_hue.rst @@ -2,8 +2,8 @@ .. 
_mrs_01_1822: -Submitting a Java Job -===================== +Submitting a Java Job in Hue +============================ Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_loader_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_loader_job_in_hue.rst similarity index 95% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_loader_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_loader_job_in_hue.rst index a7bc623..3268d76 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_loader_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_loader_job_in_hue.rst @@ -2,8 +2,8 @@ .. _mrs_01_1823: -Submitting a Loader Job -======================= +Submitting a Loader Job in Hue +============================== Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_mapreduce_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_mapreduce_job_in_hue.rst similarity index 96% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_mapreduce_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_mapreduce_job_in_hue.rst index 9920814..1ecb94b 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_mapreduce_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_mapreduce_job_in_hue.rst @@ -2,8 +2,8 @@ .. _mrs_01_1824: -Submitting a MapReduce Job -========================== +Submitting a MapReduce Job in Hue +================================= Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_shell_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_shell_job_in_hue.rst similarity index 96% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_shell_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_shell_job_in_hue.rst index 18aa945..810f87b 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_shell_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_shell_job_in_hue.rst @@ -2,8 +2,8 @@ .. 
_mrs_01_1826: -Submitting a Shell Job -====================== +Submitting a Shell Job in Hue +============================= Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_spark2x_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_spark2x_job_in_hue.rst similarity index 97% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_spark2x_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_spark2x_job_in_hue.rst index a1cef6c..e1c88f5 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_spark2x_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_spark2x_job_in_hue.rst @@ -2,8 +2,8 @@ .. _mrs_01_1821: -Submitting a Spark2x Job -======================== +Submitting a Spark2x Job in Hue +=============================== Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_sub-workflow_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_sub-workflow_job_in_hue.rst similarity index 94% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_sub-workflow_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_sub-workflow_job_in_hue.rst index 2e6f772..5c00b25 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_sub-workflow_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_a_sub-workflow_job_in_hue.rst @@ -2,8 +2,8 @@ .. _mrs_01_1825: -Submitting a Sub-workflow Job -============================= +Submitting a Sub-workflow Job in Hue +==================================== Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_email_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_email_job_in_hue.rst similarity index 98% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_email_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_email_job_in_hue.rst index 61e492f..048dc1b 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_email_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_email_job_in_hue.rst @@ -2,8 +2,8 @@ .. 
_mrs_01_24114: -Submitting an Email Job -======================= +Submitting an Email Job in Hue +============================== Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_hdfs_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_hdfs_job_in_hue.rst similarity index 95% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_hdfs_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_hdfs_job_in_hue.rst index 05bbee0..0a7e500 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_hdfs_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_hdfs_job_in_hue.rst @@ -2,8 +2,8 @@ .. _mrs_01_1827: -Submitting an HDFS Job -====================== +Submitting an HDFS Job in Hue +============================= Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_ssh_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_ssh_job_in_hue.rst similarity index 91% rename from doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_ssh_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_ssh_job_in_hue.rst index ff23740..cb802c7 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_ssh_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_hue_to_submit_an_oozie_job/submitting_a_workflow_job/submitting_an_ssh_job_in_hue.rst @@ -2,8 +2,8 @@ .. _mrs_01_1831: -Submitting an SSH Job -===================== +Submitting an SSH Job in Hue +============================ Scenario -------- @@ -15,7 +15,7 @@ Procedure #. Create a workflow. For details, see :ref:`Creating a Workflow `. -#. For details about how to add the trust relationship, see :ref:`Example of Mutual Trust Operations `. +#. For details about how to add the trust relationship, see :ref:`Example of Mutual Trust Operations in Hue `. #. On the workflow editing page, select the **Ssh** button |image1| and drag it to the operation area. 
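The trust relationship mentioned in the SSH job steps above ultimately comes down to passwordless SSH from the Oozie server node to the host that the **Ssh** action logs in to; the mutual trust section referenced in those steps describes the authoritative procedure. As a minimal sketch of the general idea only (the service user **oozie_user** and the host **target-host** below are placeholders):

.. code-block::

   # On the Oozie server node, generate a key pair for the service user if one does not already exist.
   ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

   # Append the public key to the target host's authorized_keys file (user and host are placeholders).
   ssh-copy-id oozie_user@target-host

   # Confirm that the login no longer prompts for a password.
   ssh oozie_user@target-host "hostname"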
diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/index.rst b/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/index.rst index ce4999d..b276ced 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/index.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/index.rst @@ -5,18 +5,18 @@ Using Oozie Client to Submit an Oozie Job ========================================= -- :ref:`Submitting a Hive Job ` -- :ref:`Submitting a Spark2x Job ` -- :ref:`Submitting a Loader Job ` -- :ref:`Submitting a DistCp Job ` -- :ref:`Submitting Other Jobs ` +- :ref:`Submitting a Hive Job with Oozie Client ` +- :ref:`Submitting a Spark2x Job with Oozie Client ` +- :ref:`Submitting a Loader Job with Oozie Client ` +- :ref:`Submitting a DistCp Job with Oozie Client ` +- :ref:`Submitting Other Jobs with Oozie Client ` .. toctree:: :maxdepth: 1 :hidden: - submitting_a_hive_job - submitting_a_spark2x_job - submitting_a_loader_job - submitting_a_distcp_job - submitting_other_jobs + submitting_a_hive_job_with_oozie_client + submitting_a_spark2x_job_with_oozie_client + submitting_a_loader_job_with_oozie_client + submitting_a_distcp_job_with_oozie_client + submitting_other_jobs_with_oozie_client diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_distcp_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_distcp_job_with_oozie_client.rst similarity index 98% rename from doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_distcp_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_distcp_job_with_oozie_client.rst index dd61866..fb9f8fa 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_distcp_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_distcp_job_with_oozie_client.rst @@ -2,8 +2,8 @@ .. _mrs_01_2392: -Submitting a DistCp Job -======================= +Submitting a DistCp Job with Oozie Client +========================================= Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_hive_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_hive_job_with_oozie_client.rst similarity index 98% rename from doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_hive_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_hive_job_with_oozie_client.rst index 9a006b0..d814af2 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_hive_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_hive_job_with_oozie_client.rst @@ -2,8 +2,8 @@ .. 
_mrs_01_1813: -Submitting a Hive Job -===================== +Submitting a Hive Job with Oozie Client +======================================= Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_loader_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_loader_job_with_oozie_client.rst similarity index 98% rename from doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_loader_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_loader_job_with_oozie_client.rst index 3fcff09..fdaf048 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_loader_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_loader_job_with_oozie_client.rst @@ -2,8 +2,8 @@ .. _mrs_01_1815: -Submitting a Loader Job -======================= +Submitting a Loader Job with Oozie Client +========================================= Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_spark2x_job.rst b/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_spark2x_job_with_oozie_client.rst similarity index 97% rename from doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_spark2x_job.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_spark2x_job_with_oozie_client.rst index 977185c..f50b31f 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_spark2x_job.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_a_spark2x_job_with_oozie_client.rst @@ -2,8 +2,8 @@ .. _mrs_01_1814: -Submitting a Spark2x Job -======================== +Submitting a Spark2x Job with Oozie Client +========================================== Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_other_jobs.rst b/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_other_jobs_with_oozie_client.rst similarity index 98% rename from doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_other_jobs.rst rename to doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_other_jobs_with_oozie_client.rst index 43b0700..863ccda 100644 --- a/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_other_jobs.rst +++ b/doc/component-operation-guide-lts/source/using_oozie/using_oozie_client_to_submit_an_oozie_job/submitting_other_jobs_with_oozie_client.rst @@ -2,8 +2,8 @@ .. 
_mrs_01_1816: -Submitting Other Jobs -===================== +Submitting Other Jobs with Oozie Client +======================================= Scenario -------- diff --git a/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_cdl.rst b/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_cdl.rst new file mode 100644 index 0000000..5243704 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_cdl.rst @@ -0,0 +1,183 @@ +:original_name: mrs_01_24245.html + +.. _mrs_01_24245: + +Adding a Ranger Access Permission Policy for CDL +================================================ + +Scenario +-------- + +Ranger administrators can use Ranger to configure creation, execution, query, and deletion permissions for CDL users. + +Prerequisites +------------- + +- The Ranger service has been installed and is running properly. +- You have created users, user groups, or roles for which you want to configure permissions. + +Procedure +--------- + +#. Log in to the Ranger web UI as the Ranger administrator **rangeradmin**. For details, see :ref:`Logging In to the Ranger Web UI `. + +#. On the home page, click the component plug-in name in the **CDL** area, for example, **CDL**. + +#. Click **Add New Policy** to add a CDL permission control policy. + +#. Configure the parameters listed in the table below based on the service demands. + + .. table:: **Table 1** CDL permission parameters + + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Parameter | Description | + +===================================+===================================================================================================================================================================================================================================================================================================================+ + | Policy Type | Access. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Policy Conditions | IP address filtering policy, which can be customized. You can enter one or more IP addresses or IP address segments. The IP address can contain the wildcard character (``*``), for example, **192.168.1.10**, **192.168.1.20**, or **192.168.1.\***. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Policy Name | Policy name, which can be customized and must be unique in the service. 
| + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Policy Label | A label specified for the current policy. You can search for reports and filter policies based on labels. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | job | Name of the job applicable to the current policy. You can enter multiple values. The value can contain wildcards, such as **test**, **test\***, and **\***. | + | | | + | | The **Include** policy applies to the current input object, and the **Exclude** policy applies to objects other than the current input object. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Description | Policy description. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Audit Logging | Whether to audit the policy. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Allow Conditions | Permission and exception conditions allowed by a policy. The priority of an exception condition is higher than that of a normal condition. | + | | | + | | In the **Select Role**, **Select Group**, and **Select User** columns, select the role, user group, or user to which you want to assign permissions. | + | | | + | | Click **Add Conditions**, add the IP address range to which the policy applies, and click **Add Permissions** to add corresponding permissions. | + | | | + | | - **Create** permission. | + | | - **Execute** permission. | + | | - **Delete** permission. | + | | - **Update** permission. | + | | - **Get** permission. | + | | - **Select/Deselect All** permission. | + | | | + | | To add multiple permission control rules, click |image1|. | + | | | + | | If users or user groups in the current condition need to manage this policy, select **Delegate Admin**. These users will become the agent administrators. The agent administrators can update and delete this policy and create sub-policies based on the original policy. 
| + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Deny Conditions | Policy rejection condition, which is used to configure the permissions and exceptions to be denied in the policy. The configuration method is the same as that of **Allow Conditions**. The priority of the rejection condition is higher than that of the allowed conditions configured in **Allow Conditions**. | + +-----------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + + .. table:: **Table 2** Setting user permissions + + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Scenario | Role Authorization | + +===============================================================================+============================================================================================================================================================+ + | Setting the CDL administrator permission | a. On the home page, click the component plug-in name in the **CDL** area, for example, **CDL**. | + | | b. Select the policies whose **Policy Name** is **all - job**, **all - link**, **all - driver** or **all - env**, and click |image2| to edit the policies. | + | | c. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | d. Click **Add Permissions** and select **Select/Deselect All**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to manage a CDL job | a. Select a CDL job name from the **job** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Select/Deselect All**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to create a CDL job | a. Select a CDL job name from the **job** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Create**. | + | | | + | | .. note:: | + | | | + | | By default, all users have the permission to create a CDL job. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to delete a CDL job | a. 
Select a CDL job name from the **job** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Delete**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to obtain information about a CDL job | a. Select a CDL job name from the **job** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Get**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to execute a CDL job | a. Select a CDL job name from the **job** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Execute**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to manage a CDL data link | a. Select the CDL data link name on the right of the **link** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Select/Deselect All**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to create a CDL data link | a. Select the CDL data link name on the right of the **link** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Create**. | + | | | + | | .. note:: | + | | | + | | By default, all users have the permission to create CDL data links. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to delete a CDL data link | a. Select the CDL data link name on the right of the **link** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Delete**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to update a CDL data link | a. Select the CDL data link name on the right of the **link** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Update**.
| + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to obtain information about a CDL data link | a. Select the CDL data link name on the right of the **link** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Get**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to manage a CDL driver | a. Select the CDL driver name on the right of the **driver** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Select/Deselect All**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to delete a CDL driver | a. Select the CDL driver name on the right of the **driver** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Delete**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to update a CDL driver | a. Select the CDL driver name on the right of the **driver** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Update**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to obtain information about a CDL driver | a. Select the CDL driver name on the right of the **driver** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Get**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to manage a CDL environment variable | a. Select the CDL environment variable name on the right of the **env** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Select/Deselect All**. 
| + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to create a CDL environment variable | a. Select the CDL environment variable name on the right of the **env** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Create**. | + | | | + | | .. note:: | + | | | + | | By default, all users have the permission to create a CDL environment variable. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to delete a CDL environment variable | a. Select the CDL environment variable name on the right of the **env** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Delete**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to update a CDL environment variable | a. Select the CDL environment variable name on the right of the **env** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Update**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | Setting the permission to obtain information about a CDL environment variable | a. Select the CDL environment variable name on the right of the **env** drop-down list. | + | | b. In the **Allow Conditions** area, select a user from the **Select User** drop-down list. | + | | c. Click **Add Permissions** and select **Get**. | + +-------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+ + +#. (Optional) Add the validity period of the policy. Click **Add Validity period** in the upper right corner of the page, set **Start Time** and **End Time**, and select **Time Zone**. Click **Save**. To add multiple policy validity periods, click |image3|. To delete a policy validity period, click |image4|. + +#. Click **Add** to view the basic information about the policy in the policy list. After the policy takes effect, check whether the related permissions are normal. + + To disable a policy, click |image5| to edit the policy and set the policy to **Disabled**. + + If a policy is no longer used, click |image6| to delete it. + +.. |image1| image:: /_static/images/en-us_image_0000001583961513.png +.. |image2| image:: /_static/images/en-us_image_0000001583881265.png +.. |image3| image:: /_static/images/en-us_image_0000001533481354.png +.. |image4| image:: /_static/images/en-us_image_0000001533641294.png +.. 
|image5| image:: /_static/images/en-us_image_0000001584081289.png +.. |image6| image:: /_static/images/en-us_image_0000001533162146.png diff --git a/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_hetuengine.rst b/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_hetuengine.rst index 8295311..93c9879 100644 --- a/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_hetuengine.rst +++ b/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_hetuengine.rst @@ -239,7 +239,7 @@ Ranger supports data masking for HetuEngine data. It can process the return resu | | | | | Click **Select Masking Option** and select a data masking policy. | | | | - | | - **Redact**: Use **x** to mask all letters and **n** to mask all digits. | + | | - **Redact**: Use **x** to mask all letters and **0** to mask all digits. | | | - **Partial mask: show last 4**: Only the last four characters are displayed, and the rest characters are displayed using **x**. | | | - **Partial mask: show first 4**: Only the first four characters are displayed, and the rest characters are displayed using **x**. | | | - **Hash**: Replace the original value with the hash value. | diff --git a/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_hive.rst b/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_hive.rst index 6bf8763..7a4b3a8 100644 --- a/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_hive.rst +++ b/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_hive.rst @@ -217,7 +217,7 @@ Ranger supports data masking for Hive data. It can process the returned result o | | | | | Click **Select Masking Option** and select a data masking policy. | | | | - | | - Redact: Use **x** to mask all letters and **n** to mask all digits. | + | | - Redact: Use **x** to mask all letters and **0** to mask all digits. | | | - Partial mask: show last 4: Only the last four characters are displayed, and the rest characters are displayed using **x**. | | | - Partial mask: show first 4: Only the first four characters are displayed, and the rest characters are displayed using **x**. | | | - Hash: Replace the original value with the hash value. The Hive built-in function **mask_hash** is used. This is valid only for fields of the string, character, and varchar types. NULL is returned for fields of other types. | diff --git a/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_spark2x.rst b/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_spark2x.rst index d9447a6..5a382d7 100644 --- a/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_spark2x.rst +++ b/doc/component-operation-guide-lts/source/using_ranger/adding_a_ranger_access_permission_policy_for_spark2x.rst @@ -220,7 +220,7 @@ Ranger supports data masking for Spark2x data. It can process the returned resul | | | | | Click **Select Masking Option** and select a data masking policy. | | | | - | | - Redact: Use **x** to mask all letters and **n** to mask all digits. | + | | - Redact: Use **x** to mask all letters and **0** to mask all digits. 
| | | - Partial mask: show last 4: Only the last four characters are displayed. | | | - Partial mask: show first 4: Only the first four characters are displayed. | | | - Hash: Perform hash calculation for data. | diff --git a/doc/component-operation-guide-lts/source/using_ranger/configuring_ranger_specifications.rst b/doc/component-operation-guide-lts/source/using_ranger/configuring_ranger_specifications.rst new file mode 100644 index 0000000..23b5368 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_ranger/configuring_ranger_specifications.rst @@ -0,0 +1,32 @@ +:original_name: mrs_01_24767.html + +.. _mrs_01_24767: + +Configuring Ranger Specifications +================================= + +Scenario +-------- + +Ranger provides permission policies for services. When the number of service instances using Ranger increases, you need to adjust the specifications of Ranger. + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +Configuring Memory Parameters +----------------------------- + +#. Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Ranger**. Click **Configurations** then **All Configurations**, and search for **GC_OPTS** in the **RangerAdmin JVM** parameter. The default value of **GC_OPTS** is **-Dproc_rangeradmin -Xms2G -Xmx2G -XX:MaxDirectMemorySize=512M -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=200M -XX:PermSize=64M -XX:MaxPermSize=512M -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:${RANGER_ADMIN_LOG_DIR}/gc-worker-%p-%t.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=20 -XX:GCLogFileSize=20M -verbose:gc -Djdk.tls.ephemeralDHKeySize=3072 -Djava.security.auth.login.config=#{conf_dir}/jaas.conf -Djava.security.krb5.conf=${KRB5_CONFIG} -Dbeetle.application.home.path=${BIGDATA_HOME}/common/runtime/security/config -Djna.tmpdir=${RANGER_TMP_HOME} -Djava.io.tmpdir=${RANGER_TMP_HOME} ${JAVA_STACK_PREFER} -Djdk.tls.rejectClientInitiatedRenegotiation=true**. + +2. Change the value of **GC_OPTS** in the **RangerAdmin JVM** parameter as follows: + + Service instances that use Ranger include HDFS (NameNode), YARN (ResourceManager), HBase (HMaster and RegionServer), Hive (HiveServer), Kafka (Broker), Elasticsearch (EsNode, EsMaster, and EsClient), CDL (CDLService), and HetuEngine (HSBroker). When the number of these instances increases, change the default value **-Xms2G -Xmx2G** according to the reference RangerAdmin memory specifications listed below. 
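For example, if roughly 400 service instances integrate with Ranger, only the heap sizes inside **GC_OPTS** need to be raised; every other option in the string keeps its default value. A minimal sketch of the edited fragment (the remaining options are omitted here for brevity):

.. code-block:: text

   # Default heap settings inside the RangerAdmin GC_OPTS string
   -Xms2G -Xmx2G

   # Adjusted heap settings for about 400 Ranger-integrated instances,
   # taken from the reference table that follows
   -Xms8G -Xmx8G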
+ + ================ =============== + Ranger Instances Reference Value + ================ =============== + 200 -Xms4G -Xmx4G + 400 -Xms8G -Xmx8G + 600 -Xms12G -Xmx12G + ================ =============== diff --git a/doc/component-operation-guide-lts/source/using_ranger/index.rst b/doc/component-operation-guide-lts/source/using_ranger/index.rst index 486b2f6..d3d188c 100644 --- a/doc/component-operation-guide-lts/source/using_ranger/index.rst +++ b/doc/component-operation-guide-lts/source/using_ranger/index.rst @@ -12,6 +12,7 @@ Using Ranger - :ref:`Configuring a Security Zone ` - :ref:`Changing the Ranger Data Source to LDAP for a Normal Cluster ` - :ref:`Viewing Ranger Permission Information ` +- :ref:`Adding a Ranger Access Permission Policy for CDL ` - :ref:`Adding a Ranger Access Permission Policy for HDFS ` - :ref:`Adding a Ranger Access Permission Policy for HBase ` - :ref:`Adding a Ranger Access Permission Policy for Hive ` @@ -19,6 +20,7 @@ Using Ranger - :ref:`Adding a Ranger Access Permission Policy for Spark2x ` - :ref:`Adding a Ranger Access Permission Policy for Kafka ` - :ref:`Adding a Ranger Access Permission Policy for HetuEngine ` +- :ref:`Configuring Ranger Specifications ` - :ref:`Ranger Log Overview ` - :ref:`Common Issues About Ranger ` @@ -33,6 +35,7 @@ Using Ranger configuring_a_security_zone changing_the_ranger_data_source_to_ldap_for_a_normal_cluster viewing_ranger_permission_information + adding_a_ranger_access_permission_policy_for_cdl adding_a_ranger_access_permission_policy_for_hdfs adding_a_ranger_access_permission_policy_for_hbase adding_a_ranger_access_permission_policy_for_hive @@ -40,5 +43,6 @@ Using Ranger adding_a_ranger_access_permission_policy_for_spark2x adding_a_ranger_access_permission_policy_for_kafka adding_a_ranger_access_permission_policy_for_hetuengine + configuring_ranger_specifications ranger_log_overview common_issues_about_ranger/index diff --git a/doc/component-operation-guide-lts/source/using_spark2x/basic_operation/scenario-specific_configuration/configuring_the_drop_partition_command_to_support_batch_deletion.rst b/doc/component-operation-guide-lts/source/using_spark2x/basic_operation/scenario-specific_configuration/configuring_the_drop_partition_command_to_support_batch_deletion.rst new file mode 100644 index 0000000..da1aa1e --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_spark2x/basic_operation/scenario-specific_configuration/configuring_the_drop_partition_command_to_support_batch_deletion.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_24745.html + +.. _mrs_01_24745: + +Configuring the Drop Partition Command to Support Batch Deletion +================================================================ + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +Scenario +-------- + +Currently, the **Drop Partition** command in Spark supports partition deletion using only the equal sign (=). This configuration allows multiple filter criteria to be used to delete partitions in batches, for example, **<**, **<=**, **>**, **>=**, **!>**, and **!<**. + +Configuration +------------- + +Log in to FusionInsight Manager and choose **Cluster**. Click the name of the desired cluster and choose **Services** > **Spark2x**. On the page that is displayed, click the **Configurations** tab then **All Configurations**, and search for the following parameters. 
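As an illustration of the behavior these parameters control (a sketch only; the table name **test_table**, its partition column **dt**, and the date literal are hypothetical), a single statement such as the following drops every matching partition in one batch when **spark.sql.dropPartitionsInBatch.enabled** is **true**:

.. code-block:: sql

   -- Drops all partitions whose dt value is earlier than 2021-12-01;
   -- with the feature disabled, only equality predicates (dt = '...') are accepted.
   ALTER TABLE test_table DROP PARTITION (dt < '2021-12-01');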
+ ++-----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ +| Parameter | Description | Default Value | ++=========================================+================================================================================================================================================================+===============+ +| spark.sql.dropPartitionsInBatch.enabled | If this parameter is set to **true**, the **Drop Partition** command supports the following filter criteria: **<**, **<=**, **>**, **>=**, **!>**, and **!<**. | true | ++-----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ +| spark.sql.dropPartitionsInBatch.limit | Indicates the maximum number of partitions that can be batch dropped. | 1000 | ++-----------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------+ diff --git a/doc/component-operation-guide-lts/source/using_spark2x/basic_operation/scenario-specific_configuration/enabling_an_executor_to_execute_custom_code_when_exiting.rst b/doc/component-operation-guide-lts/source/using_spark2x/basic_operation/scenario-specific_configuration/enabling_an_executor_to_execute_custom_code_when_exiting.rst new file mode 100644 index 0000000..1eae141 --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_spark2x/basic_operation/scenario-specific_configuration/enabling_an_executor_to_execute_custom_code_when_exiting.rst @@ -0,0 +1,28 @@ +:original_name: mrs_01_24805.html + +.. _mrs_01_24805: + +Enabling an Executor to Execute Custom Code When Exiting +======================================================== + +.. note:: + + This section applies only to MRS 3.2.0 or later. + +Scenario +-------- + +You can configure the following parameters to execute custom code when Executor exits. + +Configuration Parameters +------------------------ + +Configure the following parameters in the **spark-defaults.conf** file of the Spark client. + ++-----------------------------------------------------+----------------------------------------------------------------------------------------------------+---------------+ +| Parameter | Description | Default Value | ++=====================================================+====================================================================================================+===============+ +| spark.executor.execute.shutdown.cleaner | If this parameter is set to **true**, an executor can execute custom code when the executor exits. | false | ++-----------------------------------------------------+----------------------------------------------------------------------------------------------------+---------------+ +| spark.executor.execute.shutdown.cleaner.max.timeout | Timeout interval for an executor to execute custom code. 
| 240s | ++-----------------------------------------------------+----------------------------------------------------------------------------------------------------+---------------+ diff --git a/doc/component-operation-guide-lts/source/using_spark2x/basic_operation/scenario-specific_configuration/index.rst b/doc/component-operation-guide-lts/source/using_spark2x/basic_operation/scenario-specific_configuration/index.rst index 8bde2fb..3c58fac 100644 --- a/doc/component-operation-guide-lts/source/using_spark2x/basic_operation/scenario-specific_configuration/index.rst +++ b/doc/component-operation-guide-lts/source/using_spark2x/basic_operation/scenario-specific_configuration/index.rst @@ -31,6 +31,8 @@ Scenario-Specific Configuration - :ref:`Configuring Local Disk Cache for JobHistory ` - :ref:`Configuring Spark SQL to Enable the Adaptive Execution Feature ` - :ref:`Configuring Event Log Rollover ` +- :ref:`Configuring the Drop Partition Command to Support Batch Deletion ` +- :ref:`Enabling an Executor to Execute Custom Code When Exiting ` .. toctree:: :maxdepth: 1 @@ -62,3 +64,5 @@ Scenario-Specific Configuration configuring_local_disk_cache_for_jobhistory configuring_spark_sql_to_enable_the_adaptive_execution_feature configuring_event_log_rollover + configuring_the_drop_partition_command_to_support_batch_deletion + enabling_an_executor_to_execute_custom_code_when_exiting diff --git a/doc/component-operation-guide-lts/source/using_spark2x/common_issues_about_spark2x/spark_sql_and_dataframe/index.rst b/doc/component-operation-guide-lts/source/using_spark2x/common_issues_about_spark2x/spark_sql_and_dataframe/index.rst index 8ff83dc..9b1c160 100644 --- a/doc/component-operation-guide-lts/source/using_spark2x/common_issues_about_spark2x/spark_sql_and_dataframe/index.rst +++ b/doc/component-operation-guide-lts/source/using_spark2x/common_issues_about_spark2x/spark_sql_and_dataframe/index.rst @@ -28,6 +28,7 @@ Spark SQL and DataFrame - :ref:`Why Are Some Functions Not Available when Another JDBCServer Is Connected? ` - :ref:`Why Does Spark2x Have No Access to DataSource Tables Created by Spark1.5? ` - :ref:`Why Does Spark-beeline Fail to Run and Error Message "Failed to create ThriftService instance" Is Displayed? ` +- :ref:`Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL? ` .. toctree:: :maxdepth: 1 @@ -56,3 +57,4 @@ Spark SQL and DataFrame why_are_some_functions_not_available_when_another_jdbcserver_is_connected why_does_spark2x_have_no_access_to_datasource_tables_created_by_spark1.5 why_does_spark-beeline_fail_to_run_and_error_message_failed_to_create_thriftservice_instance_is_displayed + why_cannot_i_query_newly_inserted_data_in_an_orc_hive_table_using_spark_sql diff --git a/doc/component-operation-guide-lts/source/using_spark2x/common_issues_about_spark2x/spark_sql_and_dataframe/why_cannot_i_query_newly_inserted_data_in_an_orc_hive_table_using_spark_sql.rst b/doc/component-operation-guide-lts/source/using_spark2x/common_issues_about_spark2x/spark_sql_and_dataframe/why_cannot_i_query_newly_inserted_data_in_an_orc_hive_table_using_spark_sql.rst new file mode 100644 index 0000000..2f696db --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_spark2x/common_issues_about_spark2x/spark_sql_and_dataframe/why_cannot_i_query_newly_inserted_data_in_an_orc_hive_table_using_spark_sql.rst @@ -0,0 +1,35 @@ +:original_name: mrs_01_24491.html + +.. _mrs_01_24491: + +Why Cannot I Query Newly Inserted Data in an ORC Hive Table Using Spark SQL? 
+============================================================================ + +Question +-------- + +Why cannot I query newly inserted data in an ORC Hive table using Spark SQL? This problem occurs in the following scenarios: + +- For partitioned tables and non-partitioned tables, after data is inserted on the Hive client, the latest inserted data cannot be queried using Spark SQL. +- After data is inserted into a partitioned table using Spark SQL, if the partition information remains unchanged, the newly inserted data cannot be queried using Spark SQL. + +Answer +------ + +To improve Spark performance, ORC metadata is cached. When the ORC table is updated by Hive or another means, the cached metadata remains unchanged, resulting in Spark SQL failing to query the newly inserted data. + +For an ORC Hive partition table, if the partition information remains unchanged after data is inserted, the cached metadata is not updated. As a result, the newly inserted data cannot be queried by Spark SQL. + +**Solution** + +#. To solve the query problem, update metadata before starting a Spark SQL query. + + **REFRESH TABLE** *table_name*\ **;** + + *table_name* indicates the name of the table to be updated. The table must exist. Otherwise, an error is reported. + + When the query statement is executed, the latest inserted data can be obtained. + +#. Run the following command to disable Spark optimization when using Spark: + + **set spark.sql.hive.convertMetastoreOrc=false;** diff --git a/doc/component-operation-guide-lts/source/using_tez/common_issues/table_data_is_empty_on_the_tezui_hivequeries_page.rst b/doc/component-operation-guide-lts/source/using_tez/common_issues/table_data_is_empty_on_the_tezui_hivequeries_page.rst index 47dd94c..af421f9 100644 --- a/doc/component-operation-guide-lts/source/using_tez/common_issues/table_data_is_empty_on_the_tezui_hivequeries_page.rst +++ b/doc/component-operation-guide-lts/source/using_tez/common_issues/table_data_is_empty_on_the_tezui_hivequeries_page.rst @@ -15,21 +15,7 @@ Answer To display Hive Queries task data on the Tez web UI, you need to set the following parameters: -On Manager, choose **Cluster** > **Services** > **Hive** > **Configurations** > **All Configurations** > **Hive** > **Customization**. Add the following configuration to **yarn-site.xml**: - -+-----------------------------------------------------------+---------------------------------+ -| Attribute | Attribute Value | -+===========================================================+=================================+ -| yarn.timeline-service.enabled | true | -+-----------------------------------------------------------+---------------------------------+ -| yarn.timeline-service.webapp.https.address | #{tl_hostname}:#{tl_https_port} | -+-----------------------------------------------------------+---------------------------------+ -| yarn.resourcemanager.system-metrics-publisher.enabled | true | -+-----------------------------------------------------------+---------------------------------+ -| yarn.timeline-service.generic-application-history.enabled | true | -+-----------------------------------------------------------+---------------------------------+ - -On Manager, choose **Cluster** > **Services** > **Hive** > **Configurations** > **All Configurations** > **HiveServer** > **Customization**. 
Add the following configuration to **hive-site.xml**: +On FusionInsight Manager, choose **Cluster** > **Services** > **Hive** > **Configurations** > **All Configurations** > **HiveServer** > **Customization**. Add the following configuration to **hive-site.xml**: ======================= ======================================= Attribute Attribute Value diff --git a/doc/component-operation-guide-lts/source/using_yarn/common_issues_about_yarn/why_are_local_logs_not_deleted_after_yarn_is_restarted.rst b/doc/component-operation-guide-lts/source/using_yarn/common_issues_about_yarn/why_are_local_logs_not_deleted_after_yarn_is_restarted.rst index a045251..093bee2 100644 --- a/doc/component-operation-guide-lts/source/using_yarn/common_issues_about_yarn/why_are_local_logs_not_deleted_after_yarn_is_restarted.rst +++ b/doc/component-operation-guide-lts/source/using_yarn/common_issues_about_yarn/why_are_local_logs_not_deleted_after_yarn_is_restarted.rst @@ -16,4 +16,10 @@ If Yarn is restarted in either of the following scenarios, local logs will not b Answer ------ -NodeManager has a restart recovery mechanism (for details, see https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/NodeManager.html#NodeManager_Restart). Go to the **All Configurations** page of Yarn by referring to :ref:`Modifying Cluster Service Configuration Parameters `. Set **yarn.nodemanager.recovery.enabled** of NodeManager to **true** to make the configuration take effect. The default value is **true**. In this way, redundant local logs are periodically deleted when the YARN is restarted. +NodeManager has a restart recovery mechanism. For details, visit the following: + +Versions earlier than MRS 3.2.0: https://hadoop.apache.org/docs/r3.1.1/hadoop-yarn/hadoop-yarn-site/NodeManager.html#NodeManager_Restart + +MRS 3.2.0 or later: https://hadoop.apache.org/docs/r3.3.1/hadoop-yarn/hadoop-yarn-site/NodeManager.html#NodeManager_Restart + +Go to the **All Configurations** page of Yarn by referring to :ref:`Modifying Cluster Service Configuration Parameters `. Set **yarn.nodemanager.recovery.enabled** of NodeManager to **true** to make the configuration take effect. The default value is **true**. In this way, redundant local logs are periodically deleted when the YARN is restarted. diff --git a/doc/component-operation-guide-lts/source/using_yarn/configuring_container_log_aggregation.rst b/doc/component-operation-guide-lts/source/using_yarn/configuring_container_log_aggregation.rst index dc5a653..3043899 100644 --- a/doc/component-operation-guide-lts/source/using_yarn/configuring_container_log_aggregation.rst +++ b/doc/component-operation-guide-lts/source/using_yarn/configuring_container_log_aggregation.rst @@ -43,6 +43,22 @@ The periodic log collection function applies only to MapReduce applications, for | | - The container logs that are generated before the parameter is set to **false** and the setting takes effect cannot be obtained from the web UI. | | | | - If you want to view the logs generated before on the web UI, you are advised to set this parameter to **true**. 
| | +----------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ + | yarn.log-aggregation.per-day.enabled | Whether to enable the function of collecting Yarn job logs by day. After this function is enabled, the logs generated on the same day are stored in a separate directory. This parameter takes effect only when: | false | + | | | | + | | - The value of **yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds** is not greater than 0. | | + | | - The value of **yarn.log-aggregation.retain-seconds** is greater than 86400 seconds. | | + | | | | + | | .. note:: | | + | | | | + | | This section applies to MRS 3.2.0 or later. | | + | | | | + | | After this function is enabled, the default log archive directory (configurable) changes to **/tmp/logs/**\ *{USER_NAME}*\ **/**\ *{DATE_STR}*\ **/logs/**\ *application_id*. | | + | | | | + | | - *USER_NAME*: indicates the user who runs the application. | | + | | - *DATE_STR*: indicates the time when the application is complete. | | + | | | | + | | For example, if user **testuser** runs a job whose ID is **application_11111111_1111** and the job is completed on December 12, 2021, the job log will be stored on the **/tmp/logs/testuser/20211212/logs/application_11111111_1111** directory. | | + +----------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+ | yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds | Interval for NodeManager to periodically collect logs | -1 | | | | | | | - If this parameter is set to **-1** or **0**, periodic log collection is disabled. Logs are collected at a time after application running is complete. | | diff --git a/doc/component-operation-guide-lts/source/using_yarn/configuring_ha_for_timelineserver.rst b/doc/component-operation-guide-lts/source/using_yarn/configuring_ha_for_timelineserver.rst new file mode 100644 index 0000000..59eeb0d --- /dev/null +++ b/doc/component-operation-guide-lts/source/using_yarn/configuring_ha_for_timelineserver.rst @@ -0,0 +1,34 @@ +:original_name: mrs_01_24814.html + +.. _mrs_01_24814: + +Configuring HA for TimelineServer +================================= + +Scenario +-------- + +As a role of the Yarn service, TimelineServer supports the HA mode since the current version. To prevent a single point of failure of TimelineServer, you can enable TimelineServer HA to ensure high availability of the TimelineServer role. + +.. 
note:: + + Currently, clusters in IPv6 security mode do not support TimelineServer HA. + + This function applies to MRS 3.2.0-LTS.1 or later. + +Impact on the System +-------------------- + +- Before the conversion, change the value of **TLS_FLOAT_IP** of TimelineServer to an available floating IP address. (In the case of a single instance, the service IP address of the node is used by default.) +- During the conversion, the configuration of the TimelineServer role will expire. In this case, you need to restart the instance whose configuration expires. + +Procedure +--------- + +#. Log in to FusionInsight Manager and choose **Cluster** > **Services** > **Yarn**. Click **Configurations**. +#. Change the value of **TLS_FLOAT_IP** to an available floating IP address (the floating IP address and the service IP addresses of the two TimelineServer instances must be in the same network segment) and click **Save** then **OK**. +#. Click the **Instance** tab. Click **Add Instance**, select a node to add a TimelineServer instance, and choose **Next** > **Next** > **Submit**. The instance is added. +#. On FusionInsight Manager, click |image1| next to the cluster name, select **Restart Configuration-Expired Instances**, and wait until the instance is restarted. +#. Check the status of each instance after the restart. For example, the active/standby status and running status of the TimelineServer instances are normal. + +.. |image1| image:: /_static/images/en-us_image_0000001533359808.jpg diff --git a/doc/component-operation-guide-lts/source/using_yarn/index.rst b/doc/component-operation-guide-lts/source/using_yarn/index.rst index 8bc8f27..4c9e162 100644 --- a/doc/component-operation-guide-lts/source/using_yarn/index.rst +++ b/doc/component-operation-guide-lts/source/using_yarn/index.rst @@ -22,6 +22,7 @@ Using Yarn - :ref:`Configuring ApplicationMaster Work Preserving ` - :ref:`Configuring the Localized Log Levels ` - :ref:`Configuring Users That Run Tasks ` +- :ref:`Configuring HA for TimelineServer ` - :ref:`Yarn Log Overview ` - :ref:`Yarn Performance Tuning ` - :ref:`Common Issues About Yarn ` @@ -47,6 +48,7 @@ Using Yarn configuring_applicationmaster_work_preserving configuring_the_localized_log_levels configuring_users_that_run_tasks + configuring_ha_for_timelineserver yarn_log_overview yarn_performance_tuning/index common_issues_about_yarn/index