GDS is a data service tool provided by GaussDB(DWS). It uses the foreign table mechanism to export data at high speed.
To install, configure, and start GDS, perform the following steps. For details, see Installing, Configuring, and Starting GDS.
mkdir -p /opt/bin/dws
The following uses the SUSE Linux package as an example. Upload the GDS package dws_client_8.x.x_suse_x64.zip to the directory created in the previous step.
cd /opt/bin/dws
unzip dws_client_8.x.x_suse_x64.zip
groupadd gdsgrp
useradd -g gdsgrp gds_user
chown -R gds_user:gdsgrp /opt/bin/dws/gds
chown -R gds_user:gdsgrp /input_data
su - gds_user
If the current cluster version is 8.0.x or earlier, skip step 8 and go to step 9.
If the current cluster version is 8.1.x, go to the next step.
cd /opt/bin/dws/gds/bin
source gds_env
GDS is portable ("green") software: it requires no installation and can be started as soon as the package is decompressed. There are two ways to start GDS.
Method 1: Run the gds command to set startup parameters.
Method 2: Write the startup parameters into the gds.conf configuration file and run the gds_ctl.py command to start GDS.
gds -d dir -p ip:port -H address_string -l log_file -D -t worker_num
Example:
/opt/bin/dws/gds/bin/gds -d /input_data/ -p 192.168.0.90:5000 -H 10.10.0.1/24 -l /opt/bin/dws/gds/gds_log.txt -D -t 2
gds -d dir -p ip:port -H address_string -l log_file -D -t worker_num --enable-ssl --ssl-dir Cert_file
Example:
/opt/bin/dws/gds/bin/gds -d /input_data/ -p 192.168.0.90:5000 -H 10.10.0.1/24 -l /opt/bin/dws/gds/gds_log.txt -D --enable-ssl --ssl-dir /opt/bin/
Replace the placeholder values (dir, ip:port, address_string, log_file, worker_num, and Cert_file) as required.
GDS determines the number of threads based on the number of parallel import transactions. Even if multi-thread import is configured before GDS startup, the import of a single transaction will not be accelerated. By default, an INSERT statement is an import transaction.
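The Method 1 command line can also be assembled programmatically, which helps when the same options are reused across hosts. The following is a minimal sketch, not part of the GDS tooling: the helper name build_gds_command and its keyword arguments are hypothetical, and it uses only the options shown in the syntax lines above.

```python
import shlex


def build_gds_command(data_dir, listen, allowed_hosts, log_file,
                      workers=None, daemon=True, ssl_dir=None,
                      gds_bin="/opt/bin/dws/gds/bin/gds"):
    """Return the gds argument list (hypothetical helper, not shipped with GDS)."""
    args = [gds_bin, "-d", data_dir, "-p", listen,
            "-H", allowed_hosts, "-l", log_file]
    if daemon:
        args.append("-D")                      # run as a daemon process
    if workers is not None:
        args += ["-t", str(workers)]           # number of worker threads
    if ssl_dir is not None:
        args += ["--enable-ssl", "--ssl-dir", ssl_dir]
    return args


# Rebuild the non-SSL example from this section.
cmd = build_gds_command("/input_data/", "192.168.0.90:5000", "10.10.0.1/24",
                        "/opt/bin/dws/gds/gds_log.txt", workers=2)
print(shlex.join(cmd))

# Rebuild the SSL example, which passes no -t option.
ssl_cmd = build_gds_command("/input_data/", "192.168.0.90:5000", "10.10.0.1/24",
                            "/opt/bin/dws/gds/gds_log.txt", ssl_dir="/opt/bin/")
print(shlex.join(ssl_cmd))
```

The resulting list can be passed to subprocess.run when the script runs as gds_user. Keeping the arguments as a list avoids shell quoting problems with unusual paths.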
vim /opt/bin/dws/gds/config/gds.conf
Example:
Configure the gds.conf file as follows:
<?xml version="1.0"?>
<config>
<gds name="gds1" ip="192.168.0.90" port="5000" data_dir="/input_data/" err_dir="/err" data_seg="100MB" err_seg="100MB" log_file="/log/gds_log.txt" host="10.10.0.1/24" daemon='true' recursive="true" parallel="32"></gds>
</config>
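Before starting GDS with gds_ctl.py, it can be useful to confirm that gds.conf is well-formed XML and contains the expected instance. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The function name list_gds_instances is hypothetical, and the sample configuration is embedded as a string only so that the example is self-contained; in practice the text would be read from /opt/bin/dws/gds/config/gds.conf.

```python
import xml.etree.ElementTree as ET

# Same content as the example gds.conf in this section.
GDS_CONF = """<?xml version="1.0"?>
<config>
<gds name="gds1" ip="192.168.0.90" port="5000" data_dir="/input_data/" err_dir="/err" data_seg="100MB" err_seg="100MB" log_file="/log/gds_log.txt" host="10.10.0.1/24" daemon='true' recursive="true" parallel="32"></gds>
</config>"""


def list_gds_instances(xml_text):
    """Parse the configuration text and return one attribute dict per <gds> element."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if the XML is malformed
    return [dict(gds.attrib) for gds in root.findall("gds")]


instances = list_gds_instances(GDS_CONF)
for inst in instances:
    print(f'{inst["name"]}: listens on {inst["ip"]}:{inst["port"]}, '
          f'data directory {inst["data_dir"]}')
```

This check only verifies XML syntax and lists the attributes. It does not validate the values against what GDS accepts.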
Information in the configuration file is as follows:
python3 gds_ctl.py start
Example:
cd /opt/bin/dws/gds/bin
python3 gds_ctl.py start
Start GDS gds1 [OK]
gds [options]:
-d dir            Set data directory.
-p port           Set GDS listening port.
   ip:port        Set GDS listening ip address and port.
-l log_file       Set log file.
-H secure_ip_range
                  Set secure IP checklist in CIDR notation. Required for GDS to start.
-e dir            Set error log directory.
-E size           Set size of per error log segment.(0 < size < 1TB)
-S size           Set size of data segment.(1MB < size < 100TB)
-t worker_num     Set number of worker thread in multi-thread mode, the upper limit is 200. If without setting, the default value is 8.
-s status_file    Enable GDS status report.
-D                Run the GDS as a daemon process.
-r                Read the working directory recursively.
-h                Display usage.
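Two of the options above have constraints that are easy to check before startup: -H takes a range in CIDR notation, and -t has an upper limit of 200. The following is a minimal sketch using Python's standard ipaddress module. It shows how a CIDR range such as 10.10.0.1/24 is conventionally interpreted; whether GDS applies exactly the same matching is an assumption. The function names are hypothetical, and the lower bound of 1 for worker_num is also an assumption, since the help text states only the upper limit.

```python
import ipaddress

MAX_WORKERS = 200  # upper limit stated in the -t help text


def host_allowed(client_ip, secure_ip_range):
    """Return True if client_ip falls inside the CIDR range given to -H."""
    # strict=False accepts ranges written with host bits set, e.g. 10.10.0.1/24,
    # and treats them as the enclosing network (10.10.0.0/24).
    network = ipaddress.ip_network(secure_ip_range, strict=False)
    return ipaddress.ip_address(client_ip) in network


def valid_worker_num(n):
    """Check -t against the documented upper limit (lower bound of 1 is assumed)."""
    return 1 <= n <= MAX_WORKERS


print(host_allowed("10.10.0.57", "10.10.0.1/24"))   # True: same /24 network
print(host_allowed("10.10.1.57", "10.10.0.1/24"))   # False: different /24 network
print(valid_worker_num(8), valid_worker_num(201))   # True False
```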