You can use Loader to import data from an SFTP server to HDFS.
This section applies to MRS clusters earlier than 3.x.
Prerequisites
- You have prepared service data.
- You have created an analysis cluster.
Procedure
- Access the Loader page.
- Access the cluster details page.
- For versions earlier than MRS 1.9.2, log in to MRS Manager and choose Services.
- For MRS 1.9.2 or later, click the cluster name on the MRS console and choose Components.
- Choose . In Hue Web UI of Hue Summary, click Hue (Active). The Hue web UI is displayed.
- Choose .
The job management tab page is displayed by default on the Loader page.
- On the Loader page, click Manage links.
- Click New link and create sftp-connector. For details, see File Server Link.
- Click New link, enter the link name, select hdfs-connector, and create hdfs-connector.
- On the Loader page, click Manage jobs.
- Click New Job.
- In Connection, set parameters.
- In Name, enter a job name.
- Select the source link (the sftp-connector link created in step 3) and the target link (the hdfs-connector link created in step 4).
- In From, configure the job of the source link.
For details, see ftp-connector or sftp-connector.
- In To, configure the job of the target link.
For details, see hdfs-connector.
- In Task Config, set job running parameters.
Table 1 Loader job running properties

| Parameter | Description |
| --- | --- |
| Extractors | Number of Map tasks |
| Loaders | Number of Reduce tasks. This parameter is displayed only when the destination field is HBase or Hive. |
| Max. Error Records in a Single Shard | Error record threshold. If the number of error records of a single Map task exceeds the threshold, the task automatically stops and the obtained data is not returned. NOTE: Data is read and written in batches for MYSQL and MPPDB of generic-jdbc-connector by default. Errors are recorded once at most for each batch of data. |
| Dirty Data Directory | Directory for saving dirty data. If you leave this parameter blank, dirty data will not be saved. |
- Click Save.
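
The error record threshold and dirty data settings in Task Config can be easier to reason about with a small model. The following Python sketch is illustrative only and is not Loader code: `TaskConfig`, `MapTaskResult`, and `run_map_task` are hypothetical names, the validity check is supplied by the caller, and dirty records are kept in memory rather than written to a directory. It assumes that a Map task stops as soon as its error count exceeds the threshold and then returns no data, as described in Table 1. Whether dirty records collected before the stop are still saved is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class TaskConfig:
    """Hypothetical stand-in for the Task Config fields in Table 1."""
    extractors: int = 1                    # number of Map tasks
    max_error_records: int = 0             # error threshold for one Map task
    dirty_data_dir: Optional[str] = None   # None models a blank field


@dataclass
class MapTaskResult:
    stopped: bool                                    # True if the threshold was exceeded
    loaded: List[str] = field(default_factory=list)  # records returned by the task
    dirty: List[str] = field(default_factory=list)   # records that would be saved as dirty data


def run_map_task(
    records: Iterable[str],
    is_valid: Callable[[str], bool],
    config: TaskConfig,
) -> MapTaskResult:
    """Simulate one Map task applying the error record threshold."""
    loaded: List[str] = []
    dirty: List[str] = []
    errors = 0
    for record in records:
        if is_valid(record):
            loaded.append(record)
            continue
        errors += 1
        # Dirty data is kept only when a directory is configured.
        if config.dirty_data_dir is not None:
            dirty.append(record)
        # Exceeding the threshold stops the task; obtained data is not returned.
        if errors > config.max_error_records:
            return MapTaskResult(stopped=True, loaded=[], dirty=dirty)
    return MapTaskResult(stopped=False, loaded=loaded, dirty=dirty)


if __name__ == "__main__":
    cfg = TaskConfig(max_error_records=1, dirty_data_dir="/tmp/dirty")
    result = run_map_task(["a,1", "bad", "b,2"], lambda r: "," in r, cfg)
    print(result.stopped, result.loaded, result.dirty)
```

In this model, a threshold of 1 tolerates a single bad record per Map task, and a second bad record stops the task. Leaving `dirty_data_dir` unset mirrors a blank Dirty Data Directory field: error records are still counted but are not kept.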