doc-exports/docs/dataartsstudio/umn/dataartsstudio_01_0053.html
chenxiaoxiong f9e2808b7c DataArts UMN 20250810 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: chenxiaoxiong <chenxiaoxiong@huawei.com>
Co-committed-by: chenxiaoxiong <chenxiaoxiong@huawei.com>
2025-09-02 10:44:13 +00:00


<a name="dataartsstudio_01_0053"></a><a name="dataartsstudio_01_0053"></a>
<h1 class="topictitle1">From HTTP</h1>
<div id="body8662426"><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p6222841216210">If the source link of a job is an HTTP link, configure the source job parameters based on <a href="#dataartsstudio_01_0053__en-us_topic_0114711714_table5046103815165">Table 1</a>. <span id="dataartsstudio_01_0053__en-us_topic_0114711714_ph5509115455119">Currently, data can only be exported from HTTP URLs.</span></p>
<div class="tablenoborder"><a name="dataartsstudio_01_0053__en-us_topic_0114711714_table5046103815165"></a><a name="en-us_topic_0114711714_table5046103815165"></a><table cellpadding="4" cellspacing="0" summary="" id="dataartsstudio_01_0053__en-us_topic_0114711714_table5046103815165" frame="border" border="1" rules="all"><caption><b>Table 1 </b>Parameter description</caption><thead align="left"><tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row585315215165"><th align="left" class="cellrowborder" valign="top" width="14.531453145314533%" id="mcps1.3.2.2.4.1.1"><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p1626397215165">Parameter</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="64.41644164416441%" id="mcps1.3.2.2.4.1.2"><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p4231334915165">Description</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="21.052105210521052%" id="mcps1.3.2.2.4.1.3"><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p482921015165">Example Value</p>
</th>
</tr>
</thead>
<tbody><tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row4012116315165"><td class="cellrowborder" valign="top" width="14.531453145314533%" headers="mcps1.3.2.2.4.1.1 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p2858877215165">File URL</p>
</td>
<td class="cellrowborder" valign="top" width="64.41644164416441%" headers="mcps1.3.2.2.4.1.2 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p41044235294">Use the GET method to obtain data from the HTTP/HTTPS URL.</p>
<p id="dataartsstudio_01_0053__en-us_topic_0114711714_p0123115172917">This connector reads files from HTTP/HTTPS URLs, such as public files on third-party object storage systems and web disks.</p>
</td>
<td class="cellrowborder" valign="top" width="21.052105210521052%" headers="mcps1.3.2.2.4.1.3 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p205821717567">-</p>
</td>
</tr>
<tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row1120101174711"><td class="cellrowborder" valign="top" width="14.531453145314533%" headers="mcps1.3.2.2.4.1.1 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p282108154917">Pull List File</p>
</td>
<td class="cellrowborder" valign="top" width="64.41644164416441%" headers="mcps1.3.2.2.4.1.2 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p18821854917">If this parameter is set to <strong id="dataartsstudio_01_0053__en-us_topic_0114711714_b89265253321040">Yes</strong>, the system pulls the files corresponding to the URLs in the text file to be uploaded and stores them on OBS. The text file records the file paths on HDFS.</p>
</td>
<td class="cellrowborder" valign="top" width="21.052105210521052%" headers="mcps1.3.2.2.4.1.3 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p78388124915">Yes</p>
</td>
</tr>
<tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row174571658174617"><td class="cellrowborder" valign="top" width="14.531453145314533%" headers="mcps1.3.2.2.4.1.1 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275442_p192561997469">OBS Link of List File</p>
</td>
<td class="cellrowborder" valign="top" width="64.41644164416441%" headers="mcps1.3.2.2.4.1.2 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275442_p16256159104610">Select an existing OBS link.</p>
</td>
<td class="cellrowborder" valign="top" width="21.052105210521052%" headers="mcps1.3.2.2.4.1.3 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275442_p725609194610">obs_link</p>
</td>
</tr>
<tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row777075517465"><td class="cellrowborder" valign="top" width="14.531453145314533%" headers="mcps1.3.2.2.4.1.1 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275442_p27225524616">OBS Bucket of entries files</p>
</td>
<td class="cellrowborder" valign="top" width="64.41644164416441%" headers="mcps1.3.2.2.4.1.2 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275442_p672212524619">Name of the OBS bucket that stores the text file</p>
</td>
<td class="cellrowborder" valign="top" width="21.052105210521052%" headers="mcps1.3.2.2.4.1.3 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275442_p3722853467">obs-cdm</p>
</td>
</tr>
<tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row1650174910460"><td class="cellrowborder" valign="top" width="14.531453145314533%" headers="mcps1.3.2.2.4.1.1 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275442_p1491233724510">Path/Directory of entries files</p>
</td>
<td class="cellrowborder" valign="top" width="64.41644164416441%" headers="mcps1.3.2.2.4.1.2 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275442_p17912143794515">Custom OBS directory that stores the text file. Use slashes (/) to separate directory levels.</p>
</td>
<td class="cellrowborder" valign="top" width="21.052105210521052%" headers="mcps1.3.2.2.4.1.3 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275442_p189121537124517">test1</p>
</td>
</tr>
<tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row1497845915165"><td class="cellrowborder" valign="top" width="14.531453145314533%" headers="mcps1.3.2.2.4.1.1 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p529563715165">File Format</p>
</td>
<td class="cellrowborder" valign="top" width="64.41644164416441%" headers="mcps1.3.2.2.4.1.2 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p440214575817">Format used for transmitting data. The CSV and JSON formats are supported for migration to tables, and the binary format is supported for file migration.</p>
</td>
<td class="cellrowborder" valign="top" width="21.052105210521052%" headers="mcps1.3.2.2.4.1.3 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p3753014815165">Binary</p>
</td>
</tr>
<tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row195259453167"><td class="cellrowborder" valign="top" width="14.531453145314533%" headers="mcps1.3.2.2.4.1.1 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p5526745191614">Compression Format</p>
</td>
<td class="cellrowborder" valign="top" width="64.41644164416441%" headers="mcps1.3.2.2.4.1.2 "><div class="p" id="dataartsstudio_01_0053__en-us_topic_0114711714_p2806544011518">Compression format of the source files. The options are as follows:<ul id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108272831_ul64234801103023"><li id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275319_li6351528115423"><strong id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275319_b84235270615655_1">NONE</strong>: Files in all formats can be transferred.</li><li id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275319_li2191882915426"><strong id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275319_b161191081482">GZIP</strong>: Only files in GZIP format can be transferred.</li><li id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275319_li146183210612"><strong id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275319_b84235270615655_3">ZIP</strong>: Only files in ZIP format can be transferred.</li><li id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275319_li1754311673810"><strong id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275319_b16279301073">TAR.GZ</strong>: Only files in TAR.GZ format can be transferred.</li></ul>
</div>
</td>
<td class="cellrowborder" valign="top" width="21.052105210521052%" headers="mcps1.3.2.2.4.1.3 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p165261145181617">NONE</p>
</td>
</tr>
<tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row24820469167"><td class="cellrowborder" valign="top" width="14.531453145314533%" headers="mcps1.3.2.2.4.1.1 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p71401359203717">Compressed File Suffix</p>
</td>
<td class="cellrowborder" valign="top" width="64.41644164416441%" headers="mcps1.3.2.2.4.1.2 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p16140159143718">This parameter is displayed when <strong id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275319_b56981326292">Compression Format</strong> is not <strong id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275319_b869819261797">NONE</strong>.</p>
<p id="dataartsstudio_01_0053__en-us_topic_0114711714_p1214010593377">This parameter specifies the file name extension of the files to be decompressed. In a batch of files, only files with this extension are decompressed. Other files are transferred in their original format. If you enter <strong id="dataartsstudio_01_0053__en-us_topic_0114711714_en-us_topic_0108275319_b19098768612106">*</strong> or leave the parameter blank, all files are decompressed.</p>
</td>
<td class="cellrowborder" valign="top" width="21.052105210521052%" headers="mcps1.3.2.2.4.1.3 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p71406594373">*</p>
</td>
</tr>
<tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row175761546101619"><td class="cellrowborder" valign="top" width="14.531453145314533%" headers="mcps1.3.2.2.4.1.1 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p4577104611615">File Separator</p>
</td>
<td class="cellrowborder" valign="top" width="64.41644164416441%" headers="mcps1.3.2.2.4.1.2 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p8813175612198">File separator. When multiple files are transferred, CDM uses the file separator to identify files. The default value is <strong id="dataartsstudio_01_0053__en-us_topic_0114711714_b1595122231717">|</strong>. This parameter is not displayed if <strong id="dataartsstudio_01_0053__en-us_topic_0114711714_b17012583299">Pull List File</strong> is set to <strong id="dataartsstudio_01_0053__en-us_topic_0114711714_b924915211309">Yes</strong>.</p>
</td>
<td class="cellrowborder" valign="top" width="21.052105210521052%" headers="mcps1.3.2.2.4.1.3 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p11577154631620">|</p>
</td>
</tr>
<tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row1539591785713"><td class="cellrowborder" valign="top" width="14.531453145314533%" headers="mcps1.3.2.2.4.1.1 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p173961917165716">Query Parameter</p>
</td>
<td class="cellrowborder" valign="top" width="64.41644164416441%" headers="mcps1.3.2.2.4.1.2 "><ul id="dataartsstudio_01_0053__en-us_topic_0114711714_ul887516311069"><li id="dataartsstudio_01_0053__en-us_topic_0114711714_li9875431666">If you set this parameter to <span class="parmvalue" id="dataartsstudio_01_0053__en-us_topic_0114711714_parmvalue763411421516"><b>Yes</b></span>, the name of the objects uploaded to OBS does not include the <strong id="dataartsstudio_01_0053__en-us_topic_0114711714_b1963412143159">query</strong> parameter.</li><li id="dataartsstudio_01_0053__en-us_topic_0114711714_li76907159154">If you set this parameter to <span class="parmvalue" id="dataartsstudio_01_0053__en-us_topic_0114711714_parmvalue206463412155"><b>No</b></span>, the name of the objects uploaded to OBS includes the <strong id="dataartsstudio_01_0053__en-us_topic_0114711714_b46433420158">query</strong> parameter.</li></ul>
</td>
<td class="cellrowborder" valign="top" width="21.052105210521052%" headers="mcps1.3.2.2.4.1.3 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p143961517125716">No</p>
</td>
</tr>
<tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row156119916465"><td class="cellrowborder" valign="top" width="14.531453145314533%" headers="mcps1.3.2.2.4.1.1 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p131273401892">Disregard Non-existent Path or File</p>
</td>
<td class="cellrowborder" valign="top" width="64.41644164416441%" headers="mcps1.3.2.2.4.1.2 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p16127540398">If this parameter is set to <strong id="dataartsstudio_01_0053__en-us_topic_0114711714_b3157554194315">Yes</strong>, the job can be successfully executed even if the source path does not exist.</p>
</td>
<td class="cellrowborder" valign="top" width="21.052105210521052%" headers="mcps1.3.2.2.4.1.3 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p712717406915">No</p>
</td>
</tr>
<tr id="dataartsstudio_01_0053__en-us_topic_0114711714_row10363922103613"><td class="cellrowborder" valign="top" width="14.531453145314533%" headers="mcps1.3.2.2.4.1.1 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p116778319321">MD5 File Extension</p>
</td>
<td class="cellrowborder" valign="top" width="64.41644164416441%" headers="mcps1.3.2.2.4.1.2 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p205522046409">This parameter is used to check whether the files extracted by CDM are consistent with source files. </p>
</td>
<td class="cellrowborder" valign="top" width="21.052105210521052%" headers="mcps1.3.2.2.4.1.3 "><p id="dataartsstudio_01_0053__en-us_topic_0114711714_p267718319321">.md5</p>
</td>
</tr>
</tbody>
</table>
</div>
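<p>The effect of <strong>File Separator</strong> and <strong>Query Parameter</strong> described in Table 1 can be illustrated with a minimal Python sketch. It is not CDM code: the helper names <strong>split_file_urls</strong> and <strong>object_name</strong>, the example URLs, and the rule that the object name is the last path segment of the URL are all assumptions made for illustration only.</p>

```python
from urllib.parse import urlsplit


def split_file_urls(file_url, separator="|"):
    """Split a File URL value into individual URLs (hypothetical helper)."""
    return [u.strip() for u in file_url.split(separator) if u.strip()]


def object_name(url, strip_query):
    """Return an illustrative object name for a source URL.

    Assumption: the name is the last path segment of the URL. With
    strip_query=True (Query Parameter set to Yes) the query string is
    dropped; otherwise it is kept as part of the name.
    """
    parts = urlsplit(url)
    name = parts.path.rsplit("/", 1)[-1]
    if strip_query or not parts.query:
        return name
    return name + "?" + parts.query


urls = split_file_urls("http://example.com/a.csv?v=1|http://example.com/dir/b.csv")
for url in urls:
    print(object_name(url, True), object_name(url, False))
```

<p>In this sketch, setting the flag to Yes yields <strong>a.csv</strong> for the first URL, while No keeps the query string in the name. The actual naming rules applied by CDM may differ.</p>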
</div>
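<p>The <strong>Compressed File Suffix</strong> rule in Table 1 can be read as a simple per-file decision. The following Python sketch shows one possible interpretation. The function name <strong>should_decompress</strong> is hypothetical, and the exact, case-sensitive suffix match is an assumption, because the table does not state how CDM compares extensions.</p>

```python
def should_decompress(file_name, suffix):
    """Decide whether a file would be decompressed (illustrative only).

    A blank suffix or "*" means all files are decompressed, as stated in
    the parameter description. Otherwise only files whose name ends with
    the suffix are decompressed; the rest are transferred unchanged.
    Assumption: the comparison is an exact, case-sensitive suffix match.
    """
    wanted = suffix.strip()
    if wanted in ("", "*"):
        return True
    return file_name.endswith(wanted)


for name in ["logs.gz", "report.csv"]:
    action = "decompress" if should_decompress(name, ".gz") else "transfer as is"
    print(name, action)
```

<p>With the suffix set to <strong>.gz</strong>, only <strong>logs.gz</strong> is decompressed in this sketch. With <strong>*</strong> or a blank value, both files are treated as compressed files.</p>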
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dataartsstudio_01_0047.html">Configuring CDM Source Job Parameters</a></div>
</div>
</div>