forked from docs/doc-exports
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com> Co-authored-by: chenxiaoxiong <chenxiaoxiong@huawei.com> Co-committed-by: chenxiaoxiong <chenxiaoxiong@huawei.com>
29 lines
15 KiB
HTML
<a name="dataartsstudio_01_0112"></a>
<h1 class="topictitle1">Incremental File Migration</h1>
<div id="body32001227"><p id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p5755184962816">CDM supports incremental migration of file systems. After a full migration is complete, CDM can export all new files, or only specified directories or files.</p>
<p id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p1788415308396">Currently, CDM supports the following incremental migration modes:</p>
<ol id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_ol151611021132916"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li016162111293"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b176111491795">Exporting the files in a specified directory</strong><ul id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_ul016102172919"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li171611421112920">Application scenarios: The migration source is a file system (OBS/HDFS/FTP/SFTP). In incremental migration, only the specified files are written to the migration destination. Existing records are not updated or deleted.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li41611621152915">Key configurations: <a href="#dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section1070082019442">File/Path Filter</a> and Schedule Execution</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li7161122113293">Prerequisites: The source directory or file name contains a time field.</li></ul>
</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li51612217291"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b164242119119">Exporting the files modified after the specified time point</strong><ul id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_ul716116217293"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li1161921102919">Application scenarios: The migration source is a file system (OBS/HDFS/FTP/SFTP). The specified time point refers to the time when the file is modified. CDM migrates the files modified at or after the specified time point.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li116116213297">Key configurations: <a href="#dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section2012420511142">Time Filter</a> and Schedule Execution</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li11611212297">Prerequisites: None</li></ul>
</li></ol>
<div class="note" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_note45791757134910"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275319_p52974484409">If you configure a date and time macro variable and schedule the CDM job through <span id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275319_text9997118203">DataArts Studio DataArts Factory</span>, the system replaces the macro variable with (<em id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275319_i799871152012">Planned start time of the data development job</em> – <em id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275319_i0998101192020">Offset</em>) rather than (<em id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275319_i299821132018">Actual start time of the CDM job</em> – <em id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275319_i129981917200">Offset</em>).</p>
</div></div>
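<p>The choice of base time matters most when a job runs close to a day boundary. The following Python sketch is illustrative only: the helper resolve_date_macro and the example times are assumptions made for this illustration, not part of CDM, and the helper handles only whole-day offsets with the yyyyMMdd pattern.</p>

```python
from datetime import datetime, timedelta


def resolve_date_macro(base_time, offset_days, fmt="%Y%m%d"):
    """Return base_time shifted by offset_days, formatted as a date string.

    Hypothetical helper: it imitates ${dateformat(yyyyMMdd,-1,DAY)} only for
    whole-day offsets; "%Y%m%d" is the strftime equivalent of yyyyMMdd.
    """
    return (base_time + timedelta(days=offset_days)).strftime(fmt)


# Assumed example: the job is planned for 23:59 but actually starts 3 minutes late.
planned_start = datetime(2017, 10, 15, 23, 59, 0)
actual_start = datetime(2017, 10, 16, 0, 2, 0)

# Value substituted when the job is scheduled through DataArts Factory.
value_from_planned = resolve_date_macro(planned_start, -1)
# Value that the actual start time would have produced (not used).
value_from_actual = resolve_date_macro(actual_start, -1)

print(value_from_planned, value_from_actual)
```

<p>With these assumed times, the planned start time yields 20171014, whereas the actual start time would have yielded 20171015.</p>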
<div class="section" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section1070082019442"><a name="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section1070082019442"></a><a name="en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section1070082019442"></a><h4 class="sectiontitle">File/Path Filter</h4><ul id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_ul1799771715539"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li149201623195417">Parameter position: When creating a table/file migration job, if the migration source is a file system, set <span class="parmname" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmname3155195675010"><b>Filter Type</b></span> in advanced attributes of <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b61555561500">Source Job Configuration</strong> to <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b13156105611508">Wildcard</strong> or <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b1715685655015">Regular expression</strong>.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li1999751710531">Parameter principle: If you select <span class="parmname" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmname160916614566"><b>Wildcard</b></span> for <strong 
id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b861496185618">Filter Type</strong>, CDM filters files or paths based on the configured wildcard character and migrates only files or paths that meet the specified condition.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li168702052021">Example configurations:<div class="p" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p74551859825"><a name="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li168702052021"></a><a name="en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li168702052021"></a>Suppose that source file names contain a date and time field. For example, a file generated at <span class="uicontrol" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_uicontrol186634790810464"><b>2017-10-15 20:25:26</b></span> is named <span class="filepath" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275452_filepath29204443164512"><b>/opt/data/file_20171015202526.data</b></span>. 
Set the parameters as follows:<ol id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275452_ol429223101744"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275452_li393696511744"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b87635407457">Filter Type</strong>: Select <span class="parmvalue" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmvalue1348274664514"><b>Wildcard</b></span>.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li4984141755918"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b1827272416518">File Filter</strong>: Enter <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b911273605919">"*${dateformat(yyyyMMdd,-1,DAY)}*"</strong>, which is the format of the macro variables of date and time supported by CDM. For details, see <a href="dataartsstudio_01_0114.html">Using Macro Variables of Date and Time</a>.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li1139718593313">Schedule Execution: Set <span class="parmname" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmname1729817512498"><b>Cycle (days)</b></span> to <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b792319533498">1</strong>.</li></ol>
</div>
</li></ul>
<p id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p86931646142420">With this configuration, the job runs once a day and imports the files generated on the previous day to the destination directory, which implements incremental synchronization.</p>
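<p>The File Filter example above can be sketched in Python as follows. This is a rough illustration under stated assumptions: the helper names are hypothetical, only the one macro from the example is expanded, Python's fnmatch module stands in for CDM's wildcard matching (whose exact rules may differ), and the filter is assumed to apply to the file name rather than the full path.</p>

```python
import fnmatch
from datetime import datetime, timedelta

# The only macro this sketch understands (taken from the example above).
MACRO = "${dateformat(yyyyMMdd,-1,DAY)}"


def expand_file_filter(pattern, planned_start):
    """Replace MACRO with the date one day before planned_start (yyyyMMdd)."""
    previous_day = (planned_start + timedelta(days=-1)).strftime("%Y%m%d")
    return pattern.replace(MACRO, previous_day)


def select_files(names, wildcard):
    """Keep the names matching the wildcard (case-sensitive, fnmatch rules)."""
    return [name for name in names if fnmatch.fnmatchcase(name, wildcard)]


# Assumed planned start time of one daily run, and assumed source file names.
planned_start = datetime(2017, 10, 16, 0, 0, 0)
wildcard = expand_file_filter("*" + MACRO + "*", planned_start)

source_files = [
    "file_20171014093000.data",
    "file_20171015202526.data",
    "file_20171016010000.data",
]
selected = select_files(source_files, wildcard)
print(wildcard, selected)
```

<p>For a run planned on October 16, 2017, the filter expands to *20171015* and selects only file_20171015202526.data from the assumed list.</p>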
<p id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p12513101092815">In incremental file migration, <span class="parmname" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmname272361413454"><b>Path Filter</b></span> is used in the same way as <span class="parmname" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmname4673173513459"><b>File Filter</b></span>. The path name must contain the time field. In this case, all files in the specified path can be synchronized periodically.</p>
</div>
<div class="section" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section2012420511142"><a name="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section2012420511142"></a><a name="en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section2012420511142"></a><h4 class="sectiontitle">Time Filter</h4><ul id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_ul3948167155"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li128321438455">Parameter position: When creating a table/file migration job, if the migration source is a file system, select <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b1192854819175">Yes</strong> for <span class="parmvalue" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmvalue119355484171"><b>Time Filter</b></span>.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li994817552">Parameter principle: After you specify the start time and end time, only files modified between the start time (inclusive) and the end time (exclusive) are migrated.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li14267165018612">Example configurations:<div class="p" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p15425223102718"><a name="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li14267165018612"></a><a 
name="en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li14267165018612"></a>For example, if you want CDM to synchronize only the files modified from January 1, 2021 (inclusive) to January 1, 2022 (exclusive) to the destination, configure the following parameters:<ol id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_ol642532318276"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li038111346711"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b162044238576">Time Filter</strong>: Select <span class="parmvalue" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmvalue1243519428713"><b>Yes</b></span>.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li4776193384"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b20607194042118">Minimum Timestamp</strong>: Enter a value in the format of <em id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_i0607740102116">yyyy-MM-dd HH:mm:ss</em>, such as <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b2060814404218">2021-01-01 00:00:00</strong>.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li3943897297"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b2708142319456">Maximum Timestamp</strong>: Enter a value in the format of <em id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_i1170812354514">yyyy-MM-dd HH:mm:ss</em>, 
such as <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b470915233450">2022-01-01 00:00:00</strong>.</li></ol>
</div>
</li></ul>
<p id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p1644232315278">With this configuration, the CDM job migrates only the files modified from January 1, 2021 (inclusive) to January 1, 2022 (exclusive), and performs incremental synchronization the next time it is started.</p>
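<p>The time window described above is half-open: the minimum timestamp is included and the maximum timestamp is excluded. The following Python sketch shows that rule on assumed data. The function name, the file names, and the modification times are hypothetical and are not CDM identifiers.</p>

```python
from datetime import datetime

# strptime equivalent of the yyyy-MM-dd HH:mm:ss format used by the parameters.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def in_time_window(modified, minimum_timestamp, maximum_timestamp):
    """Return True if modified falls in [minimum_timestamp, maximum_timestamp)."""
    start = datetime.strptime(minimum_timestamp, TIME_FORMAT)
    end = datetime.strptime(maximum_timestamp, TIME_FORMAT)
    return start <= modified < end


# Assumed file names and modification times, chosen to hit both boundaries.
modified_times = {
    "a.data": datetime(2020, 12, 31, 23, 59, 59),  # before the window
    "b.data": datetime(2021, 1, 1, 0, 0, 0),       # exactly the minimum
    "c.data": datetime(2021, 12, 31, 23, 59, 59),  # last second inside
    "d.data": datetime(2022, 1, 1, 0, 0, 0),       # exactly the maximum
}

migrated = [
    name
    for name, mtime in modified_times.items()
    if in_time_window(mtime, "2021-01-01 00:00:00", "2022-01-01 00:00:00")
]
print(migrated)
```

<p>In this sketch, the file modified exactly at the minimum timestamp is selected, and the file modified exactly at the maximum timestamp is not.</p>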
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dataartsstudio_01_0111.html">Incremental Migration</a></div>
</div>
</div>