chenxiaoxiong f9e2808b7c DataArts UMN 20250810 version
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: chenxiaoxiong <chenxiaoxiong@huawei.com>
Co-committed-by: chenxiaoxiong <chenxiaoxiong@huawei.com>
2025-09-02 10:44:13 +00:00
<a name="dataartsstudio_01_0112"></a>
<h1 class="topictitle1">Incremental File Migration</h1>
<div id="body32001227"><p id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p5755184962816">CDM supports incremental migration of file systems. After a full migration is complete, CDM can export either all new files or only the files in specified directories or with specified names.</p>
<p id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p1788415308396">Currently, CDM supports the following incremental migration modes:</p>
<ol id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_ol151611021132916"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li016162111293"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b176111491795">Exporting the files in a specified directory</strong><ul id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_ul016102172919"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li171611421112920">Application scenarios: The migration source is a file system (OBS/HDFS/FTP/SFTP). In incremental migration, only the specified files are written to the migration destination. Existing records at the destination are not updated or deleted.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li41611621152915">Key configurations: <a href="#dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section1070082019442">File/Path Filter</a> and Schedule Execution</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li7161122113293">Prerequisites: The source directory names or file names contain a time field, such as a date.</li></ul>
</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li51612217291"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b164242119119">Exporting the files modified after the specified time point</strong><ul id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_ul716116217293"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li1161921102919">Application scenarios: The migration source is a file system (OBS/HDFS/FTP/SFTP). The specified time point refers to the time when the file is modified. CDM migrates the files modified at or after the specified time point.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li116116213297">Key configurations: <a href="#dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section2012420511142">Time Filter</a> and Schedule Execution</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li11611212297">Prerequisites: None</li></ul>
</li></ol>
<div class="note" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_note45791757134910"><img src="public_sys-resources/note_3.0-en-us.png"><span class="notetitle"> </span><div class="notebody"><p id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275319_p52974484409">If you have configured a date and time macro variable and schedule the CDM job through <span id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275319_text9997118203">DataArts Studio DataArts Factory</span>, the system resolves the macro variable as (<em id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275319_i799871152012">Planned start time of the data development job</em> + <em id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275319_i0998101192020">Offset</em>) rather than (<em id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275319_i299821132018">Actual start time of the CDM job</em> + <em id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275319_i129981917200">Offset</em>).</p>
</div></div>
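<p>The following Python sketch illustrates the substitution rule in this note. It is not CDM code: the function resolve_dateformat and its pattern-letter mapping are hypothetical and cover only the pattern letters used on this page. It shows that the same macro can resolve to different dates depending on whether the planned or the actual start time is used as the base, for example when a scheduled job runs late.</p>

```python
from datetime import datetime, timedelta

# Hypothetical mapping from the Java-style pattern letters used in the CDM
# examples to Python strftime codes (only the letters used on this page).
PATTERN_LETTERS = [("yyyy", "%Y"), ("MM", "%m"), ("dd", "%d"),
                   ("HH", "%H"), ("mm", "%M"), ("ss", "%S")]
UNITS = {"DAY": "days", "HOUR": "hours", "MINUTE": "minutes", "SECOND": "seconds"}


def resolve_dateformat(pattern: str, offset: int, unit: str, base: datetime) -> str:
    """Approximate ${dateformat(pattern,offset,unit)} evaluated at `base`."""
    shifted = base + timedelta(**{UNITS[unit]: offset})
    fmt = pattern
    for java_letters, strftime_code in PATTERN_LETTERS:
        fmt = fmt.replace(java_letters, strftime_code)
    return shifted.strftime(fmt)


planned_start = datetime(2017, 10, 16, 0, 0, 0)  # planned start of the DataArts Factory job
actual_start = datetime(2017, 10, 17, 0, 10, 0)  # the CDM job actually ran a day late

# Per the note, the macro is resolved against the planned start time.
print(resolve_dateformat("yyyyMMdd", -1, "DAY", planned_start))  # 20171015
# Resolving against the actual start time would have selected a different day.
print(resolve_dateformat("yyyyMMdd", -1, "DAY", actual_start))   # 20171016
```

<p>Because the value is derived from the planned start time, a delayed or rerun instance still resolves to the date intended for its schedule period.</p>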
<div class="section" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section1070082019442"><a name="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section1070082019442"></a><a name="en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section1070082019442"></a><h4 class="sectiontitle">File/Path Filter</h4><ul id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_ul1799771715539"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li149201623195417">Parameter position: When creating a table/file migration job, if the migration source is a file system, set <span class="parmname" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmname3155195675010"><b>Filter Type</b></span> in the advanced attributes of <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b61555561500">Source Job Configuration</strong> to <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b13156105611508">Wildcard</strong> or <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b1715685655015">Regular expression</strong>.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li1999751710531">Parameter principle: If you select <span class="parmname" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmname160916614566"><b>Wildcard</b></span> for <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b861496185618">Filter Type</strong>, CDM matches file or path names against the configured wildcard pattern and migrates only the files or paths that match it.</li>
<li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li168702052021">Example configurations:<div class="p" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p74551859825"><a name="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li168702052021"></a><a name="en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li168702052021"></a>Suppose that the source file names contain a date and time field. For example, the file generated at <span class="uicontrol" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_uicontrol186634790810464"><b>2017-10-15 20:25:26</b></span> is named <span class="filepath" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275452_filepath29204443164512"><b>/opt/data/file_20171015202526.data</b></span>. 
Set the parameters as follows:<ol id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275452_ol429223101744"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_en-us_topic_0108275452_li393696511744"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b87635407457">Filter Type</strong>: Select <span class="parmvalue" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmvalue1348274664514"><b>Wildcard</b></span>.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li4984141755918"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b1827272416518">File Filter</strong>: Enter <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b911273605919">"*${dateformat(yyyyMMdd,-1,DAY)}*"</strong>. ${dateformat(yyyyMMdd,-1,DAY)} is a CDM date and time macro variable that resolves to the date one day before the job time in yyyyMMdd format, so the wildcard matches the file names that contain that date. For details, see <a href="dataartsstudio_01_0114.html">Using Macro Variables of Date and Time</a>.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li1139718593313">Schedule Execution: Set <span class="parmname" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmname1729817512498"><b>Cycle (days)</b></span> to <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b792319533498">1</strong>.</li></ol>
</div>
</li></ul>
<p id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p86931646142420">With this configuration, the job runs once a day and migrates the files generated on the previous day to the destination directory, implementing incremental synchronization.</p>
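<p>The daily File Filter configuration can be approximated with the following Python sketch. It is illustrative only, not CDM's implementation: daily_file_filter and select_files are hypothetical names, Python's fnmatch module stands in for CDM's wildcard matching, and the pattern is assumed to be matched against the file name only. The sample names follow the /opt/data/file_yyyyMMddHHmmss.data convention from the example.</p>

```python
from datetime import date, timedelta
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def daily_file_filter(run_date: date) -> str:
    """Wildcard that *${dateformat(yyyyMMdd,-1,DAY)}* would expand to on run_date."""
    previous_day = run_date - timedelta(days=1)
    return "*" + previous_day.strftime("%Y%m%d") + "*"


def select_files(paths: list[str], pattern: str) -> list[str]:
    """Keep only the files whose name (not the full path) matches the wildcard."""
    return [p for p in paths if fnmatchcase(PurePosixPath(p).name, pattern)]


source_files = [
    "/opt/data/file_20171014235959.data",
    "/opt/data/file_20171015202526.data",
    "/opt/data/file_20171016080000.data",
]

# With Cycle (days) set to 1, each daily run picks up only the previous day's files.
for run_day in (date(2017, 10, 16), date(2017, 10, 17)):
    pattern = daily_file_filter(run_day)
    print(run_day, pattern, select_files(source_files, pattern))
```

<p>Each run selects a disjoint set of files, which is what makes the daily schedule incremental rather than a repeated full copy.</p>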
<p id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p12513101092815">In incremental file migration, <span class="parmname" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmname272361413454"><b>Path Filter</b></span> works the same way as <span class="parmname" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmname4673173513459"><b>File Filter</b></span>, except that the time field must be in the path name. All files in the matching paths are then synchronized periodically.</p>
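<p>For Path Filter, a comparable sketch assumes that each day's files are written to a directory named after the date (for example, /opt/data/20171015/) and that the wildcard is matched against that directory name. The directory layout, the function names previous_day_pattern and select_by_path, and the use of fnmatch are all assumptions for illustration.</p>

```python
from datetime import date, timedelta
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def previous_day_pattern(run_date: date) -> str:
    """Wildcard equivalent of *${dateformat(yyyyMMdd,-1,DAY)}* on run_date."""
    return "*" + (run_date - timedelta(days=1)).strftime("%Y%m%d") + "*"


def select_by_path(paths: list[str], path_pattern: str) -> list[str]:
    """Keep every file whose parent directory name matches the wildcard."""
    return [p for p in paths if fnmatchcase(PurePosixPath(p).parent.name, path_pattern)]


source_files = [
    "/opt/data/20171015/orders.csv",
    "/opt/data/20171015/users.csv",
    "/opt/data/20171016/orders.csv",
]

# All files under the previous day's directory are selected, whatever their names.
print(select_by_path(source_files, previous_day_pattern(date(2017, 10, 16))))
```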
</div>
<div class="section" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section2012420511142"><a name="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section2012420511142"></a><a name="en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_section2012420511142"></a><h4 class="sectiontitle">Time Filter</h4><ul id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_ul3948167155"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li128321438455">Parameter position: When creating a table/file migration job, if the migration source is a file system, select <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b1192854819175">Yes</strong> for <span class="parmvalue" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmvalue119355484171"><b>Time Filter</b></span>.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li994817552">Parameter principle: After you specify the start time and end time, CDM migrates only the files modified between the start time (inclusive) and the end time (exclusive).</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li14267165018612">Example configurations:<div class="p" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p15425223102718"><a name="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li14267165018612"></a><a name="en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li14267165018612"></a>For example, to synchronize to the destination only the files modified from January 1, 2021 (inclusive) to January 1, 2022 (exclusive), configure the following parameters:<ol id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_ol642532318276"><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li038111346711"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b162044238576">Time Filter</strong>: Select <span class="parmvalue" id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_parmvalue1243519428713"><b>Yes</b></span>.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li4776193384"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b20607194042118">Minimum Timestamp</strong>: Enter a value in the <em id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_i0607740102116">yyyy-MM-dd HH:mm:ss</em> format, for example, <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b2060814404218">2021-01-01 00:00:00</strong>.</li><li id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_li3943897297"><strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b2708142319456">Maximum Timestamp</strong>: Enter a value in the <em id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_i1170812354514">yyyy-MM-dd HH:mm:ss</em> format, for example, <strong id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_b470915233450">2022-01-01 00:00:00</strong>.</li></ol>
</div>
</li></ul>
<p id="dataartsstudio_01_0112__en-us_topic_0000001151619650_en-us_topic_0000001197578895_en-us_topic_0108275366_p1644232315278">With this configuration, the CDM job migrates only the files modified from January 1, 2021 (inclusive) to January 1, 2022 (exclusive), and performs incremental synchronization the next time it is started.</p>
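<p>The half-open time window used by Time Filter can be sketched in Python as follows. The function in_time_window and the sample modification times are hypothetical, and the sketch assumes the filter simply compares each file's modification time with the two timestamps as stated in the parameter principle. It also shows why excluding the end time suits incremental runs: assuming the next window starts at the previous Maximum Timestamp (how the window advances between runs is not specified on this page), a file modified exactly at the boundary is migrated once, not twice.</p>

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # Python equivalent of yyyy-MM-dd HH:mm:ss


def in_time_window(modified: datetime, minimum: str, maximum: str) -> bool:
    """True if minimum <= modified < maximum (start included, end excluded)."""
    start = datetime.strptime(minimum, TIMESTAMP_FORMAT)
    end = datetime.strptime(maximum, TIMESTAMP_FORMAT)
    return start <= modified < end


files = {
    "a.data": datetime(2021, 1, 1, 0, 0, 0),     # exactly at the minimum: included
    "b.data": datetime(2021, 7, 15, 12, 30, 0),  # inside the window: included
    "c.data": datetime(2022, 1, 1, 0, 0, 0),     # exactly at the maximum: excluded
}

selected = [name for name, mtime in files.items()
            if in_time_window(mtime, "2021-01-01 00:00:00", "2022-01-01 00:00:00")]
print(selected)  # ['a.data', 'b.data']

# A follow-up window starting where the previous one ended picks up c.data
# without migrating a.data or b.data a second time.
print([name for name, mtime in files.items()
       if in_time_window(mtime, "2022-01-01 00:00:00", "2023-01-01 00:00:00")])  # ['c.data']
```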
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dataartsstudio_01_0111.html">Incremental Migration</a></div>
</div>
</div>