doc-exports/docs/dli/sqlreference/dli_08_15040.html
Su, Xiaomeng be9eabe464 dli_sqlreference_20250305
Reviewed-by: Pruthi, Vineet <vineet.pruthi@t-systems.com>
Co-authored-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
Co-committed-by: Su, Xiaomeng <suxiaomeng1@huawei.com>
2025-03-25 09:06:21 +00:00

<a name="dli_08_15040"></a>
<h1 class="topictitle1">OBS Source Table</h1>
<div id="body0000001764398041"><div class="section" id="dli_08_15040__en-us_topic_0000001201521669_dli_08_0256_en-us_topic_0132788972_section108631122164917"><h4 class="sectiontitle">Function</h4><p id="dli_08_15040__p728914141135">The file system connector can be used to read single files or entire directories into a single table.</p>
<p id="dli_08_15040__p1928981413315">When using a directory as the source path, there is no defined order of ingestion for the files inside the directory. For more information, see <a href="https://nightlies.apache.org/flink/flink-docs-release-1.15/zh/docs/connectors/table/filesystem/" target="_blank" rel="noopener noreferrer">FileSystem SQL Connector</a>.</p>
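<p>A minimal sketch of both cases follows. The bucket <strong>demo-bucket</strong>, the directory <strong>input</strong>, the file <strong>data.parquet</strong>, and the table names are hypothetical placeholders. The first table reads exactly one file; the second reads every file under the directory, in no guaranteed order.</p>
<pre class="screen">-- Hypothetical: read exactly one Parquet file.
CREATE TABLE single_file_source (
name string,
num INT
) WITH (
'connector' = 'filesystem',
'path' = 'obs://demo-bucket/input/data.parquet',
'format' = 'parquet'
);

-- Hypothetical: read every file under a directory.
-- No ingestion order is guaranteed across the files.
CREATE TABLE directory_source (
name string,
num INT
) WITH (
'connector' = 'filesystem',
'path' = 'obs://demo-bucket/input',
'format' = 'parquet'
);
</pre>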
</div>
<div class="section" id="dli_08_15040__en-us_topic_0000001201521669_section434381984316"><h4 class="sectiontitle">Syntax</h4><div class="codecoloring" codetype="Sql" id="dli_08_15040__en-us_topic_0000001201521669_screen153431199432"><div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">source_table</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="n">string</span><span class="p">,</span>
<span class="w"> </span><span class="n">num</span><span class="w"> </span><span class="nb">INT</span><span class="p">,</span>
<span class="w"> </span><span class="n">p_day</span><span class="w"> </span><span class="n">string</span><span class="p">,</span>
<span class="w"> </span><span class="n">p_hour</span><span class="w"> </span><span class="n">string</span>
<span class="p">)</span><span class="w"> </span><span class="n">partitioned</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="p">(</span><span class="n">p_day</span><span class="p">,</span><span class="w"> </span><span class="n">p_hour</span><span class="p">)</span><span class="w"> </span><span class="k">WITH</span><span class="w"> </span><span class="p">(</span>
<span class="w"> </span><span class="s1">'connector'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'filesystem'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'path'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'obs://***'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'format'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">'parquet'</span><span class="p">,</span>
<span class="w"> </span><span class="s1">'source.monitor-interval'</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">''</span>
<span class="p">);</span>
</pre></div>
</div>
</div>
<div class="section" id="dli_08_15040__en-us_topic_0000001201521669_dli_08_0256_section4299113491"><h4 class="sectiontitle">Parameter Description</h4><ul id="dli_08_15040__ul6427153295116"><li id="dli_08_15040__li64272327513"><strong id="dli_08_15040__b1819674342314">Directory watching</strong><p id="dli_08_15040__p285454719119">By default, the file system connector is bounded; that is, it scans the configured path once and then closes itself.</p>
<p id="dli_08_15040__p385484731118">You can enable continuous directory watching by configuring the <strong id="dli_08_15040__b7101153882511">source.monitor-interval</strong> parameter:</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_15040__table3785175616115" frame="border" border="1" rules="all"><thead align="left"><tr id="dli_08_15040__row57982561118"><th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.3.2.1.4.1.5.1.1"><p id="dli_08_15040__p1079817569119">Key</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.3.2.1.4.1.5.1.2"><p id="dli_08_15040__p1979925610116">Default Value</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.3.2.1.4.1.5.1.3"><p id="dli_08_15040__p4799165615116">Data Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="25%" id="mcps1.3.3.2.1.4.1.5.1.4"><p id="dli_08_15040__p479919564118">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_15040__row2079945610114"><td class="cellrowborder" align="left" valign="top" width="25%" headers="mcps1.3.3.2.1.4.1.5.1.1 "><p id="dli_08_15040__p6799155631115">source.monitor-interval</p>
</td>
<td class="cellrowborder" align="left" valign="top" width="25%" headers="mcps1.3.3.2.1.4.1.5.1.2 "><p id="dli_08_15040__p47991856161120">None</p>
</td>
<td class="cellrowborder" align="left" valign="top" width="25%" headers="mcps1.3.3.2.1.4.1.5.1.3 "><p id="dli_08_15040__p5799156131118">Duration</p>
</td>
<td class="cellrowborder" align="left" valign="top" width="25%" headers="mcps1.3.3.2.1.4.1.5.1.4 "><p id="dli_08_15040__p129814575516">Interval at which the source checks for new files. The value must be greater than 0.</p>
<p id="dli_08_15040__p1951511115216">Each file is uniquely identified by its path and is processed once, as soon as it is discovered.</p>
<p id="dli_08_15040__p48471320521">The set of files already processed is kept in state for the whole lifecycle of the source, so it is persisted in checkpoints and savepoints together with the rest of the source state.</p>
<p id="dli_08_15040__p721611195210">A shorter interval means new files are discovered sooner, but it also means the file system or object store is listed or traversed more often.</p>
<p id="dli_08_15040__p37991356121120">If this option is not set, the configured path is scanned only once and the source is bounded.</p>
</td>
</tr>
</tbody>
</table>
</div>
</li></ul>
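<p>As a sketch, assuming a hypothetical directory <strong>obs://demo-bucket/input</strong>, the table below is an unbounded source that lists the directory every 10 minutes and reads only files whose paths it has not seen before. Removing the <strong>source.monitor-interval</strong> option makes the same table bounded. Because the set of processed files lives in source state, files already read are skipped after a restart only when the job resumes from a checkpoint or savepoint.</p>
<pre class="screen">-- Hypothetical path. Check for new files every 10 minutes (value must be greater than 0).
-- Without 'source.monitor-interval', the path is scanned once and the source is bounded.
CREATE TABLE obs_streaming_source (
name string,
num INT
) WITH (
'connector' = 'filesystem',
'path' = 'obs://demo-bucket/input',
'format' = 'parquet',
'source.monitor-interval' = '10 min'
);
</pre>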
</div>
<ul id="dli_08_15040__ul1767003011512"><li id="dli_08_15040__li7670123010514"><strong id="dli_08_15040__b17369101816716">Available Metadata</strong><p id="dli_08_15040__p153693184712">The following connector metadata can be accessed as metadata columns in a table definition. All metadata is read-only.</p>
<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" id="dli_08_15040__en-us_topic_0000001201521669_table11617424154613" frame="border" border="1" rules="all"><thead align="left"><tr id="dli_08_15040__en-us_topic_0000001201521669_row146177242466"><th align="left" class="cellrowborder" valign="top" width="43.02%" id="mcps1.3.4.1.3.1.4.1.1"><p id="dli_08_15040__en-us_topic_0000001201521669_p1361712418461">Key</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="31.35%" id="mcps1.3.4.1.3.1.4.1.2"><p id="dli_08_15040__en-us_topic_0000001201521669_p176171424114615">Data Type</p>
</th>
<th align="left" class="cellrowborder" valign="top" width="25.629999999999995%" id="mcps1.3.4.1.3.1.4.1.3"><p id="dli_08_15040__p25718295117">Description</p>
</th>
</tr>
</thead>
<tbody><tr id="dli_08_15040__en-us_topic_0000001201521669_row136171242461"><td class="cellrowborder" valign="top" width="43.02%" headers="mcps1.3.4.1.3.1.4.1.1 "><p id="dli_08_15040__p1034211531495">file.path</p>
</td>
<td class="cellrowborder" valign="top" width="31.35%" headers="mcps1.3.4.1.3.1.4.1.2 "><p id="dli_08_15040__p144597183101">STRING NOT NULL</p>
</td>
<td class="cellrowborder" valign="top" width="25.629999999999995%" headers="mcps1.3.4.1.3.1.4.1.3 "><p id="dli_08_15040__p1945122815102">Full path of the input file</p>
</td>
</tr>
<tr id="dli_08_15040__en-us_topic_0000001201521669_row1961742414462"><td class="cellrowborder" valign="top" width="43.02%" headers="mcps1.3.4.1.3.1.4.1.1 "><p id="dli_08_15040__p79618571914">file.name</p>
</td>
<td class="cellrowborder" valign="top" width="31.35%" headers="mcps1.3.4.1.3.1.4.1.2 "><p id="dli_08_15040__p127151571826">STRING NOT NULL</p>
</td>
<td class="cellrowborder" valign="top" width="25.629999999999995%" headers="mcps1.3.4.1.3.1.4.1.3 "><p id="dli_08_15040__p15715195713218">Name of the file, that is, the last element of the file path (the one farthest from the root)</p>
</td>
</tr>
<tr id="dli_08_15040__en-us_topic_0000001201521669_row1761802415461"><td class="cellrowborder" valign="top" width="43.02%" headers="mcps1.3.4.1.3.1.4.1.1 "><p id="dli_08_15040__p1818813111106">file.size</p>
</td>
<td class="cellrowborder" valign="top" width="31.35%" headers="mcps1.3.4.1.3.1.4.1.2 "><p id="dli_08_15040__p1471585713210">BIGINT NOT NULL</p>
</td>
<td class="cellrowborder" valign="top" width="25.629999999999995%" headers="mcps1.3.4.1.3.1.4.1.3 "><p id="dli_08_15040__p17715757023">Byte count of the file</p>
</td>
</tr>
<tr id="dli_08_15040__row91724710103"><td class="cellrowborder" valign="top" width="43.02%" headers="mcps1.3.4.1.3.1.4.1.1 "><p id="dli_08_15040__p19172187151011">file.modification-time</p>
</td>
<td class="cellrowborder" valign="top" width="31.35%" headers="mcps1.3.4.1.3.1.4.1.2 "><p id="dli_08_15040__p1317216751012">TIMESTAMP_LTZ(3) NOT NULL</p>
</td>
<td class="cellrowborder" valign="top" width="25.629999999999995%" headers="mcps1.3.4.1.3.1.4.1.3 "><p id="dli_08_15040__p10172207151016">Modification time of the file</p>
</td>
</tr>
</tbody>
</table>
</div>
</li></ul>
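<p>A minimal sketch of declaring metadata columns, assuming a hypothetical path <strong>obs://demo-bucket/input</strong>; the table name and the column names <strong>file_name</strong> and <strong>file_mtime</strong> are illustrative. <strong>METADATA FROM</strong> maps a metadata key to a column with a different name, which avoids backquoting keys that contain dots. <strong>VIRTUAL</strong> excludes the column from the table's sink schema, which suits read-only metadata.</p>
<pre class="screen">CREATE TABLE obs_metadata_source (
name string,
num INT,
-- Map metadata keys to plain column names.
file_name STRING METADATA FROM 'file.name' VIRTUAL,
file_mtime TIMESTAMP_LTZ(3) METADATA FROM 'file.modification-time' VIRTUAL,
-- Without FROM, the column name itself must be the metadata key.
`file.path` STRING NOT NULL METADATA VIRTUAL
) WITH (
'connector' = 'filesystem',
'path' = 'obs://demo-bucket/input',
'format' = 'parquet'
);

-- Each row carries the name and modification time of the file it came from.
SELECT name, num, file_name, file_mtime
FROM obs_metadata_source;
</pre>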
<div class="section" id="dli_08_15040__en-us_topic_0000001201521669_section8515152835418"><h4 class="sectiontitle">Example</h4><p id="dli_08_15040__p27893535527">Read data from an OBS source table and write it to a Print result table.</p>
<pre class="screen" id="dli_08_15040__screen878405718917">CREATE TABLE obs_source(
name string,
num INT,
`file.path` STRING NOT NULL METADATA
) WITH (
'connector' = 'filesystem',
'path' = '<em id="dli_08_15040__i11382039145512"><strong id="dli_08_15040__b18382439105518">obs://demo</strong></em><strong id="dli_08_15040__b93831139135520"><em id="dli_08_15040__i164457469552">/sink_parquent_obs</em></strong>',
'format' = 'parquet',
'source.monitor-interval'='1 h'
);
CREATE TABLE print (
name string,
num INT,
path STRING
) WITH (
'connector' = 'print'
);
insert into print
select * from obs_source;
</pre>
</div>
<div class="p" id="dli_08_15040__p17479178711">Print result:<pre class="screen" id="dli_08_15040__screen125775131418">+I[0e72e, 841255524, /spark.db/sink_parquent_obs/compacted-part-fd4d4cc8-8b18-42d5-b522-9b524500fa23-0-0]
+I[53524, -2032270969, /spark.db/sink_parquent_obs/compacted-part-fd4d4cc8-8b18-42d5-b522-9b524500fa23-0-0]
+I[77225, 245599258, /spark.db/sink_parquent_obs/compacted-part-fd4d4cc8-8b18-42d5-b522-9b524500fa23-0-0]
+I[fc202, -545621464, /spark.db/sink_parquent_obs/compacted-part-fd4d4cc8-8b18-42d5-b522-9b524500fa23-0-0]
+I[07e9d, 1511139764, /spark.db/sink_parquent_obs/compacted-part-fd4d4cc8-8b18-42d5-b522-9b524500fa23-0-0]
+I[4e48b, 278014413, /spark.db/sink_parquent_obs/compacted-part-fd4d4cc8-8b18-42d5-b522-9b524500fa23-0-0]</pre>
</div>
</div>
<div>
<div class="familylinks">
<div class="parentlink"><strong>Parent topic:</strong> <a href="dli_08_15039.html">OBS</a></div>
</div>
</div>