Introduction to Hudi

Apache Hudi stands for Hadoop Upserts Deletes and Incrementals. It manages large analytical datasets stored on a distributed file system (DFS) in Hadoop.
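
The three operations in the name can be illustrated with a toy in-memory model. This is only a conceptual sketch, not Hudi's API: the `ToyTable` class, its methods, and the integer commit times are all hypothetical names introduced here for illustration. It assumes each record has a unique key and that every write produces a new, increasing commit time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a keyed table (not a Hudi class)."""
    rows: dict = field(default_factory=dict)  # record key -> (commit_time, value)
    commit_time: int = 0                      # assumed monotonically increasing

    def upsert(self, records):
        """Insert new keys and overwrite existing ones in a single commit."""
        self.commit_time += 1
        for key, value in records.items():
            self.rows[key] = (self.commit_time, value)
        return self.commit_time

    def delete(self, keys):
        """Remove the given keys in a single commit; unknown keys are ignored."""
        self.commit_time += 1
        for key in keys:
            self.rows.pop(key, None)
        return self.commit_time

    def snapshot(self):
        """Return the latest value of every live record."""
        return {key: value for key, (_, value) in self.rows.items()}

    def incremental(self, since):
        """Return only live records written after commit time `since`."""
        return {key: value for key, (t, value) in self.rows.items() if t > since}


table = ToyTable()
c1 = table.upsert({"u1": "alice", "u2": "bob", "u3": "carol"})
c2 = table.upsert({"u2": "bobby", "u4": "dave"})  # updates u2, inserts u4
c3 = table.delete(["u1"])

full_view = table.snapshot()          # every live record
changed_view = table.incremental(c1)  # only records written after commit c1
```

Here `full_view` holds `u2`, `u3`, and `u4`, while `changed_view` holds only `u2` and `u4`, the records written after the first commit. A downstream job that reads only such changes avoids rescanning the whole dataset, which is the motivation behind incremental reads.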

Hudi is more than a data format. It also provides a set of data access methods (similar to the access layer of GaussDB(DWS) storage). In Apache Hudi 0.9, big data components such as Spark and Flink each have their own Hudi client. The following figure shows the logical storage of Hudi.