Merge rows (diff)
This transform is useful for comparing data collected at two different times.
For example, the source system of your data warehouse might not contain a timestamp of the last data update.
You could use this transform to compare the two data streams and merge the dates and timestamps in the rows.
Based on keys for comparison, this transform merges reference rows (previous data) with compare rows (new data) and creates merged output rows.
A flag in the row indicates how the values were compared and merged. Flag values include:
changed: The key was found in both rows, but one or more compared values are different.
deleted: The key was not found in the compare rows.
If the row’s flag is identical or deleted, the merged output rows are based on the reference rows.
For new or changed rows, the merged output rows are based on the compare rows.
You can also send values from the merged and flagged rows to a subsequent transform in your pipeline, such as the Switch-Case transform or the Synchronize after merge transform. In the subsequent transform, you can use the flag field generated by Merge rows (diff) to control updates/inserts/deletes on a target table.
All fields of this transform support metadata injection. You can use this transform with ETL Metadata Injection to pass metadata to your pipeline at runtime.