Zipping Elements in a DataSet

    assigns consecutive labels to the elements, receiving a data set as input and returning a new data set of 2-tuples. This process requires two passes, first counting then labeling elements, and cannot be pipelined due to the synchronization of counts. The alternative works in a pipelined fashion and is preferred when a unique labeling is sufficient. For example, the following code:

    Java

    Python

    may yield the tuples: (0,G), (1,H), (2,A), (3,B), (4,C), (5,D), (6,E), (7,F)

    Zip with a Unique Identifier

    Java

    Scala