Zipping Elements in a DataSet
assigns consecutive labels to the elements, receiving a data set as input and returning a new data set of 2-tuples. This process requires two passes, first counting then labeling elements, and cannot be pipelined due to the synchronization of counts. The alternative works in a pipelined fashion and is preferred when a unique labeling is sufficient. For example, the following code:
Java
Python
may yield the tuples: (0,G), (1,H), (2,A), (3,B), (4,C), (5,D), (6,E), (7,F)
Zip with a Unique Identifier
Java
Scala