Scikit Transformers Examples

    The LabelEncoder is analogous to StringIndexer in Spark; however, the scikit transformer has a couple of unique features that we need to account for:

    1. The output of the LabelEncoder is a one-dimensional numpy array of shape (n,) instead of the (n,1) shape that is required for further processing such as one-hot encoding
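
    A minimal sketch of the shape issue, using plain scikit-learn and numpy only. The sample labels and variable names are made up for illustration, and the reshape shown here is one common way to get an (n,1) column, not necessarily how MLeap handles it internally:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample data, for illustration only
labels = np.array(["b", "a", "c", "a"])

encoder = LabelEncoder()
indices = encoder.fit_transform(labels)  # one index per label, classes sorted alphabetically

# Reshape into a single column so that downstream transformers
# expecting 2-D input of shape (n, 1) can consume it
column = indices.reshape(-1, 1)

print(indices.shape)  # shape of the raw LabelEncoder output
print(column.shape)   # shape after reshaping
```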

    The next step is to combine the label indexer with a OneHotEncoder.
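
    As a rough illustration of that combination, the sketch below chains scikit-learn's own LabelEncoder and OneHotEncoder by hand. The sample labels are hypothetical, and this uses the stock scikit-learn encoder rather than any MLeap-specific class:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical sample data, for illustration only
labels = np.array(["b", "a", "c", "a"])

# Step 1: map each string label to an integer index, then make it a column
indices = LabelEncoder().fit_transform(labels).reshape(-1, 1)

# Step 2: one-hot encode the index column; the default output is sparse,
# so convert it to a dense array for inspection
one_hot = OneHotEncoder().fit_transform(indices).toarray()

print(one_hot)
```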

    One of the shortcomings of Scikit’s OneHotEncoder is that it is missing functionality that is required in ML pipelines. MLeap comes with its own OneHotEncoder that provides it.