Scikit Transformers Examples
The LabelEncoder is analogous to StringIndexer in Spark; however, there are unique features of the scikit transformer that we need to account for:
- The output of the LabelEncoder is a one-dimensional numpy array of shape (n,) rather than a column vector of shape (n,1), which is the shape required for further processing such as One-Hot-Encoding.
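As a minimal sketch, assuming a single categorical feature held in a numpy array (the `colors` data is purely illustrative), the snippet below prints the shape LabelEncoder actually returns and reshapes it into the (n,1) column layout that a one-hot encoding step expects.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Illustrative data: one categorical feature (LabelEncoder encodes one column at a time).
colors = np.array(["red", "green", "blue", "green", "red"])

encoder = LabelEncoder()
indices = encoder.fit_transform(colors)

# Classes are sorted, so blue -> 0, green -> 1, red -> 2.
print(encoder.classes_)  # ['blue' 'green' 'red']
print(indices)           # [2 1 0 1 2]
print(indices.shape)     # (5,)  a flat, one-dimensional array

# Reshape into a column vector of shape (n, 1) for downstream encoders.
column = indices.reshape(-1, 1)
print(column.shape)      # (5, 1)
```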
The next step is to combine the label encoder with a OneHotEncoder.
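The combination can be sketched with plain scikit-learn as follows. This uses scikit-learn's own OneHotEncoder, not MLeap's, and the input data is illustrative only: the integer indices from LabelEncoder are reshaped to (n,1) and then one-hot encoded.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Illustrative data: one categorical feature.
colors = np.array(["red", "green", "blue", "green", "red"])

# Step 1: map each string label to an integer index, then reshape to (n, 1).
label_encoder = LabelEncoder()
indices = label_encoder.fit_transform(colors).reshape(-1, 1)

# Step 2: one-hot encode the integer indices.
# fit_transform returns a scipy sparse matrix by default.
one_hot = OneHotEncoder()
encoded = one_hot.fit_transform(indices)

print(encoded.toarray())
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```

Note that scikit-learn 0.20 and later can also pass string columns directly to OneHotEncoder; the two-step form above mirrors the StringIndexer-then-encoder structure described here.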
One of the shortcomings of Scikit’s OneHotEncoder is that it lacks a piece of functionality that ML pipelines require. MLeap comes with its own OneHotEncoder that provides that function.