MLeap Spark Integration

MLeap's Spark integration provides the following features:

  • Serialization/Deserialization of Transformers and Pipelines to and from Bundle.ML
  • Support for additional feature transformers and models (e.g., SVM, OneVsRest, MapTransform)
  • Support for custom transformers

To use MLeap, you do not have to change how you construct your existing pipelines, so the rest of this documentation focuses on how to serialize and deserialize your pipeline to and from Bundle.ML. To see how to execute your pipeline outside of Spark, refer to the MLeap Runtime section.

Serializing with Spark

To serialize to a zip file, make sure the URI begins with jar:file and ends with .zip.

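    // JSON format; sparkTransformed is a DataFrame already transformed by the pipeline
    implicit val context = SparkBundleContext().withDataset(sparkTransformed)
    for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-json.zip"))) {
      pipeline.writeBundle.format(SerializationFormat.Json).save(bundle)(context)
    }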

Protobuf Format
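
Assuming the same pipeline and transformed dataset as in the zip example above, a Protobuf bundle could be written by choosing SerializationFormat.Protobuf instead of SerializationFormat.Json. The following is a minimal sketch, not a definitive implementation: the helper name saveProtobufZip and its parameters are hypothetical, and the imports assume the MLeap Spark artifact and Scala ARM are on the classpath.

```scala
import ml.combust.bundle.BundleFile
import ml.combust.bundle.serializer.SerializationFormat
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.bundle.SparkBundleContext
import org.apache.spark.sql.DataFrame
import resource._

// Hypothetical helper: writes `pipeline` to the bundle at `uri` in Protobuf format.
// `transformed` is a DataFrame that has already been run through the pipeline.
def saveProtobufZip(pipeline: Transformer, transformed: DataFrame, uri: String): Unit = {
  implicit val context: SparkBundleContext =
    SparkBundleContext().withDataset(transformed)

  // Scala ARM closes the bundle file when the block exits.
  for (bundle <- managed(BundleFile(uri))) {
    // save returns a Try; .get rethrows any serialization failure.
    pipeline.writeBundle.format(SerializationFormat.Protobuf).save(bundle)(context).get
  }
}
```

A call might look like saveProtobufZip(pipeline, sparkTransformed, "jar:file:/tmp/mleap-examples/simple-protobuf.zip"), where the file name is only an illustration.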

To serialize to a directory, make sure the URI begins with file.

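    // Reuses the implicit context from the zip example; the directory path is illustrative.
    for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-json-dir"))) {
      pipeline.writeBundle.format(SerializationFormat.Json).save(bundle)(context)
    }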
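
The URI conventions used in this section (jar:file plus a .zip suffix for zip bundles, file for directory bundles) can be captured in a small helper. This is a sketch using only the Scala standard library; the object BundleUris and its methods are hypothetical names, not part of MLeap, and it assumes absolute Unix-style paths.

```scala
// Hypothetical helper, not part of MLeap: builds and checks bundle URIs.
object BundleUris {
  // Zip bundles: the URI starts with "jar:file" and ends with ".zip".
  def isZipBundleUri(uri: String): Boolean =
    uri.startsWith("jar:file:") && uri.endsWith(".zip")

  // Directory bundles: the URI starts with "file".
  def isDirectoryBundleUri(uri: String): Boolean =
    uri.startsWith("file:")

  // Builds a zip bundle URI from an absolute path ending in ".zip".
  def zipBundleUri(absolutePath: String): String = {
    require(absolutePath.startsWith("/"), "expected an absolute path")
    require(absolutePath.endsWith(".zip"), "zip bundle paths must end with .zip")
    s"jar:file:$absolutePath"
  }

  // Builds a directory bundle URI from an absolute path.
  def directoryBundleUri(absolutePath: String): String = {
    require(absolutePath.startsWith("/"), "expected an absolute path")
    s"file:$absolutePath"
  }
}
```

For example, BundleUris.zipBundleUri("/tmp/mleap-examples/simple-json.zip") yields the same URI string that is passed to BundleFile in the zip example.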

Protobuf Format

Deserializing is just as easy as serializing. You don't need to know beforehand which format the MLeap Bundle was serialized in; you just need to know where the bundle is.

    // Deserialize a zip bundle
    // Use Scala ARM to make sure resources are managed properly
    val zipBundle = (for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-json.zip"))) yield {
      // loadSparkBundle is provided by ml.combust.mleap.spark.SparkSupport._
      bundle.loadSparkBundle().get
    }).opt.get
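
Because the serialization format does not need to be known in advance, one loading routine can serve both zip and directory bundles. The sketch below wraps the pattern above in a function that returns the deserialized Spark transformer. The name loadPipeline is hypothetical, and the imports assume the MLeap Spark artifact and Scala ARM are on the classpath.

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.Transformer
import resource._

// Hypothetical helper: loads the Spark transformer stored in the bundle at `uri`.
// `uri` may point at a zip bundle (jar:file:...zip) or a directory bundle (file:...).
def loadPipeline(uri: String): Transformer =
  (for (bundle <- managed(BundleFile(uri))) yield {
    // loadSparkBundle returns a Try of the bundle; `root` is the top-level transformer.
    bundle.loadSparkBundle().get.root
  }).opt.get
```

The returned transformer can then be applied to a DataFrame with transform, just like the pipeline that was originally serialized.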

Directory Bundle