Basic Demo

    In this section we will programmatically create a simple Spark ML pipeline then export it to an MLeap Bundle. Our pipeline is very simple, it performs string indexing on a categorical feature then runs the result through a binarizer to force the result to a 1 or 0. This pipeline has no real-world purpose, but illustrates how easy it is to create MLeap Bundles from Spark ML pipelines.

    NOTE: right click and choose “Save As…”; Gitbook prevents clicking the link directly.
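For context, the Spark-side pipeline described above can be sketched roughly as follows. This is a sketch only: it assumes `mleap-spark` is on the classpath, a `SparkSession` is in scope, and a training DataFrame `df` exists with `test_string` and `test_double` columns; the column names, threshold, and bundle path are illustrative.

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.bundle.SparkBundleContext
import org.apache.spark.ml.feature.{Binarizer, StringIndexer}
import resource._

// index the categorical column, then binarize the double column
val stringIndexer = new StringIndexer().
  setInputCol("test_string").
  setOutputCol("string_index")
val binarizer = new Binarizer().
  setThreshold(0.5).
  setInputCol("test_double").
  setOutputCol("binarized_double")

// fit the two-stage pipeline on the assumed DataFrame `df`
val pipeline = new Pipeline().
  setStages(Array(stringIndexer, binarizer)).
  fit(df)

// serialize the fitted pipeline to an MLeap Bundle
implicit val sbc = SparkBundleContext().withDataset(pipeline.transform(df))
for(bundleFile <- managed(BundleFile("jar:file:/tmp/simple-spark-pipeline.zip"))) {
  pipeline.writeBundle.save(bundleFile)(sbc).get
}
```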

    Import MLeap Bundle

    import ml.combust.bundle.BundleFile
    import ml.combust.mleap.runtime.MleapSupport._
    import resource._

    // load the Spark pipeline we saved in the previous section
    val bundle = (for(bundleFile <- managed(BundleFile("jar:file:/tmp/simple-spark-pipeline.zip"))) yield {
      bundleFile.loadMleapBundle().get
    }).opt.get

    // create a simple LeapFrame to transform
    import ml.combust.mleap.core.types._
    import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}

    // MLeap makes extensive use of monadic types like Try
    val schema = StructType(StructField("test_string", ScalarType.String),
      StructField("test_double", ScalarType.Double)).get
    val data = Seq(Row("hello", 0.6), Row("MLeap", 0.2))
    val frame = DefaultLeapFrame(schema, data)

    // transform the dataframe using our pipeline
    val mleapPipeline = bundle.root
    val frame2 = mleapPipeline.transform(frame).get
    val data2 = frame2.dataset

    // get data from the transformed rows and make some assertions
    assert(data2(0).getDouble(2) == 1.0) // string indexer output
    assert(data2(0).getDouble(3) == 1.0) // binarizer output

    // the second row
    assert(data2(1).getDouble(2) == 2.0)
    assert(data2(1).getDouble(3) == 0.0)
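Because `transform` returns a `scala.util.Try`, you can also handle failures explicitly instead of calling `.get`. A minimal sketch, using the `bundle` and `frame` values defined above:

```scala
import scala.util.{Failure, Success}

// pattern match on the Try rather than unwrapping it with .get
bundle.root.transform(frame) match {
  case Success(transformed) =>
    println(s"transformed ${transformed.dataset.size} rows")
  case Failure(err) =>
    println(s"transform failed: ${err.getMessage}")
}
```

This style avoids throwing from `.get` when, for example, the frame's schema does not match what the pipeline expects.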

    That’s it! This is a very simple example. Most likely you will not be manually constructing Spark ML pipelines as we have done here, but rather you will be using estimators and pipelines together to train on your data and produce useful models. For a more advanced example, see our .