Basic Demo
In this section we will programmatically create a simple Spark ML pipeline, then export it to an MLeap Bundle. Our pipeline is very simple: it performs string indexing on a categorical feature, then runs the result through a binarizer to force the result to a 1 or 0. This pipeline has no real-world purpose, but illustrates how easy it is to create MLeap Bundles from Spark ML pipelines.
NOTE: right-click and "Save As…"; Gitbook prevents directly clicking on the link.
Import an MLeap Bundle
import ml.combust.bundle.BundleFile
import ml.combust.mleap.runtime.MleapSupport._
import resource._

// load the Spark pipeline we saved in the previous section
val bundle = (for(bundleFile <- managed(BundleFile("jar:file:/tmp/simple-spark-pipeline.zip"))) yield {
  bundleFile.loadMleapBundle().get
}).opt.get
import ml.combust.mleap.core.types._
import ml.combust.mleap.runtime.frame.{DefaultLeapFrame, Row}

// MLeap makes extensive use of monadic types like Try
val schema = StructType(StructField("test_string", ScalarType.String),
  StructField("test_double", ScalarType.Double)).get
val data = Seq(Row("hello", 0.6), Row("MLeap", 0.2))
val frame = DefaultLeapFrame(schema, data)
// transform the dataframe using our pipeline
val mleapPipeline = bundle.root
val frame2 = mleapPipeline.transform(frame).get
val data2 = frame2.dataset
// get data from the transformed rows and make some assertions
assert(data2(0).getDouble(2) == 1.0) // string indexer output
assert(data2(0).getDouble(3) == 1.0) // binarizer output
// the second row
assert(data2(1).getDouble(2) == 2.0)
assert(data2(1).getDouble(3) == 0.0)
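Notice that each step above unwraps a monadic result with `.get`. A minimal, library-free sketch of the same Try-based pattern is shown below; the `loadBundle` and `transform` functions here are hypothetical stand-ins, not MLeap APIs, and exist only to illustrate chaining `Try` values with a for-comprehension instead of calling `.get` at every step:

```scala
import scala.util.{Try, Success, Failure}

// hypothetical stand-ins for Try-returning APIs like loadMleapBundle and transform
def loadBundle(path: String): Try[String] =
  if (path.endsWith(".zip")) Success(s"bundle@$path")
  else Failure(new IllegalArgumentException(s"not a bundle: $path"))

def transform(bundle: String, frame: String): Try[String] =
  Success(s"$frame transformed by $bundle")

// chain the Try values; a Failure at any step short-circuits the rest
val result = for {
  bundle <- loadBundle("/tmp/simple-spark-pipeline.zip")
  output <- transform(bundle, "frame")
} yield output

result match {
  case Success(out) => println(out)
  case Failure(err) => println(s"failed: ${err.getMessage}")
}
```

The for-comprehension style surfaces the first failure as a `Failure` value you can inspect, whereas `.get` throws the underlying exception immediately; the `.get` style used in this demo keeps the example short.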
That’s it! This is a very simple example. Most likely you will not be manually constructing Spark ML pipelines as we have done here, but rather you will be using estimators and pipelines together to train on your data and produce useful models. For a more advanced example, see our .