MLeap Frequently Asked Questions

    For documentation on writing custom transformers, see the Custom Transformers page.

    What is MLeap Runtime’s Inference Performance?

    MLeap is optimized to execute ML Pipelines in microseconds (a microsecond is 1/1000 of a millisecond; we get asked to clarify this).

    Actual execution speed depends on how many nodes are in your pipeline, but we standardize benchmarking on our AirBnb pipeline and test it against the same pipeline run through the SparkContext with a DataFrame. The two sets of benchmarks share the same feature pipeline, composed of vector assemblers, standard scalers, string indexers, and one-hot encoders, but at the end execute:

    • Linear Regression: 6.2 microseconds (0.0062 milliseconds) vs 106 milliseconds with Spark LocalRelation
    • Random Forest: 6.8 microseconds (0.0068 milliseconds) vs 101 milliseconds with Spark LocalRelation
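To make the unit conversion concrete, the figures above can be put on a common scale. The following is a minimal sketch that only restates the numbers quoted in this FAQ; the names `benchmarks`, `US_PER_MS`, and `speedup` are illustrative and are not part of MLeap.

```python
# Figures quoted in this FAQ: MLeap in microseconds, Spark LocalRelation in milliseconds.
benchmarks = {
    "Linear Regression": {"mleap_us": 6.2, "spark_ms": 106.0},
    "Random Forest": {"mleap_us": 6.8, "spark_ms": 101.0},
}

US_PER_MS = 1000.0  # 1 millisecond = 1000 microseconds


def speedup(mleap_us: float, spark_ms: float) -> float:
    """Return the Spark time divided by the MLeap time, both in microseconds."""
    return (spark_ms * US_PER_MS) / mleap_us


for model, t in benchmarks.items():
    mleap_ms = t["mleap_us"] / US_PER_MS  # convert microseconds to milliseconds
    ratio = speedup(t["mleap_us"], t["spark_ms"])
    print(f"{model}: {mleap_ms:.4f} ms vs {t['spark_ms']:.0f} ms (about {ratio:,.0f}x)")
```

On these figures the gap is roughly four orders of magnitude for both models.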

    Run Our Benchmarks

    To run our benchmarks, or to see how to test your own, see our project.

    More benchmarks can be found in the MLeap Benchmarks README.
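If you want a rough feel for per-call latency before setting up the project's benchmarks, a generic timing loop is enough. The sketch below is not the MLeap benchmark harness: `benchmark` and `score_stub` are hypothetical names, and `score_stub` is a stand-in for whatever call executes your own pipeline on one input row.

```python
import statistics
import time


def benchmark(fn, warmup=1_000, iterations=10_000):
    """Time repeated calls to fn and return per-call latency stats in microseconds."""
    for _ in range(warmup):  # let caches and any lazy initialization settle
        fn()
    samples_us = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        # perf_counter_ns returns nanoseconds; divide by 1000 for microseconds
        samples_us.append((time.perf_counter_ns() - start) / 1_000)
    samples_us.sort()
    return {
        "mean_us": statistics.fmean(samples_us),
        "median_us": statistics.median(samples_us),
        "p99_us": samples_us[int(0.99 * (len(samples_us) - 1))],
    }


def score_stub():
    """Hypothetical stand-in for a pipeline call; replace with your own transform."""
    return sum(w * x for w, x in zip((0.5, -1.2, 3.0), (1.0, 2.0, 3.0)))


if __name__ == "__main__":
    print(benchmark(score_stub))
```

Reporting the median and a high percentile alongside the mean helps, because at microsecond scale a few slow calls can dominate the average.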

    MLeap serialization is built with the following goals and requirements in mind:

    1. It should be easy for developers to add custom transformers in Scala and Java (we are adding Python and C support as well)
    2. Serialization should be optimized for ML Transformers and Pipelines
    3. Serialization should be accessible for all environments and platforms, including low-level languages like C, C++ and Rust
    4. It should provide a common serialization framework for Spark, scikit-learn, and TensorFlow transformers (e.g., a standard scaler executes the same way on any framework)

    Is MLeap Ready for Production?

    The MLeap 0.9.0 release provides a stable serialization format and runtime API for ML Pipelines. Backwards compatibility will officially be guaranteed in version 1.0.0, but we do not foresee any major structural changes going forward.

    APIs relying on the SparkContext can be optimized to process queries in ~100ms, which is often too slow for enterprise needs. For example, marketing platforms need sub-5-millisecond response times for many requests. MLeap offers execution of complex pipelines with sub-millisecond performance. MLeap’s performance is attributable to supporting technologies like the Scala Breeze library for linear algebra.

    Is Spark MLlib Supported?

    Spark ML Pipelines already support many of the same transformers and models that are part of MLlib. In addition, we offer a wrapper around MLlib SupportVectorMachine in our mleap-spark-extension module. If you find that something available in MLlib is missing from Spark ML, please let us know or contribute your own wrapper to MLeap.

    How Can I Contribute?

    • Contribute an Estimator/Transformer from Spark or your own custom transformer
    • Write documentation
    • Write a tutorial/walkthrough for an interesting ML problem
    • Talk with us on Gitter

    You can also reach out to us directly at hollin@combust.ml and mikhail@combust.ml.