Running the Apache Beam samples
You’ll already have Java installed to run Apache Hop.
Even though Hop runs fine on Java versions higher than 8, both Apache Hop and Apache Beam are built with Java 8.
Make sure you’re using a Java 8 runtime to avoid any potential issues, either by setting Java 8 as your default JRE or through an environment variable (for example JAVA_HOME).
Double-check your Java version with the java -version command. Your output should look similar to the one below.
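As a rough sketch of that check, the snippet below reads the first line of the java -version banner and tests whether it reports a 1.8 version, which is how a Java 8 runtime identifies itself. The is_java8 helper name is made up for this illustration, and the exact banner text differs between JDK vendors.

```shell
# Return success when a `java -version` banner reports a Java 8 runtime.
# (is_java8 is a hypothetical helper name, not part of Hop or the JDK.)
is_java8() {
  # Java 8 reports itself as version "1.8.0_<update>"
  case "$1" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# `java -version` writes its banner to stderr, hence the 2>&1 redirect.
if command -v java >/dev/null 2>&1; then
  banner="$(java -version 2>&1 | head -n 1)"
  if is_java8 "$banner"; then
    echo "Java 8 detected: $banner"
  else
    echo "Not Java 8: $banner"
  fi
else
  echo "java not found on PATH"
fi
```

If the script reports a different version, switch your default JRE or adjust your environment before running the Beam samples.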
The Hop samples project comes with a number of sample pipelines for Apache Beam. A default Hop installation already includes the samples project. If your Hop installation doesn’t come with this project, create a new project and point its Home folder to <HOP>/config/projects/samples.
The samples project contains the following pipeline run configurations:
Dataflow: the Apache Beam run configuration for Google Cloud Dataflow.
Direct: the direct runner Apache Beam run configuration. The Direct Runner executes pipelines on your machine and is designed to validate that pipelines adhere to the Apache Beam model as closely as possible. Instead of focusing on efficient pipeline execution, the Direct Runner performs additional checks to ensure that users do not rely on semantics that are not guaranteed by the model.
Spark: the Apache Beam run configuration for Apache Spark.
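To illustrate how a run configuration name is used, here is a minimal sketch that prints a hop-run command line for one of the configurations listed above. It assumes HOP_HOME points at your Hop installation and that the project is registered under the name samples; the pipeline path is a placeholder, so substitute a Beam pipeline that exists in your samples project. The hop_run_cmd helper is hypothetical and only builds the command, it does not execute it.

```shell
# Assumptions: HOP_HOME is your Hop installation folder and PIPELINE is a
# placeholder path; replace it with a real Beam pipeline from your project.
HOP_HOME="${HOP_HOME:-/opt/hop}"
PIPELINE="${PIPELINE:-beam/pipelines/input-process-output.hpl}"

# Print the hop-run command line for the given run configuration name.
# (hop_run_cmd is a hypothetical helper used only for this illustration.)
hop_run_cmd() {
  printf '%s/hop-run.sh --project samples --file %s --runconfig %s\n' \
    "$HOP_HOME" "$PIPELINE" "$1"
}

# Show the command for the local Direct runner.
hop_run_cmd Direct
```

Starting with Direct is a reasonable choice because it runs on your own machine and needs no cluster; once the printed command looks right for your setup, run it from a terminal.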
Apache Beam requires a so-called fat jar that bundles all required Java classes and their dependencies into a single jar file.
Save this file in a convenient location under a recognizable file name. Either store it outside of your project folder or add it to your .gitignore. You don’t want to accidentally add hundreds of MB to your git repository.
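If you keep the jar inside the project folder, a small helper like the one below can add it to .gitignore without creating duplicate entries. This is only a sketch: ignore_once is a made-up function name, and any file name you pass to it (such as hop-fat.jar in the usage note) is a placeholder for whatever name you chose.

```shell
# Append a pattern to a .gitignore file only if it is not already listed.
# (ignore_once is a hypothetical helper name used for this illustration.)
ignore_once() {
  pattern="$1"
  file="${2:-.gitignore}"
  touch "$file"
  # -x matches the whole line, -F treats the pattern as a literal string.
  if grep -qxF -- "$pattern" "$file"; then
    return 0
  fi
  # Make sure the existing content ends with a newline before appending.
  if [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$pattern" >> "$file"
}
```

From the root of your project you could then call ignore_once 'hop-fat.jar', and calling it a second time leaves the file unchanged.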
You’ll need to export your project’s metadata to JSON so you can pass it to a cluster submission command such as flink run.
Export your project metadata through Tools → Export metadata to JSON.
Save this file in a convenient location under a recognizable file name. Either store it outside of your project folder or add it to your .gitignore. Your project’s metadata folder should already be in version control, so you don’t want to add this full point-in-time metadata export as well.