Hop Conf - The Hop command line configuration tool
The `hop-conf` script offers many options to edit environment definitions.
Creating an environment
$ sh hop-conf.sh \
--environment-create \
--environment hop2 \
--environment-project hop2 \
--environment-purpose=Development \
--environment-config-files=/home/user/projects/hop2-conf.json
Creating environment 'hop2'
Environment 'hop2' was created in Hop configuration file <path-to-hop>/config/hop-config.json
2021/02/01 16:37:02 - General - ERROR: Configuration file '/home/user/projects/hop2-conf.json' does not exist to read variables from.
Created empty environment configuration file : /home/user/projects/hop2-conf.json
hop2
Purpose: Development
Configuration files:
Project name: hop2
As you can see from the log, an empty file was created to set variables in:
{ }
Setting variables in an environment
This command adds variables to the environment configuration file:
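Below is a sketch of that command, assuming hop-conf's `--config-file` and `--config-file-set-variables` options (run hop-conf without options to verify the exact flags):
$ sh hop-conf.sh \
--config-file /home/user/projects/hop2-conf.json \
--config-file-set-variables DB_HOSTNAME=localhost,DB_PASSWORD=abcd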
If we look at the file `hop2-conf.json` we’ll see that the variables were added:
{
"variables" : [ {
"name" : "DB_HOSTNAME",
"value" : "localhost",
"description" : ""
}, {
"name" : "DB_PASSWORD",
"value" : "abcd",
"description" : ""
} ]
}
Please note that you can add descriptions for the variables as well with the corresponding option. Run hop-conf without options to see all the possibilities.
Deleting an environment
The following deletes an environment from the Hop configuration file:
$ sh hop-conf.sh --environment-delete --environment hop2
Lifecycle environment 'hop2' was deleted from Hop configuration file <path-to-hop>/config/hop-config.json
There are various options to configure the behavior of the `Projects` plugin itself. In the Hop configuration file `hop-config.json` we can find the following options:
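A sketch of what that section can look like; the key names here are assumptions based on the Projects plugin and may differ per Hop version, so verify against your own `hop-config.json`:
{
  "projectsConfig" : {
    "projectsEnabled" : true,
    "projectMandatory" : true,
    "environmentMandatory" : false,
    "defaultProject" : "default",
    "standardParentProject" : "default"
  }
}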
You can specify an environment or a project when executing a pipeline or a workflow. By doing so you automatically configure metadata and variables without too much fuss.
The easiest example is shown by executing the “complex” pipeline from the Apache Beam examples:
$ sh hop-run.sh --project samples --file 'beam/pipelines/complex.hpl' --runconfig Direct
2021/02/01 16:52:15 - HopRun - Enabling project 'samples'
2021/02/01 16:52:25 - HopRun - Relative path filename specified: config/projects/samples/beam/pipelines/complex.hpl
2021/02/01 16:52:26 - General - Created Apache Beam pipeline with name 'complex'
2021/02/01 16:52:27 - General - Handled transform (INPUT) : Customer data
2021/02/01 16:52:27 - General - Handled transform (INPUT) : State data
2021/02/01 16:52:27 - General - Handled Group By (STEP) : countPerState, gets data from 1 previous transform(s)
2021/02/01 16:52:27 - General - Handled transform (STEP) : uppercase state, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Handled Merge Join (STEP) : Merge join
2021/02/01 16:52:27 - General - Handled transform (STEP) : Lookup count per state, gets data from 1 previous transform(s), targets=0, infos=1
2021/02/01 16:52:27 - General - Handled transform (STEP) : name<n, gets data from 1 previous transform(s), targets=2, infos=0
2021/02/01 16:52:27 - General - Transform Label: N-Z reading from previous transform targeting this one using : name<n - TARGET - Label: N-Z
2021/02/01 16:52:27 - General - Handled transform (STEP) : Label: N-Z, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Transform Label: A-M reading from previous transform targeting this one using : name<n - TARGET - Label: A-M
2021/02/01 16:52:27 - General - Transform CA reading from previous transform targeting this one using : Switch / case - TARGET - CA
2021/02/01 16:52:27 - General - Handled transform (STEP) : CA, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Transform NY reading from previous transform targeting this one using : Switch / case - TARGET - NY
2021/02/01 16:52:27 - General - Handled transform (STEP) : NY, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Transform FL reading from previous transform targeting this one using : Switch / case - TARGET - FL
2021/02/01 16:52:27 - General - Handled transform (STEP) : FL, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Transform Default reading from previous transform targeting this one using : Switch / case - TARGET - Default
2021/02/01 16:52:27 - General - Handled transform (STEP) : Default, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Handled transform (STEP) : Collect, gets data from 4 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Handled transform (OUTPUT) : complex, gets data from Collect
2021/02/01 16:52:27 - General - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Direct'
2021/02/01 16:52:34 - General - Beam pipeline execution has finished.
To execute an Apache Beam pipeline, a lot of information and metadata is needed. Let’s dive into a few interesting tidbits:
* By referencing the `samples` project, Hop knows where the project is located (`config/projects/samples`).
* Since we know the location of the project, we can specify pipelines and workflows with a relative path.
* The project knows where its metadata is stored (`config/projects/samples/metadata`), so it knows where to find the `Direct` pipeline run configuration (`config/projects/samples/metadata/pipeline-run-configuration/Direct.json`).
* This run configuration defines its own pipeline engine-specific variables, in this case the output folder: `DATA_OUTPUT=${PROJECT_HOME}/beam/output/`
* The output of the samples is thus written to `config/projects/samples/beam/output`.
To reference an environment you can execute using `-e` or `--environment`. The only difference is that you’ll have a number of extra environment variables set while executing.
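For example, a run that references the `hop2` environment created earlier might look like this (a sketch; the pipeline path is hypothetical):
$ sh hop-run.sh --environment hop2 --file 'pipelines/my-pipeline.hpl' --runconfig Direct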
Hop Conf can be used to configure your AWS, Azure and Google Cloud (Cloud Storage and Drive) accounts with Hop through the options described below.
Azure
Set the account, the block increment size for new files and your Azure key:
-aza, --azure-account=<account>
The account to use for the Azure VFS
-azi, --azure-block-increment=<blockIncrement>
The block increment size for new files on Azure, multiples of 512 only.
-azk, --azure-key=<key>
The key to use for the Azure VFS
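For example, the three Azure options above can be combined in a single call (the account name, key and increment values below are placeholders):
$ sh hop-conf.sh \
--azure-account=mystorageaccount \
--azure-key=MY_STORAGE_ACCOUNT_KEY \
--azure-block-increment=4096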
Google Cloud Storage
Google Drive
Set the path to your Google Drive credentials JSON file or Google Drive tokens folder.
-gdc, --google-drive-credentials-file=<credentialsFile>
Configure the path to a Google Drive credentials JSON file
-gdt, --google-drive-tokens-folder=<tokensFolder>
Configure the path to a Google Drive tokens folder
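For example (a sketch; the credentials file location is hypothetical):
$ sh hop-conf.sh --google-drive-credentials-file=/home/user/google-drive-credentials.json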