Hop Conf - The Hop command line configuration tool
The `hop-conf` script offers many options to edit environment definitions.
Creating an environment
$ sh hop-conf.sh \
--environment-create \
--environment hop2 \
--environment-project hop2 \
--environment-purpose=Development \
--environment-config-files=/home/user/projects/hop2-conf.json
Creating environment 'hop2'
Environment 'hop2' was created in Hop configuration file <path-to-hop>/config/hop-config.json
2021/02/01 16:37:02 - General - ERROR: Configuration file '/home/user/projects/hop2-conf.json' does not exist to read variables from.
Created empty environment configuration file : /home/user/projects/hop2-conf.json
hop2
Purpose: Development
Configuration files:
Project name: hop2
As you can see from the log, an empty file was created to set variables in:
{ }
Setting variables in an environment
This command adds variables to the environment configuration file:
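Below is a sketch of that command, assuming hop-conf's `--config-file` and `--config-file-set-variables` options (run hop-conf without options to verify the exact flags):
$ sh hop-conf.sh \
--config-file /home/user/projects/hop2-conf.json \
--config-file-set-variables DB_HOSTNAME=localhost,DB_PASSWORD=abcd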
If we look at the file `hop2-conf.json` we’ll see that the variables were added:
{
"variables" : [ {
"name" : "DB_HOSTNAME",
"value" : "localhost",
"description" : ""
}, {
"name" : "DB_PASSWORD",
"value" : "abcd",
"description" : ""
} ]
}
Please note that you can add descriptions for the variables as well with the corresponding option. Run hop-conf without options to see all the possibilities.
Deleting an environment
The following deletes an environment from the Hop configuration file:
$ sh hop-conf.sh --environment-delete --environment hop2
Lifecycle environment 'hop2' was deleted from Hop configuration file <path-to-hop>/config/hop-config.json
There are various options to configure the behavior of the `Projects` plugin itself. In the Hop configuration file `hop-config.json` we can find the following options:
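A sketch of what that section can look like; the key names here are assumptions based on the Projects plugin and may differ per Hop version, so verify against your own `hop-config.json`:
{
  "projectsConfig" : {
    "projectsEnabled" : true,
    "projectMandatory" : true,
    "environmentMandatory" : false,
    "defaultProject" : "default",
    "standardParentProject" : "default"
  }
}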
You can specify an environment or a project when executing a pipeline or a workflow. By doing so you automatically configure metadata and variables without too much fuss.
The easiest example is shown by executing the “complex” pipeline from the Apache Beam examples:
$ sh hop-run.sh --project samples --file 'beam/pipelines/complex.hpl' --runconfig Direct
2021/02/01 16:52:15 - HopRun - Enabling project 'samples'
2021/02/01 16:52:25 - HopRun - Relative path filename specified: config/projects/samples/beam/pipelines/complex.hpl
2021/02/01 16:52:26 - General - Created Apache Beam pipeline with name 'complex'
2021/02/01 16:52:27 - General - Handled transform (INPUT) : Customer data
2021/02/01 16:52:27 - General - Handled transform (INPUT) : State data
2021/02/01 16:52:27 - General - Handled Group By (STEP) : countPerState, gets data from 1 previous transform(s)
2021/02/01 16:52:27 - General - Handled transform (STEP) : uppercase state, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Handled Merge Join (STEP) : Merge join
2021/02/01 16:52:27 - General - Handled transform (STEP) : Lookup count per state, gets data from 1 previous transform(s), targets=0, infos=1
2021/02/01 16:52:27 - General - Handled transform (STEP) : name<n, gets data from 1 previous transform(s), targets=2, infos=0
2021/02/01 16:52:27 - General - Transform Label: N-Z reading from previous transform targeting this one using : name<n - TARGET - Label: N-Z
2021/02/01 16:52:27 - General - Handled transform (STEP) : Label: N-Z, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Transform Label: A-M reading from previous transform targeting this one using : name<n - TARGET - Label: A-M
2021/02/01 16:52:27 - General - Transform CA reading from previous transform targeting this one using : Switch / case - TARGET - CA
2021/02/01 16:52:27 - General - Handled transform (STEP) : CA, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Transform NY reading from previous transform targeting this one using : Switch / case - TARGET - NY
2021/02/01 16:52:27 - General - Handled transform (STEP) : NY, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Transform FL reading from previous transform targeting this one using : Switch / case - TARGET - FL
2021/02/01 16:52:27 - General - Handled transform (STEP) : FL, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Transform Default reading from previous transform targeting this one using : Switch / case - TARGET - Default
2021/02/01 16:52:27 - General - Handled transform (STEP) : Default, gets data from 1 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Handled transform (STEP) : Collect, gets data from 4 previous transform(s), targets=0, infos=0
2021/02/01 16:52:27 - General - Handled transform (OUTPUT) : complex, gets data from Collect
2021/02/01 16:52:27 - General - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Direct'
2021/02/01 16:52:34 - General - Beam pipeline execution has finished.
To execute an Apache Beam pipeline, a lot of information and metadata is needed. Let’s dive into a few interesting tidbits:
* By referencing the `samples` project, Hop knows where the project is located (`config/projects/samples`).
* Since we know the location of the project, we can specify pipelines and workflows with a relative path.
* The project knows where its metadata is stored (`config/projects/samples/metadata`), so it knows where to find the `Direct` pipeline run configuration (`config/projects/samples/metadata/pipeline-run-configuration/Direct.json`).
* This run configuration defines its own pipeline engine-specific variables, in this case the output folder: `DATA_OUTPUT=${PROJECT_HOME}/beam/output/`
* The output of the samples is thus written to `config/projects/samples/beam/output`.
To reference an environment you can execute using `-e` or `--environment`. The only difference is that you’ll have a number of extra environment variables set while executing.
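For example, a run that references the `hop2` environment created earlier might look like this (a sketch; the pipeline path is hypothetical):
$ sh hop-run.sh --environment hop2 --file 'pipelines/my-pipeline.hpl' --runconfig Direct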
Hop Conf can be used to configure your AWS, Azure and Google Cloud (Cloud Storage and Drive) accounts with Hop through the options described below.
Azure
Set the account, the block increment size for new files and your Azure key:
-aza, --azure-account=<account>
The account to use for the Azure VFS
-azi, --azure-block-increment=<blockIncrement>
The block increment size for new files on Azure, multiples of 512 only.
-azk, --azure-key=<key>
The key to use for the Azure VFS
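For example, the three Azure options above can be combined in a single call (the account name, key and increment values below are placeholders):
$ sh hop-conf.sh \
--azure-account=mystorageaccount \
--azure-key=MY_STORAGE_ACCOUNT_KEY \
--azure-block-increment=4096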
Google Cloud Storage
Google Drive
Set the path to your Google Drive credentials JSON file or Google Drive tokens folder.
-gdc, --google-drive-credentials-file=<credentialsFile>
Configure the path to a Google Drive credentials JSON file
-gdt, --google-drive-tokens-folder=<tokensFolder>
Configure the path to a Google Drive tokens folder
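For example (a sketch; the credentials file location is hypothetical):
$ sh hop-conf.sh --google-drive-credentials-file=/home/user/google-drive-credentials.json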