Hop Conf - The Hop command line configuration tool

    The `hop-conf` script offers many options to edit environment definitions. Run it without options to list everything that is available.

    Creating an environment

    $ sh hop-conf.sh \
      --environment-create \
      --environment hop2 \
      --environment-project hop2 \
      --environment-purpose=Development \
      --environment-config-files=/home/user/projects/hop2-conf.json
    Creating environment 'hop2'
    Environment 'hop2' was created in Hop configuration file <path-to-hop>/config/hop-config.json
    2021/02/01 16:37:02 - General - ERROR: Configuration file '/home/user/projects/hop2-conf.json' does not exist to read variables from.
    Created empty environment configuration file : /home/user/projects/hop2-conf.json
    hop2
      Purpose: Development
      Configuration files:
      Project name: hop2

    As you can see from the log, an empty file was created to set variables in:

    { }

    Setting variables in an environment

    This command adds variables to the environment configuration file:
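The command itself is not reproduced here; a sketch of what it typically looks like is shown below. The `--config-file` and `--config-file-set-variables` flag names are assumptions, so verify them by running `hop-conf` without options:

```shell
# Hypothetical invocation: the --config-file and --config-file-set-variables
# flag names are assumptions; run hop-conf.sh without options to verify them.
$ sh hop-conf.sh \
  --config-file /home/user/projects/hop2-conf.json \
  --config-file-set-variables DB_HOSTNAME=localhost,DB_PASSWORD=abcd
```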

    If we look at the file `hop2-conf.json` we’ll see that the variables were added:

    {
      "variables" : [ {
        "name" : "DB_HOSTNAME",
        "value" : "localhost",
        "description" : ""
      }, {
        "name" : "DB_PASSWORD",
        "value" : "abcd",
        "description" : ""
      } ]
    }

    Please note that you can add descriptions for the variables as well with the corresponding option. Run `hop-conf` without options to see all the possibilities.

    Deleting an environment

    The following command deletes an environment from the Hop configuration file:

    $ sh hop-conf.sh --environment-delete --environment hop2
    Lifecycle environment 'hop2' was deleted from Hop configuration file <path-to-hop>/config/hop-config.json

    There are various options to configure the behavior of the `Projects` plugin itself; these can be found in the Hop configuration file `hop-config.json`.

    You can specify an environment or a project when executing a pipeline or a workflow. By doing so you automatically configure metadata and variables without too much fuss.

    The easiest example is running the “complex” pipeline from the Apache Beam examples:

    $ sh hop-run.sh --project samples --file 'beam/pipelines/complex.hpl' --runconfig Direct
    2021/02/01 16:52:15 - HopRun - Enabling project 'samples'
    2021/02/01 16:52:25 - HopRun - Relative path filename specified: config/projects/samples/beam/pipelines/complex.hpl
    2021/02/01 16:52:26 - General - Created Apache Beam pipeline with name 'complex'
    2021/02/01 16:52:27 - General - Handled transform (INPUT) : Customer data
    2021/02/01 16:52:27 - General - Handled transform (INPUT) : State data
    2021/02/01 16:52:27 - General - Handled Group By (STEP) : countPerState, gets data from 1 previous transform(s)
    2021/02/01 16:52:27 - General - Handled transform (STEP) : uppercase state, gets data from 1 previous transform(s), targets=0, infos=0
    2021/02/01 16:52:27 - General - Handled Merge Join (STEP) : Merge join
    2021/02/01 16:52:27 - General - Handled transform (STEP) : Lookup count per state, gets data from 1 previous transform(s), targets=0, infos=1
    2021/02/01 16:52:27 - General - Handled transform (STEP) : name<n, gets data from 1 previous transform(s), targets=2, infos=0
    2021/02/01 16:52:27 - General - Transform Label: N-Z reading from previous transform targeting this one using : name<n - TARGET - Label: N-Z
    2021/02/01 16:52:27 - General - Handled transform (STEP) : Label: N-Z, gets data from 1 previous transform(s), targets=0, infos=0
    2021/02/01 16:52:27 - General - Transform Label: A-M reading from previous transform targeting this one using : name<n - TARGET - Label: A-M
    2021/02/01 16:52:27 - General - Transform CA reading from previous transform targeting this one using : Switch / case - TARGET - CA
    2021/02/01 16:52:27 - General - Handled transform (STEP) : CA, gets data from 1 previous transform(s), targets=0, infos=0
    2021/02/01 16:52:27 - General - Transform NY reading from previous transform targeting this one using : Switch / case - TARGET - NY
    2021/02/01 16:52:27 - General - Handled transform (STEP) : NY, gets data from 1 previous transform(s), targets=0, infos=0
    2021/02/01 16:52:27 - General - Transform FL reading from previous transform targeting this one using : Switch / case - TARGET - FL
    2021/02/01 16:52:27 - General - Handled transform (STEP) : FL, gets data from 1 previous transform(s), targets=0, infos=0
    2021/02/01 16:52:27 - General - Transform Default reading from previous transform targeting this one using : Switch / case - TARGET - Default
    2021/02/01 16:52:27 - General - Handled transform (STEP) : Default, gets data from 1 previous transform(s), targets=0, infos=0
    2021/02/01 16:52:27 - General - Handled transform (STEP) : Collect, gets data from 4 previous transform(s), targets=0, infos=0
    2021/02/01 16:52:27 - General - Handled transform (OUTPUT) : complex, gets data from Collect
    2021/02/01 16:52:27 - General - Executing this pipeline using the Beam Pipeline Engine with run configuration 'Direct'
    2021/02/01 16:52:34 - General - Beam pipeline execution has finished.

    To execute an Apache Beam pipeline, a lot of information and metadata is needed. Let’s look at a few interesting details:

    • By referencing the `samples` project, Hop knows where the project is located (`config/projects/samples`)

    • Since the location of the project is known, pipelines and workflows can be specified with a relative path

    • The project knows where its metadata is stored (`config/projects/samples/metadata`), so it knows where to find the `Direct` pipeline run configuration (`config/projects/samples/metadata/pipeline-run-configuration/Direct.json`)

    • This run configuration defines its own pipeline engine specific variables, in this case the output folder: `DATA_OUTPUT=${PROJECT_HOME}/beam/output/`

    • The output of the samples is therefore written to `config/projects/samples/beam/output`
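The variable resolution described above can be sketched in plain shell. The assignments are illustrative: in reality Hop sets `PROJECT_HOME` when it enables the project, and the run configuration defines `DATA_OUTPUT`:

```shell
# PROJECT_HOME is set by Hop when it enables the 'samples' project.
PROJECT_HOME="config/projects/samples"
# The Direct run configuration defines DATA_OUTPUT relative to the project home.
DATA_OUTPUT="${PROJECT_HOME}/beam/output/"
echo "${DATA_OUTPUT}"   # config/projects/samples/beam/output/
```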

    To reference an environment you can execute using `-e` or `--environment`. The only difference is that a number of extra environment variables will be set during execution.
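For example, running a pipeline against the `hop2` environment created earlier might look like the sketch below. It assumes the environment's project contains a pipeline at that relative path:

```shell
# Sketch: reference an environment instead of a project; 'hop2' is the
# environment created earlier, and the pipeline path is illustrative.
$ sh hop-run.sh --environment hop2 --file 'beam/pipelines/complex.hpl' --runconfig Direct
```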

    Hop Conf can be used to configure your AWS, Azure and Google Cloud (Cloud Storage and Drive) accounts with Hop through VFS.

    AWS

    N/A

    Azure

    Set the account, the block increment size for new files and your Azure key:

    -aza, --azure-account=<account>
          The account to use for the Azure VFS
    -azi, --azure-block-increment=<blockIncrement>
          The block increment size for new files on Azure,
          multiples of 512 only.
    -azk, --azure-key=<key>
          The key to use for the Azure VFS
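Putting the three Azure options together, a configuration call could look like the sketch below. The account name and key are placeholders, and 4096 is simply one valid multiple of 512:

```shell
# Placeholder account name and key; the block increment must be a
# multiple of 512 (4096 is used here as an example).
$ sh hop-conf.sh \
  --azure-account=myaccount \
  --azure-block-increment=4096 \
  --azure-key=<your-azure-key>
```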

    Google Cloud Storage

    Google Drive

    Set the path to your Google Drive credentials JSON file or Google Drive tokens folder.

    -gdc, --google-drive-credentials-file=<credentialsFile>
          Configure the path to a Google Drive credentials JSON
          file
          Configure the path to a Google Drive tokens folder
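A sketch of pointing Hop at a Google Drive credentials file follows. The path is a placeholder; the flag for the tokens folder is not shown above, so check `hop-conf` itself for its name:

```shell
# Placeholder path to a Google Drive credentials JSON file.
$ sh hop-conf.sh \
  --google-drive-credentials-file=/home/user/credentials.json
```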