Python Data Science Module Package

    For information about the Greenplum Database PL/Python Language, see Greenplum PL/Python Language Extension.

    Modules provided in the Python Data Science package include:

    Before you install the Python Data Science Module package, make sure that Greenplum Database is running, that you have sourced greenplum_path.sh, and that the $MASTER_DATA_DIRECTORY and $GPHOME environment variables are set.
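The environment part of these prerequisites can be checked from the shell before going further. The following is a minimal sketch, not part of the Greenplum tooling: `check_gp_prereqs` is a hypothetical helper name, and it only tests that the two variables are set. It does not confirm that the database is running; the `gpstate` utility can be run separately for that.

```shell
# Hypothetical helper: verify the environment variables named above are set.
# It does NOT check whether Greenplum Database itself is running.
check_gp_prereqs() {
    rc=0
    if [ -z "${GPHOME:-}" ]; then
        echo "GPHOME is not set; source greenplum_path.sh first" >&2
        rc=1
    fi
    if [ -z "${MASTER_DATA_DIRECTORY:-}" ]; then
        echo "MASTER_DATA_DIRECTORY is not set" >&2
        rc=1
    fi
    return "$rc"
}
```

Calling `check_gp_prereqs` returns a non-zero status and prints a message for each missing variable, so it can guard the remaining steps in a script.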

    1. Locate the Python Data Science module package that you built or downloaded.

      The file name format of the package is DataSciencePython-<version>-rhel<N>-x86_64.gppkg.

    2. Copy the package to the Greenplum Database master host.

    3. Follow the instructions in Verifying the Greenplum Database Software Download to verify the integrity of the Greenplum Procedural Languages Python Data Science Package software.

    4. Restart Greenplum Database. You must re-source greenplum_path.sh before restarting your Greenplum cluster.
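The install and restart sequence might look like the sketch below. It assumes a Greenplum release in which `gppkg -i` installs a package file, and that greenplum_path.sh lives under `$GPHOME`; `install_ds_package` is a hypothetical wrapper name used only for illustration. Note that `gpstop -r` prompts for confirmation when run interactively.

```shell
# Hypothetical wrapper around the install-then-restart sequence.
# Argument: path to the DataSciencePython .gppkg file on the master host.
install_ds_package() {
    pkg=$1
    if [ ! -f "$pkg" ]; then
        echo "package file not found: $pkg" >&2
        return 1
    fi
    # Install the package (assumes the gppkg -i option is available).
    gppkg -i "$pkg" || return 1
    # Re-source the environment file before restarting.
    . "$GPHOME/greenplum_path.sh" || return 1
    # Restart Greenplum Database.
    gpstop -r
}
```

A call would pass the real file name, for example `install_ds_package /home/gpadmin/<package-file>.gppkg`, with the placeholder replaced by the package you copied to the master host.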

    The Greenplum Database Python Data Science Modules are installed in the following directory:

      $GPHOME/ext/DataSciencePython/lib/python2.7/site-packages/
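One quick way to confirm the installation is to list that directory. This is a minimal sketch; `list_ds_modules` is a hypothetical helper name, and it assumes `$GPHOME` is set in the current shell.

```shell
# Hypothetical helper: list the contents of the Data Science modules directory.
list_ds_modules() {
    dir="$GPHOME/ext/DataSciencePython/lib/python2.7/site-packages"
    if [ ! -d "$dir" ]; then
        echo "modules directory not found: $dir" >&2
        return 1
    fi
    ls "$dir"
}
```

A non-zero return status suggests that the package is not installed on this host or that `$GPHOME` points somewhere else.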

    Use the gppkg utility to uninstall the Python Data Science Module package. You must include the version number in the package name you provide to gppkg.

    To determine your Python Data Science Module package version number and remove this package:
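A minimal sketch of the query and removal follows. It assumes that `gppkg -q --all` prints one installed package name per line in the form DataSciencePython-<version>, and that `gppkg -r` removes the named package; `remove_ds_package` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: look up the installed package name, then remove it.
remove_ds_package() {
    # Query all installed packages and keep the DataSciencePython entry.
    pkg=$(gppkg -q --all | grep DataSciencePython | head -n 1)
    if [ -z "$pkg" ]; then
        echo "DataSciencePython package is not installed" >&2
        return 1
    fi
    echo "Removing $pkg"
    # The name passed to gppkg -r includes the version number.
    gppkg -r "$pkg"
}
```

Running the two gppkg commands by hand works equally well; the wrapper only saves copying the version string.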

    The command removes the Python Data Science modules from your Greenplum Database cluster. It also updates the PYTHONPATH, PATH, and LD_LIBRARY_PATH environment variables in your greenplum_path.sh file to their pre-installation values.

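    After the package is removed, re-source greenplum_path.sh so that your shell picks up the restored environment variables:

      $ . /usr/local/greenplum-db/greenplum_path.sh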

    Note: When you uninstall the Python Data Science Module package from your Greenplum Database cluster, any UDFs that you have created that import Python modules installed with this package will return an error.