R Data Science Library Package

    For information about the Greenplum Database PL/R Language, see the Greenplum Database PL/R documentation.

    Libraries provided in the R Data Science package include:

    • abind

    • adabag

    • arm

    • assertthat

    • BH

    • bitops

    • car

    • caret

    • caTools

    • coda

    • colorspace

    • compHclust

    • curl

    • data.table

    • DBI

    • dichromat

    • digest

    • dplyr

    • e1071

    • flashClust

    • forecast

    • foreign

    • ggplot2

    • glmnet

    • gtable

    • gtools

    • hms

    • hybridHclust

    • igraph

    • labeling

    • lattice

    • lazyeval

    • lme4

    • lmtest

    • magrittr

    • MASS

    • Matrix

    • MCMCpack

    • minqa

    • MTS

    • munsell

    • neuralnet

    • nloptr

    • nnet

    • pbkrtest

    • plyr

    • quantreg

    • R2jags

    • R6

    • RColorBrewer

    • Rcpp

    • RcppEigen

    • reshape2

    • rjags

    • RobustRankAggreg

    • ROCR

    • rpart

    • RPostgreSQL

    • sandwich

    • scales

    • SparseM

    • stringi

    • stringr

    • survival

    • tibble

    • tseries

    • zoo

    Before you install the R Data Science Library package, make sure that Greenplum Database is running, that you have sourced greenplum_path.sh, and that the $MASTER_DATA_DIRECTORY and $GPHOME environment variables are set.
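
    The environment prerequisites above can be checked with a short script before you start. This is a minimal sketch: check_prereqs is a hypothetical helper name, not a Greenplum utility, and the sketch assumes that gppkg is on your PATH only after greenplum_path.sh has been sourced. It does not check whether the database is running.

```shell
# Pre-flight sketch: report which environment prerequisites are unmet.
# check_prereqs is a hypothetical helper name, not a Greenplum utility.
check_prereqs() {
  status=0
  for var in GPHOME MASTER_DATA_DIRECTORY; do
    # Look up the variable whose name is stored in $var.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: \$$var is not set" >&2
      status=1
    fi
  done
  # Assumption: gppkg is found on PATH once greenplum_path.sh is sourced.
  if ! command -v gppkg >/dev/null 2>&1; then
    echo "missing: gppkg not found on PATH; source greenplum_path.sh" >&2
    status=1
  fi
  return $status
}
```

    Call check_prereqs in the shell you plan to install from. A zero exit status means that both variables are set and gppkg was found.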

    1. Locate the R Data Science library package that you built or downloaded.

      The file name format of the package is DataScienceR-<version>-relhel<N>-x86_64.gppkg.

    2. Copy the package to the Greenplum Database master host.

    3. Follow the Greenplum Database software verification instructions to verify the integrity of the Greenplum Procedural Languages R Data Science Package software.

    4. Use the gppkg command to install the package.

      gppkg installs the R Data Science libraries on all nodes in your Greenplum Database cluster. The command also sets the R_LIBS_USER environment variable and updates the PATH and LD_LIBRARY_PATH environment variables in your greenplum_path.sh file.

    5. Restart Greenplum Database. You must re-source greenplum_path.sh before restarting your Greenplum cluster:

      $ source /usr/local/greenplum-db/greenplum_path.sh
      $ gpstop -r
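
    The install command in step 4 can be sketched as follows. The sketch assumes a gppkg release in which -i is the install option, and it uses the file name format from step 1 as a placeholder. The names install_datascience_r and DRY_RUN are hypothetical and introduced only for illustration. By default the helper prints the command so that you can review it before running it.

```shell
# Hypothetical helper: build the gppkg install command for a package file.
# Assumption: -i is the install option of gppkg in this Greenplum release.
install_datascience_r() {
  pkg="$1"
  case "$pkg" in
    *DataScienceR-*.gppkg) ;;  # matches the file name format from step 1
    *) echo "unexpected package file name: $pkg" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "gppkg -i $pkg"       # dry run: print the command only
  else
    gppkg -i "$pkg"            # run on the master host
  fi
}

# Placeholder name; substitute the file you copied in step 2.
install_datascience_r 'DataScienceR-<version>-relhel<N>-x86_64.gppkg'
```

    Set DRY_RUN=0 to have the helper run gppkg instead of printing the command.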

    The Greenplum Database R Data Science Modules are installed in the following directory:
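
    To confirm the location on a particular host, you can print the R_LIBS_USER variable that gppkg sets during installation. This is a minimal sketch that assumes R_LIBS_USER points at the installed libraries; show_r_libs is a hypothetical helper name.

```shell
# Sketch: print each directory listed in R_LIBS_USER, one per line.
# show_r_libs is a hypothetical helper name.
show_r_libs() {
  if [ -z "${R_LIBS_USER:-}" ]; then
    echo "R_LIBS_USER is not set; source greenplum_path.sh first" >&2
    return 1
  fi
  # R_LIBS_USER is a colon-separated list, like PATH.
  printf '%s\n' "$R_LIBS_USER" | tr ':' '\n'
}
```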

    Note: rjags libraries are installed in the $GPHOME/ext/DataScienceR/extlib/lib directory. If you want to use rjags and your $GPHOME is not /usr/local/greenplum-db, you must perform additional configuration steps to create a symbolic link named /usr/local/greenplum-db that points to $GPHOME on each node in your Greenplum Database cluster. For example:

    $ gpssh -f all_hosts -e 'ln -s $GPHOME /usr/local/greenplum-db'
    $ gpssh -f all_hosts -e 'chown -h gpadmin /usr/local/greenplum-db'
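
    Whether the link is needed depends only on the value of $GPHOME, so the decision can be sketched as a small check. needs_rjags_link is a hypothetical helper name. The check compares path strings only and does not test whether /usr/local/greenplum-db already exists on the host.

```shell
# Sketch: decide whether the rjags symbolic link described above is needed.
# needs_rjags_link is a hypothetical helper name.
needs_rjags_link() {
  default_home=/usr/local/greenplum-db
  current="${GPHOME:-}"
  # Strip one trailing slash so "/usr/local/greenplum-db/" compares equal.
  current="${current%/}"
  [ "$current" != "$default_home" ]
}

if needs_rjags_link; then
  echo "GPHOME is ${GPHOME:-unset}: create the /usr/local/greenplum-db link on every host"
else
  echo "GPHOME is /usr/local/greenplum-db: no link needed"
fi
```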

    Use the gppkg utility to uninstall the R Data Science Library package. You must include the version number in the package name you provide to gppkg.

    To determine your R Data Science Library package version number and remove this package:
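
    A minimal sketch of the two steps, assuming that gppkg -q --all lists installed packages as one name-version entry per line and that gppkg -r removes the named package. installed_datascience_r is a hypothetical helper name. The sketch prints the removal command rather than running it.

```shell
# Sketch: pull the installed package name out of a gppkg listing.
# Assumption: `gppkg -q --all` prints one <name>-<version> entry per line,
# possibly mixed with log lines. installed_datascience_r is hypothetical.
installed_datascience_r() {
  grep -o 'DataScienceR-[^[:space:]]*' | head -n 1
}

if command -v gppkg >/dev/null 2>&1; then
  pkg="$(gppkg -q --all | installed_datascience_r)"
  if [ -n "$pkg" ]; then
    # Dry run: print the removal command instead of running it.
    echo "gppkg -r $pkg"
  fi
fi
```

    Run the printed gppkg -r command on the master host to remove the package.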

    The command removes the R Data Science libraries from your Greenplum Database cluster. It also removes the R_LIBS_USER environment variable and updates the PATH and LD_LIBRARY_PATH environment variables in your greenplum_path.sh file to their pre-installation values.

    Re-source greenplum_path.sh and restart Greenplum Database after you remove the R Data Science Library package:

    $ source /usr/local/greenplum-db/greenplum_path.sh
    $ gpstop -r