Configuring Your Systems and Installing Greenplum
Perform the following tasks in order:
- Make sure your systems meet the System Requirements
- Setting the Greenplum Recommended OS Parameters
- Installing the Greenplum Database Software
- Creating the Data Storage Areas
- Next Steps
Unless noted, these tasks should be performed for all hosts in your Greenplum Database array (master, standby master and segments).
For information about running Greenplum Database in the cloud, see Cloud Services in the Tanzu Greenplum Partner Marketplace.
Important: When data loss is not acceptable for a Greenplum Database cluster, Greenplum master and segment mirroring is recommended. If mirroring is not enabled then Greenplum stores only one copy of the data, so the underlying storage media provides the only guarantee for data availability and correctness in the event of a hardware failure.
For information about master and segment mirroring, see the Greenplum Database Administrator Guide.
Note: For information about upgrading Tanzu Greenplum Database from a previous version, see the Greenplum Database Release Notes for the release that you are installing.
The following table lists minimum recommended specifications for servers intended to support Greenplum Database on Linux systems in a production environment. All servers in your Greenplum Database system must have the same hardware and software configuration. Greenplum also provides hardware build guides for its certified hardware platforms. It is recommended that you work with a Greenplum Systems Engineer to review your anticipated environment to ensure an appropriate hardware configuration for Greenplum Database.
Important: SSL is supported only on the Greenplum Database master host system. It is not supported on the segment host systems.
Important: For all Greenplum Database host systems, SELinux must be disabled. You should also disable firewall software, although firewall software can be enabled if it is required for security purposes. See Disabling SELinux and Firewall Software.
Parent topic: Configuring Your Systems and Installing Greenplum
Disabling SELinux and Firewall Software
For all Greenplum Database host systems, SELinux must be disabled. Follow these steps:
As the root user, check the status of SELinux:
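For example, you can use the sestatus command to display the current SELinux mode:
# sestatus
If SELinux is disabled, the output includes a line similar to:
SELinux status: disabled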
If SELinux is not disabled, disable it by editing the /etc/selinux/config file. As root, change the value of the SELINUX parameter in the config file as follows:
SELINUX=disabled
Reboot the system to apply any changes that you made to /etc/selinux/config and verify that SELinux is disabled.
For information about disabling SELinux, see the SELinux documentation.
You should also disable firewall software such as iptables (on systems such as RHEL 6.x and CentOS 6.x) or firewalld (on systems such as RHEL 7.x and CentOS 7.x). Follow these steps:
As the root user, check the status of iptables:
# /sbin/chkconfig --list iptables
If iptables is disabled, the command output is:
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
If necessary, execute this command as root to disable iptables:
/sbin/chkconfig iptables off
You will need to reboot your system after applying the change.
For systems with firewalld, check the status of firewalld with the command:
# systemctl status firewalld
If firewalld is disabled, the command output is:
* firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)
If necessary, execute these commands as root to disable firewalld:
# systemctl stop firewalld.service
# systemctl disable firewalld.service
For more information about configuring your firewall software, see the documentation for the firewall or your operating system.
Parent topic: Configuring Your Systems and Installing Greenplum
Greenplum requires that certain Linux operating system (OS) parameters be set on all hosts in your Greenplum Database system (masters and segments).
In general, the following categories of system parameters need to be altered:
- Shared Memory - A Greenplum Database instance will not work unless the shared memory segment for your kernel is properly sized. Most default OS installations have the shared memory values set too low for Greenplum Database. On Linux systems, you must also disable the OOM (out of memory) killer. For information about Greenplum Database shared memory requirements, see the Greenplum Database server configuration parameter shared_buffers in the Greenplum Database Reference Guide.
- Network - On high-volume Greenplum Database systems, certain network-related tuning parameters must be set to optimize network connections made by the Greenplum interconnect.
- User Limits - User limits control the resources available to processes started by a user’s shell. Greenplum Database requires a higher limit on the allowed number of file descriptors that a single process can have open. The default settings may cause some Greenplum Database queries to fail because they will run out of file descriptors needed to process the query.
Parent topic: Configuring Your Systems and Installing Greenplum
Edit the /etc/hosts file and make sure that it includes the host names and all interface address names for every machine participating in your Greenplum Database system.
Set the following parameters in the /etc/sysctl.conf file and reload with sysctl -p:
# kernel.shmall = _PHYS_PAGES / 2 # See Note 1
kernel.shmall = 197951838
# kernel.shmmax = kernel.shmall * PAGE_SIZE # See Note 1
kernel.shmmax = 810810728448
kernel.shmmni = 4096
vm.overcommit_memory = 2
vm.overcommit_ratio = 95 # See Note 2
net.ipv4.ip_local_port_range = 10000 65535 # See Note 3
kernel.sem = 250 2048000 200 8192
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.swappiness = 10
vm.zone_reclaim_mode = 0
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
vm.dirty_background_ratio = 0 # See Note 5
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736
vm.dirty_bytes = 4294967296
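After updating /etc/sysctl.conf, apply the new settings without a reboot by running the following command as root:
# sysctl -p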
Note: The listed sysctl.conf parameters are for performance in a wide variety of environments. However, the settings might require changes in specific situations. These are additional notes about some of the sysctl.conf parameters.
Greenplum Database uses shared memory to communicate between postgres processes that are part of the same postgres instance. kernel.shmall sets the total amount of shared memory, in pages, that can be used system wide. kernel.shmmax sets the maximum size of a single shared memory segment in bytes.
Set kernel.shmall and kernel.shmmax values based on your system's physical memory and page size. In general, the value for both parameters should be one half of the system physical memory. Use the operating system variables _PHYS_PAGES and PAGE_SIZE to set the parameters.
kernel.shmall = ( _PHYS_PAGES / 2)
kernel.shmmax = ( _PHYS_PAGES / 2) * PAGE_SIZE
To calculate the values for kernel.shmall and kernel.shmmax, run the following commands using the getconf command, which returns the value of an operating system variable.
$ echo $(expr $(getconf _PHYS_PAGES) / 2)
$ echo $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))
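As one possible shortcut, a sketch that appends the calculated values to /etc/sysctl.conf directly (run as root, and review the file afterward to confirm there are no duplicate kernel.shmall or kernel.shmmax entries) is:
# echo "kernel.shmall = $(expr $(getconf _PHYS_PAGES) / 2)" >> /etc/sysctl.conf
# echo "kernel.shmmax = $(expr $(getconf _PHYS_PAGES) / 2 \* $(getconf PAGE_SIZE))" >> /etc/sysctl.conf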
As a best practice, we recommend you set the following values in the /etc/sysctl.conf file using calculated values. For example, a host system has 1583 GB of memory installed and returns these values: _PHYS_PAGES = 395903676 and PAGE_SIZE = 4096. These would be the kernel.shmall and kernel.shmmax values:
kernel.shmall = 197951838
kernel.shmmax = 810810728448
If the Greenplum Database master host has a different shared memory configuration than the segment hosts, the _PHYS_PAGES and PAGE_SIZE values might differ, and the kernel.shmall and kernel.shmmax values on the master host will differ from those on the segment hosts.
When vm.overcommit_memory is 2, you specify a value for vm.overcommit_ratio. For information about calculating the value for vm.overcommit_ratio when using resource queue-based resource management, see the Greenplum Database server configuration parameter gp_vmem_protect_limit in the Greenplum Database Reference Guide. If you are using resource group-based resource management, tune the operating system vm.overcommit_ratio as necessary. If your memory utilization is too low, increase the vm.overcommit_ratio value; if your memory or swap usage is too high, decrease the value.
To avoid port conflicts between Greenplum Database and other applications when initializing Greenplum Database, do not specify Greenplum Database ports in the range specified by the operating system parameter net.ipv4.ip_local_port_range. For example, if net.ipv4.ip_local_port_range = 10000 65535, you could set the Greenplum Database base port numbers to these values.
PORT_BASE = 6000
MIRROR_PORT_BASE = 7000
REPLICATION_PORT_BASE = 8000
MIRROR_REPLICATION_PORT_BASE = 9000
For information about the port ranges that are used by Greenplum Database, see the Greenplum Database documentation.
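To confirm the ephemeral port range currently in effect on a host before choosing the Greenplum base port numbers, you can run:
# sysctl net.ipv4.ip_local_port_range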
Azure deployments require Greenplum Database to not use port 65330. Add the following line to sysctl.conf:
net.ipv4.ip_local_reserved_ports=65330
For additional requirements and recommendations for cloud deployments, see Greenplum Database Cloud Technical Recommendations.
For host systems with more than 64GB of memory, these settings are recommended:
vm.dirty_background_ratio = 0
vm.dirty_ratio = 0
vm.dirty_background_bytes = 1610612736 # 1.5GB
vm.dirty_bytes = 4294967296 # 4GB
For host systems with 64GB of memory or less, remove vm.dirty_background_bytes and vm.dirty_bytes and set the two ratio parameters to these values:
vm.dirty_background_ratio = 3
vm.dirty_ratio = 10
Increase vm.min_free_kbytes to ensure PF_MEMALLOC requests from network and storage drivers are easily satisfied. This is especially critical on systems with large amounts of system memory. The default value is often far too low on these systems. Use this awk command to set vm.min_free_kbytes to a recommended 3% of system physical memory:
awk 'BEGIN {OFMT = "%.0f";} /MemTotal/ {print "vm.min_free_kbytes =", $2 * .03;}' /proc/meminfo >> /etc/sysctl.conf
Do not set vm.min_free_kbytes to higher than 5% of system memory as doing so might cause out of memory conditions.
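After reloading the settings with sysctl -p, you can confirm the value that is in effect:
# sysctl vm.min_free_kbytes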
Set the following parameters in the /etc/security/limits.conf file:
* soft nofile 524288
* hard nofile 524288
* soft nproc 131072
* hard nproc 131072
For Red Hat Enterprise Linux (RHEL) and CentOS systems, parameter values in the /etc/security/limits.d/90-nproc.conf file (RHEL/CentOS 6) or /etc/security/limits.d/20-nproc.conf file (RHEL/CentOS 7) override the values in the limits.conf file. Ensure that any parameters in the override file are set to the required value. The Linux module pam_limits sets user limits by reading the values from the limits.conf file and then from the override file. For information about PAM and user limits, see the documentation on PAM and pam_limits.
Execute the ulimit -u command on each segment host to display the maximum number of processes that are available to each user. Validate that the return value is 131072.
XFS is the preferred file system on Linux platforms for data storage. The following XFS mount options are recommended:
rw,nodev,noatime,nobarrier,inode64
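For example, a sketch of a manual mount command with these options, assuming the example device /dev/data and mount point /data used in the fstab entry below, is:
# mount -t xfs -o nodev,noatime,nobarrier,inode64 /dev/data /data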
See the manual page (man) for the mount command for more information about using that command (man mount opens the man page).
The XFS options can also be set in the /etc/fstab file. This example entry from an fstab file specifies the XFS options.
/dev/data /data xfs nodev,noatime,nobarrier,inode64 0 0
Each disk device file should have a read-ahead (blockdev) value of 16384.
To verify the read-ahead value of a disk device:
# /sbin/blockdev --getra <devname>
For example:
# /sbin/blockdev --getra /dev/sdb
To set blockdev (read-ahead) on a device:
# /sbin/blockdev --setra <bytes> <devname>
For example:
# /sbin/blockdev --setra 16384 /dev/sdb
See the manual page (man) for the blockdev command for more information about using that command (man blockdev opens the man page).
Note: The blockdev --setra command is not persistent. You must ensure the read-ahead value is set whenever the system restarts. How to set the value will vary based on your system.
One method to set the blockdev value at system startup is by adding the /sbin/blockdev --setra command in the rc.local file. For example, add this line to the rc.local file to set the read-ahead value for the disk sdb.
/sbin/blockdev --setra 16384 /dev/sdb
On systems that use systemd, you must also set the execute permissions on the rc.local file to enable it to run at startup. For example, on a RHEL/CentOS 7 system, this command sets execute permissions on the file.
# chmod +x /etc/rc.d/rc.local
The Linux disk I/O scheduler for disk access supports different policies, such as CFQ, AS, and deadline.
The deadline scheduler option is recommended. To specify a scheduler until the next system reboot, run the following:
# echo schedulername > /sys/block/<devname>/queue/scheduler
For example:
# echo deadline > /sys/block/sdb/queue/scheduler
Note: Setting the disk I/O scheduler policy with the echo command is not persistent, and must be run when the system is rebooted. If you use the echo command to set the policy, you must ensure the command is run when the system reboots. How to run the command will vary based on your system.
One method to set the I/O scheduler policy at boot time is with the elevator kernel parameter. Add the parameter elevator=deadline to the kernel command in the file /boot/grub/grub.conf, the GRUB boot loader configuration file. This is an example kernel command from a grub.conf file on RHEL 6.x or CentOS 6.x. The command is on multiple lines for readability.
kernel /vmlinuz-2.6.18-274.3.1.el5 ro root=LABEL=/
elevator=deadline crashkernel=128M@16M quiet console=tty1
console=ttyS1,115200 panic=30 transparent_hugepage=never
initrd /initrd-2.6.18-274.3.1.el5.img
To specify the I/O scheduler at boot time on systems that use grub2 such as RHEL 7.x or CentOS 7.x, use the system utility grubby. This command adds the parameter when run as root.
# grubby --update-kernel=ALL --args="elevator=deadline"
After adding the parameter, reboot the system.
This grubby command displays kernel parameter settings.
# grubby --info=ALL
For more information about the grubby utility, see your operating system documentation. If the grubby command does not update the kernels, see the Note at the end of the section.
Disable Transparent Huge Pages (THP). RHEL 6.0 or higher enables THP by default. THP degrades Greenplum Database performance. One way to disable THP on RHEL 6.x is by adding the parameter transparent_hugepage=never to the kernel command in the file /boot/grub/grub.conf, the GRUB boot loader configuration file. This is an example kernel command from a grub.conf file. The command is on multiple lines for readability:
kernel /vmlinuz-2.6.18-274.3.1.el5 ro root=LABEL=/
elevator=deadline crashkernel=128M@16M quiet console=tty1
console=ttyS1,115200 panic=30 transparent_hugepage=never
initrd /initrd-2.6.18-274.3.1.el5.img
On systems that use grub2 such as RHEL 7.x or CentOS 7.x, use the system utility grubby. This command adds the parameter when run as root.
# grubby --update-kernel=ALL --args="transparent_hugepage=never"
After adding the parameter, reboot the system.
This cat command checks the state of THP. The output indicates that THP is disabled.
$ cat /sys/kernel/mm/*transparent_hugepage/enabled
always [never]
For more information about Transparent Huge Pages or the grubby utility, see your operating system documentation. If the grubby command does not update the kernels, see the Note at the end of the section.
Disable IPC object removal for RHEL 7.2 or CentOS 7.2. The default systemd setting RemoveIPC=yes removes IPC connections when non-system user accounts log out. This causes the Greenplum Database utility gpinitsystem to fail with semaphore errors. Perform one of the following to avoid this issue.
When you add the gpadmin operating system user account to the master node in Creating the Greenplum Database Administrative User Account, create the user as a system account. You must also add the user as a system account on the segment hosts manually or using the gpseginstall command (described in the later installation step Installing and Configuring Greenplum on all Hosts).
Note: When you run the gpseginstall utility as the root user to install Greenplum Database on host systems, the utility creates the gpadmin operating system user as a system account on the hosts.
Disable RemoveIPC. Set this parameter in /etc/systemd/logind.conf on the Greenplum Database host systems.
RemoveIPC=no
The setting takes effect after restarting the systemd-logind service or rebooting the system. To restart the service, run this command as the root user.
service systemd-logind restart
Certain Greenplum Database management utilities, including gpexpand, gpinitsystem, and gpaddmirrors, utilize secure shell (SSH) connections between systems to perform their tasks. In large Greenplum Database deployments, cloud deployments, or deployments with a large number of segments per host, these utilities may exceed the host's maximum threshold for unauthenticated connections. When this occurs, you receive errors such as:
ssh_exchange_identification: Connection closed by remote host.
To increase this connection threshold for your Greenplum Database system, update the SSH MaxStartups and MaxSessions configuration parameters in one of the /etc/ssh/sshd_config or /etc/sshd_config SSH daemon configuration files.
If you specify MaxStartups and MaxSessions using a single integer value, you identify the maximum number of concurrent unauthenticated connections (MaxStartups) and the maximum number of open shell, login, or subsystem sessions permitted per network connection (MaxSessions). For example:
MaxStartups 200
MaxSessions 200
If you specify MaxStartups using the "start:rate:full" syntax, you enable random early connection drop by the SSH daemon. start identifies the maximum number of unauthenticated SSH connection attempts allowed. Once start number of unauthenticated connection attempts is reached, the SSH daemon refuses rate percent of subsequent connection attempts. full identifies the maximum number of unauthenticated connection attempts after which all attempts are refused. For example:
MaxStartups 10:30:200
MaxSessions 200
Restart the SSH daemon after you update MaxStartups and MaxSessions. For example, on a CentOS 6 system, run the following command as the root user:
# service sshd restart
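On systems that use systemd, such as RHEL/CentOS 7, you can restart the SSH daemon with systemctl instead. For example:
# systemctl restart sshd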
For detailed information about SSH configuration options, refer to the SSH documentation for your Linux distribution.
On some SUSE Linux Enterprise Server platforms, the Greenplum Database utility gpssh fails with the error message out of pty devices. A workaround is to add Greenplum Database operating system users, for example gpadmin, to the tty group. On SUSE systems, tty is required to run gpssh.
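For example, one way to add the gpadmin user to the tty group is with the standard usermod command, run as root:
# usermod -a -G tty gpadmin
The change takes effect at the user's next login.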
Note: If the grubby command does not update the kernels of a RHEL 7.x or CentOS 7.x system, you can manually update all kernels on the system. For example, to add the parameter transparent_hugepage=never to all kernels on a system:
Add the parameter to the GRUB_CMDLINE_LINUX line in the file /etc/default/grub.
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=cl/root rd.lvm.lv=cl/swap rhgb quiet transparent_hugepage=never"
GRUB_DISABLE_RECOVERY="true"
As root, run the grub2-mkconfig command to update the kernels.
# grub2-mkconfig -o /boot/grub2/grub.cfg
Reboot the system.
Creating the Greenplum Database Administrative User Account
You must create a dedicated operating system user account on the master node to run Greenplum Database. You administer Greenplum Database as this operating system user. This user account is named, by convention, gpadmin.
Note: If you are installing the Greenplum Database RPM distribution, create the gpadmin user on every host in the Greenplum Database cluster because the installer does not create the gpadmin user for you. See the note under Installing the Greenplum Database Software for more information.
You cannot run the Greenplum Database server as root.
The gpadmin user account must have permission to access the services and directories required to install and run Greenplum Database.
To create the gpadmin operating system user account, run the groupadd, useradd, and passwd commands as the root user.
Note: If you are installing Greenplum Database on RHEL 7.2 or CentOS 7.2 and chose to disable IPC object removal by creating the gpadmin user as a system account, provide the options -r (create the user as a system account) and -m (create the user home directory if it does not exist) to the useradd command.
Note: Make sure the gpadmin user has the same user id (uid) and group id (gid) numbers on each host to prevent problems with scripts or services that use them for identity or permissions. For example, backing up Greenplum databases to some networked filesystems or storage appliances could fail if the gpadmin user has different uid or gid numbers on different segment hosts. When you create the gpadmin group and user, you can use the groupadd -g option to specify a gid number and the useradd -u option to specify the uid number. Use the command id gpadmin to see the uid and gid for the gpadmin user on the current host.
This example creates the gpadmin operating system group and creates the user account as a system account:
# groupadd gpadmin
# useradd gpadmin -r -m -g gpadmin
# passwd gpadmin
New password: <changeme>
Retype new password: <changeme>
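You can then verify the uid and gid assigned on each host with the id command; the numeric values shown here are only illustrative:
# id gpadmin
uid=995(gpadmin) gid=992(gpadmin) groups=992(gpadmin)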
Tanzu distributes the Greenplum Database software both as a downloadable RPM file and as a binary installer. You can use either distribution to install the software, but there are important differences between the two installation methods:
- If you use the RPM distribution, install the RPM file on the master, standby master, and every segment host. You will need to create the gpadmin user on every host. (See Creating the Greenplum Database Administrative User Account.) After the RPM file is installed on every host, you must enable passwordless SSH access for the gpadmin user from each host to every other host.
- If you use the binary installer, you can install the distribution on the master host only, and then use the Greenplum Database gpseginstall utility to copy the installation from the master host to all other hosts in the cluster. The gpseginstall utility creates the gpadmin user on each host, if it does not already exist, and enables passwordless SSH for the gpadmin user.
Warning: It is possible to install the RPM distribution on the master host, and then use the gpseginstall
utility to copy the Greenplum Database installation directory to all other hosts. However, this is not recommended because gpseginstall
does not install the RPM package on the other hosts, so you will be unable to use the OS package management utilities to remove or upgrade the Greenplum software on the standby master host or segment hosts.
If you do not have root access on the master host machine, run the binary installer as the gpadmin
user and install the software into a directory in which you have write permission.
Installing the RPM Distribution
Perform these steps on the master host, standby master host, and on every segment host in the Greenplum Database cluster.
Important: You require sudo or root user access to install from a pre-built RPM file.
Install Greenplum to the Default Directory
Follow these steps to install Greenplum under the default directory, /usr/local
. If you want to install to a non-default directory, use the instructions in (Optional) Install Greenplum to a Non-Default Directory instead.
Download the Greenplum Database Server software package from VMware Tanzu Network. The distribution file name has the format greenplum-db-<version>-<platform>.rpm, where <platform> is similar to rhel7-x86_64 (Red Hat 7 64-bit).
VMware generates a SHA256 fingerprint for each Greenplum Database software download available from Tanzu Network. This fingerprint enables you to verify that your downloaded file is unaltered from the original. Follow the instructions in Verifying the VMware Tanzu Greenplum Software Download to verify the integrity of the Greenplum Database Server software.
Copy the Greenplum Database package to the gpadmin user's home directory on the master, standby master, and every segment host machine.
With sudo (or as root), install the Greenplum Database package on each host machine using the yum package manager software:
$ sudo yum install ./greenplum-db-<version>-<platform>.rpm
The command copies the Greenplum Database software files into a version-specific directory under /usr/local, /usr/local/greenplum-db-<version>, and creates the symbolic link /usr/local/greenplum-db to the installation directory.
Change the ownership and group of the installed files to gpadmin:
$ sudo chown -R gpadmin /usr/local/greenplum*
$ sudo chgrp -R gpadmin /usr/local/greenplum*
(Optional) Install Greenplum to a Non-Default Directory
You can use the rpm
command with the --prefix
option to install Greenplum Database to a non-default directory (instead of under /usr/local
).
Follow these instructions to install Greenplum Database to a specific directory.
Download the Greenplum Database Server software package from VMware Tanzu Network. The distribution file name has the format greenplum-db-<version>-<platform>.rpm, where <platform> is similar to rhel7-x86_64 (Red Hat 7 64-bit).
VMware generates a SHA256 fingerprint for each Greenplum Database software download available from Tanzu Network. This fingerprint enables you to verify that your downloaded file is unaltered from the original. Follow the instructions in Verifying the VMware Tanzu Greenplum Software Download to verify the integrity of the Greenplum Database Server software.
Copy the Greenplum Database package to the gpadmin user's home directory on the master, standby master, and every segment host machine.
Use rpm with the --prefix option to install the Greenplum Database package to your chosen installation directory on each host machine:
$ sudo rpm --install ./greenplum-db-<version>-<platform>.rpm --prefix=<directory>
The rpm command copies the Greenplum Database software files into a version-specific directory under your chosen <directory>, <directory>/greenplum-db-<version>, and creates the symbolic link <directory>/greenplum-db to the versioned directory.
Change the owner and group of the installed files to gpadmin:
$ sudo chown -R gpadmin:gpadmin <directory>/greenplum*
$ sudo chgrp -R gpadmin <directory>/greenplum*
Note: All example procedures in the Greenplum Database documentation assume that you installed to the default directory, /usr/local. If you install to a non-default directory, substitute that directory for /usr/local.
If you install to a non-default directory using rpm
, you will need to continue using rpm
(and not yum
) to perform minor version upgrades; these changes are covered in the upgrade documentation.
Enable Passwordless SSH
After the RPM has been installed on all hosts in the cluster, use the gpssh-exkeys
utility to set up passwordless SSH for the gpadmin
user.
Log in to the master host as the gpadmin user.
Source the path file in the Greenplum Database installation directory.
$ source /usr/local/greenplum-db-<version>/greenplum_path.sh
In the gpadmin home directory, create a file named hostfile_exkeys that has the machine configured host names and host addresses (interface names) for each host in your Greenplum system (master, standby master, and segment hosts). Make sure there are no blank lines or extra spaces. Check the /etc/hosts file on your systems for the correct host names to use for your environment. For example, if you have a master, standby master, and three segment hosts with two unbonded network interfaces per host, your file would look something like this:
mdw
mdw-1
mdw-2
smdw
smdw-1
smdw-2
sdw1
sdw1-1
sdw1-2
sdw2
sdw2-1
sdw2-2
sdw3
sdw3-1
sdw3-2
Run the gpssh-exkeys utility with your hostfile_exkeys file to enable passwordless SSH for the gpadmin user.
$ gpssh-exkeys -f hostfile_exkeys
Note: You can run the gpssh-exkeys utility again as the root user if you want to enable passwordless SSH for root.
Follow the steps in Confirming Your Installation to verify that the Greenplum Database software is installed correctly.
Installing the Binary Distribution
Log in as root on the machine that will become the Greenplum Database master host.
If you do not have root access on the master host machine, run the binary installer as the gpadmin user and install the software into a directory in which you have write permission.
Download the Greenplum Database Server Binary Installer software package from VMware Tanzu Network. The Binary Installer distribution filename has the format greenplum-db-<version>-<platform>.zip, where <platform> is similar to rhel7-x86_64 (Red Hat 64-bit) or sles11-x86_64 (SuSE Linux 64-bit).
Unzip the installer file:
# unzip greenplum-db-<version>-<platform>.zip
Launch the installer using bash:
# /bin/bash greenplum-db-<version>-<platform>.bin
The installer prompts you to accept the Greenplum Database license agreement. Type yes to accept the license agreement.
The installer prompts you to provide an installation path. Press ENTER to accept the default install path (/usr/local/greenplum-db-<version>), or enter an absolute path to a custom install location. You must have write permission to the location you specify.
The installer installs the Greenplum Database software and creates a greenplum-db symbolic link one directory level above the version-specific installation directory. The symbolic link is used to facilitate patch maintenance and upgrades between versions. The installed location is referred to as $GPHOME.
If you installed as root, change the ownership and group of the installed files to gpadmin:
# chown -R gpadmin /usr/local/greenplum*
# chgrp -R gpadmin /usr/local/greenplum*
To perform additional required system configuration tasks and to install Greenplum Database on other hosts, go to the next task Installing and Configuring Greenplum on all Hosts.
Your Greenplum Database installation directory contains the following files and directories:
- greenplum_path.sh — This file contains the environment variables for Greenplum Database. See Setting Greenplum Environment Variables.
- bin — This directory contains the Greenplum Database management utilities. This directory also contains the PostgreSQL client and server programs, most of which are also used in Greenplum Database.
- docs/cli_help — This directory contains help files for Greenplum Database command-line utilities.
- docs/cli_help/gpconfigs — This directory contains sample gpinitsystem configuration files and host files that can be modified and used when installing and initializing a Greenplum Database system.
- docs/javadoc — This directory contains javadocs for the gNet extension (gphdfs protocol). The jar files for the gNet extension are installed in the $GPHOME/lib/hadoop directory.
- etc — Sample configuration file for OpenSSL and a sample configuration file to be used with the gpcheck management utility.
- ext — Bundled programs (such as Python) used by some Greenplum Database utilities.
- include — The C header files for Greenplum Database.
- lib — Greenplum Database and PostgreSQL library files.
- sbin — Supporting/Internal scripts and programs.
- share — Shared files for Greenplum Database.
Installing and Configuring Greenplum on all Hosts
When run as root
, gpseginstall copies the Greenplum Database installation from the current host and installs it on a list of specified hosts, creates the Greenplum operating system user account (typically named gpadmin
), sets the account password (default is changeme
), sets the ownership of the Greenplum Database installation directory, and exchanges ssh keys between all specified host address names (both as root
and as the specified user account).
Note: If you are setting up a single node system, you can still use gpseginstall to perform the required system configuration tasks on the current host. In this case, the hostfile_exkeys file should have only the current host name.
Note: The gpseginstall
utility copies the installed files from the current host to the remote hosts. It does not use rpm
to install Greenplum Database on the remote hosts, even if you used rpm
to install Greenplum Database on the current host.
To install and configure Greenplum Database on all specified hosts
Log in to the master host as root.
Source the path file from your master host's Greenplum Database installation directory:
# source /usr/local/greenplum-db/greenplum_path.sh
In the gpadmin user's home directory, create a file called hostfile_exkeys that has the machine configured host names and host addresses (interface names) for each host in your Greenplum system (master, standby master and segments). Make sure there are no blank lines or extra spaces. For example, if you have a master, standby master and three segments with two unbonded network interfaces per host, your file would look something like this:
mdw
mdw-1
mdw-2
smdw
smdw-1
smdw-2
sdw1
sdw1-1
sdw1-2
sdw2
sdw2-1
sdw2-2
sdw3
sdw3-1
sdw3-2
Check the /etc/hosts file on your systems for the correct host names to use for your environment.
The Greenplum Database segment host naming convention is sdwN where sdw is a prefix and N is an integer. For example, segment host names would be sdw1, sdw2 and so on. NIC bonding is recommended for hosts with multiple interfaces, but when the interfaces are not bonded, the convention is to append a dash (-) and number to the host name. For example, sdw1-1 and sdw1-2 are the two interface names for host sdw1.
Run the gpseginstall utility referencing the hostfile_exkeys file you just created. This example runs the utility as root. The utility creates the Greenplum operating system user account gpadmin as a system account on all hosts and sets the account password to changeme for that user on all segment hosts.
# gpseginstall -f hostfile_exkeys
Use the -u and -p options to specify a different operating system account name and password. See gpseginstall for option information and information about running the utility as a non-root user.
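For example, a sketch of the command that uses the same hostfile_exkeys file with a placeholder password is:
# gpseginstall -f hostfile_exkeys -u gpadmin -p <your_password>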
Recommended security best practices:
- Do not use the default password option for production environments.
- Change the password immediately after installation.
Confirming Your Installation
To make sure the Greenplum software was installed and configured correctly, run the following confirmation steps from your Greenplum master host. If necessary, correct any problems before continuing on to the next task.
Log in to the master host as gpadmin:
$ su - gpadmin
Source the path file from the Greenplum Database installation directory:
$ source /usr/local/greenplum-db/greenplum_path.sh
Use the gpssh utility to see if you can log in to all hosts without a password prompt, and to confirm that the Greenplum software was installed on all hosts. Use the hostfile_exkeys file you used for installation. For example:
$ gpssh -f hostfile_exkeys -e ls -l $GPHOME
If the installation was successful, you can log in to all hosts without a password prompt. All hosts should show that they have the same contents in their installation directories, and that the directories are owned by the gpadmin user.
If you are prompted for a password, run the following command to redo the ssh key exchange:
$ gpssh-exkeys -f hostfile_exkeys
Every Greenplum Database master and segment instance has a designated storage area on disk that is called the data directory location. This is the file system location where the directories that store segment instance data will be created. The master host needs a data storage location for the master data directory. Each segment host needs a data directory storage location for its primary segments, and another for its mirror segments.
Parent topic: Configuring Your Systems and Installing Greenplum
Creating a Data Storage Area on the Master Host
A data storage area is required on the Greenplum Database master host to store Greenplum Database system data such as catalog data and other system metadata.
To create the data directory location on the master
The data directory location on the master is different from those on the segments. The master does not store any user data; only the system catalog tables and system metadata are stored on the master instance, so you do not need to designate as much storage space as on the segments.
Create or choose a directory that will serve as your master data storage area. This directory should have sufficient disk space for your data and be owned by the gpadmin user and group. For example, run the following commands as root:
# mkdir -p /data/master
Change ownership of this directory to the gpadmin user. For example:
# chown gpadmin /data/master
Using gpssh, create the master data directory location on your standby master as well. For example:
# source /usr/local/greenplum-db/greenplum_path.sh
# gpssh -h smdw -e 'mkdir -p /data/master'
# gpssh -h smdw -e 'chown gpadmin /data/master'
Parent topic: Creating the Data Storage Areas
Creating Data Storage Areas on Segment Hosts
Data storage areas are required on the Greenplum Database segment hosts for primary segments. Separate storage areas are required for mirror segments.
To create the data directory locations on all segment hosts
On the master host, log in as root:
# su
Create a file called hostfile_gpssh_segonly. This file should have only one machine configured host name for each segment host. For example, if you have three segment hosts:
sdw1
sdw2
sdw3
Using gpssh, create the primary and mirror data directory locations on all segment hosts at once using the hostfile_gpssh_segonly file you just created. For example:
# source /usr/local/greenplum-db/greenplum_path.sh
# gpssh -f hostfile_gpssh_segonly -e 'mkdir -p /data/primary'
# gpssh -f hostfile_gpssh_segonly -e 'mkdir -p /data/mirror'
# gpssh -f hostfile_gpssh_segonly -e 'chown -R gpadmin /data/*'
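You can optionally confirm that the directories were created with the expected ownership on all segment hosts. For example:
# gpssh -f hostfile_gpssh_segonly -e 'ls -ld /data/primary /data/mirror'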
Parent topic: Creating the Data Storage Areas
Synchronizing System Clocks
You should use NTP (Network Time Protocol) to synchronize the system clocks on all hosts that comprise your Greenplum Database system. See www.ntp.org for more information about NTP.
NTP on the segment hosts should be configured to use the master host as the primary time source, and the standby master as the secondary time source. On the master and standby master hosts, configure NTP to point to your preferred time server.
To configure NTP
On the master host, log in as root and edit the /etc/ntp.conf file. Set the server parameter to point to your data center's NTP time server. For example (if 10.6.220.20 was the IP address of your data center's NTP server):
server 10.6.220.20
On each segment host, log in as root and edit the /etc/ntp.conf file. Set the first server parameter to point to the master host, and the second server parameter to point to the standby master host. For example:
server mdw prefer
server smdw
On the standby master host, log in as root and edit the /etc/ntp.conf file. Set the first server parameter to point to the primary master host, and the second server parameter to point to your data center's NTP time server. For example:
server mdw prefer
server 10.6.220.20
On the master host, use the NTP daemon to synchronize the system clocks on all Greenplum hosts. For example, using gpssh:
# gpssh -f hostfile_gpssh_allhosts -v -e 'ntpd'
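To confirm that the hosts are synchronizing against the expected time sources, you can query the NTP peers on each host. For example:
# gpssh -f hostfile_gpssh_allhosts -v -e 'ntpq -p'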
Enabling iptables
On Linux systems, you can configure and enable the iptables
firewall to work with Greenplum Database.
Note: Greenplum Database performance might be impacted when iptables
is enabled. You should test the performance of your application with iptables
enabled to ensure that performance is acceptable.
For more information about iptables, see the iptables and firewall documentation for your operating system.
How to Enable iptables
As gpadmin, the Greenplum Database administrator, run this command on the Greenplum Database master host to stop Greenplum Database:
$ gpstop -a
On the Greenplum Database hosts:
Update the file /etc/sysconfig/iptables based on the Example iptables Rules.
As root user, run these commands to enable iptables:
# chkconfig iptables on
# service iptables start
As gpadmin, run this command on the Greenplum Database master host to start Greenplum Database:
$ gpstart -a
Warning: After enabling iptables
, this error in the /var/log/messages
file indicates that the setting for the iptables
table is too low and needs to be increased.
ip_conntrack: table full, dropping packet.
As root user, run this command to view the iptables
table value:
# sysctl net.ipv4.netfilter.ip_conntrack_max
The following is the recommended setting to ensure that the Greenplum Database workload does not overflow the iptables table. The value might need to be adjusted for your hosts:
net.ipv4.netfilter.ip_conntrack_max=6553600
You can update the /etc/sysctl.conf file with the value. For setting values in the file, see Setting the Greenplum Recommended OS Parameters.
To set the value until the next reboot, run this command as root.
# sysctl net.ipv4.netfilter.ip_conntrack_max=6553600
Parent topic: Configuring Your Systems and Installing Greenplum
Example iptables Rules
When iptables is enabled, iptables manages the IP communication on the host system based on configuration settings (rules). The example rules are used to configure iptables for the Greenplum Database master host, standby master host, and segment hosts.
The two sets of rules account for the different types of communication Greenplum Database expects on the master (primary and standby) and segment hosts. The rules should be added to the /etc/sysconfig/iptables
file of the Greenplum Database hosts. For Greenplum Database, iptables
rules should allow the following communication:
For customer facing communication with the Greenplum Database master, allow at least postgres and 28080 (eth1 interface in the example).
For Greenplum Database system interconnect, allow communication using tcp, udp, and icmp protocols (eth4 and eth5 interfaces in the example).
The network interfaces that you specify in the iptables settings are the interfaces for the Greenplum Database hosts that you list in the hostfile_gpinitsystem file. You specify the file when you run the gpinitsystem command to initialize a Greenplum Database system. See Initializing a Greenplum Database System for information about the hostfile_gpinitsystem file and the gpinitsystem command.
For the administration network on a Greenplum DCA, allow communication using ssh, snmp, ntp, and icmp protocols (eth0 interface in the example).
In the iptables
file, each append rule command (lines starting with -A
) is a single line.
The example rules should be adjusted for your configuration. For example:
- The append command, the -A lines, and the connection parameter -i should match the connectors for your hosts.
- The CIDR network mask information for the source parameter -s should match the IP addresses for your network.
Example Master and Standby Master iptables Rules
Example iptables
rules with comments for the /etc/sysconfig/iptables
file on the Greenplum Database master host and standby master host.
*filter
# Following 3 are default rules. If the packet passes through
# the rule set it gets these rules.
# Drop all inbound packets by default.
# Drop all forwarded (routed) packets.
# Let anything outbound go through.
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
# Accept anything on the loopback interface.
-A INPUT -i lo -j ACCEPT
# If a connection has already been established allow the
# remote host packets for the connection to pass through.
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# These rules let all tcp and udp through on the standard
# interconnect IP addresses and on the interconnect interfaces.
# NOTE: gpsyncmaster uses random tcp ports in the range 1025 to 65535
# and Greenplum Database uses random udp ports in the range 1025 to 65535.
-A INPUT -i eth4 -p udp -s 192.0.2.0/22 -j ACCEPT
-A INPUT -i eth5 -p udp -s 198.51.100.0/22 -j ACCEPT
-A INPUT -i eth4 -p tcp -s 192.0.2.0/22 -j ACCEPT --syn -m state --state NEW
-A INPUT -i eth5 -p tcp -s 198.51.100.0/22 -j ACCEPT --syn -m state --state NEW
# Allow snmp connections on the admin network on Greenplum DCA.
-A INPUT -i eth0 -p udp --dport snmp -s 203.0.113.0/21 -j ACCEPT
-A INPUT -i eth0 -p tcp --dport snmp -s 203.0.113.0/21 -j ACCEPT --syn -m state --state NEW
# Allow udp/tcp ntp connections on the admin network on Greenplum DCA.
-A INPUT -i eth0 -p udp --dport ntp -s 203.0.113.0/21 -j ACCEPT
-A INPUT -i eth0 -p tcp --dport ntp -s 203.0.113.0/21 -j ACCEPT --syn -m state --state NEW
# Allow ssh on all networks (This rule can be more strict).
-A INPUT -p tcp --dport ssh -j ACCEPT --syn -m state --state NEW
# Allow Greenplum Database on all networks.
-A INPUT -p tcp --dport postgres -j ACCEPT --syn -m state --state NEW
# Allow Greenplum Command Center on the customer facing network.
-A INPUT -i eth1 -p tcp --dport 28080 -j ACCEPT --syn -m state --state NEW
# Allow ping and any other icmp traffic on the interconnect networks.
-A INPUT -i eth4 -p icmp -s 192.0.2.0/22 -j ACCEPT
-A INPUT -i eth5 -p icmp -s 198.51.100.0/22 -j ACCEPT
# Allow ping only on the admin network on Greenplum DCA.
-A INPUT -i eth0 -p icmp --icmp-type echo-request -s 203.0.113.0/21 -j ACCEPT
# Log an error if a packet passes through the rules to the default
# INPUT rule (a DROP).
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables denied: " --log-level 7
COMMIT
Example Segment Host iptables Rules
Example iptables
rules for the /etc/sysconfig/iptables
file on the Greenplum Database segment hosts. The rules for segment hosts are similar to the master rules with fewer interfaces and fewer udp
and tcp
services.
*filter
:INPUT DROP
:FORWARD DROP
:OUTPUT ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -i eth2 -p udp -s 192.0.2.0/22 -j ACCEPT
-A INPUT -i eth3 -p udp -s 198.51.100.0/22 -j ACCEPT
-A INPUT -i eth2 -p tcp -s 192.0.2.0/22 -j ACCEPT --syn -m state --state NEW
-A INPUT -i eth3 -p tcp -s 198.51.100.0/22 -j ACCEPT --syn -m state --state NEW
-A INPUT -i eth0 -p udp --dport snmp -s 203.0.113.0/21 -j ACCEPT
-A INPUT -i eth0 -p tcp --dport snmp -j ACCEPT --syn -m state --state NEW
-A INPUT -p tcp --dport ssh -j ACCEPT --syn -m state --state NEW
-A INPUT -i eth2 -p icmp -s 192.0.2.0/22 -j ACCEPT
-A INPUT -i eth3 -p icmp -s 198.51.100.0/22 -j ACCEPT
-A INPUT -i eth0 -p icmp --icmp-type echo-request -s 203.0.113.0/21 -j ACCEPT
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "iptables denied: " --log-level 7
COMMIT
Parent topic: Configuring Your Systems and Installing Greenplum