The gp_toolkit Administrative Schema

    This documentation describes the most useful views in gp_toolkit. You may notice other objects (views, functions, and external tables) within the gp_toolkit schema that are not described in this documentation (these are supporting objects to the views described in this section).

    Warning: Do not change database objects in the gp_toolkit schema. Do not create database objects in the schema. Changes to objects in the schema might affect the accuracy of administrative information returned by schema objects. Any changes made in the gp_toolkit schema are lost when the database is backed up and then restored with the gpbackup and gprestore utilities.

    These are the categories for views in the gp_toolkit schema.

    Parent topic: Greenplum Database Reference Guide

    The following views can help identify tables that need routine table maintenance (VACUUM and/or ANALYZE).

    The VACUUM or VACUUM FULL command reclaims disk space occupied by deleted or obsolete rows. Because of the MVCC transaction concurrency model used in Greenplum Database, data rows that are deleted or updated still occupy physical space on disk even though they are not visible to any new transactions. Expired rows increase table size on disk and eventually slow down scans of the table.

    The ANALYZE command collects column-level statistics needed by the query optimizer. Greenplum Database uses a cost-based query optimizer that relies on database statistics. Accurate statistics allow the query optimizer to better estimate selectivity and the number of rows retrieved by a query operation in order to choose the most efficient query plan.

    Parent topic: The gp_toolkit Administrative Schema

    This view shows regular heap-storage tables that have bloat (the actual number of pages on disk exceeds the expected number of pages given the table statistics). Tables that are bloated require a VACUUM or a VACUUM FULL in order to reclaim disk space occupied by deleted or obsolete rows. This view is accessible to all users, however non-superusers will only be able to see the tables that they have permission to access.

    Note: For diagnostic functions that return append-optimized table information, see Checking Append-Optimized Tables.

    gp_stats_missing

    This view shows tables that do not have statistics and therefore may require an ANALYZE be run on the table.

    ColumnDescription
    smischemaSchema name.
    smitableTable name.
    smisizeDoes this table have statistics? False if the table does not have row count and row sizing statistics recorded in the system catalog, which may indicate that the table needs to be analyzed. This will also be false if the table does not contain any rows. For example, the parent tables of partitioned tables are always empty and will always return a false result.
    smicolsNumber of columns in the table.
    smirecsNumber of rows in the table.

    Checking for Locks

    When a transaction accesses a relation (such as a table), it acquires a lock. Depending on the type of lock acquired, subsequent transactions may have to wait before they can access the same relation. For more information on the types of locks, see “Managing Data” in the Greenplum Database Administrator Guide. Greenplum Database resource queues (used for resource management) also use locks to control the admission of queries into the system.

    The gp_locks_* family of views can help diagnose queries and sessions that are waiting to access an object due to a lock.

    Parent topic: The gp_toolkit Administrative Schema

    gp_locks_on_relation

    This view shows any locks currently being held on a relation, and the associated session information about the query associated with the lock. For more information on the types of locks, see “Managing Data” in the Greenplum Database Administrator Guide. This view is accessible to all users, however non-superusers will only be able to see the locks for relations that they have permission to access.

    ColumnDescription
    lorlocktypeType of the lockable object: relation, extend, page, tuple, transactionid, object, userlock, resource queue, or advisory
    lordatabaseObject ID of the database in which the object exists, zero if the object is a shared object.
    lorrelnameThe name of the relation.
    lorrelationThe object ID of the relation.
    lortransactionThe transaction ID that is affected by the lock.
    lorpidProcess ID of the server process holding or awaiting this lock. NULL if the lock is held by a prepared transaction.
    lormodeName of the lock mode held or desired by this process.
    lorgrantedDisplays whether the lock is granted (true) or not granted (false).
    lorcurrentqueryThe current query in the session.

    gp_locks_on_resqueue

    Note: The gp_locks_on_resqueue view is valid only when resource queue-based resource management is active.

    This view shows any locks currently being held on a resource queue, and the associated session information about the query associated with the lock. This view is accessible to all users, however non-superusers will only be able to see the locks associated with their own sessions.

    ColumnDescription
    lorusenameName of the user executing the session.
    lorrsqnameThe resource queue name.
    lorlocktypeType of the lockable object: resource queue
    lorobjidThe ID of the locked transaction.
    lortransactionThe ID of the transaction that is affected by the lock.
    lorpidThe process ID of the transaction that is affected by the lock.
    lormodeThe name of the lock mode held or desired by this process.
    lorgrantedDisplays whether the lock is granted (true) or not granted (false).
    lorwaitingDisplays whether or not the session is waiting.

    Checking Append-Optimized Tables

    The gp_toolkit schema includes a set of diagnostic functions you can use to investigate the state of append-optimized tables.

    When an append-optimized table (or column-oriented append-optimized table) is created, another table is implicitly created, containing metadata about the current state of the table. The metadata includes information such as the number of records in each of the table’s segments.

    Append-optimized tables may have non-visible rows—rows that have been updated or deleted, but remain in storage until the table is compacted using VACUUM. The hidden rows are tracked using an auxiliary visibility map table, or visimap.

    The following functions let you access the metadata for append-optimized and column-oriented tables and view non-visible rows. Some of the functions have two versions: one that takes the oid of the table, and one that takes the name of the table. The latter version has “_name” appended to the function name.

    Parent topic: The gp_toolkit Administrative Schema

    __gp_aovisimap_compaction_info(oid)

    This function displays compaction information for an append-optimized table. The information is for the on-disk data files on Greenplum Database segments that store the table data. You can use the information to determine the data files that will be compacted by a VACUUM operation on an append-optimized table.

    Note: Until a VACUUM operation deletes the row from the data file, deleted or updated data rows occupy physical space on disk even though they are hidden to new transactions. The configuration parameter gp_appendonly_compaction controls the functionality of the VACUUM command.

    This table describes the __gp_aovisimap_compaction_info function output table.

    ColumnDescription
    contentGreenplum Database segment ID.
    datafileID of the data file on the segment.
    compaction_possibleThe value is either t or f. The value t indicates that the data in data file be compacted when a VACUUM operation is performed.
    The server configuration parameter affects this value.
    hidden_tupcountIn the data file, the number of hidden (deleted or updated) rows.
    total_tupcountIn the data file, the total number of rows.
    percent_hiddenIn the data file, the ratio (as a percentage) of hidden (deleted or updated) rows to total rows.

    __gp_aoseg_name(‘table_name’)

    ColumnDescription
    segnoThe file segment number.
    eofThe effective end of file for this file segment.
    tupcountThe total number of tuples in the segment, including invisible tuples.
    varblockcountThe total number of varblocks in the file segment.
    eof_uncompressedThe end of file if the file segment were uncompressed.
    modcountThe number of data modification operations.
    stateThe state of the file segment. Indicates if the segment is active or ready to be dropped after compaction.

    __gp_aoseg_history(oid)

    This function returns metadata information contained in the append-optimized table’s on-disk segment file. It displays all different versions (heap tuples) of the aoseg meta information. The data is complex, but users with a deep understanding of the system may find it usefulfor debugging.

    The input argument is the oid of the append-optimized table.

    Call __gp_aoseg_history_name(‘table_name’) to get the same result with the table name as an argument.

    ColumnDescription
    gptidThe id of the tuple.
    gp_xminThe id of the earliest transaction.
    gp_xmin_statusStatus of the gp_xmin transaction.
    gp_xmin_commitThe commit distribution id of the gpxmin transaction.
    gp_xmaxThe id of the latest transaction.
    gp_xmax_statusThe status of the latest transaction.
    gp_xmax_commitThe commit distribution id of the gp_xmax transaction.
    gp_command_idThe id of the query command.
    gp_infomaskA bitmap containing state information.
    gp_update_tidThe ID of the newer tuple if the row is updated.
    gp_visibilityThe tuple visibility status.
    segnoThe number of the segment in the segment file.
    tupcountThe number of tuples, including hidden tuples.
    eofThe effective end of file for the segment.
    eof_uncompressedThe end of file for the segment if data were uncompressed.
    modcountA count of data modifications.
    stateThe status of the segment.

    __gp_aocsseg(oid)

    This function returns metadata information contained in a column-oriented append-optimized table’s on-disk segment file, excluding non-visible rows. Each row describes a segment for a column in the table.

    The input argument is the oid of a column-oriented append-optimized table. Call as __gp_aocsseg_name(‘table_name’) to get the same result with the table name as an argument.

    ColumnDescription
    gp_tidThe table id.
    segnoThe segment number.
    column_numThe column number.
    physical_segnoThe number of the segment in the segment file.
    tupcountThe number of rows in the segment, excluding hidden tuples.
    eofThe effective end of file for the segment.
    eof_uncompressedThe end of file for the segment if the data were uncompressed.
    modcountA count of data modification operations for the segment.
    stateThe status of the segment.

    __gp_aocsseg_history(oid)

    This function returns metadata information contained in a column-oriented append-optimized table’s on-disk segment file. Each row describes a segment for a column in the table. The data is complex, but users with a deep understanding of the system may find it useful for debugging.

    The input argument is the oid of a column-oriented append-optimized table. Call as __gp_aocsseg_history_name(‘table_name’) to get the same result with the table name as argument.

    ColumnDescription
    gptidThe oid of the tuple.
    gp_xminThe earliest transaction.
    gp_xmin_statusThe status of the gp_xmin transaction.
    gp_xminText representation of gpxmin.
    gp_xmaxThe latest transaction.
    gp_xmax_statusThe status of the gp_xmax transaction.
    gp_xmaxText representation of gp_max.
    gp_command_idID of the command operating on the tuple.
    gp_infomaskA bitmap containing state information.
    gp_update_tidThe ID of the newer tuple if the row is updated.
    gp_visibilityThe tuple visibility status.
    segnoThe segment number in the segment file.
    column_numThe column number.
    physical_segnoThe segment containing data for the column.
    tupcountThe total number of tuples in the segment.
    eofThe effective end of file for the segment.
    eof_uncompressedThe end of file for the segment if the data were uncompressed.
    modcountA count of the data modification operations.
    stateThe state of the segment.

    __gp_aovisimap(oid)

    This function returns the tuple id, the segment file, and the row number of each non-visible tuple according to the visibility map.

    The input argument is the oid of an append-optimized table.

    Use __gp_aovisimap_name(‘table_name’) to get the same result with the table name as argument.

    ColumnDescription
    tidThe tuple id.
    segnoThe number of the segment file.
    row_numThe row number of a row that has been deleted or updated.

    __gp_aovisimap_hidden_info(oid)

    This function returns the numbers of hidden and visible tuples in the segment files for an append-optimized table.

    The input argument is the oid of the append-optimized table.

    Call __gp_aovisimap_hidden_info_name(‘table_name’) to get the same result with a table name argument.

    ColumnDescription
    segnoThe number of the segment file.
    hidden_tupcountThe number of hidden tuples in the segment file.
    total_tupcountThe total number of tuples in the segment file.

    __gp_aovisimap_entry(oid)

    This function returns information about each visibility map entry for the table.

    The input argument is the oid of an append-optimized table.

    Call __gp_aovisimap_entry_name(‘table_name’) to get the same result with a table name argument.

    ColumnDescription
    segnoSegment number of the visibility map entry.
    first_row_numThe first row number of the entry.
    hidden_tupcountThe number of hidden tuples in the entry.
    bitmapA text representation of the visibility bitmap.

    Viewing Greenplum Database Server Log Files

    Each component of a Greenplum Database system (master, standby master, primary segments, and mirror segments) keeps its own server log files. The gp_log_* family of views allows you to issue SQL queries against the server log files to find particular entries of interest. The use of these views require superuser permissions.

    Parent topic: The gp_toolkit Administrative Schema

    gp_log_command_timings

    This view uses an external table to read the log files on the master and report the execution time of SQL commands executed in a database session. The use of this view requires superuser permissions.

    ColumnDescription
    logsessionThe session identifier (prefixed with “con”).
    logcmdcountThe command number within a session (prefixed with “cmd”).
    logdatabaseThe name of the database.
    loguserThe name of the database user.
    logpidThe process id (prefixed with “p”).
    logtimeminThe time of the first log message for this command.
    logtimemaxThe time of the last log message for this command.
    logdurationStatement duration from start to end time.

    gp_log_database

    This view uses an external table to read the server log files of the entire Greenplum system (master, segments, and mirrors) and lists log entries associated with the current database. Associated log entries can be identified by the session id (logsession) and command id (logcmdcount). The use of this view requires superuser permissions.

    ColumnDescription
    logtimeThe timestamp of the log message.
    loguserThe name of the database user.
    logdatabaseThe name of the database.
    logpidThe associated process id (prefixed with “p”).
    logthreadThe associated thread count (prefixed with “th”).
    loghostThe segment or master host name.
    logportThe segment or master port.
    logsessiontimeTime session connection was opened.
    logtransactionGlobal transaction id.
    logsessionThe session identifier (prefixed with “con”).
    logcmdcountThe command number within a session (prefixed with “cmd”).
    logsegmentThe segment content identifier (prefixed with “seg” for primary or “mir” for mirror. The master always has a content id of -1).
    logsliceThe slice id (portion of the query plan being executed).
    logdistxactDistributed transaction id.
    loglocalxactLocal transaction id.
    logsubxactSubtransaction id.
    logseverityLOG, ERROR, FATAL, PANIC, DEBUG1 or DEBUG2.
    logstateSQL state code associated with the log message.
    logmessageLog or error message text.
    logdetailDetail message text associated with an error message.
    loghintHint message text associated with an error message.
    logqueryThe internally-generated query text.
    logqueryposThe cursor index into the internally-generated query text.
    logcontextThe context in which this message gets generated.
    logdebugQuery string with full detail for debugging.
    logcursorposThe cursor index into the query string.
    logfunctionThe function in which this message is generated.
    logfileThe log file in which this message is generated.
    loglineThe line in the log file in which this message is generated.
    logstackFull text of the stack trace associated with this message.

    This view uses an external table to read a subset of the log fields from the master log file. The use of this view requires superuser permissions.

    gp_log_system

    This view uses an external table to read the server log files of the entire Greenplum system (master, segments, and mirrors) and lists all log entries. Associated log entries can be identified by the session id (logsession) and command id (logcmdcount). The use of this view requires superuser permissions.

    ColumnDescription
    logtimeThe timestamp of the log message.
    loguserThe name of the database user.
    logdatabaseThe name of the database.
    logpidThe associated process id (prefixed with “p”).
    logthreadThe associated thread count (prefixed with “th”).
    loghostThe segment or master host name.
    logportThe segment or master port.
    logsessiontimeTime session connection was opened.
    logtransactionGlobal transaction id.
    logsessionThe session identifier (prefixed with “con”).
    logcmdcountThe command number within a session (prefixed with “cmd”).
    logsegmentThe segment content identifier (prefixed with “seg” for primary or “mir” for mirror. The master always has a content id of -1).
    logsliceThe slice id (portion of the query plan being executed).
    logdistxactDistributed transaction id.
    loglocalxactLocal transaction id.
    logsubxactSubtransaction id.
    logseverityLOG, ERROR, FATAL, PANIC, DEBUG1 or DEBUG2.
    logstateSQL state code associated with the log message.
    logmessageLog or error message text.
    logdetailDetail message text associated with an error message.
    loghintHint message text associated with an error message.
    logqueryThe internally-generated query text.
    logqueryposThe cursor index into the internally-generated query text.
    logcontextThe context in which this message gets generated.
    logdebugQuery string with full detail for debugging.
    logcursorposThe cursor index into the query string.
    logfunctionThe function in which this message is generated.
    logfileThe log file in which this message is generated.
    loglineThe line in the log file in which this message is generated.
    logstackFull text of the stack trace associated with this message.

    Each component of a Greenplum Database system (master, standby master, primary segments, and mirror segments) has its own server configuration file (postgresql.conf). The following gp_toolkit objects can be used to check parameter settings across all primary postgresql.conf files in the system:

    Parent topic: The gp_toolkit Administrative Schema

    gp_param_setting(‘parameter_name’)

    This function takes the name of a server configuration parameter and returns the postgresql.conf value for the master and each active segment. This function is accessible to all users.

    ColumnDescription
    paramsegmentThe segment content id (only active segments are shown). The master content id is always -1.
    paramnameThe name of the parameter.
    paramvalueThe value of the parameter.

    Example:

    1. SELECT * FROM gp_param_setting('max_connections');

    gp_param_settings_seg_value_diffs

    Server configuration parameters that are classified as local parameters (meaning each segment gets the parameter value from its own postgresql.conf file), should be set identically on all segments. This view shows local parameter settings that are inconsistent. Parameters that are supposed to have different values (such as port) are not included. This view is accessible to all users.

    ColumnDescription
    psdnameThe name of the parameter.
    psdvalueThe value of the parameter.
    psdcountThe number of segments that have this value.

    Checking for Failed Segments

    The view can be used to check for down segments.

    Parent topic: The gp_toolkit Administrative Schema

    gp_pgdatabase_invalid

    This view shows information about segments that are marked as down in the system catalog. This view is accessible to all users.

    ColumnDescription
    pgdbidbidThe segment dbid. Every segment has a unique dbid.
    pgdbiisprimaryIs the segment currently acting as the primary (active) segment? (t or f)
    pgdbicontentThe content id of this segment. A primary and mirror will have the same content id.
    pgdbivalidIs this segment up and valid? (t or f)
    pgdbidefinedprimaryWas this segment assigned the role of primary at system initialization time? (t or f)

    Checking Resource Group Activity and Status

    Note: The resource group activity and status views described in this section are valid only when resource group-based resource management is active.

    Resource groups manage transactions to avoid exhausting system CPU and memory resources. Every database user is assigned a resource group. Greenplum Database evaluates every transaction submitted by a user against the limits configured for the user’s resource group before running the transaction.

    You can use the gp_resgroup_config view to check the configuration of each resource group. You can use the gp_resgroup_status view to display the current transaction status and resource usage of each resource group.

    Parent topic:

    gp_resgroup_config

    The gp_resgroup_config view allows administrators to see the current CPU, memory, and concurrency limits for a resource group. The view also displays proposed limit settings. A proposed limit will differ from the current limit when the limit has been altered, but the new value could not be immediately applied.

    This view is accessible to all users.

    ColumnDescription
    groupidThe ID of the resource group.
    groupnameThe name of the resource group.
    concurrencyThe concurrency (CONCURRENCY) value specified for the resource group.
    proposed_concurrencyThe pending concurrency value for the resource group.
    cpu_rate_limitThe CPU limit (CPU_RATE_LIMIT) value specified for the resource group, or -1.
    memory_limitThe memory limit (MEMORY_LIMIT) value specified for the resource group.
    proposed_memory_limitThe pending memory limit value for the resource group.
    memory_shared_quotaThe shared memory quota (MEMORY_SHARED_QUOTA) value specified for the resource group.
    proposed_memory_shared_quotaThe pending shared memory quota value for the resource group.
    memory_spill_ratioThe memory spill ratio (MEMORY_SPILL_RATIO) value specified for the resource group.
    proposed_memory_spill_ratioThe pending memory spill ratio value for the resource group.
    memory_auditorThe memory auditor for the resource group.
    cpusetThe CPU cores reserved for the resource group, or -1.

    gp_resgroup_status

    The gp_resgroup_status view allows administrators to see status and activity for a resource group. It shows how many queries are waiting to run and how many queries are currently active in the system for each resource group. The view also displays current memory and CPU usage for the resource group.

    Note: Resource groups use the Linux control groups (cgroups) configured on the host systems. The cgroups are used to manage host system resources. When resource groups use cgroups that are as part of a nested set of cgroups, resource group limits are relative to the parent cgroup allotment. For information about nested cgroups and Greenplum Database resource group limits, see Understanding Role and Component Resource Groups.

    ColumnDescription
    rsgnameThe name of the resource group.
    groupidThe ID of the resource group.
    num_runningThe number of transactions currently executing in the resource group.
    num_queueingThe number of currently queued transactions for the resource group.
    num_queuedThe total number of queued transactions for the resource group since the Greenplum Database cluster was last started, excluding the num_queueing.
    num_executedThe total number of executed transactions in the resource group since the Greenplum Database cluster was last started, excluding the num_running.
    total_queue_durationThe total time any transaction was queued since the Greenplum Database cluster was last started.
    cpu_usageThe real-time CPU usage of the resource group on each Greenplum Database segment’s host.
    memory_usageThe real-time memory usage of the resource group on each Greenplum Database segment’s host.

    The cpu_usage field is a JSON-formatted, key:value string that identifies, for each resource group, the per-segment CPU usage percentage. The key is segment id, the value is the percentage of CPU usage by the resource group on the segment host. The total CPU usage of all segments running on a segment host should not exceed the gp_resource_group_cpu_limit. Example cpu_usage column output:

    In this example, segment 0 and segment 1 are running on the same host; their CPU usage is the same.

    The memory_usage field is also a JSON-formatted, key:value string. The string contents differ depending upon the type of resource group. For each resource group that you assign to a role (default memory auditor vmtracker), this string identifies the used, available, granted, and proposed fixed and shared memory quota allocations on each segment. The key is segment id. The values are memory values displayed in MB units. The following example shows memory_usage column output for a single segment for a resource group that you assign to a role:

    1. "0":{"used":0, "available":76, "quota_used":-1, "quota_available":60, "quota_granted":60, "quota_proposed":60, "shared_used":0, "shared_available":16, "shared_granted":16, "shared_proposed":16}

    For each resource group that you assign to an external component, the memory_usage JSON-formatted string identifies the memory used and the memory limit on each segment. The following example shows memory_usage column output for an external component resource group for a single segment:

    Checking Resource Queue Activity and Status

    Note: The resource queue activity and status views described in this section are valid only when resource queue-based resource management is active.

    The purpose of resource queues is to limit the number of active queries in the system at any given time in order to avoid exhausting system resources such as memory, CPU, and disk I/O. All database users are assigned to a resource queue, and every statement submitted by a user is first evaluated against the resource queue limits before it can run. The gp_resq_* family of views can be used to check the status of statements currently submitted to the system through their respective resource queue. Note that statements issued by superusers are exempt from resource queuing.

    Parent topic:

    gp_resq_activity

    For the resource queues that have active workload, this view shows one row for each active statement submitted through a resource queue. This view is accessible to all users.

    ColumnDescription
    resqprocpidProcess ID assigned to this statement (on the master).
    resqroleUser name.
    resqoidResource queue object id.
    resqnameResource queue name.
    resqstartTime statement was issued to the system.
    resqstatusStatus of statement: running, waiting or cancelled.

    gp_resq_activity_by_queue

    For the resource queues that have active workload, this view shows a summary of queue activity. This view is accessible to all users.

    ColumnDescription
    resqoidResource queue object id.
    resqnameResource queue name.
    resqlastTime of the last statement issued to the queue.
    resqstatusStatus of last statement: running, waiting or cancelled.
    resqtotalTotal statements in this queue.

    gp_resq_priority_statement

    This view shows the resource queue priority, session ID, and other information for all statements currently running in the Greenplum Database system. This view is accessible to all users.

    ColumnDescription
    rqpdatnameThe database name that the session is connected to.
    rqpusenameThe user who issued the statement.
    rqpsessionThe session ID.
    rqpcommandThe number of the statement within this session (the command id and session id uniquely identify a statement).
    rqppriorityThe resource queue priority for this statement (MAX, HIGH, MEDIUM, LOW).
    rqpweightAn integer value associated with the priority of this statement.
    rqpqueryThe query text of the statement.

    gp_resq_role

    This view shows the resource queues associated with a role. This view is accessible to all users.

    ColumnDescription
    rrrolnameRole (user) name.
    rrrsqnameThe resource queue name assigned to this role. If a role has not been explicitly assigned to a resource queue, it will be in the default resource queue (pg_default).

    gp_resqueue_status

    This view allows administrators to see status and activity for a resource queue. It shows how many queries are waiting to run and how many queries are currently active in the system from a particular resource queue.

    ColumnDescription
    queueidThe ID of the resource queue.
    rsqnameThe name of the resource queue.
    rsqcountlimitThe active query threshold of the resource queue. A value of -1 means no limit.
    rsqcountvalueThe number of active query slots currently being used in the resource queue.
    rsqcostlimitThe query cost threshold of the resource queue. A value of -1 means no limit.
    rsqcostvalueThe total cost of all statements currently in the resource queue.
    rsqmemorylimitThe memory limit for the resource queue.
    The total memory used by all statements currently in the resource queue.
    rsqwaitersThe number of statements currently waiting in the resource queue.
    rsqholdersThe number of statements currently running on the system from this resource queue.

    The gp_workfile_* views show information about all the queries that are currently using disk spill space. Greenplum Database creates work files on disk if it does not have sufficient memory to execute the query in memory. This information can be used for troubleshooting and tuning queries. The information in the views can also be used to specify the values for the Greenplum Database configuration parameters gp_workfile_limit_per_query and gp_workfile_limit_per_segment.

    Parent topic:

    gp_workfile_entries

    This view contains one row for each operator using disk space for workfiles on a segment at the current time. The view is accessible to all users, however non-superusers only to see information for the databases that they have permission to access.

    ColumnTypeReferencesDescription
    command_cntinteger Command ID of the query.
    contentsmallint The content identifier for a segment instance.
    current_querytext Current query that the process is running.
    datnamename Greenplum database name.
    directorytext Path to the work file.
    optypetext The query operator type that created the work file.
    procpidinteger Process ID of the server process.
    sess_idinteger Session ID.
    sizebigint The size of the work file in bytes.
    numfilesbigint The number of files created.
    slicesmallint The query plan slice. The portion of the query plan that is being executed.
    statetext The state of the query that created the work file.
    usenamename Role name.
    workmeminteger The amount of memory allocated to the operator in KB.

    gp_workfile_usage_per_query

    This view contains one row for each query using disk space for workfiles on a segment at the current time. The view is accessible to all users, however non-superusers only to see information for the databases that they have permission to access.

    ColumnTypeReferencesDescription
    command_cntinteger Command ID of the query.
    contentsmallint The content identifier for a segment instance.
    current_querytext Current query that the process is running.
    datnamename Greenplum database name.
    procpidinteger Process ID of the server process.
    sess_idinteger Session ID.
    sizebigint The size of the work file in bytes.
    numfilesbigint The number of files created.
    statetext The state of the query that created the work file.
    usenamename Role name.

    This view contains one row for each segment. Each row displays the total amount of disk space used for workfiles on the segment at the current time. The view is accessible to all users, however non-superusers only to see information for the databases that they have permission to access.

    Viewing Users and Groups (Roles)

    It is frequently convenient to group users (roles) together to ease management of object privileges: that way, privileges can be granted to, or revoked from, a group as a whole. In Greenplum Database this is done by creating a role that represents the group, and then granting membership in the group role to individual user roles.

    The gp_roles_assigned view can be used to see all of the roles in the system, and their assigned members (if the role is also a group role).

    Parent topic:

    gp_roles_assigned

    This view shows all of the roles in the system, and their assigned members (if the role is also a group role). This view is accessible to all users.

    ColumnDescription
    raroleidThe role object ID. If this role has members (users), it is considered a group role.
    rarolenameThe role (user or group) name.
    ramemberidThe role object ID of the role that is a member of this role.
    ramembernameName of the role that is a member of this role.

    Checking Database Object Sizes and Disk Space

    The gp_size_* family of views can be used to determine the disk space usage for a distributed Greenplum Database, schema, table, or index. The following views calculate the total size of an object across all primary segments (mirrors are not included in the size calculations).

    The table and index sizing views list the relation by object ID (not by name). To check the size of a table or index by name, you must look up the relation name (relname) in the pg_class table. For example:

    1. SELECT relname as name, sotdsize as size, sotdtoastsize as
    2. toast, sotdadditionalsize as other
    3. FROM gp_size_of_table_disk as sotd, pg_class

    Parent topic: The gp_toolkit Administrative Schema

    gp_size_of_all_table_indexes

    This view shows the total size of all indexes for a table. This view is accessible to all users, however non-superusers will only be able to see relations that they have permission to access.

    ColumnDescription
    soatioidThe object ID of the table
    soatisizeThe total size of all table indexes in bytes
    soatischemanameThe schema name
    soatitablenameThe table name

    gp_size_of_database

    This view shows the total size of a database. This view is accessible to all users, however non-superusers will only be able to see databases that they have permission to access.

    ColumnDescription
    sodddatnameThe name of the database
    sodddatsizeThe size of the database in bytes

    gp_size_of_index

    This view shows the total size of an index. This view is accessible to all users, however non-superusers will only be able to see relations that they have permission to access.

    ColumnDescription
    soioidThe object ID of the index
    soitableoidThe object ID of the table to which the index belongs
    soisizeThe size of the index in bytes
    soiindexschemanameThe name of the index schema
    soiindexnameThe name of the index
    soitableschemanameThe name of the table schema
    soitablenameThe name of the table

    gp_size_of_partition_and_indexes_disk

    This view shows the size on disk of partitioned child tables and their indexes. This view is accessible to all users, however non-superusers will only be able to see relations that they have permission to access..

    ColumnDescription
    sopaidparentoidThe object ID of the parent table
    sopaidpartitionoidThe object ID of the partition table
    sopaidpartitiontablesizeThe partition table size in bytes
    sopaidpartitionindexessizeThe total size of all indexes on this partition
    SopaidparentschemanameThe name of the parent schema
    SopaidparenttablenameThe name of the parent table
    SopaidpartitionschemanameThe name of the partition schema
    sopaidpartitiontablenameThe name of the partition table

    gp_size_of_schema_disk

    This view shows schema sizes for the public schema and the user-created schemas in the current database. This view is accessible to all users, however non-superusers will be able to see only the schemas that they have permission to access.

    ColumnDescription
    sosdnspThe name of the schema
    sosdschematablesizeThe total size of tables in the schema in bytes
    sosdschemaidxsizeThe total size of indexes in the schema in bytes

    gp_size_of_table_and_indexes_disk

    This view shows the size on disk of tables and their indexes. This view is accessible to all users, however non-superusers will only be able to see relations that they have permission to access.

    ColumnDescription
    sotaidoidThe object ID of the parent table
    sotaidtablesizeThe disk size of the table
    sotaididxsizeThe total size of all indexes on the table
    sotaidschemanameThe name of the schema
    sotaidtablenameThe name of the table

    gp_size_of_table_and_indexes_licensing

    This view shows the total size of tables and their indexes for licensing purposes. The use of this view requires superuser permissions.

    ColumnDescription
    sotailoidThe object ID of the table
    sotailtablesizediskThe total disk size of the table
    sotailtablesizeuncompressedIf the table is a compressed append-optimized table, shows the uncompressed table size in bytes.
    sotailindexessizeThe total size of all indexes in the table
    sotailschemanameThe schema name
    sotailtablenameThe table name

    gp_size_of_table_disk

    This view shows the size of a table on disk. This view is accessible to all users, however non-superusers will only be able to see tables that they have permission to access

    ColumnDescription
    sotdoidThe object ID of the table
    sotdsizeThe size of the table in bytes. The size is only the main table size. The size does not include auxiliary objects such as oversized (toast) attributes, or additional storage objects for AO tables.
    sotdtoastsizeThe size of the TOAST table (oversized attribute storage), if there is one.
    sotdadditionalsizeReflects the segment and block directory table sizes for append-optimized (AO) tables.
    sotdschemanameThe schema name
    sotdtablenameThe table name

    gp_size_of_table_uncompressed

    This view shows the uncompressed table size for append-optimized (AO) tables. Otherwise, the table size on disk is shown. The use of this view requires superuser permissions.

    ColumnDescription
    sotuoidThe object ID of the table
    sotusizeThe uncomressed size of the table in bytes if it is a compressed AO table. Otherwise, the table size on disk.
    sotuschemanameThe schema name
    sotutablenameThe table name

    gp_disk_free

    This external table runs the df (disk free) command on the active segment hosts and reports back the results. Inactive mirrors are not included in the calculation. The use of this external table requires superuser permissions.

    ColumnDescription
    dfsegmentThe content id of the segment (only active segments are shown)
    dfhostnameThe hostname of the segment host
    dfdeviceThe device name
    dfspaceFree disk space in the segment file system in kilobytes

    Checking for Uneven Data Distribution

    All tables in Greenplum Database are distributed, meaning their data is divided across all of the segments in the system. If the data is not distributed evenly, then query processing performance may decrease. The following views can help diagnose if a table has uneven data distribution:

    Parent topic:

    gp_skew_coefficients

    This view shows data distribution skew by calculating the coefficient of variation (CV) for the data stored on each segment. This view is accessible to all users, however non-superusers will only be able to see tables that they have permission to access

    ColumnDescription
    skcoidThe object id of the table.
    skcnamespaceThe namespace where the table is defined.
    skcrelnameThe table name.
    skccoeffThe coefficient of variation (CV) is calculated as the standard deviation divided by the average. It takes into account both the average and variability around the average of a data series. The lower the value, the better. Higher values indicate greater data skew.

    gp_skew_idle_fractions

    ColumnDescription
    sifoidThe object id of the table.
    sifnamespaceThe namespace where the table is defined.
    sifrelnameThe table name.
    siffractionThe percentage of the system that is idle during a table scan, which is an indicator of uneven data distribution or query processing skew. For example, a value of 0.1 indicates 10% skew, a value of 0.5 indicates 50% skew, and so on. Tables that have more than 10% skew should have their distribution policies evaluated.