List of Metrics

    There are two types of metrics in Alluxio: cluster-wide aggregated metrics and per-process detailed metrics.

    • Cluster metrics are collected and calculated by the leading master and displayed in the metrics tab of the web UI. These metrics are designed to provide a snapshot of the cluster state and the overall amount of data and metadata served by Alluxio.

    • Process metrics are collected by each Alluxio process and exposed in a machine-readable format through any configured sinks. Process metrics are highly detailed and are intended to be consumed by third-party monitoring tools. Users can then view fine-grained dashboards with time-series graphs of each metric, such as data transferred or the number of RPC invocations.

    Metrics in Alluxio have the following format for master node metrics:

    Metrics in Alluxio have the following format for non-master node metrics:

    Tags are additional pieces of metadata for the metric such as user name or under storage location. Tags can be used to further filter or aggregate on various characteristics.
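
    As an illustration of filtering on tags, the sketch below splits a metric name into its process, metric, and tag components. It assumes names are dot-separated and that each tag is written as "Key:value" with no dots inside the value. That convention is inferred from tagged names such as the ".UFS:" and ".User:" suffixes in this list and is not a confirmed specification. The function `parse_metric_name` and the sample name are hypothetical.

```python
from typing import Dict, Tuple


def parse_metric_name(full_name: str) -> Tuple[str, str, Dict[str, str]]:
    """Split a metric name into (process, metric, tags).

    Assumption: components are dot-separated, and every component after the
    metric name is a tag of the form "Key:value" whose value has no dots.
    """
    parts = full_name.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least 'Process.Metric', got {full_name!r}")
    process, metric = parts[0], parts[1]
    tags: Dict[str, str] = {}
    for component in parts[2:]:
        key, sep, value = component.partition(":")
        if not sep:
            raise ValueError(f"component {component!r} is not a 'Key:value' tag")
        tags[key] = value
    return process, metric, tags


# Hypothetical metric name, used only to exercise the parser.
process, metric, tags = parse_metric_name("Master.ExampleOp.UFS:bucket_a.User:alice")
print(process, metric, tags)
```

    With names parsed this way, a monitoring script could group samples by `tags["User"]` or `tags["UFS"]` before aggregating them.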

    Workers and clients send metrics data to the Alluxio master through heartbeats. The interval is defined by property and respectively.

    Bytes metrics are aggregated values from workers or clients. Bytes throughput metrics are calculated on the leading master: each throughput value equals the corresponding bytes counter value divided by the metrics record time, and is reported in bytes per minute.
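
    The throughput calculation described above can be sketched as follows. This is a minimal illustration, assuming the record time is available in seconds; the function name `bytes_per_minute` and the sample values are hypothetical and not part of Alluxio.

```python
def bytes_per_minute(counter_bytes: int, record_time_seconds: float) -> float:
    """Return throughput as counter value divided by record time, in bytes per minute."""
    if record_time_seconds <= 0:
        raise ValueError("record time must be positive")
    record_time_minutes = record_time_seconds / 60.0
    return counter_bytes / record_time_minutes


# Example: 1.2 GB counted over a 10-minute record time.
throughput = bytes_per_minute(1_200_000_000, 600)
print(throughput)  # 120000000.0 bytes per minute
```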

    Default master metrics:

    | Name | Type | Description |
    |------|------|-------------|
    | Master.CompleteFileOps | COUNTER | Total number of CompleteFile operations |
    | Master.CreateDirectoryOps | COUNTER | Total number of CreateDirectory operations |
    | Master.CreateFileOps | COUNTER | Total number of CreateFile operations |
    | Master.DeletePathOps | COUNTER | Total number of Delete operations |
    | Master.DirectoriesCreated | COUNTER | Total number of successful CreateDirectory operations |
    | Master.EdgeCacheSize | GAUGE | Total number of edges (inode metadata) cached. The edge cache is responsible for managing the mapping from (parentId, childName) to childId. |
    | Master.FileBlockInfosGot | COUNTER | Total number of successful GetFileBlockInfo operations |
    | Master.FileInfosGot | COUNTER | Total number of successful GetFileInfo operations |
    | Master.FilesCompleted | COUNTER | Total number of successful CompleteFile operations |
    | Master.FilesCreated | COUNTER | Total number of successful CreateFile operations |
    | Master.FilesFreed | COUNTER | Total number of successful FreeFile operations |
    | Master.FilesPersisted | COUNTER | Total number of successfully persisted files |
    | Master.FilesPinned | GAUGE | Total number of currently pinned files |
    | Master.FreeFileOps | COUNTER | Total number of FreeFile operations |
    | Master.GetFileBlockInfoOps | COUNTER | Total number of GetFileBlockInfo operations |
    | Master.GetFileInfoOps | COUNTER | Total number of GetFileInfo operations |
    | Master.GetNewBlockOps | COUNTER | Total number of GetNewBlock operations |
    | Master.InodeCacheSize | GAUGE | Total number of inodes (inode metadata) cached |
    | Master.JournalFlushFailure | COUNTER | Total number of failed journal flushes |
    | Master.JournalFlushTimer | TIMER | The timer statistics of journal flush |
    | Master.JournalGainPrimacyTimer | TIMER | The timer statistics of journal gain primacy |
    | Master.LastBackupEntriesCount | GAUGE | The total number of entries written in the last leading master metadata backup |
    | Master.LastBackupRestoreCount | GAUGE | The total number of entries restored from backup when a leading master initializes its metadata |
    | Master.LastBackupRestoreTimeMs | GAUGE | The process time of the last restore from backup |
    | Master.LastBackupTimeMs | GAUGE | The process time of the last backup |
    | Master.ListingCacheSize | GAUGE | The size of the master listing cache |
    | Master.MountOps | COUNTER | Total number of Mount operations |
    | Master.NewBlocksGot | COUNTER | Total number of successful GetNewBlock operations |
    | Master.PathsDeleted | COUNTER | Total number of successful Delete operations |
    | Master.PathsMounted | COUNTER | Total number of successful Mount operations |
    | Master.PathsRenamed | COUNTER | Total number of successful Rename operations |
    | Master.PathsUnmounted | COUNTER | Total number of successful Unmount operations |
    | Master.RenamePathOps | COUNTER | Total number of Rename operations |
    | Master.SetAclOps | COUNTER | Total number of SetAcl operations |
    | Master.SetAttributeOps | COUNTER | Total number of SetAttribute operations |
    | Master.TotalPaths | GAUGE | Total number of files and directories in the Alluxio namespace |
    | Master.UfsJournalFailureRecoverTime | TIMER | The timer statistics of UFS journal failure recovery |
    | Master.UnmountOps | COUNTER | Total number of Unmount operations |
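
    Several of the counters above come in pairs: one counts all invocations of an operation and the other counts only the successful ones. A rough estimate of unsuccessful operations can be derived by subtracting one from the other, as sketched below. This takes the table descriptions at face value and assumes both counters in a pair were sampled at about the same moment; an in-flight operation may make the difference transiently too high. The `OP_PAIRS` mapping, the `unsuccessful_ops` helper, and the sample snapshot are hypothetical.

```python
from typing import Dict, Mapping, Tuple

# (all-invocations counter, successful-invocations counter), per the table above.
OP_PAIRS: Dict[str, Tuple[str, str]] = {
    "CreateFile": ("Master.CreateFileOps", "Master.FilesCreated"),
    "GetFileInfo": ("Master.GetFileInfoOps", "Master.FileInfosGot"),
    "Mount": ("Master.MountOps", "Master.PathsMounted"),
    "Unmount": ("Master.UnmountOps", "Master.PathsUnmounted"),
}


def unsuccessful_ops(snapshot: Mapping[str, int]) -> Dict[str, int]:
    """Estimate non-successful invocations per operation from one metrics snapshot."""
    result: Dict[str, int] = {}
    for op, (ops_key, success_key) in OP_PAIRS.items():
        if ops_key in snapshot and success_key in snapshot:
            # Clamp at zero in case the two counters were read at slightly different times.
            result[op] = max(snapshot[ops_key] - snapshot[success_key], 0)
    return result


# Made-up snapshot values for illustration only.
sample = {
    "Master.CreateFileOps": 120,
    "Master.FilesCreated": 117,
    "Master.MountOps": 4,
    "Master.PathsMounted": 4,
}
print(unsuccessful_ops(sample))  # {'CreateFile': 3, 'Mount': 0}
```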

    Dynamically generated master metrics:

    | Metric Name | Description |
    |-------------|-------------|
    | Master.CapacityTotalTier | Total capacity in tier of the Alluxio file system in bytes |
    | Master.CapacityUsedTier | Used capacity in tier of the Alluxio file system in bytes |
    | Master.CapacityFreeTier | Free capacity in tier of the Alluxio file system in bytes |
    | Master.UfsSessionCount-Ufs: | The total number of currently opened UFS sessions to connect to the given |
    | Master..UFS:.UFS_TYPE:.User: | The detailed UFS RPC operations performed by the current master |
    | Master.PerUfsOp.UFS: | The aggregated number of UFS operations run on the UFS by the leading master |
    | Master. | The duration statistics of RPC calls exposed on the leading master |

    Dynamically generated worker metrics:

    | Metric Name | Description |
    |-------------|-------------|
    | Worker.UfsSessionCount-Ufs: | The total number of currently opened UFS sessions to connect to the given |
    | Worker. | The duration statistics of RPC calls exposed on workers |

    Each client metric will be recorded with its local hostname or is configured. If is configured, multiple clients can be combined into a logical application.

    | Name | Type | Description |
    |------|------|-------------|
    | Client.BytesReadLocal | COUNTER | Total number of bytes short-circuit read from local storage by this client |
    | Client.BytesReadLocalThroughput | METER | Bytes throughput short-circuit read from local storage by this client |
    | Client.BytesWrittenLocal | COUNTER | Total number of bytes short-circuit written to local storage by this client |
    | Client.BytesWrittenLocalThroughput | METER | Bytes throughput short-circuit written to local storage by this client |
    | Client.BytesWrittenUfs | COUNTER | Total number of bytes written to Alluxio UFS by this client |
    | Client.CacheBytesEvicted | METER | Total number of bytes evicted from the client cache. |
    | Client.CacheBytesReadCache | METER | Total number of bytes read from the client cache. |
    | Client.CacheBytesReadExternal | METER | Total number of bytes read from external storage due to a cache miss on the client cache. |
    | Client.CacheBytesRequestedExternal | METER | Total number of bytes the user requested to read which resulted in a cache miss. This number may be smaller than Client.CacheBytesReadExternal due to chunk reads. |
    | Client.CacheBytesWrittenCache | METER | Total number of bytes written to the client cache. |
    | Client.CacheCreateErrors | COUNTER | Number of failures when creating a cache in the client cache. |
    | Client.CacheDeleteErrors | COUNTER | Number of failures when deleting cached data in the client cache. |
    | Client.CacheGetErrors | COUNTER | Number of failures when getting cached data in the client cache. |
    | Client.CacheGetFailedReadErrors | COUNTER | Number of failures when getting cached data in the client cache due to read failures from local storage. |
    | Client.CacheHitRate | GAUGE | Cache hit rate: (# bytes read from cache) / (# bytes requested). |
    | Client.CachePagesEvicted | METER | Total number of pages evicted from the client cache. |
    | Client.CachePutErrors | COUNTER | Number of failures when putting cached data in the client cache. |
    | Client.CachePutFailedWriteErrors | COUNTER | Number of failures when putting cached data in the client cache due to write failures to local storage. |
    | Client.CacheSpaceAvailable | GAUGE | Number of bytes available in the client cache. |
    | Client.CacheSpaceUsed | GAUGE | Number of bytes used by the client cache. |
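
    The hit-rate formula given for Client.CacheHitRate can be worked through as below. The sketch assumes that "bytes requested" is the sum of Client.CacheBytesReadCache and Client.CacheBytesRequestedExternal; that reading is an interpretation of the table, not something this list states. The function name `cache_hit_rate` and the sample values are hypothetical.

```python
def cache_hit_rate(bytes_read_cache: int, bytes_requested_external: int) -> float:
    """Return (# bytes read from cache) / (# bytes requested).

    Assumption: bytes requested = bytes served from the cache plus bytes the
    user requested that resulted in a cache miss.
    """
    bytes_requested = bytes_read_cache + bytes_requested_external
    if bytes_requested == 0:
        return 0.0  # No reads yet, so report no hits rather than dividing by zero.
    return bytes_read_cache / bytes_requested


# Example: 750 bytes served from the cache, 250 requested bytes missed the cache.
print(cache_hit_rate(750, 250))  # 0.75
```

    Client.CacheBytesReadExternal is deliberately not used here, because chunk reads can make it larger than the number of bytes the user actually requested.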

    The following metrics are collected on each instance (Master, Worker or Client).

    | Metric Name | Description |
    |-------------|-------------|
    | PS-MarkSweep.count | Total number of mark-and-sweep collections |
    | PS-MarkSweep.time | The time used to mark and sweep |
    | PS-Scavenge.count | Total number of scavenge collections |
    | PS-Scavenge.time | The time used to scavenge |

    Alluxio provides overall and detailed memory usage information. Detailed memory usage information of code cache, compressed class space, metaspace, PS Eden space, PS old gen, and PS survivor space is collected in each process.

    A subset of the memory usage metrics is listed below:

    | Metric Name | Description |
    |-------------|-------------|
    | total.committed | The amount of memory in bytes that is guaranteed to be available for use by the JVM |
    | total.init | The amount of memory in bytes that is available for use by the JVM |
    | total.max | The maximum amount of memory in bytes that is available for use by the JVM |
    | total.used | The amount of memory currently used in bytes |
    | heap.committed | The amount of memory from heap area guaranteed to be available |
    | heap.init | The amount of memory from heap area available at initialization |
    | heap.max | The maximum amount of memory from heap area that is available |
    | heap.usage | The amount of memory from heap area currently used in GB |
    | heap.used | The amount of memory from heap area that has been used |
    | pools.Code-Cache.used | Used memory of collection usage from the pool from which memory is used for compilation and storage of native code |
    | pools.Compressed-Class-Space.used | Used memory of collection usage from the pool from which memory is used for class metadata |
    | pools.PS-Eden-Space.used | Used memory of collection usage from the pool from which memory is initially allocated for most objects |
    | pools.PS-Survivor-Space.used | Used memory of collection usage from the pool containing objects that have survived the garbage collection of the Eden space |
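
    A common use of these values is to derive heap utilization from heap.used and heap.max. The sketch below assumes both values are reported in bytes and that a non-positive maximum means the JVM has no defined limit (the JVM reports -1 in that case). The function name `heap_utilization` and the sample values are hypothetical.

```python
from typing import Optional


def heap_utilization(used_bytes: int, max_bytes: int) -> Optional[float]:
    """Return used/max as a fraction, or None when the maximum is undefined."""
    if max_bytes <= 0:
        return None  # The JVM reports -1 when no maximum is defined.
    return used_bytes / max_bytes


# Example: 512 MiB used out of a 2 GiB maximum heap.
print(heap_utilization(512 * 1024**2, 2 * 1024**3))  # 0.25
```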