Common Configurations

    A default scheme (and authority) is used if paths to files do not explicitly specify a file system scheme (and authority).

    For example, if the default file system configured as , then a file path of /user/hugo/in.txt is interpreted as hdfs://localhost:9000/user/hugo/in.txt.

    Connection limiting

    For example, small HDFS clusters with few RPC handlers can sometimes be overwhelmed by a large Flink job trying to build up many connections during a checkpoint.

    To limit a specific file system’s connections, add the following entries to the Flink configuration. The file system to be limited is identified by its scheme.

    1. fs.<scheme>.limit.total: (number, 0/-1 mean no limit)
    2. fs.<scheme>.limit.stream-timeout: (milliseconds, 0 means infinite)

    To prevent inactive streams from taking up the full pool (preventing new connections to be opened), you can add an inactivity timeout which forcibly closes them if they do not read/write any bytes for at least that amount of time: fs.<scheme>.limit.stream-timeout.

    Limit enforcement on a per TaskManager/file system basis. Because file systems creation occurs per scheme and authority, different authorities have independent connection pools. For example hdfs://myhdfs:50010/ and hdfs://anotherhdfs:4399/ will have separate pools.