String Processing

    The function is used to fetch the content matched by a given regular expression from text.

    Name: REGEXMATCH

    Input Series: Only supports a single input series. The data type is TEXT.

    Parameter:

    • regex: The regular expression to match in the text. Any syntax supported by Java regular expressions is accepted; for example, \d+\.\d+\.\d+\.\d+ matches any IPv4 address.
    • group: The index of the capturing group to return from the matched result. As in java.util.regex, group 0 is the entire match, and the remaining groups are numbered in the order in which their opening parentheses appear. For example, the groups in A(B(CD)) are: 0-A(B(CD)), 1-B(CD), 2-CD.
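    The group numbering above is standard java.util.regex behaviour, so it can be checked directly in Java. The minimal sketch below lists every group of the first match of A(B(CD)). The class GroupNumberingDemo is a hypothetical name used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows how java.util.regex numbers capturing groups.
final class GroupNumberingDemo {
    // Returns groups 0..n of the first match of regex in text,
    // or an empty array when the pattern does not occur at all.
    static String[] groups(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i); // group 0 is the whole match
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("A(B(CD))", "xxABCDyy");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + " -> " + g[i]);
        }
        // prints:
        // 0 -> ABCD
        // 1 -> BCD
        // 2 -> CD
    }
}
```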

    Output Series: Output a single series. The type is TEXT.

    Note: Points whose value is null or does not match the given pattern produce no output.
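    The per-point behaviour of REGEXMATCH can be approximated in plain Java as below. This is a hypothetical sketch, not the IoTDB implementation. RegexMatchSketch is an invented name, and the sketch makes two assumptions: only the first match in each value is used, and "no result" is represented by an empty Optional.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of REGEXMATCH's per-point logic (not the IoTDB source).
final class RegexMatchSketch {
    private final Pattern pattern;
    private final int group;

    RegexMatchSketch(String regex, int group) {
        this.pattern = Pattern.compile(regex); // compiled once, reused per point
        this.group = group;
    }

    // Returns the requested group of the first match (an assumption), or an
    // empty Optional when the value is null or does not match, i.e. no output.
    // A group index larger than the pattern's group count throws
    // IndexOutOfBoundsException in this sketch.
    Optional<String> apply(String value) {
        if (value == null) {
            return Optional.empty();
        }
        Matcher m = pattern.matcher(value);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.ofNullable(m.group(group));
    }
}
```

    For example, new RegexMatchSketch("\\d+\\.\\d+\\.\\d+\\.\\d+", 0).apply("[192.168.0.1] [SUCCESS]") returns an Optional containing 192.168.0.1. Java string literals double every backslash, whereas the SQL examples in this section write the pattern with single backslashes.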

    Examples

    Input series:

    SQL for query:

    select regexmatch(s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0") from root.test.d1

    Output series:

    +-----------------------------+----------------------------------------------------------------------+
    |                         Time|regexmatch(root.test.d1.s1, "regex"="\d+\.\d+\.\d+\.\d+", "group"="0")|
    +-----------------------------+----------------------------------------------------------------------+
    |2021-01-01T00:00:01.000+08:00|                                                           192.168.0.1|
    |2021-01-01T00:00:02.000+08:00|                                                          192.168.0.24|
    |2021-01-01T00:00:03.000+08:00|                                                           192.168.0.2|
    |2021-01-01T00:00:04.000+08:00|                                                           192.168.0.5|
    |2021-01-01T00:00:05.000+08:00|                                                         192.168.0.124|
    +-----------------------------+----------------------------------------------------------------------+

    The function is used to replace matches of a given regular expression with a given string.

    Name: REGEXREPLACE

    Input Series: Only supports a single input series. The data type is TEXT.

    Parameter:

    • regex: The target regular expression to be replaced. Any syntax supported by Java regular expressions is accepted.
    • replace: The replacement string. Java back references are supported; for example, ‘$1’ refers to group 1 of the regex and is replaced by the text that group matched.
    • limit: The maximum number of matches to replace. It must be an integer no less than -1; the default, -1, means all matches are replaced.
    • offset: The number of matches to skip; the first offset matches are not replaced. Defaults to 0.

    Output Series: Output a single series. The type is TEXT.
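    The interaction of limit and offset can be illustrated with java.util.regex.Matcher, whose appendReplacement method also expands back references such as $1. This is a minimal sketch that assumes matches are counted from left to right. RegexReplaceSketch and its method are hypothetical names, not part of IoTDB.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of REGEXREPLACE's limit/offset semantics (not the IoTDB source).
final class RegexReplaceSketch {
    static String regexReplace(String text, String regex, String replace,
                               int limit, int offset) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuffer out = new StringBuffer();
        int seen = 0;     // matches found so far
        int replaced = 0; // matches actually replaced
        while (m.find()) {
            // Replace only after `offset` matches have been skipped and while
            // fewer than `limit` replacements have been made (-1 = no limit).
            boolean replaceThis = seen >= offset && (limit == -1 || replaced < limit);
            seen++;
            if (replaceThis) {
                // Copies the text before this match (including any skipped
                // matches unchanged), then the expanded replacement, e.g. $1.
                m.appendReplacement(out, replace);
                replaced++;
            }
        }
        m.appendTail(out); // copy everything after the last replaced match
        return out.toString();
    }
}
```

    For example, regexReplace("a1 a2 a3", "a(\\d)", "b$1", 1, 1) returns "a1 b2 a3": the first match is skipped because offset is 1, and only one match is replaced because limit is 1.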

    Examples

    Input series:

    +-----------------------------+-------------------------------+
    |                         Time|                root.test.d1.s1|
    +-----------------------------+-------------------------------+
    |2021-01-01T00:00:01.000+08:00|        [192.168.0.1] [SUCCESS]|
    |2021-01-01T00:00:02.000+08:00|       [192.168.0.24] [SUCCESS]|
    |2021-01-01T00:00:03.000+08:00|           [192.168.0.2] [FAIL]|
    |2021-01-01T00:00:05.000+08:00|      [192.168.0.124] [SUCCESS]|
    +-----------------------------+-------------------------------+

    SQL for query:

    select regexreplace(s1, "regex"="192\.168\.0\.(\d+)", "replace"="cluster-$1", "limit"="1") from root.test.d1

    Output series:

    The function is used to split text by a given regular expression and return a specific element.

    Name: REGEXSPLIT

    Input Series: Only supports a single input series. The data type is TEXT.

    Parameter:

    • regex: The regular expression used to split the text. Any syntax supported by Java regular expressions is accepted; for example, ['"] matches either ' or ".
    • index: The index of the element to return from the split result. It must be an integer no less than -1. The default, -1, returns the length of the result array; a non-negative integer returns the element at that index, counting from 0.

    Output Series: Output a single series. The type is INT32 when index is -1 and TEXT when it is a valid index.

    Note: When index is out of range of the result array (for example, splitting 0,1,2 by , with index set to 3), no result is returned for that record.
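    A minimal Java sketch of the two output modes follows. RegexSplitSketch is a hypothetical name, and the sketch assumes the text is split with String.split(regex). That Java method drops trailing empty strings, and this section does not say whether the UDF counts them.

```java
import java.util.Optional;

// Hypothetical sketch of REGEXSPLIT (not the IoTDB source).
final class RegexSplitSketch {
    // index = -1: the number of elements (the INT32 output).
    static int length(String text, String regex) {
        return text.split(regex).length;
    }

    // index >= 0: the element at that index (the TEXT output), or empty when
    // the index is out of range, meaning no result for this record.
    static Optional<String> element(String text, String regex, int index) {
        String[] parts = text.split(regex); // drops trailing empty strings
        if (index < 0 || index >= parts.length) {
            return Optional.empty();
        }
        return Optional.of(parts[index]);
    }
}
```

    For instance, length("A,B,A+,B-", ",") is 4, which matches the first row of the example output below. element("0,1,2", ",", 3) is empty, as described in the note above.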

    Examples

    Input series:

    +-----------------------------+---------------+
    |                         Time|root.test.d1.s1|
    +-----------------------------+---------------+
    |2021-01-01T00:00:01.000+08:00|      A,B,A+,B-|
    |2021-01-01T00:00:02.000+08:00|      A,A+,A,B+|
    |2021-01-01T00:00:03.000+08:00|         B+,B,B|
    |2021-01-01T00:00:04.000+08:00|      A+,A,A+,A|
    |2021-01-01T00:00:05.000+08:00|       A,B-,B,B|
    +-----------------------------+---------------+

    SQL for query:

    select regexsplit(s1, "regex"=",", "index"="-1") from root.test.d1

    Output series:

    +-----------------------------+------------------------------------------------------+
    |                         Time|regexsplit(root.test.d1.s1, "regex"=",", "index"="-1")|
    +-----------------------------+------------------------------------------------------+
    |2021-01-01T00:00:01.000+08:00|                                                     4|
    |2021-01-01T00:00:02.000+08:00|                                                     4|
    |2021-01-01T00:00:03.000+08:00|                                                     3|
    |2021-01-01T00:00:04.000+08:00|                                                     4|
    |2021-01-01T00:00:05.000+08:00|                                                     4|
    +-----------------------------+------------------------------------------------------+

    SQL for query:

    select regexsplit(s1, "regex"=",", "index"="3") from root.test.d1

    Output series:

    The function is used to replace a specific substring with a given string.

    Name: STRREPLACE

    Input Series: Only supports a single input series. The data type is TEXT.

    Parameter:

    • target: The substring to be replaced.
    • replace: The replacement string.
    • limit: The maximum number of matches to replace. It must be an integer no less than -1; the default, -1, means all matches are replaced.
    • offset: The number of matches to skip; the first offset matches are not replaced. Defaults to 0.
    • reverse: Whether to count matches in reverse order, starting from the end of the text. Defaults to ‘false’.

    Output Series: Output a single series. The type is TEXT.
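    The following sketch shows one plausible reading of limit, offset and reverse for STRREPLACE; it is not the IoTDB source. StrReplaceSketch is a hypothetical name, and the sketch makes three assumptions:

    • occurrences are found left to right without overlapping;
    • reverse = true only changes the order in which they are counted;
    • an empty target leaves the text unchanged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of STRREPLACE (not the IoTDB source).
final class StrReplaceSketch {
    static String strReplace(String text, String target, String replace,
                             int limit, int offset, boolean reverse) {
        if (target.isEmpty()) {
            return text; // assumption: nothing to replace for an empty target
        }
        // 1. Collect the start index of every non-overlapping occurrence, left to right.
        List<Integer> starts = new ArrayList<>();
        for (int i = text.indexOf(target); i >= 0;
             i = text.indexOf(target, i + target.length())) {
            starts.add(i);
        }
        // 2. With reverse = true, count occurrences from the end of the text.
        if (reverse) {
            Collections.reverse(starts);
        }
        // 3. Skip `offset` occurrences, then keep at most `limit` (-1 = no limit).
        int from = Math.min(offset, starts.size());
        int to = (limit == -1) ? starts.size() : Math.min(starts.size(), from + limit);
        List<Integer> chosen = new ArrayList<>(starts.subList(from, to));
        Collections.sort(chosen); // rebuild the text in left-to-right order
        // 4. Copy the text, substituting only the chosen occurrences.
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int start : chosen) {
            out.append(text, last, start).append(replace);
            last = start + target.length();
        }
        out.append(text, last, text.length());
        return out.toString();
    }
}
```

    Under these assumptions, strReplace("A,B,A+,B-", ",", "/", 2, 0, true) returns "A,B/A+/B-" because the last two commas are counted first.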

    Examples

    Input series:

    +-----------------------------+---------------+
    |                         Time|root.test.d1.s1|
    +-----------------------------+---------------+
    |2021-01-01T00:00:01.000+08:00|      A,B,A+,B-|
    |2021-01-01T00:00:02.000+08:00|      A,A+,A,B+|
    |2021-01-01T00:00:03.000+08:00|         B+,B,B|
    |2021-01-01T00:00:04.000+08:00|      A+,A,A+,A|
    |2021-01-01T00:00:05.000+08:00|       A,B-,B,B|
    +-----------------------------+---------------+

    SQL for query:

    select strreplace(s1, "target"=",", "replace"="/", "limit"="2") from root.test.d1

    Output series:

    +-----------------------------+-----------------------------------------+
    |                         Time|strreplace(root.test.d1.s1, "target"=",",|
    |                             |              "replace"="/", "limit"="2")|
    +-----------------------------+-----------------------------------------+
    |2021-01-01T00:00:01.000+08:00|                                A/B/A+,B-|
    |2021-01-01T00:00:02.000+08:00|                                A/A+/A,B+|
    |2021-01-01T00:00:03.000+08:00|                                   B+/B/B|
    |2021-01-01T00:00:04.000+08:00|                                A+/A/A+,A|
    |2021-01-01T00:00:05.000+08:00|                                 A/B-/B,B|
    +-----------------------------+-----------------------------------------+

    Another SQL for query:

    Output series: