Data Repairing

    Name: TIMESTAMPREPAIR

    Input Series: Supports only a single input series. The data type is INT32 / INT64 / FLOAT / DOUBLE.

    Parameters:

    • interval: The standard time interval in milliseconds. It must be a positive integer. By default, it is estimated using the method specified by method.
    • method: The method used to estimate the standard time interval: ‘median’, ‘mode’ or ‘cluster’. This parameter takes effect only when interval is not given. By default, ‘median’ is used.

    Output Series: Output a single series. The type is the same as the input. This series is the input after repairing.
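
    The interval estimation controlled by method can be sketched in Python. This is a minimal illustration, not the library's implementation: it assumes that ‘median’ and ‘mode’ mean the median and the most frequent value of the gaps between consecutive timestamps, and it leaves out ‘cluster’. The function name estimate_interval and the sample timestamps are hypothetical.

```python
from statistics import median, multimode


def estimate_interval(timestamps, method="median"):
    """Estimate a standard interval (ms) from sorted timestamps (ms).

    Assumption: the estimate is taken over the gaps between
    consecutive timestamps.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    if method == "median":
        return int(median(gaps))
    if method == "mode":
        # multimode returns every most-frequent gap; pick the smallest
        # so the result is deterministic when there is a tie.
        return min(multimode(gaps))
    raise ValueError(f"unsupported method in this sketch: {method}")


# Hypothetical timestamps: mostly 10 s apart, one jittered point, one gap.
ts = [0, 10_000, 19_000, 30_000, 40_000, 60_000]
print(estimate_interval(ts))          # 10000
print(estimate_interval(ts, "mode"))  # 10000
```

    Both estimators ignore the single jittered point and the one missing point here, which is why a robust statistic of the gaps is a natural choice for the default.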

    Manually Specify the Standard Time Interval

    When interval is given, this function repairs the timestamps according to that standard time interval.

    Input series:

    SQL for query:

    select timestamprepair(s1,'interval'='10000') from root.test.d2

    Output series:

    +-----------------------------+----------------------------------------------------+
    |                         Time|timestamprepair(root.test.d2.s1, "interval"="10000")|
    +-----------------------------+----------------------------------------------------+
    |2021-07-01T12:00:00.000+08:00|                                                 1.0|
    |2021-07-01T12:00:10.000+08:00|                                                 2.0|
    |2021-07-01T12:00:20.000+08:00|                                                 3.0|
    |2021-07-01T12:00:30.000+08:00|                                                 4.0|
    |2021-07-01T12:00:40.000+08:00|                                                 5.0|
    |2021-07-01T12:00:50.000+08:00|                                                 6.0|
    |2021-07-01T12:01:00.000+08:00|                                                 7.0|
    |2021-07-01T12:01:10.000+08:00|                                                 8.0|
    |2021-07-01T12:01:20.000+08:00|                                                 9.0|
    |2021-07-01T12:01:30.000+08:00|                                                10.0|
    +-----------------------------+----------------------------------------------------+

    Automatically Estimate the Standard Time Interval

    When interval is not given, this function estimates the standard time interval using the specified method (‘median’ by default).

    The input series is the same as above. The SQL for the query is shown below:

    select timestamprepair(s1) from root.test.d2

    Output series:

    +-----------------------------+--------------------------------+
    |                         Time|timestamprepair(root.test.d2.s1)|
    +-----------------------------+--------------------------------+
    |2021-07-01T12:00:00.000+08:00|                             1.0|
    |2021-07-01T12:00:10.000+08:00|                             2.0|
    |2021-07-01T12:00:20.000+08:00|                             3.0|
    |2021-07-01T12:00:30.000+08:00|                             4.0|
    |2021-07-01T12:00:40.000+08:00|                             5.0|
    |2021-07-01T12:00:50.000+08:00|                             6.0|
    |2021-07-01T12:01:00.000+08:00|                             7.0|
    |2021-07-01T12:01:10.000+08:00|                             8.0|
    |2021-07-01T12:01:30.000+08:00|                            10.0|
    +-----------------------------+--------------------------------+

    Name: ValueFill

    Input Series: Supports only a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

    Parameters:

    • method: One of {“mean”, “previous”, “linear”, “likelihood”, “AR”, “MA”, “SCREEN”}; the default is “linear”. It selects the method used to impute missing values in the series. “mean”: fill holes with the global mean value; “previous”: propagate the last valid observation forward to the next valid one; “linear”: linear interpolation, the simplest interpolation method; “likelihood”: maximum likelihood estimation based on a normal distribution of speed; “AR”: autoregression; “MA”: moving average; “SCREEN”: speed constraint.

    Output Series: Output a single series. The type is the same as the input. This series is the input after repairing.

    Note: The AR method uses an AR(1) model. The input values should be autocorrelated; otherwise, the function outputs a single point (0, 0.0).

    Fill with linear

    When method is “linear” or not given, linear interpolation is used to fill the missing values.

    Input series:

    SQL for query:

    select valuefill(s1) from root.test.d2

    Output series:

    +-----------------------------+-----------------------+
    |                         Time|valuefill(root.test.d2)|
    +-----------------------------+-----------------------+
    |2020-01-01T00:00:02.000+08:00|                    NaN|
    |2020-01-01T00:00:04.000+08:00|                  102.0|
    |2020-01-01T00:00:06.000+08:00|                  104.0|
    |2020-01-01T00:00:08.000+08:00|                  126.0|
    |2020-01-01T00:00:10.000+08:00|                  108.0|
    |2020-01-01T00:00:14.000+08:00|                  108.0|
    |2020-01-01T00:00:15.000+08:00|                  113.0|
    |2020-01-01T00:00:16.000+08:00|                  114.0|
    |2020-01-01T00:00:18.000+08:00|                  116.0|
    |2020-01-01T00:00:20.000+08:00|                  118.7|
    |2020-01-01T00:00:22.000+08:00|                  121.3|
    |2020-01-01T00:00:26.000+08:00|                  124.0|
    |2020-01-01T00:00:28.000+08:00|                  126.0|
    |2020-01-01T00:00:30.000+08:00|                  128.0|
    +-----------------------------+-----------------------+
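
    In the output above, the two missing values between 116.0 and 124.0 become 118.7 and 121.3, which suggests that the linear method interpolates in equal steps by position rather than by timestamp. The Python sketch below assumes that behavior and also assumes that a leading NaN stays NaN, as in the first output row. It is an illustration only; the name linear_fill is hypothetical.

```python
import math


def linear_fill(values):
    """Fill NaN runs with equal steps between the neighbouring valid values.

    Assumptions: interpolation is by position (not by timestamp), and
    leading or trailing NaN values are left unchanged.
    """
    filled = list(values)
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    for left, right in zip(valid, valid[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


filled = linear_fill([math.nan, 116.0, math.nan, math.nan, 124.0])
print([round(v, 1) for v in filled])  # [nan, 116.0, 118.7, 121.3, 124.0]
```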

    Previous Fill

    When method is “previous”, the previous method is used: each missing value is filled with the last valid observation before it.

    The input series is the same as above. The SQL for the query is shown below:

    select valuefill(s1,"method"="previous") from root.test.d2

    Output series:

    +-----------------------------+-------------------------------------------+
    |                         Time|valuefill(root.test.d2,"method"="previous")|
    +-----------------------------+-------------------------------------------+
    |2020-01-01T00:00:02.000+08:00|                                        NaN|
    |2020-01-01T00:00:03.000+08:00|                                      101.0|
    |2020-01-01T00:00:04.000+08:00|                                      102.0|
    |2020-01-01T00:00:06.000+08:00|                                      104.0|
    |2020-01-01T00:00:08.000+08:00|                                      126.0|
    |2020-01-01T00:00:10.000+08:00|                                      108.0|
    |2020-01-01T00:00:14.000+08:00|                                      110.5|
    |2020-01-01T00:00:15.000+08:00|                                      113.0|
    |2020-01-01T00:00:16.000+08:00|                                      114.0|
    |2020-01-01T00:00:18.000+08:00|                                      116.0|
    |2020-01-01T00:00:20.000+08:00|                                      116.0|
    |2020-01-01T00:00:22.000+08:00|                                      116.0|
    |2020-01-01T00:00:26.000+08:00|                                      124.0|
    |2020-01-01T00:00:28.000+08:00|                                      126.0|
    |2020-01-01T00:00:30.000+08:00|                                      128.0|
    +-----------------------------+-------------------------------------------+
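
    Forward propagation of the last valid observation can be sketched in a few lines of Python. This is only an illustration of the idea; the name previous_fill is hypothetical, and the sketch assumes that a NaN with no earlier valid value stays NaN, as in the first output row above.

```python
import math


def previous_fill(values):
    """Replace each NaN with the most recent valid value before it."""
    filled = []
    last = math.nan  # nothing valid seen yet, so leading NaN stays NaN
    for v in values:
        if math.isnan(v):
            filled.append(last)
        else:
            filled.append(v)
            last = v
    return filled


print(previous_fill([math.nan, 116.0, math.nan, math.nan, 124.0]))
# [nan, 116.0, 116.0, 116.0, 124.0]
```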

    Name: VALUEREPAIR

    Input Series: Supports only a single input series. The type is INT32 / INT64 / FLOAT / DOUBLE.

    Parameters:

    • method: The method used to repair, which is ‘Screen’ or ‘LsGreedy’. By default, Screen is used.
    • minSpeed: This parameter is only valid with Screen. It is the lower speed threshold; speeds below it are regarded as outliers. By default, it is the median minus 3 times the median absolute deviation.
    • maxSpeed: This parameter is only valid with Screen. It is the upper speed threshold; speeds above it are regarded as outliers. By default, it is the median plus 3 times the median absolute deviation.
    • center: This parameter is only valid with LsGreedy. It is the center of the Gaussian distribution of speed changes. By default, it is 0.
    • sigma: This parameter is only valid with LsGreedy. It is the standard deviation of the Gaussian distribution of speed changes. By default, it is the median absolute deviation.

    Output Series: Output a single series. The type is the same as the input. This series is the input after repairing.

    Note: NaN values are filled by linear interpolation before repairing.
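
    The default Screen thresholds (median plus or minus 3 times the median absolute deviation) can be illustrated in Python. This sketch rests on assumptions the text does not confirm: speed is taken as the change in value divided by the change in time between consecutive points, and the median absolute deviation is used without a scaling factor. The function names and the sample data are hypothetical.

```python
from statistics import median


def speeds_of(times, values):
    """Speed between consecutive points: value change over time change."""
    return [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]


def speed_thresholds(times, values):
    """Return (min_speed, max_speed) as median -/+ 3 * MAD of the speeds."""
    speeds = speeds_of(times, values)
    med = median(speeds)
    mad = median([abs(s - med) for s in speeds])  # unscaled MAD (assumption)
    return med - 3 * mad, med + 3 * mad


# Hypothetical series with one spike at index 4.
times = [0, 1, 2, 3, 4, 5, 6]
values = [0, 1, 3, 4, 14, 6, 7]

lo, hi = speed_thresholds(times, values)
outliers = [s for s in speeds_of(times, values) if s < lo or s > hi]
print(lo, hi)    # -0.5 2.5
print(outliers)  # [10.0, -8.0]
```

    The spike produces one speed above the upper threshold and one below the lower threshold, so both segments touching the anomalous point are flagged.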

    Repair with Screen

    When method is ‘Screen’ or not given, the Screen method is used.

    Input series:

    SQL for query:

    select valuerepair(s1) from root.test.d2

    Output series:

    +-----------------------------+----------------------------+
    |                         Time|valuerepair(root.test.d2.s1)|
    +-----------------------------+----------------------------+
    |2020-01-01T00:00:02.000+08:00|                       100.0|
    |2020-01-01T00:00:03.000+08:00|                       101.0|
    |2020-01-01T00:00:04.000+08:00|                       102.0|
    |2020-01-01T00:00:06.000+08:00|                       104.0|
    |2020-01-01T00:00:08.000+08:00|                       106.0|
    |2020-01-01T00:00:10.000+08:00|                       108.0|
    |2020-01-01T00:00:14.000+08:00|                       112.0|
    |2020-01-01T00:00:15.000+08:00|                       113.0|
    |2020-01-01T00:00:16.000+08:00|                       114.0|
    |2020-01-01T00:00:18.000+08:00|                       116.0|
    |2020-01-01T00:00:20.000+08:00|                       118.0|
    |2020-01-01T00:00:22.000+08:00|                       120.0|
    |2020-01-01T00:00:26.000+08:00|                       124.0|
    |2020-01-01T00:00:28.000+08:00|                       126.0|
    |2020-01-01T00:00:30.000+08:00|                       128.0|
    +-----------------------------+----------------------------+

    Repair with LsGreedy

    When method is ‘LsGreedy’, the LsGreedy method is used.

    The input series is the same as above. The SQL for the query is shown below:

    select valuerepair(s1,'method'='LsGreedy') from root.test.d2

    Output series:

    +-----------------------------+-------------------------------------------------+
    |                         Time|valuerepair(root.test.d2.s1, "method"="LsGreedy")|
    +-----------------------------+-------------------------------------------------+
    |2020-01-01T00:00:02.000+08:00|                                            100.0|
    |2020-01-01T00:00:03.000+08:00|                                            101.0|
    |2020-01-01T00:00:04.000+08:00|                                            102.0|
    |2020-01-01T00:00:06.000+08:00|                                            104.0|
    |2020-01-01T00:00:08.000+08:00|                                            106.0|
    |2020-01-01T00:00:10.000+08:00|                                            108.0|
    |2020-01-01T00:00:14.000+08:00|                                            112.0|
    |2020-01-01T00:00:15.000+08:00|                                            113.0|
    |2020-01-01T00:00:16.000+08:00|                                            114.0|
    |2020-01-01T00:00:18.000+08:00|                                            116.0|
    |2020-01-01T00:00:20.000+08:00|                                            118.0|
    |2020-01-01T00:00:22.000+08:00|                                            120.0|
    |2020-01-01T00:00:26.000+08:00|                                            124.0|
    |2020-01-01T00:00:28.000+08:00|                                            126.0|
    |2020-01-01T00:00:30.000+08:00|                                            128.0|
    +-----------------------------+-------------------------------------------------+