Missing Values

    Missing values propagate automatically when passed to standard mathematical operators and functions. For these functions, uncertainty about the value of one of the operands induces uncertainty about the result. In practice, this means a math operation involving a missing value generally returns missing.
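
    As a brief illustration of this rule, the following sketch applies a few standard operators and functions to missing. It only uses functions from Base; the variable names are arbitrary.

```julia
# Standard operators and math functions return `missing`
# as soon as one of their operands is missing.
a = missing + 1      # missing
b = "a" * missing    # missing
c = abs(missing)     # missing

# `ismissing` is the reliable way to inspect the results.
println(ismissing(a), " ", ismissing(b), " ", ismissing(c))
```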

    As missing is a normal Julia object, this propagation rule only works for functions which have opted in to implement this behavior. This can be achieved either via a specific method defined for arguments of type Missing, or simply by accepting arguments of this type and passing them to functions which propagate them (like standard math operators). Packages should consider whether it makes sense to propagate missing values when defining new functions, and define methods appropriately if that is the case. Passing a missing value to a function for which no method accepting arguments of type Missing is defined throws a MethodError, just like for any other type.
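
    The three situations described above can be sketched as follows. The functions double, sign_label and halve are hypothetical names invented for this illustration; they are not part of Base or of any package.

```julia
# Implicit opt-in (hypothetical function): accepts any argument and forwards
# it to `*`, which already propagates missing values.
double(x) = 2 * x

# Explicit opt-in (hypothetical function): a dedicated method for `Missing`.
sign_label(x::Number) = x < 0 ? "negative" : "non-negative"
sign_label(::Missing) = missing

# No opt-in (hypothetical function): only `Int` is accepted, so passing
# `missing` throws a MethodError, as it would for any unsupported type.
halve(x::Int) = div(x, 2)

threw_method_error = try
    halve(missing)
    false
catch err
    err isa MethodError
end
```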

    Functions that do not propagate missing values can be made to do so by wrapping them in the passmissing function provided by the Missings.jl package. For example, f(x) becomes passmissing(f)(x).
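
    To make the idea of such a wrapper concrete without depending on the package, here is a minimal hand-written sketch. The helper lift_missing is a hypothetical name and is not the Missings.jl implementation; it only assumes that returning missing whenever any positional argument is missing is the desired behavior.

```julia
# Hypothetical helper (NOT the Missings.jl implementation): wrap `f` so that
# the wrapped function returns `missing` if any positional argument is missing.
lift_missing(f) = (args...) -> any(ismissing, args) ? missing : f(args...)

# `uppercase` has no method for `Missing`, so it does not propagate by itself.
safe_uppercase = lift_missing(uppercase)

r1 = safe_uppercase("abc")     # "ABC"
r2 = safe_uppercase(missing)   # missing
```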

    Equality and Comparison Operators

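    Standard equality and comparison operators follow the propagation rule presented above: if any of the operands is missing, the result is missing. Here are a few examples: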

        julia> missing == 1
        missing
        julia> missing == missing
        missing
        julia> missing < 1
        missing
        julia> 2 >= missing
        missing

    In particular, note that missing == missing returns missing, so == cannot be used to test whether a value is missing. To test whether x is missing, use ismissing(x).
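
    A short sketch of testing for missing values with ismissing, here applied element-wise and as a predicate. The vector v is an arbitrary example.

```julia
v = [1, missing, 3]

# `ismissing` always returns a Bool, unlike `==`.
flags = ismissing.(v)              # element-wise test
n_missing = count(ismissing, v)    # number of missing entries

# By contrast, `==` propagates and cannot be used as a test.
eq_result = (missing == missing)
```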

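    The special comparison operators isequal and === are exceptions to the propagation rule: they always return a Bool value, even in the presence of missing values, considering missing as equal to missing and as different from any other value. They can therefore be used to test whether a value is missing: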

        julia> missing === 1
        false
        julia> isequal(missing, 1)
        false
        julia> missing === missing
        true
        julia> isequal(missing, missing)
        true

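    The isless operator is another exception: missing is considered greater than any other value. This operator is used by sort, which therefore places missing values after all other values.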

        julia> isless(1, missing)
        true
        julia> isless(missing, Inf)
        false
        julia> isless(missing, missing)
        false
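
    The effect of this ordering on sorting can be checked with a small example. The input vector is arbitrary.

```julia
v = [3, missing, 1, 2]

# `sort` relies on `isless`, so the missing value ends up last.
sorted = sort(v)

# `isequal` is needed to compare the result, because `==` would propagate.
println(isequal(sorted, [1, 2, 3, missing]))
```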

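    Logical (or boolean) operators |, & and xor are another special case, since they only propagate missing values when it is logically required. For these operators, whether or not the result is uncertain depends on the particular operation, following the well-established rules of three-valued logic, which are also implemented by NULL in SQL and NA in R. This abstract definition corresponds to a relatively natural behavior, which is best explained via concrete examples.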

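    Let us illustrate this principle with the logical "or" operator |. Following the rules of boolean logic, if one of the operands is true, the value of the other operand has no influence on the result, which is always true: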

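        julia> true | true
        true
        julia> true | false
        true
        julia> false | true
        true

    It follows that if one operand is true and the other is missing, the result is known to be true whatever the actual value of the missing operand, so missingness does not propagate:

        julia> true | missing
        true
        julia> missing | true
        true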

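    On the contrary, if one of the operands is false, the result could be either true or false depending on the value of the other operand. Therefore, if that operand is missing, the result has to be missing too: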

        julia> false | true
        true
        julia> true | false
        true
        julia> false | false
        false
        julia> false | missing
        missing
        julia> missing | false
        missing

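    The behavior of the logical "and" operator & is similar to that of the | operator, with the difference that missingness does not propagate when one of the operands is false. For example, when the first operand is false: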

        julia> false & false
        false
        julia> false & true
        false
        julia> false & missing
        false

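    On the other hand, missingness propagates when one of the operands is true. For example, when the first operand is true, true & missing evaluates to missing.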
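
    The case where the first operand of & is true can be written out as follows, mirroring the examples given for false. Only Base operators are used.

```julia
# With `true` as one operand, the result of `&` depends entirely on the
# other operand, so a missing value has to propagate.
r1 = true & true       # true
r2 = true & false      # false
r3 = true & missing    # missing
r4 = missing & true    # missing
```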

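    Finally, the logical "exclusive or" operator xor always propagates missing values, since both operands always have an effect on the result. Also note that the negation operator ! returns missing when the operand is missing, just like other unary operators.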
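
    A short sketch of these two rules, using only Base functions:

```julia
# `xor` cannot be decided from one operand alone, so it always propagates.
x1 = xor(true, missing)     # missing
x2 = xor(false, missing)    # missing
x3 = xor(missing, true)     # missing

# Negating an unknown value gives an unknown value.
n = !missing                # missing
```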

    Control Flow and Short-Circuiting Operators

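    Control flow operators, including if, while and the ternary operator x ? y : z, do not allow missing values. This is because it is uncertain whether the actual value would be true or false if we could observe it, which means we do not know how the program should behave. A TypeError is thrown as soon as a missing value is encountered in this context: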

        julia> if missing
                   println("here")
               end
        ERROR: TypeError: non-boolean (Missing) used in boolean context

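    For the same reason, and contrary to the logical operators presented above, the short-circuiting boolean operators && and || do not allow missing values in situations where the value of the operand determines whether the next operand is evaluated. For example: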

        julia> missing || false
        ERROR: TypeError: non-boolean (Missing) used in boolean context
        julia> missing && false
        ERROR: TypeError: non-boolean (Missing) used in boolean context
        julia> true && missing && false
        ERROR: TypeError: non-boolean (Missing) used in boolean context

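    On the other hand, no error is thrown when the result can be determined without the missing value. This is the case when the code short-circuits before evaluating the missing operand, and when the missing operand is the last one: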

        julia> true && missing
        missing
        julia> false && missing
        false
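
    Since a condition that may be missing cannot be used directly in if, one possible pattern is to convert it to a Bool first with isequal or ismissing, both described earlier. The function describe_flag below is a hypothetical helper written for this illustration, and treating missing as a separate "unknown" branch is an assumption about what the caller wants.

```julia
# Hypothetical helper: branch on a flag that may be `true`, `false` or `missing`.
function describe_flag(flag)
    if isequal(flag, true)      # always a Bool, even when `flag` is missing
        return "yes"
    elseif ismissing(flag)
        return "unknown"
    else
        return "no"
    end
end

println(describe_flag(true), " ", describe_flag(false), " ", describe_flag(missing))
```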

    Arrays containing missing values can be created like other arrays:

        julia> [1, missing]
        2-element Array{Union{Missing, Int64},1}:
         1
          missing

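    As this example shows, the element type of such arrays is Union{Missing, T}, where T is the type of the non-missing values. This simply reflects the fact that array entries can be either of type T (here, Int64) or of type Missing. This kind of array uses an efficient memory storage equivalent to an Array{T} holding the actual values combined with an Array{UInt8} indicating the type of each entry (i.e. whether it is Missing or T).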
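
    The element type can be inspected directly. The sketch below uses Int rather than Int64 so that it also holds on 32-bit systems; the variable names are arbitrary.

```julia
a = [1, missing]
T = eltype(a)

# The element type is the union of `Missing` and the non-missing type.
is_union = (T == Union{Missing, Int})

# Both kinds of entries are accepted by the array.
push!(a, 5)
push!(a, missing)
```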

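    Arrays that allow missing values can be constructed with the standard syntax. Use Array{Union{Missing, T}}(missing, dims) to create an array filled with missing values:

        julia> Array{Union{Missing, String}}(missing, 2, 3)
        2×3 Array{Union{Missing, String},2}:
         missing  missing  missing
         missing  missing  missing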

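    An array that allows missing values but does not contain any can be converted back to an array that does not allow missing values using convert. If the array contains missing values, a MethodError is thrown during conversion: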

        julia> x = Union{Missing, String}["a", "b"]
        2-element Array{Union{Missing, String},1}:
         "a"
         "b"
        julia> convert(Array{String}, x)
        2-element Array{String,1}:
         "a"
         "b"
        julia> y = Union{Missing, String}[missing, "b"]
        2-element Array{Union{Missing, String},1}:
         missing
         "b"
        julia> convert(Array{String}, y)
        ERROR: MethodError: Cannot `convert` an object of type Missing to an object of type String
        [...]
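
    When it is not known in advance whether an array contains missing values, the conversion can be guarded. The sketch below shows one possible approach; returning nothing when the conversion is not possible is an assumption made for this example, not a convention of Base.

```julia
x = Union{Missing, String}["a", "b"]
y = Union{Missing, String}[missing, "b"]

# Hypothetical helper: convert only when no entry is missing.
to_strict(v) = any(ismissing, v) ? nothing : convert(Array{String}, v)

strict_x = to_strict(x)   # a Vector{String}
strict_y = to_strict(y)   # nothing, since `y` contains a missing value

# Converting `y` directly throws a MethodError.
threw = try
    convert(Array{String}, y)
    false
catch err
    err isa MethodError
end
```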

    Skipping Missing Values

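    Since missing values propagate with standard mathematical operators, reduction functions return missing when called on arrays which contain missing values: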

        julia> sum([1, missing])
        missing

    In this situation, use the skipmissing function to skip missing values.
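
    Applied to the sum above, this looks as follows:

```julia
# Without `skipmissing` the missing value propagates through the reduction.
raw_total = sum([1, missing])                 # missing

# With `skipmissing` only the non-missing entries are summed.
total = sum(skipmissing([1, missing]))        # 1
```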

    This convenience function returns an iterator which filters out missing values efficiently. It can therefore be used with any function which supports iterators:

        julia> using Statistics  # provides mean
        julia> x = skipmissing([3, missing, 2, 1])
        skipmissing(Union{Missing, Int64}[3, missing, 2, 1])
        julia> maximum(x)
        3
        julia> mean(x)
        2.0
        julia> mapreduce(sqrt, +, x)
        4.146264369941973

    Objects created by calling skipmissing on an array can be indexed using indices from the parent array. Indices corresponding to missing values are not valid for these objects, and an error is thrown when trying to use them (they are also skipped by keys and eachindex):

        julia> x[1]
        3
        julia> x[2]
        ERROR: MissingException: the value at index (2,) is missing
        [...]
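
    The behavior of keys and eachindex mentioned above can be sketched as follows, reusing the same data. It assumes a Julia version in which skipmissing objects support indexing, as described in this section.

```julia
x = skipmissing([3, missing, 2, 1])

# Only the indices of non-missing entries are returned.
valid = collect(keys(x))

# Indexing with these indices is always safe.
vals = [x[i] for i in eachindex(x)]
```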

    This allows functions which operate on indices to work in combination with skipmissing. This is notably the case for search and find functions, which return indices that are valid for the object returned by skipmissing and are also the indices of the matching entries in the parent array:

        julia> findall(==(1), x)
        1-element Array{Int64,1}:
         4
        julia> findfirst(!iszero, x)
        1
        julia> argmax(x)
        1

    Use collect to extract non-missing values and store them in an array:

        julia> collect(x)
        3-element Array{Int64,1}:
         3
         2
         1

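    The three-valued logic described above for logical operators is also used by logical functions applied to arrays. Thus, array equality tests using the == operator return missing whenever the result cannot be determined without knowing the actual value of the missing entries. In practice, this means missing is returned if all non-missing values of the compared arrays are equal, but one or both arrays contain missing values (possibly at different positions):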

        julia> [1, missing] == [2, missing]
        false
        julia> [1, missing] == [1, missing]
        missing
        julia> [1, 2, missing] == [1, missing, 2]
        missing

    As for single values, use isequal to treat missing values as equal to other missing values, but different from non-missing values:

        julia> isequal([1, missing], [1, missing])
        true
        julia> isequal([1, 2, missing], [1, missing, 2])
        false

    The functions any and all also follow the rules of three-valued logic, returning missing when the result cannot be determined:

        julia> all([true, missing])
        missing
        julia> all([false, missing])
        false
        julia> any([true, missing])
        true
        julia> any([false, missing])
        missing