LLVM Branch Weight Metadata

    Branch weights may be read from a profile data file, or generated from __builtin_expect hints.

    All weights are represented as unsigned 32-bit values, where a higher value indicates a greater chance of the branch being taken.
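    Raw profile counts may not fit in 32 bits, so they have to be scaled down before they can be stored as weights. The sketch below shows one way to do this. It assumes proportional scaling by a common divisor. The helper names weight_scale and scale_counts are hypothetical, and the exact scheme used by Clang or LLVM may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns a divisor such that max_count / divisor
   fits in 32 bits; 1 means no scaling is needed. */
static uint64_t weight_scale(uint64_t max_count) {
    return max_count < UINT32_MAX ? 1 : max_count / UINT32_MAX + 1;
}

/* Hypothetical helper: divides every 64-bit count by the same factor so
   the results fit in 32-bit weights while keeping their approximate
   ratios. Very small counts may round down to zero. */
static void scale_counts(const uint64_t *counts, uint32_t *weights, size_t n) {
    uint64_t max_count = 0;
    for (size_t i = 0; i < n; i++)
        if (counts[i] > max_count)
            max_count = counts[i];

    uint64_t scale = weight_scale(max_count);
    for (size_t i = 0; i < n; i++)
        weights[i] = (uint32_t)(counts[i] / scale);
}
```

    Every count is divided by the same factor. As a result, the ratios between weights, and therefore the implied branch probabilities, are approximately preserved.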

    BranchInst

    Metadata is only assigned to conditional branches. There are two extra operands: one for the true branch and one for the false branch.
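    A higher weight means a greater chance of being taken, so the probability of each successor is its weight divided by the sum of all weights. The minimal sketch below covers the two-operand case and uses a hypothetical cond_branch_weights struct and a hypothetical prob_true helper. LLVM's own analyses use fixed-point probabilities rather than double, so this only illustrates the arithmetic.

```c
#include <stdint.h>

/* Hypothetical model of the two extra operands of a conditional branch. */
struct cond_branch_weights {
    uint32_t true_weight;
    uint32_t false_weight;
};

/* Probability that the true successor is taken: its weight divided by the
   sum of both weights. The sum is computed in 64 bits so that two large
   32-bit weights cannot overflow. Returning 0.5 when both weights are zero
   is an assumption of this sketch. */
static double prob_true(struct cond_branch_weights w) {
    uint64_t sum = (uint64_t)w.true_weight + w.false_weight;
    if (sum == 0)
        return 0.5;
    return (double)w.true_weight / (double)sum;
}
```

    For example, weights of 3 (true) and 1 (false) give a 0.75 probability for the true branch.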

    SwitchInst

    Branch weights are assigned to every case (including the default case, which is always case #0).

    !0 = metadata !{
      metadata !"branch_weights",
      i32 <DEFAULT_BRANCH_WEIGHT>
      [ , i32 <CASE_BRANCH_WEIGHT> ... ]
    }
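    The default destination always occupies operand #0, so the weight for the k-th case in the case list sits at index k. The small sketch below uses this layout to find the heaviest successor. The helper likeliest_switch_successor is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the operand index of the heaviest successor: 0 is the default
   destination, k >= 1 is the k-th case in the case list. Ties go to the
   lower index. n counts all weights, including the default, so n >= 1. */
static size_t likeliest_switch_successor(const uint32_t *weights, size_t n) {
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (weights[i] > weights[best])
            best = i;
    return best;
}
```

    For weights {1, 2000, 1, 1} it returns 1, meaning the first case in the list.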

    CallInst

    Calls may have branch weight metadata, containing the execution count of the call. It is currently used in SamplePGO mode only, to augment the block and entry counts, which may not be accurate with sampling.

    !0 = metadata !{
      metadata !"branch_weights",
      i32 <CALL_BRANCH_WEIGHT>
    }

    Other terminator instructions are not allowed to contain Branch Weight Metadata.

    Built-in expect Instructions

    __builtin_expect(long exp, long c) provides branch prediction information. Its return value is the value of exp.
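    __builtin_expect returns exp unchanged, so it can wrap a condition without altering the program's behavior. The sketch below requires GCC or Clang. It shows the widely used likely/unlikely wrapper macros, which were popularized by the Linux kernel and are not part of LLVM or Clang, together with a hypothetical sum_or_zero function. The !! normalizes any scalar condition to exactly 0 or 1 so that it can match the expected value.

```c
#include <stddef.h>

/* Widely used wrapper macros (not part of LLVM or Clang). The !! maps any
   scalar condition to exactly 0 or 1 so it can match the expected value. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* __builtin_expect returns its first argument unchanged. */
static long passthrough(long v) {
    return __builtin_expect(v, 0);
}

/* Hypothetical example: a NULL array is hinted as the rare case. */
static long sum_or_zero(const long *values, size_t n) {
    if (unlikely(values == NULL))
        return 0;
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}
```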

    It is especially useful in conditional statements. Currently, Clang supports two conditional statements:

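    if statement

    The exp parameter is the condition. The c parameter is the expected comparison value. If it is equal to 1 (true), the condition is likely to be true; otherwise, the condition is likely to be false.

    switch statement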

    The exp parameter is the value. The c parameter is the expected value. If the expected value does not appear in the case list, the default case is assumed to be likely taken.

    switch (__builtin_expect(x, 5)) {
    default: break;
    case 0:  // ...
    case 3:  // ...
    case 5:  // This case is likely to be taken.
      break;
    }
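    The rule for switch statements can be modeled as a weight assignment. The expected case, or the default if no case matches, gets a large weight, and every other successor gets a small one. The sketch below assumes hypothetical LIKELY_WEIGHT and UNLIKELY_WEIGHT constants and a hypothetical helper expect_switch_weights; the actual values a compiler assigns may differ. It writes the weights in the metadata order, with the default first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder weights; a real compiler may use other values. */
#define LIKELY_WEIGHT   2000u
#define UNLIKELY_WEIGHT 1u

/* Fills weights[0..n_cases] for a switch whose case values are
   case_values[0..n_cases-1]. weights[0] is the default destination and
   weights[k] belongs to case_values[k-1]. The case equal to `expected`
   gets LIKELY_WEIGHT; if no case matches, the default gets it instead. */
static void expect_switch_weights(const long *case_values, size_t n_cases,
                                  long expected, uint32_t *weights) {
    size_t likely_index = 0; /* default unless a case value matches */
    for (size_t k = 0; k < n_cases; k++)
        if (case_values[k] == expected)
            likely_index = k + 1;
    for (size_t i = 0; i <= n_cases; i++)
        weights[i] = (i == likely_index) ? LIKELY_WEIGHT : UNLIKELY_WEIGHT;
}
```

    For the cases {0, 3, 5} with expected value 5, the heavy weight lands on operand 3 (case 5). With expected value 7, it lands on operand 0 (default).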

    CFG Modifications

    Branch Weight Metadata is not proof against CFG changes. If the operands of a terminator are changed, the metadata must be updated to match; otherwise, incorrect branch prediction information may lead to misoptimizations.
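    The sketch below covers two simple cases of keeping weights in sync, using hypothetical helpers. When a pass inverts a branch condition and swaps the successors, the two weights must be swapped as well. When a switch case is removed, its weight must be removed so that the remaining operands still line up with their successors. remove_case_weight assumes the remaining cases keep their relative order, which a real transformation may not guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: after inverting a conditional branch and swapping its two
   successors, swap the true/false weights so they follow their targets. */
static void swap_cond_weights(uint32_t weights[2]) {
    uint32_t tmp = weights[0];
    weights[0] = weights[1];
    weights[1] = tmp;
}

/* Hypothetical: drops the weight at operand `index` (caller guarantees
   1 <= index < n, so the default weight at 0 is never removed) and shifts
   the later weights down. Returns the new number of weights. */
static size_t remove_case_weight(uint32_t *weights, size_t n, size_t index) {
    for (size_t i = index; i + 1 < n; i++)
        weights[i] = weights[i + 1];
    return n - 1;
}
```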

    Function Entry Counts

    To allow comparing different functions during inter-procedural analysis and optimization, profile (MD_prof) metadata nodes can also be attached to a function definition. The first operand is a string indicating the name of the associated counter.

    Currently, one counter is supported: “function_entry_count”. The second operand is a 64-bit counter that indicates the number of times that this function was invoked (in the case of instrumentation-based profiles). In the case of sampling-based profiles, this operand is an approximation of how many times the function was invoked.
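    Relative block frequencies are only meaningful within one function. Scaling them by the entry count gives estimated absolute execution counts that can be compared across functions. The sketch below works under that assumption and uses a hypothetical helper, estimated_block_count. Its block_freq and entry_freq parameters are the relative frequencies of a block and of the entry block in the same function.

```c
#include <stdint.h>

/* Hypothetical helper: converts a block's relative frequency into an
   estimated absolute count by scaling with the function entry count.
   block_freq and entry_freq must come from the same function. The math is
   done in double to avoid 64-bit overflow of entry_count * block_freq; it
   assumes the final result fits in 64 bits. */
static uint64_t estimated_block_count(uint64_t entry_count,
                                      uint64_t block_freq,
                                      uint64_t entry_freq) {
    if (entry_freq == 0)
        return 0;
    double count = (double)entry_count * (double)block_freq / (double)entry_freq;
    return (uint64_t)(count + 0.5); /* round to nearest */
}
```

    For example, take an entry count of 2590 and a loop body whose relative frequency is three times that of the entry block. The estimate is 7770 executions of the loop body.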

    If “function_entry_count” has more than 2 operands, the later operands are the GUIDs of the functions that need to be imported by ThinLTO. These are only set by sampling-based profiles. They are needed because the sampling-based profile was collected on a binary that had already imported and inlined these functions, and we need to ensure that the IR in the ThinLTO backends matches it for profile annotation. This cannot be annotated on the call site, because a call-site annotation only reaches one level down the call chain. For a chain such as foo_in_a_cc()->bar_in_b_cc()->baz_in_c_cc(), we need to go down two levels in the call chain to import both bar_in_b_cc and baz_in_c_cc.
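    The depth limitation can be illustrated with a toy model of the inlining chain from the example above. The enum, the inlined_into_next table and collect_imports are all hypothetical and do not reflect how ThinLTO's importer is implemented. A depth of 1 models what a single call-site annotation can reach. An unbounded walk yields every function whose GUID would need to be listed on foo_in_a_cc's entry count node.

```c
#include <stddef.h>

/* Hypothetical functions from the example chain. */
enum { FOO_IN_A_CC, BAR_IN_B_CC, BAZ_IN_C_CC, NUM_FUNCS };

/* Hypothetical model of the profiled binary: inlined_into_next[f] is the
   callee that was inlined into f, or -1 if none. */
static const int inlined_into_next[NUM_FUNCS] = {
    [FOO_IN_A_CC] = BAR_IN_B_CC,
    [BAR_IN_B_CC] = BAZ_IN_C_CC,
    [BAZ_IN_C_CC] = -1,
};

/* Writes up to max_depth functions below `root` in the chain into `out`
   (which must have room for max_depth entries) and returns how many. */
static size_t collect_imports(int root, size_t max_depth, int *out) {
    size_t n = 0;
    for (int f = inlined_into_next[root]; f != -1 && n < max_depth;
         f = inlined_into_next[f])
        out[n++] = f;
    return n;
}
```

    With a depth of 1, only bar_in_b_cc is found. The full walk also finds baz_in_c_cc.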