INDEX
    Explanations

    phrases related to contradictions or negations in context

    Head Attr Weights
    0:0.03
    1:0.02
    2:0.41
    3:0.06
    4:0.06
    5:0.03
    6:0.13
    7:0.02
    8:0.04
    9:0.03
    10:0.05
    11:0.05
    Negative Logits
    " Gael"             -1.99
    "iHUD"              -1.73
    "itus"              -1.57
    " Kills"            -1.53
    "ogl"               -1.51
    " qualification"    -1.50
    " Rain"             -1.49
                        -1.49
    " proportions"      -1.43
    "Scope"             -1.43
    Positive Logits
    " themselves"        2.32
    "selves"             2.21
    " selves"            1.98
    "ently"              1.75
    " THEIR"             1.74
    "undai"              1.70
    "okers"              1.70
    "itimate"            1.66
    "tten"               1.63
    " abundantly"        1.62
    Act Density 0.055%

    No Known Activations