INDEX
    Explanations

    negations or phrases indicating the absence of something

    Negative Logits
    Token (quoted)    Logit
    " Steck"          -0.64
    " Laird"          -0.62
    "+ "              -0.59
    " cime"           -0.59
    " Chau"           -0.57
    "miu"             -0.57
    " Jinping"        -0.56
    " "               -0.56
    " Luiz"           -0.56
    "ervo"            -0.56
    Positive Logits
    Token (quoted)    Logit
    "NOT"             1.37
    " NOT"            1.23
    "Not"             1.22
    " Not"            1.20
    "not"             1.15
    "isNot"           0.92
    "ENOT"            0.83
    "assertNot"       0.81
    "IsNot"           0.79
    " not"            0.78
    Act Density 0.122%

    No Known Activations