INDEX
    Explanations

    negations and contractions related to uncertainty or denial

    Negative Logits
    Token           Logit
    " not"          -0.30
    " NOT"          -0.19
    " nicht"        -0.18
    "Ïģθ"           -0.18
    " no"           -0.17
    " never"        -0.17
    " не"           -0.17
    " không"        -0.16
    "hen"           -0.16
    " Not"          -0.16
    Positive Logits
    Token           Logit
    " necessarily"  0.37
    " anymore"      0.26
    " yet"          0.24
    "ched"          0.22
    "ecessarily"    0.21
    "ori"           0.21
    "ches"          0.20
    " quite"        0.19
    "yet"           0.19
    "epad"          0.19
    Act Density 0.196%

    No Known Activations