    Explanations

    contractions indicating negation

    Negative Logits
    ’s              -0.18
    ’n              -0.16
     not            -0.16
    äºĭæĥħ          -0.15
    hen             -0.15
    es              -0.15
     (“             -0.15
     â              -0.15
    ye              -0.15
    �s              -0.15

    Positive Logits
     necessarily    0.34
    '               0.24
     anymore        0.22
    ches            0.22
    ori             0.20
     even           0.20
    ecessarily      0.19
    /'              0.19
    ÂĿ              0.18
     quite          0.17

    Act Density 0.195%

    No Known Activations