    Explanations

    negations or expressions of denial

    Negative Logits
    Token       Logit
    "Ïģθ"       -0.16
    "hen"       -0.16
    "uit"       -0.15
    "inski"     -0.15
    "es"        -0.14
    "hop"       -0.14
    "hone"      -0.14
    "ois"       -0.13
    "oader"     -0.13
    "ein"       -0.13
    Positive Logits
    Token             Logit
    " necessarily"    0.25
    "ches"            0.22
    "ori"             0.21
    " quite"          0.20
    "amp"             0.18
    " anymore"        0.18
    "quot"            0.18
    "ched"            0.17
    "CHED"            0.17
    "rica"            0.16
    Activation Density: 0.186%

    No Known Activations