    Explanations

    Negations or forms of the word "not."

    Negative logits
        " no"          -0.15
        "es"           -0.14
        "hop"          -0.14
        "ej"           -0.14
        "ein"          -0.14
        " not"         -0.13
        "erate"        -0.13
        "hen"          -0.13
        " niet"        -0.13
        "уки"          -0.13
    Positive logits
        " necessarily"  0.24
        "ori"           0.20
        " anymore"      0.19
        "ches"          0.19
        "oriously"      0.17
        "ched"          0.17
        " quite"        0.17
        " yet"          0.16
        "tingham"       0.16
        "rica"          0.16
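The page does not say how the two logit lists are computed. One common approach for sparse-autoencoder feature dashboards is to project the feature's decoder direction onto the unembedding matrix and report the most negative and most positive entries. The sketch below assumes that approach. The function names `logit_effects` and `top_logits` are hypothetical, and the toy vocabulary and weights are illustrative only, not taken from the real model.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct effect on
# the output logits. Assumption (not stated on the page):
#   effect[j] = dot(decoder_direction, column j of the unembedding matrix).

def logit_effects(decoder_direction, unembedding):
    """unembedding: list of rows (one per model dimension), one column per token."""
    n_vocab = len(unembedding[0])
    return [
        sum(d * row[j] for d, row in zip(decoder_direction, unembedding))
        for j in range(n_vocab)
    ]

def top_logits(effects, vocab, k):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    most_negative = sorted(pairs, key=lambda p: p[1])[:k]
    most_positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return most_negative, most_positive

# Toy 2-dimensional model with a 5-token vocabulary (illustrative values only).
vocab = [" no", " not", " necessarily", " anymore", " the"]
decoder_direction = [1.0, -1.0]
unembedding = [
    [0.10, 0.20, 0.50, 0.30, 0.00],
    [0.25, 0.30, 0.10, 0.00, 0.00],
]
negative, positive = top_logits(logit_effects(decoder_direction, unembedding), vocab, k=2)
print([t for t, _ in negative])  # [' no', ' not']
print([t for t, _ in positive])  # [' necessarily', ' anymore']
```

With a full vocabulary and `k=10`, this would produce two ten-entry lists shaped like the ones shown on this page. Tokens keep their leading spaces (e.g. `" not"` vs `"not"`), because byte-level tokenizers treat them as distinct tokens.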
    Activation density: 0.183%
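Activation density is usually read as the share of sampled token positions on which the feature fires. The page does not state the firing threshold or the sample size. The sketch below therefore assumes "fires" means an activation strictly above 0. The function name `activation_density` and the 183-in-100,000 sample are hypothetical and serve only to reproduce a 0.183% figure.

```python
# Hypothetical sketch: activation density = fraction of token positions where
# the feature's activation exceeds a threshold. Assumption: threshold = 0.0;
# the dashboard's actual threshold and sample are not given on the page.

def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly greater than `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Illustrative sample: 183 active positions out of 100,000 tokens.
acts = [0.0] * 99_817 + [1.0] * 183
print(f"{activation_density(acts):.3%}")  # 0.183%
```

A density this low means the feature is silent on the vast majority of tokens, which fits a narrow concept such as negation.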

    No Known Activations