INDEX
    Explanations

    negations and words indicating the concept of "not."

    Negative Logits
    strup    -0.15
    isko     -0.15
    OfDay    -0.14
    age      -0.14
    hong     -0.14
    861      -0.14
    863      -0.13
    dae      -0.13
    essen    -0.13
    hei      -0.13

    Positive Logits
    just     0.15
    ensch    0.15
    etur     0.15
    ost      0.15
    achi     0.15
    byt      0.15
    agna     0.15
    ouve     0.14
    wich     0.14
     ones    0.14
    Activation Density: 0.069%

    No Known Activations