INDEX
    Explanations

    negations and phrases expressing absence or exceptions

    Negative Logits
    gan           -0.17
    oog           -0.17
    ook           -0.15
    xaf           -0.15
    utow          -0.15
    agnostics     -0.14
    ersions       -0.14
    ghi           -0.14
    roud          -0.14
    inson         -0.14
    Positive Logits
     Zot          0.15
     Mention      0.15
     either       0.15
    vest          0.15
    _SR           0.14
    ys            0.14
    цин           0.14
     altogether   0.14
     stint        0.14
    chwitz        0.14
    Activation Density: 0.347%

    No Known Activations