INDEX
    Explanations

    punctuation marks, specifically periods and exclamation points

    Negative Logits
    OKIE
    -0.19
     fashion
    -0.17
     Nor
    -0.17
    stdout
    -0.15
    esture
    -0.15
    -fashion
    -0.15
    okie
    -0.15
    .refs
    -0.14
    ouro
    -0.14
     Fashion
    -0.14
    Positive Logits
    unkt
    0.16
    istrovství
    0.15
    uze
    0.14
    anko
    0.14
    afen
    0.14
    hire
    0.13
     grit
    0.13
     again
    0.13
     Bender
    0.13
     vyh
    0.13
    Activation Density 0.001%

    No Known Activations