INDEX
    Explanations

    periods at the end of sentences

    Negative Logits
    Token     Logit
    "rms"     -0.16
    "d"       -0.15
    "b"       -0.14
    "oord"    -0.13
    "p"       -0.13
    "½æķ°"    -0.13
    " Cass"   -0.13
    "bib"     -0.13
    "["       -0.13
    "s"       -0.13

    Positive Logits
    Token     Logit
    "ÄįnÃŃk"  0.17
    "uchen"   0.15
    "IRCLE"   0.15
    "ERO"     0.15
    "/sdk"    0.15
    "ottes"   0.15
    "Ñģклад"  0.14
    "ÙĥÙĪÙħ"  0.14
    "erdale"  0.14
    "IZE"     0.14

    Act Density 0.431%

    No Known Activations