    Explanations

    punctuation marks, specifically periods

    Negative Logits
    Token            Logit
    "ataka"          -0.18
    "rible"          -0.17
    "адж"            -0.16
    "ediator"        -0.16
    "ooter"          -0.15
    "rax"            -0.15
    "ambah"          -0.15
    "angs"           -0.15
    "ngr"            -0.15
    "izable"         -0.14

    Positive Logits
    Token            Logit
    " Lt"            0.15
    " Liberation"    0.15
    " Koh"           0.14
    "ãĤĦãģĻ"         0.14
    "[layer"         0.14
    " Miles"         0.14
    " Dez"           0.14
    "_modes"         0.14
    "ck"             0.14
    " Fle"           0.14
    Activation Density: 0.010%

    No Known Activations