INDEX
    Explanations

    proper nouns and names, particularly those related to places and characters

    Negative Logits
    ppy
    -0.15
    ty
    -0.15
     Walton
    -0.14
    586
    -0.14
    ini
    -0.14
     Pax
    -0.14
    etch
    -0.14
    .gg
    -0.14
    asy
    -0.14
    ead
    -0.13
    Positive Logits
    ROID
    0.16
    orthand
    0.15
    VICE
    0.15
    roid
    0.15
    迷
    0.15
     Voll
    0.15
    вою
    0.15
     pinch
    0.14
    undos
    0.14
    ạnh
    0.14
    Act Density 0.012%

    No Known Activations