INDEX
    Explanations

    punctuation and question constructs in the text

    Negative Logits
        ْه           -0.15
        kaar        -0.15
        ーテ        -0.15
        avier       -0.15
        iron        -0.14
        ikip        -0.14
        ostel       -0.14
        oll         -0.14
        _SLAVE      -0.14
        iverse      -0.13
    Positive Logits
        rita        0.18
        津          0.15
        556         0.15
        ervo        0.15
        根          0.15
        ahlen       0.15
        ritt        0.14
        ーブ        0.14
         somewhere  0.14
         nothing    0.14
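    The page does not say how these logit lists were computed. One common approach for sparse-autoencoder features, assumed here, is to multiply the feature's decoder vector by the model's unembedding matrix and report the most negative and most positive entries. The sketch below is a minimal illustration under that assumption: `w_dec`, `W_U`, and `vocab` are hypothetical inputs, and real pipelines may also account for the final layer norm.

```python
def feature_logits(w_dec, W_U):
    """Direct logit contribution of one feature to every vocabulary token.

    w_dec: hypothetical decoder vector of the feature, length d_model
    W_U:   hypothetical unembedding matrix as d_model rows of vocab_size values
    Returns logits[t] = sum_i w_dec[i] * W_U[i][t].
    """
    vocab_size = len(W_U[0])
    return [sum(w * row[t] for w, row in zip(w_dec, W_U)) for t in range(vocab_size)]


def top_bottom_tokens(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs."""
    logits = feature_logits(w_dec, W_U)
    # Token indices sorted by ascending logit contribution.
    order = sorted(range(len(vocab)), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    # Largest contributions first.
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return negative, positive


if __name__ == "__main__":
    # Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
    neg, pos = top_bottom_tokens(
        [1.0, -1.0],
        [[0.2, -0.1, 0.0], [0.0, 0.05, -0.3]],
        ["a", "b", "c"],
        k=2,
    )
    print("Negative:", neg)
    print("Positive:", pos)
```

    Sorting indices once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens a partial selection (e.g. `heapq.nsmallest`/`heapq.nlargest`) would avoid a full sort.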
    Activation Density 0.010%
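    Activation density is usually read as the percentage of sampled tokens on which the feature's activation is above zero; that definition and the zero threshold are assumptions here, since the page does not define the figure. Under that reading, 0.010% means the feature fires on roughly 1 token in 10,000. A minimal sketch, with `activation_density` as a hypothetical helper:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which a feature fires (activation > threshold).

    activations: one activation value per sampled token (hypothetical input)
    threshold:   assumed firing threshold; 0.0 treats any positive value as firing
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # One firing token out of 10,000 sampled tokens.
    acts = [0.0] * 9999 + [2.5]
    print(f"{activation_density(acts):.3f}%")
```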

    No Known Activations