INDEX
    Explanations

    proper nouns, particularly names

    Negative Logits
    Token     Logit
    ingt      -0.15
    aky       -0.15
    igo       -0.15
    okus      -0.14
    urgeon    -0.14
    ixo       -0.14
    ills      -0.14
    mand      -0.14
    cob       -0.14
    bang      -0.14
    Positive Logits
    Token     Logit
     ÄijÃłn   0.17
    428       0.16
    ubes      0.14
    UBE       0.14
    лоÑĩ      0.14
    ılıç      0.14
    ÙĪÙĨت    0.14
    ippet     0.14
    linger    0.14
    /sdk      0.13
    Activation Density: 0.045%

    No Known Activations