INDEX
    Explanations
    Negative Logits
    "ilir"         -0.16
    "icket"        -0.16
    " thang"       -0.16
    "ilton"        -0.15
    "Ð"            -0.15
    " Palestin"    -0.15
    "spender"      -0.15
    "orial"        -0.14
    "ellen"        -0.14
    "imentary"     -0.14
    Positive Logits
    "acre"         0.16
    "iesel"        0.15
    "řeba"         0.15
    "anker"        0.14
    " Bishop"      0.14
    "cream"        0.14
    "φα"           0.14
    "ереж"         0.14
    "ens"          0.14
    "peg"          0.14
    Activation Density: 0.003%

    No Known Activations