INDEX
    Explanations
    Negative Logits
     koy    -0.17
    ivec    -0.16
    OUNDS   -0.15
    eref    -0.15
    ennai   -0.15
     gent   -0.15
    εÏģγ    -0.14
     má     -0.14
    udder   -0.14
     mal    -0.14
    Positive Logits
    west    0.23
    adr     0.23
    ole     0.22
    obi     0.22
    sz      0.21
    iele    0.21
    iedy    0.20
    ry      0.20
    iel     0.19
    ier     0.19
    Activation Density: 0.008%

    No Known Activations