    Negative Logits
     Osborne    -0.07
     Amazing    -0.07
    loi         -0.06
     stroj      -0.06
    ْه          -0.06
    achten      -0.06
    annah       -0.06
     roleName   -0.06
                -0.06
     Sonra      -0.06
    Positive Logits
     "::        0.07
     nalez      0.07
     greet      0.07
    EFF         0.06
     tzv        0.06
    dep         0.06
     pick       0.06
     fov        0.06
    งค          0.06
     muh        0.06
    Activation density: 0.001%

    No Known Activations