INDEX
    Explanations
    Negative Logits
        " jubil"         -0.08
        " Clement"       -0.07
        " соврем"        -0.07
        " Prag"          -0.07
        "Greater"        -0.07
        " parted"        -0.07
        " divided"       -0.07
        "861"            -0.07
        "п"              -0.07
        " cooler"        -0.07
    Positive Logits
        " قلي"           0.09
        " slightest"     0.08
        " nourriture"    0.08
        " fácilmente"    0.08
        " θ"             0.08
        " tweaks"        0.08
        " tweaking"      0.08
        "กรรม"           0.08
        " wording"       0.08
        "改变"           0.08
    Act Density 0.027%

    No Known Activations