    Negative Logits
    " araba"       -0.08
    " Ninja"       -0.08
    " frog"        -0.08
    " awake"       -0.08
    "_signature"   -0.08
    " glove"       -0.07
    "orest"        -0.07
    " Ene"         -0.07
    " Ár"          -0.07
    "Signature"    -0.07
    Positive Logits
    " alike"       0.09
    " (*."         0.08
    " સહિત"        0.08
    " Uniform"     0.08
    " ধর"          0.08
    "える"          0.08
    "一起"          0.08
    "indruck"      0.08
    " सहित"        0.08
    "-major"       0.08
    Activation Density: 0.053%

    No Known Activations