    Explanations

    multiple languages

    Negative Logits
    " Gem"  -0.08
    " прият"  -0.08
    "جير"  -0.07
    " inscritos"  -0.07
    " cuyos"  -0.07
    " introd"  -0.07
    " Cous"  -0.07
    " Martial"  -0.07
    "рош"  -0.07
    "-अ"  -0.07
    Positive Logits
    " I'd"  0.11
    " i'd"  0.10
    "Personally"  0.10
    " myself"  0.10
    " personally"  0.10
    " Personally"  0.09
    " הייתי"  0.09
    " હું"  0.09
    " предпоч"  0.09
    " עצמי"  0.09
    Activation Density: 0.087%

    No Known Activations