    Negative Logits
    " Tutor"          -0.08
    "éph"             -0.07
    " scholarship"    -0.07
    " әсер"           -0.07
    " competencia"    -0.07
    "дагы"            -0.07
    " rzecz"          -0.07
    "owan"            -0.07
    "дағы"            -0.07
    " mém"            -0.07
    Positive Logits
    " outskirts"      0.09
    "yard"            0.08
    " Butterfly"      0.08
    " uphe"           0.08
    " Camb"           0.07
    " cabbage"        0.07
    "Butter"          0.07
    " CCTV"           0.07
    " Ries"           0.07
    " sod"            0.07
    Activation Density: 0.004%

    No Known Activations