INDEX
    Negative Logits
    " Work"         -0.07
    "-points"       -0.06
    ".bl"           -0.06
    " etki"         -0.06
    " work"         -0.06
    "encrypt"       -0.06
    " груз"         -0.06
    "ł"             -0.06
    " panoramic"    -0.06
    " Diary"        -0.06
    Positive Logits
    "ваем"          0.07
    "остей"         0.07
    " rost"         0.06
    " subt"         0.06
    "صر"            0.06
    "steder"        0.06
    " hm"           0.06
    " може"         0.06
    " dov"          0.06
    " gorge"        0.06
    Act Density 0.012%

    No Known Activations