    Negative Logits
    Token       Value
    "ي"         1.26
    "i"         1.01
    "is"        0.96
    " is"       0.88
    " l"        0.87
    " ajuda"    0.86
    "ك"         0.85
    "}$"        0.84
    "يت"        0.84
    ";"         0.84

    Positive Logits
    Token       Value
    "ä"         0.78
    "وا"        0.75
    "ор"        0.71
    "ním"       0.71
    "ästä"      0.66
    "ста"       0.63
    "مل"        0.63
    "мам"       0.63
    " 교통"      0.63
    "вати"      0.62
    Activation Density: 3.403%

    No Known Activations