    NEGATIVE LOGITS
    q          1.17
    f          1.13
    v          0.91
               0.87
    se         0.86
    ک          0.84
    ale        0.80
    oxicity    0.80
    k          0.80
    تر         0.80
    POSITIVE LOGITS
    ي          1.01
               0.91
    ת          0.83
    יות        0.73
     провер    0.72
               0.72
    يئة        0.68
     stalwart  0.68
     불구하고  0.68
    ין         0.68
    Activation Density 0.112%

    No Known Activations