    Negative Logits
        "ه"          1.56
        "ال"         1.38
        "ה"          1.26
        "الہ"        1.04
        "יש"         1.02
        "a"          1.02
        "هَا"         1.02
        "이니까"     0.99
        " தன்மை"     0.98
        " impulso"   0.96
    Positive Logits
        "m"          1.51
        "ang"        1.45
        "ні"         1.41
        "م"          1.35
        "не"         1.28
        "á"          1.15
        "de"         1.14
        "де"         1.13
        "ir"         1.13
        "aj"         1.05
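    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. The page does not say how those scores were computed. The following is a minimal sketch that assumes the common convention of projecting the feature's decoder vector through the model's unembedding matrix. The helper `top_logit_effects`, its parameters, and the toy matrices are hypothetical illustrations, not a documented API.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k tokens whose logits a feature raises most and lowers most.

    Hypothetical helper (not a library API). Assumes the direct logit effect
    is the feature's decoder vector projected through the unembedding matrix.
    decoder_dir: shape (d_model,); W_U: shape (d_model, vocab_size).
    """
    effects = np.asarray(decoder_dir) @ np.asarray(W_U)  # one score per vocab token
    order = np.argsort(effects)                          # ascending by score
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: d_model = 2, vocabulary of 4 tokens (made-up values).
vocab = ["a", "b", "c", "d"]
W_U = np.array([[1.0, -1.0, 0.5, 0.0],
                [0.0,  0.0, 0.0, 2.0]])
decoder_dir = np.array([1.0, 0.4])
positive, negative = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(positive)  # [('a', 1.0), ('d', 0.8)]
print(negative)  # [('b', -1.0), ('c', 0.5)]
```

    The "negative" list is simply the bottom k scores, so on a toy vocabulary it can include tokens with a small positive effect.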
    Activation Density: 0.002%
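    A density of 0.002% corresponds to roughly 2 firing positions per 100,000 tokens. The sketch below assumes density means the share of token positions where the feature's activation is above zero. The function name `activation_density_percent` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density_percent(activations, threshold=0.0):
    """Percentage of token positions where a feature fires.

    Hypothetical helper: assumes "fires" means activation > threshold.
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("need at least one activation")
    firing = np.count_nonzero(acts > threshold)  # number of positions that fire
    return 100.0 * firing / acts.size

# 2 firing positions out of 100,000 tokens corresponds to 0.002%.
acts = np.zeros(100_000)
acts[:2] = 3.7
print(activation_density_percent(acts))
```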

    No Known Activations