    Negative Logits
    standpoint        0.81
    ا                 0.80
    Hemos             0.67
    }^{*}$            0.67
    hearth            0.66
    }^{+}$            0.65
    laundry           0.63
    (token missing)   0.63
    amiss             0.63
    ה                 0.62
    Positive Logits
    (token missing)   0.82
    𝚍                 0.81
    েই                0.71
    খানে              0.70
    ب                 0.69
    (token missing)   0.69
    (token missing)   0.68
    การ               0.68
    dır               0.67
    зю                0.66
    Act Density 0.034%

    No Known Activations