INDEX
    NEGATIVE LOGITS
    ا         2.77
    וה        2.42
    อ่อน       2.28
    ү         2.27
    ла        2.20
              2.19
    де        2.09
    ه         2.09
    ни        2.05
    و         2.00
    POSITIVE LOGITS
    s         2.47
    tedir     2.45
    self      1.94
    니다      1.94
    raum      1.91
    th        1.86
    day       1.83
    am        1.78
    quy       1.77
    im        1.76
    Activation Density 0.003%

    No Known Activations