INDEX
    Negative Logits
    Token      Logit
    "ä"        1.70
    " on"      1.50
    "ari"      1.35
    "t"        1.35
    " at"      1.20
    "anda"     1.18
    "é"        1.15
    " or"      1.13
    "ox"       1.10
    "aring"    1.09
    Positive Logits
    Token             Logit
    "ו"               1.48
    (not rendered)    1.45
    "ви"              1.41
    "هم"              1.38
    (not rendered)    1.31
    "ا"               1.30
    (not rendered)    1.26
    "其他"            1.18
    (not rendered)    1.16
    "ايا"             1.15
    Act Density 0.000%

    No Known Activations