    Negative Logits
        Token           Logit
        "ה"             1.74
        " to"           1.38
        "to"            1.36
        "p"             1.35
        "c"             1.34
        "то"            1.25
        "ع"             1.18
        "."             1.13
        (not shown)     1.12
        "x"             1.08
    Positive Logits
        Token           Logit
        "ac"            1.05
        "ب"             1.05
        (not shown)     1.04
        "aston"         1.04
        "ر"             0.98
        "ின்"           0.97
        "ast"           0.96
        "ov"            0.96
        "形式"          0.95
        "ק"             0.94
    Activation Density: 0.126%
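    Activation density is commonly read as the percentage of sampled tokens on which the feature's activation is above zero. Assuming that definition (the dashboard does not state its sample set or threshold), 0.126% corresponds to roughly 126 firing tokens per 100,000 sampled. A minimal sketch of that calculation, using a hypothetical `activation_density` helper and made-up sample data:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens with a
    strictly positive activation; the real dashboard may differ.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 2 of 1,000 tokens fire.
sample = [0.0] * 998 + [3.1, 0.7]
print(f"{activation_density(sample):.3f}%")  # 0.200%
```

    A feature at 0.126% density is sparse: it stays silent on the vast majority of tokens.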

    No Known Activations