INDEX
    Explanations

    rephrasing and recommending

    Negative Logits
    Token      Logit
    يد         1.31
    "          1.27
    н          1.20
    (blank)    1.15
    ти         1.14
    َا         1.05
    (blank)    1.03
    ى          1.02
    (blank)    1.02
    ě          1.02
    Positive Logits
    Token      Logit
    ␣          1.26
    ac         1.13
    (          1.06
    asis       1.05
    ␣(         1.02
    นำ         1.02
    วา         1.02
    รับ        1.00
    (blank)    0.98
    بد         0.95
    Activation Density: 0.033%

    No Known Activations