INDEX
    Negative Logits
    Token   Value
    IE      1.11
    ali     1.10
    (       1.10
    =>      1.09
            1.09
     can    1.08
            1.08
    ЗИ      1.05
    )       1.05
    ]       1.05
    Positive Logits
    Token   Value
    ش       1.89
    ى       1.49
            1.37
    !\      1.30
            1.30
    د       1.29
    شون     1.23
    ш       1.22
    يا      1.14
    ن       1.13
    Activation Density: 0.007%

    No Known Activations