    NEGATIVE LOGITS
    Token             Logit
    " be"             1.34
    "س"               1.23
    " a"              1.22
    "á"               1.14
    (not rendered)    1.11
    "ب"               1.05
    "is"              1.01
    "ز"               1.01
    " ont"            1.00
    "s"               1.00
    POSITIVE LOGITS
    Token             Logit
    "W"               1.70
    "w"               1.40
    "I"               1.16
    "ре"              1.14
    "IES"             1.07
    "IO"              1.05
    "U"               1.02
    "و"               0.98
    "quele"           0.95
    "あなたが"         0.95
    Activation density: 0.027%

    No known activations.