INDEX
    Explanations

    names starting with bor, mor, dor, kor, tor

    Negative Logits
    "in": 1.63
    " in": 1.44
    "y": 1.30
    (token not shown): 1.23
    (token not shown): 1.20
    (token not shown): 1.16
    "وين": 1.14
    (token not shown): 1.14
    "IT": 1.10
    "ח": 1.09
    Positive Logits
    "<0x80>": 1.21
    " powied": 1.20
    "Y": 1.10
    (token not shown): 1.06
    " añad": 1.00
    (token not shown): 0.98
    "تی": 0.98
    " கூற": 0.97
    "ästä": 0.96
    "۹": 0.95
    Activation Density: 0.187%

    No Known Activations