INDEX
    Explanations
    No Explanations Found
    NEGATIVE LOGITS
    poor    1.91
    HAN     1.91
    KAN     1.90
    Yup     1.90
    CLES    1.89
    Keg     1.82
    proof   1.78
    LEM     1.70
    זאת     1.70
    (token text missing)    1.70
    POSITIVE LOGITS
    на      4.03
    ל       3.42
    ان      3.27
    ج       3.08
    (token text missing)    2.91
    ك       2.80
    نا      2.72
    (token text missing)    2.70
    ف       2.53
    ת       2.44
    Act Density 0.143%

    No Known Activations