    Negative Logits
    "ى"  2.20
    "ك"  1.93
    "ing"  1.90
    "ة"  1.90
    "typen"  1.84
    " bzw"  1.79
    "Ι"  1.79
    "č"  1.73
    "čku"  1.70
    "خراج"  1.70
    Positive Logits
    "ان"  2.17
    "۰"  1.84
    "на"  1.81
    "ל"  1.80
    " 대로"  1.63
    "ست"  1.51
    [no token shown]  1.49
    " керу"  1.49
    [no token shown]  1.48
    "ал"  1.45
    Activation Density: 0.000%

    No Known Activations