    Negative Logits
    Logit   Token
    1.37    "1"
    1.26    "ما"
    1.24    "se"
    1.19    "د"
    1.06    (no visible token)
    1.05    (no visible token)
    1.04    "ש"
    1.02    "ш"
    1.02    "ור"
    0.97    "ب"
    Positive Logits
    Logit   Token
    1.05    "in"
    0.84    (no visible token)
    0.82    "ag"
    0.81    "ino"
    0.80    " física"
    0.79    "il"
    0.77    (no visible token)
    0.77    "ごはん"
    0.75    " içeri"
    0.75    " څرنګوالی"
    Activation Density: 0.006%

    No Known Activations