    Negative Logits (value, token)
    0.98  " a"
    0.92  "c"
    0.85  " that"
    0.84  "لي"
    0.82  " atteint"
    0.80  " לר"
    0.79  " reale"
    0.79  " whack"
    0.79  "ผ่านมา"
    0.79  " and"
    Positive Logits (value, token)
    1.41  "р"
    1.18  "ر"
    1.12
    1.10  "к"
    1.10  "ר"
    1.01  "er"
    1.00  "л"
    0.98
    0.96  "ل"
    0.93  "ي"
    Activation Density: 0.001%

    No Known Activations