INDEX
    Explanations
    Negative Logits
    " on"      1.52
    "s"        1.37
    " and"     1.23
    " that"    1.22
    " was"     1.17
    " ovat"    0.99
    " is"      0.98
    "ب"        0.96
    " at"      0.94
    "𝘱"        0.94
    Positive Logits
    "ت"        1.50
    "м"        1.04
    "Сер"      1.02
    "<"        1.02
    ">"        0.95
    "Ми"       0.93
    "Сма"      0.93
    "تس"       0.91
    "["        0.91
    "<td>"     0.91
    Activation Density: 0.002%

    No Known Activations