    NEGATIVE LOGITS
    Token        Logit
    "$"          2.32
    (blank)      2.20
    "ی"          2.17
    " plz"       2.09
    "তর"         2.00
    " newline"   1.98
    " znacz"     1.98
    "\["         1.96
    "<li>"       1.87
    "}$"         1.86
    POSITIVE LOGITS
    Token        Logit
    "et"         2.61
    (blank)      2.33
    "ע"          2.26
    "^{*}"       2.26
    (blank)      2.25
    "র্জাতিক"     2.10
    "ου"         2.05
    (blank)      2.05
    "וד"         2.02
    "אות"        1.97
    Activation Density: 0.589%

    No Known Activations