    Explanations

    still followed by description

    Negative Logits
        Token       Value
        ":"         1.26
        "O"         1.18
        "i"         1.16
        " by"       1.13
        "ה"         1.10
        "াই"        1.09
        ";"         1.06
        "ুল"        1.05
        " effic"    1.05
        "a"         1.04
    Positive Logits
        Token       Value
        "м"         1.34
        "h"         1.30
        "с"         1.22
        "к"         1.18
        "ت"         1.17
        "л"         1.13
        "م"         1.05
        "س"         1.02
        "ري"        1.01
        "ли"        0.96
    Act Density 0.258%

    No Known Activations