    Explanations

    Non-Latin script fragments

    Negative Logits
    Token      Logit
    (blank)    1.38
    "،"        1.37
    "á"        1.25
    (blank)    1.20
    (blank)    1.11
    (blank)    1.10
    "。</"     1.05
    "كَ"       1.03
    (blank)    1.02
    " an"      1.01
    Positive Logits
    Token      Logit
    "ות"       1.26
    "."        1.26
    "ل"        1.17
    (blank)    1.13
    (blank)    1.00
    (blank)    0.98
    "'"        0.96
    "ul"       0.94
    " मानें"    0.93
    " guise"   0.91
    Act Density 0.023%

    No Known Activations