INDEX
    Explanations

    words followed by punctuation

    Negative Logits
    ֨           0.98
    <unused2197>    0.90
    ?),         0.89
    ?')         0.86
    ֜           0.85
     بنائیں     0.83
    ?")         0.83
    \")         0.83
    ',)         0.83
    ֩           0.80
    Positive Logits
    .           3.65
                2.99
    ®.          2.62
                2.52
    ™.          2.50
    ().         2.47
    ​.          2.43
    。.         2.40
    .\\         2.34
    .**         2.33
    Act Density 0.125%

    No Known Activations