INDEX
    Explanations

    Latin characters and specific combinations of characters, possibly related to a specific language or encoding

    special character tokens or specific linguistic symbols

    Negative Logits
    hyde         -0.89
    aged         -0.74
    assies       -0.70
    ngth         -0.68
     humming     -0.67
    aging        -0.66
    lain         -0.64
     mathemat    -0.64
    ipolar       -0.64
    avis         -0.63
    Positive Logits
    ×Ļ×          1.03
    IJ            1.01
    ת            1.01
    ãĤī          0.99
    ׾           0.98
    ×ķ           0.94
    κ            0.92
    ä¸ī          0.86
    ãĥ¼          0.79
    ãĥĥ          0.79
    Act Density 0.017%

    No Known Activations