INDEX
    Explanations

    characters from different languages and scripts, as well as special characters

    characters or symbols used in encoding or text representation

    Negative Logits
    hyde          -0.87
     mathemat     -0.74
    owler         -0.73
    ufact         -0.71
    espie         -0.70
     ILCS         -0.69
    esville       -0.69
    abase         -0.68
    ngth          -0.68
    assic         -0.68
    Positive Logits
    ×Ļ×           1.17
    IJ            1.16
    ת             1.14
    ׾             1.08
    ×ķ            1.07
    κ             1.00
    ä¸ī           0.98
    ãĤī           0.98
    ×             0.92
    ר             0.91
    Activation Density: 0.008%

    No Known Activations