INDEX
    Explanations

    words related to language or location, with a focus on specific languages or countries

    occurrences of certain non-English or special characters in the text

    Negative Logits
    hyde      -0.86
    aged      -0.71
    agall     -0.71
    ipolar    -0.68
    ammy      -0.67
    ucket     -0.66
    abase     -0.65
    ahar      -0.65
    aging     -0.65
    ngth      -0.61
    Positive Logits
    ãĤī       1.00
    ת         0.96
    ×Ļ×       0.95
    IJ         0.93
    κ         0.92
    ׾         0.90
    ä         0.85
    ×ķ        0.85
    ä¸ī       0.85
    æĢ        0.80
    Act Density 0.033%

    No Known Activations