INDEX
    Explanations

    non-English characters appearing in English text

    characters or symbols that represent specific linguistic or cultural elements, particularly in non-Latin scripts

    Negative Logits
    " manif"        -0.89
    " disadvant"    -0.83
    " misunder"     -0.80
    " horizont"     -0.80
    " federation"   -0.77
    " stake"        -0.76
    " constitu"     -0.75
    " womb"         -0.74
    " proble"       -0.74
    " agre"         -0.74

    Positive Logits
    "à¨"             1.00
    " ILCS"          0.99
    "ı"              0.93
    "ħ"              0.92
    "®"              0.91
    "ãĥ¥"            0.90
    "à¥"             0.89
    "æľ"             0.88
    "¤"              0.88
    "STAR"           0.88
    Act Density 0.022%

    No Known Activations
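
Several of the positive-logit tokens above (for example "à¨", "à¥", "ãĥ¥", "æľ") look like byte-level BPE token strings rather than readable text. The sketch below assumes, and this is only an assumption since the page does not name its tokenizer, that they use the GPT-2-style byte-to-character table, and maps each token back to its raw bytes. The helper names `bytes_to_unicode` and `token_to_bytes` are illustrative, not taken from the source.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2 byte-level BPE.

    Assumption: the dashboard's tokenizer uses this same table.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    offset = 0
    # Bytes without a printable form are shifted to code points 256 and up.
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed byte-level token string."""
    return bytes(CHAR_TO_BYTE[c] for c in token)


if __name__ == "__main__":
    for tok in ["à¨", "à¥", "ãĥ¥", "æľ", "ı", "ħ"]:
        raw = token_to_bytes(tok)
        # Incomplete UTF-8 sequences are shown with the replacement character.
        print(repr(tok), raw.hex(" "), raw.decode("utf-8", errors="replace"))
```

Under that assumption, "ãĥ¥" corresponds to the bytes e3 83 a5, which is the complete UTF-8 encoding of the katakana ュ. "à¨" (e0 a8) and "à¥" (e0 a5) are the first two bytes of three-byte sequences in the Gurmukhi and Devanagari ranges, and "æľ" (e6 9c) is the start of a three-byte CJK sequence. This would be consistent with the explanations listed above, which describe non-Latin-script characters.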