INDEX
Explanations
non-English characters appearing in English text
characters or symbols that represent specific linguistic or cultural elements, particularly in non-Latin scripts
Negative Logits
manif         -0.89
disadvant     -0.83
misunder      -0.80
horizont      -0.80
federation    -0.77
stake         -0.76
constitu      -0.75
womb          -0.74
proble        -0.74
agre          -0.74
Positive Logits
à¨            1.00
ILCS          0.99
ı             0.93
ħ             0.92
®             0.91
ãĥ¥           0.90
à¥            0.89
æľ            0.88
¤             0.88
STAR          0.88
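
Several of the positive-logit tokens above (à¨, à¥, æľ, ãĥ¥) look like encoding debris, but they may be byte-level BPE tokens shown in their printable stand-in form. The sketch below assumes, without confirmation from this page, that the model uses a GPT-2-style byte-to-unicode table; under that assumption it maps each displayed token back to its raw bytes. The helper names `bytes_to_unicode` and `token_to_bytes` are illustrative, not part of any tool referenced here.

```python
def bytes_to_unicode():
    """Build a GPT-2-style map from byte value to printable stand-in character.

    Assumption: the tokenizer behind this page uses this scheme.
    """
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to code point 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed byte-level token."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


if __name__ == "__main__":
    for tok in ["à¨", "à¥", "æľ", "ãĥ¥", "ı", "ħ", "®", "¤"]:
        raw = token_to_bytes(tok)
        # Partial UTF-8 sequences cannot be decoded alone, hence errors="replace".
        print(tok, raw.hex(" "), raw.decode("utf-8", errors="replace"))
```

Under this assumption, ãĥ¥ is the three bytes E3 83 A5, which is the complete UTF-8 encoding of the katakana ュ (U+30E5). à¨ (E0 A8) and à¥ (E0 A5) are the leading bytes of characters in the Gurmukhi and Devanagari ranges, æľ (E6 9C) is the leading pair for CJK ideographs U+6700 to U+673F, and ı, ħ, ® and ¤ are single UTF-8 continuation bytes. That reading would be consistent with the explanations listed above, which describe non-English characters and non-Latin scripts.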
Activations Density: 0.022%