INDEX
Explanations
symbols or characters that represent non-traditional or special characters
Negative Logits
Levin       -0.15
cult        -0.14
Âģ          -0.14
â̝          -0.14
ãĤ¿ãĥ¼      -0.14
imagined    -0.14
.btnClose   -0.14
à¹IJ         -0.14
obs         -0.13
fort        -0.13
Positive Logits
µ           0.18
±           0.17
mu          0.16
ä           0.15
±           0.15
é           0.15
_,,         0.15
»           0.14
_mu         0.14
iec         0.14
Activations Density 0.005%