INDEX
Explanations
unusual unicode characters
instances of the vowel 'i'
Negative Logits
Token        Logit
xus          -0.91
compr        -0.87
subsequ      -0.84
confir       -0.83
srf          -0.83
traged       -0.82
misunder     -0.82
theless      -0.82
condem       -0.81
undercover   -0.81
Positive Logits
Token        Logit
cup          0.99
æµ           0.94
į            0.94
ı            0.93
ÙĪ           0.93
ł            0.92
ä¼           0.92
âĶģ          0.88
Ùħ           0.85
IJ           0.85
Activations Density 0.002%