Explanations
punctuation marks used within the text
Negative Logits
Token   Logit
mey     -0.15
agem    -0.15
aña     -0.14
ench    -0.14
veau    -0.14
foy     -0.14
grep    -0.14
nde     -0.14
إذ      -0.14
FFE     -0.13
Positive Logits
Token   Logit
000     0.42
500     0.29
600     0.27
700     0.24
300     0.24
800     0.24
400     0.23
۰۰۰     0.23
ooo     0.22
900     0.20
Activations Density: 0.071%