Explanations
punctuation marks, particularly periods
Negative Logits
honor       -0.17
gain        -0.15
ntl         -0.15
elle        -0.15
elles       -0.14
honour      -0.14
ej          -0.14
fy          -0.14
esModule    -0.14
.eql        -0.13
Positive Logits
anford      0.15
andum       0.15
olini       0.15
ãĥ§         0.14
quets       0.14
otti        0.14
asi         0.14
aris        0.14
penet       0.14
_HAVE       0.14
Activations Density: 0.038%