INDEX
Explanations
punctuation marks and conjunctions in the text
Negative Logits
man       -0.17
vel       -0.16
ucha      -0.16
ules      -0.15
(*((      -0.15
t         -0.14
ul        -0.14
obus      -0.14
men       -0.14
ungeons   -0.14
Positive Logits
xEC       0.15
isman     0.15
Flags     0.15
ITO       0.15
okit      0.14
aure      0.14
quals     0.14
Voll      0.13
ENN       0.13
.strict   0.13
Activations Density 0.004%