Explanations
long or complex words with specific endings or suffixes
Negative Logits
inth    -0.17
-DD     -0.16
ocom    -0.16
emi     -0.16
=================================================================================  -0.15
inson   -0.15
еÑı     -0.15
/DD     -0.15
untime  -0.14
ecome   -0.14
Positive Logits
angan   0.18
usu     0.15
adr     0.14
elden   0.14
eff     0.14
nos     0.13
Maher   0.13
lue     0.13
woods   0.13
dump    0.13
Activations Density 0.010%