Explanations
phrases indicating redundancy or excess
Negative Logits
ferred        -0.15
ограм         -0.15
rong          -0.14
panion        -0.14
strcasecmp    -0.14
eger          -0.14
inals         -0.13
udies         -0.13
erne          -0.13
æk            -0.13
Positive Logits
say           0.20
say           0.19
isay          0.18
days          0.18
stays         0.17
distractions  0.17
lessly        0.16
mention       0.16
sagen         0.16
.say          0.16
Activation Density: 0.009%
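Activation density is usually reported as the percentage of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the helper name `activation_density` and the default threshold are hypothetical:

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    total = 0
    fired = 0
    for a in activations:
        total += 1
        if a > threshold:
            fired += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * fired / total


# Toy input (assumed): 9 firing positions out of 100,000 tokens.
toy_acts = [0.0] * 99_991 + [1.2] * 9
print(activation_density(toy_acts))
```

A density of 0.009% therefore corresponds to roughly 9 firing positions per 100,000 tokens, which the toy input reproduces.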