INDEX
Explanations
high-frequency adverbs and conjunctions that indicate expectation or negation
Negative Logits
unge      -0.18
UGE       -0.16
ote       -0.16
haze      -0.16
itters    -0.15
ungan     -0.15
/pub      -0.14
Unary     -0.14
isms      -0.14
ibbon     -0.14
Positive Logits
icter     0.16
anager    0.15
prung     0.15
elik      0.15
Base      0.14
پایه      0.14
abase     0.14
Base      0.14
айд       0.14
Skipping  0.14
Activations Density 0.001%