INDEX
Explanations
present tense verbs or gerunds
Negative Logits
iable    -0.19
501      -0.17
izen     -0.15
WARE     -0.15
Able     -0.15
tie      -0.14
503      -0.14
coma     -0.14
noop     -0.14
Tie      -0.14
Positive Logits
år       0.28
ick      0.19
enuity   0.18
rep      0.18
icks     0.17
epar     0.16
redi     0.16
ens      0.15
unker    0.15
enting   0.15
Activations Density 0.004%