INDEX
Explanations
verbs related to giving commands or instructions
verbs related to taking action or implementing changes
Negative Logits
Token    Logit
rikes    -0.62
reluct   -0.61
mith     -0.60
ohn      -0.60
lit      -0.60
133      -0.58
exert    -0.58
arag     -0.57
idal     -0.57
hell     -0.57
Positive Logits
Token        Logit
ables        1.25
ings         1.24
Yourself     1.17
able         1.16
ments        1.14
yourselves   1.10
yourself     1.03
INGS         1.01
ability      1.01
acion        0.94
Activations Density: 0.383%