INDEX
Explanations
imperatives or instructions
directives or suggestions for action
Negative Logits
edge        -0.76
emale       -0.70
"]=>        -0.63
uton        -0.63
essions     -0.60
hell        -0.60
ieval       -0.60
ungle       -0.59
apo         -0.59
ranch       -0.59
Positive Logits
yourselves   1.28
yourself     1.23
Yourself     0.89
your         0.84
wisely       0.84
ye           0.81
sparing      0.81
cknow        0.78
thy          0.74
carefully    0.73
Activations Density: 0.249%