Explanations
action-oriented instructions or suggestions
suggestions or calls to action
Negative Logits
dominates     -0.61
emale         -0.60
ariat         -0.58
omach         -0.56
wheelchair    -0.54
indle         -0.52
purported     -0.52
obs           -0.52
ovie          -0.52
DNA           -0.52
Positive Logits
yourself      1.36
yourselves    1.23
Yourself      1.10
wisely        0.97
your          0.95
carefully     0.93
preferably    0.88
sparing       0.86
ASAP          0.84
cknow         0.81
Activations Density: 0.226%