INDEX
Explanations
phrases indicating a command or instruction
negative imperatives or prohibition phrases
Negative Logits
ancest      -0.72
CVE         -0.63
Reloaded    -0.62
Frie        -0.61
milo        -0.56
behavi      -0.56
parser      -0.56
gnu         -0.55
wrapper     -0.55
spirited    -0.54
Positive Logits
hesitate      0.90
forget        0.89
bother        0.87
expect        0.86
Í             0.85
intend        0.81
necessarily   0.80
CARE          0.80
ude           0.80
erest         0.77
Activations Density: 0.057%