INDEX
Explanations
expressions of preference using the phrase "rather" followed by an action or situation
expressions of preference and comparison
Negative Logits
orig         -0.68
mentioned    -0.65
INAL         -0.65
idden        -0.64
ults         -0.62
uss          -0.62
bing         -0.62
Yard         -0.61
Sah          -0.60
uum          -0.60
Positive Logits
prioritize    0.81
emulate       0.78
than          0.78
settle        0.77
lose          0.76
spend         0.76
avoid         0.75
tolerate      0.75
stay          0.74
survive       0.73
Activations Density: 0.019%