INDEX
Explanations
phrases comparing actions or choices between two alternatives, highlighting a preference for one over the other
comparative phrases emphasizing preference or choice
Negative Logits
Token        Logit
isf          -0.73
Aren         -0.70
iola         -0.68
MG           -0.68
Origin       -0.67
Corn         -0.65
lied         -0.64
ENE          -0.64
Accessory    -0.64
eneg         -0.64
Positive Logits
Token        Logit
necessarily  1.11
relying      0.94
bothering    0.90
anything     0.86
outright     0.86
simply       0.84
merely       0.84
letting      0.81
bother       0.78
allowing     0.77
Activations Density: 0.046%
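
An activation density figure like the one above is commonly read as the share of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the sample data are assumptions made for illustration, not taken from the dashboard or the tool that produced it.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a non-zero
    activation. The dashboard's exact definition is not given in the source.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, of which 5 activate the feature.
sample = [0.0] * 9995 + [0.4, 1.2, 0.7, 2.1, 0.3]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.046% would mean the feature is active on roughly 46 of every 100,000 sampled tokens.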