INDEX
Explanations
phrases where a comparison is made between two alternatives, emphasizing a preference for one over the other
comparative phrases indicating preference or alternatives
Negative Logits
lance    -0.72
cia      -0.72
LO       -0.71
CCC      -0.70
meric    -0.70
iren     -0.70
ulner    -0.69
estern   -0.69
DO       -0.69
WIND     -0.68
Positive Logits
relying         1.47
letting         1.16
bothering       1.12
merely          1.12
simply          1.11
necessarily     1.10
focusing        1.10
risking         1.09
concentrating   1.08
having          1.05
Activations Density: 0.065%