Explanations
phrases indicating preference or choice among options
comparisons emphasizing alternatives or preferences
Negative Logits
iola     -0.76
endez    -0.70
eur      -0.70
MG       -0.69
essen    -0.68
isf      -0.67
oops     -0.66
SPA      -0.66
ENE      -0.64
eneg     -0.64
Positive Logits
necessarily  0.93
relying      0.83
rely         0.82
simply       0.80
anything     0.80
merely       0.77
outright     0.76
perish       0.76
speculate    0.76
acle         0.74
Activations Density: 0.025%