Explanations
comparative uses of the word "better" in the text
expressions of improvement or superiority
Negative Logits
Token        Logit
mad          -0.71
Py           -0.68
erity        -0.67
NH           -0.66
MIT          -0.64
idon         -0.64
Ar           -0.63
BRE          -0.62
psy          -0.62
TRY          -0.61
Positive Logits
Token        Logit
suited       0.97
behaved      0.89
than         0.84
manag        0.75
payoff       0.72
seller       0.70
acquainted   0.70
tailor       0.69
behav        0.68
bet          0.67
Activation Density: 0.032%