INDEX
Explanations
comparisons or contrasts in the form of "more than" statements
comparative phrases that highlight the complexity or severity of a situation
Negative Logits
EMENT      -0.81
antage     -0.79
imity      -0.76
uto        -0.76
ance       -0.71
FTWARE     -0.70
urther     -0.70
ulpt       -0.69
bilt       -0.69
autions    -0.68
Positive Logits
usual        0.91
anything     0.75
ours         0.73
ordinary     0.72
placebo      0.72
superficial  0.71
average      0.71
ever         0.70
ĻĤ           0.70
Watergate    0.67
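Many sparse-autoencoder dashboards build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below assumes that convention and ignores any final layer normalization; the helper `top_logit_effects`, the toy vocabulary, and the toy vectors are hypothetical and are not taken from this dashboard.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=3):
    """Project a feature's decoder direction onto the vocabulary.

    decoder_dir: list of d_model floats (the feature's decoder vector).
    unembed: d_model rows, each a list of len(vocab) floats (a toy W_U).
    Returns (top_positive, top_negative) as lists of (token, effect) pairs.
    """
    # effect[j] = sum_i decoder_dir[i] * unembed[i][j]  (a vector-matrix product)
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    # Sort tokens by effect, smallest (most negative) first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    top_negative = ranked[:k]          # most negative first
    top_positive = ranked[::-1][:k]    # most positive first
    return top_positive, top_negative


# Hypothetical toy data: d_model = 2, vocabulary of 4 placeholder tokens.
decoder_dir = [1.0, -0.5]
unembed = [
    [0.9, 0.1, -0.6, 0.2],
    [0.0, 0.4, 0.4, -0.2],
]
vocab = ["a", "b", "c", "d"]
positive, negative = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Under this reading, a token such as "usual" sits at the top of the positive list because its unembedding column aligns most strongly with the feature's decoder direction.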
Activation Density 0.067%
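Assuming "activation density" means the share of token positions on which the feature's activation is above zero (a common definition, though the dashboard states neither its threshold nor its dataset), 0.067% corresponds to about 67 firings per 100,000 tokens. A minimal sketch with a hypothetical helper `activation_density`:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical example: the feature fires on 67 of 100,000 tokens.
acts = [0.0] * 99_933 + [1.2] * 67
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.067%
```

A density this low marks the feature as sparse: it is silent on almost every token and fires only in the narrow comparative contexts described by the explanations.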