Explanations
comparisons or contrasts
comparative phrases highlighting superiority or preference
Negative Logits
catentry    -0.90
ahime       -0.88
ulations    -0.78
ulative     -0.72
enser       -0.71
ements      -0.71
ulated      -0.70
INTON       -0.69
uther       -0.68
ulators     -0.68
Positive Logits
ours         1.10
yours        0.89
theirs       0.85
hers         0.77
yourselves   0.73
those        0.69
yourself     0.68
this         0.66
Jav          0.65
ourselves    0.64
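The dashboard does not say how its logit values were computed. A common approach for sparse-autoencoder features is to project the feature's decoder direction onto each token's unembedding vector and rank the tokens by the result. The sketch below assumes that approach. The function name `logit_effects`, the argument layout (`unembed` with d_model rows and one column per vocabulary token), and the toy numbers are all hypothetical and are not the dashboard's data.

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Rank tokens by how strongly a feature's decoder direction raises
    (positive) or lowers (negative) their logits.

    Hypothetical layout: unembed[i][t] is component i of token t's
    unembedding vector, so unembed has len(decoder_dir) rows.
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per decoder component")

    # Direct logit effect on each token: dot(decoder_dir, unembed[:, t]).
    scores = [
        (token, sum(decoder_dir[i] * unembed[i][t] for i in range(d_model)))
        for t, token in enumerate(vocab)
    ]

    # Largest effects first for the positive list, smallest first for the negative list.
    positive = sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda pair: pair[1])[:k]
    return positive, negative


# Toy example with made-up numbers (illustration only).
vocab = ["ours", "yours", "ulated", "the"]
decoder_dir = [1.0, -0.5]
unembed = [
    [1.0, 0.8, -0.6, 0.1],
    [0.0, -0.2, 0.4, 0.1],
]
top_pos, top_neg = logit_effects(decoder_dir, unembed, vocab, k=2)
print(top_pos)
print(top_neg)
```

In the toy run, "ours" and "yours" come out on top and "ulated" at the bottom, mirroring the shape of the two lists above. Under this assumed definition, a single feature can promote one family of tokens (here, possessive and reflexive pronouns) while suppressing an unrelated set of subword fragments.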
Activations Density 0.064%
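The dashboard reports a density but does not define it. The sketch below assumes that density is the percentage of sampled token positions on which the feature's activation is strictly above zero. The function name `activation_density`, the `threshold` parameter, and the sample sizes are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions on which the feature fires.

    Assumption: "fires" means activation strictly above `threshold`
    (zero by default, as for a ReLU-style sparse autoencoder).
    """
    if not activations:
        raise ValueError("need at least one token position")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 64 firing positions out of 100,000 tokens.
sample = [0.0] * 99_936 + [1.5] * 64
print(f"{activation_density(sample):.3f}%")
```

Under this definition, 64 firing positions out of 100,000 would give 0.064%, which means the feature is active on roughly 1 token in 1,560.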