INDEX
Explanations
phrases related to comparisons or contrasts
repeated instances of the word "the."
Negative Logits
Token      Logit
arate      -0.79
antes      -0.79
Ò          -0.77
imi        -0.76
thood      -0.75
ceive      -0.73
icia       -0.72
bg         -0.71
arettes    -0.71
ania       -0.71
Positive Logits
Token      Logit
latter     1.30
biggest    1.21
vast       1.19
majority   1.16
sheer      1.14
simplest   1.12
absence    1.10
slightest  1.10
latest     1.09
oret       1.08
Activations Density 0.357%