INDEX
Explanations
phrases related to size and scale
references to sizes, both large and small, in various contexts
Negative Logits
Token        Value
ighth        -0.75
BILITY       -0.75
EMENT        -0.70
iversary     -0.69
CHAT         -0.67
911          -0.66
SON          -0.65
ãĥĹ          -0.64
BILITIES     -0.64
uador        -0.64
Positive Logits
Token          Value
to             0.72
too            0.71
for            0.67
lest           0.67
ideologically  0.67
paced          0.66
politically    0.64
offensively    0.64
amus           0.63
(>             0.62
Activations Density: 0.126%
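
The density figure above is presumably the share of token positions on which this feature activates at all. The page does not say how it was computed, so the following is only a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample values are all hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation is greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 2 of 8 token positions.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.4, 0.0]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.126% would mean the feature is active on roughly 1 token in 800.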