Explanations
phrases emphasizing the large or total quantity of something
phrases indicating universality or generalization
Negative Logits
Token    Logit
ubi      -0.76
uel      -0.67
only     -0.63
yna      -0.63
uers     -0.61
alin     -0.61
kind     -0.60
Balls    -0.60
most     -0.60
Dull     -0.59
Positive Logits
Token          Logit
imaginable     0.88
WHERE          0.87
THING          0.76
hyde           0.75
conceivable    0.75
代             0.65
Ùĩ             0.64
body           0.63
extant         0.62
else           0.62
Activations Density: 0.112%