INDEX
Explanations
phrases related to being in the spotlight or gaining attention
Negative Logits
ponder    -0.18
usto      -0.16
aleb      -0.16
oplan     -0.15
oire      -0.14
afa       -0.14
#         -0.14
Gaut      -0.14
olec      -0.14
gone      -0.14
Positive Logits
attention  0.22
attention  0.18
Attention  0.17
tro        0.15
Shed       0.14
etro       0.14
spot       0.14
recip      0.14
suffix     0.14
rou        0.14
Activations Density 0.050%