Explanations
terms related to focus or attention
Negative Logits
Token    Logit
hans     -0.19
Hans     -0.17
Malone   -0.16
tl       -0.15
iams     -0.15
erce     -0.15
overs    -0.15
904      -0.14
IMS      -0.14
ijd      -0.14
Positive Logits
Token    Logit
ussed    0.23
cus      0.18
uss      0.16
als      0.16
USED     0.16
λια      0.16
selling  0.15
cing     0.15
usses    0.15
imbus    0.15
Activation density: 0.007%