INDEX
Explanations
positive associations with brightness and optimism
Negative Logits
hort          -0.16
stí           -0.16
hlen          -0.15
ationToken    -0.14
rowser        -0.14
dds           -0.14
hape          -0.13
.Frame        -0.13
hiro          -0.13
Gravity       -0.13
Positive Logits
ening         0.43
ened          0.35
-eyed         0.35
en            0.32
eners         0.29
ens           0.29
eyed          0.28
eyed          0.28
side          0.28
ener          0.27
Activations Density 0.029%