Explanations
elements related to humorous or playful situations
Negative Logits
bench     -0.20
Bench     -0.19
bol       -0.18
jom       -0.17
Lincoln   -0.17
inston    -0.16
_unix     -0.16
LIN       -0.15
bench     -0.15
487       -0.15
Positive Logits
Brian     1.23
Brian     1.12
Brain     0.59
β         0.55
bean      0.51
Brain     0.51
brain     0.49
beta      0.48
Bean      0.47
rian      0.47
Activation density: 0.015%