Explanations
references to the color yellow and its associations
Negative Logits
loo      -0.17
cci      -0.15
agan     -0.15
IW       -0.14
ema      -0.14
elier    -0.14
dark     -0.14
blue     -0.14
\base    -0.14
è¦       -0.14
Positive Logits
ish       0.24
knife     0.24
hammer    0.23
-orange   0.22
/red      0.22
Yellow    0.21
cake      0.19
Pages     0.19
stone     0.19
-yellow   0.19
Activations Density 0.013%