Explanations
variations of the word "yellow."
Negative Logits
Token         Logit
blue          -0.19
loo           -0.18
istor         -0.16
agan          -0.16
purple        -0.15
zos           -0.15
격            -0.15
BLUE          -0.15
Cunningham    -0.15
dark          -0.14
Positive Logits
Token         Logit
/red          0.24
-yellow       0.22
-orange       0.21
ish           0.21
-green        0.18
ribbon        0.17
stein         0.17
aires         0.17
-headed       0.17
/or           0.17
Activations Density: 0.011%
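
The density figure can be read as the share of tokens on which the feature fires. The page does not define the metric, so the sketch below assumes it is the fraction of sampled tokens whose activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active. The source page does not state the exact definition.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in acts if a > threshold)
    return active / len(acts)


# Hypothetical sample: 100,000 tokens, 11 of which activate the feature.
sample = [0.0] * 99_989 + [1.5] * 11
density = activation_density(sample)
print(f"{density:.3%}")  # 0.011%
```

Under that assumption, a density of 0.011% corresponds to roughly 11 active tokens per 100,000, which marks the feature as very sparse.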