INDEX
Explanations
references to color and shading in various contexts
Negative Logits
redhead    -0.18
beige      -0.17
pink       -0.16
ebi        -0.16
yellow     -0.15
crimson    -0.15
腰         -0.15
.yellow    -0.15
golden     -0.15
orange     -0.15
Positive Logits
Blue       0.63
blue       0.62
Blue       0.58
BLUE       0.57
blue       0.56
-blue      0.56
BLUE       0.50
_blue      0.44
.Blue      0.44
蓝         0.43
Activations Density 0.040%