Explanations
mentions of specific colors
references to color
New Auto-Interp
Negative Logits
OHN       -0.82
ammad     -0.80
atican    -0.77
ITAL      -0.76
CHA       -0.75
IJ        -0.74
CHAT      -0.72
Xi        -0.71
Kaplan    -0.70
PER       -0.69
Positive Logits
colours    1.24
colour     1.20
colour     1.13
palette    0.97
Colour     0.94
stripe     0.87
coloured   0.87
anguage    0.86
colors     0.86
stripes    0.82
Activations Density 0.010%
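The two logit lists and the density figure are summary statistics. The sketch below shows how such numbers can be derived. It assumes you already have the feature's activation on every token of a dataset and a precomputed per-token logit weight. The dashboard does not say how it computed either quantity, and the helper names `activation_density` and `top_logit_effects` are hypothetical. A density of 0.010% means the feature fires on roughly 1 token in 10,000.

```python
# Hypothetical helpers illustrating dashboard-style statistics; not the
# dashboard's own implementation.

def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


def top_logit_effects(logit_weights, k=10):
    """Return the k most negative and k most positive (token, weight) pairs.

    logit_weights is an assumed mapping from token string to the feature's
    direct effect on that token's logit.
    """
    ranked = sorted(logit_weights.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Usage with a few values taken from the tables above.
weights = {"colours": 1.24, "colour": 1.20, "palette": 0.97,
           "OHN": -0.82, "Kaplan": -0.70}
neg, pos = top_logit_effects(weights, k=2)
print(neg)  # [('OHN', -0.82), ('Kaplan', -0.7)]
print(pos)  # [('colours', 1.24), ('colour', 1.2)]

density = activation_density([0.0] * 9999 + [2.5])
print(f"{density:.3%}")  # 0.010%
```

The table lists `colour` twice with different weights. These are presumably distinct tokenizer entries (for example, with and without a leading space), so a real implementation would key the mapping on token IDs rather than display strings.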