INDEX
Explanations
words related to colors or descriptions of color
terms related to racial categorizations and associated contexts
Negative Logits
BIL           -1.02
bilt          -0.79
ICLE          -0.75
Cosponsors    -0.68
«ĺ            -0.68
hran          -0.67
VERTISEMENT   -0.67
ronic         -0.67
Hug           -0.66
fecture       -0.66
Positive Logits
azeera        0.84
ady           0.67
revol         0.66
stun          0.65
Reloaded      0.63
esp           0.63
road          0.61
oyle          0.61
sword         0.61
espresso      0.60
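The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such a ranking could be produced follows. It assumes that each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix W_U, ignoring layer-norm scaling and any other pipeline details the dashboard may apply. The function names, the toy matrix, and the token names are hypothetical and are not taken from the dashboard's code.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: project a feature's decoder direction (length d_model)
    through an unembedding matrix w_u (d_model rows x len(vocab) columns)."""
    effects: Dict[str, float] = {}
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with column t of w_u.
        effects[token] = sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
    return effects

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy example with d_model = 2 and a four-token vocabulary (made-up numbers).
effects = logit_effects([1.0, -0.5],
                        [[0.2, -0.4, 0.9, 0.0],
                         [0.4, 0.6, -0.2, 1.0]],
                        ["a", "b", "c", "d"])
neg, pos = top_and_bottom(effects, 2)
print([t for t, _ in neg])  # ['b', 'd']
print([t for t, _ in pos])  # ['c', 'a']
```

In a real model the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the inner loop; the ranking step is the same.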
Activation Density 0.077%
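Activation density is conventionally the share of tokens on which a feature fires. Under that assumption, 0.077% means the feature is active on roughly 77 of every 100,000 tokens. The sketch below computes it, assuming "fires" means an activation strictly greater than zero. Both that threshold and the function name are illustrative choices, not the dashboard's documented definition.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * firing / total

# Toy data: 77 firing tokens out of 100,000.
acts = [0.0] * 99_923 + [1.5] * 77
print(activation_density(acts))  # ≈ 0.077
```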