Explanations
mentions of the word "Red"
Negative Logits
blackColor  -0.19
astro       -0.16
grey        -0.16
blue        -0.14
Haram       -0.14
gray        -0.14
ów          -0.14
ierrez      -0.14
golden      -0.14
quoise      -0.14
Positive Logits
emption     0.25
empt        0.23
dish        0.23
acted       0.22
dest        0.22
-red        0.20
/red        0.20
Red         0.20
red         0.20
.Red        0.20
Activation density: 0.024%