Explanations
references to the color red in various contexts
Negative Logits
ogue      -0.15
intage    -0.14
206       -0.14
fare      -0.14
обор      -0.14
Gray      -0.14
atically  -0.14
ESP       -0.13
able      -0.13
blue      -0.13
Positive Logits
dest      0.19
-red      0.18
/red      0.18
-purple   0.16
-hot      0.16
/ros      0.16
ness      0.15
ahlen     0.15
-yellow   0.15
色的      0.14
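The two lists above rank vocabulary tokens by this feature's effect on the output logits, showing the most negative and the most positive. A minimal sketch of that ranking step follows, assuming each token already has one scalar effect score. How the dashboard derives those scores is not stated here (projecting the feature's decoder direction through the unembedding matrix is a common approach, but treat that as an assumption), and `top_and_bottom_logits` is a hypothetical helper, not a dashboard API.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_and_bottom_logits(effects: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: rank tokens by an assumed per-token logit effect.

    Returns (negative, positive):
      negative: the k smallest effects, most negative first.
      positive: the k largest effects, most positive first.
    """
    items = list(effects.items())
    # Ascending sort gives the most negative effects first.
    negative = sorted(items, key=lambda kv: kv[1])[:k]
    # Descending sort gives the most positive effects first.
    positive = sorted(items, key=lambda kv: kv[1], reverse=True)[:k]
    return negative, positive


if __name__ == "__main__":
    # A toy subset of the values listed above, used only for illustration.
    toy = {"-red": 0.18, "dest": 0.19, "blue": -0.13, "ogue": -0.15, "Gray": -0.14}
    neg, pos = top_and_bottom_logits(toy, k=2)
    print(neg)  # [('ogue', -0.15), ('Gray', -0.14)]
    print(pos)  # [('dest', 0.19), ('-red', 0.18)]
```

With k=10 over a full vocabulary this yields two ten-entry lists shaped like the ones shown. Slicing each sorted list separately, rather than taking the tail of one ascending list, keeps k=0 from returning every token.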
Activation Density: 0.072%
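Activation density is the share of tokens on which the feature fires; 0.072% works out to roughly 72 activating tokens per 100,000. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above a threshold of zero. The dashboard's exact threshold and token sample are not given here, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`.

    The zero threshold is an assumption, not the dashboard's documented rule.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations given")
    return 100.0 * active / total


if __name__ == "__main__":
    # 72 nonzero activations among 100,000 tokens gives a density of about 0.072%.
    acts = [1.0] * 72 + [0.0] * (100_000 - 72)
    print(f"{activation_density(acts):.3f}%")  # 0.072%
```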