INDEX
Explanations
mentions of the color pink
references to the word "Pink."
Negative Logits
dependence    -0.70
entry         -0.66
SER           -0.65
warfare       -0.63
endi          -0.63
stabilized    -0.63
vari          -0.62
Thor          -0.62
ravis         -0.62
clause        -0.61
Positive Logits
Pink          3.79
Pink          3.04
pink          1.81
Purple        1.62
Rainbow       1.41
Lime          1.25
Yellow        1.16
Lesbian       1.13
Orange        1.12
Green         1.09
Activations Density: 0.014%