INDEX
Explanations
specific references to princesses
references to "Princess" in various contexts
Negative Logits
sych      -0.73
spaced    -0.70
ucl       -0.68
smoker    -0.67
ulhu      -0.65
EFF       -0.64
neur      -0.63
lder      -0.63
oker      -0.62
pson      -0.61
Positive Logits
Leia         1.22
Princess     1.09
Bride        1.04
Diana        1.01
Celest       0.95
Peach        0.95
anova        0.93
princess     0.85
Fiona        0.82
Alexandra    0.81
Activations Density: 0.013%