INDEX
Explanations
references to royalty or princesses
references to princess characters
Negative Logits
smoker      -0.70
sych        -0.69
olson       -0.65
ORN         -0.63
andem       -0.62
neur        -0.61
oker        -0.61
entimes     -0.61
revolving   -0.60
EFF         -0.60
Positive Logits
Leia        1.20
Princess    1.09
Bride       0.97
Peach       0.93
Diana       0.92
anova       0.91
Celest      0.91
princess    0.85
ette        0.84
cess        0.79
Activations Density 0.007%