INDEX
Explanations
mentions of the word "Princess" at varying intensities, possibly related to different contexts or relationships
references to various princesses
Negative Logits
spaced    -0.69
sych      -0.68
ophon     -0.65
ulhu      -0.63
neur      -0.62
ucl       -0.62
rils      -0.61
funn      -0.61
appa      -0.61
oller     -0.60
Positive Logits
Leia       1.12
Bride      1.10
Celest     1.01
Diana      0.97
anova      0.97
Princess   0.95
Peach      0.92
princess   0.88
Fiona      0.84
cess       0.83
Activations Density: 0.029%