INDEX
Explanations
phrases related to storytelling and diverse perspectives
Head Attribution Weights
Head  Weight
0     0.02
1     0.04
2     0.10
3     0.11
4     0.02
5     0.04
6     0.04
7     0.13
8     0.23
9     0.05
10    0.08
11    0.09
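A head attribution weight plausibly reflects how much of the feature's decoder direction falls in each attention head's slice of the layer output. The sketch below assumes the SAE was trained on the concatenated per-head outputs of a 12-head layer. It scores each head by the L2 norm of its slice, normalized so the weights sum to 1. The value `d_head = 64`, the random decoder row, and the function name `head_attribution_weights` are hypothetical. The rounded weights listed above sum to 0.95, so the dashboard's exact normalization may differ from this one.

```python
import numpy as np

def head_attribution_weights(decoder_row: np.ndarray, n_heads: int) -> np.ndarray:
    """Hypothetical helper: split a feature's decoder vector (assumed to live in
    the concatenated per-head output space, length n_heads * d_head) into one
    slice per head and return each head's normalized share of the L2 norm."""
    if decoder_row.size % n_heads != 0:
        raise ValueError("decoder length must be divisible by n_heads")
    per_head = decoder_row.reshape(n_heads, -1)   # (n_heads, d_head)
    norms = np.linalg.norm(per_head, axis=1)      # L2 norm of each head's slice
    return norms / norms.sum()                    # normalize so weights sum to 1

# Toy example with a random decoder row (not real SAE weights).
rng = np.random.default_rng(0)
n_heads, d_head = 12, 64
row = rng.normal(size=n_heads * d_head)
weights = head_attribution_weights(row, n_heads)
for head, w in enumerate(weights):
    print(f"{head}:{w:.2f}")
```

Using norms rather than squared norms is a design choice; squaring would sharpen the contrast between dominant heads (such as head 8 here) and minor ones.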
Negative Logits
Token        Logit
psey         -1.34
wered        -1.18
achable      -1.18
uffed        -1.18
umo          -1.17
versible     -1.12
approved     -1.11
irmed        -1.07
iless        -1.06
compatible   -1.06
Positive Logits
Token        Logit
behav        1.14
psyche       1.12
warr         1.05
retina       1.02
stranger     1.02
Amon         1.00
Charm        0.94
charact      0.94
fandom       0.93
monarchy     0.92
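Positive and negative logit lists like the two above are commonly computed by projecting the feature's output direction onto the model's unembedding matrix and reading off the most extreme entries. The following is a minimal sketch under two assumptions. First, the feature direction has already been mapped into the residual stream (for an attention feature, through the output projection). Second, the final LayerNorm is ignored. The toy vocabulary, the dimensions, and the name `top_logit_effects` are hypothetical.

```python
import numpy as np

def top_logit_effects(feature_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a residual-stream feature direction onto the
    unembedding W_U (d_model x d_vocab) and return the k most negative and the
    k most positive (token, logit) pairs."""
    logits = feature_dir @ W_U                   # shape (d_vocab,)
    order = np.argsort(logits)                   # ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with random weights (not a real model).
rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50
vocab = [f"tok{i}" for i in range(d_vocab)]
W_U = rng.normal(size=(d_model, d_vocab))
feature_dir = rng.normal(size=d_model)
neg, pos = top_logit_effects(feature_dir, W_U, vocab, k=5)
print("negative:", neg)
print("positive:", pos)
```

Because this is a direct projection, it captures only the feature's immediate effect on the output logits, not effects routed through later layers.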
Activation Density: 0.053%
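Activation density is usually read as the percentage of tokens on which the feature fires. At 0.053%, that works out to roughly one token in 1,900. A minimal sketch, assuming "fires" means a strictly positive activation. The activation array below is hypothetical and constructed to reproduce the listed figure.

```python
import numpy as np

def activation_density(acts: np.ndarray) -> float:
    """Percentage of tokens whose feature activation is strictly positive."""
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

# Hypothetical activations: the feature fires on 53 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:53] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.053%
```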