INDEX
Explanations
references to thoughts, prayers, and emotional expressions related to support and empathy
Head Attr Weights
0:0.02
1:0.01
2:0.11
3:0.05
4:0.05
5:0.03
6:0.41
7:0.04
8:0.05
9:0.04
10:0.08
11:0.06
Negative Logits
itself     -1.35
Choice     -1.28
�          -1.27
Cumm       -1.26
Wad        -1.23
Maiden     -1.22
Mellon     -1.20
Cola       -1.19
td         -1.19
Wing       -1.18
Positive Logits
xual       1.54
inders     1.53
visor      1.48
orsi       1.41
acts       1.35
abwe       1.27
reflect    1.25
orno       1.22
inki       1.22
overwhelm  1.21
Activations Density 0.002%