INDEX
Explanations
instances of dialogue or quotes in text
Head Attr Weights
0:0.01
1:0.04
2:0.12
3:0.05
4:0.02
5:0.04
6:0.14
7:0.15
8:0.07
9:0.12
10:0.08
11:0.09
Negative Logits
agos        -1.00
elaide      -0.94
eatured     -0.92
exhibition  -0.89
/#          -0.88
clusive     -0.88
ROR         -0.88
Mississ     -0.87
hoops       -0.85
Reilly      -0.84
Positive Logits
helm        1.11
.--         1.04
inctions    1.02
imil        1.02
regor       0.99
few         0.98
Citiz       0.95
</          0.95
<+          0.95
�           0.93
Activations Density 0.006%