INDEX
Explanations
instances of conversational filler and hesitations in dialogue
Head Attr Weights
0:0.03
1:0.02
2:0.17
3:0.09
4:0.17
5:0.02
6:0.12
7:0.09
8:0.05
9:0.03
10:0.07
11:0.08
Negative Logits
resur: -1.37
awaits: -1.30
ersed: -1.27
Sunder: -1.26
rous: -1.23
advertis: -1.20
paired: -1.19
upper: -1.17
Upper: -1.16
ngth: -1.16
Positive Logits
jah: 1.46
fml: 1.42
ende: 1.41
�: 1.36
Ago: 1.33
srfAttach: 1.28
龍�: 1.27
��: 1.26
iera: 1.26
cember: 1.25
Activations Density: 0.004%