INDEX
Explanations
conversational transitions and filler phrases
Head Attr Weights
0:0.01
1:0.01
2:0.04
3:0.14
4:0.22
5:0.01
6:0.16
7:0.17
8:0.03
9:0.02
10:0.05
11:0.08
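
As a reading aid, the per-head weights above can be ranked with a short Python sketch. The name `top_heads` is a hypothetical helper introduced here for illustration and is not part of any tool that produced this listing. The values are copied from the list as shown; they sum to 0.94 rather than 1, presumably because each entry is rounded to two decimals.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.01, 1: 0.01, 2: 0.04, 3: 0.14, 4: 0.22, 5: 0.01,
    6: 0.16, 7: 0.17, 8: 0.03, 9: 0.02, 10: 0.05, 11: 0.08,
}


def top_heads(weights, k=4):
    """Return the k (head, weight) pairs with the largest weights, descending."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Total of the listed (rounded) weights.
total = round(sum(head_weights.values()), 2)

# The four heads with the largest attribution weight.
ranked = top_heads(head_weights)

# Share of the listed weight carried by those four heads.
top_share = round(sum(w for _, w in ranked), 2)

print(total, ranked, top_share)
```

Ranked this way, heads 4, 7, 6 and 3 carry 0.69 of the 0.94 listed.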
Negative Logits
synonymous  -1.45
Ambro  -1.42
Osc  -1.34
appl  -1.29
denote  -1.29
Pwr  -1.28
stret  -1.27
amounted  -1.26
centerpiece  -1.24
spree  -1.24
Positive Logits
osaurs  1.59
vantage  1.50
ggies  1.48
comrade  1.42
\",  1.39
comrades  1.38
DragonMagazine  1.38
anners  1.34
wat  1.34
ileged  1.31
Activations Density 0.008%