Explanations
dialogue attributions (e.g., "he said," "she yelled")
dialogue related to characters' interactions
Negative Logits
arding      -0.70
raviolet    -0.69
atana       -0.68
Architects  -0.67
ardless     -0.66
otal        -0.65
iencies     -0.65
artifacts   -0.63
selection   -0.63
adelphia    -0.63
Positive Logits
muttered    1.37
exclaimed   1.34
whispered   1.29
shouted     1.28
yelled      1.28
murm        1.21
exclaim     1.19
screamed    1.16
chanted     1.15
replied     1.12
Activation Density: 0.140%