Explanations
names with associated statements or questions
instances of dialogue or conversational exchanges
Negative Logits
ascus        -0.81
anmar        -0.71
eatures      -0.67
inement      -0.66
ometown      -0.66
abad         -0.65
unification  -0.63
administ     -0.63
agric        -0.62
bride        -0.61
Positive Logits
Yeah          1.16
Huh           1.14
Hmm           1.09
Alright       1.07
Exactly       1.06
Hmm           1.05
Exactly       1.05
Oh            1.04
Uh            1.04
Yes           1.02
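The two lists above rank vocabulary tokens by a per-token logit score for this feature. The sketch below shows one way such a ranking could be produced. It assumes, which this page does not state, that each score is the dot product of the feature's decoder direction with that token's unembedding column. The names `decoder_dir`, `unembed`, `vocab` and the toy numbers are hypothetical.

```python
def logit_scores(decoder_dir, unembed):
    """Score each token as dot(decoder_dir, unembedding column for that token).

    Hypothetical layout: unembed is a list of d_model rows, each of length vocab_size.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, vocab, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["Yeah", "Huh", "bride", "abad"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.8, 0.6, -0.4, -0.5],
    [0.4, 0.2, -0.2, -0.3],
]
positive, negative = top_and_bottom(logit_scores(decoder_dir, unembed), vocab, k=2)
print([t for t, _ in positive])  # ['Yeah', 'Huh']
print([t for t, _ in negative])  # ['abad', 'bride']
```

Sorting once and slicing from both ends yields both lists in a single pass over the vocabulary. For a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nlargest`) would avoid a full sort.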
Activation density: 0.167%
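Activation density is read here as the percentage of tokens on which the feature's activation is above zero. That definition is an assumption and is not spelled out on this page. Under it, 0.167% corresponds to roughly one active token in every 600. The sketch below uses a hypothetical `activations` list and a hypothetical `threshold` parameter.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds threshold (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active token out of 600 (toy data).
acts = [0.0] * 599 + [2.3]
print(f"{activation_density(acts):.3f}%")  # 0.167%
```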