INDEX
Explanations
phrases starting with "Well," and often followed by an explanation or continuation
transitional phrases and conversational cues
Negative Logits
anim        -0.67
Amos        -0.61
Sok         -0.61
sing        -0.57
spir        -0.57
sideline    -0.56
]           -0.55
fig         -0.55
cele        -0.55
>           -0.55
Positive Logits
Answer      1.05
swers       1.02
answer      0.96
Answer      0.93
answers     0.85
ription     0.82
Well        0.82
ccording    0.78
guessed     0.75
swer        0.72
Activations Density 0.302%