INDEX
Explanations
phrases starting with "Well,"
instances of the word "well" as a discourse marker
Negative Logits
İĭ          -0.70
FN          -0.67
mage        -0.63
untarily    -0.62
=~=~        -0.60
neighb      -0.60
coni        -0.60
emo         -0.60
sk          -0.59
atile       -0.59
Positive Logits
yeah               1.05
uh                 0.94
guess              0.92
maybe              0.90
fortunately        0.87
congr              0.87
luckily            0.87
congratulations    0.84
yes                0.84
sorry              0.83
Activations Density: 0.047%