INDEX
Explanations
phrases indicating a transition or introduction to a new topic
phrases indicating a lack of delay or a direct approach to a topic
Negative Logits
Token        Logit
eson         -0.86
bered        -0.68
pak          -0.66
payers       -0.65
aceous       -0.63
tten         -0.63
Bird         -0.61
nor          -0.61
whichever    -0.60
ta           -0.59
Positive Logits
Token             Logit
typo              0.86
exaggeration      0.84
ado               0.82
interruption      0.77
hesitation        0.76
sarc              0.72
understatement    0.71
confused          0.69
confusion         0.69
delay             0.68
Activations Density: 0.148%
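
The page does not define activation density. A minimal sketch of one common reading, assuming it means the fraction of sampled token positions on which the feature fires (activation above zero), might look like the following. The function name, the threshold parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active; the dashboard itself does not state this.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 148 of them active.
sample = [0.0] * 99_852 + [0.7] * 148
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.148% would mean the feature is active on roughly 1 in 700 tokens, which is typical of a sparse feature.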