INDEX
Explanations
phrases related to reactions or replies
phrases indicating a reaction or response to events or questions
Negative Logits
Token       Logit
brakes      -0.70
flo         -0.70
Marin       -0.68
\\\\\\\\    -0.67
hemat       -0.65
Dull        -0.64
knots       -0.64
oret        -0.63
gin         -0.63
utters      -0.62
Positive Logits
Token       Logit
thereto     0.84
reply       0.83
ively       0.82
briefs      0.79
response    0.77
responses   0.76
uberty      0.75
response    0.71
feedback    0.68
naires      0.67
Activations Density: 0.014%