INDEX
Explanations
expressions of agreement or affirmative responses
Beginning of agreement/acknowledgment statements
yeah, wow, ah
Negative Logits
Token      Logit
()",       -0.64
:",        -0.63
=",        -0.61
/>";       -0.61
();*/      -0.57
"""        -0.57
:",        -0.56
/*",       -0.56
Portail    -0.56
           -0.56
Positive Logits
Token      Logit
,          0.97
thats      0.80
we         0.76
maybe      0.75
look       0.74
sorry      0.73
I          0.73
they       0.72
!          0.69
no         0.68
Activations Density 0.110%