Explanations
phrases or sentences containing greetings
punctuation or conversational interjections that indicate dialogue or interaction
Negative Logits
Token       Logit
İĭ          -0.76
ourse       -0.74
inction     -0.74
ilater      -0.66
arez        -0.65
%:          -0.64
arov        -0.62
/           -0.60
lab         -0.59
arus        -0.59
Positive Logits
Token       Logit
yeah        0.89
Wait        0.77
dunno       0.74
dear        0.72
yes         0.72
maybe       0.71
sorry       0.70
Weird       0.70
Butt        0.69
Sue         0.69
Activations Density: 0.077%