Explanations
specific phrases mentioning "yes" followed by confirming statements
affirmative phrases or confirmations
Negative Logits
Token         Logit
ethyst        -0.71
arted         -0.70
rils          -0.70
rance         -0.70
patch         -0.69
enta          -0.69
İĭ            -0.68
rival         -0.68
gall          -0.66
liner         -0.66
Positive Logits
Token         Logit
sir            0.96
yes            0.78
THERE          0.67
thank          0.67
technically    0.66
yeah           0.64
please         0.64
terday         0.60
Mistress       0.59
Woo            0.59
Activation density: 0.042%