Explanations
meta-instructions or requests emphasizing actions that should or should not be taken
instructions and requests for compliance in online interactions
Negative Logits
azeera                       -0.71
Ö¼ (U+05BC, Hebrew dagesh)   -0.71
MpServer                     -0.70
ilogy                        -0.68
ufact                        -0.64
culosis                      -0.63
ailability                   -0.63
aturated                     -0.60
ãĥĬ (ナ)                     -0.60
quickShipAvailable           -0.59
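Two entries in the negative list, ãĥĬ and Ö¼, appear to be tokens printed in the byte-level alphabet of a GPT-2-style BPE tokenizer, where each raw byte is shown as a printable stand-in character. The page does not name the tokenizer, so treat that as an assumption. The sketch below rebuilds the byte-to-character table following GPT-2's `bytes_to_unicode` scheme and inverts it to recover the underlying text; `decode_token` is a hypothetical helper name.

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte (controls, space, etc.) is shifted to code point 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: stand-in character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed byte-level token back into the text it encodes."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A token can hold an incomplete UTF-8 sequence; replace rather than fail.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĥĬ", "Ö¼", "azeera"]:
    decoded = decode_token(tok)
    print(tok, "->", decoded, [f"U+{ord(c):04X}" for c in decoded])
```

Decoding gives the bytes E3 83 8A (Katakana ナ) and D6 BC (the combining point U+05BC), while plain ASCII tokens such as azeera pass through unchanged.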
Positive Logits
responsibly    0.70
sir            0.68
iquette        0.67
sacrific       0.66
Submit         0.65
politely       0.65
PLEASE         0.64
Refresh        0.64
reprint        0.63
yourselves     0.62
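Feature pages of this kind usually build these two lists by projecting the feature's decoder direction onto the model's unembedding matrix and sorting the per-token scores. The page itself does not state the method, and any normalization applied before sorting is unknown, so the following is only a minimal pure-Python sketch of that projection. It uses a made-up two-dimensional model and a four-token vocabulary; the numbers are toy values, not this feature's weights, and `logit_effects` and `top_and_bottom` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary entry j as the dot product decoder_dir . W_U[:, j].

    `unembed` is laid out d_model x vocab_size (one row per residual dimension).
    """
    d_model, vocab_size = len(unembed), len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom(vocab: Sequence[str], effects: Sequence[float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-promoted and the k most-suppressed tokens."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # largest scores first
    negative = ranked[:k]        # most negative scores first
    return positive, negative


# Toy model (hypothetical values): d_model = 2, vocabulary of 4 tokens.
vocab = ["politely", "PLEASE", "azeera", "MpServer"]
W_U = [
    [1.0, 0.5, -0.5, -1.0],
    [0.2, 0.6, -0.3, 0.1],
]
decoder_dir = [0.5, 0.5]

effects = logit_effects(decoder_dir, W_U)
positive, negative = top_and_bottom(vocab, effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; with a real vocabulary of tens of thousands of tokens, a vectorized matrix product and a partial sort would replace the nested Python loops.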
Activations Density 0.083%
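Activation density is commonly reported as the fraction of sampled token positions on which the feature's activation is above zero. Under that reading, 0.083% corresponds to about 83 firing positions per 100,000 tokens, or roughly one in 1,200. The sampling corpus and threshold behind this figure are not given here, so the zero threshold below is an assumption and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic example: 83 firing positions out of 100,000 tokens.
acts = [0.0] * (100_000 - 83) + [0.5] * 83
print(f"{activation_density(acts):.3%}")  # prints 0.083%
```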