Explanations
requests or instructions for action
instances of the word "please" in various contexts
Negative Logits
arc          -0.78
MpServer     -0.78
ARC          -0.69
é¾           -0.67
senal        -0.67
mund         -0.64
cler         -0.64
phrine       -0.64
UF           -0.64
Huntington   -0.63
Positive Logits
advise       0.90
excuse       0.88
fill         0.88
Ignore       0.82
ignore       0.82
note         0.81
sir          0.81
forgive      0.79
refrain      0.79
inquire      0.78
Activations Density: 0.015%