INDEX
Explanations
formal requests or instructions
requests for actions or confirmations, particularly the word "please."
Negative Logits
pires      -0.61
lite       -0.55
ulated     -0.54
onom       -0.51
rist       -0.51
Tav        -0.50
somew      -0.48
antz       -0.47
JUSTICE    -0.46
liner      -0.46
Positive Logits
Cancel      0.81
verify      0.75
login       0.71
refresh     0.68
reload      0.68
retake      0.64
subscribe   0.64
enter       0.63
try         0.63
pload       0.62
Activations Density: 0.011%
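
The page does not define how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 11 of 100,000 tokens.
sample = [0.0] * 99_989 + [1.5] * 11
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.011% would correspond to the feature firing on roughly 11 tokens per 100,000.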