INDEX
Explanations
instances of polite requests or forms of "please."
Negative Logits
Token        Logit
ation        -0.77
ATION        -0.71
Kopp         -0.67
ations       -0.64
낼           -0.64
Erik         -0.64
untergang    -0.61
autorité     -0.60
Haan         -0.60
GMENT        -0.60
Positive Logits
Token        Logit
please       1.24
Please       1.20
Please       1.18
PLEASE       1.15
please       1.13
PLEASE       1.13
Kindly       1.11
Pls          1.03
Bitte        1.01
Pls          1.00
Activations Density: 0.018%
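
To make the density figure concrete, here is a minimal sketch. It assumes that activation density means the fraction of token positions on which the feature fires, that is, has an activation above zero. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature is active.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 2 of 10,000 token positions.
sample = [0.0] * 9998 + [3.2, 1.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.018% would mean the feature is active on roughly 18 of every 100,000 tokens, which fits a feature tied to a narrow set of tokens such as "please" and its variants.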