INDEX
Explanations
Requests or calls to action within text
The word "please" in various contexts
Negative Logits
teenth    -0.68
anes      -0.58
raft      -0.58
lier      -0.55
oba       -0.55
roup      -0.53
Vec       -0.53
ao        -0.53
eways     -0.52
waged     -0.51
Positive Logits
please    3.62
please    2.63
PLEASE    2.62
Please    1.97
Please    1.64
beware    1.32
kindly    1.14
THANK     1.04
sorry     1.02
thank     1.01
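The two lists above are the tokens with the largest and the most negative logit values for this feature. A minimal sketch of how such lists can be selected, assuming the dashboard holds one (token, value) pair per vocabulary entry (an assumption; the card does not say how the values are stored). The same surface string can appear twice, as "please" and "Please" do here, presumably because the tokenizer has separate entries for them; a list of pairs keeps both. The function name `top_and_bottom_logits` and the neutral token "the" with value 0.01 are hypothetical, added only for illustration.

```python
from typing import List, Tuple

Pair = Tuple[str, float]


def top_and_bottom_logits(pairs: List[Pair], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative (token, value) pairs.

    Hypothetical helper: a list of pairs is used instead of a dict so that
    repeated surface strings (e.g. two distinct "please" tokens) are kept.
    """
    ranked = sorted(pairs, key=lambda pair: pair[1])  # ascending by value
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


# Values copied from the lists above, plus one hypothetical neutral token.
sample = [("please", 3.62), ("kindly", 1.14), ("teenth", -0.68),
          ("anes", -0.58), ("the", 0.01)]
top, bottom = top_and_bottom_logits(sample, k=2)
print(top)     # [('please', 3.62), ('kindly', 1.14)]
print(bottom)  # [('teenth', -0.68), ('anes', -0.58)]
```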
Activation density: 0.012%
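Activation density is usually reported as the fraction of sampled tokens on which the feature fires. A density of 0.012% therefore means roughly 12 firing tokens in every 100,000. The sketch below assumes that "fires" means an activation strictly above zero. The card does not state the threshold or the sample size, and the `activation_density` function is a hypothetical illustration.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 100,000 tokens, of which 12 have a positive activation.
acts = [0.0] * 100_000
for i in range(12):
    acts[i * 1000] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.012%
```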