INDEX
Explanations
phrases prompting actions or interactions, often involving sharing with others
commands and requests for information sharing
Negative Logits
Ö¼              -0.68
opes            -0.65
comparatively   -0.63
©¶æ¥µ           -0.62
admittedly      -0.61
£ı              -0.59
à¨              -0.59
ensibly         -0.58
itled           -0.57
going           -0.57
Positive Logits
us              1.41
me              1.30
yourself        1.22
yourselves      1.11
Yourself        1.06
your            0.99
someone         0.99
others          0.99
Us              0.98
someone         0.93
Activations Density 0.197%
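
As a rough illustration of how a density figure like the one above might be computed, the sketch below assumes that "activation density" means the fraction of token positions on which the feature's activation is above zero. The page does not confirm that definition. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (# active positions) / (# positions sampled).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not from the dashboard): 1000 token positions, 2 of them active.
sample = [0.0] * 998 + [0.7, 1.3]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.197% would mean the feature fires on roughly 2 of every 1000 sampled tokens.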