INDEX
Explanations
instances of communication or informing in various contexts
Negative Logits
yourself     -0.37
your         -0.36
your         -0.34
Yourself     -0.33
Your         -0.32
yourselves   -0.30
Your         -0.29
你的         -0.26
YOUR         -0.26
ваш          -0.25
Positive Logits
you    0.45
YOU    0.37
thee   0.33
您     0.32
you    0.32
bạn    0.27
YOU    0.27
vous   0.27
你     0.27
You    0.26
Activations Density 0.266%