Explanations
words related to communication and instruction
Negative Logits
yourself
-0.33
Yourself
-0.28
yourselves
-0.23
your
-0.22
Your
-0.21
можете
-0.18
，你
-0.18
your
-0.18
Ihrem
-0.17
Your
-0.17
Positive Logits
him
0.31
thee
0.29
ya
0.23
cha
0.23
ihn
0.23
us
0.22
CHA
0.20
inya
0.19
Ihnen
0.19
тебя
0.19
Activations Density 0.200%