Explanations
words indicating necessity and urgency
Negative Logits
themselves   -0.23
itself       -0.15
undry        -0.15
arti         -0.14
alm          -0.14
â            -0.14
Morr         -0.13
oui          -0.13
磨           -0.13
ags          -0.13
Positive Logits
yourself     0.40
yourselves   0.31
Yourself     0.27
your         0.24
your         0.23
你的         0.22
можете       0.21
Your         0.19
rott         0.16
ırak         0.16
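Logit lists like these are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that convention (direct path only, ignoring the final LayerNorm); the function name `top_logit_effects` and its inputs (`w_dec_row`, `w_unembed`, `vocab`) are hypothetical and not taken from any particular dashboard's code.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank tokens by how strongly one feature's decoder direction
    raises or lowers their logits (hypothetical helper).

    Assumes w_dec_row has shape (d_model,), w_unembed has shape
    (d_model, d_vocab), and vocab[i] is the string for token id i.
    Only the direct path is considered; LayerNorm is ignored.
    """
    effects = w_dec_row @ w_unembed           # shape (d_vocab,)
    order = np.argsort(effects)               # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
w_dec = np.array([1.0, 0.0])
w_u = np.array([[0.4, -0.2, 0.1],
                [9.0, 9.0, 9.0]])
neg, pos = top_logit_effects(w_dec, w_u, ["yourself", "themselves", "your"], k=1)
print(neg, pos)
```

Sorting the negative list ascending and the positive list descending puts the strongest effect first in each, which matches the ordering of the two lists shown here.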
Activation Density: 1.414% (fraction of tokens on which this feature is active)
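As a rough illustration of the density figure, the sketch below assumes density means the share of token positions whose activation is strictly above zero; the function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation
    exceeds `threshold` (assumed definition of activation density)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# A feature that fires on 1,414 of 100,000 tokens has a density of 1.414%.
acts = [1.0] * 1414 + [0.0] * (100_000 - 1414)
print(f"{activation_density(acts):.3%}")
```

Under this definition, a density of 1.414% means the feature fires on roughly 14 of every 1,000 tokens.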