INDEX
Explanations
instances of requests or commands directed at individuals or groups
Negative Logits
жаÑĢ    -0.15
èĹ      -0.15
ÎŃÏĤ    -0.14
éĹ      -0.14
xAE     -0.14
ieties  -0.14
ebp     -0.14
argins  -0.14
olia    -0.14
.AF     -0.13
Positive Logits
wer     0.15
bợi     0.15
by      0.14
oleh    0.14
coun    0.14
ries    0.14
amen    0.14
eed     0.13
Wesley  0.13
ta      0.13
Activations Density 0.117%