Explanations
words and phrases related to leadership and authority
Negative Logits
Token         Logit
iyel          -0.18
hoa           -0.17
_simps        -0.17
morgan        -0.16
alls          -0.15
MouseButton   -0.15
ches          -0.14
att           -0.14
اÙĩÙĦ          -0.14
argon         -0.14
Positive Logits
Token         Logit
ande          0.14
602           0.14
Ms            0.14
antan         0.13
Beat          0.13
鬼            0.13
Amar          0.13
ein           0.13
urity         0.13
721           0.13
Activations Density: 0.028%