INDEX
Explanations
words and phrases related to authority or leadership
Negative Logits
iances    -0.16
ĵĺ        -0.16
вÑģÑı      -0.15
ref       -0.15
inch      -0.14
夫        -0.14
antu      -0.14
tha       -0.14
red       -0.14
sh        -0.14
Positive Logits
umbn      0.20
AXB       0.19
ursors    0.18
ongyang   0.16
ault      0.16
tember    0.15
weeney    0.15
ible      0.14
udder     0.14
upert     0.14
Activations Density 0.023%