INDEX
Explanations
references to people in managerial or leadership positions
Negative Logits
ÑĤап        -0.16
erli        -0.14
ãģ£ãģ¨      -0.14
ÑĮ          -0.14
aign        -0.13
woods       -0.13
è¥          -0.13
registers   -0.13
bam         -0.13
Cah         -0.13
Positive Logits
esen        0.17
ader        0.15
elters      0.14
ĵn          0.14
uada        0.14
iesz        0.14
ekte        0.14
losed       0.14
urring      0.14
è«ĸ         0.13
Activations Density: 0.015%