INDEX
Explanations
titles and roles of officials, especially those related to government or political positions
Negative Logits
orro        -0.17
stead       -0.17
enburg      -0.15
APS         -0.15
st          -0.15
ÙħÙĪØ¯      -0.15
ilt         -0.15
ternet      -0.15
detriment   -0.14
jun         -0.14
Positive Logits
noop        0.16
ignum       0.15
resp        0.15
æ¸Ī         0.14
linger      0.14
ibel        0.14
ibe         0.14
minded      0.14
_bw         0.14
auc         0.14
Activations Density: 0.019%