INDEX
Explanations
references to geographical locations or administrative regions
Negative Logits
amilia      -0.15
uten        -0.15
udd         -0.14
anmar       -0.14
ipp         -0.14
nings       -0.14
rze         -0.14
curity      -0.14
qualities   -0.14
اتی         -0.14
Positive Logits
wide        0.33
/state      0.23
sheriff     0.21
enance      0.21
Sheriff     0.21
-wide       0.20
vise        0.20
/count      0.18
erc         0.18
립          0.17
Activations Density: 0.023%