Explanations
references to Arab identity and associated terms
Negative Logits
yon        -0.17
lements    -0.15
TING       -0.15
aran       -0.15
iro        -0.14
lement     -0.14
wit        -0.14
Ïĥε        -0.14
Barnes     -0.14
км         -0.14
Positive Logits
-Israel       0.20
Gulf          0.17
isation       0.17
-American     0.17
-major        0.16
-speaking     0.16
Monetary      0.16
ophone        0.16
-Americans    0.15
/black        0.15
Activations Density: 0.005%