Explanations
references to specific individuals or groups, particularly those of Middle Eastern descent or with Arabic names
Negative Logits
ero      -0.19
enta     -0.17
ldr      -0.17
ruh      -0.16
ERO      -0.16
заст     -0.15
ENTA     -0.15
aha      -0.15
ncy      -0.14
elman    -0.14
Positive Logits
amed     0.19
anned    0.17
essen    0.17
issan    0.17
оруж     0.16
hourly   0.16
sein     0.15
şam      0.14
pit      0.14
emed     0.14
Activations Density 0.028%