Explanations
specific symbols or letters, possibly indicating special formatting or coding contexts
Negative Logits
ův            -0.15
arend         -0.15
egis          -0.15
apol          -0.14
ward          -0.14
inox          -0.14
orne          -0.14
ospace        -0.14
correct       -0.13
enced         -0.13
Positive Logits
Israeli       0.23
Palestinian   0.21
Beit          0.20
Palestinians  0.20
Palestine     0.19
Israel        0.19
Israeli       0.18
Palestin      0.17
فلس           0.17
Pale          0.17
Activation density: 0.002%
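The two logit lists and the density figure can be illustrated with a small sketch. It assumes that each vocabulary token has a single logit score for this feature (how that score is produced, for example by projecting the feature direction through the unembedding matrix, is an assumption and is not stated on this page) and that "activation density" means the fraction of token positions where the feature's activation is above zero. The function names `top_logits` and `activation_density` are hypothetical, chosen for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical helper: rank tokens by an assumed per-token logit score.
def top_logits(
    scores: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending by score
    negative = ranked[:k]        # lowest scores first
    positive = ranked[::-1][:k]  # highest scores first
    return positive, negative

# Hypothetical helper: assumes density = share of positions with activation > threshold.
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

if __name__ == "__main__":
    # Toy scores reusing a few values from the lists above.
    scores = {"Israeli": 0.23, "Palestinian": 0.21, "arend": -0.15,
              "ward": -0.14, "correct": -0.13}
    positive, negative = top_logits(scores, 2)
    print(positive, negative)

    # Toy activations: 2 firing positions out of 100,000.
    density = activation_density([0.0] * 99_998 + [1.3, 0.7])
    print(f"{density:.3%}")
```

With 2 firing positions out of 100,000, the sketch gives a density of 2e-5, which formats as 0.002%, the same order as the figure reported for this feature. A density this low means the feature fires on only about one token in 50,000.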