INDEX
Explanations
references to Israel and related political or social contexts
Negative Logits
emin          -0.15
umbo          -0.15
lug           -0.15
/Private      -0.14
zzle          -0.14
パン          -0.14
dre           -0.14
enna          -0.14
ucer          -0.14
assaulting    -0.14
Positive Logits
led           0.22
Defense       0.16
gnu           0.16
ognito        0.16
rael          0.15
gın           0.15
گان           0.15
son           0.14
ite           0.14
crowds        0.14
Activations Density 0.018%