INDEX
Explanations
references to political dynamics and conflicts involving Israel
Negative Logits
olas     -0.20
hlas     -0.17
hin      -0.17
actice   -0.15
ær       -0.14
ulo      -0.14
deaux    -0.14
ilan     -0.14
quares   -0.14
ULA      -0.13
Positive Logits
utt      0.17
wakeup   0.15
domina   0.14
asma     0.14
utura    0.14
allery   0.14
utoff    0.14
onden    0.14
íĻ©      0.13
isia     0.13
Activations Density 0.325%