Explanations
references to historical or political contexts related to Israel and Palestine
Negative Logits
oun        -0.17
deriv      -0.17
hemorrh    -0.16
iqué       -0.15
weren      -0.15
Visit      -0.14
iquer      -0.14
Mock       -0.14
pund       -0.14
proven     -0.14
Positive Logits
tire       0.21
attire     0.20
lance      0.20
pose       0.19
agit       0.19
teste      0.19
ifie       0.19
relie      0.18
filme      0.18
fait       0.18
Activations Density: 0.023%