Explanations
mentions of people and interactions in a narrative context
Negative Logits
Token           Logit
Jewish          -0.18
Israeli         -0.17
ATUS            -0.16
auer            -0.15
tang            -0.15
Äįet            -0.15
Israel          -0.15
Bilim           -0.14
Israel          -0.14
ModelProperty   -0.14
Positive Logits
Token           Logit
Moran           0.22
Mos             0.22
Av              0.22
Guy             0.21
Gil             0.20
Guy             0.20
Av              0.20
Hag             0.19
Adv             0.18
Riv             0.18
Activations Density: 0.041%
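
The density figure can be illustrated with a minimal sketch. It assumes that density means the fraction of sampled tokens on which the feature's activation is above zero; the dashboard does not state its exact definition, and the function name, the threshold parameter, and the synthetic data below are all hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires. This definition is not confirmed by the source.
    """
    acts = np.asarray(activations, dtype=float)
    if acts.size == 0:
        raise ValueError("activations must contain at least one value")
    return float(np.count_nonzero(acts > threshold) / acts.size)


# Synthetic example: the feature fires on 41 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:41] = 1.0

print(f"{activation_density(acts):.3%}")  # prints 0.041%
```

Under this reading, a density of 0.041% would mean the feature fires on roughly 41 of every 100,000 sampled tokens, which is a sparse feature.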