Explanations
references to violent events and their impact
Negative Logits
/browse     -0.16
dera        -0.15
aleur       -0.14
bil         -0.14
SCI         -0.14
تين         -0.14
YLON        -0.14
ăng         -0.13
orne        -0.13
度          -0.13
Positive Logits
FG           0.19
Tamb         0.18
gov          0.17
hood         0.17
presidency   0.16
Presidency   0.16
Gov          0.16
herd         0.15
fuel         0.15
Hood         0.15
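The logit lists pair each token with a score. One common way to produce such lists is to project the feature's decoder direction through the model's unembedding matrix and keep the most negative and most positive entries. The sketch below assumes that method; it is not confirmed as this dashboard's exact computation. The names `top_logit_effects`, `decoder_row`, `unembed`, and the toy numbers are all hypothetical.

```python
from typing import List, Tuple


# Hypothetical helper: assumes scores come from decoder direction x unembedding.
def top_logit_effects(
    decoder_row: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    decoder_row: the feature's decoder direction, length d_model.
    unembed: unembedding matrix with shape d_model x vocab_size.
    vocab: token strings, length vocab_size.
    """
    d_model = len(decoder_row)
    # score[t] is the dot product of the decoder direction with column t of unembed.
    scores = [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score, so the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    vocab = ["gov", "dera", "fuel", "bil"]
    unembed = [
        [0.2, -0.1, 0.05, -0.05],
        [0.0, 0.0, 0.1, 0.0],
    ]
    negative, positive = top_logit_effects([1.0, 0.5], unembed, vocab, k=2)
    print("negative:", negative)
    print("positive:", positive)
```

Under this assumption, a positive score means the feature pushes the model toward predicting that token, and a negative score means it pushes away from it.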
Activation density: 0.061%
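Activation density is read here as the percentage of tokens on which the feature fires. A density of 0.061% corresponds to roughly 61 out of every 100,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The dashboard's own cutoff is not stated, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


# Hypothetical helper; assumes "firing" means activation > threshold (default 0).
def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens on which the feature fires."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 61 firing tokens out of 100,000 gives the density shown above.
    acts = [0.0] * 99_939 + [1.5] * 61
    print(f"{activation_density(acts):.3f}%")  # prints "0.061%"
```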