INDEX
Explanations
references to terrorist groups and their affiliations
Negative Logits
utar      -0.18
allery    -0.17
IDI       -0.15
itra      -0.14
tir       -0.14
dikke     -0.14
istra     -0.14
293       -0.13
631       -0.13
370       -0.13
Positive Logits
groups           0.25
spl              0.21
outfits          0.20
-groups          0.20
groups           0.20
group            0.20
grupos           0.19
splitter         0.19
nhóm             0.18
organizations    0.18
Activations Density: 0.101%