INDEX
Explanations
names or terms related to specific groups or individuals
mentions of specific militant groups and their leaders
Negative Logits
crochet    -0.68
livious    -0.68
erection   -0.66
acebook    -0.64
xp         -0.63
Entered    -0.61
Pony       -0.59
Nadu       -0.59
+/-        -0.59
gears      -0.59
Positive Logits
azeera     1.09
awi        0.99
aghd       0.96
abi        0.86
adr        0.84
aq         0.83
aida       0.82
adi        0.82
andi       0.81
aeda       0.81
Activations Density: 0.053%