Explanations
references to terrorist incidents and their impacts
Negative Logits
oulos        -0.15
assic        -0.15
674          -0.14
_placement   -0.14
ofire        -0.14
motion       -0.14
Tet          -0.14
Bender       -0.13
ettel        -0.13
McGr         -0.13
Positive Logits
Prevent      0.30
Counter      0.29
counter      0.28
MI           0.26
Counter      0.25
extremism    0.24
Ext          0.24
counter      0.23
grooming     0.22
MI           0.22
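Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the most negative and most positive entries. The dashboard does not state its exact method, so the following is only a minimal sketch under that assumption. The helper `top_logit_effects` and the inputs `feature_dir` (the feature's decoder vector) and `W_U` (the unembedding matrix) are hypothetical names, and any final layer norm is ignored.

```python
import numpy as np


def top_logit_effects(feature_dir, W_U, vocab, k=10):
    """Hypothetical helper: rank tokens by the feature's direct logit effect.

    feature_dir: shape (d_model,), assumed to be the SAE decoder row for the feature
    W_U:         shape (d_model, vocab_size), assumed model unembedding matrix
    vocab:       list of token strings, length vocab_size
    Final layer norm is ignored in this sketch (an assumption).
    """
    # Direct effect of adding the feature direction on every token's logit.
    logit_effects = np.asarray(feature_dir) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(logit_effects)  # ascending: most negative first
    negative = [(vocab[i], float(logit_effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage with made-up numbers, not real model weights.
vocab = ["Prevent", "oulos", "the"]
feature_dir = np.array([1.0, 0.0])
W_U = np.array([[0.30, -0.15, 0.0],
                [5.0, 5.0, 5.0]])
neg, pos = top_logit_effects(feature_dir, W_U, vocab, k=1)
print(neg, pos)
```

Because the effect is linear in the decoder direction, the ranking depends only on the direction of `feature_dir`, not on how strongly the feature fires on a given token.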
Activation Density: 0.012%
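Activation density is usually read as the fraction of token positions on which the feature is active. At 0.012%, that is about 12 tokens in every 100,000. The dashboard does not give its activation threshold, so this sketch assumes a feature "fires" whenever its activation is strictly greater than zero. The function name `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the feature fires.

    A position counts as active if its activation is strictly greater than
    `threshold` (0.0 by default, an assumed convention).
    """
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))


# Toy data: 12 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:12] = 1.0
density = activation_density(acts)
print(f"{density:.5f} ({density * 100:.3f}%)")
```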