Explanations
references to armed groups and violence
Negative Logits
ÙĨاÙħ     -0.17
ảo        -0.16
elfast    -0.16
æª        -0.15
boom      -0.15
gether    -0.15
pul       -0.15
ỡ         -0.15
pty       -0.14
innen     -0.14
Positive Logits
ζί        0.16
linger    0.15
idel      0.15
mith      0.15
ivec      0.15
Ĥ¬        0.15
aket      0.15
.$.       0.14
nit       0.14
Bend      0.14
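The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way such lists are produced is to project the feature's decoder direction through the model's unembedding matrix and take the lowest and highest entries. The sketch below assumes that method; the names `decoder_dir`, `W_U`, `vocab` and `top_bottom_logits` are hypothetical and are not taken from the dashboard's own code.

```python
import numpy as np


def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the feature's direct effect on the output logits.

    Assumptions (hypothetical, for illustration only):
      decoder_dir: shape (d_model,), the feature's decoder direction
      W_U:         shape (d_model, vocab_size), the unembedding matrix
      vocab:       list of token strings, length vocab_size
    """
    effects = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(effects)  # token indices, most negative effect first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: only the first residual dimension matters for this direction.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.3, -0.2, 0.1, -0.5],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_bottom_logits(decoder_dir, W_U, ["a", "b", "c", "d"], k=2)
```

With this toy data, `neg` is `[("d", -0.5), ("b", -0.2)]` and `pos` is `[("a", 0.3), ("c", 0.1)]`.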
Activation density: 0.016% (share of tokens on which this feature is active)
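A density of 0.016% means the feature fires on roughly 16 of every 100,000 tokens, so it is very sparse. The sketch below assumes density is computed as the percentage of token positions whose activation is strictly greater than zero. The function name and the `threshold` parameter are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the activation exceeds `threshold`.

    Assumes `acts` holds one activation value per token position.
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# 16 active positions out of 100,000 tokens gives a density of 0.016%.
acts = np.zeros(100_000)
acts[:16] = 1.0
density = activation_density(acts)
```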