INDEX
Explanations
references to specific individuals, groups, or beliefs associated with extremist views
Negative Logits
argon
-0.15
addtogroup
-0.15
ìĦĿ
-0.15
emsp
-0.15
afort
-0.15
,{"
-0.14
éĪ
-0.14
encion
-0.14
رد
-0.14
ruption
-0.14
Positive Logits
WND
0.20
CNS
0.20
World
0.17
AIM
0.16
TPL
0.15
åŁºåľ°
0.14
éĥİ
0.14
World
0.14
oman
0.14
similarly
0.14
Activations Density 0.000%