Explanations
terms related to extremism or radicalism
references to extremism or extreme behavior
Negative Logits
gerald    -0.70
Labrador  -0.63
REF       -0.61
steroids  -0.59
McCabe    -0.58
behold    -0.57
Journals  -0.56
Tec       -0.56
Bai       -0.55
Hath      -0.55
Positive Logits
antle     1.16
phasis    1.12
nants     1.09
nant      1.09
arkable   1.06
ovable    1.02
acy       1.00
ploy      0.99
essage    0.98
ittance   0.98
Activations Density 0.011%