INDEX
Explanations
words or phrases related to political topics
terms related to extremism and inflammatory ideologies
Negative Logits
minim        -0.76
capsule      -0.73
cyan         -0.72
coast        -0.71
Samar        -0.69
collecting   -0.69
downs        -0.69
decomp       -0.69
buggy        -0.68
missed       -0.67
Positive Logits
ONSORED      0.97
£            0.92
¹            0.89
aspx         0.85
Ibid         0.85
gypt         0.84
º            0.84
į            0.81
¯            0.80
¬            0.78
Activations Density: 0.164%