INDEX
Explanations
the word "anti" with varying levels of intensity
terms or phrases related to anti-government sentiments
Negative Logits
mable       -0.89
ding        -0.74
sed         -0.73
nets        -0.71
wings       -0.71
lined       -0.70
manship     -0.70
come        -0.68
spring      -0.68
matically   -0.68
Positive Logits
iso         0.92
qua         0.88
Devi        0.88
anti        0.83
ucci        0.82
ño          0.78
opsis       0.78
arius       0.75
            0.75
š           0.74
Activations Density 0.028%