Explanations
references to conspiracy theories and related skepticism
Negative Logits
Token    Value
hev      -0.17
çŃĨ      -0.15
ehr      -0.15
ijd      -0.14
Ậ        -0.14
Beit     -0.14
446      -0.14
Zen      -0.14
Cool     -0.14
461      -0.14
Positive Logits
Token    Value
inan     0.18
exion    0.15
adolu    0.14
alam     0.13
COPE     0.13
utin     0.13
ureau    0.13
chest    0.13
rawer    0.13
ýt       0.13
Activation density: 0.100%