INDEX
Explanations
instances of politically charged statements or events
Negative Logits
according    -0.07
çı           -0.06
OPS          -0.06
ÙģÙĩ         -0.06
ÐŀÐł         -0.06
ña           -0.06
according    -0.06
iper         -0.06
WORD         -0.06
ellas        -0.06
Positive Logits
esktop       0.07
/REC         0.06
andom        0.06
/my          0.06
no           0.06
kata         0.06
urer         0.06
rám          0.06
yles         0.06
don          0.06
Activations Density: 0.012%