INDEX
Explanations
references to systemic flaws and corruption in governance or media
Negative Logits
Token       Logit
zilla       -0.16
ence        -0.15
gl          -0.14
Benedict    -0.14
late        -0.14
ist         -0.14
upe         -0.14
ÐĴС         -0.14
HT          -0.14
aster       -0.13
Positive Logits
Token       Logit
ugins       0.17
ocale       0.17
alink       0.16
frauen      0.16
NavItem     0.16
acak        0.16
_Entity     0.15
inton       0.15
.SEVER      0.14
Combo       0.14
Activations Density: 0.082%