INDEX
Explanations
words related to political discourse and criticism
critiques of political behavior and accountability
Negative Logits
iple           -0.89
fm             -0.74
mx             -0.72
ibles          -0.70
vr             -0.70
DX             -0.68
VERTISEMENT    -0.68
isode          -0.68
uce            -0.66
rieving        -0.65
Positive Logits
namely            1.25
irrespective      1.20
thereby           1.20
undermining       1.14
lest              1.13
ignoring          1.13
whereas           1.12
notwithstanding   1.11
regardless        1.09
perpetrated       1.08
Activations Density 0.394%