Explanations
phrases related to criticism or controversy
expressions of distrust and criticism towards authority or institutions
Negative Logits
ellar          -0.82
restricted     -0.78
interrupted    -0.78
rapnel         -0.74
uttered        -0.74
orb            -0.71
iffe           -0.68
bott           -0.65
ocated         -0.63
oof            -0.63
Positive Logits
democracy         0.97
democratic        0.90
innocent          0.87
taxp              0.87
democratically    0.86
livelihood        0.85
legitimate        0.85
morals            0.85
dignity           0.84
credibility       0.84
Activations Density 0.803%