INDEX
Explanations
phrases related to moral or ethical judgments
phrases expressing discontent or disagreement with societal norms and politics
Negative Logits
utenberg    -0.77
arently     -0.72
ciation     -0.66
lost        -0.66
imilar      -0.66
oaded       -0.64
iHUD        -0.64
ixt         -0.63
missing     -0.62
dizz        -0.62
Positive Logits
civilized      1.30
democracy      1.23
democracies    1.07
democratic     1.00
decency        0.99
Democracy      0.97
journalism     0.91
taxp           0.90
society        0.87
morals         0.87
Activations Density: 0.421%