Explanations
words related to dictatorship and political power
references to the term "dictator."
Negative Logits
meric      -0.82
ciation    -0.77
grass      -0.71
vier       -0.70
awa        -0.70
bye        -0.69
Sisters    -0.68
knit       -0.67
zee        -0.67
WAYS       -0.67
Positive Logits
ures       0.91
ainers     0.90
orians     0.90
Cumber     0.87
orian      0.86
opia       0.83
ory        0.83
о          0.82
uple       0.82
ainer      0.81
Activations Density: 0.025%