INDEX
Explanations
words related to power dynamics and conflicts
references to political power dynamics and strategies
Negative Logits
DERR      -0.74
learners  -0.72
Redd      -0.72
Osw       -0.71
crochet   -0.68
uploaded  -0.68
Sapp      -0.67
recy      -0.66
spaced    -0.66
filler    -0.65
Positive Logits
iances         1.01
ablishment     0.95
ocracy         0.94
Government     0.93
establishment  0.90
ente           0.89
itions         0.89
unta           0.88
appro          0.88
dictatorship   0.87
Activations Density 0.331%
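
As a rough illustration of how a density figure like the one above could be computed, the sketch below assumes that "activation density" means the fraction of token positions on which the feature's activation exceeds a threshold. That definition, the function name `activation_density`, and the sample values are all assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: density = (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 3 of which are active.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.331% would correspond to the feature firing on roughly 3 out of every 1000 tokens.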