Explanations
words related to power, control, authority, and influence
terms related to power and control dynamics
Negative Logits
Token        Logit
iate         -0.70
ODE          -0.69
individual   -0.68
ãĥ£          -0.68
rine         -0.67
ECD          -0.66
rum          -0.66
Scientist    -0.66
ribe         -0.65
aired        -0.65
Positive Logits
Token        Logit
xual         0.86
anship       0.84
stemming     0.81
reversal     0.77
compulsion   0.77
prejudice    0.76
escalation   0.74
uality       0.73
avoidance    0.73
eering       0.71
Activations Density: 0.078%