Explanations
words related to authority, control, and influence
mentions of power and its various implications in different contexts
NEGATIVE LOGITS
Von         -0.77
lov         -0.75
Taste       -0.72
ALK         -0.71
riad        -0.70
romeda      -0.69
eryl        -0.68
verett      -0.68
dk          -0.67
aro         -0.65
POSITIVE LOGITS
levers       1.03
vested       1.02
FUL          0.85
houses       0.84
lessness     0.83
conferred    0.83
outage       0.83
wielded      0.80
powerless    0.80
delegated    0.80
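The two lists appear to rank tokens by how strongly this feature pushes their output logits down (negative) or up (positive). One common way to build such lists, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort the results. The sketch below is pure Python with made-up toy vectors. `logit_effects`, `top_and_bottom`, `DECODER_DIR`, `UNEMBED`, and the placeholder token names are all hypothetical and are not real model weights or any dashboard's API.

```python
from typing import Dict, List, Tuple

# Toy values only (assumption): a 3-dimensional decoder direction for one
# feature and a tiny unembedding with one column per placeholder token.
DECODER_DIR: List[float] = [1.0, 0.0, -0.5]
UNEMBED: Dict[str, List[float]] = {
    "pos_a": [1.0, 0.2, -0.1],
    "pos_b": [0.9, 0.0, 0.0],
    "neutral": [0.0, 1.0, 0.0],
    "neg_b": [-0.4, 0.0, 0.5],
    "neg_a": [-0.5, 0.1, 0.6],
}


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the decoder direction with each token's unembedding column."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k largest effects (descending) and the k smallest (ascending)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[::-1][:k], ranked[:k]


if __name__ == "__main__":
    effects = logit_effects(DECODER_DIR, UNEMBED)
    top, bottom = top_and_bottom(effects, 2)
    print("POSITIVE:", [(t, round(v, 2)) for t, v in top])
    print("NEGATIVE:", [(t, round(v, 2)) for t, v in bottom])
```

With these toy vectors, `pos_a` and `pos_b` land in the positive list and `neg_a` and `neg_b` in the negative list, mirroring the two-sided layout above. A real dashboard may also fold in layer-norm scaling or other preprocessing that this sketch leaves out.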
Activation density: 0.035%
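Activation density is usually read as the fraction of token positions on which the feature fires, meaning its activation is above zero. That definition is an assumption here, because the page does not state it. The minimal sketch below computes that fraction. `activation_density` and its `threshold` parameter are hypothetical names, and the activation data is made up.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation is strictly above `threshold`."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Made-up example: 100,000 token positions, of which 35 fire.
    acts = [1.0 if i % 1000 == 0 and i < 35_000 else 0.0 for i in range(100_000)]
    print(f"{activation_density(acts):.3%}")
```

Under this reading, a density of 0.035% means about 35 active positions per 100,000 tokens, or roughly one in every 2,857. That makes this a sparse feature.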