Explanations
phrases related to political power or authority
references to authority or influence
Negative Logits
Token      Logit
eor        -0.66
Alvarez    -0.66
Sil        -0.65
Absent     -0.64
Gil        -0.64
rian       -0.63
Murd       -0.62
ded        -0.61
Tud        -0.61
Ign        -0.61
Positive Logits
Token        Logit
powers       1.02
Reviewer     0.92
hell         0.90
uits         0.84
delegated    0.82
superpower   0.81
conferred    0.79
oshenko      0.75
wu           0.72
olve         0.72
Activations Density: 0.013%