INDEX
Explanations
instances of the word "authority"
references to authority figures or concepts
Negative Logits
Token                    Logit
Lens                     -0.74
Shal                     -0.70
Lovely                   -0.68
auder                    -0.68
irling                   -0.68
(no token text shown)    -0.68
neys                     -0.67
lla                      -0.66
eful                     -0.66
esta                     -0.64
Positive Logits
Token         Logit
authority     1.00
delegated     0.98
vested        0.97
Reviewer      0.91
exercised     0.85
figures       0.85
conferred     0.84
overseeing    0.78
nomine        0.78
confir        0.77
Activations Density: 0.028%