INDEX
Explanations
terms associated with authority, control, and oversight in various contexts
Head Attr Weights
0:0.08
1:0.01
2:0.36
3:0.15
4:0.03
5:0.06
6:0.03
7:0.01
8:0.10
9:0.04
10:0.05
11:0.02
Negative Logits
Boone        -1.29
outp         -1.24
OD           -1.20
apest        -1.16
minster      -1.12
ffen         -1.12
clicked      -1.10
vana         -1.08
aristocracy  -1.06
osphere      -1.05
Positive Logits
guise        1.62
��           1.54
��           1.34
pretext      1.33
provocation  1.31
�            1.30
�            1.30
��           1.29
う           1.29
�            1.26
Activations Density 0.050%