INDEX
Explanations
references to authority or positions of power
Head Attr Weights
0:0.32
1:0.13
2:0.04
3:0.04
4:0.02
5:0.14
6:0.07
7:0.01
8:0.05
9:0.07
10:0.03
11:0.03
Negative Logits
princ        -1.58
VIDIA        -1.45
withdrawals  -1.43
capt         -1.43
INGTON       -1.39
TextColor    -1.38
CS           -1.36
ste          -1.34
elev         -1.34
cartoon      -1.33
Positive Logits
.,           1.84
friends      1.74
aea          1.71
aceae        1.69
ema          1.68
yne          1.64
ngth         1.64
iour         1.63
-.           1.62
forums       1.61
Activations Density 0.039%