Explanations
concepts related to dominance and influence within various institutions and systems
Head Attr Weights
0:0.02
1:0.03
2:0.07
3:0.06
4:0.02
5:0.04
6:0.10
7:0.12
8:0.04
9:0.05
10:0.07
11:0.34
Negative Logits
orrow     -1.54
ividual   -1.33
psey      -1.33
obyl      -1.28
etooth    -1.25
ectar     -1.20
etimes    -1.20
elaide    -1.19
azeera    -1.19
etheus    -1.18
Positive Logits
(>          1.78
imaginable  1.48
EVER        1.41
(<          1.24
nowadays    1.22
dominates   1.15
-+-+        1.09
viz         1.07
ever        1.07
(−          1.07
Activations Density 0.009%