INDEX
Explanations
phrases related to legal and institutional policies
Head Attr Weights
0:0.03
1:0.01
2:0.05
3:0.06
4:0.09
5:0.02
6:0.04
7:0.38
8:0.03
9:0.03
10:0.08
11:0.12
Negative Logits
temperament   -1.36
matchup       -1.31
iannopoulos   -1.31
temper        -1.30
nerv          -1.26
quar          -1.24
icult         -1.23
juggling      -1.22
ctic          -1.21
trak          -1.21
Positive Logits
edIn          1.44
Collective    1.35
Neuroscience  1.30
Sov           1.29
lvl           1.28
Solid         1.28
itialized     1.25
states        1.25
collective    1.24
millenn       1.22
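The negative and positive logit lists read as the tokens whose logits this feature most decreases and most increases. Below is a minimal sketch of how such lists are commonly produced, assuming the values are direct logit effects: the feature's output direction, already mapped into the residual stream, is multiplied by the unembedding matrix, and the vocabulary is sorted by the result. The names `logit_effects`, `feature_dir`, and `W_U`, and the toy numbers, are hypothetical. For a feature learned on attention head outputs the direction would first pass through the output projection, and whether this dashboard also applies the final layer norm is not stated.

```python
def logit_effects(feature_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Assumptions (hypothetical, for illustration only):
      feature_dir: list of d_model floats, the feature's residual-stream direction
      W_U: list of d_model rows, each a list of vocab_size floats (unembedding)
      vocab: list of vocab_size token strings
    """
    vocab_size = len(vocab)
    # Direct logit effect on token t = dot(feature_dir, column t of W_U).
    logits = [
        sum(feature_dir[d] * W_U[d][t] for d in range(len(feature_dir)))
        for t in range(vocab_size)
    ]
    # Token ids sorted from most negative to most positive effect.
    ranked = sorted(range(vocab_size), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in ranked[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(ranked[-k:])]
    return negative, positive


# Toy usage with made-up numbers, not the dashboard's real weights.
vocab = ["temper", "matchup", "Collective", "states", "lvl"]
W_U = [
    [-1.0, -0.5, 1.5, 0.8, 0.2],
    [0.0, 0.2, 0.1, 0.3, 0.4],
]
feature_dir = [1.0, 0.0]
neg, pos = logit_effects(feature_dir, W_U, vocab, k=2)
print(neg)  # [('temper', -1.0), ('matchup', -0.5)]
print(pos)  # [('Collective', 1.5), ('states', 0.8)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other, which matches how the negative and positive lists are laid out as mirror images.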
Activation Density: 0.001%
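A density of 0.001% is a fraction of 1e-5, or roughly one token in every 100,000. The sketch below assumes density means the share of tokens in some evaluation set on which the feature's activation is strictly positive; the dataset and any threshold the dashboard actually used are not stated, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which a feature fires (activation > threshold).

    Assumption: "density" is the share of tokens with activation strictly
    above `threshold`; the real dashboard's definition may differ.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# One firing token out of 100,000 gives the density shown above.
acts = [0.0] * 99_999 + [2.7]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.001%
```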